Deep-RL for compliant control of a series-elastic snake robot

Decentralized control architectures, such as those conventionally defined by central pattern generators, independently coordinate spatially distributed portions of articulated bodies to achieve system-level objectives. State-of-the-art distributed reinforcement-learning algorithms employ a different but conceptually related idea: independent agents simultaneously coordinate their own behaviors in parallel environments while asynchronously updating the policy of a system-level, or rather meta-level, agent. To the best of the authors’ knowledge, this work is the first to explicitly explore the potential relationship between the concepts underlying homogeneous decentralized control for articulated locomotion and those underlying distributed learning. We present an approach that leverages the structure of the asynchronous advantage actor-critic (A3C) algorithm to provide a natural framework for learning decentralized control policies on a single platform. Our primary contribution shows that an individual agent in the A3C algorithm can be defined by an independently controlled portion of the robot’s body, thus enabling distributed learning on a single platform for efficient hardware implementation. To this end, we show how the system is trained offline using hardware experiments implementing an autonomous decentralized compliant control framework. Our experimental results show that the trained agent outperforms the compliant control baseline by more than $40\%$ in terms of steady progression through a series of randomized, highly cluttered evaluation environments.
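The core idea, that each A3C worker is a body segment rather than a separate simulated environment, can be illustrated with a toy sketch. It is not the authors’ implementation. Everything here is an assumption made for illustration: the two-valued local observation (free vs. in contact), the two local actions (push vs. comply), the reward `local_reward`, and all names and hyperparameters are hypothetical. The task is reduced to one-step episodes, so the return is just the immediate reward. All segments share one policy (the homogeneous, meta-level agent). Each worker syncs a local copy, collects a short rollout, computes advantage-weighted actor and critic updates, and pushes them to the shared parameters. For simplicity the updates are lock-protected rather than lock-free as in the original A3C formulation.

```python
import math
import random
import threading

N_STATES = 2   # hypothetical local observation: 0 = free, 1 = in contact with an obstacle
N_ACTIONS = 2  # hypothetical local action: 0 = push forward, 1 = comply (yield)


class SharedPolicy:
    """Global (meta-level) parameters shared by every body-segment worker."""

    def __init__(self):
        self.logits = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor: tabular softmax logits
        self.values = [0.0] * N_STATES                              # critic: state-value baseline
        self.lock = threading.Lock()
        self.updates = 0

    def snapshot(self):
        # Copy the global parameters so a worker can act with a local version.
        with self.lock:
            return [row[:] for row in self.logits], self.values[:]

    def apply(self, d_logits, d_values):
        # Add a worker's accumulated update to the global parameters.
        with self.lock:
            for s in range(N_STATES):
                self.values[s] += d_values[s]
                for a in range(N_ACTIONS):
                    self.logits[s][a] += d_logits[s][a]
            self.updates += 1


def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    total = sum(e)
    return [x / total for x in e]


def local_reward(state, action):
    # Hypothetical reward: comply when in contact, push when free.
    return 1.0 if action == state else 0.0


def segment_worker(seg_id, policy, n_updates=50, rollout=8, lr=0.1):
    """One A3C-style worker, standing in for one independently controlled body segment."""
    rng = random.Random(seg_id)  # per-segment random stream
    for _ in range(n_updates):
        logits, values = policy.snapshot()  # sync local copy with the shared policy
        d_logits = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        d_values = [0.0] * N_STATES
        for _ in range(rollout):
            s = rng.randrange(N_STATES)
            probs = softmax(logits[s])
            a = rng.choices(range(N_ACTIONS), weights=probs)[0]
            advantage = local_reward(s, a) - values[s]  # one-step return minus baseline
            for k in range(N_ACTIONS):
                # d log pi(a|s) / d logit_k = 1[k == a] - pi(k|s)
                d_logits[s][k] += lr * advantage * ((1.0 if k == a else 0.0) - probs[k])
            d_values[s] += lr * advantage  # move the baseline toward the observed return
        policy.apply(d_logits, d_values)  # asynchronous push to the shared policy


def train(n_segments=4):
    policy = SharedPolicy()
    threads = [threading.Thread(target=segment_worker, args=(i, policy)) for i in range(n_segments)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return policy


if __name__ == "__main__":
    trained = train()
    for s in range(N_STATES):
        print(s, [round(p, 3) for p in softmax(trained.logits[s])])
```

Because every segment trains the same shared parameters, the learned controller is homogeneous. A single physical robot supplies several parallel learners, one per segment, instead of requiring several copies of the environment.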