Distributed RL for Collaborative Policies

Recent years have seen major advances in single-agent artificial intelligence, particularly in deep reinforcement learning (deep RL). A large community now works on multi-agent reinforcement learning (MARL), seeking to extend these single-agent approaches to multi-agent systems. However, natural extensions of single-agent approaches fail when applied to multi-agent problems: the joint MARL problem can rarely be solved, mainly because of the high-dimensional state-action space that the agents must explore. As a result, the community has been striving for distributed solutions.

In this project, we develop a fully distributed, scalable learning framework in which multiple agents learn a common, collaborative policy in a shared environment; the policy can then be deployed on an arbitrary number of agents with little to no extra training. We focus on a multi-robot construction problem inspired by Harvard’s TERMES project, in which simple robots must gather, carry, and place simple block elements to build a user-specified 3D structure. Cast in the RL framework, this problem is a very difficult game with sparse and delayed rewards, because robots need to build scaffolding to reach the higher levels of the structure before they can complete construction. Execution is fully decentralized: each agent runs an identical copy of the policy without explicitly communicating with other agents, and yet a common goal is achieved. Training is centralized in the sense that all agents contribute to the same policy. Once learned, the policy can be deployed on an arbitrary number of agents, each of which views the others as moving features of the environment. To this end, we extend the single-agent asynchronous advantage actor-critic (A3C) algorithm to let multiple agents learn a homogeneous, collaborative policy in a shared environment.
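The idea of several agents contributing to one shared actor-critic policy can be illustrated with a minimal sketch. This is not the project's implementation: the tabular softmax policy, the four-state chain world (`toy_step`), the hyperparameters, and all function names are hypothetical, chosen only to keep the example small. Each agent here also acts in its own copy of the toy world, and the agents take turns instead of running asynchronously, whereas the project has agents acting together in one shared environment.

```python
import numpy as np

N_STATES, N_ACTIONS = 4, 2  # hypothetical toy sizes, not from the project


class SharedPolicy:
    """Global actor-critic parameters shared by all agents."""

    def __init__(self):
        self.theta = np.zeros((N_STATES, N_ACTIONS))  # actor logits
        self.v = np.zeros(N_STATES)                   # critic state values


def action_probs(theta, s):
    """Softmax over the logits of state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()


def toy_step(s, a):
    """Hypothetical chain world: action 1 moves right, action 0 moves left.
    The only reward (1.0) is given on reaching the last state."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def worker_update(shared, rng, gamma=0.9, lr=0.05, max_steps=20):
    """One agent: copy the shared parameters, act, push gradients back."""
    theta, v = shared.theta.copy(), shared.v.copy()  # local snapshot
    s, done, traj = 0, False, []
    for _ in range(max_steps):
        a = rng.choice(N_ACTIONS, p=action_probs(theta, s))
        s_next, r, done = toy_step(s, a)
        traj.append((s, a, r))
        s = s_next
        if done:
            break
    ret = 0.0 if done else v[s]  # bootstrap from the critic if truncated
    d_theta, d_v = np.zeros_like(theta), np.zeros_like(v)
    for s_t, a_t, r_t in reversed(traj):
        ret = r_t + gamma * ret
        adv = ret - v[s_t]                     # advantage estimate
        grad_logp = -action_probs(theta, s_t)
        grad_logp[a_t] += 1.0                  # d log pi(a|s) / d logits
        d_theta[s_t] += adv * grad_logp
        d_v[s_t] += adv
    shared.theta += lr * d_theta               # apply to the shared policy
    shared.v += lr * d_v


def train(n_agents=3, rounds=200, seed=0):
    shared = SharedPolicy()
    rngs = [np.random.default_rng(seed + i) for i in range(n_agents)]
    for _ in range(rounds):
        for rng in rngs:  # round-robin stand-in for asynchronous workers
            worker_update(shared, rng)
    return shared
```

Calling `train()` returns a single `SharedPolicy`. Because every agent reads the same `theta`, running the result on more agents in this sketch only means having each of them call `action_probs` on that one set of parameters, which mirrors the deployment property described above.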

Distributed Reinforcement Learning for Multi-Robot Decentralized Collective Construction

People: Guillaume Sartoretti, Bill Paivine, Yue (Holmes) Wu, Yunfei Shi
