Multi-Agent Deep Reinforcement Learning for Autonomous Driving
Reinforcement learning has become a powerful learning framework, now capable of learning complex policies in high-dimensional environments. At every time step t, the agent observes the current state of the environment, sₜ, and uses the policy it is learning, π(s), to decide which action, aₜ, it should take in order to maximise the total reward over the current episode.
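This agent–environment loop can be sketched in a few lines of Python. The toy environment and the fixed policy below are purely illustrative (not part of any library): the point is the cycle of observing sₜ, choosing aₜ = π(sₜ), and accumulating reward.

```python
# A toy environment: the state is an integer position on a line; the
# agent moves left (-1) or right (+1) and is rewarded for reaching
# position +5 before the episode ends.
class LineWorld:
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action                 # transition: s_{t+1} = s_t + a_t
        done = abs(self.state) >= 5
        reward = 1.0 if self.state >= 5 else 0.0
        return self.state, reward, done

def policy(state):
    # A fixed (non-learning) stand-in for pi(s): always move right.
    return +1

env = LineWorld()
s, total_reward, done = env.reset(), 0.0, False
while not done:
    a = policy(s)                            # a_t = pi(s_t)
    s, r, done = env.step(a)                 # observe s_{t+1} and reward r_t
    total_reward += r

print(total_reward)  # 1.0 -- the episode return
```

A learning algorithm would replace the fixed `policy` with one that is updated from the observed rewards; everything else about the loop stays the same.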
In this article, I will be introducing cooperative multi-agent reinforcement learning and how it is applicable in the autonomous driving problem. I will also be providing a multi-agent training environment which is adapted from the StarCraft Multi-Agent Challenge. These are based on the assumption that you already have some basic knowledge on single-agent reinforcement learning, otherwise it is recommended that you first take a look at some learning resources on the introduction of reinforcement learning (for example DeepMind provides a comprehensive course on reinforcement learning).
Driving is a task that involves social interaction. In the real world, humans use experience and intuition to understand other drivers’ behaviour. Similarly, a single agent predicts other agents’ behaviour solely from its input observations. This has some limitations, however: when other agents change their behaviour drastically, the model usually does not respond well, and improving this kind of generalisation remains a central challenge in single-agent reinforcement learning. As the number of agents on the road increases, the environment also becomes less stable from any one agent’s point of view: it is now a partially observable Markov decision process, in which the next environment state depends not only on the action of the single agent but also on the actions of all the other agents. It is therefore reasonable to model autonomous driving as a cooperative multi-agent system: agents learn policies together, and the learnt optimal strategy lets each of them make systematic changes to accommodate the others’ strategies.
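A tiny example makes the joint-action dependence concrete. The two-agent “intersection” game below is a made-up illustration: each agent chooses to GO or WAIT, and the reward either agent receives depends on what the *other* agent does, which is exactly why the environment looks non-stationary to an agent that only models its own action.

```python
# A two-agent intersection game: each agent chooses GO (1) or WAIT (0).
GO, WAIT = 1, 0

def joint_step(a1, a2):
    if a1 == GO and a2 == GO:
        return (-10.0, -10.0)   # collision: both punished
    if a1 == GO:
        return (1.0, 0.0)       # agent 1 crosses safely
    if a2 == GO:
        return (0.0, 1.0)       # agent 2 crosses safely
    return (0.0, 0.0)           # both wait: no progress, no reward

# The same action by agent 1 yields very different rewards depending
# on agent 2's choice -- the transition depends on the joint action.
print(joint_step(GO, WAIT))   # (1.0, 0.0)
print(joint_step(GO, GO))     # (-10.0, -10.0)
```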
Common cooperative multi-agent settings divide into centralised and decentralised settings. In the centralised setting, the observations of all agents are concatenated and fed as input to a central controller, which then outputs the actions for all agents; a single policy is trained for this controller. However, the centralised approach often faces scalability limitations: as the number of agents increases, the joint action-observation space grows with it, which makes it difficult to train an optimal policy.
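A minimal sketch of the centralised setting, using a hypothetical linear controller (the dimensions and the linear map are illustrative, not any particular architecture): all observations are concatenated into one input, and a single set of parameters produces one action per agent. Note how the input size scales with the number of agents.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, n_actions = 3, 4, 2

# One shared parameter matrix maps the concatenated observation of
# all agents to a Q-value per (agent, action) pair. The first
# dimension -- n_agents * obs_dim -- grows with the number of agents,
# which is the scalability problem in a nutshell.
W = rng.normal(size=(n_agents * obs_dim, n_agents * n_actions))

def central_controller(observations):
    joint_obs = np.concatenate(observations)           # shape: (n_agents*obs_dim,)
    q = (joint_obs @ W).reshape(n_agents, n_actions)   # one row per agent
    return q.argmax(axis=1)                            # joint action, one per agent

observations = [rng.normal(size=obs_dim) for _ in range(n_agents)]
actions = central_controller(observations)
print(actions.shape)  # (3,) -- one action per agent
```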
The second approach is the decentralised setting, where the policy of each agent is trained individually. During deployment, nearby agents are connected through a communication channel and receive each other’s observations. A decentralised approach can handle larger-scale multi-agent problems.
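By contrast, a decentralised sketch gives each agent its *own* parameters, and the communication channel shows up as the neighbours’ observations being appended to the agent’s own input at execution time. Again, the linear policies and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, obs_dim, n_actions = 3, 4, 2

# Each agent has its own, individually trained parameters.
policies = [rng.normal(size=(n_agents * obs_dim, n_actions))
            for _ in range(n_agents)]

def act(agent_id, own_obs, received_obs):
    # received_obs models the communication channel: the observations
    # of nearby agents, concatenated onto the agent's own observation.
    x = np.concatenate([own_obs] + received_obs)
    return int((x @ policies[agent_id]).argmax())

obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
actions = [act(i, obs[i], [o for j, o in enumerate(obs) if j != i])
           for i in range(n_agents)]
print(len(actions))  # 3 -- each agent decided for itself
```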
Let’s get to the actual training of a reinforcement learning agent in a simulator! There are not many online tutorials for multi-agent autonomous driving, so I will point out some resources that I have found helpful myself. We will be using the CARLA simulator, an open-source simulator for autonomous driving. Since we would like a multi-agent environment, I recommend the MACAD framework, which provides a gym-like API that is easy to use. To set it up, follow the “Getting Started” section in the README.md to download CARLA and set up MACAD; when setting up MACAD, follow option 2 for developers.
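To give a feel for the gym-like multi-agent interface before you have CARLA running, here is a hedged sketch using a stub environment. In MACAD-style environments, observations, actions, rewards, and done flags are dictionaries keyed by actor id; the actor names (`car1`, `car2`), the stub dynamics, and the `__all__` done key below are illustrative stand-ins, so check the actual macad-gym environment for the real ids and spaces.

```python
# A stub standing in for a MACAD-style multi-agent environment:
# every quantity in the loop is a dict keyed by actor id.
class StubMultiAgentEnv:
    def __init__(self):
        self.actors = ["car1", "car2"]
        self.t = 0

    def reset(self):
        self.t = 0
        return {a: [0.0] for a in self.actors}

    def step(self, action_dict):
        self.t += 1
        obs = {a: [float(self.t)] for a in self.actors}
        rewards = {a: 1.0 for a in self.actors}
        dones = {a: self.t >= 3 for a in self.actors}
        dones["__all__"] = self.t >= 3      # episode ends for everyone
        return obs, rewards, dones, {}

env = StubMultiAgentEnv()
obs = env.reset()
dones = {"__all__": False}
while not dones["__all__"]:
    actions = {actor: 0 for actor in obs}   # e.g. a fixed placeholder action
    obs, rewards, dones, info = env.step(actions)
print(env.t)  # 3
```

The same loop shape carries over once `StubMultiAgentEnv` is replaced by a real environment created through macad-gym.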
Once you have that set up, let’s introduce a set of “centralised training, decentralised execution” algorithms that can easily be integrated into training your autonomous driving agent in the MACAD environment. Navigate to this GitHub repository and follow the “Installation Instructions for macad-gym”. Training an agent is then very straightforward: simply run the command below.
python src/main.py --config=qmix --env-config=macad
The config argument should point to the algorithm you want to use. The repository supports five algorithms: Independent Q-Learning (iql), Monotonic Value Factorisation (qmix), Counterfactual Multi-Agent Policy Gradients (coma), Value Decomposition Networks (vdn), and Learning to Factorise with Transformation (qtran). To adjust the RL hyperparameters, simply change the values in the config files.
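As a rough illustration of what such a config file contains, here is a sketch of an algorithm config in the common PyMARL-style layout. The key names and values below are illustrative only; check the actual files under the repository’s config directory for the real keys.

```
# e.g. an algorithm config such as qmix.yaml -- illustrative values only
name: "qmix"
mixer: "qmix"
mixing_embed_dim: 32
buffer_size: 5000
batch_size: 32
target_update_interval: 200
epsilon_start: 1.0
epsilon_finish: 0.05
epsilon_anneal_time: 50000
```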
You might wonder what exactly centralised training, decentralised execution is. In this setting, agents are trained together so that their behaviours are tuned towards achieving collective benefits, but during deployment each agent follows its own policy. Compared with the centralised setting mentioned above, it is a scalable method: many agents can be added to the training process without preventing training convergence. Compared with the decentralised setting, it does not require communication between vehicles during deployment, which might be a challenge when the connection bandwidth between vehicles is low. This paper provides further detailed insights on centralised training, decentralised execution and the algorithms involved.
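The simplest of the algorithms above, VDN, makes the idea concrete: during centralised training the team is scored by the joint value Q_tot = Σᵢ Qᵢ, but at execution time each agent only argmaxes its own Qᵢ, with no communication. The linear per-agent Q-functions below are an illustrative sketch, not the trained networks; the point is that because the sum is monotonic in each Qᵢ, the individual greedy actions also maximise Q_tot.

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, obs_dim, n_actions = 2, 3, 2

# Per-agent Q-functions (here simple linear maps, one per agent).
weights = [rng.normal(size=(obs_dim, n_actions)) for _ in range(n_agents)]

obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
qs = [obs[i] @ weights[i] for i in range(n_agents)]   # Q_i(o_i, .) per agent

# Decentralised execution: each agent greedily picks from its own Q_i.
actions = [int(q.argmax()) for q in qs]

# Centralised training signal: the VDN joint value of the chosen actions.
q_tot = sum(q[a] for q, a in zip(qs, actions))

# Because Q_tot is a monotone sum, the per-agent argmaxes recover the
# best *joint* action -- no search over joint actions is needed.
best_joint = max(qs[0][i] + qs[1][j]
                 for i in range(n_actions) for j in range(n_actions))
print(abs(q_tot - best_joint) < 1e-9)  # True
```

QMIX generalises this by replacing the plain sum with a learned monotonic mixing network, which keeps the same decentralised-execution property.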
To conclude: this article introduced how cooperative multi-agent reinforcement learning can be applied to the autonomous driving problem, and provided a multi-agent training environment. Feel free to drop any questions you have, and happy hacking!