-
Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective
Authors:
Muhammad Aneeq uz Zaman,
Mathieu Laurière,
Alec Koppel,
Tamer Başar
Abstract:
In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of \emph{stochastic} and \emph{non-stochastic} uncertainties whose distributions are respectively known and unknown. Focusing on policy optimization that accounts for both types of uncertainties, we formulate the problem in a worst-case (minimax) framework, which is intractable in general. Thus, we focus on the Linear Quadratic setting to derive benchmark solutions. First, since no standard theory exists for this problem due to the distributed information structure, we utilize the Mean-Field Type Game (MFTG) paradigm to establish guarantees on the solution quality in the sense of achieved Nash equilibrium of the MFTG. This in turn allows us to compare the performance against the corresponding original robust multi-agent control problem. Then, we propose a Receding-horizon Gradient Descent Ascent RL algorithm to find the MFTG Nash equilibrium and we prove a non-asymptotic rate of convergence. Finally, we provide numerical experiments to demonstrate the efficacy of our approach relative to a baseline algorithm.
Submitted 20 June, 2024;
originally announced June 2024.
-
Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning
Authors:
Zida Wu,
Mathieu Lauriere,
Samuel Jia Cong Chua,
Matthieu Geist,
Olivier Pietquin,
Ankur Mehta
Abstract:
Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves population-dependent Nash equilibrium without the need for averaging or sampling from history, inspired by Munchausen RL and Online Mirror Descent. Through the design of an additional inner-loop replay buffer, the agents can effectively learn to achieve Nash equilibrium from any distribution, mitigating catastrophic forgetting. The resulting policy can be applied to various initial distributions. Numerical experiments on four canonical examples demonstrate our algorithm has better convergence properties than SOTA algorithms, in particular a DRL version of Fictitious Play for population-dependent policies.
Submitted 6 March, 2024;
originally announced March 2024.
-
Solving N-player dynamic routing games with congestion: a mean field approach
Authors:
Theophile Cabannes,
Mathieu Lauriere,
Julien Perolat,
Raphael Marinier,
Sertan Girgin,
Sarah Perrin,
Olivier Pietquin,
Alexandre M. Bayen,
Eric Goubault,
Romuald Elie
Abstract:
The recent emergence of navigational tools has changed traffic patterns and has now enabled new types of congestion-aware routing control like dynamic road pricing. Using the fundamental diagram of traffic flows - applied in macroscopic and mesoscopic traffic modeling - the article introduces a new N-player dynamic routing game with explicit congestion dynamics. The model is well-posed and can reproduce heterogeneous departure times and congestion spill back phenomena. However, as Nash equilibrium computations are PPAD-complete, solving the game becomes intractable for large but realistic numbers of vehicles N. Therefore, the corresponding mean field game is also introduced. Experiments were performed on several classical benchmark networks of the traffic community: the Pigou, Braess, and Sioux Falls networks with heterogeneous origin, destination and departure time tuples. The Pigou and the Braess examples reveal that the mean field approximation is generally very accurate and computationally efficient as soon as the number of vehicles exceeds a few dozen. On the Sioux Falls network (76 links, 100 time steps), this approach enables learning traffic dynamics with more than 14,000 vehicles.
Submitted 27 October, 2021; v1 submitted 22 October, 2021;
originally announced October 2021.
-
COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability
Authors:
Arthur Charpentier,
Romuald Elie,
Mathieu Laurière,
Viet Chi Tran
Abstract:
We consider here an extended SIR model, including several features of the recent COVID-19 outbreak: in particular the infected and recovered individuals can either be detected (+) or undetected (-) and we also integrate an intensive care unit (ICU) capacity. Our model enables a tractable quantitative analysis of the optimal policy for the control of the epidemic dynamics using both lockdown and detection intervention levers. With parametric specification based on literature on COVID-19, we investigate the sensitivities of various quantities on the optimal strategies, taking into account the subtle trade-off between the sanitary and the socio-economic cost of the pandemic, together with the limited capacity level of ICU. We identify the optimal lockdown policy as an intervention structured in 4 successive phases: first, a quick and strong lockdown intervention to stop the exponential growth of the contagion; second, a short transition phase to reduce the prevalence of the virus; third, a long period with full ICU capacity and stable virus prevalence; finally, a return to normal social interactions with disappearance of the virus. The optimal scenario hereby avoids the second wave of infection, provided the lockdown is released sufficiently slowly. We also provide optimal intervention measures with increasing ICU capacity, as well as optimization over the effort on detection of infectious and immune individuals. Whenever massive resources are introduced to detect infected individuals, the pressure on social distancing can be released, whereas the impact of detection of immune individuals proves to be more moderate.
Submitted 21 May, 2020; v1 submitted 13 May, 2020;
originally announced May 2020.
-
Mean Field Type Control with Congestion (II): An Augmented Lagrangian Method
Authors:
Yves Achdou,
Mathieu Lauriere
Abstract:
This work deals with a numerical method for solving a mean-field type control problem with congestion. It is the continuation of an article by the same authors, in which suitably defined weak solutions of the system of partial differential equations arising from the model were discussed and existence and uniqueness were proved. Here, the focus is put on numerical methods: a monotone finite difference scheme is proposed and shown to have a variational interpretation. Then an Alternating Direction Method of Multipliers for solving the variational problem is addressed. It is based on an augmented Lagrangian. Two kinds of boundary conditions are considered: periodic conditions and more realistic boundary conditions associated with state-constrained problems. Various test cases and numerical results are presented.
Submitted 7 November, 2016;
originally announced November 2016.