Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning
Authors:
Matej Jusup,
Barna Pásztor,
Tadeusz Janik,
Kenan Zhang,
Francesco Corman,
Andreas Krause,
Ilija Bogunovic
Abstract:
Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent interacting with an infinite population of identical agents instead of considering individual pairwise interactions. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-M$^3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraint satisfaction with high probability. Beyond the synthetic swarm motion benchmark, we showcase Safe-M$^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on vehicle trajectory data from a service provider in Shenzhen. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
Submitted 27 December, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
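The Python sketch below illustrates the log-barrier idea from the Safe-M$^3$-UCRL abstract above. It scores a candidate policy by its reward plus a log-barrier on a pessimistic (lower-confidence-bound) estimate of each distribution constraint. This is not the authors' algorithm. The following are all hypothetical choices made for this example:

- the function names;
- the parameters `beta` (confidence width) and `lam` (barrier weight);
- the mean-minus-`beta`-times-std form of the pessimistic bound;
- treating each constraint as a scalar statistic of the agent distribution with a known mean and standard deviation.

```python
import math
from typing import Sequence


def pessimistic_value(mean: float, std: float, beta: float) -> float:
    """Lower confidence bound on a constraint value: mean - beta * std (assumed form)."""
    return mean - beta * std


def log_barrier_objective(
    reward: float,
    constraint_means: Sequence[float],
    constraint_stds: Sequence[float],
    thresholds: Sequence[float],
    beta: float = 2.0,
    lam: float = 0.1,
) -> float:
    """Reward plus a log-barrier on the slack of each pessimistic constraint.

    Each constraint is assumed to read h_k(mu) >= thresholds[k], where h_k(mu)
    is a statistic of the agent distribution (e.g. the fraction of agents in a
    region) with an estimated mean and standard deviation. Returns -inf as soon
    as a pessimistic slack is non-positive, so a maximizer never prefers a
    candidate that may violate a constraint.
    """
    total = reward
    for mean, std, threshold in zip(constraint_means, constraint_stds, thresholds):
        slack = pessimistic_value(mean, std, beta) - threshold
        if slack <= 0.0:
            return -math.inf  # constraint possibly violated under the pessimistic estimate
        total += lam * math.log(slack)  # barrier grows as slack shrinks toward zero
    return total
```

Consider a single coverage threshold of 0.2 with the default `beta=2.0`:

- A policy with reward 1.0 and estimated coverage 0.5 ± 0.05 scores about 0.84.
- A policy with reward 1.2 and coverage 0.32 ± 0.05 scores only about 0.81, because its pessimistic slack is just 0.02.
- A policy with coverage 0.25 ± 0.05 scores -inf, since its pessimistic value of 0.15 lies below the threshold.

The barrier therefore trades reward for a safety margin that widens as the uncertainty estimate grows.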
Bayesian Origin-Destination Estimation in Networked Transit Systems using Nodal In- and Outflow Counts
Authors:
Steffen O. P. Blume,
Francesco Corman,
Giovanni Sansavini
Abstract:
We propose a Bayesian inference approach for static Origin-Destination (OD)-estimation in large-scale networked transit systems. The approach finds posterior distribution estimates of the OD-coefficients, which describe the relative proportions of passengers travelling between origin and destination locations, via a Hamiltonian Monte Carlo sampling procedure. We suggest two different inference model formulations: the instantaneous-balance and average-delay model. We discuss both models' sensitivity to various count observation properties, and establish that the average-delay model is generally more robust in determining the coefficient posteriors. The instantaneous-balance model, however, requires lower-resolution count observations and produces estimates comparable in accuracy to those of the average-delay model, provided that count observations are only moderately affected by trend fluctuations or by truncation of the observation window, and that a sufficient number of dispersed data records is available. We demonstrate that the Bayesian posterior distribution estimates provide quantifiable measures of the estimation uncertainty and prediction quality of the model, whereas the point estimates obtained from an alternative constrained quadratic programming optimisation approach only provide the residual errors between the predictions and observations. Moreover, the Bayesian approach proves more robust in scaling to high-dimensional underdetermined problems. The Bayesian instantaneous-balance OD-coefficient posteriors are determined for the New York City (NYC) subway network, based on several years of entry and exit count observations recorded at station turnstiles across the network. The average-delay model proves intractable on the real-world test scenario, given its computational time complexity and the incompleteness and coarseness of the turnstile records.
Submitted 26 May, 2021;
originally announced May 2021.
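The Python sketch below gives one plausible likelihood for the instantaneous-balance idea in the OD-estimation abstract above. In it, the exits counted at each destination equal the entries at every origin weighted by the OD-coefficients, observed with Gaussian noise. The abstract does not specify the model at this level of detail, so the following are all assumptions made for illustration:

- the balance equation;
- the Gaussian noise with scale `sigma`;
- the softmax parameterization that keeps each origin's coefficients non-negative and summing to one;
- the function names.

A sampler such as Hamiltonian Monte Carlo would draw the logits from a posterior proportional to this likelihood times a prior, which is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Map unconstrained logits to a probability vector (non-negative, sums to 1)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_outflows(inflows: Sequence[float], od: Sequence[Sequence[float]]) -> List[float]:
    """Assumed instantaneous balance: outflow_j = sum_i inflow_i * p_ij, with p_ij = od[i][j]."""
    n_dest = len(od[0])
    return [sum(inflows[i] * od[i][j] for i in range(len(inflows))) for j in range(n_dest)]


def log_likelihood(
    logits: Sequence[Sequence[float]],
    inflow_series: Sequence[Sequence[float]],
    outflow_series: Sequence[Sequence[float]],
    sigma: float = 1.0,
) -> float:
    """Gaussian log-likelihood of observed outflows given OD-coefficient logits."""
    od = [softmax(row) for row in logits]  # one proportion vector per origin
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    ll = 0.0
    for inflows, outflows in zip(inflow_series, outflow_series):  # one record per time step
        for observed, predicted in zip(outflows, predicted_outflows(inflows, od)):
            ll += -0.5 * ((observed - predicted) / sigma) ** 2 - norm
    return ll
```

Because every OD row sums to one, total predicted exits equal total entries at each time step. For example, entries of 10 and 20 with rows [0.0, 1.0] and [0.5, 0.5] give exits of 10 and 20. The softmax reparameterization lets a gradient-based sampler move in unconstrained space while every sampled row remains a valid proportion vector.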