-
Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning
Authors:
Matej Jusup,
Barna Pásztor,
Tadeusz Janik,
Kenan Zhang,
Francesco Corman,
Andreas Krause,
Ilija Bogunovic
Abstract:
Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent interacting with the infinite population of identical agents instead of considering individual pairwise interactions. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-M$^3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraint satisfaction with high probability. Beyond the synthetic swarm motion benchmark, we showcase Safe-M$^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on vehicle trajectory data from a service provider in Shenzhen. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
Submitted 27 December, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Active Exploration via Experiment Design in Markov Chains
Authors:
Mojmír Mutný,
Tadeusz Janik,
Andreas Krause
Abstract:
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. Classical experimental design optimally allocates the experimental budget to maximize a notion of utility (e.g., reduction in uncertainty about the unknown quantity). We consider a rich setting, where the experiments are associated with states in a {\em Markov chain}, and we can only choose them by selecting a {\em policy} controlling the state transitions. This problem captures important applications, from exploration in reinforcement learning to spatial monitoring tasks. We propose an algorithm -- \textsc{markov-design} -- that efficiently selects policies whose measurement allocation \emph{provably converges to the optimal one}. The algorithm is sequential in nature, adapting its choice of policies (experiments) informed by past measurements. In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
Submitted 9 November, 2022; v1 submitted 28 June, 2022;
originally announced June 2022.
-
Stellar occultations enable milliarcsecond astrometry for Trans-Neptunian objects and Centaurs
Authors:
F. L. Rommel,
F. Braga-Ribas,
J. Desmars,
J. I. B. Camargo,
J. L. Ortiz,
B. Sicardy,
R. Vieira-Martins,
M. Assafin,
P. Santos-Sanz,
R. Duffard,
E. Fernández-Valenzuela,
J. Lecacheux,
B. E. Morgado,
G. Benedetti-Rossi,
A. R. Gomes-Júnior,
C. L. Pereira,
D. Herald,
W. Hanna,
J. Bradshaw,
N. Morales,
J. Brimacombe,
A. Burtovoi,
T. Carruthers,
J. R. de Barros,
M. Fiori,
et al. (44 additional authors not shown)
Abstract:
Trans-Neptunian objects (TNOs) and Centaurs are remnants of our planetary system formation, and their physical properties hold invaluable information for evolutionary theories. Stellar occultation is a ground-based method for studying these small bodies and has yielded exciting results. These observations can provide precise profiles of the involved body, allowing an accurate determination of its size and shape. The goal is to show that even single-chord detections of TNOs allow us to measure their milliarcsecond astrometric positions in the reference frame of the Gaia second data release (DR2). Accurate ephemerides can then be generated, allowing predictions of stellar occultations with much higher reliability. We analyzed data from stellar occultations to obtain astrometric positions of the involved bodies. The events published before the Gaia era were updated so that the Gaia DR2 catalog is the reference. Previously determined sizes were used to calculate the position of the object center and its corresponding error with respect to the detected chord and the International Celestial Reference System (ICRS) propagated Gaia DR2 star position. We derive 37 precise astrometric positions for 19 TNOs and 4 Centaurs. Twenty-one of these events are presented here for the first time. Although about 68\% of our results are based on single-chord detection, most have intrinsic precision at the submilliarcsecond level. Lower limits on the diameter and shape constraints for a few bodies are also presented as valuable byproducts. Using the Gaia DR2 catalog, we show that even a single detection of a stellar occultation significantly improves the object ephemeris, which in turn enables predicting a future stellar occultation with high accuracy. Observational campaigns can then be organized efficiently and may provide a full physical characterization of the involved object.
Submitted 23 October, 2020;
originally announced October 2020.
-
Pluto's lower atmosphere and pressure evolution from ground-based stellar occultations, 1988-2016
Authors:
E. Meza,
B. Sicardy,
M. Assafin,
J. L. Ortiz,
T. Bertrand,
E. Lellouch,
J. Desmars,
F. Forget,
D. Bérard,
A. Doressoundiram,
J. Lecacheux,
J. Marques Oliveira,
F. Roques,
T. Widemann,
F. Colas,
F. Vachier,
S. Renner,
R. Leiva,
F. Braga-Ribas,
G. Benedetti-Rossi,
J. I. B. Camargo,
A. Dias-Oliveira,
B. Morgado,
A. R. Gomes-Júnior,
R. Vieira-Martins,
et al. (145 additional authors not shown)
Abstract:
Context: Pluto's tenuous nitrogen (N2) atmosphere undergoes strong seasonal effects due to high obliquity and orbital eccentricity, and was recently (July 2015) observed by the New Horizons spacecraft. Goals: (i) construct a well-calibrated record of the seasonal evolution of surface pressure on Pluto and (ii) constrain the structure of the lower atmosphere using a central flash observed in 2015. Method: eleven stellar occultations by Pluto observed between 2002 and 2016 are used to retrieve atmospheric profiles (density, pressure, temperature) between $\sim$5 km and $\sim$380 km altitude levels (i.e., pressures from about 10 microbar to 10 nanobar). Results: (i) pressure increased monotonically from 1988 to 2016; this increase is compared to a seasonal volatile transport model, from which tight constraints on a combination of albedo and emissivity of N2 ice are derived; (ii) a central flash observed on 2015 June 29 is consistent with New Horizons REX profiles, provided that (a) large diurnal temperature variations (not expected by current models) occur over Sputnik Planitia and/or (b) hazes with tangential optical depth of about 0.3 are present at 4-7 km altitude levels and/or (c) the nominal REX density values are overestimated by an implausibly large factor of about 20% and/or (d) higher terrains block part of the flash in the Charon-facing hemisphere.
Submitted 6 March, 2019;
originally announced March 2019.