Search | arXiv e-print repository

Multi-Agent Soft Actor-Critic with Global Loss for Autonomous Mobility-on-Demand Fleet Control

Authors: Zeno Woywood, Jasper I. Wiltfang, Julius Luy, Tobias Enders, Maximilian Schiffer

Abstract: We study a sequential decision-making problem for a profit-maximizing operator of an Autonomous Mobility-on-Demand system. Optimizing a central operator's vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic's loss function to appropriately consider global actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2402.09992 [pdf, other]

Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts

Authors: Tobias Enders, James Harrison, Maximilian Schiffer

Abstract: We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage stochastic combinatorial optimization problems from the operations research domain. In this context, risk-sensitive algorithms promise to learn robust policies. While this field is of general interest to the reinforcement learning community, most studies up-to-date focus on theoretical results rather than real-world performance. With this work, we aim to bridge this gap by formally deriving a novel risk-sensitive deep reinforcement learning algorithm while providing numerical evidence for its efficacy. Specifically, we introduce discrete Soft Actor-Critic for the entropic risk measure by deriving a version of the Bellman equation for the respective Q-values. We establish a corresponding policy improvement result and infer a practical algorithm. We introduce an environment that represents typical contextual multi-stage stochastic combinatorial optimization problems and perform numerical experiments to empirically validate our algorithm's robustness against realistic distribution shifts, without compromising performance on the training distribution. We show that our algorithm is superior to risk-neutral Soft Actor-Critic as well as to two benchmark approaches for robust deep reinforcement learning. Thereby, we provide the first structured analysis on the robustness of reinforcement learning under distribution shifts in the realm of contextual multi-stage stochastic combinatorial optimization problems.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 11 pages, 8 figures

arXiv:2312.08884 [pdf, other]

Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems

Authors: Heiko Hoppe, Tobias Enders, Quentin Cappart, Maximilian Schiffer

Abstract: We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.

Submitted 19 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 22 pages, 6 figures, extended version of paper accepted at the 6th Learning for Dynamics & Control Conference (L4DC 2024)

arXiv:2306.16110 [pdf, other]

Deep-subwavelength Phase Retarders at Mid-Infrared Frequencies with van der Waals Flakes

Authors: Michael T. Enders, Mitradeep Sarkar, Aleksandra Deeva, Maxime Giteau, Hanan Herzig Sheinfux, Mehrdad Shokooh-Saremi, Frank H. L. Koppens, Georgia T. Papadakis

Abstract: Phase retardation is a cornerstone of modern optics, yet, at mid-infrared (mid-IR) frequencies, it remains a major challenge due to the scarcity of simultaneously transparent and birefringent crystals. Most materials resonantly absorb due to lattice vibrations occurring at mid-IR frequencies, and natural birefringence is weak, calling for hundreds of microns to millimeters-thick phase retarders for sufficient polarization rotation. We demonstrate mid-IR phase retardation with flakes of $α$-molybdenum trioxide ($α$-MoO$_3$) that are more than ten times thinner than the operational wavelength, achieving 90 degrees polarization rotation within one micrometer of material. We report conversion ratios above 50% in reflection and transmission mode, and wavelength tunability by several micrometers. Our results showcase that exfoliated flakes of low-dimensional crystals can serve as a platform for mid-IR miniaturized integrated polarization control.

Submitted 29 June, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 8 pages, 5 figures

arXiv:2305.13994 [pdf, other]

Retrieving optical parameters of emerging van der Waals flakes

Authors: Mitradeep Sarkar, Michael T. Enders, Mehrdad Shokooh-Saremi, Kenji Watanabe, Takashi Taniguchi, Hanan Herzig Sheinfux, Frank H. L. Koppens, Georgia Theano Papadakis

Abstract: High-quality low-dimensional layered and van der Waals materials are typically exfoliated, with sample cross sectional areas on the order of tens to hundreds of microns. The small size of flakes makes the experimental characterization of their dielectric properties unsuitable with conventional spectroscopic ellipsometry, due to beam-sample size mismatch and non-uniformities of the crystal axes. Previously, the experimental measurement of the dielectric permittivity of such microcrystals was carried out with near-field tip-based scanning probes. These measurements are sensitive to external conditions like vibrations and temperature, and require non-deterministic numerical fitting to some a priori known model. We present an alternative method to extract the in-plane dielectric permittivity of van der Waals microcrystals, based on identifying reflectance minima in spectroscopic measurements. Our method does not require complex fitting algorithms nor near field tip-based measurements and accommodates for small-area samples. We demonstrate the robustness of our method using hexagonal boron nitride and α-MoO3, and recover their dielectric permittivities that are close to literature values.

Submitted 29 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 10 pages, 4 figures and 3 tables

arXiv:2212.07313 [pdf, other]

Hybrid Multi-agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems

Authors: Tobias Enders, James Harrison, Marco Pavone, Maximilian Schiffer

Abstract: We consider the sequential decision-making problem of making proactive request assignment and rejection decisions for a profit-maximizing operator of an autonomous mobility on demand system. We formalize this problem as a Markov decision process and propose a novel combination of multi-agent Soft Actor-Critic and weighted bipartite matching to obtain an anticipative control policy. Thereby, we factorize the operator's otherwise intractable action space, but still obtain a globally coordinated decision. Experiments based on real-world taxi data show that our method outperforms state of the art benchmarks with respect to performance, stability, and computational tractability.

Submitted 10 May, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: 20 pages, 7 figures, extended version of paper accepted at the 5th Learning for Dynamics & Control Conference (L4DC 2023)

arXiv:2210.02155 [pdf, other]

doi 10.1103/PhysRevApplied.19.L051002

Design rules for active control of narrowband thermal emission using phase-change materials

Authors: Maxime Giteau, Mitradeep Sarkar, Maria Paula Ayala, Michael T. Enders, Georgia T. Papadakis

Abstract: We propose an analytical framework to design actively tunable narrowband thermal emitters at infrared frequencies. We exemplify the proposed design rules using phase-change materials (PCM), considering dielectric-to-dielectric PCMs (e.g. GSST) and dielectric-to-metal PCMs (e.g. $\mathrm{VO_2}$). Based on these, we numerically illustrate near-unity ON-OFF switching and arbitrarily large spectral shifting between two emission wavelengths, respectively. The proposed systems are lithography-free and consist of one or several thin emitter layers, a spacer layer which includes the PCM, and a back reflector. Our model applies to normal incidence, though we show that the behavior is essentially angle-independent. The presented formalism is general and can be extended to any mechanism that modifies the optical properties of a material, such as electrostatic gating or thermo-optical modulation.

Submitted 18 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: 11 pages, 7 figures

Showing 1–7 of 7 results for author: Enders, T