-
Cryogenic sapphire optical reference cavity with crystalline coatings at $\mathrm{ 1 \times 10^{-16}}$ fractional instability
Authors:
Jose Valencia,
George Iskander,
Nicholas V. Nardelli,
David R. Leibrandt,
David B. Hume
Abstract:
The frequency stability of a laser locked to an optical reference cavity is fundamentally limited by thermal noise in the cavity length. These fluctuations are linked to material dissipation, which depends both on the temperature of the optical components and the material properties. Here, the design and experimental characterization of a sapphire optical cavity operated at 10 K with crystalline coatings at 1069 nm is presented. Theoretical estimates of the thermo-mechanical noise indicate a thermal noise floor below $\mathrm{4.5\times10^{-18}}$. Major technical noise contributions including vibrations, temperature fluctuations, and residual amplitude modulation are characterized in detail. The short-term performance is measured via a three-cornered hat analysis with two other cavity-stabilized lasers, yielding a noise floor of $1\times10^{-16}$. The long-term performance is measured against an optical lattice clock, indicating cavity stability at the level of $2\times10^{-15}$ for averaging times up to 10,000 s.
Submitted 22 April, 2024;
originally announced April 2024.
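A note on the three-cornered hat analysis named in the abstract above: when three oscillators are compared pairwise and their noise is assumed to be mutually uncorrelated, each pairwise variance is the sum of two individual variances, so the three individual variances can be solved for. The sketch below is illustrative only. The function name and the numbers are hypothetical and are not taken from the paper.

```python
def three_cornered_hat(var_ab, var_ac, var_bc):
    """Estimate individual variances from three pairwise variances.

    Assumes the three oscillators have uncorrelated noise, so that
    var_ab = var_a + var_b, var_ac = var_a + var_c, var_bc = var_b + var_c.
    Inputs are variances (e.g. Allan deviation squared) at one averaging time.
    """
    var_a = 0.5 * (var_ab + var_ac - var_bc)
    var_b = 0.5 * (var_ab + var_bc - var_ac)
    var_c = 0.5 * (var_ac + var_bc - var_ab)
    return var_a, var_b, var_c


# Hypothetical pairwise variances built from individual variances 1, 4 and 9
# (arbitrary units), used only to check that the inversion recovers them.
estimates = three_cornered_hat(var_ab=5.0, var_ac=10.0, var_bc=13.0)
```

With measured data, finite statistics or correlated noise can make an estimated variance come out negative, which signals that the uncorrelated-noise assumption does not hold at that averaging time.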
-
Can Reinforcement Learning support policy makers? A preliminary study with Integrated Assessment Models
Authors:
Theodore Wolf,
Nantas Nardelli,
John Shawe-Taylor,
Maria Perez-Ortiz
Abstract:
Governments around the world aspire to ground decision-making on evidence. Many of the foundations of policy making - e.g. sensing patterns that relate to societal needs, developing evidence-based programs, forecasting potential outcomes of policy changes, and monitoring effectiveness of policy programs - have the potential to benefit from the use of large-scale datasets or simulations together with intelligent algorithms. These could, if designed and deployed in a way that is well grounded on scientific evidence, enable a more comprehensive, faster, and more rigorous approach to policy making. Integrated Assessment Models (IAM) is a broad umbrella covering scientific models that attempt to link main features of society and economy with the biosphere into one modelling framework. At present, these systems are probed by policy makers and advisory groups in a hypothesis-driven manner. In this paper, we empirically demonstrate that modern Reinforcement Learning can be used to probe IAMs and explore the space of solutions in a more principled manner. While the implications of our results are modest since the environment is simplistic, we believe that this is a stepping stone towards more ambitious use cases, which could allow for effective exploration of policies and understanding of their consequences and limitations.
Submitted 11 December, 2023;
originally announced December 2023.
-
Optical and microwave metrology at the $10^{-18}$ level with an Er/Yb:glass frequency comb
Authors:
Nicholas V. Nardelli,
Holly Leopardi,
Thomas R. Schibli,
Tara M. Fortier
Abstract:
Optical frequency combs are an essential tool for precision metrology experiments ranging in application from remote spectroscopic sensing of trace gases to the characterization and comparison of optical atomic clocks for precision time-keeping and searches for physics beyond the standard model. Here we describe the architecture and fully characterize a telecom-band, self-modelocking frequency comb based on a free-space laser with an Er/Yb co-doped glass gain medium. The laser provides a robust and cost-effective alternative to Er:fiber laser based frequency combs, while offering stability and noise performance similar to Ti:sapphire laser systems. Finally, we demonstrate the Er/Yb:glass frequency comb's utility in high-stability frequency synthesis using two ultra-stable optical references at 1157 nm and 1070 nm and in low-noise photonic microwave generation by dividing these references to the microwave domain.
Submitted 26 August, 2022;
originally announced August 2022.
-
Insights From the NeurIPS 2021 NetHack Challenge
Authors:
Eric Hambro,
Sharada Mohanty,
Dmitrii Babaev,
Minwoo Byeon,
Dipam Chakraborty,
Edward Grefenstette,
Minqi Jiang,
Daejin Jo,
Anssi Kanervisto,
Jongmin Kim,
Sungwoong Kim,
Robert Kirk,
Vitaly Kurin,
Heinrich Küttler,
Taehwon Kwon,
Donghoon Lee,
Vegard Mella,
Nantas Nardelli,
Ivan Nazarov,
Nikita Ovsov,
Jack Parker-Holder,
Roberta Raileanu,
Karolis Ramanauskas,
Tim Rocktäschel,
Danielle Rothermel
, et al. (4 additional authors not shown)
Abstract:
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challenge showcased community-driven progress in AI with many diverse approaches significantly beating the previously best results on NetHack. Furthermore, it served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems, demonstrating that on NetHack symbolic bots currently outperform deep RL by a large margin. Lastly, no agent got close to winning the game, illustrating NetHack's suitability as a long-term benchmark for AI research.
Submitted 22 March, 2022;
originally announced March 2022.
-
10 GHz Generation with Ultra-Low Phase Noise via the Transfer Oscillator Technique
Authors:
Nicholas V. Nardelli,
Tara M. Fortier,
Marco Pomponio,
Esther Baumann,
Craig Nelson,
Thomas R. Schibli,
Archita Hati
Abstract:
Coherent frequency division of high-stability optical sources permits the extraction of microwave signals with ultra-low phase noise, enabling their application to systems with stringent timing precision. To date, the highest performance systems have required tight phase stabilization of laboratory grade optical frequency combs to Fabry-Perot optical reference cavities for faithful optical-to-microwave frequency division. This requirement limits the technology to highly-controlled laboratory environments. Here, we use a transfer oscillator technique, which employs digital and RF analog electronics to coherently suppress additive optical frequency comb noise. This relaxes the stabilization requirements and allows for the extraction of multiple independent microwave outputs from a single comb, while at the same time permitting low-noise microwave generation from combs with higher noise profiles. Using this method, we transferred the phase stability of two high-finesse optical sources at 1157 nm and 1070 nm to two independent 10 GHz signals using a single frequency comb. We demonstrated absolute phase noise below -106 dBc/Hz at 1 Hz from the carrier, with corresponding 1-second fractional frequency instability below $2\times10^{-15}$. Finally, the latter phase noise levels were attainable for comb linewidths broadened up to 2 MHz, demonstrating the potential for out-of-lab use with low-SWaP lasers.
Submitted 1 October, 2021;
originally announced October 2021.
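For context on the frequency division described in the abstract above: ideal division of a carrier by a factor N lowers its phase noise by 20·log10(N) dB. The sketch below estimates N and that ideal reduction for the 1070 nm reference and the 10 GHz output quoted in the abstract. The helper names are hypothetical, and the calculation ignores photodetection and electronic noise, which limit real systems.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def division_ratio(wavelength_m, f_out_hz):
    """Ratio between the optical carrier frequency and the output frequency."""
    optical_frequency_hz = C / wavelength_m
    return optical_frequency_hz / f_out_hz


def ideal_phase_noise_reduction_db(n):
    """Phase-noise reduction (dB) from ideal, noiseless division by n."""
    return 20.0 * math.log10(n)


# Values from the abstract: 1070 nm optical reference, 10 GHz microwave output.
n = division_ratio(1070e-9, 10e9)
reduction_db = ideal_phase_noise_reduction_db(n)
```

Under these assumptions N is roughly 28,000, so an ideal divider would lower the optical phase noise by a little under 90 dB.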
-
Optical coherence between atomic species at the second scale: improved clock comparisons via differential spectroscopy
Authors:
May E. Kim,
William F. McGrew,
Nicholas V. Nardelli,
Ethan R. Clements,
Youssef S. Hassan,
Xiaogang Zhang,
Jose L. Valencia,
Holly Leopardi,
David B. Hume,
Tara M. Fortier,
Andrew D. Ludlow,
David R. Leibrandt
Abstract:
Comparisons of high-accuracy optical atomic clocks \cite{Ludlow2015} are essential for precision tests of fundamental physics \cite{Safronova2018}, relativistic geodesy \cite{McGrew2018, Grotti2018, Delva2019}, and the anticipated redefinition of the SI second \cite{Riehle2018}. The scientific reach of these applications is restricted by the statistical precision of interspecies comparison measurements. The instability of individual clocks is limited by the finite coherence time of the optical local oscillator (OLO), which bounds the maximum atomic interrogation time. In this letter, we experimentally demonstrate differential spectroscopy \cite{Hume2016}, a comparison protocol that enables interrogating beyond the OLO coherence time. By phase-coherently linking a zero-dead-time (ZDT) \cite{Schioppo2017} Yb optical lattice clock with an Al$^+$ single-ion clock via an optical frequency comb and performing synchronised Ramsey spectroscopy, we show an improvement in comparison instability relative to our previous result \cite{network2020frequency} of nearly an order of magnitude. To our knowledge, this result represents the most stable interspecies clock comparison to date.
Submitted 24 November, 2021; v1 submitted 20 September, 2021;
originally announced September 2021.
-
WordCraft: An Environment for Benchmarking Commonsense Agents
Authors:
Minqi Jiang,
Jelena Luketina,
Nantas Nardelli,
Pasquale Minervini,
Philip H. S. Torr,
Shimon Whiteson,
Tim Rocktäschel
Abstract:
The ability to quickly solve a wide range of real-world tasks requires a commonsense understanding of the world. Yet, how to best extract such knowledge from natural language corpora and integrate it with reinforcement learning (RL) agents remains an open challenge. This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment. To better enable research on agents making use of commonsense knowledge, we propose WordCraft, an RL environment based on Little Alchemy 2. This lightweight environment is fast to run and built upon entities and relations inspired by real-world semantics. We evaluate several representation learning methods on this new benchmark and propose a new method for integrating knowledge graphs with an RL agent.
Submitted 17 July, 2020;
originally announced July 2020.
-
The NetHack Learning Environment
Authors:
Heinrich Küttler,
Nantas Nardelli,
Alexander H. Miller,
Roberta Raileanu,
Marco Selvatici,
Edward Grefenstette,
Tim Rocktäschel
Abstract:
Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience. We compare NLE and its task suite to existing alternatives, and discuss why it is an ideal medium for testing the robustness and systematic generalization of RL agents. We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment. NLE is open source at https://github.com/facebookresearch/nle.
Submitted 1 December, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Simulation-Based Inference for Global Health Decisions
Authors:
Christian Schroeder de Witt,
Bradley Gram-Hansen,
Nantas Nardelli,
Andrew Gambardella,
Rob Zinkov,
Puneet Dokania,
N. Siddharth,
Ana Belen Espinosa-Gonzalez,
Ara Darzi,
Philip Torr,
Atılım Güneş Baydin
Abstract:
The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel venue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators.
Submitted 14 May, 2020;
originally announced May 2020.
-
Lessons from reinforcement learning for biological representations of space
Authors:
Alex Muryy,
N. Siddharth,
Nantas Nardelli,
Philip H. S. Torr,
Andrew Glennerster
Abstract:
Neuroscientists postulate 3D representations in the brain in a variety of different coordinate frames (e.g. 'head-centred', 'hand-centred' and 'world-based'). Recent advances in reinforcement learning demonstrate a quite different approach that may provide a more promising model for biological representations underlying spatial perception and navigation. In this paper, we focus on reinforcement learning methods that reward an agent for arriving at a target image without any attempt to build up a 3D 'map'. We test the ability of this type of representation to support geometrically consistent spatial tasks such as interpolating between learned locations using decoding of feature vectors. We introduce a hand-crafted representation that has, by design, a high degree of geometric consistency and demonstrate that, in this case, information about the persistence of features as the camera translates (e.g. distant features persist) can improve performance on the geometric tasks. These examples avoid Cartesian (in this case, 2D) representations of space. Non-Cartesian, learned representations provide an important stimulus in neuroscience to the search for alternatives to a 'cognitive map'.
Submitted 6 July, 2020; v1 submitted 13 December, 2019;
originally announced December 2019.
-
MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions
Authors:
Viswanath Sivakumar,
Olivier Delalleau,
Tim Rocktäschel,
Alexander H. Miller,
Heinrich Küttler,
Nantas Nardelli,
Mike Rabbat,
Joelle Pineau,
Sebastian Riedel
Abstract:
Effective network congestion control strategies are key to keeping the Internet (or any large computer network) operational. Network congestion control has been dominated by hand-crafted heuristics for decades. Recently, Reinforcement Learning (RL) has emerged as an alternative to automatically optimize such control strategies. Research so far has primarily considered RL interfaces which block the sender while an agent considers its next action. This is largely an artifact of building on top of frameworks designed for RL in games (e.g. OpenAI Gym). However, this does not translate to real-world networking environments, where a network sender waiting on a policy without sending data leads to under-utilization of bandwidth. We instead propose to formulate congestion control with an asynchronous RL agent that handles delayed actions. We present MVFST-RL, a scalable framework for congestion control in the QUIC transport protocol that leverages the state of the art in asynchronous RL training with off-policy correction. We analyze modeling improvements to mitigate the deviation from Markovian dynamics, and evaluate our method on emulated networks from the Pantheon benchmark platform. The source code is publicly available at https://github.com/facebookresearch/mvfst-rl.
Submitted 26 May, 2021; v1 submitted 9 October, 2019;
originally announced October 2019.
-
TorchBeast: A PyTorch Platform for Distributed RL
Authors:
Heinrich Küttler,
Nantas Nardelli,
Thibaut Lavril,
Marco Selvatici,
Viswanath Sivakumar,
Tim Rocktäschel,
Edward Grefenstette
Abstract:
TorchBeast is a platform for reinforcement learning (RL) research in PyTorch. It implements a version of the popular IMPALA algorithm for fast, asynchronous, parallel training of RL agents. Additionally, TorchBeast has simplicity as an explicit design goal: We provide both a pure-Python implementation ("MonoBeast") as well as a multi-machine high-performance version ("PolyBeast"). In the latter, parts of the implementation are written in C++, but all parts pertaining to machine learning are kept in simple Python using PyTorch, with the environments provided using the OpenAI Gym interface. This enables researchers to conduct scalable RL research using TorchBeast without any programming knowledge beyond Python and PyTorch. In this paper, we describe the TorchBeast design principles and implementation and demonstrate that it performs on par with IMPALA on Atari. TorchBeast is released as an open-source package under the Apache 2.0 license and is available at https://github.com/facebookresearch/torchbeast.
Submitted 8 October, 2019;
originally announced October 2019.
-
A Survey of Reinforcement Learning Informed by Natural Language
Authors:
Jelena Luketina,
Nantas Nardelli,
Gregory Farquhar,
Jakob Foerster,
Jacob Andreas,
Edward Grefenstette,
Shimon Whiteson,
Tim Rocktäschel
Abstract:
To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular. We survey the state of the field, including work on instruction following, text games, and learning from textual domain knowledge. Finally, we call for the development of new environments as well as further investigation into the potential uses of recent Natural Language Processing (NLP) techniques for such tasks.
Submitted 10 June, 2019;
originally announced June 2019.
-
Multitask Soft Option Learning
Authors:
Maximilian Igl,
Andrew Gambardella,
Jinke He,
Nantas Nardelli,
N. Siddharth,
Wendelin Böhmer,
Shimon Whiteson
Abstract:
We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This "soft" version of options avoids several instabilities during training in a multitask setting, and provides a natural way to learn both intra-option policies and their terminations. Furthermore, it allows fine-tuning of options for new tasks without forgetting their learned policies, leading to faster training without reducing the expressiveness of the hierarchical policy. We demonstrate empirically that MSOL significantly outperforms both hierarchical and flat transfer-learning baselines.
Submitted 21 June, 2020; v1 submitted 1 April, 2019;
originally announced April 2019.
-
The StarCraft Multi-Agent Challenge
Authors:
Mikayel Samvelyan,
Tabish Rashid,
Christian Schroeder de Witt,
Gregory Farquhar,
Nantas Nardelli,
Tim G. J. Rudner,
Chia-Man Hung,
Philip H. S. Torr,
Jakob Foerster,
Shimon Whiteson
Abstract:
In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL learning framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0.
Submitted 9 December, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Value Propagation Networks
Authors:
Nantas Nardelli,
Gabriel Synnaeve,
Zeming Lin,
Pushmeet Kohli,
Philip H. S. Torr,
Nicolas Usunier
Abstract:
We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, have the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. We show that the modules enable learning to plan when the environment also includes stochastic elements, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems. We evaluate on static and dynamic configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes, and on a StarCraft navigation scenario, with more complex dynamics, and pixels as input.
Submitted 25 March, 2019; v1 submitted 28 May, 2018;
originally announced May 2018.
-
Counterfactual Multi-Agent Policy Gradients
Authors:
Jakob Foerster,
Gregory Farquhar,
Triantafyllos Afouras,
Nantas Nardelli,
Shimon Whiteson
Abstract:
Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
Submitted 14 December, 2017; v1 submitted 24 May, 2017;
originally announced May 2017.
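The counterfactual baseline described in the abstract above can be sketched as follows: for one agent, the critic's Q-values over that agent's candidate actions (with the other agents' actions held fixed) are averaged under the agent's own policy, and the result is subtracted from the Q-value of the action actually taken. This is a minimal illustration, not the authors' implementation. The function name, argument layout, and numbers are hypothetical.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """Advantage of one agent's action against a counterfactual baseline.

    q_values[u]  : critic estimate Q(s, (u_others, u)) for each candidate
                   action u of this agent, other agents' actions held fixed.
    policy[u]    : probability this agent's policy assigns to action u.
    taken_action : index of the action the agent actually took.
    """
    # Baseline: expected Q-value when only this agent's action is marginalised out.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken_action] - baseline


# Hypothetical values for an agent with three actions.
q = [1.0, 2.0, 4.0]
pi = [0.5, 0.25, 0.25]
advantage = counterfactual_advantage(q, pi, taken_action=2)  # baseline is 2.0
```

Because the baseline depends only on the other agents' actions and this agent's policy, each agent gets a credit signal specific to its own choice.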
-
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
Authors:
Jakob Foerster,
Nantas Nardelli,
Gregory Farquhar,
Triantafyllos Afouras,
Philip H. S. Torr,
Pushmeet Kohli,
Shimon Whiteson
Abstract:
Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
Submitted 21 May, 2018; v1 submitted 28 February, 2017;
originally announced February 2017.
-
Playing Doom with SLAM-Augmented Deep Reinforcement Learning
Authors:
Shehroze Bhatti,
Alban Desmaison,
Ondrej Miksik,
Nantas Nardelli,
N. Siddharth,
Philip H. S. Torr
Abstract:
A number of recent approaches to policy learning in 2D game domains have been successful going directly from raw input images to actions. However, when employed in complex 3D environments, they typically suffer from challenges related to partial observability, combinatorial exploration spaces, path planning, and a scarcity of rewarding scenarios. Inspired by prior work in human cognition that indicates how humans employ a variety of semantic concepts and abstractions (object categories, localisation, etc.) to reason about the world, we build an agent-model that incorporates such abstractions into its policy-learning framework. We augment the raw image input to a Deep Q-Learning Network (DQN) by adding details of objects and structural elements encountered, along with the agent's localisation. The different components are automatically extracted and composed into a topological representation using on-the-fly object detection and 3D-scene reconstruction. We evaluate the efficacy of our approach in Doom, a 3D first-person combat game that exhibits a number of the challenges discussed, and show that our augmented framework consistently learns better, more effective policies.
Submitted 1 December, 2016;
originally announced December 2016.
-
TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games
Authors:
Gabriel Synnaeve,
Nantas Nardelli,
Alex Auvolat,
Soumith Chintala,
Timothée Lacroix,
Zeming Lin,
Florian Richoux,
Nicolas Usunier
Abstract:
We present TorchCraft, a library that enables deep learning research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by making it easier to control these games from a machine learning framework, here Torch. This white paper argues for using RTS games as a benchmark for AI research, and describes the design and components of TorchCraft.
Submitted 3 November, 2016; v1 submitted 1 November, 2016;
originally announced November 2016.
-
Counterfactual Reasoning about Intent for Interactive Navigation in Dynamic Environments
Authors:
A. Bordallo,
F. Previtali,
N. Nardelli,
S. Ramamoorthy
Abstract:
Many modern robotics applications require robots to function autonomously in dynamic environments including other decision making agents, such as people or other robots. This calls for fast and scalable interactive motion planning. This requires models that take into consideration the other agent's intended actions in one's own planning. We present a real-time motion planning framework that brings together a few key components including intention inference by reasoning counterfactually about potential motion of the other agents as they work towards different goals. By using a light-weight motion model, we achieve efficient iterative planning for fluid motion when avoiding pedestrians, in parallel with goal inference for longer range movement prediction. This inference framework is coupled with a novel distributed visual tracking method that provides reliable and robust models for the current belief-state of the monitored environment. This combined approach represents a computationally efficient alternative to previously studied policy learning methods that often require significant offline training or calibration and do not yet scale to densely populated environments. We validate this framework with experiments involving multi-robot and human-robot navigation. We further validate the tracker component separately on much larger scale unconstrained pedestrian data sets.
Submitted 26 October, 2016;
originally announced October 2016.