-
A Detection of Red Noise in PSR J1824$-$2452A and Projections for PSR B1937+21 using NICER X-ray Timing Data
Authors:
Jeffrey S. Hazboun,
Jack Crump,
Andrea N. Lommen,
Sergio Montano,
Samantha J. H. Berry,
Jesse Zeldes,
Elizabeth Teng,
Paul S. Ray,
Matthew Kerr,
Zaven Arzoumanian,
Slavko Bogdanov,
Julia Deneva,
Natalia Lewandowska,
Craig B. Markwardt,
Scott Ransom,
Teruaki Enoto,
Kent S. Wood,
Keith C. Gendreau,
David A. Howe,
Aditya Parthasarathy
Abstract:
We have used X-ray data from the Neutron Star Interior Composition Explorer (NICER) to search for long time-scale, correlated variations ("red noise") in the pulse times of arrival from the millisecond pulsars PSR J1824$-$2452A and PSR B1937+21. These data more closely track intrinsic noise because X-rays are unaffected by the radio-frequency-dependent propagation effects of the interstellar medium. Our Bayesian search methodology yields strong evidence (natural log Bayes factor of $9.634 \pm 0.016$) for red noise in PSR J1824$-$2452A, but is inconclusive for PSR B1937+21. In the interest of future X-ray missions, we devise and implement a method to simulate longer and higher-precision X-ray datasets to determine the timing baseline necessary to detect red noise. We find that the red noise in PSR B1937+21 can be reliably detected in a 5-year mission with a time-of-arrival (TOA) error of 2 microseconds and an observing cadence of 20 observations per month, compared to the 5-microsecond TOA error and 11 observations per month that NICER currently achieves in PSR B1937+21. We investigate detecting red noise in PSR B1937+21 with other combinations of observing cadences and TOA errors. We also find that an injected stochastic gravitational wave background (GWB) with an amplitude of $A_{\rm GWB}=2\times10^{-15}$ and spectral index of $\gamma_{\rm GWB}=13/3$ can be detected in a pulsar with similar TOA precision to PSR B1937+21, but with no additional red noise, in a 10-year mission that observes the pulsar 15 times per month and has an average TOA error of 1 microsecond.
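For reference, the amplitude and spectral index quoted for the injected background are usually read against a power-law spectrum of the timing residuals. The abstract does not state its convention, so the form below is an assumption (the one common in pulsar timing array analyses), with $f_{\rm yr} = 1\,{\rm yr}^{-1}$ as the reference frequency:

$$P(f) = \frac{A_{\rm GWB}^{2}}{12\pi^{2}} \left(\frac{f}{f_{\rm yr}}\right)^{-\gamma_{\rm GWB}} f_{\rm yr}^{-3}$$

Under this convention, a spectral index of $13/3$ corresponds to a characteristic strain $h_c(f) = A_{\rm GWB}\,(f/f_{\rm yr})^{-2/3}$, the slope expected from a population of circular supermassive black hole binaries evolving through gravitational-wave emission.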
Submitted 3 December, 2021;
originally announced December 2021.
-
On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning
Authors:
Andrew Cohen,
Ervin Teng,
Vincent-Pierre Berges,
Ruo-** Dong,
Hunter Henry,
Marwan Mattar,
Alexander Zook,
Sujoy Ganguly
Abstract:
The creation and destruction of agents in cooperative multi-agent reinforcement learning (MARL) is a critically under-explored area of research. Current MARL algorithms often assume that the number of agents within a group remains fixed throughout an experiment. However, in many practical problems, an agent may terminate before their teammates. This early termination issue presents a challenge: the terminated agent must learn from the group's success or failure which occurs beyond its own existence. We refer to propagating value from rewards earned by remaining teammates to terminated agents as the Posthumous Credit Assignment problem. Current MARL methods handle this problem by placing these agents in an absorbing state until the entire group of agents reaches a termination condition. Although absorbing states enable existing algorithms and APIs to handle terminated agents without modification, practical training efficiency and resource use problems exist.
In this work, we first demonstrate that sample complexity increases with the quantity of absorbing states in a toy supervised learning task for a fully connected network, while attention is more robust to variable size input. Then, we present a novel architecture for an existing state-of-the-art MARL algorithm which uses attention instead of a fully connected layer with absorbing states. Finally, we demonstrate that this novel architecture significantly outperforms the standard architecture on tasks in which agents are created or destroyed within episodes as well as standard multi-agent coordination tasks.
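The contrast between absorbing states and attention can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the function name `masked_attention_pool`, the single-head layout, and the weight shapes are assumptions made for illustration only. The point it shows is that masked attention ignores terminated agents entirely, so no placeholder observation has to be fed to the network.

```python
import numpy as np

def masked_attention_pool(obs, alive, w_q, w_k, w_v):
    """Single-head self-attention over agents, pooled over living agents.

    obs:   (n_agents, obs_dim) observations; rows of terminated agents
           may hold any finite placeholder values.
    alive: (n_agents,) boolean mask; at least one entry must be True.
    w_q, w_k, w_v: (obs_dim, d) projection matrices (hypothetical shapes).
    """
    q, k, v = obs @ w_q, obs @ w_k, obs @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    # Terminated agents cannot be attended to: mask their key columns.
    scores = np.where(alive[None, :], scores, -np.inf)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    attended = weights @ v
    # Pool only over agents that still exist.
    return attended[alive].mean(axis=0)

rng = np.random.default_rng(0)
w_q, w_k, w_v = (rng.normal(size=(4, 8)) for _ in range(3))
obs = rng.normal(size=(3, 4))
alive = np.array([True, False, True])  # the second agent has terminated

pooled = masked_attention_pool(obs, alive, w_q, w_k, w_v)
```

A fully connected layer over the concatenated observations would instead need the terminated agent's slot filled with an absorbing-state placeholder, and would have to learn to ignore it. Here the pooled output is the same as if the terminated agent had never been in the input.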
Submitted 6 June, 2022; v1 submitted 10 November, 2021;
originally announced November 2021.
-
Autonomous Curiosity for Real-Time Training Onboard Robotic Agents
Authors:
Ervin Teng,
Bob Iannucci
Abstract:
Learning requires both study and curiosity. A good learner is not only good at extracting information from the data given to it, but also skilled at finding the right new information to learn from. This is especially true when a human operator is required to provide the ground truth - such a source should only be queried sparingly. In this work, we address the problem of curiosity as it relates to online, real-time, human-in-the-loop training of an object detection algorithm onboard a robotic platform, one where motion produces new views of the subject. We propose a deep reinforcement learning approach that decides when to ask the human user for ground truth, and when to move. Through a series of experiments, we demonstrate that our agent learns a movement and request policy that is at least 3x more effective at using human user interactions to train an object detector than untrained approaches, and is generalizable to a variety of subjects and environments.
Submitted 29 August, 2021;
originally announced September 2021.
-
Learning to Learn in Simulation
Authors:
Ervin Teng,
Bob Iannucci
Abstract:
Deep learning often requires the manual collection and annotation of a training set. On robotic platforms, can we partially automate this task by training the robot to be curious, i.e., to seek out beneficial training information in the environment? In this work, we address the problem of curiosity as it relates to online, real-time, human-in-the-loop training of an object detection algorithm onboard a drone, where motion is constrained to two dimensions. We use a 3D simulation environment and deep reinforcement learning to train a curiosity agent to, in turn, train the object detection model. This agent could have one of two conflicting objectives: train as quickly as possible, or train with minimal human input. We outline a reward function that allows the curiosity agent to learn either of these objectives, while taking into account some of the physical characteristics of the drone platform on which it is meant to run. In addition, we show that we can weigh the importance of achieving these objectives by adjusting a parameter in the reward function.
Submitted 5 February, 2019;
originally announced February 2019.
-
Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning
Authors:
Arthur Juliani,
Ahmed Khalifa,
Vincent-Pierre Berges,
Jonathan Harper,
Ervin Teng,
Hunter Henry,
Adam Crespi,
Julian Togelius,
Danny Lange
Abstract:
The rapid pace of recent research in AI has been driven in part by the presence of fast and challenging simulation environments. These environments often take the form of games, with tasks ranging from simple board games to competitive video games. We propose a new benchmark, Obstacle Tower: a high-fidelity, 3D, third-person, procedurally generated environment. An agent playing Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent's ability to perform well on unseen instances of the environment. In this paper we outline the environment and provide a set of baseline results produced by current state-of-the-art Deep RL methods as well as human players. These algorithms fail to produce agents capable of performing near human level.
Submitted 1 July, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.
-
Unity: A General Platform for Intelligent Agents
Authors:
Arthur Juliani,
Vincent-Pierre Berges,
Ervin Teng,
Andrew Cohen,
Jonathan Harper,
Chris Elion,
Chris Goy,
Yuan Gao,
Hunter Henry,
Marwan Mattar,
Danny Lange
Abstract:
Recent advances in artificial intelligence have been driven by the presence of increasingly realistic and complex simulated environments. However, many of the existing environments provide either unrealistic visuals, inaccurate physics, low task complexity, restricted agent perspective, or a limited capacity for interaction among artificial agents. Furthermore, many platforms lack the ability to flexibly configure the simulation, making the simulated environment a black-box from the perspective of the learning system. In this work, we propose a novel taxonomy of existing simulation platforms and discuss the highest level class of general platforms which enable the development of learning environments that are rich in visual, physical, task, and social complexity. We argue that modern game engines are uniquely suited to act as general platforms and as a case study examine the Unity engine and open source Unity ML-Agents Toolkit. We then survey the research enabled by Unity and the Unity ML-Agents Toolkit, discussing the kinds of research a flexible, interactive and easily configurable general platform can facilitate.
Submitted 6 May, 2020; v1 submitted 7 September, 2018;
originally announced September 2018.
-
ClickBAIT-v2: Training an Object Detector in Real-Time
Authors:
Ervin Teng,
Rui Huang,
Bob Iannucci
Abstract:
Modern deep convolutional neural networks (CNNs) for image classification and object detection are often trained offline on large static datasets. Some applications, however, will require training in real-time on live video streams with a human-in-the-loop. We refer to this class of problem as time-ordered online training (ToOT). These problems will require a consideration of not only the quantity of incoming training data, but the human effort required to annotate and use it. We demonstrate and evaluate a system tailored to training an object detector on a live video stream with minimal input from a human operator. We show that we can obtain bounding box annotation from weakly-supervised single-point clicks through interactive segmentation. Furthermore, by exploiting the time-ordered nature of the video stream through object tracking, we can increase the average training benefit of human interactions by 3-4 times.
Submitted 27 March, 2018;
originally announced March 2018.
-
ClickBAIT: Click-based Accelerated Incremental Training of Convolutional Neural Networks
Authors:
Ervin Teng,
João Diogo Falcão,
Bob Iannucci
Abstract:
Today's general-purpose deep convolutional neural networks (CNN) for image classification and object detection are trained offline on large static datasets. Some applications, however, will require training in real-time on live video streams with a human-in-the-loop. We refer to this class of problem as Time-ordered Online Training (ToOT) - these problems will require a consideration of not only the quantity of incoming training data, but the human effort required to tag and use it. In this paper, we define training benefit as a metric to measure the effectiveness of a sequence in using each user interaction. We demonstrate and evaluate a system tailored to performing ToOT in the field, capable of training an image classifier on a live video stream through minimal input from a human operator. We show that by exploiting the time-ordered nature of the video stream through optical flow-based object tracking, we can increase the effectiveness of human actions by about 8 times.
Submitted 14 September, 2017;
originally announced September 2017.