Search | arXiv e-print repository

Decoupling regularization from the action space

Authors: Sobhan Mohammadpour, Emma Frejinger, Pierre-Luc Bacon

Abstract: Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. While standard unregularized RL methods remain unaffected by changes in the number of actions, we show that it can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that is, to maintain a consistent level of regularization regardless of how many actions are involved to avoid over-regularization. Whereas the problem can be avoided by introducing a task-specific temperature parameter, it is often undesirable and cannot solve the problem when action spaces are state-dependent. In the state-dependent action context, different states with varying action spaces are regularized inconsistently. We introduce two solutions: a static temperature selection approach and a dynamic counterpart, universally applicable where this problem arises. Implementing these changes improves performance on the DeepMind control suite in static and dynamic temperature regimes and a biological sequence design task.

Submitted 9 June, 2024; originally announced June 2024.
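
As a small illustration of the coupling described in the abstract above, the sketch below uses the standard entropy-regularized (soft) value, tau * log sum exp(Q / tau). Padding the action set with copies of an equally good action leaves the unregularized maximum unchanged but raises the soft value by tau * log(n). The last function divides the temperature by log(n); that rescaling is an assumption made here for illustration and is not taken from the paper. All function names are hypothetical.

```python
import math


def hard_value(q_values):
    """Unregularized state value: the best action value."""
    return max(q_values)


def soft_value(q_values, tau):
    """Entropy-regularized value tau * log(sum(exp(q / tau))), computed stably."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def soft_value_scaled(q_values, tau):
    """Illustrative assumption (not the paper's rule): use temperature tau / log(n).

    With n tied actions the entropy bonus is then tau for every n > 1,
    instead of growing as tau * log(n).
    """
    n = len(q_values)
    if n == 1:
        # A single action has zero entropy, so no bonus applies.
        return hard_value(q_values)
    return soft_value(q_values, tau / math.log(n))
```

With four tied actions at value 0 and tau = 0.5, `hard_value` stays at 0 while `soft_value` returns 0.5 * log(4); `soft_value_scaled` returns 0.5 whether there are 2 or 64 tied actions.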

arXiv:2405.01616

Generative Active Learning for the Search of Small-molecule Protein Binders

Authors: Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, et al. (9 additional authors not shown)

Abstract: Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2403.07688

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

Authors: Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

Abstract: When training deep neural networks, the phenomenon of dying neurons (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios. In this paper, we reassess this phenomenon, focusing on sparsity and pruning. By systematically exploring the impact of various hyperparameter configurations on dying neurons, we unveil their potential to facilitate simple yet effective structured pruning algorithms. We introduce Demon Pruning (DemP), a method that controls the proliferation of dead neurons, dynamically leading to network sparsity. Achieved through a combination of noise injection on active units and a one-cycled schedule regularization strategy, DemP stands out for its simplicity and broad applicability. Experiments on CIFAR10 and ImageNet datasets demonstrate that DemP surpasses existing structured pruning techniques, showcasing superior accuracy-sparsity tradeoffs and training speedups. These findings suggest a novel perspective on dying neurons as a valuable resource for efficient model compression and optimization.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.05290

Do Transformer World Models Give Better Policy Gradients?

Authors: Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon

Abstract: A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long horizons: could they be the solution to this problem? Surprisingly, we show that commonly-used transformer world models produce circuitous gradient paths, which can be detrimental to long-range policy gradients. To tackle this challenge, we propose a class of world models called Actions World Models (AWMs), designed to provide more direct routes for gradient propagation. We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent. We demonstrate that AWMs can generate optimization landscapes that are easier to navigate even when compared to those from the simulator itself. This property allows transformer AWMs to produce better policies than competitive baselines in realistic long-horizon tasks.

Submitted 10 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: Michel Ma and Pierluca D'Oro contributed equally

arXiv:2401.08898

Bridging State and History Representations: Understanding Self-Predictive RL

Authors: Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon

Abstract: Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of preliminary guidelines for RL practitioners.

Submitted 21 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: ICLR 2024 (Poster). Code is available at https://github.com/twni2016/self-predictive-rl

arXiv:2312.14331

Maximum entropy GFlowNets with soft Q-learning

Authors: Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon

Abstract: Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has largely been unclear and seemingly applicable only in specific cases. This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. This construction allows us to introduce maximum entropy GFNs, which, in contrast to GFNs with uniform backward policy, achieve the maximum entropy attainable by GFNs without constraints on the state space.

Submitted 2 May, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Journal ref: 2024 Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2593-2601

arXiv:2310.15386

Course Correcting Koopman Representations

Authors: Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin

Abstract: Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over long horizons. We discover several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we refer to as Periodic Reencoding, for faithfully capturing long term dynamics. We justify this method both analytically and empirically via experiments in low and high dimensional NLDS.

Submitted 23 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.00166

Motif: Intrinsic Motivation from Artificial Intelligence Feedback

Authors: Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff

Abstract: Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent. Motif is based on the idea of grounding LLMs for decision-making without requiring them to interact with the environment: it elicits preferences from an LLM over pairs of captions to construct an intrinsic reward, which is then used to train agents with reinforcement learning. We evaluate Motif's performance and behavior on the challenging, open-ended and procedurally-generated NetHack game. Surprisingly, by only learning to maximize its intrinsic reward, Motif achieves a higher game score than an algorithm directly trained to maximize the score itself. When combining Motif's intrinsic reward with the environment reward, our method significantly outperforms existing approaches and makes progress on tasks where no advancements have ever been made without demonstrations. Finally, we show that Motif mostly generates intuitive human-aligned behaviors which can be steered easily through prompt modifications, while scaling well with the LLM size and the amount of information given in the prompt.

Submitted 29 September, 2023; originally announced October 2023.

Comments: The first two authors equally contributed - order decided by coin flip

arXiv:2309.14597

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Authors: Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

Abstract: Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.

Submitted 10 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: NeurIPS 2023 Accepted Paper. The first two authors contributed equally

arXiv:2307.03864

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

Authors: Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon

Abstract: Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain. However, the underlying reason for the strong performance of Transformer-based RL methods remains unclear: is it because they learn effective memory, or because they perform effective credit assignment? After introducing formal definitions of memory length and credit assignment length, we design simple configurable tasks to measure these distinct quantities. Our empirical results reveal that Transformers can enhance the memory capability of RL algorithms, scaling up to tasks that require memorizing observations 1500 steps ago. However, Transformers do not improve long-term credit assignment. In summary, our results provide an explanation for the success of Transformers in RL, while also highlighting an important area for future research and benchmark design. Our code is open-sourced at https://github.com/twni2016/Memory-RL

Submitted 3 November, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023 (Oral)

arXiv:2306.09539

Block-State Transformers

Authors: Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin

Abstract: State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in Language Modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.

Submitted 30 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: NeurIPS'23 - Thirty-seventh Conference on Neural Information Processing Systems

arXiv:2306.04620

Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design

Authors: Julien Roy, Pierre-Luc Bacon, Christopher Pal, Emmanuel Bengio

Abstract: In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound for pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.

Submitted 29 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 14 pages
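
The scalarization issue mentioned in the abstract above can be reproduced on a toy problem. The sketch below builds a hypothetical two-objective front (both objectives maximized) that bends toward the origin, which is one reading of the "concave Pareto front" the abstract refers to, and then maximizes a weighted sum over it for several preference weights. The front, the function names, and the weight grid are assumptions for illustration, not taken from the paper.

```python
def front_point(t):
    """Point on a hypothetical Pareto front, t in [0, 1].

    The first objective rises with t while the second falls, so no point
    dominates another; the curve sags toward the origin in between.
    """
    return t, (1.0 - t ** 0.5) ** 2


def best_by_scalarization(weight, num_points=101):
    """Return the front point maximizing weight * f1 + (1 - weight) * f2."""
    candidates = [front_point(i / (num_points - 1)) for i in range(num_points)]
    return max(candidates, key=lambda p: weight * p[0] + (1.0 - weight) * p[1])


# Sweep preference weights 0.1, 0.2, ..., 0.9 and collect the distinct winners.
winners = {best_by_scalarization(w / 10) for w in range(1, 10)}
```

On this front the weighted sum is a convex function of `t`, so its maximum always sits at an endpoint: `winners` contains only the two extreme points, and no choice of weight recovers a balanced trade-off such as `front_point(0.25)`.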

arXiv:2209.06259

Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization

Authors: Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon

Abstract: The ability to accelerate the design of biological sequences can have a substantial impact on the progress of the medical field. The problem can be framed as a global optimization problem where the objective is an expensive black-box function that can be queried in large batches but only over a small number of rounds. Bayesian Optimization is a principled method for tackling this problem. However, the astronomically large state space of biological sequences renders brute-force iterating over all possible sequences infeasible. In this paper, we propose MetaRLBO where we train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection via Bayesian Optimization. We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data acquired in the previous rounds. Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results compared to existing strong baselines.

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2205.13513

Denoising gravitational-wave signals from binary black holes with dilated convolutional autoencoder

Authors: P. Bacon, A. Trovato, M. Bejger

Abstract: Broadband frequency output of gravitational-wave detectors is a non-stationary and non-Gaussian time series data stream dominated by noise populated by local disturbances and transient artifacts, which evolve on the same timescale as the gravitational-wave signals and may corrupt the astrophysical information. We study a denoising algorithm dedicated to exposing the astrophysical signals by employing a convolutional neural network in the encoder-decoder configuration, i.e., we apply the denoising procedure to coalescing binary black hole signals in the publicly available LIGO O1 time series strain data. The denoising convolutional autoencoder neural network is trained on a dataset of simulated astrophysical signals injected into the real detector's noise and a dataset of detector noise artifacts ("glitches"), and its fidelity is tested on real gravitational-wave events from O1 and O2 LIGO-Virgo observing runs.

Submitted 26 May, 2022; originally announced May 2022.

Comments: 27 pages, 5 figures in the text and 7 in the appendix

arXiv:2205.07802

The Primacy Bias in Deep Reinforcement Learning

Authors: Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

Abstract: This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.

Submitted 16 May, 2022; originally announced May 2022.

Comments: ICML 2022; code at https://github.com/evgenii-nikishin/rl_with_resets

arXiv:2203.01443

Continuous-Time Meta-Learning with Forward Mode Differentiation

Authors: Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon

Abstract: Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed and discrete number of gradient steps. As a consequence, we can optimize the amount of adaptation necessary to solve a new task using stochastic gradient descent, in addition to learning the initial conditions as is standard practice in gradient-based meta-learning. Importantly, in order to compute the exact meta-gradients required for the outer-loop updates, we devise an efficient algorithm based on forward mode differentiation, whose memory requirements do not scale with the length of the learning trajectory, thus allowing longer adaptation in constant memory. We provide analytical guarantees for the stability of COMLN, we show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.10600

Myriad: a real-world testbed to bridge trajectory optimization and deep learning

Authors: Nikolaus H. R. Howe, Simon Dufort-Labbé, Nitarshan Rajkumar, Pierre-Luc Bacon

Abstract: We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments. The primary contributions of Myriad are threefold. First, Myriad provides machine learning practitioners access to trajectory optimization techniques for application within a typical automatic differentiation workflow. Second, Myriad presents many real-world optimal control problems, ranging from biology to medicine to engineering, for use by the machine learning community. Formulated in continuous space and time, these environments retain some of the complexity of real-world systems often abstracted away by standard benchmarks. As such, Myriad strives to serve as a stepping stone towards application of modern machine learning techniques for impactful real-world tasks. Finally, we use the Myriad repository to showcase a novel approach for learning and control tasks. Trained in a fully end-to-end fashion, our model leverages an implicit planning module over neural ordinary differential equations, enabling simultaneous learning and planning with complex environment dynamics.

Submitted 26 January, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

Comments: Updated to match version accepted at NeurIPS 2022

arXiv:2112.12228

Direct Behavior Specification via Constrained Reinforcement Learning

Authors: Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal

Abstract: The standard formulation of Reinforcement Learning lacks a practical way of specifying what are admissible and forbidden behaviors. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent for reward specification in applied RL projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods to automatically weigh each of these behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to several constraints simultaneously. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.

Submitted 18 June, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

arXiv:2110.05442 [pdf, other]

Neural Algorithmic Reasoners are Implicit Planners

Authors: Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

Abstract: Implicit planning has emerged as an elegant technique for combining learned models of the world with end-to-end model-free reinforcement learning. We study the class of implicit planners inspired by value iteration, an algorithm that is guaranteed to yield perfect policies in fully-specified tabular environments. We find that prior approaches either assume that the environment is provided in such a tabular form -- which is highly restrictive -- or infer "local neighbourhoods" of states to run value iteration over -- for which we discover an algorithmic bottleneck effect. This effect is caused by explicitly running the planning algorithm based on scalar predictions in every state, which can be harmful to data efficiency if such scalars are improperly predicted. We propose eXecuted Latent Value Iteration Networks (XLVINs), which alleviate the above limitations. Our method performs all planning computations in a high-dimensional latent space, breaking the algorithmic bottleneck. It maintains alignment with value iteration by carefully leveraging neural graph-algorithmic reasoning and contrastive self-supervised learning. Across eight low-data settings -- including classical control, navigation and Atari -- XLVINs provide significant improvements to data efficiency against value iteration-based implicit planners, as well as relevant model-free baselines. Lastly, we empirically verify that XLVINs can closely align with value iteration.

Submitted 11 October, 2021; originally announced October 2021.

Comments: To appear at NeurIPS 2021 (Spotlight talk). 20 pages, 10 figures. arXiv admin note: text overlap with arXiv:2010.13146

arXiv:2106.03273 [pdf, other]

Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

Authors: Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon

Abstract: The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers. When the model class is misspecified or has a limited representational capacity, model parameters with high likelihood might not necessarily result in high performance of the agent on a downstream control task. To alleviate this problem, we propose an end-to-end approach for model learning which directly optimizes the expected returns using implicit differentiation. We treat a value function that satisfies the Bellman optimality operator induced by the model as an implicit function of model parameters and show how to differentiate the function. We provide theoretical and empirical evidence highlighting the benefits of our approach in the model misspecification regime compared to likelihood-based methods.

Submitted 6 June, 2021; originally announced June 2021.

Comments: Code at https://github.com/evgenii-nikishin/omd

arXiv:2103.06224 [pdf, ps, other]

An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning

Authors: Dilip Arumugam, Peter Henderson, Pierre-Luc Bacon

Abstract: How do we formalize the challenge of credit assignment in reinforcement learning? Common intuition would draw attention to reward sparsity as a key contributor to difficult credit assignment and traditional heuristics would look to temporal recency for the solution, calling upon the classic eligibility trace. We posit that it is not the sparsity of the reward itself that causes difficulty in credit assignment, but rather the \emph{information sparsity}. We propose to use information theory to define this notion, which we then use to characterize when credit assignment is an obstacle to efficient learning. With this perspective, we outline several information-theoretic mechanisms for measuring credit under a fixed behavior policy, highlighting the potential of information theory as a key tool towards provably-efficient credit assignment.

Submitted 10 March, 2021; originally announced March 2021.

Comments: Workshop on Biological and Artificial Reinforcement Learning (NeurIPS 2020)

arXiv:2010.14550 [pdf, other]

doi 10.3847/1538-4357/abee15

Search for Gravitational Waves Associated with Gamma-Ray Bursts Detected by Fermi and Swift During the LIGO-Virgo Run O3a

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, A. Aich, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, P. A. Altin, A. Amato, S. Anand, A. Ananyeva, et al. (1228 additional authors not shown)

Abstract: We search for gravitational-wave transients associated with gamma-ray bursts detected by the Fermi and Swift satellites during the first part of the third observing run of Advanced LIGO and Advanced Virgo (1 April 2019 15:00 UTC - 1 October 2019 15:00 UTC). 105 gamma-ray bursts were analyzed using a search for generic gravitational-wave transients; 32 gamma-ray bursts were analyzed with a search that specifically targets neutron star binary mergers as short gamma-ray burst progenitors. We describe a method to calculate the probability that triggers from the binary merger targeted search are astrophysical and apply that method to the most significant gamma-ray bursts in that search. We find no significant evidence for gravitational-wave signals associated with the gamma-ray bursts that we followed up, nor for a population of unidentified subthreshold signals. We consider several source types and signal morphologies, and report for these lower bounds on the distance to each gamma-ray burst.

Submitted 20 August, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

Comments: 17 pages, 5 figures, 2 tables

Report number: LIGO-P2000040

Journal ref: Astrophys. J. 915, 86 (2021)

arXiv:2010.13146 [pdf, other]

XLVIN: eXecuted Latent Value Iteration Nets

Authors: Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

Abstract: Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics. This came with several limitations, however: the model is not incentivised in any way to perform meaningful planning computations, the underlying state space is assumed to be discrete, and the Markov decision process (MDP) is assumed fixed and known. We propose eXecuted Latent Value Iteration Networks (XLVINs), which combine recent developments across contrastive self-supervised learning, graph representation learning and neural algorithmic reasoning to alleviate all of the above limitations, successfully deploying VIN-style models on generic environments. XLVINs match the performance of VIN-like models when the underlying MDP is discrete, fixed and known, and provide significant improvements to model-free baselines across three general MDP setups.

Submitted 6 December, 2020; v1 submitted 25 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020 Deep Reinforcement Learning Workshop

arXiv:2009.12604 [pdf, other]

Graph neural induction of value iteration

Authors: Andreea Deac, Pierre-Luc Bacon, Jian Tang

Abstract: Many reinforcement learning tasks can benefit from explicit planning based on an internal model of the environment. Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration. Such networks have so far been focused on restrictive environments (e.g. grid-worlds), and modelled the planning procedure only indirectly. We relax these constraints, proposing a graph neural network (GNN) that executes the value iteration (VI) algorithm, across arbitrary environment models, with direct supervision on the intermediate steps of VI. The results indicate that GNNs are able to model value iteration accurately, recovering favourable metrics and policies across a variety of out-of-distribution tests. This suggests that GNN executors with strong supervision are a viable component within deep reinforcement learning systems.

Submitted 26 September, 2020; originally announced September 2020.

Comments: ICML GRL+ 2020

arXiv:2009.01190 [pdf, other]

doi 10.3847/2041-8213/aba493

Properties and astrophysical implications of the 150 Msun binary black hole merger GW190521

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, A. Aich, L. Aiello, A. Ain, P. Ajith, S. Akcay, G. Allen, A. Allocca, P. A. Altin, A. Amato, S. Anand, et al. (1233 additional authors not shown)

Abstract: The gravitational-wave signal GW190521 is consistent with a binary black hole merger source at redshift 0.8 with unusually high component masses, $85^{+21}_{-14}\,M_{\odot}$ and $66^{+17}_{-18}\,M_{\odot}$, compared to previously reported events, and shows mild evidence for spin-induced orbital precession. The primary falls in the mass gap predicted by (pulsational) pair-instability supernova theory, in the approximate range $65 - 120\,M_{\odot}$. The probability that at least one of the black holes in GW190521 is in that range is 99.0%. The final mass of the merger $(142^{+28}_{-16}\,M_{\odot})$ classifies it as an intermediate-mass black hole. Under the assumption of a quasi-circular binary black hole coalescence, we detail the physical properties of GW190521's source binary and its post-merger remnant, including component masses and spin vectors. Three different waveform models, as well as direct comparison to numerical solutions of general relativity, yield consistent estimates of these properties. Tests of strong-field general relativity targeting the merger-ringdown stages of coalescence indicate consistency of the observed signal with theoretical predictions. We estimate the merger rate of similar systems to be $0.13^{+0.30}_{-0.11}\,{\rm Gpc}^{-3}\,\rm{yr}^{-1}$.
We discuss the astrophysical implications of GW190521 for stellar collapse, and for the possible formation of black holes in the pair-instability mass gap through various channels: via (multiple) stellar coalescence, or via hierarchical merger of lower-mass black holes in star clusters or in active galactic nuclei. We find it to be unlikely that GW190521 is a strongly lensed signal of a lower-mass black hole binary merger. We also discuss more exotic possible sources for GW190521, including a highly eccentric black hole binary, or a primordial black hole binary.

Submitted 2 September, 2020; originally announced September 2020.

Comments: 39 pages, 13 figures; data available at https://dcc.ligo.org/P2000158-v4/public

Report number: LIGO-P2000021

Journal ref: Astrophys. J. Lett. 900, L13 (2020)

arXiv:2009.01075 [pdf]

doi 10.1103/PhysRevLett.125.101102

GW190521: A Binary Black Hole Merger with a Total Mass of $150 ~ M_{\odot}$

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, A. Aich, L. Aiello, A. Ain, P. Ajith, S. Akcay, G. Allen, A. Allocca, P. A. Altin, A. Amato, S. Anand, et al. (1232 additional authors not shown)

Abstract: On May 21, 2019 at 03:02:29 UTC Advanced LIGO and Advanced Virgo observed a short duration gravitational-wave signal, GW190521, with a three-detector network signal-to-noise ratio of 14.7, and an estimated false-alarm rate of 1 in 4900 yr using a search sensitive to generic transients. If GW190521 is from a quasicircular binary inspiral, then the detected signal is consistent with the merger of two black holes with masses of $85^{+21}_{-14} M_{\odot}$ and $66^{+17}_{-18} M_{\odot}$ (90 % credible intervals). We infer that the primary black hole mass lies within the gap produced by (pulsational) pair-instability supernova processes, and has only a 0.32 % probability of being below $65 M_{\odot}$. We calculate the mass of the remnant to be $142^{+28}_{-16} M_{\odot}$, which can be considered an intermediate mass black hole (IMBH). The luminosity distance of the source is $5.3^{+2.4}_{-2.6}$ Gpc, corresponding to a redshift of $0.82^{+0.28}_{-0.34}$. The inferred rate of mergers similar to GW190521 is $0.13^{+0.30}_{-0.11}\,\mathrm{Gpc}^{-3}\,\mathrm{yr}^{-1}$.

Submitted 2 September, 2020; originally announced September 2020.

Comments: Supplementary Material at https://dcc.ligo.org/LIGO-P2000020/Public

Journal ref: Phys. Rev. Lett. 125, 101102 (2020)

arXiv:2007.02786 [pdf, other]

TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

Authors: Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

Abstract: We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers. Our method, TDprop, computes a per parameter learning rate based on the diagonal preconditioning of the TD update rule. We show how this can be used in both $n$-step returns and TD($λ$). Our theoretical findings demonstrate that including this additional preconditioning information is, surprisingly, comparable to normal semi-gradient TD if the optimal learning rate is found for both via a hyperparameter search. In Deep RL experiments using Expected SARSA, TDprop meets or exceeds the performance of Adam in all tested games under near-optimal learning rates, but a well-tuned SGD can yield similar improvements -- matching our theory. Our findings suggest that Jacobi preconditioning may improve upon typical adaptive optimization methods in Deep RL, but despite incorporating additional information from the TD bootstrap term, may not always be better than SGD.

Submitted 6 July, 2020; originally announced July 2020.

Comments: Presented at the Theoretical Foundations of Reinforcement Learning workshop at ICML 2020

arXiv:2006.12611 [pdf, other]

doi 10.3847/2041-8213/ab960f

GW190814: Gravitational Waves from the Coalescence of a 23 M$_\odot$ Black Hole with a 2.6 M$_\odot$ Compact Object

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, A. Aich, L. Aiello, A. Ain, P. Ajith, S. Akcay, G. Allen, A. Allocca, P. A. Altin, A. Amato, S. Anand, et al. (1232 additional authors not shown)

Abstract: We report the observation of a compact binary coalescence involving a 22.2 - 24.3 $M_{\odot}$ black hole and a compact object with a mass of 2.50 - 2.67 $M_{\odot}$ (all measurements quoted at the 90$\%$ credible level). The gravitational-wave signal, GW190814, was observed during LIGO's and Virgo's third observing run on August 14, 2019 at 21:10:39 UTC and has a signal-to-noise ratio of 25 in the three-detector network. The source was localized to 18.5 deg$^2$ at a distance of $241^{+41}_{-45}$ Mpc; no electromagnetic counterpart has been confirmed to date. The source has the most unequal mass ratio yet measured with gravitational waves, $0.112^{+0.008}_{-0.009}$, and its secondary component is either the lightest black hole or the heaviest neutron star ever discovered in a double compact-object system. The dimensionless spin of the primary black hole is tightly constrained to $\leq 0.07$. Tests of general relativity reveal no measurable deviations from the theory, and its prediction of higher-multipole emission is confirmed at high confidence. We estimate a merger rate density of 1-23 Gpc$^{-3}$ yr$^{-1}$ for the new class of binary coalescence sources that GW190814 represents. Astrophysical models predict that binaries with mass ratios similar to this event can form through several channels, but are unlikely to have formed in globular clusters. However, the combination of mass ratio, component masses, and the inferred merger rate for this event challenges all current models for the formation and mass distribution of compact-object binaries.

Submitted 22 June, 2020; originally announced June 2020.

Comments: 23 pages, 8 figures, accepted by ApJ Letters

Report number: LIGO-P190814

arXiv:2004.08342 [pdf]

doi 10.1103/PhysRevD.102.043015

GW190412: Observation of a Binary-Black-Hole Coalescence with Asymmetric Masses

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, A. Aich, L. Aiello, A. Ain, P. Ajith, S. Akcay, G. Allen, A. Allocca, P. A. Altin, A. Amato, S. Anand, et al. (1232 additional authors not shown)

Abstract: We report the observation of gravitational waves from a binary-black-hole coalescence during the first two weeks of LIGO's and Virgo's third observing run. The signal was recorded on April 12, 2019 at 05:30:44 UTC with a network signal-to-noise ratio of 19. The binary is different from observations during the first two observing runs most notably due to its asymmetric masses: a ~30 solar mass black hole merged with a ~8 solar mass black hole companion. The more massive black hole rotated with a dimensionless spin magnitude between 0.22 and 0.60 (90% probability). Asymmetric systems are predicted to emit gravitational waves with stronger contributions from higher multipoles, and indeed we find strong evidence for gravitational radiation beyond the leading quadrupolar order in the observed signal. A suite of tests performed on GW190412 indicates consistency with Einstein's general theory of relativity. While the mass ratio of this system differs from all previous detections, we show that it is consistent with the population model of stellar binary black holes inferred from the first two observing runs.

Submitted 24 August, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

Comments: 29 pages, 12 figures; data available under https://doi.org/10.7935/20yv-ka61 ; posterior samples available under https://dcc.ligo.org/P190412/public

Report number: LIGO-P190412

Journal ref: Phys. Rev. D 102, 043015 (2020)

arXiv:2002.11833 [pdf, other]

Policy Evaluation Networks

Authors: Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

Abstract: Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states. This approach opens up the possibility of performing direct gradient ascent in policy space without seeing any new data. The main challenge for this approach is finding a way to represent complex policies that facilitates learning and generalization. To address this problem, we introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding. Our empirical results demonstrate that combining these three elements (learned Policy Evaluation Network, policy fingerprints, gradient ascent) can produce policies that outperform those that generated the training data, in a zero-shot manner.

Submitted 26 February, 2020; originally announced February 2020.

Comments: 12 pages, 11 figures

arXiv:2001.01761 [pdf]

doi 10.3847/2041-8213/ab75f5

GW190425: Observation of a Compact Binary Coalescence with Total Mass $\sim 3.4 M_{\odot}$

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand, et al. (1177 additional authors not shown)

Abstract: On 2019 April 25, the LIGO Livingston detector observed a compact binary coalescence with signal-to-noise ratio 12.9. The Virgo detector was also taking data that did not contribute to detection due to a low signal-to-noise ratio, but were used for subsequent parameter estimation. The 90% credible intervals for the component masses range from 1.12 to 2.52 $M_{\odot}$ (1.45 to 1.88 $M_{\odot}$ if we restrict the dimensionless component spin magnitudes to be smaller than 0.05). These mass parameters are consistent with the individual binary components being neutron stars. However, both the source-frame chirp mass $1.44^{+0.02}_{-0.02} M_{\odot}$ and the total mass $3.4^{+0.3}_{-0.1}\,M_{\odot}$ of this system are significantly larger than those of any other known binary neutron star system. The possibility that one or both binary components of the system are black holes cannot be ruled out from gravitational-wave data. We discuss possible origins of the system based on its inconsistency with the known Galactic binary neutron star population. Under the assumption that the signal was produced by a binary neutron star coalescence, the local rate of neutron star mergers is updated to $250-2810 \text{Gpc}^{-3}\text{yr}^{-1}$.

Submitted 7 April, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

Comments: 24 pages, 19 figures, published in ApJL

Report number: LIGO-P190425

Journal ref: Astrophysical Journal Letters 892 (2020) L3

arXiv:2001.00923 [pdf, other]

doi 10.3847/1538-4357/ab7d3e

A Joint Fermi-GBM and LIGO/Virgo Analysis of Compact Binary Mergers From the First and Second Gravitational-wave Observing Runs

Authors: The Fermi Gamma-ray Burst Monitor Team, the LIGO Scientific Collaboration, the Virgo Collaboration, R. Hamburg, C. Fletcher, E. Burns, A. Goldstein, E. Bissaldi, M. S. Briggs, W. H. Cleveland, M. M. Giles, C. M. Hui, D. Kocevski, S. Lesage, B. Mailyan, C. Malacaria, S. Poolakkil, R. Preece, O. J. Roberts, P. Veres, A. von Kienlin, C. A. Wilson-Hodge, J. Wood, R. Abbott, et al. (1241 additional authors not shown)

Abstract: We present results from offline searches of Fermi Gamma-ray Burst Monitor (GBM) data for gamma-ray transients coincident with the compact binary coalescences observed by the gravitational-wave (GW) detectors Advanced LIGO and Advanced Virgo during their first and second observing runs. In particular, we perform follow-up for both confirmed events and low significance candidates reported in the LIGO/Virgo catalog GWTC-1. We search for temporal coincidences between these GW signals and GBM triggered gamma-ray bursts (GRBs). We also use the GBM Untargeted and Targeted subthreshold searches to find coincident gamma-rays below the on-board triggering threshold. This work implements a refined statistical approach by incorporating GW astrophysical source probabilities and GBM visibilities of LIGO/Virgo sky localizations to search for cumulative signatures of coincident subthreshold gamma-rays. All search methods recover the short gamma-ray burst GRB 170817A occurring ~1.7 s after the binary neutron star merger GW170817. We also present results from a new search seeking GBM counterparts to LIGO single-interferometer triggers. This search finds a candidate joint event, but given the nature of the GBM signal and localization, as well as the high joint false alarm rate of $1.1 \times 10^{-6}$ Hz, we do not consider it an astrophysical association. We find no additional coincidences.

Submitted 24 February, 2020; v1 submitted 3 January, 2020; originally announced January 2020.

Comments: Accepted for publication in ApJ. 18 pages, 4 figures, 1 table

Journal ref: The Astrophysical Journal, 893:100 (14pp), 2020 April 20

arXiv:2001.00271 [pdf, other]

Options of Interest: Temporal Abstraction with Interest Functions

Authors: Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

Abstract: Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.

Submitted 1 January, 2020; originally announced January 2020.

Comments: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

arXiv:1912.11716 [pdf, other]

doi 10.1016/j.softx.2021.100658

Open data from the first and second observing runs of Advanced LIGO and Advanced Virgo

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, A. Aich, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, P. A. Altin, A. Amato, S. Anand, A. Ananyeva, et al. (1223 additional authors not shown)

Abstract: Advanced LIGO and Advanced Virgo are actively monitoring the sky and collecting gravitational-wave strain data with sufficient sensitivity to detect signals routinely. In this paper we describe the data recorded by these instruments during their first and second observing runs. The main data products are the gravitational-wave strain arrays, released as time series sampled at 16384 Hz. The datasets that include this strain measurement can be freely accessed through the Gravitational Wave Open Science Center at http://gw-openscience.org, together with data-quality information essential for the analysis of LIGO and Virgo data, documentation, tutorials, and supporting software.

Submitted 25 January, 2021; v1 submitted 25 December, 2019; originally announced December 2019.

Comments: 42 pages, 5 figures

Report number: LIGO-P1900206

Journal ref: SoftwareX 13 (2021) 100658

arXiv:1912.05104 [pdf, other]

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods

Authors: Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup

Abstract: The policy gradient theorem is defined based on an objective with respect to the initial distribution over states. In the discounted case, this results in policies that are optimal for one distribution over initial states, but may not be uniformly optimal for others, no matter where the agent starts from. Furthermore, to obtain unbiased gradient estimates, the starting point of the policy gradient estimator requires sampling states from a normalized discounted weighting of states. However, the difficulty of estimating the normalized discounted weighting of states, or the stationary state distribution, is quite well-known. Additionally, the large sample complexity of policy gradient methods is often attributed to insufficient exploration, and to remedy this, it is often assumed that the restart distribution provides sufficient exploration in these algorithms. In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution. The key contribution of our work includes providing a practically feasible algorithm to estimate the normalized discounted weighting of states, i.e., the discounted future state distribution. We propose that exploration can be achieved by entropy regularization with the discounted state distribution in policy gradients, where a metric for maximal coverage of the state space can be based on the entropy of the induced state distribution. The proposed approach can be considered as a three time-scale algorithm and under some mild technical conditions, we prove its convergence to a locally optimal policy. Experimentally, we demonstrate the usefulness of regularization with the discounted future state distribution in terms of increased state space coverage and faster learning on a range of complex tasks.

Submitted 10 December, 2019; originally announced December 2019.

Comments: In Submission; Appeared at NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop

arXiv:1910.09093 [pdf, ps, other]

All-Action Policy Gradient Methods: A Numerical Integration Approach

Authors: Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon

Abstract: While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] reducing the variance significantly compared to the REINFORCE estimator [Williams, 1992]. In this paper, we adopt a numerical integration perspective to broaden the applicability of the all-action estimator to general spaces and to any function class for the policy or critic components, beyond the Gaussian case considered by [Ciosek, 2018]. In addition, we provide a new theoretical result on the effect of using a biased critic which offers more guidance than the previous "compatible features" condition of [Sutton, 1999]. We demonstrate the benefit of our approach in continuous control tasks with nonlinear function approximation. Our results show improved performance and sample efficiency.

Submitted 20 October, 2019; originally announced October 2019.

Comments: 9 pages, 2 figures. NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop

arXiv:1910.06508 [pdf, other]

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

Authors: Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Abstract: Off-policy policy estimators that use importance sampling (IS) can suffer from high variance in long-horizon domains, and there has been particular excitement over new IS methods that leverage the structure of Markov decision processes. We analyze the variance of the most popular approaches through the viewpoint of conditional Monte Carlo. Surprisingly, we find that in finite horizon MDPs there is no strict variance reduction of per-decision importance sampling or stationary importance sampling, compared with vanilla importance sampling. We then provide sufficient conditions under which the per-decision or stationary estimators will provably reduce the variance over importance sampling with finite horizons. For the asymptotic (in terms of horizon $T$) case, we develop upper and lower bounds on the variance of those estimators which yield sufficient conditions under which there exists an exponential vs. polynomial gap between the variance of importance sampling and that of the per-decision or stationary estimators. These results help advance our understanding of if and when new types of IS estimators will improve the accuracy of off-policy estimation.

Submitted 5 June, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Comments: Accepted by ICML 2020, 21 pages, 1 figure

arXiv:1908.11170 [pdf, other]

doi 10.1088/1361-6382/ab685e

A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Alford, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, A. Ananyeva , et al. (1113 additional authors not shown)

Abstract: The LIGO Scientific Collaboration and the Virgo Collaboration have cataloged eleven confidently detected gravitational-wave events during the first two observing runs of the advanced detector era. All eleven events were consistent with being from well-modeled mergers between compact stellar-mass objects: black holes or neutron stars. The data around the time of each of these events have been made publicly available through the gravitational-wave open science center. The entirety of the gravitational-wave strain data from the first and second observing runs have also now been made publicly available. There is considerable interest among the broad scientific community in understanding the data and methods used in the analyses. In this paper, we provide an overview of the detector noise properties and the data analysis techniques used to detect gravitational-wave signals and infer the source properties. We describe some of the checks that are performed to validate the analyses and results from the observations of gravitational-wave events. We also address concerns that have been raised about various properties of LIGO-Virgo detector noise and the correctness of our analyses as applied to the resulting data.

Submitted 10 February, 2020; v1 submitted 29 August, 2019; originally announced August 2019.

Journal ref: B P Abbott et al 2020 Class. Quantum Grav. 37 055002

arXiv:1908.06060 [pdf, other]

doi 10.3847/1538-4357/abdcb7

A gravitational-wave measurement of the Hubble constant following the second observing run of Advanced LIGO and Virgo

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand , et al. (1164 additional authors not shown)

Abstract: This paper presents the gravitational-wave measurement of the Hubble constant ($H_0$) using the detections from the first and second observing runs of the Advanced LIGO and Virgo detector network. The presence of the transient electromagnetic counterpart of the binary neutron star GW170817 led to the first standard-siren measurement of $H_0$. Here we additionally use binary black hole detections in conjunction with galaxy catalogs and report a joint measurement. Our updated measurement is $H_0 = 68.7^{+17.0}_{-7.8}$ km/s/Mpc (68.3% of the highest density posterior interval with a flat-in-log prior) which is an improvement by a factor of 1.04 (about 4%) over the GW170817-only value of $68.7^{+17.5}_{-8.3}$ km/s/Mpc. A significant additional contribution currently comes from GW170814, a loud and well-localized detection from a part of the sky thoroughly covered by the Dark Energy Survey. With numerous detections anticipated over the upcoming years, an exhaustive understanding of other systematic effects is also going to become increasingly important. These results establish the path to cosmology using gravitational-wave observations with and without transient electromagnetic counterparts.

Submitted 8 November, 2021; v1 submitted 16 August, 2019; originally announced August 2019.

Comments: 21 pages, 8 figures; this version corrects Fig 2; there are also minor changes to Figs 3 & 4 and the final results

Report number: LIGO-P1900015

Journal ref: Astrophys J 909 Number 2 218 (2021)

arXiv:1908.03584 [pdf, other]

doi 10.1103/PhysRevD.101.084002

An Optically Targeted Search for Gravitational Waves emitted by Core-Collapse Supernovae during the First and Second Observing Runs of Advanced LIGO and Advanced Virgo

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand, A. Ananyeva , et al. (1173 additional authors not shown)

Abstract: We present the results from a search for gravitational-wave transients associated with core-collapse supernovae observed within a source distance of approximately 20 Mpc during the first and second observing runs of Advanced LIGO and Advanced Virgo. No significant gravitational-wave candidate was detected. We report the detection efficiencies as a function of the distance for waveforms derived from multidimensional numerical simulations and phenomenological extreme emission models. For neutrino-driven explosions the distance at which we reach 50% detection efficiency is approaching 5 kpc, and for magnetorotationally-driven explosions is up to 54 kpc. However, waveforms for extreme emission models are detectable up to 28 Mpc. For the first time, the gravitational-wave data enabled us to exclude part of the parameter spaces of two extreme emission models with confidence up to 83%, limited by coincident data coverage. Besides, using ad hoc harmonic signals windowed with Gaussian envelopes we constrained the gravitational-wave energy emitted during core-collapse at the levels of $4.27\times 10^{-4}\,M_\odot c^2$ and $1.28\times 10^{-1}\,M_\odot c^2$ for emissions at 235 Hz and 1304 Hz respectively. These constraints are two orders of magnitude more stringent than previously derived in the corresponding analysis using initial LIGO, initial Virgo and GEO 600 data.

Submitted 20 August, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

Comments: 13 pages, 5 figures

Report number: LIGO-P1700177

Journal ref: Phys. Rev. D 101, 084002 (2020)

arXiv:1908.01012 [pdf]

doi 10.1088/1361-6382/ab5f7c

Model comparison from LIGO-Virgo data on GW170817's binary components and consequences for the merger remnant

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand, A. Ananyeva , et al. (1169 additional authors not shown)

Abstract: GW170817 is the very first observation of gravitational waves originating from the coalescence of two compact objects in the mass range of neutron stars, accompanied by electromagnetic counterparts, and offers an opportunity to directly probe the internal structure of neutron stars. We perform Bayesian model selection on a wide range of theoretical predictions for the neutron star equation of state. For the binary neutron star hypothesis, we find that we cannot rule out the majority of theoretical models considered. In addition, the gravitational-wave data alone does not rule out the possibility that one or both objects were low-mass black holes. We discuss the possible outcomes in the case of a binary neutron star merger, finding that all scenarios from prompt collapse to long-lived or even stable remnants are possible. For long-lived remnants, we place an upper limit of 1.9 kHz on the rotation rate. If a black hole was formed any time after merger and the coalescing stars were slowly rotating, then the maximum baryonic mass of non-rotating neutron stars is at most 3.05 $M_\odot$, and three equations of state considered here can be ruled out. We obtain a tighter limit of 2.67 $M_\odot$ for the case that the merger results in a hypermassive neutron star.

Submitted 6 March, 2020; v1 submitted 2 August, 2019; originally announced August 2019.

Comments: 35 pages, 4 figures

Report number: LIGO-P1800379

Journal ref: Classical and Quantum Gravity, Vol. 37, No 4, p 045006 (2020)

arXiv:1907.10851 [pdf, other]

doi 10.1103/PhysRevD.100.124022

Astrophysical signal consistency test adapted for gravitational-wave transient searches

Authors: V. Gayathri, P. Bacon, A. Pai, E. Chassande-Mottin, F. Salemi, G. Vedovato

Abstract: Gravitational wave astronomy is established with the direct observation of gravitational waves from merging binary black holes and binary neutron stars during the first and second observing runs of the LIGO and Virgo detectors. Gravitational-wave transient searches fall mainly into two families: modeled and model-independent searches. The modeled searches are based on matched filtering techniques and model-independent searches are based on the extraction of excess power from time-frequency representations. We have proposed a hybrid method, called wavegraph, that mixes the two approaches. It uses astrophysical information at the extraction stage of the model-independent search using a mathematical graph. In this work, we assess the performance of wavegraph clustering in real LIGO and Virgo noise (the sixth science run and the first observing run), using the coherent WaveBurst transient search as a backbone. Further, we propose a new signal consistency test for this algorithm. This test uses the amplitude profile information to distinguish gravitational wave transients from noise glitches. This test is able to remove a large fraction of loud glitches, which thus results in additional overall sensitivity in the context of searches for binary black-hole mergers in the low-mass range.

Submitted 23 December, 2019; v1 submitted 25 July, 2019; originally announced July 2019.

Comments: main paper: 8 pages and 13 figures, total with appendices: 10 pages and 13 figures

Report number: LIGO-P1900221

Journal ref: Phys. Rev. D 100, 124022 (2019)

arXiv:1907.09384 [pdf, other]

doi 10.3847/1538-4357/ab3c2d

Search for Eccentric Binary Black Hole Mergers with Advanced LIGO and Advanced Virgo during their First and Second Observing Runs

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand , et al. (1161 additional authors not shown)

Abstract: When formed through dynamical interactions, stellar-mass binary black holes may retain eccentric orbits ($e>0.1$ at 10 Hz) detectable by ground-based gravitational-wave detectors. Eccentricity can therefore be used to differentiate dynamically-formed binaries from isolated binary black hole mergers. Current template-based gravitational-wave searches do not use waveform models associated to eccentric orbits, rendering the search less efficient to eccentric binary systems. Here we present results of a search for binary black hole mergers that inspiral in eccentric orbits using data from the first and second observing runs (O1 and O2) of Advanced LIGO and Advanced Virgo. The search uses minimal assumptions on the morphology of the transient gravitational waveform. We show that it is sensitive to binary mergers with a detection range that is weakly dependent on eccentricity for all bound systems. Our search did not identify any new binary merger candidates. We interpret these results in light of eccentric binary formation models.

Submitted 24 January, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

Comments: 7 pages, 2 figures

Report number: LIGO Document P1900110

arXiv:1907.01443 [pdf]

doi 10.3847/1538-4357/ab4b48

Search for gravitational-wave signals associated with gamma-ray bursts during the second observing run of Advanced LIGO and Advanced Virgo

Authors: B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand, A. Ananyeva, S. B. Anderson , et al. (1174 additional authors not shown)

Abstract: We present the results of targeted searches for gravitational-wave transients associated with gamma-ray bursts during the second observing run of Advanced LIGO and Advanced Virgo, which took place from 2016 November to 2017 August. We have analyzed 98 gamma-ray bursts using an unmodeled search method that searches for generic transient gravitational waves and 42 with a modeled search method that targets compact-binary mergers as progenitors of short gamma-ray bursts. Both methods clearly detect the previously reported binary merger signal GW170817, with p-values of $<9.38 \times 10^{-6}$ (modeled) and $3.1 \times 10^{-4}$ (unmodeled). We do not find any significant evidence for gravitational-wave signals associated with the other gamma-ray bursts analyzed, and therefore we report lower bounds on the distance to each of these, assuming various source types and signal morphologies. Using our final modeled search results, short gamma-ray burst observations, and assuming binary neutron star progenitors, we place bounds on the rate of short gamma-ray bursts as a function of redshift for $z \leq 1$. We estimate 0.07-1.80 joint detections with Fermi-GBM per year for the 2019-20 LIGO-Virgo observing run and 0.15-3.90 per year when current gravitational-wave detectors are operating at their design sensitivities.

Submitted 22 November, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

Report number: LIGO-P1900034

Journal ref: Astrophys. J. 886, 75 (2019)

arXiv:1906.12040 [pdf, other]

doi 10.1103/PhysRevD.100.122002

Search for gravitational waves from Scorpius X-1 in the second Advanced LIGO observing run with an improved hidden Markov model

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, A. Ananyeva , et al. (1112 additional authors not shown)

Abstract: We present results from a semicoherent search for continuous gravitational waves from the low-mass X-ray binary Scorpius X-1, using a hidden Markov model (HMM) to track spin wandering. This search improves on previous HMM-based searches of LIGO data by using an improved frequency domain matched filter, the $\mathcal{J}$-statistic, and by analysing data from Advanced LIGO's second observing run. In the frequency range searched, from $60$ to $650\,\mathrm{Hz}$, we find no evidence of gravitational radiation. At $194.6\,\mathrm{Hz}$, the most sensitive search frequency, we report an upper limit on gravitational wave strain (at 95% confidence) of $h_0^{95\%} = 3.47 \times 10^{-25}$ when marginalising over source inclination angle. This is the most sensitive search for Scorpius X-1, to date, that is specifically designed to be robust in the presence of spin wandering.

Submitted 27 November, 2019; v1 submitted 28 June, 2019; originally announced June 2019.

Comments: 21 pages, 5 figures; accepted for publication in Physical Review D

Report number: LIGO-P1800208; erratum LIGO-P2100373

Journal ref: Phys. Rev. D 100, 122002 (2019); erratum Phys. Rev. D 104, 109903 (2021)

arXiv:1906.08000 [pdf, other]

doi 10.1103/PhysRevD.100.064064

Search for intermediate mass black hole binaries in the first and second observing runs of the Advanced LIGO and Virgo network

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, A. Adams, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato , et al. (1174 additional authors not shown)

Abstract: Gravitational wave astronomy has been firmly established with the detection of gravitational waves from the merger of ten stellar mass binary black holes and a neutron star binary. This paper reports on the all-sky search for gravitational waves from intermediate mass black hole binaries in the first and second observing runs of the Advanced LIGO and Virgo network. The search uses three independent algorithms: two based on matched filtering of the data with waveform templates of gravitational wave signals from compact binaries, and a third, model-independent algorithm that employs no signal model for the incoming signal. No intermediate mass black hole binary event was detected in this search. Consequently, we place upper limits on the merger rate density for a family of intermediate mass black hole binaries. In particular, we choose sources with total masses $M=m_1+m_2\in[120,800]\,M_\odot$ and mass ratios $q = m_2/m_1 \in[0.1,1.0]$. For the first time, this calculation is done using numerical relativity waveforms (which include higher modes) as models of the real emitted signal. We place a most stringent upper limit of $0.20$ Gpc$^{-3}$ yr$^{-1}$ (in co-moving units at the 90% confidence level) for equal-mass binaries with individual masses $m_{1,2}=100\,M_\odot$ and dimensionless spins $χ_{1,2}= 0.8$ aligned with the orbital angular momentum of the binary. This improves by a factor of $\sim 5$ that reported after Advanced LIGO's first observing run.

Submitted 24 January, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: main paper: 14 pages, 2 figures and 1 table; total with appendices: 19 pages, 2 figures and 2 tables

Report number: LIGO-P1900045

Journal ref: Phys. Rev. D 100, 064064 (2019)

arXiv:1905.03457 [pdf, other]

doi 10.1103/PhysRevD.100.024017

All-sky search for short gravitational-wave bursts in the second Advanced LIGO and Advanced Virgo run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand , et al. (1164 additional authors not shown)

Abstract: We present the results of a search for short-duration gravitational-wave transients in the data from the second observing run of Advanced LIGO and Advanced Virgo. We search for gravitational-wave transients with a duration of milliseconds to approximately one second in the 32-4096 Hz frequency band with minimal assumptions about the signal properties, thus targeting a wide variety of sources. We also perform a matched-filter search for gravitational-wave transients from cosmic string cusps for which the waveform is well-modeled. The unmodeled search detected gravitational waves from several binary black hole mergers which have been identified by previous analyses. No other significant events have been found by either the unmodeled search or the cosmic string search. We thus present search sensitivity for a variety of signal waveforms and report upper limits on the source rate-density as a function of the characteristic frequency of the signal. These upper limits are a factor of three lower than the first observing run, with a $50\%$ detection probability for gravitational-wave emissions with energies of $\sim10^{-9}M_\odot c^2$ at 153 Hz. For the search dedicated to cosmic string cusps we consider several loop distribution models, and present updated constraints from the same search done in the first observing run.

Submitted 9 May, 2019; originally announced May 2019.

Comments: 10 pages, 6 figures

Report number: LIGO-P1800308

Journal ref: Phys. Rev. D 100, 024017 (2019)

arXiv:1904.08976 [pdf, other]

doi 10.1103/PhysRevLett.123.161102

Search for sub-solar mass ultracompact binaries in Advanced LIGO's second observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand, et al. (1165 additional authors not shown)

Abstract: We present an Advanced LIGO and Advanced Virgo search for sub-solar mass ultracompact objects in data obtained during Advanced LIGO's second observing run. In contrast to a previous search of Advanced LIGO data from the first observing run, this search includes the effects of component spin on the gravitational waveform. We identify no viable gravitational wave candidates consistent with sub-solar mass ultracompact binaries with at least one component between 0.2 - 1.0 solar masses. We use the null result to constrain the binary merger rate of (0.2 solar mass, 0.2 solar mass) binaries to be less than 3.7 x 10^5 Gpc^-3 yr^-1 and the binary merger rate of (1.0 solar mass, 1.0 solar mass) binaries to be less than 5.2 x 10^3 Gpc^-3 yr^-1. Sub-solar mass ultracompact objects are not expected to form via known stellar evolution channels, though it has been suggested that primordial density fluctuations or particle dark matter with cooling mechanisms and/or nuclear interactions could form black holes with sub-solar masses. Assuming a particular primordial black hole formation model, we constrain a population of merging 0.2 solar mass black holes to account for less than 16% of the dark matter density and a population of merging 1.0 solar mass black holes to account for less than 2% of the dark matter density. We discuss how constraints on the merger rate and dark matter fraction may be extended to arbitrary black hole population models that predict sub-solar mass binaries.

Submitted 25 May, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

Report number: LIGO-P1900037

Journal ref: Phys. Rev. Lett. 123, 161102 (2019)

arXiv:1903.12015 [pdf, other]

doi 10.1103/PhysRevD.99.104033

All-sky search for long-duration gravitational-wave transients in the second Advanced LIGO observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, S. Anand, et al. (1161 additional authors not shown)

Abstract: We present the results of a search for long-duration gravitational-wave transients in the data from the Advanced LIGO second observation run; we search for gravitational-wave transients of $2~\text{--}~ 500$~s duration in the $24 - 2048$\,Hz frequency band with minimal assumptions about signal properties such as waveform morphologies, polarization, sky location or time of occurrence. Targeted signal models include fallback accretion onto neutron stars, broadband chirps from innermost stable circular orbit waves around rotating black holes, eccentric inspiral-merger-ringdown compact binary coalescence waveforms, and other models. The second observation run totals about \otwoduration~days of coincident data between November 2016 and August 2017. We find no significant events within the parameter space that we searched, apart from the already-reported binary neutron star merger GW170817. We thus report sensitivity limits on the root-sum-square strain amplitude $h_{\mathrm{rss}}$ at $50\%$ efficiency. These sensitivity estimates are an improvement relative to the first observing run and also done with an enlarged set of gravitational-wave transient waveforms. Overall, the best search sensitivity is $h_{\mathrm{rss}}^{50\%}$=$2.7\times10^{-22}$~$\mathrm{Hz^{-1/2}}$ for a millisecond magnetar model. For eccentric compact binary coalescence signals, the search sensitivity reaches $h_{\mathrm{rss}}^{50\%}$=$9.6\times10^{-22}$~$\mathrm{Hz^{-1/2}}$.

Submitted 27 September, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

Journal ref: Phys. Rev. D 99, 104033 (2019)

arXiv:1903.08844 [pdf, other]

doi 10.1103/PhysRevD.100.062001

Directional limits on persistent gravitational waves using data from Advanced LIGO's first two observing runs

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, B. P. Abbott, R. Abbott, T. D. Abbott, S. Abraham, F. Acernese, K. Ackley, C. Adams, R. X. Adhikari, V. B. Adya, C. Affeldt, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, G. Allen, A. Allocca, M. A. Aloy, P. A. Altin, A. Amato, A. Ananyeva, et al. (1110 additional authors not shown)

Abstract: We perform an unmodeled search for persistent, directional gravitational wave (GW) sources using data from the first and second observing runs of Advanced LIGO. We do not find evidence for any GW signals. We place limits on the broadband GW flux emitted at 25~Hz from point sources with a power law spectrum at $F_{α,Θ} <(0.05-25)\times 10^{-8} ~{\rm erg\,cm^{-2}\,s^{-1}\,Hz^{-1}}$ and the (normalized) energy density spectrum in GWs at 25 Hz from extended sources at $Ω_α(Θ) <(0.19-2.89)\times 10^{-8} ~{\rm sr^{-1}}$ where $α$ is the spectral index of the energy density spectrum. These represent improvements of $2.5-3\times$ over previous limits. We also consider point sources emitting GWs at a single frequency, targeting the directions of Sco X-1, SN 1987A, and the Galactic Center. The best upper limits on the strain amplitude of a potential source in these three directions range from $h_0 < (3.6-4.7)\times 10^{-25}$, 1.5$\times$ better than previous limits set with the same analysis method. We also report on a marginally significant outlier at 36.06~Hz. This outlier is not consistent with a persistent gravitational-wave source as its significance diminishes when combining all of the available data.

Submitted 9 September, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

Comments: 15 pages, 5 figures

Report number: LIGO-P1900053

Journal ref: Phys. Rev. D 100, 062001 (2019)

Showing 1–50 of 129 results for author: Bacon, P