Search | arXiv e-print repository

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.

Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

arXiv:2207.00938

doi 10.1109/TPAMI.2022.3225162

Interpretable by Design: Learning Predictors by Composing Interpretable Queries

Authors: Aditya Chattopadhyay, Stewart Slocum, Benjamin D. Haeffele, Rene Vidal, Donald Geman

Abstract: There is a growing concern about typically opaque decision-making with high-performance machine learning algorithms. Providing an explanation of the reasoning process in domain-specific terms can be crucial for adoption in risk-sensitive domains such as healthcare. We argue that machine learning algorithms should be interpretable by design and that the language in which these interpretations are expressed should be domain- and task-dependent. Consequently, we base our model's prediction on a family of user-defined and task-specific binary functions of the data, each having a clear interpretation to the end-user. We then minimize the expected number of queries needed for accurate prediction on any given input. As the solution is generally intractable, following prior work, we choose the queries sequentially based on information gain. However, in contrast to previous work, we need not assume the queries are conditionally independent. Instead, we leverage a stochastic generative model (VAE) and an MCMC algorithm (Unadjusted Langevin) to select the most informative query about the input based on previous query-answers. This enables the online determination of a query chain of whatever depth is required to resolve prediction ambiguities. Finally, experiments on vision and NLP tasks demonstrate the efficacy of our approach and its superiority over post-hoc explanations.

Submitted 25 November, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

Comments: 29 pages, 14 figures. Accepted as a Regular Paper in Transactions on Pattern Analysis and Machine Intelligence

arXiv:2205.09133

doi 10.3847/1538-4357/ac649f

Disks in Nearby Young Stellar Associations Found Via Virtual Reality

Authors: Susan Higashio, Marc J. Kuchner, Steven M. Silverberg, Matthew A. Brandt, Thomas G. Grubb, Jonathan Gagné, John H. Debes, Joshua Schlieder, John P. Wisniewski, Stewart Slocum, Alissa S. Bans, Shambo Bhattacharjee, Joseph R. Biggs, Milton K. D. Bosch, Tadeas Cernohous, Katharina Doll, Hugo A. Durantini Luca, Alexandru Enachioaie, Phillip Griffith Sr., Joshua Hamilton, Jonathan Holden, Michiharu Hyogo, Dawoon Jung, Lily Lau, Fernanda Piñiero, Art Piipuu, et al. (2 additional authors not shown)

Abstract: The Disk Detective citizen science project recently released a new catalog of disk candidates found by visual inspection of images from NASA's Wide-Field Infrared Survey Explorer (WISE) mission and other surveys. We applied this new catalog of well-vetted disk candidates to search for new members of nearby young stellar associations (YSAs) using a novel technique based on Gaia data and virtual reality (VR). We examined AB Doradus, Argus, $β$ Pictoris, Carina, Columba, Octans-Near, Tucana-Horologium, and TW Hya by displaying them in VR together with other nearby stars, color-coded to show infrared excesses found via Disk Detective. Using this method allows us to find new association members in mass regimes where isochrones are degenerate. We propose ten new YSA members with infrared excesses: three of AB Doradus (HD 44775, HD 40540 and HD 44510), one of $β$ Pictoris (HD 198472), two of Octans-Near (HD 157165 and BD+35 2953), and four disk-hosting members of a combined population of Carina, Columba and Tucana-Horologium: CPD-57 937, HD 274311, HD 41992, and WISEA J092521.90-673224.8. This last object (J0925) appears to be an extreme debris disk with a fractional infrared luminosity of $3.7 \times 10^{-2}$. We also propose two new members of AB Doradus that do not show infrared excesses: TYC 6518-1857-1 and CPD-25 1292. We find HD 15115 appears to be a member of Tucana-Horologium rather than $β$ Pictoris. We advocate for membership in Columba-Carina of HD 30447, CPD-35 525, and HD 35841. Finally, we propose that three M dwarfs, previously considered members of Tuc-Hor, are better considered a separate association, tentatively called "Smethells 165".

Submitted 18 May, 2022; originally announced May 2022.

Comments: 26 pages; 17 figures, 3 tables. Accepted for publication in AAS Journals

arXiv:2112.01432

doi 10.3847/1538-4357/ac674c

Supernovae Shock Breakout/Emergence Detection Predictions for a Wide-Field X-ray Survey

Authors: Amanda J. Bayless, Chris Fryer, Peter J. Brown, Patrick Young, Pete Roming, Michael Davis, Thomas Lechner, Samuel Slocum, Janie D. Echon, Cynthia Froning

Abstract: There are currently many large-field surveys operational and planned including the powerful Vera C. Rubin Observatory Legacy Survey of Space and Time. These surveys will increase the number and diversity of transients dramatically. However, for some transients, like supernovae (SNe), we can gain more understanding by directed observations (e.g. shock breakout, $γ$-ray detections) than by simply increasing the sample size. For example, the initial emission from these transients can be a powerful probe of these explosions. Upcoming ground-based detectors are not ideally suited to observe the initial emission (shock emergence) of these transients. These observations require a large field-of-view X-ray mission with a UV follow up within the first hour of shock breakout. The emission in the first one hour to even one day provides strong constraints on the stellar radius and asymmetries in the outer layers of stars, the properties of the circumstellar medium (e.g. inhomogeneities in the wind for core-collapse SNe, accreting companion in thermonuclear SNe), and the transition region between these two. This paper describes a simulation for the number of SNe that could be seen by a large field of view lobster eye X-ray and UV observatory.

Submitted 28 April, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

Comments: 13 pages, 7 figures, submitted to ApJ

Report number: LA-UR-21-31498

arXiv:2010.02141

AdaLead: A simple and robust adaptive greedy search algorithm for sequence design

Authors: Sam Sinai, Richard Wang, Alexander Whatley, Stewart Slocum, Elina Locane, Eric D. Kelsic

Abstract: Efficient design of biological sequences will have a great impact across many industrial and healthcare domains. However, discovering improved sequences requires solving a difficult optimization problem. Traditionally, this challenge was approached by biologists through a model-free method known as "directed evolution", the iterative process of random mutation and selection. As the ability to build models that capture the sequence-to-function map improves, such models can be used as oracles to screen sequences before running experiments. In recent years, interest in better algorithms that effectively use such oracles to outperform model-free approaches has intensified. These span from approaches based on Bayesian Optimization, to regularized generative models and adaptations of reinforcement learning. In this work, we implement an open-source Fitness Landscape EXploration Sandbox (FLEXS: github.com/samsinai/FLEXS) environment to test and evaluate these algorithms based on their optimality, consistency, and robustness. Using FLEXS, we develop an easy-to-implement, scalable, and robust evolutionary greedy algorithm (AdaLead). Despite its simplicity, we show that AdaLead is a remarkably strong benchmark that out-competes more complex state of the art approaches in a variety of biologically motivated sequence design challenges.

Submitted 5 October, 2020; originally announced October 2020.

Showing 1–5 of 5 results for author: Slocum, S