
Enabling Data Dependency-based Query Optimization

Authors: Daniel Lindner, Daniel Ritter, Felix Naumann

Abstract: Data dependency-based query optimization techniques can considerably improve database system performance: we apply three such optimization techniques to five database management systems (DBMSs) and observe throughput improvements between 5% and 33%. We address two key challenges to achieve these results: (i) efficiently identifying and extracting relevant dependencies from the data, and (ii) making use of the dependencies through SQL rewrites or as transformation rules in the optimizer. First, the schema does not provide all relevant dependencies. We present a workload-driven dependency discovery approach to find additional dependencies within milliseconds. Second, the throughput improvement of a state-of-the-art DBMS is 13% using only SQL rewrites, but 20% when we integrate dependency-based optimization into the optimizer and execution engine, e.g., by employing dependency propagation and subquery handling. Using all relevant dependencies, the runtime of four standard benchmarks improves by up to 10% compared to using only primary and foreign keys, and up to 22% compared to not using dependencies. The dependency discovery overhead amortizes after a single workload execution.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2403.13793

Evaluating Frontier Models for Dangerous Capabilities

Authors: Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, et al. (2 additional authors not shown)

Abstract: To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.

Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2310.12921

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Authors: Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner

Abstract: Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we only provide a single sentence text prompt describing the desired task with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve performance by providing a second "baseline" prompt and projecting out parts of the CLIP embedding space irrelevant to distinguish between goal and baseline. Further, we find a strong scaling effect for VLM-RMs: larger VLMs trained with more compute and data are better reward models. The failure modes of VLM-RMs we encountered are all related to known capability limitations of current VLMs, such as limited spatial reasoning ability or visually unrealistic environments that are far off-distribution for the VLM. We find that VLM-RMs are remarkably robust as long as the VLM is large enough. This suggests that future VLMs will become more and more useful reward models for a wide range of RL applications.

Submitted 14 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Presented at International Conference on Learning Representations (ICLR) 2024

arXiv:2308.04332

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback

Authors: Yannick Metz, David Lindner, Raphaël Baur, Daniel Keim, Mennatallah El-Assady

Abstract: To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider human factors involved in providing feedback of different types. However, the systematic study of learning from diverse types of feedback is held back by limited standardized tooling available to researchers. To bridge this gap, we propose RLHF-Blender, a configurable, interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework and implementation that enables researchers to systematically investigate the properties and qualities of human feedback for reward learning. The system facilitates the exploration of various feedback types, including demonstrations, rankings, comparisons, and natural language instructions, as well as studies considering the impact of human factors on their effectiveness. We discuss a set of concrete research opportunities enabled by RLHF-Blender. More information is available at https://rlhfblender.info/.

Submitted 8 August, 2023; originally announced August 2023.

Comments: 14 pages, 3 figures

Journal ref: ICML2023 Interactive Learning from Implicit Human Feedback Workshop

arXiv:2307.15217

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.

Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

arXiv:2305.16147

Learning Safety Constraints from Demonstrations with Unknown Rewards

Authors: David Lindner, Xin Chen, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

Abstract: We propose Convex Constraint Learning for Reinforcement Learning (CoCoRL), a novel approach for inferring shared constraints in a Constrained Markov Decision Process (CMDP) from a set of safe demonstrations with possibly different reward functions. While previous work is limited to demonstrations with known rewards or fully known environment dynamics, CoCoRL can learn constraints from demonstrations with different unknown rewards without knowledge of the environment dynamics. CoCoRL constructs a convex safe set based on demonstrations, which provably guarantees safety even for potentially sub-optimal (but safe) demonstrations. For near-optimal demonstrations, CoCoRL converges to the true safe set with no policy regret. We evaluate CoCoRL in gridworld environments and a driving simulation with multiple constraints. CoCoRL learns constraints that lead to safe driving behavior. Importantly, we can safely transfer the learned constraints to different tasks and environments. In contrast, alternative methods based on Inverse Reinforcement Learning (IRL) often exhibit poor performance and learn unsafe policies.

Submitted 1 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Presented at the International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

arXiv:2301.05062

Tracr: Compiled Transformers as a Laboratory for Interpretability

Authors: David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik

Abstract: We show how to "compile" human-readable programs into standard decoder-only transformer models. Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study "superposition" in transformers that execute multi-step algorithms. Additionally, the known structure of Tracr-compiled models can serve as ground-truth for evaluating interpretability methods. Commonly, because the "programs" learned by transformers are unknown, it is unclear whether an interpretation succeeded. We demonstrate our approach by implementing and examining programs including computing token frequencies, sorting, and parenthesis checking. We provide an open-source implementation of Tracr at https://github.com/google-deepmind/tracr.

Submitted 3 November, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

Comments: Presented at NeurIPS 2023 (Spotlight)

arXiv:2210.04610

Red-Teaming the Stable Diffusion Safety Filter

Authors: Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr

Abstract: Stable Diffusion is a recent open-source image generation model comparable to proprietary models such as DALLE, Imagen, or Parti. Stable Diffusion comes with a safety filter that aims to prevent generating explicit images. Unfortunately, the filter is obfuscated and poorly documented. This makes it hard for users to prevent misuse in their applications, and to understand the filter's limitations and improve it. We first show that it is easy to generate disturbing content that bypasses the safety filter. We then reverse-engineer the filter and find that while it aims to prevent sexual content, it ignores violence, gore, and other similarly disturbing content. Based on our analysis, we argue safety measures in future model releases should strive to be fully open and properly documented to stimulate security contributions from the community.

Submitted 10 November, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: ML Safety Workshop NeurIPS 2022

arXiv:2207.08645

Active Exploration for Inverse Reinforcement Learning

Authors: David Lindner, Andreas Krause, Giorgia Ramponi

Abstract: Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferring a reward function from expert demonstrations. Many IRL algorithms require a known transition model and sometimes even a known expert policy, or they at least require access to a generative model. However, these assumptions are too strong for many real-world applications, where the environment can be accessed only through sequential interaction. We propose a novel IRL algorithm: Active exploration for Inverse Reinforcement Learning (AceIRL), which actively explores an unknown environment and expert policy to quickly learn the expert's reward function and identify a good policy. AceIRL uses previous observations to construct confidence intervals that capture plausible reward functions and find exploration policies that focus on the most informative regions of the environment. AceIRL is the first approach to active IRL with sample-complexity bounds that does not require a generative model of the environment. AceIRL matches the sample complexity of active IRL with a generative model in the worst case. Additionally, we establish a problem-dependent bound that relates the sample complexity of AceIRL to the suboptimality gap of a given IRL problem. We empirically evaluate AceIRL in simulations and find that it significantly outperforms more naive exploration strategies.

Submitted 22 August, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: Presented at Conference on Neural Information Processing Systems (NeurIPS), 2022

arXiv:2206.13316

Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning

Authors: David Lindner, Mennatallah El-Assady

Abstract: Reinforcement learning (RL) commonly assumes access to well-specified reward functions, which many practical applications do not provide. Instead, recently, more work has explored learning what to do from interacting with humans. So far, most of these approaches model humans as being (noisily) rational and, in particular, giving unbiased feedback. We argue that these models are too simplistic and that RL researchers need to develop more realistic human models to design and evaluate their algorithms. In particular, we argue that human models have to be personal, contextual, and dynamic. This paper calls for research from different disciplines to address key questions about how humans provide feedback to AIs and how we can build more robust human-in-the-loop RL systems.

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted to Communication in Human-AI Interaction Workshop (CHAI) at IJCAI-ECAI-22

arXiv:2206.05255

Interactively Learning Preference Constraints in Linear Bandits

Authors: David Lindner, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

Abstract: We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem which we call constrained linear best-arm identification. To solve this problem, we propose the Adaptive Constraint Learning (ACOL) algorithm. We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL's sample complexity matches the lower bound in the worst-case. In the average case, ACOL's sample complexity bound is still significantly tighter than bounds of simpler approaches. In synthetic experiments, ACOL performs on par with an oracle solution and outperforms a range of baselines. As an application, we consider learning constraints to represent human preferences in a driving simulation. ACOL is significantly more sample efficient than alternatives for this application. Further, we find that learning preferences as constraints is more robust to changes in the driving scenario than encoding the preferences directly in the reward function.

Submitted 10 June, 2022; originally announced June 2022.

Comments: Accepted to International Conference on Machine Learning (ICML), 2022

arXiv:2201.09562

doi 10.1016/j.artint.2023.103922

GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems

Authors: Bhavya Sukhija, Matteo Turchetta, David Lindner, Andreas Krause, Sebastian Trimpe, Dominik Baumann

Abstract: Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe.

Submitted 12 June, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

Journal ref: Artificial Intelligence, Volume 320, Year 2023

arXiv:2106.01325

Addressing the Long-term Impact of ML Decisions via Policy Regret

Authors: David Lindner, Hoda Heidari, Andreas Krause

Abstract: Machine Learning (ML) increasingly informs the allocation of opportunities to individuals and communities in areas such as lending, education, employment, and beyond. Such decisions often impact their subjects' future characteristics and capabilities in an a priori unknown fashion. The decision-maker, therefore, faces exploration-exploitation dilemmas akin to those in multi-armed bandits. Following prior work, we model communities as arms. To capture the long-term effects of ML-based allocation decisions, we study a setting in which the reward from each arm evolves every time the decision-maker pulls that arm. We focus on reward functions that are initially increasing in the number of pulls but may become (and remain) decreasing after a certain point. We argue that an acceptable sequential allocation of opportunities must take an arm's potential for growth into account. We capture these considerations through the notion of policy regret, a much stronger notion than the often-studied external regret, and present an algorithm with provably sub-linear policy regret for sufficiently long time horizons. We empirically compare our algorithm with several baselines and find that it consistently outperforms them, in particular for long time horizons.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted to IJCAI 2021

arXiv:2104.03946

Learning What To Do by Simulating the Past

Authors: David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan

Abstract: Since reward functions are hard to specify, recent work has focused on learning policies from human feedback. However, such approaches are impeded by the expense of acquiring such feedback. Recent work proposed that agents have access to a source of information that is effectively free: in any environment that humans have acted in, the state will already be optimized for human preferences, and thus an agent can extract information about what humans want from the state. Such learning is possible in principle, but requires simulating all possible past trajectories that could have led to the observed state. This is feasible in gridworlds, but how do we scale it to complex tasks? In this work, we show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done. The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.

Submitted 3 May, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

Comments: Presented at ICLR 2021

arXiv:2104.00550

Agent-based simulations for protecting nursing homes with prevention and vaccination strategies

Authors: Jana Lasser, Johannes Zuber, Johannes Sorger, Elma Dervic, Katharina Ledebur, Simon David Lindner, Elisabeth Klager, Maria Kletečka-Pulker, Harald Willschke, Katrin Stangl, Sarah Stadtmann, Christian Haslinger, Peter Klimek, Thomas Wochele-Thoma

Abstract: Due to its high lethality amongst the elderly, the safety of nursing homes has been of central importance during the COVID-19 pandemic. With test procedures becoming available at scale, such as antigen or RT-LAMP tests, and increasing availability of vaccinations, nursing homes might be able to safely relax prohibitory measures while controlling the spread of infections (meaning an average of one or fewer secondary infections per index case). Here, we develop a detailed agent-based epidemiological model for the spread of SARS-CoV-2 in nursing homes to identify optimal prevention strategies. The model is microscopically calibrated to high-resolution data from nursing homes in Austria, including detailed social contact networks and information on past outbreaks. We find that the effectiveness of mitigation testing depends critically on the timespan between test and test result, the detection threshold of the viral load for the test to give a positive result, and the screening frequencies of residents and employees. Under realistic conditions and in the absence of an effective vaccine, we find that preventive screening of employees only might be sufficient to control outbreaks in nursing homes, provided that turnover times and detection thresholds of the tests are low enough. If vaccines that are moderately effective against infection and transmission are available, control is achieved if 80% or more of the inhabitants are vaccinated, even if no preventive testing is in place and residents are allowed to have visitors. Since these results strongly depend on vaccine efficacy against infection, retention of testing infrastructures, regular voluntary screening, and sequencing of virus genomes are advised to enable early identification of new variants of concern.

Submitted 14 June, 2021; v1 submitted 16 November, 2020; originally announced April 2021.

Comments: Supplementary material is included in the manuscript PDF

arXiv:2102.12466

Information Directed Reward Learning for Reinforcement Learning

Authors: David Lindner, Matteo Turchetta, Sebastian Tschiatschek, Kamil Ciosek, Andreas Krause

Abstract: For many reinforcement learning (RL) applications, specifying a reward is difficult. This paper considers an RL setting where the agent obtains information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible. To this end, we propose Information Directed Reward Learning (IDRL), which uses a Bayesian model of the reward and selects queries that maximize the information gain about the difference in return between plausibly optimal policies. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. Moreover, it achieves similar or better performance with significantly fewer queries by shifting the focus from reducing the reward approximation error to improving the policy induced by the reward model. We support our findings with extensive evaluations in multiple environments and with different query types.

Submitted 31 January, 2022; v1 submitted 24 February, 2021; originally announced February 2021.

Comments: Presented at Conference on Neural Information Processing Systems (NeurIPS), 2021

arXiv:2101.12509

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Authors: David Lindner, Kyle Matoba, Alexander Meulemans

Abstract: Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended negative side effects, and overall unsafe behavior. To overcome this problem, recent work proposed to augment the specified reward function with an impact regularizer that discourages behavior that has a big impact on the environment. Although initial results with impact regularizers seem promising in mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.

Submitted 23 February, 2021; v1 submitted 29 January, 2021; originally announced January 2021.

Comments: Presented at the SafeAI workshop at AAAI 2021

arXiv:1907.00452

Detecting Spiky Corruption in Markov Decision Processes

Authors: Jason Mancuso, Tomasz Kisielewski, David Lindner, Alok Singh

Abstract: Current reinforcement learning methods fail if the reward function is imperfect, i.e. if the agent observes reward different from what it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP, and introduce an algorithm that is able to detect its corrupt states. We show that this algorithm can be used to learn the optimal policy with any common reinforcement learning algorithm. Finally, we investigate our algorithm in a pair of simple gridworld environments, finding that our algorithm can detect the corrupt states and learn the optimal policy despite the corruption.

Submitted 30 June, 2019; originally announced July 2019.

Comments: Paper accepted to the AI Safety Workshop at IJCAI-19

arXiv:1903.11451

doi 10.1145/3308560.3316706

Sensing Social Media Signals for Cryptocurrency News

Authors: Johannes Beck, Roberta Huang, David Lindner, Tian Guo, Ce Zhang, Dirk Helbing, Nino Antulov-Fantulin

Abstract: The ability to track and monitor relevant and important news in real-time is of crucial interest in multiple industrial sectors. In this work, we focus on the set of cryptocurrency news, which has recently attracted growing interest from both general and financial audiences. In order to track relevant news in real-time, we (i) match news from the web with tweets from social media, (ii) track their intraday tweet activity and (iii) explore different machine learning models for predicting the number of article mentions on Twitter within the first 24 hours after its publication. We compare several machine learning models, such as linear extrapolation, linear and random forest autoregressive models, and a sequence-to-sequence neural network. We find that the random forest autoregressive model behaves comparably to more complex models in the majority of tasks.

Submitted 27 March, 2019; originally announced March 2019.

Comments: Full version of the paper accepted at the ACM WWW '19 Conference, MSM'19 Workshop

Showing 1–19 of 19 results for author: Lindner, D