-
Ethical and social risks of harm from Language Models
Authors:
Laura Weidinger,
John Mellor,
Maribeth Rauh,
Conor Griffin,
Jonathan Uesato,
Po-Sen Huang,
Myra Cheng,
Mia Glaese,
Borja Balle,
Atoosa Kasirzadeh,
Zac Kenton,
Sasha Brown,
Will Hawkins,
Tom Stepleton,
Courtney Biles,
Abeba Birhane,
Julia Haas,
Laura Rimell,
Lisa Anne Hendricks,
William Isaac,
Sean Legassick,
Geoffrey Irving,
Iason Gabriel
Abstract:
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguist…
▽ More
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences.
We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, V. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities.
In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.
△ Less
Submitted 8 December, 2021;
originally announced December 2021.
-
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Authors:
Thomas Mesnard,
Théophane Weber,
Fabio Viola,
Shantanu Thakoor,
Alaa Saade,
Anna Harutyunyan,
Will Dabney,
Tom Stepleton,
Nicolas Heess,
Arthur Guez,
Éric Moulines,
Marcus Hutter,
Lars Buesing,
Rémi Munos
Abstract:
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to…
▽ More
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to condition value functions on future events, by learning to extract relevant information from a trajectory. We formulate a family of policy gradient algorithms that use these future-conditional value functions as baselines or critics, and show that they are provably low variance. To avoid the potential bias from conditioning on future information, we constrain the hindsight information to not contain information about the agent's actions. We demonstrate the efficacy and validity of our algorithm on a number of illustrative and challenging problems.
△ Less
Submitted 14 December, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
-
Wasserstein Fair Classification
Authors:
Ray Jiang,
Aldo Pacchiano,
Tom Stepleton,
Heinrich Jiang,
Silvia Chiappa
Abstract:
We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances. The approach has desirable theoretical properties and is robust to specific choices of the threshold used to obtain class predictions from model outputs. We introduce different methods that enable hiding sensitive information at te…
▽ More
We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances. The approach has desirable theoretical properties and is robust to specific choices of the threshold used to obtain class predictions from model outputs. We introduce different methods that enable hiding sensitive information at test time or have a simple and fast implementation. We show empirical performance against different fairness baselines on several benchmark fairness datasets.
△ Less
Submitted 28 July, 2019;
originally announced July 2019.
-
Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
Authors:
Thomas Stepleton,
Razvan Pascanu,
Will Dabney,
Siddhant M. Jayakumar,
Hubert Soyer,
Remi Munos
Abstract:
Reinforcement learning (RL) agents performing complex tasks must be able to remember observations and actions across sizable time intervals. This is especially true during the initial learning stages, when exploratory behaviour can increase the delay between specific actions and their effects. Many new or popular approaches for learning these distant correlations employ backpropagation through tim…
▽ More
Reinforcement learning (RL) agents performing complex tasks must be able to remember observations and actions across sizable time intervals. This is especially true during the initial learning stages, when exploratory behaviour can increase the delay between specific actions and their effects. Many new or popular approaches for learning these distant correlations employ backpropagation through time (BPTT), but this technique requires storing observation traces long enough to span the interval between cause and effect. Besides memory demands, learning dynamics like vanishing gradients and slow convergence due to infrequent weight updates can reduce BPTT's practicality; meanwhile, although online recurrent network learning is a develo** topic, most approaches are not efficient enough to use as replacements. We propose a simple, effective memory strategy that can extend the window over which BPTT can learn without requiring longer traces. We explore this approach empirically on a few tasks and discuss its implications.
△ Less
Submitted 13 May, 2018;
originally announced May 2018.
-
Safe and Efficient Off-Policy Reinforcement Learning
Authors:
Rémi Munos,
Tom Stepleton,
Anna Harutyunyan,
Marc G. Bellemare
Abstract:
In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace($λ$), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of "off-policyness"; and (3) it is efficient as it makes the be…
▽ More
In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace($λ$), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of "off-policyness"; and (3) it is efficient as it makes the best use of samples collected from near on-policy behaviour policies. We analyze the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. We believe this is the first return-based off-policy control algorithm converging a.s. to $Q^*$ without the GLIE assumption (Greedy in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins' Q($λ$), which was an open problem since 1989. We illustrate the benefits of Retrace($λ$) on a standard suite of Atari 2600 games.
△ Less
Submitted 7 November, 2016; v1 submitted 8 June, 2016;
originally announced June 2016.
-
Q($λ$) with Off-Policy Corrections
Authors:
Anna Harutyunyan,
Marc G. Bellemare,
Tom Stepleton,
Remi Munos
Abstract:
We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence both in policy evaluation and control, provided cer…
▽ More
We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence both in policy evaluation and control, provided certain conditions. These conditions relate the distance between the target and behavior policies, the eligibility trace parameter and the discount factor, and formalize an underlying tradeoff in off-policy TD($λ$). We illustrate this theoretical relationship empirically on a continuous-state control task.
△ Less
Submitted 11 August, 2016; v1 submitted 16 February, 2016;
originally announced February 2016.