Search | arXiv e-print repository

arXiv:1912.05743 [pdf, other]

Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning

Authors: Akanksha Atrey, Kaleigh Clary, David Jensen

Abstract: Saliency maps are frequently used to support explanations of the behavior of deep reinforcement learning (RL) agents. However, a review of how saliency maps are used in practice indicates that the derived explanations are often unfalsifiable and can be highly subjective. We introduce an empirical approach grounded in counterfactual reasoning to test the hypotheses generated from saliency maps and… ▽ More Saliency maps are frequently used to support explanations of the behavior of deep reinforcement learning (RL) agents. However, a review of how saliency maps are used in practice indicates that the derived explanations are often unfalsifiable and can be highly subjective. We introduce an empirical approach grounded in counterfactual reasoning to test the hypotheses generated from saliency maps and assess the degree to which they correspond to the semantics of RL environments. We use Atari games, a common benchmark for deep RL, to evaluate three types of saliency maps. Our results show the extent to which existing claims about Atari games can be evaluated and suggest that saliency maps are best viewed as an exploratory tool rather than an explanatory tool. △ Less

Submitted 20 February, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: Published at ICLR 2020

arXiv:1905.02825 [pdf, other]

Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning

Authors: Emma Tosch, Kaleigh Clary, John Foley, David Jensen

Abstract: Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high quality environments for experimental evaluation of agent behav… ▽ More Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high quality environments for experimental evaluation of agent behavior. We present TOYBOX, a new high-performance, open-source* subset of Atari environments re-designed for the experimental evaluation of deep RL. We show that TOYBOX enables a wide range of experiments and analyses that are impossible in other environments. *https://kdl-umass.github.io/Toybox/ △ Less

Submitted 7 May, 2019; originally announced May 2019.

arXiv:1904.06312 [pdf, other]

Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

Authors: Kaleigh Clary, Emma Tosch, John Foley, David Jensen

Abstract: Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself have led researchers to report the performance of learned agents using aggregate metrics of performance over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variabi… ▽ More Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself have led researchers to report the performance of learned agents using aggregate metrics of performance over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make reporting common summary statistics an unsound metric for performance. Our experiments demonstrate the variability of common agents used in the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution, rather than a point estimate. △ Less

Submitted 12 April, 2019; originally announced April 2019.

Comments: NeurIPS 2018 Critiquing and Correcting Trends Workshop

arXiv:1812.02850 [pdf, other]

ToyBox: Better Atari Environments for Testing Reinforcement Learning Agents

Authors: John Foley, Emma Tosch, Kaleigh Clary, David Jensen

Abstract: It is a widely accepted principle that software without tests has bugs. Testing reinforcement learning agents is especially difficult because of the stochastic nature of both agents and environments, the complexity of state-of-the-art models, and the sequential nature of their predictions. Recently, the Arcade Learning Environment (ALE) has become one of the most widely used benchmark suites for d… ▽ More It is a widely accepted principle that software without tests has bugs. Testing reinforcement learning agents is especially difficult because of the stochastic nature of both agents and environments, the complexity of state-of-the-art models, and the sequential nature of their predictions. Recently, the Arcade Learning Environment (ALE) has become one of the most widely used benchmark suites for deep learning research, and state-of-the-art Reinforcement Learning (RL) agents have been shown to routinely equal or exceed human performance on many ALE tasks. Since ALE is based on emulation of original Atari games, the environment does not provide semantically meaningful representations of internal game state. This means that ALE has limited utility as an environment for supporting testing or model introspection. We propose ToyBox, a collection of reimplementations of these games that solves this critical problem and enables robust testing of RL agents. △ Less

Submitted 25 January, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

Comments: NeurIPS Systems for ML Workshop

Showing 1–4 of 4 results for author: Clary, K