
Showing 1–50 of 78 results for author: Littman, M

  1. arXiv:2407.07333  [pdf, other]

    cs.LG cs.AI stat.ML

    Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

    Authors: Cameron Allen, Aaron Kirtland, Ruo Yu Tao, Sam Lobel, Daniel Scott, Nicholas Petrocelli, Omer Gottesman, Ronald Parr, Michael L. Littman, George Konidaris

    Abstract: Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, wit…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: GitHub URL: https://github.com/brownirl/lambda_discrepancy

  2. arXiv:2407.03321  [pdf, other]

    cs.CL cs.AI cs.LG

    Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

    Authors: Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael L. Littman, Stephen H. Bach

    Abstract: Many recent works have explored using language models for planning problems. One line of research focuses on translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). While this approach is promising, accurately measuring the quality of generated PDDL code continues to pose significant challenges. First,…

    Submitted 3 July, 2024; originally announced July 2024.

  3. Computably Continuous Reinforcement-Learning Objectives are PAC-learnable

    Authors: Cambridge Yang, Michael Littman, Michael Carbin

    Abstract: In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable: There are algorithms that learn a near-optimal policy with high probability using a finite amount of samples and computation. In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewa…

    Submitted 19 March, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  4. A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

    Authors: Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, Ese Ben-Iwhiwhu, Andrew P. Brna, Ethan Brooks, Ryan C. Brown, Zachary Daniels, Anurag Daram, Fabien Delattre, Ryan Dellana, Eric Eaton, Haotian Fu, Kristen Grauman, Jesse Hostetler, Shariq Iqbal, Cassandra Kent, Nicholas Ketz, Soheil Kolouri, George Konidaris, Dhireesha Kudithipudi, Erik Learned-Miller, Seungwon Lee, et al. (22 additional authors not shown)

    Abstract: Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through th…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: To appear in Neural Networks

  5. arXiv:2212.03733  [pdf, other]

    cs.LG cs.AI

    Tiered Reward Functions: Specifying and Fast Learning of Desired Behavior

    Authors: Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael L. Littman

    Abstract: Reinforcement-learning agents seek to maximize a reward signal through environmental interactions. As humans, our job in the learning process is to design reward functions to express desired behavior and enable the agent to learn such behavior swiftly. In this work, we consider the reward-design problem in tasks formulated as reaching desirable states and avoiding undesirable states. To start, we…

    Submitted 15 February, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: For code, see https://github.com/zhouzypaul/tiered-reward

  6. arXiv:2211.14673  [pdf, other]

    cs.AI

    Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex

    Authors: Charles Lovering, Jessica Zosa Forde, George Konidaris, Ellie Pavlick, Michael L. Littman

    Abstract: AlphaZero, an approach to reinforcement learning that couples neural networks and Monte Carlo tree search (MCTS), has produced state-of-the-art strategies for traditional board games like chess, Go, shogi, and Hex. While researchers and game commentators have suggested that AlphaZero uses concepts that humans consider important, it is unclear how these concepts are captured in the network. We inve…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: 10 pages, Neural Information Processing Systems 2022

  7. arXiv:2211.03281  [pdf, other]

    cs.LG cs.AI

    Reward-Predictive Clustering

    Authors: Lucas Lehnert, Michael J. Frank, Michael L. Littman

    Abstract: Recent advances in reinforcement-learning research have demonstrated impressive results in building algorithms that can out-perform humans in complex tasks. Nevertheless, creating reinforcement-learning systems that can build abstractions of their experience to accelerate learning in new contexts still remains an active area of research. Previous work showed that reward-predictive state abstractio…

    Submitted 6 November, 2022; originally announced November 2022.

  8. arXiv:2210.15767  [pdf]

    cs.AI

    Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report

    Authors: Michael L. Littman, Ifeoma Ajunwa, Guy Berger, Craig Boutilier, Morgan Currie, Finale Doshi-Velez, Gillian Hadfield, Michael C. Horowitz, Charles Isbell, Hiroaki Kitano, Karen Levy, Terah Lyons, Melanie Mitchell, Julie Shah, Steven Sloman, Shannon Vallor, Toby Walsh

    Abstract: In September 2021, the "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the second report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Michael Littman of Brown University. The report, entitled "Gathering Strengt…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 82 pages, https://ai100.stanford.edu/gathering-strength-gathering-storms-one-hundred-year-study-artificial-intelligence-ai100-2021-study

  9. arXiv:2210.11579  [pdf, other]

    cs.LG

    Model-based Lifelong Reinforcement Learning with Bayesian Exploration

    Authors: Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris

    Abstract: We propose a model-based lifelong reinforcement-learning approach that estimates a hierarchical Bayesian posterior distilling the common structure shared across different tasks. The learned posterior combined with a sample-based Bayesian exploration procedure increases the sample efficiency of learning across a family of related tasks. We first derive an analysis of the relationship between the sa…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  10. arXiv:2206.03597  [pdf, other]

    cs.LG cs.AI

    Meta-Learning Parameterized Skills

    Authors: Haotian Fu, Shangqun Yu, Saket Tiwari, Michael Littman, George Konidaris

    Abstract: We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. We propose to leverage off-policy Meta-RL combined with a trajectory-centric smoothness term to learn a set of parameterized skills. Our agent can use these learned skills to construct a th…

    Submitted 19 July, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

  11. arXiv:2205.15400  [pdf, other]

    cs.LG cs.AI

    Designing Rewards for Fast Learning

    Authors: Henry Sowerby, Zhiyuan Zhou, Michael L. Littman

    Abstract: To convey desired behavior to a Reinforcement Learning (RL) agent, a designer must choose a reward function for the environment, arguably the most important knob designers have in interacting with RL agents. Although many reward functions induce the same optimal behavior (Ng et al., 1999), in practice, some of them result in faster learning than others. In this paper, we look at how reward-design…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: To appear at the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM2022)

  12. arXiv:2203.10614  [pdf, other]

    cs.LG

    Does DQN really learn? Exploring adversarial training schemes in Pong

    Authors: Bowen He, Sreehari Rammohan, Jessica Forde, Michael Littman

    Abstract: In this work, we study two self-play training schemes, Chainer and Pool, and show they lead to improved agent performance in Atari Pong compared to a standard DQN agent trained against the built-in Atari opponent. To measure agent performance, we define a robustness metric that captures how difficult it is to learn a strategy that beats the agent's learned policy. Through playing past versions…

    Submitted 20 March, 2022; originally announced March 2022.

    Comments: RLDM 2022

  13. arXiv:2112.05848  [pdf, other]

    cs.LG cs.AI

    Faster Deep Reinforcement Learning with Slower Online Network

    Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael L. Littman, Alexander J. Smola

    Abstract: Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with u…

    Submitted 17 April, 2023; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  14. arXiv:2112.05218  [pdf, other]

    cs.AI

    Learning Generalizable Behavior via Visual Rewrite Rules

    Authors: Yiheng Xie, Mingxuan Li, Shangqun Yu, Michael Littman

    Abstract: Though deep reinforcement learning agents have achieved unprecedented success in recent years, their learned policies can be brittle, failing to generalize to even slight modifications of their environments or unfamiliar situations. The black-box nature of the neural network learning dynamics makes it impossible to audit trained deep agents and recover from such failures. In this paper, we propose…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: AAAI 2022 Workshop on Reinforcement Learning in Games

  15. arXiv:2111.12679  [pdf, other]

    cs.AI cs.FL cs.LG

    On the (In)Tractability of Reinforcement Learning for LTL Objectives

    Authors: Cambridge Yang, Michael Littman, Michael Carbin

    Abstract: In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved. Previous studies have alluded to this fact but have not examined it in depth. In this paper, we address the tract…

    Submitted 24 June, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

  16. arXiv:2111.04147  [pdf, other]

    cs.AI cs.FL

    Learning Finite Linear Temporal Logic Specifications with a Specialized Neural Operator

    Authors: Homer Walke, Daniel Ritter, Carl Trimbach, Michael Littman

    Abstract: Finite linear temporal logic ($\mathsf{LTL}_f$) is a powerful formal representation for modeling temporal sequences. We address the problem of learning a compact $\mathsf{LTL}_f$ formula from labeled traces of system behavior. We propose a novel neural network operator and evaluate the resulting architecture, Neural$\mathsf{LTL}_f$. Our approach includes a specialized recurrent filter, designed to…

    Submitted 21 November, 2021; v1 submitted 7 November, 2021; originally announced November 2021.

    Comments: 10 pages, 5 figures

  17. arXiv:2111.00876  [pdf, other]

    cs.LG cs.AI

    On the Expressivity of Markov Reward

    Authors: David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

    Abstract: Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajector…

    Submitted 18 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted to NeurIPS 2021

  18. arXiv:2110.12276  [pdf, other]

    cs.LG

    Coarse-Grained Smoothness for RL in Metric Spaces

    Authors: Omer Gottesman, Kavosh Asadi, Cameron Allen, Sam Lobel, George Konidaris, Michael Littman

    Abstract: Principled decision-making in continuous state-action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical domains. We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows…

    Submitted 23 October, 2021; originally announced October 2021.

  19. arXiv:2110.03424  [pdf, other]

    cs.LG cs.AI

    Bad-Policy Density: A Measure of Reinforcement Learning Hardness

    Authors: David Abel, Cameron Allen, Dilip Arumugam, D. Ellis Hershkowitz, Michael L. Littman, Lawson L. S. Wong

    Abstract: Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment, but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density. This quantity measures the fraction of the deterministic stationary policy space that is below a desired thresh…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Presented at the 2021 ICML Workshop on Reinforcement Learning Theory

  20. arXiv:2109.07054  [pdf, other]

    cs.LG cs.AI cs.DS cs.HC

    Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback

    Authors: Ishaan Shah, David Halpern, Kavosh Asadi, Michael L. Littman

    Abstract: Fluid human-agent communication is essential for the future of human-in-the-loop reinforcement learning. An agent must respond appropriately to feedback from its human trainer even before they have significant experience working together. Therefore, it is important that learning agents respond well to various feedback schemes human trainers are likely to provide. This work analyzes the COnvergent…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted into ICML 2021 workshops Human-AI Collaboration in Sequential Decision-Making and Human in the Loop Learning

  21. arXiv:2106.05506  [pdf, other]

    cs.AI cs.LG

    Brittle AI, Causal Confusion, and Bad Mental Models: Challenges and Successes in the XAI Program

    Authors: Jeff Druce, James Niehaus, Vanessa Moody, David Jensen, Michael L. Littman

    Abstract: The advances in artificial intelligence enabled by deep learning architectures are undeniable. In several cases, deep neural network driven models have surpassed human level performance in benchmark autonomy tasks. The underlying policies for these agents, however, are not easily interpretable. In fact, given their underlying deep models, it is impossible to directly understand the mapping from ob…

    Submitted 10 June, 2021; originally announced June 2021.

  22. People construct simplified mental representations to plan

    Authors: Mark K. Ho, David Abel, Carlos G. Correa, Michael L. Littman, Jonathan D. Cohen, Thomas L. Griffiths

    Abstract: One of the most striking features of human cognition is the capacity to plan. Two aspects of human planning stand out: its efficiency and flexibility. Efficiency is especially impressive because plans must often be made in complex environments, and yet people successfully plan solutions to myriad everyday problems despite having limited cognitive resources. Standard accounts in psychology, economi…

    Submitted 26 November, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: 56 pages, 5 main figures, 10 extended data figures, supplementary information is included in ancillary files

    Journal ref: Nature, 606(7912), 129-136 (2022)

  23. arXiv:2104.00606  [pdf, other]

    cs.LG cs.AI cs.CY

    Model Selection's Disparate Impact in Real-World Deep Learning Applications

    Authors: Jessica Zosa Forde, A. Feder Cooper, Kweku Kwegyir-Aggrey, Chris De Sa, Michael Littman

    Abstract: Algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate fairness in other stages in the ML pipeline. We contend that one source of such bias, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups. Using a deep…

    Submitted 7 September, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Science and Engineering of Deep Learning Workshop, ICLR 2021

  24. arXiv:2010.08869  [pdf, other]

    cs.AI

    Task Scoping: Generating Task-Specific Abstractions for Planning in Open-Scope Models

    Authors: Michael Fishman, Nishanth Kumar, Cameron Allen, Natasha Danas, Michael Littman, Stefanie Tellex, George Konidaris

    Abstract: A general-purpose planning agent requires an open-scope world model: one rich enough to tackle any of the wide range of tasks it may be asked to solve over its operational lifetime. This stands in contrast with typical planning approaches, where the scope of a model is limited to a specific family of tasks that share significant structure. Unfortunately, planning to solve any specific task using a…

    Submitted 4 February, 2023; v1 submitted 17 October, 2020; originally announced October 2020.

  25. arXiv:2008.03229  [pdf, other]

    cs.AI cs.LG

    Towards Sample Efficient Agents through Algorithmic Alignment

    Authors: Mingxuan Li, Michael L. Littman

    Abstract: In this work, we propose and explore Deep Graph Value Network (DeepGV) as a promising method to work around sample complexity in deep reinforcement-learning agents using a message-passing mechanism. The main idea is that the agent should be guided by structured non-neural-network algorithms like dynamic programming. According to recent advances in algorithmic alignment, neural networks with struct…

    Submitted 21 October, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

  26. arXiv:2002.05769  [pdf, other]

    cs.AI

    The Efficiency of Human Cognition Reflects Planned Information Processing

    Authors: Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths

    Abstract: Planning is useful. It lets people take actions that have desirable long-term consequences. But, planning is hard. It requires thinking about consequences, which consumes limited computational and cognitive resources. Thus, people should plan their actions, but they should also be smart about how they deploy resources used for planning their actions. Put another way, people should also "plan their…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: 13 pg (incl. supplemental materials); included in Proceedings of the 34th AAAI Conference on Artificial Intelligence

  27. arXiv:2002.05518  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning State Abstractions for Transfer in Continuous Control

    Authors: Kavosh Asadi, David Abel, Michael L. Littman

    Abstract: Can simple algorithms with a good representation solve challenging reinforcement learning problems? In this work, we answer this question in the affirmative, where we take "simple learning algorithm" to be tabular Q-Learning, the "good representations" to be a learned state abstraction, and "challenging problems" to be continuous control tasks. Our main contribution is a learning algorithm that ab…

    Submitted 8 February, 2020; originally announced February 2020.

  28. arXiv:2002.01883  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Radial-Basis Value Functions for Continuous Control

    Authors: Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman

    Abstract: A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the max…

    Submitted 13 March, 2021; v1 submitted 5 February, 2020; originally announced February 2020.

    Comments: In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)

  29. arXiv:2001.05411  [pdf, other]

    cs.LG cs.AI stat.ML

    Lipschitz Lifelong Reinforcement Learning

    Authors: Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman

    Abstract: We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value-transfe…

    Submitted 22 March, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: In proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), 21 pages, 11 figures

  30. arXiv:1912.03606  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Individual predictions matter: Assessing the effect of data ordering in training fine-tuned CNNs for medical imaging

    Authors: John R. Zech, Jessica Zosa Forde, Michael L. Littman

    Abstract: We reproduced the results of CheXNet with fixed hyperparameters and 50 different random seeds to identify 14 findings in chest radiographs (x-rays). Because CheXNet fine-tunes a pre-trained DenseNet, the random seed affects the ordering of the batches of training data but not the initialized model weights. We found substantial variability in predictions for the same radiograph across model runs (me…

    Submitted 7 December, 2019; originally announced December 2019.

    Comments: J.Z. and J.F. contributed equally to this work

  31. arXiv:1908.08641  [pdf, other]

    cs.HC cs.AI cs.GT

    Stackelberg Punishment and Bully-Proofing Autonomous Vehicles

    Authors: Matt Cooper, Jun Ki Lee, Jacob Beck, Joshua D. Fishman, Michael Gillett, Zoë Papakipos, Aaron Zhang, Jerome Ramos, Aansh Shah, Michael L. Littman

    Abstract: Mutually beneficial behavior in repeated games can be enforced via the threat of punishment, as enshrined in game theory's well-known "folk theorem." There is a cost, however, to a player for generating these disincentives. In this work, we seek to minimize this cost by computing a "Stackelberg punishment," in which the player selects a behavior that sufficiently punishes the other player while ma…

    Submitted 22 August, 2019; originally announced August 2019.

    Comments: 10 pages, The 11th International Conference on Social Robotics

  32. arXiv:1907.08478  [pdf, other]

    cs.AI cs.HC

    Interactive Learning of Environment Dynamics for Sequential Tasks

    Authors: Robert Loftin, Bei Peng, Matthew E. Taylor, Michael L. Littman, David L. Roberts

    Abstract: In order for robots and other artificial agents to efficiently learn to perform useful tasks defined by an end user, they must understand not only the goals of those tasks, but also the structure and dynamics of that user's environment. While existing work has looked at how the goals of a task can be inferred from a human teacher, the agent is often left to learn about the environment on its own.…

    Submitted 19 July, 2019; originally announced July 2019.

  33. arXiv:1905.13320  [pdf, other]

    cs.LG cs.AI stat.ML

    Combating the Compounding-Error Problem with a Multi-step Model

    Authors: Kavosh Asadi, Dipendra Misra, Seungchan Kim, Michael L. Littman

    Abstract: Model-based reinforcement learning is an appealing framework for creating agents that learn, plan, and act in sequential environments. Model-based algorithms typically involve learning a transition model that takes a state and an action and outputs the next state (a one-step model). This model can be composed with itself to enable predicting multiple steps into the future, but one-step prediction…

    Submitted 30 May, 2019; originally announced May 2019.

  34. arXiv:1903.06209  [pdf, other]

    cs.LG stat.ML

    Teaching with IMPACT

    Authors: Carl Trimbach, Michael Littman

    Abstract: Like many problems in AI in their general form, supervised learning is computationally intractable. We hypothesize that an important reason humans can learn highly complex and varied concepts, in spite of the computational difficulty, is that they benefit tremendously from experienced and insightful teachers. This paper proposes a new learning framework that provides a role for a knowledgeable, be…

    Submitted 14 March, 2019; originally announced March 2019.

  35. arXiv:1902.04257  [pdf, other]

    cs.LG stat.ML

    Deep Reinforcement Learning from Policy-Dependent Human Feedback

    Authors: Dilip Arumugam, Jun Ki Lee, Sophie Saskin, Michael L. Littman

    Abstract: To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a…

    Submitted 12 February, 2019; originally announced February 2019.

  36. arXiv:1901.11437  [pdf, ps, other]

    cs.LG cs.AI

    Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning

    Authors: Lucas Lehnert, Michael L. Littman

    Abstract: A key question in reinforcement learning is how an intelligent agent can generalize knowledge across different inputs. By generalizing across different inputs, information learned for one input can be immediately reused for improving predictions for another input. Reusing information allows an agent to compute an optimal decision-making strategy using less data. State representation is a key eleme…

    Submitted 4 October, 2020; v1 submitted 31 January, 2019; originally announced January 2019.

  37. arXiv:1901.06085  [pdf, other]

    cs.AI cs.MA

    Theory of Minds: Understanding Behavior in Groups Through Inverse Planning

    Authors: Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum

    Abstract: Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and flexibly, often making rapid generalizations about…

    Submitted 17 January, 2019; originally announced January 2019.

    Comments: published in AAAI 2019; Michael Shum and Max Kleiman-Weiner contributed equally

  38. arXiv:1901.05101  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback

    Authors: Jacob Beck, Zoe Papakipos, Michael Littman

    Abstract: In autonomous vehicle (AV) control, allowing mistakes can be quite dangerous and costly in the real world. For this reason we investigate methods of training an AV without allowing the agent to explore and instead having a human explorer collect the data. Supervised learning has been explored for AV control, but it encounters the issue of the covariate shift. That is, training data collected from…

    Submitted 15 January, 2019; originally announced January 2019.

  39. arXiv:1812.02868  [pdf, other]

    cs.LG cs.AI stat.ML

    Measuring and Characterizing Generalization in Deep Reinforcement Learning

    Authors: Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, Michael Littman, David Jensen

    Abstract: Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-…

    Submitted 11 December, 2018; v1 submitted 6 December, 2018; originally announced December 2018.

  40. arXiv:1812.01129  [pdf, other]

    cs.LG cs.AI

    Mitigating Planner Overfitting in Model-Based Reinforcement Learning

    Authors: Dilip Arumugam, David Abel, Kavosh Asadi, Nakul Gopalan, Christopher Grimm, Jun Ki Lee, Lucas Lehnert, Michael L. Littman

    Abstract: An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model. Alternatively, it can take a more conservative stance and eschew its model in favor of optimizing its behavior solely via real-world interaction. This latter approach can be exceedingly slo…

    Submitted 19 March, 2020; v1 submitted 3 December, 2018; originally announced December 2018.

  41. arXiv:1811.00128  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards a Simple Approach to Multi-step Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman

    Abstract: When environmental interaction is expensive, model-based reinforcement learning offers a solution by planning ahead and avoiding costly mistakes. Model-based agents typically learn a single-step transition model. In this paper, we propose a multi-step model that predicts the outcome of an action sequence with variable length. We show that this model is easy to learn, and that the model can make po…

    Submitted 31 October, 2018; originally announced November 2018.

  42. arXiv:1810.07311  [pdf, other]

    cs.AI

    Finding Options that Minimize Planning Time

    Authors: Yuu Jinnai, David Abel, D Ellis Hershkowitz, Michael Littman, George Konidaris

    Abstract: We formalize the problem of selecting the optimal set of options for planning as that of computing the smallest set of options so that planning converges in fewer than a given maximum of value-iteration passes. We first show that the problem is NP-hard, even if the task is constrained to be deterministic. This is the first such complexity result for option discovery. We then present the first polynomial-t…

    Submitted 16 March, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

  43. arXiv:1809.10025

    cs.CY cs.LG stat.ML

    Personalized Education at Scale

    Authors: Sam Saarinen, Evan Cater, Michael Littman

    Abstract: Tailoring the presentation of information to the needs of individual students leads to massive gains in student outcomes [bloom19842]. This finding is likely due to the fact that different students learn differently, perhaps as a result of variation in ability, interest or other factors [schiefele1992interest]. Adapting presentations to the educational needs of an individual has traditio…

    Submitted 24 September, 2018; originally announced September 2018.

  44. arXiv:1807.01736

    cs.LG cs.AI stat.ML

    Transfer with Model Features in Reinforcement Learning

    Authors: Lucas Lehnert, Michael L. Littman

    Abstract: A key question in Reinforcement Learning is which representation an agent can learn to efficiently reuse knowledge between different tasks. Recently the Successor Representation was shown to have empirical benefits for transferring knowledge between tasks with shared transition dynamics. This paper presents Model Features: a feature representation that clusters behaviourally equivalent states and…

    Submitted 4 July, 2018; originally announced July 2018.

  45. arXiv:1806.01265

    cs.LG cs.AI stat.ML

    Equivalence Between Wasserstein and Value-Aware Loss for Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman

    Abstract: Learning a generative model is a key component of model-based reinforcement learning. Though learning a good model in the tabular setting is a simple task, learning a useful model in the approximate setting is challenging. In this context, an important question is the loss function used for model learning, as varying the loss function can have a remarkable impact on the effectiveness of planning. Recen…

    Submitted 8 July, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: Accepted at the FAIM workshop "Prediction and Generative Modeling in Reinforcement Learning", Stockholm, Sweden, 2018

  46. arXiv:1804.07193

    cs.LG cs.AI stat.ML

    Lipschitz Continuity in Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Dipendra Misra, Michael L. Littman

    Abstract: We examine the impact of learning Lipschitz continuous models in the context of model-based reinforcement learning. We provide a novel bound on multi-step prediction error of Lipschitz models where we quantify the error using the Wasserstein metric. We go on to prove an error bound for the value-function estimate arising from Lipschitz models and show that the estimated value function is itself Li…

    Submitted 27 July, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

    Comments: Accepted for the 35th International Conference on Machine Learning (ICML 2018)

  47. arXiv:1710.09718

    cs.LG

    Learning Approximate Stochastic Transition Models

    Authors: Yuhang Song, Christopher Grimm, Xianming Wang, Michael L. Littman

    Abstract: We examine the problem of learning mappings from state to state, suitable for use in a model-based reinforcement-learning setting, that simultaneously generalize to novel states and can capture stochastic transitions. We show that currently popular generative adversarial networks struggle to learn these stochastic transition models, but a modification to their loss functions results in a powerful l…

    Submitted 26 October, 2017; originally announced October 2017.

  48. arXiv:1709.06533

    cs.LG cs.AI stat.ML

    Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting

    Authors: Christopher Grimm, Yuhang Song, Michael L. Littman

    Abstract: Generative adversarial networks (GANs) are an exciting alternative to algorithms for solving density estimation problems: using data to assess how likely samples are to be drawn from the same distribution. Instead of explicitly computing these probabilities, GANs learn a generator that can match the given probabilistic source. This paper looks particularly at this matching capability in the conte…

    Submitted 19 September, 2017; originally announced September 2017.

  49. arXiv:1709.00503

    stat.ML cs.AI cs.LG

    Mean Actor Critic

    Authors: Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman

    Abstract: We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate rel…

    Submitted 22 May, 2018; v1 submitted 1 September, 2017; originally announced September 2017.

  50. arXiv:1708.00102

    cs.AI cs.LG stat.ML

    Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning

    Authors: Lucas Lehnert, Stefanie Tellex, Michael L. Littman

    Abstract: One question central to Reinforcement Learning is how to learn a feature representation that supports algorithm scaling and re-use of learned information from different tasks. Successor Features approach this problem by learning a feature representation that satisfies a temporal constraint. We present an implementation of an approach that decouples the feature representation from the reward functi…

    Submitted 31 July, 2017; originally announced August 2017.