
Showing 1–50 of 60 results for author: Littman, M L

  1. arXiv:2407.03321  [pdf, other]

    cs.CL cs.AI cs.LG

    Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

    Authors: Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael L. Littman, Stephen H. Bach

    Abstract: Many recent works have explored using language models for planning problems. One line of research focuses on translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). While this approach is promising, accurately measuring the quality of generated PDDL code continues to pose significant challenges. First,…

    Submitted 3 July, 2024; originally announced July 2024.

  2. A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

    Authors: Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, Ese Ben-Iwhiwhu, Andrew P. Brna, Ethan Brooks, Ryan C. Brown, Zachary Daniels, Anurag Daram, Fabien Delattre, Ryan Dellana, Eric Eaton, Haotian Fu, Kristen Grauman, Jesse Hostetler, Shariq Iqbal, Cassandra Kent, Nicholas Ketz, Soheil Kolouri, George Konidaris, Dhireesha Kudithipudi, Erik Learned-Miller, Seungwon Lee, et al. (22 additional authors not shown)

    Abstract: Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through th…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: To appear in Neural Networks

  3. arXiv:2212.03733  [pdf, other]

    cs.LG cs.AI

    Tiered Reward Functions: Specifying and Fast Learning of Desired Behavior

    Authors: Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael L. Littman

    Abstract: Reinforcement-learning agents seek to maximize a reward signal through environmental interactions. As humans, our job in the learning process is to design reward functions to express desired behavior and enable the agent to learn such behavior swiftly. In this work, we consider the reward-design problem in tasks formulated as reaching desirable states and avoiding undesirable states. To start, we…

    Submitted 15 February, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: For code, see https://github.com/zhouzypaul/tiered-reward

  4. arXiv:2211.14673  [pdf, other]

    cs.AI

    Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex

    Authors: Charles Lovering, Jessica Zosa Forde, George Konidaris, Ellie Pavlick, Michael L. Littman

    Abstract: AlphaZero, an approach to reinforcement learning that couples neural networks and Monte Carlo tree search (MCTS), has produced state-of-the-art strategies for traditional board games like chess, Go, shogi, and Hex. While researchers and game commentators have suggested that AlphaZero uses concepts that humans consider important, it is unclear how these concepts are captured in the network. We inve…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: 10 pages, Neural Information Processing Systems 2022

  5. arXiv:2211.03281  [pdf, other]

    cs.LG cs.AI

    Reward-Predictive Clustering

    Authors: Lucas Lehnert, Michael J. Frank, Michael L. Littman

    Abstract: Recent advances in reinforcement-learning research have demonstrated impressive results in building algorithms that can out-perform humans in complex tasks. Nevertheless, creating reinforcement-learning systems that can build abstractions of their experience to accelerate learning in new contexts still remains an active area of research. Previous work showed that reward-predictive state abstractio…

    Submitted 6 November, 2022; originally announced November 2022.

  6. arXiv:2210.15767  [pdf]

    cs.AI

    Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report

    Authors: Michael L. Littman, Ifeoma Ajunwa, Guy Berger, Craig Boutilier, Morgan Currie, Finale Doshi-Velez, Gillian Hadfield, Michael C. Horowitz, Charles Isbell, Hiroaki Kitano, Karen Levy, Terah Lyons, Melanie Mitchell, Julie Shah, Steven Sloman, Shannon Vallor, Toby Walsh

    Abstract: In September 2021, the "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the second report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Michael Littman of Brown University. The report, entitled "Gathering Strengt…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 82 pages, https://ai100.stanford.edu/gathering-strength-gathering-storms-one-hundred-year-study-artificial-intelligence-ai100-2021-study

  7. arXiv:2205.15400  [pdf, other]

    cs.LG cs.AI

    Designing Rewards for Fast Learning

    Authors: Henry Sowerby, Zhiyuan Zhou, Michael L. Littman

    Abstract: To convey desired behavior to a Reinforcement Learning (RL) agent, a designer must choose a reward function for the environment, arguably the most important knob designers have in interacting with RL agents. Although many reward functions induce the same optimal behavior (Ng et al., 1999), in practice, some of them result in faster learning than others. In this paper, we look at how reward-design…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: To appear at the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM2022)

  8. arXiv:2112.05848  [pdf, other]

    cs.LG cs.AI

    Faster Deep Reinforcement Learning with Slower Online Network

    Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael L. Littman, Alexander J. Smola

    Abstract: Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with u…

    Submitted 17 April, 2023; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  9. arXiv:2111.00876  [pdf, other]

    cs.LG cs.AI

    On the Expressivity of Markov Reward

    Authors: David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

    Abstract: Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajector…

    Submitted 18 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted to NeurIPS 2021

  10. arXiv:2110.03424  [pdf, other]

    cs.LG cs.AI

    Bad-Policy Density: A Measure of Reinforcement Learning Hardness

    Authors: David Abel, Cameron Allen, Dilip Arumugam, D. Ellis Hershkowitz, Michael L. Littman, Lawson L. S. Wong

    Abstract: Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment, but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density. This quantity measures the fraction of the deterministic stationary policy space that is below a desired thresh…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Presented at the 2021 ICML Workshop on Reinforcement Learning Theory

  11. arXiv:2109.07054  [pdf, other]

    cs.LG cs.AI cs.DS cs.HC

    Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback

    Authors: Ishaan Shah, David Halpern, Kavosh Asadi, Michael L. Littman

    Abstract: Fluid human-agent communication is essential for the future of human-in-the-loop reinforcement learning. An agent must respond appropriately to feedback from its human trainer even before they have significant experience working together. Therefore, it is important that learning agents respond well to various feedback schemes human trainers are likely to provide. This work analyzes the COnvergent…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted into ICML 2021 workshops Human-AI Collaboration in Sequential Decision-Making and Human in the Loop Learning

  12. arXiv:2106.05506  [pdf, other]

    cs.AI cs.LG

    Brittle AI, Causal Confusion, and Bad Mental Models: Challenges and Successes in the XAI Program

    Authors: Jeff Druce, James Niehaus, Vanessa Moody, David Jensen, Michael L. Littman

    Abstract: The advances in artificial intelligence enabled by deep learning architectures are undeniable. In several cases, deep neural network driven models have surpassed human level performance in benchmark autonomy tasks. The underlying policies for these agents, however, are not easily interpretable. In fact, given their underlying deep models, it is impossible to directly understand the mapping from ob…

    Submitted 10 June, 2021; originally announced June 2021.

  13. People construct simplified mental representations to plan

    Authors: Mark K. Ho, David Abel, Carlos G. Correa, Michael L. Littman, Jonathan D. Cohen, Thomas L. Griffiths

    Abstract: One of the most striking features of human cognition is the capacity to plan. Two aspects of human planning stand out: its efficiency and flexibility. Efficiency is especially impressive because plans must often be made in complex environments, and yet people successfully plan solutions to myriad everyday problems despite having limited cognitive resources. Standard accounts in psychology, economi…

    Submitted 26 November, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: 56 pages, 5 main figures, 10 extended data figures, supplementary information is included in ancillary files

    Journal ref: Nature, 606(7912), 129-136 (2022)

  14. arXiv:2008.03229  [pdf, other]

    cs.AI cs.LG

    Towards Sample Efficient Agents through Algorithmic Alignment

    Authors: Mingxuan Li, Michael L. Littman

    Abstract: In this work, we propose and explore Deep Graph Value Network (DeepGV) as a promising method to work around sample complexity in deep reinforcement-learning agents using a message-passing mechanism. The main idea is that the agent should be guided by structured non-neural-network algorithms like dynamic programming. According to recent advances in algorithmic alignment, neural networks with struct…

    Submitted 21 October, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

  15. arXiv:2002.05769  [pdf, other]

    cs.AI

    The Efficiency of Human Cognition Reflects Planned Information Processing

    Authors: Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths

    Abstract: Planning is useful. It lets people take actions that have desirable long-term consequences. But, planning is hard. It requires thinking about consequences, which consumes limited computational and cognitive resources. Thus, people should plan their actions, but they should also be smart about how they deploy resources used for planning their actions. Put another way, people should also "plan their…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: 13 pg (incl. supplemental materials); included in Proceedings of the 34th AAAI Conference on Artificial Intelligence

  16. arXiv:2002.05518  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning State Abstractions for Transfer in Continuous Control

    Authors: Kavosh Asadi, David Abel, Michael L. Littman

    Abstract: Can simple algorithms with a good representation solve challenging reinforcement learning problems? In this work, we answer this question in the affirmative, where we take "simple learning algorithm" to be tabular Q-Learning, the "good representations" to be a learned state abstraction, and "challenging problems" to be continuous control tasks. Our main contribution is a learning algorithm that ab…

    Submitted 8 February, 2020; originally announced February 2020.

  17. arXiv:2002.01883  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Radial-Basis Value Functions for Continuous Control

    Authors: Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman

    Abstract: A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the max…

    Submitted 13 March, 2021; v1 submitted 5 February, 2020; originally announced February 2020.

    Comments: In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)

  18. arXiv:2001.05411  [pdf, other]

    cs.LG cs.AI stat.ML

    Lipschitz Lifelong Reinforcement Learning

    Authors: Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman

    Abstract: We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value-transfe…

    Submitted 22 March, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: In proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), 21 pages, 11 figures

  19. arXiv:1912.03606  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Individual predictions matter: Assessing the effect of data ordering in training fine-tuned CNNs for medical imaging

    Authors: John R. Zech, Jessica Zosa Forde, Michael L. Littman

    Abstract: We reproduced the results of CheXNet with fixed hyperparameters and 50 different random seeds to identify 14 findings in chest radiographs (x-rays). Because CheXNet fine-tunes a pre-trained DenseNet, the random seed affects the ordering of the batches of training data but not the initialized model weights. We found substantial variability in predictions for the same radiograph across model runs (me…

    Submitted 7 December, 2019; originally announced December 2019.

    Comments: J.Z. and J.F. contributed equally to this work

  20. arXiv:1908.08641  [pdf, other]

    cs.HC cs.AI cs.GT

    Stackelberg Punishment and Bully-Proofing Autonomous Vehicles

    Authors: Matt Cooper, Jun Ki Lee, Jacob Beck, Joshua D. Fishman, Michael Gillett, Zoë Papakipos, Aaron Zhang, Jerome Ramos, Aansh Shah, Michael L. Littman

    Abstract: Mutually beneficial behavior in repeated games can be enforced via the threat of punishment, as enshrined in game theory's well-known "folk theorem." There is a cost, however, to a player for generating these disincentives. In this work, we seek to minimize this cost by computing a "Stackelberg punishment," in which the player selects a behavior that sufficiently punishes the other player while ma…

    Submitted 22 August, 2019; originally announced August 2019.

    Comments: 10 pages, The 11th International Conference on Social Robotics

  21. arXiv:1907.08478  [pdf, other]

    cs.AI cs.HC

    Interactive Learning of Environment Dynamics for Sequential Tasks

    Authors: Robert Loftin, Bei Peng, Matthew E. Taylor, Michael L. Littman, David L. Roberts

    Abstract: In order for robots and other artificial agents to efficiently learn to perform useful tasks defined by an end user, they must understand not only the goals of those tasks, but also the structure and dynamics of that user's environment. While existing work has looked at how the goals of a task can be inferred from a human teacher, the agent is often left to learn about the environment on its own.…

    Submitted 19 July, 2019; originally announced July 2019.

  22. arXiv:1905.13320  [pdf, other]

    cs.LG cs.AI stat.ML

    Combating the Compounding-Error Problem with a Multi-step Model

    Authors: Kavosh Asadi, Dipendra Misra, Seungchan Kim, Michael L. Littman

    Abstract: Model-based reinforcement learning is an appealing framework for creating agents that learn, plan, and act in sequential environments. Model-based algorithms typically involve learning a transition model that takes a state and an action and outputs the next state---a one-step model. This model can be composed with itself to enable predicting multiple steps into the future, but one-step prediction…

    Submitted 30 May, 2019; originally announced May 2019.

  23. arXiv:1902.04257  [pdf, other]

    cs.LG stat.ML

    Deep Reinforcement Learning from Policy-Dependent Human Feedback

    Authors: Dilip Arumugam, Jun Ki Lee, Sophie Saskin, Michael L. Littman

    Abstract: To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a…

    Submitted 12 February, 2019; originally announced February 2019.

  24. arXiv:1901.11437  [pdf, ps, other]

    cs.LG cs.AI

    Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning

    Authors: Lucas Lehnert, Michael L. Littman

    Abstract: A key question in reinforcement learning is how an intelligent agent can generalize knowledge across different inputs. By generalizing across different inputs, information learned for one input can be immediately reused for improving predictions for another input. Reusing information allows an agent to compute an optimal decision-making strategy using less data. State representation is a key eleme…

    Submitted 4 October, 2020; v1 submitted 31 January, 2019; originally announced January 2019.

  25. arXiv:1901.06085  [pdf, other]

    cs.AI cs.MA

    Theory of Minds: Understanding Behavior in Groups Through Inverse Planning

    Authors: Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum

    Abstract: Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and flexibly, often making rapid generalizations about…

    Submitted 17 January, 2019; originally announced January 2019.

    Comments: published in AAAI 2019; Michael Shum and Max Kleiman-Weiner contributed equally

  26. arXiv:1812.01129  [pdf, other]

    cs.LG cs.AI

    Mitigating Planner Overfitting in Model-Based Reinforcement Learning

    Authors: Dilip Arumugam, David Abel, Kavosh Asadi, Nakul Gopalan, Christopher Grimm, Jun Ki Lee, Lucas Lehnert, Michael L. Littman

    Abstract: An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model. Alternatively, it can take a more conservative stance and eschew its model in favor of optimizing its behavior solely via real-world interaction. This latter approach can be exceedingly slo…

    Submitted 19 March, 2020; v1 submitted 3 December, 2018; originally announced December 2018.

  27. arXiv:1811.00128  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards a Simple Approach to Multi-step Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman

    Abstract: When environmental interaction is expensive, model-based reinforcement learning offers a solution by planning ahead and avoiding costly mistakes. Model-based agents typically learn a single-step transition model. In this paper, we propose a multi-step model that predicts the outcome of an action sequence with variable length. We show that this model is easy to learn, and that the model can make po…

    Submitted 31 October, 2018; originally announced November 2018.

  28. arXiv:1807.01736  [pdf, other]

    cs.LG cs.AI stat.ML

    Transfer with Model Features in Reinforcement Learning

    Authors: Lucas Lehnert, Michael L. Littman

    Abstract: A key question in Reinforcement Learning is which representation an agent can learn to efficiently reuse knowledge between different tasks. Recently the Successor Representation was shown to have empirical benefits for transferring knowledge between tasks with shared transition dynamics. This paper presents Model Features: a feature representation that clusters behaviourally equivalent states and…

    Submitted 4 July, 2018; originally announced July 2018.

  29. arXiv:1806.01265  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Equivalence Between Wasserstein and Value-Aware Loss for Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman

    Abstract: Learning a generative model is a key component of model-based reinforcement learning. Though learning a good model in the tabular setting is a simple task, learning a useful model in the approximate setting is challenging. In this context, an important question is the loss function used for model learning as varying the loss function can have a remarkable impact on effectiveness of planning. Recen…

    Submitted 8 July, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: Accepted at the FAIM workshop "Prediction and Generative Modeling in Reinforcement Learning", Stockholm, Sweden, 2018

  30. arXiv:1804.07193  [pdf, other]

    cs.LG cs.AI stat.ML

    Lipschitz Continuity in Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Dipendra Misra, Michael L. Littman

    Abstract: We examine the impact of learning Lipschitz continuous models in the context of model-based reinforcement learning. We provide a novel bound on multi-step prediction error of Lipschitz models where we quantify the error using the Wasserstein metric. We go on to prove an error bound for the value-function estimate arising from Lipschitz models and show that the estimated value function is itself Li…

    Submitted 27 July, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

    Comments: Accepted for the 35th International Conference on Machine Learning (ICML 2018)

  31. arXiv:1710.09718  [pdf, other]

    cs.LG

    Learning Approximate Stochastic Transition Models

    Authors: Yuhang Song, Christopher Grimm, Xianming Wang, Michael L. Littman

    Abstract: We examine the problem of learning mappings from state to state, suitable for use in a model-based reinforcement-learning setting, that simultaneously generalize to novel states and can capture stochastic transitions. We show that currently popular generative adversarial networks struggle to learn these stochastic transition models but a modification to their loss functions results in a powerful l…

    Submitted 26 October, 2017; originally announced October 2017.

  32. arXiv:1709.06533  [pdf, other]

    cs.LG cs.AI stat.ML

    Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting

    Authors: Christopher Grimm, Yuhang Song, Michael L. Littman

    Abstract: Generative adversarial networks (GANs) are an exciting alternative to algorithms for solving density estimation problems---using data to assess how likely samples are to be drawn from the same distribution. Instead of explicitly computing these probabilities, GANs learn a generator that can match the given probabilistic source. This paper looks particularly at this matching capability in the conte…

    Submitted 19 September, 2017; originally announced September 2017.

  33. arXiv:1708.00102  [pdf, other]

    cs.AI cs.LG stat.ML

    Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning

    Authors: Lucas Lehnert, Stefanie Tellex, Michael L. Littman

    Abstract: One question central to Reinforcement Learning is how to learn a feature representation that supports algorithm scaling and re-use of learned information from different tasks. Successor Features approach this problem by learning a feature representation that satisfies a temporal constraint. We present an implementation of an approach that decouples the feature representation from the reward functi…

    Submitted 31 July, 2017; originally announced August 2017.

  34. arXiv:1706.00536  [pdf, other]

    cs.AI

    Modeling Latent Attention Within Neural Networks

    Authors: Christopher Grimm, Dilip Arumugam, Siddharth Karamcheti, David Abel, Lawson L. S. Wong, Michael L. Littman

    Abstract: Deep neural networks are able to solve tasks across a variety of domains and modalities of data. Despite many empirical successes, we lack the ability to clearly understand and interpret the learned internal mechanisms that contribute to such effective behaviors or, more critically, failure modes. In this work, we present a general method for visualizing an arbitrary neural network's inner mechani…

    Submitted 30 December, 2017; v1 submitted 1 June, 2017; originally announced June 2017.

  35. arXiv:1704.04341  [pdf, other]

    cs.AI

    Environment-Independent Task Specifications via GLTL

    Authors: Michael L. Littman, Ufuk Topcu, Jie Fu, Charles Isbell, Min Wen, James MacGlashan

    Abstract: We propose a new task-specification language for Markov decision processes that is designed to be an improvement over reward functions by being environment independent. The language is a variant of Linear Temporal Logic (LTL) that is extended to probabilistic specifications in a way that permits approximations to be learned in finite time. We provide several small environments that demonstrate the…

    Submitted 13 April, 2017; originally announced April 2017.

  36. arXiv:1701.06049  [pdf, other]

    cs.AI

    Interactive Learning from Policy-Dependent Human Feedback

    Authors: James MacGlashan, Mark K Ho, Robert Loftin, Bei Peng, Guan Wang, David Roberts, Matthew E. Taylor, Michael L. Littman

    Abstract: This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner's current policy. We present empirical results that show this assump…

    Submitted 28 January, 2023; v1 submitted 21 January, 2017; originally announced January 2017.

    Comments: 8 pages + references, 5 figures

    ACM Class: I.2.6

    Journal ref: International Conference on Machine Learning. PMLR, 2017

  37. arXiv:1701.04113  [pdf, other]

    cs.LG cs.AI

    Near Optimal Behavior via Approximate State Abstraction

    Authors: David Abel, D. Ellis Hershkowitz, Michael L. Littman

    Abstract: The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opp…

    Submitted 15 January, 2017; originally announced January 2017.

    Comments: Earlier version published at ICML 2016

  38. arXiv:1612.05628  [pdf, other]

    cs.AI cs.LG stat.ML

    An Alternative Softmax Operator for Reinforcement Learning

    Authors: Kavosh Asadi, Michael L. Littman

    Abstract: A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one's weight behind a single maximum utility decision. The Boltzmann softmax operator is the most commonly…

    Submitted 14 June, 2017; v1 submitted 16 December, 2016; originally announced December 2016.

  39. arXiv:1302.4971  [pdf]

    cs.AI

    On the Complexity of Solving Markov Decision Problems

    Authors: Michael L. Littman, Thomas L. Dean, Leslie Pack Kaelbling

    Abstract: Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practica…

    Submitted 20 February, 2013; originally announced February 2013.

    Comments: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI1995)

    Report number: UAI-P-1995-PG-394-402

  40. arXiv:1302.1540  [pdf]

    cs.AI

    The Complexity of Plan Existence and Evaluation in Probabilistic Domains

    Authors: Judy Goldsmith, Michael L. Littman, Martin Mundhenk

    Abstract: We examine the computational complexity of testing and finding small plans in probabilistic planning domains with succinct representations. We find that many problems of interest are complete for a variety of complexity classes: NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. Of these, the probabilistic classes PP and NP^PP are likely to be of special interest in the field of uncertainty in artifici…

    Submitted 6 February, 2013; originally announced February 2013.

    Comments: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997)

    Report number: UAI-P-1997-PG-182-189

  41. arXiv:1302.1525  [pdf]

    cs.AI

    Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes

    Authors: Anthony R. Cassandra, Michael L. Littman, Nevin Lianwen Zhang

    Abstract: Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We…

    Submitted 6 February, 2013; originally announced February 2013.

    Comments: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997)

    Report number: UAI-P-1997-PG-54-61

  42. arXiv:1301.2281  [pdf]

    cs.GT cs.AI

    Graphical Models for Game Theory

    Authors: Michael Kearns, Michael L. Littman, Satinder Singh

    Abstract: In this work, we introduce graphical models for multi-player game theory, and give powerful algorithms for computing their Nash equilibria in certain cases. An n-player game is given by an undirected graph on n nodes and a set of n local matrices. The interpretation is that the payoff to player i is determined entirely by the actions of player i and his neighbors in the graph, and thus the payoff m…

    Submitted 7 March, 2015; v1 submitted 10 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)

    Report number: UAI-P-2001-PG-253-260

  43. arXiv:1206.6870  [pdf]

    cs.LG cs.AI stat.ML

    Incremental Model-based Learners With Formal Learning-Time Guarantees

    Authors: Alexander L. Strehl, Lihong Li, Michael L. Littman

    Abstract: Model-based learning algorithms have been shown to use experience efficiently when learning to solve Markov Decision Processes (MDPs) with finite state and action spaces. However, their high computational cost due to repeatedly solving an internal model inhibits their use in large-scale problems. We propose a method based on real-time dynamic programming (RTDP) to speed up two model-based algorith…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

    Report number: UAI-P-2006-PG-485-493

  44. arXiv:1206.6855  [pdf]

    cs.GT

    An Efficient Optimal-Equilibrium Algorithm for Two-player Game Trees

    Authors: Michael L. Littman, Nishkam Ravi, Arjun Talwar, Martin Zinkevich

    Abstract: Two-player complete-information game trees are perhaps the simplest possible setting for studying general-sum games and the computational problem of finding equilibria. These games admit a simple bottom-up algorithm for finding subgame perfect Nash equilibria efficiently. However, such an algorithm can fail to identify optimal equilibria, such as those that maximize social welfare. The reason is t…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

    Report number: UAI-P-2006-PG-298-305

  45. arXiv:1206.3277  [pdf]

    cs.GT

    A Polynomial-time Nash Equilibrium Algorithm for Repeated Stochastic Games

    Authors: Enrique Munoz de Cote, Michael L. Littman

    Abstract: We present a polynomial-time algorithm that always finds an (approximate) Nash equilibrium for repeated two-player stochastic games. The algorithm exploits the folk theorem to derive a strategy profile that forms an equilibrium by buttressing mutually beneficial behavior with threats, where possible. One component of our algorithm efficiently searches for an approximation of the egalitarian point,…

    Submitted 13 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

    Report number: UAI-P-2008-PG-419-426

  46. arXiv:1206.3231  [pdf]

    cs.LG stat.ML

    CORL: A Continuous-state Offset-dynamics Reinforcement Learner

    Authors: Emma Brunskill, Bethany Leffler, Lihong Li, Michael L. Littman, Nicholas Roy

    Abstract: Continuous state spaces and stochastic, switching dynamics characterize a number of rich, real-world domains, such as robot navigation across varying terrain. We describe a reinforcement-learning algorithm for learning in these domains and prove for certain environments the algorithm is probably approximately correct with a sample complexity that scales polynomially with the state-space dimension. U…

    Submitted 13 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

    Report number: UAI-P-2008-PG-53-61

  47. arXiv:1205.2664  [pdf]

    cs.LG

    A Bayesian Sampling Approach to Exploration in Reinforcement Learning

    Authors: John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate

    Abstract: We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorit…

    Submitted 9 May, 2012; originally announced May 2012.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

    Report number: UAI-P-2009-PG-19-26

  48. arXiv:1205.2606  [pdf]

    cs.LG cs.AI

    Exploring compact reinforcement-learning representations with linear regression

    Authors: Thomas J. Walsh, Istvan Szita, Carlos Diuk, Michael L. Littman

    Abstract: This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK l…

    Submitted 9 May, 2012; originally announced May 2012.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

    Report number: UAI-P-2009-PG-591-598

  49. arXiv:1202.3699  [pdf]

    cs.AI

    Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

    Authors: John Asmuth, Michael L. Littman

    Abstract: Bayes-optimal behavior, while well-defined, is often difficult to achieve. Recent advances in the use of Monte-Carlo tree search (MCTS) have shown that it is possible to act near-optimally in Markov Decision Processes (MDPs) with very large or infinite state spaces. Bayes-optimal behavior in an unknown MDP is equivalent to optimal behavior in the known belief-space MDP, although the size of this b…

    Submitted 14 February, 2012; originally announced February 2012.

    Report number: UAI-P-2011-PG-19-26

  50. arXiv:1107.3090  [pdf, other]

    cs.CC cs.LG eess.SY math.OC

    On the Computational Complexity of Stochastic Controller Optimization in POMDPs

    Authors: Nikos Vlassis, Michael L. Littman, David Barber

    Abstract: We show that the problem of finding an optimal stochastic 'blind' controller in a Markov decision process is an NP-hard problem. The corresponding decision problem is NP-hard, in PSPACE, and SQRT-SUM-hard, hence placing it in NP would imply breakthroughs in long-standing open problems in computer science. Our result establishes that the more general problem of stochastic controller optimization in…

    Submitted 4 October, 2012; v1 submitted 15 July, 2011; originally announced July 2011.

    Comments: Corrected error in the proof of Theorem 2, and revised Section 5

    ACM Class: F.2.1