Showing 1–9 of 9 results for author: Lehnert, L

  1. arXiv:2402.14083  [pdf, other]

    cs.AI

    Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

    Authors: Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, Yuandong Tian

    Abstract: While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A^*$ se…

    Submitted 26 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  2. arXiv:2306.14808  [pdf, other]

    cs.LG

    Maximum State Entropy Exploration using Predecessor and Successor Representations

    Authors: Arnav Kumar Jain, Lucas Lehnert, Irina Rish, Glen Berseth

    Abstract: Animals have a developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misplaced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition o…

    Submitted 26 June, 2023; originally announced June 2023.

  3. arXiv:2306.00867  [pdf, other]

    cs.LG cs.AI

    IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control

    Authors: Rohan Chitnis, Yingchen Xu, Bobak Hashemi, Lucas Lehnert, Urun Dogan, Zheqing Zhu, Olivier Delalleau

    Abstract: Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset. We hypothesize that model-based RL agents struggle in these environments due to a lack of long-term planning capabilities, and that planning in a temporally abstract model…

    Submitted 1 June, 2023; originally announced June 2023.

    Journal ref: Short version published at ICRA 2024 (https://tinyurl.com/icra24-iqltdmpc)

  4. arXiv:2211.03281  [pdf, other]

    cs.LG cs.AI

    Reward-Predictive Clustering

    Authors: Lucas Lehnert, Michael J. Frank, Michael L. Littman

    Abstract: Recent advances in reinforcement-learning research have demonstrated impressive results in building algorithms that can out-perform humans in complex tasks. Nevertheless, creating reinforcement-learning systems that can build abstractions of their experience to accelerate learning in new contexts still remains an active area of research. Previous work showed that reward-predictive state abstractio…

    Submitted 6 November, 2022; originally announced November 2022.

  5. arXiv:1901.11437  [pdf, ps, other]

    cs.LG cs.AI

    Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning

    Authors: Lucas Lehnert, Michael L. Littman

    Abstract: A key question in reinforcement learning is how an intelligent agent can generalize knowledge across different inputs. By generalizing across different inputs, information learned for one input can be immediately reused for improving predictions for another input. Reusing information allows an agent to compute an optimal decision-making strategy using less data. State representation is a key eleme…

    Submitted 4 October, 2020; v1 submitted 31 January, 2019; originally announced January 2019.

  6. arXiv:1812.01129  [pdf, other]

    cs.LG cs.AI

    Mitigating Planner Overfitting in Model-Based Reinforcement Learning

    Authors: Dilip Arumugam, David Abel, Kavosh Asadi, Nakul Gopalan, Christopher Grimm, Jun Ki Lee, Lucas Lehnert, Michael L. Littman

    Abstract: An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model. Alternatively, it can take a more conservative stance and eschew its model in favor of optimizing its behavior solely via real-world interaction. This latter approach can be exceedingly slo…

    Submitted 19 March, 2020; v1 submitted 3 December, 2018; originally announced December 2018.

  7. arXiv:1807.01736  [pdf, other]

    cs.LG cs.AI stat.ML

    Transfer with Model Features in Reinforcement Learning

    Authors: Lucas Lehnert, Michael L. Littman

    Abstract: A key question in Reinforcement Learning is which representation an agent can learn to efficiently reuse knowledge between different tasks. Recently the Successor Representation was shown to have empirical benefits for transferring knowledge between tasks with shared transition dynamics. This paper presents Model Features: a feature representation that clusters behaviourally equivalent states and…

    Submitted 4 July, 2018; originally announced July 2018.

  8. arXiv:1708.00102  [pdf, other]

    cs.AI cs.LG stat.ML

    Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning

    Authors: Lucas Lehnert, Stefanie Tellex, Michael L. Littman

    Abstract: One question central to Reinforcement Learning is how to learn a feature representation that supports algorithm scaling and re-use of learned information from different tasks. Successor Features approach this problem by learning a feature representation that satisfies a temporal constraint. We present an implementation of an approach that decouples the feature representation from the reward functi…

    Submitted 31 July, 2017; originally announced August 2017.

  9. arXiv:1512.04105  [pdf, other]

    cs.AI cs.LG

    Policy Gradient Methods for Off-policy Control

    Authors: Lucas Lehnert, Doina Precup

    Abstract: Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using function approximation and incremental updates. However, they have been developed for the case of a fixed behavior policy. In control problems, one would like to…

    Submitted 13 December, 2015; originally announced December 2015.