
Showing 1–3 of 3 results for author: Talvitie, E J

  1. arXiv:2406.16006  [pdf, other]

    cs.LG cs.AI

    Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning

    Authors: Erin J. Talvitie, Zilei Shao, Huiying Li, **ghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang

    Abstract: In model-based reinforcement learning, simulated experiences from the learned model are often treated as equivalent to experience from the real environment. However, when the model is inaccurate, it can catastrophically interfere with policy learning. Alternatively, the agent might learn about the model's accuracy and selectively use it only when it can provide reliable predictions. We empirically…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: To appear: Reinforcement Learning Conference (RLC), 2024

  2. arXiv:2007.02418  [pdf, other]

    cs.LG cs.AI stat.ML

    Selective Dyna-style Planning Under Limited Model Capacity

    Authors: Zaheer Abbas, Samuel Sokota, Erin J. Talvitie, Martha White

    Abstract: In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this paper, we investigate the idea of using an imperfect model selectively. The agent should plan in parts of the state space where the model would be helpful but…

    Submitted 7 March, 2021; v1 submitted 5 July, 2020; originally announced July 2020.

    Comments: Accepted at ICML 2020

  3. arXiv:1806.01825  [pdf, other]

    cs.AI

    The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

    Authors: G. Zacharias Holland, Erin J. Talvitie, Michael Bowling

    Abstract: Dyna is a fundamental approach to model-based reinforcement learning (MBRL) that interleaves planning, acting, and learning in an online setting. In the most typical application of Dyna, the dynamics model is used to generate one-step transitions from selected start states from the agent's history, which are used to update the agent's value function or policy as if they were real experiences. In t…

    Submitted 28 March, 2019; v1 submitted 5 June, 2018; originally announced June 2018.