Showing 1–8 of 8 results for author: Gehring, C

  1. arXiv:2402.05290  [pdf, other]

    cs.LG cs.AI

    Do Transformer World Models Give Better Policy Gradients?

    Authors: Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon

    Abstract: A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long…

    Submitted 10 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Michel Ma and Pierluca D'Oro contributed equally

  2. arXiv:2401.08898  [pdf, other]

    cs.LG cs.AI

    Bridging State and History Representations: Understanding Self-Predictive RL

    Authors: Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon

    Abstract: Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared propertie…

    Submitted 21 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ICLR 2024 (Poster). Code is available at https://github.com/twni2016/self-predictive-rl

  3. arXiv:2310.15386  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    Course Correcting Koopman Representations

    Authors: Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin

    Abstract: Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over…

    Submitted 23 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  4. arXiv:2109.14830  [pdf, other]

    cs.AI cs.LG

    Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators

    Authors: Clement Gehring, Masataro Asai, Rohan Chitnis, Tom Silver, Leslie Pack Kaelbling, Shirin Sohrabi, Michael Katz

    Abstract: Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic…

    Submitted 7 March, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: Equal contributions by the first two authors. This manuscript is a camera-ready version accepted in ICAPS-2022. It is significantly updated from past versions (e.g., in the ICAPS PRL (Planning and RL) workshop) with additional experiments comparing existing work (STRIPS-HGN (Shen, Trevizan, and Thiebaux 2020) and GBFS-GNN (Rivlin, Hazan, and Karpas 2019))

  5. Whole-Body Nonlinear Model Predictive Control Through Contacts for Quadrupeds

    Authors: Michael Neunert, Markus Stäuble, Markus Giftthaler, Carmine D. Bellicoso, Jan Carius, Christian Gehring, Marco Hutter, Jonas Buchli

    Abstract: In this work we present a whole-body Nonlinear Model Predictive Control approach for Rigid Body Systems subject to contacts. We use a full dynamic system model which also includes explicit contact dynamics. Therefore, contact locations, sequences and timings are not prespecified but optimized by the solver. Yet, thorough numerical and software engineering allows for running the nonlinear Optimal C…

    Submitted 7 December, 2017; originally announced December 2017.

    Comments: Submitted to "Robotics and Automation: Letters" / "International Conference on Robotics and Automation 2018"

  6. arXiv:1706.01445  [pdf, other]

    stat.ML cs.LG math.OC

    Batched Large-scale Bayesian Optimization in High-dimensional Spaces

    Authors: Zi Wang, Clement Gehring, Pushmeet Kohli, Stefanie Jegelka

    Abstract: Bayesian optimization (BO) has become an effective approach for black-box function optimization problems when function evaluations are expensive and the optimum can be achieved within a relatively small number of queries. However, many cases, such as the ones with high-dimensional inputs, may require a much larger number of observations for optimization. Despite an abundance of observations thanks…

    Submitted 15 May, 2018; v1 submitted 5 June, 2017; originally announced June 2017.

    Comments: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain

  7. arXiv:1606.05285  [pdf, other]

    cs.RO

    A Primer on the Differential Calculus of 3D Orientations

    Authors: Michael Bloesch, Hannes Sommer, Tristan Laidlow, Michael Burri, Gabriel Nuetzi, Péter Fankhauser, Dario Bellicoso, Christian Gehring, Stefan Leutenegger, Marco Hutter, Roland Siegwart

    Abstract: The proper handling of 3D orientations is a central element in many optimization problems in engineering. Unfortunately many researchers and engineers struggle with the formulation of such problems and often fall back to suboptimal solutions. The existence of many different conventions further complicates this issue, especially when interfacing multiple differing implementations. This document dis…

    Submitted 31 October, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

  8. arXiv:1511.08495  [pdf, other]

    cs.LG cs.AI

    Incremental Truncated LSTD

    Authors: Clement Gehring, Yangchen Pan, Martha White

    Abstract: Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning. Temporal difference (TD) learning algorithms stochastically update the value function, with a linear time complexity in the number of features, whereas least-squares temporal difference (LSTD) algorithms are sample efficient but can be quadratic in the number of features. In this work,…

    Submitted 18 November, 2016; v1 submitted 26 November, 2015; originally announced November 2015.

    Comments: Accepted to IJCAI 2016