Showing 1–7 of 7 results for author: Belousov, B

Searching in archive stat.
  1. arXiv:2311.16656 [pdf, other]

    cs.LG stat.ML

    Pseudo-Likelihood Inference

    Authors: Theo Gruner, Boris Belousov, Fabio Muratore, Daniel Palenicek, Jan Peters

    Abstract: Simulation-Based Inference (SBI) is a common name for an emerging family of approaches that infer the model parameters when the likelihood is intractable. Existing SBI methods either approximate the likelihood, such as Approximate Bayesian Computation (ABC), or directly model the posterior, such as Sequential Neural Posterior Estimation (SNPE). While ABC is efficient on low-dimensional problems, on…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 27 pages, 12 figures, Published as a conference paper at NeurIPS 2023

  2. arXiv:1910.03620 [pdf, ps, other]

    cs.LG cs.RO stat.ML

    Receding Horizon Curiosity

    Authors: Matthias Schultheis, Boris Belousov, Hany Abdulsamad, Jan Peters

    Abstract: Sample-efficient exploration is crucial not only for discovering rewarding experiences but also for adapting to environment changes in a task-agnostic fashion. A principled treatment of the problem of optimal input synthesis for system identification is provided within the framework of sequential Bayesian experimental design. In this paper, we present an effective trajectory-optimization-based app…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: Published at Conference on Robot Learning (CoRL 2019)

  3. arXiv:1910.02826 [pdf, other]

    cs.LG stat.ML

    Self-Paced Contextual Reinforcement Learning

    Authors: Pascal Klink, Hany Abdulsamad, Boris Belousov, Jan Peters

    Abstract: Generalization and adaptation of learned skills to novel situations is a core requirement for intelligent autonomous robots. Although contextual reinforcement learning provides a principled framework for learning and generalization of behaviors across related tasks, it generally relies on uninformed sampling of environments from an unknown, uncontrolled context distribution, thus missing the benef…

    Submitted 7 October, 2019; originally announced October 2019.

  4. arXiv:1909.06153 [pdf, other]

    cs.LG cs.RO stat.ML

    HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints

    Authors: Michael Lutter, Boris Belousov, Kim Listmann, Debora Clever, Jan Peters

    Abstract: Learning optimal feedback control laws capable of executing optimal trajectories is essential for many robotic applications. Such policies can be learned using reinforcement learning or planned using optimal control. While reinforcement learning is sample inefficient, optimal control only plans an optimal trajectory from a specific starting configuration. In this paper, we propose deep optimal feed…

    Submitted 11 October, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: Conference on Robot Learning (CoRL) 2019

  5. arXiv:1907.04214 [pdf, other]

    cs.LG stat.ML

    Entropic Regularization of Markov Decision Processes

    Authors: Boris Belousov, Jan Peters

    Abstract: An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment. Such interactive data gathering commonly leads to divergence towards dangerous or uninformative re…

    Submitted 18 July, 2019; v1 submitted 6 July, 2019; originally announced July 2019.

    Comments: 16 pages, 4 figures, updated formatting, arXiv admin note: text overlap with arXiv:1801.00056

    Journal ref: Entropy 2019, 21(7), 674

  6. arXiv:1902.05605 [pdf, other]

    cs.LG stat.ML

    CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

    Authors: Aditya Bhatt, Daniel Palenicek, Boris Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, Jan Peters

    Abstract: Sample efficiency is a crucial problem in deep reinforcement learning. Recent algorithms, such as REDQ and DroQ, found a way to improve the sample efficiency by increasing the update-to-data (UTD) ratio to 20 gradient update steps on the critic per environment sample. However, this comes at the expense of a greatly increased computational cost. To reduce this computational burden, we introduce Cro…

    Submitted 25 March, 2024; v1 submitted 14 February, 2019; originally announced February 2019.

    Comments: Published at ICLR 2024. Project page at http://aditya.bhatts.org/CrossQ and code release at https://github.com/adityab/CrossQ

  7. arXiv:1801.00056 [pdf, other]

    cs.LG cs.AI stat.ML

    f-Divergence constrained policy improvement

    Authors: Boris Belousov, Jan Peters

    Abstract: To ensure stability of learning, state-of-the-art generalized policy iteration algorithms augment the policy improvement step with a trust region constraint bounding the information loss. The size of the trust region is commonly determined by the Kullback-Leibler (KL) divergence, which not only captures the notion of distance well but also yields closed-form solutions. In this paper, we consider a…

    Submitted 4 April, 2018; v1 submitted 29 December, 2017; originally announced January 2018.