
Showing 1–3 of 3 results for author: Palenicek, D

Searching in archive: stat.
  1. arXiv:2311.16656  [pdf, other]

    cs.LG stat.ML

    Pseudo-Likelihood Inference

    Authors: Theo Gruner, Boris Belousov, Fabio Muratore, Daniel Palenicek, Jan Peters

    Abstract: Simulation-Based Inference (SBI) is a common name for an emerging family of approaches that infer the model parameters when the likelihood is intractable. Existing SBI methods either approximate the likelihood, such as Approximate Bayesian Computation (ABC), or directly model the posterior, such as Sequential Neural Posterior Estimation (SNPE). While ABC is efficient on low-dimensional problems, on…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 27 pages, 12 figures, Published as a conference paper at NeurIPS 2023

  2. arXiv:2006.09436  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    SAMBA: Safe Model-Based & Active Reinforcement Learning

    Authors: Alexander I. Cowen-Rivers, Daniel Palenicek, Vincent Moens, Mohammed Abdullah, Aivar Sootla, Jun Wang, Haitham Ammar

    Abstract: In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We…

    Submitted 12 June, 2020; originally announced June 2020.

  3. arXiv:1902.05605  [pdf, other]

    cs.LG stat.ML

    CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

    Authors: Aditya Bhatt, Daniel Palenicek, Boris Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, Jan Peters

    Abstract: Sample efficiency is a crucial problem in deep reinforcement learning. Recent algorithms, such as REDQ and DroQ, found a way to improve the sample efficiency by increasing the update-to-data (UTD) ratio to 20 gradient update steps on the critic per environment sample. However, this comes at the expense of a greatly increased computational cost. To reduce this computational burden, we introduce Cro…

    Submitted 25 March, 2024; v1 submitted 14 February, 2019; originally announced February 2019.

    Comments: Published at ICLR 2024. Project page at http://aditya.bhatts.org/CrossQ and code release at https://github.com/adityab/CrossQ