Showing 1–13 of 13 results for author: Sukhija, B

  1. arXiv:2406.01175  [pdf, other]

    cs.LG

    NeoRL: Efficient Exploration for Nonepisodic RL

    Authors: Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause

    Abstract: We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the RL agent has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty. NeoRL uses well-calibrated probabilistic models and plans optimist…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2406.01163  [pdf, other]

    cs.LG

    When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL

    Authors: Lenart Treven, Bhavya Sukhija, Yarden As, Florian Dörfler, Andreas Krause

    Abstract: Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDP). However, various systems are inherently continuous in time, making discrete-time MDPs an inexact modeling choice. In many applications, such as greenhouse control or medical treatments, each interaction (measurement or switching of action) involves manual intervention and thus is inherently…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2405.05890  [pdf, other]

    cs.LG cs.AI

    Safe Exploration Using Bayesian World Models and Log-Barrier Optimization

    Authors: Yarden As, Bhavya Sukhija, Andreas Krause

    Abstract: A major challenge in deploying reinforcement learning in online tasks is ensuring that safety is maintained throughout the learning process. In this work, we propose CERL, a new method for solving constrained Markov decision processes while keeping the policy safe during learning. Our method leverages Bayesian world models and suggests policies that are pessimistic w.r.t. the model's epistemic unc…

    Submitted 9 May, 2024; originally announced May 2024.

  4. arXiv:2403.16644  [pdf, other]

    cs.RO cs.LG

    Bridging the Sim-to-Real Gap with Bayesian Inference

    Authors: Jonas Rothfuss, Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause

    Abstract: We present SIM-FSVGD for learning robot dynamics from data. As opposed to traditional methods, SIM-FSVGD leverages low-fidelity physical priors, e.g., in the form of simulators, to regularize the training of neural network models. While learning accurate dynamics already in the low data regime, SIM-FSVGD scales and excels also when more data is available. We empirically show that learning with imp…

    Submitted 25 March, 2024; originally announced March 2024.

  5. arXiv:2402.15898  [pdf, other]

    cs.LG cs.AI

    Transductive Active Learning: Theory and Applications

    Authors: Jonas Hübotter, Bhavya Sukhija, Lenart Treven, Yarden As, Andreas Krause

    Abstract: We generalize active learning to address real-world settings with concrete prediction targets where sampling is restricted to an accessible region of the domain, while prediction targets may lie outside this region. We analyze a family of decision rules that sample adaptively to minimize uncertainty about prediction targets. We are the first to show, under general regularity assumptions, that such…

    Submitted 22 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.15441

  6. arXiv:2402.15441  [pdf, other]

    cs.LG cs.AI

    Active Few-Shot Fine-Tuning

    Authors: Jonas Hübotter, Bhavya Sukhija, Lenart Treven, Yarden As, Andreas Krause

    Abstract: We study the question: How can we select the right data for fine-tuning to a specific task? We call this data selection problem active fine-tuning and show that it is an instance of transductive active learning, a novel generalization of classical active learning. We propose ITL, short for information-based transductive learning, an approach which samples adaptively to maximize information gained…

    Submitted 21 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  7. arXiv:2311.07558  [pdf, other]

    cs.LG cs.RO

    Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning

    Authors: Arjun Bhardwaj, Jonas Rothfuss, Bhavya Sukhija, Yarden As, Marco Hutter, Stelian Coros, Andreas Krause

    Abstract: We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robo…

    Submitted 6 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  8. arXiv:2310.19848  [pdf, other]

    cs.LG cs.RO math.OC

    Efficient Exploration in Continuous-time Model-based Reinforcement Learning

    Authors: Lenart Treven, Jonas Hübotter, Bhavya Sukhija, Florian Dörfler, Andreas Krause

    Abstract: Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models, and use t…

    Submitted 30 October, 2023; originally announced October 2023.

  9. arXiv:2306.12371  [pdf, other]

    cs.LG cs.RO eess.SY

    Optimistic Active Exploration of Dynamical Systems

    Authors: Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, Sebastian Blaes, Stelian Coros, Andreas Krause

    Abstract: Reinforcement learning algorithms commonly seek to optimize policies for solving one particular task. How should we explore an unknown dynamical system such that the estimated model globally approximates the dynamics and allows us to solve multiple downstream tasks in a zero-shot manner? In this paper, we address this challenge, by developing an algorithm -- OPAX -- for active exploration. OPAX us…

    Submitted 30 October, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  10. arXiv:2306.07092  [pdf, other]

    cs.RO cs.AI

    Tuning Legged Locomotion Controllers via Safe Bayesian Optimization

    Authors: Daniel Widmer, Dongho Kang, Bhavya Sukhija, Jonas Hübotter, Andreas Krause, Stelian Coros

    Abstract: This paper presents a data-driven strategy to streamline the deployment of model-based controllers in legged robotic hardware platforms. Our approach leverages a model-free safe learning algorithm to automate the tuning of control gains, addressing the mismatch between the simplified model used in the control formulation and the real system. This method substantially mitigates the risk of hazardou…

    Submitted 25 October, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted to the 2023 Conference on Robot Learning (CoRL 2023). The first two authors contributed equally. The supplementary video is available at https://youtu.be/zDBouUgegrU and the code implementation is available at https://github.com/lasgroup/gosafeopt

  11. arXiv:2303.01076  [pdf, other]

    cs.LG cs.AI stat.ML

    Hallucinated Adversarial Control for Conservative Offline Policy Evaluation

    Authors: Jonas Rothfuss, Bhavya Sukhija, Tobias Birchler, Parnian Kassraie, Andreas Krause

    Abstract: We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, collected by other agents, we seek to obtain a (tight) lower bound on a policy's performance. This is crucial when deciding whether a given policy satisfies certain minimal performance/safety criteria before it can be deployed in the real world. To this end, we introduce HA…

    Submitted 26 May, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Conference on Uncertainty in Artificial Intelligence (UAI) 2023, first three authors contributed equally

  12. arXiv:2204.04558  [pdf, other]

    cs.RO cs.LG

    Gradient-Based Trajectory Optimization With Learned Dynamics

    Authors: Bhavya Sukhija, Nathanael Köhler, Miguel Zamora, Simon Zimmermann, Sebastian Curi, Andreas Krause, Stelian Coros

    Abstract: Trajectory optimization methods have achieved an exceptional level of performance on real-world robots in recent years. These methods heavily rely on accurate analytical models of the dynamics, yet some aspects of the physical world can only be captured to a limited extent. An alternative approach is to leverage machine learning techniques to learn a differentiable dynamics model of the system fro…

    Submitted 25 June, 2023; v1 submitted 9 April, 2022; originally announced April 2022.

  13. GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems

    Authors: Bhavya Sukhija, Matteo Turchetta, David Lindner, Andreas Krause, Sebastian Trimpe, Dominik Baumann

    Abstract: Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be…

    Submitted 12 June, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

    Journal ref: Artificial Intelligence, Volume 320, Year 2023