Showing 1–3 of 3 results for author: Ramesh, S S

Searching in archive cs.
  1. arXiv:2405.20304  [pdf, other]

    cs.CL cs.LG

    Group Robust Preference Optimization in Reward-free RLHF

    Authors: Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj Mehta, Pier Giuseppe Sessa, Haitham Bou Ammar, Ilija Bogunovic

    Abstract: Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse labelers' groups (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF approaches adopt a "one-size-fits-all" approach, i.e., they indiscriminately assume and optimiz…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Preprint

  2. arXiv:2309.02236  [pdf, other]

    cs.LG cs.AI stat.ML

    Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

    Authors: Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic

    Abstract: Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. To overcome these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and…

    Submitted 5 September, 2023; originally announced September 2023.

    Journal ref: AISTATS 2024

  3. arXiv:2210.08087  [pdf, other]

    stat.ML cs.LG

    Movement Penalized Bayesian Optimization with Application to Wind Energy Systems

    Authors: Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Andreas Krause, Ilija Bogunovic

    Abstract: Contextual Bayesian optimization (CBO) is a powerful framework for sequential decision-making given side information, with important applications, e.g., in wind energy systems. In this setting, the learner receives context (e.g., weather conditions) at each round, and has to choose an action (e.g., turbine parameters). Standard algorithms assume no cost for switching their decisions at every round…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022