
Showing 1–10 of 10 results for author: Raparthy, S

Searching in archive cs.
  1. arXiv:2403.04642  [pdf, other]

    cs.LG

    Teaching Large Language Models to Reason with Reinforcement Learning

    Authors: Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from feedback (Expert Iteration, Proximal Policy Optimization (PPO), Return-Conditioned RL) on improving LLM reasoning capabilities. We investigate both spa…

    Submitted 7 March, 2024; originally announced March 2024.

  2. arXiv:2402.16822  [pdf, other]

    cs.CL cs.AI cs.LG

    Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

    Authors: Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

    Abstract: As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel app…

    Submitted 26 February, 2024; originally announced February 2024.

  3. arXiv:2402.10963  [pdf, other]

    cs.CL cs.LG

    GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

    Authors: Alex Havrilla, Sharath Raparthy, Christoforus Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu

    Abstract: State-of-the-art language models can exhibit impressive reasoning refinement capabilities on math, science or coding tasks. However, recent work demonstrates that even the best models struggle to identify when and where to refine without access to external feedback. Outcome-based Reward Models (ORMs), trained to predict correctness of the final answer indicating when to refine, o…

    Submitted 24 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  4. arXiv:2312.03801  [pdf, other]

    cs.LG cs.AI

    Generalization to New Sequential Decision Making Tasks with In-Context Learning

    Authors: Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

    Abstract: Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new language or vision tasks without any weight updates from only a few examples, also referred to as in-context learning. However, the sequential decision making setting poses additional challenges having a lower…

    Submitted 6 December, 2023; originally announced December 2023.

  5. arXiv:2210.12765  [pdf, other]

    cs.LG stat.ML

    Multi-Objective GFlowNets

    Authors: Moksh Jain, Sharath Chandra Raparthy, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Yoshua Bengio, Santiago Miret, Emmanuel Bengio

    Abstract: We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. Moreover, these objectives are often imperfect evaluations of some underlying property of interest, mak…

    Submitted 17 July, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: 23 pages, 8 figures. ICML 2023. Code at: https://github.com/GFNOrg/multi-objective-gfn

  6. arXiv:2112.07066  [pdf, other]

    cs.LG

    Continual Learning In Environments With Polynomial Mixing Times

    Authors: Matthew Riemer, Sharath Chandra Raparthy, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel, Irina Rish

    Abstract: The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In…

    Submitted 13 October, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted at NeurIPS 2022

  7. arXiv:2110.09419  [pdf, other]

    cs.LG

    Compositional Attention: Disentangling Search and Retrieval

    Authors: Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

    Abstract: Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected e…

    Submitted 13 February, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

  8. arXiv:2002.07956  [pdf, other]

    cs.LG cs.AI stat.ML

    Curriculum in Gradient-Based Meta-Reinforcement Learning

    Authors: Bhairav Mehta, Tristan Deleu, Sharath Chandra Raparthy, Chris J. Pal, Liam Paull

    Abstract: Gradient-based meta-learners such as Model-Agnostic Meta-Learning (MAML) have shown strong few-shot performance in supervised and reinforcement learning settings. However, specifically in the case of meta-reinforcement learning (meta-RL), we can show that gradient-based meta-learners are sensitive to task distributions. With the wrong curriculum, agents suffer the effects of meta-overfitting, shal…

    Submitted 18 February, 2020; originally announced February 2020.

    Comments: 11 pages, 10 figures

  9. arXiv:2002.07911  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Generating Automatic Curricula via Self-Supervised Active Domain Randomization

    Authors: Sharath Chandra Raparthy, Bhairav Mehta, Florian Golemo, Liam Paull

    Abstract: Goal-directed Reinforcement Learning (RL) traditionally considers an agent interacting with an environment, prescribing a real-valued reward to an agent proportional to the completion of some goal. Goal-directed RL has seen large gains in sample efficiency, due to the ease of reusing or generating new experience by proposing goals. One approach, self-play, allows an agent to "play" against itself b…

    Submitted 26 October, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  10. arXiv:1911.06786  [pdf, other]

    cs.LG cs.CV

    Data Efficient Stagewise Knowledge Distillation

    Authors: Akshay Kulkarni, Navid Panchi, Sharath Chandra Raparthy, Shital Chiddarwar

    Abstract: Despite the success of Deep Learning (DL), the deployment of modern DL models requiring large computational power poses a significant problem for resource-constrained systems. This necessitates building compact networks that reduce computations while preserving performance. Traditional Knowledge Distillation (KD) methods that transfer knowledge from teacher to student (a) use a single-stage and (b…

    Submitted 23 June, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

    Comments: 15 pages, 1 figure, 6 tables, and 1 algorithm