Showing 1–5 of 5 results for author: Phatale, S

  1. arXiv:2406.06592  [pdf, other]

    cs.CL cs.LG

    Improve Mathematical Reasoning in Language Models by Automated Process Supervision

    Authors: Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

    Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a leng…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 5 figures, 1 table

  2. arXiv:2403.10704  [pdf, other]

    cs.LG cs.AI cs.CL

    PERL: Parameter Efficient Reinforcement Learning from Human Feedback

    Authors: Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis **, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Simral Chaudhary, Bowen Li, Saravanan Ganesh, Bill Byrne, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi, Lucas Dixon

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method to align Pretrained Large Language Models (LLMs) with human preferences. But training models with RLHF is computationally expensive, and an overall complex process. In this work, we study RLHF where the underlying models are trained using the parameter efficient method of Low-Rank Adaptation (LoRA) introduced by Hu…

    Submitted 15 March, 2024; originally announced March 2024.

  3. arXiv:2309.00267  [pdf, other]

    cs.CL cs.AI cs.LG

    RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

    Authors: Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, Sushant Prakash

    Abstract: Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences. However, gathering high-quality human preference labels can be a time-consuming and expensive endeavor. RL from AI Feedback (RLAIF), introduced by Bai et al., offers a promising alternative that leverages a powerful off-the-shelf LLM to generate preferences in lie…

    Submitted 30 November, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Added two more tasks and many more experiments and analyses (e.g. same-size RLAIF, direct RLAIF, cost analysis)

  4. arXiv:2305.13725  [pdf, other]

    cs.CL cs.IR

    Conversational Recommendation as Retrieval: A Simple, Strong Baseline

    Authors: Raghav Gupta, Renat Aksitov, Samrat Phatale, Simral Chaudhary, Harrison Lee, Abhinav Rastogi

    Abstract: Conversational recommendation systems (CRS) aim to recommend suitable items to users through natural language conversation. However, most CRS approaches do not effectively utilize the signal provided by these conversations. They rely heavily on explicit external knowledge e.g., knowledge graphs to augment the models' understanding of the items and attributes, which is quite hard to scale. To allev…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: To appear at the 5th NLP4ConvAI workshop

  5. arXiv:1910.03634  [pdf, other]

    cs.CV cs.CL cs.LG

    Prose for a Painting

    Authors: Prerna Kashyap, Samrat Phatale, Iddo Drori

    Abstract: Painting captions are often dry and simplistic which motivates us to describe a painting creatively in the style of Shakespearean prose. This is a difficult problem, since there does not exist a large supervised dataset from paintings to Shakespearean prose. Our solution is to use an intermediate English poem description of the painting and then apply language style transfer which results in Shake…

    Submitted 8 October, 2019; originally announced October 2019.

    Journal ref: ICCV Workshop on Closing the Loop Between Vision and Language, 2019