Showing 1–4 of 4 results for author: Zhuravinskyi, M

Searching in archive cs.
  1. arXiv:2404.01226  [pdf, other]

    cs.CL

    Stable Code Technical Report

    Authors: Nikhil Pinnaparaju, Reshinth Adithyan, Duy Phung, Jonathan Tow, James Baicoianu, Ashish Datta, Maksym Zhuravinskyi, Dakota Mahan, Marco Bellagente, Carlos Riquelme, Nathan Cooper

    Abstract: We introduce Stable Code, the first in our new-generation of code language models series, which serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering-based tasks. Additionally, we introduce an instruction variant named Stable Code Instruct that allows conversing with the model in a natural chat interface for performing quest…

    Submitted 1 April, 2024; originally announced April 2024.

  2. arXiv:2403.04642  [pdf, other]

    cs.LG

    Teaching Large Language Models to Reason with Reinforcement Learning

    Authors: Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from feedback (Expert Iteration, Proximal Policy Optimization (PPO), Return-Conditioned RL) on improving LLM reasoning capabilities. We investigate both spa…

    Submitted 7 March, 2024; originally announced March 2024.

  3. arXiv:2402.17834  [pdf, other]

    cs.CL stat.ML

    Stable LM 2 1.6B Technical Report

    Authors: Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, Reshinth Adithyan, James Baicoianu, Ben Brooks, Nathan Cooper, Ashish Datta, Meng Lee, Emad Mostaque, Michael Pieler, Nikhil Pinnaparju, Paulo Rocha, Harry Saini, Hannah Teufel, Niccolo Zanichelli, Carlos Riquelme

    Abstract: We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including z…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 23 pages, 6 figures

  4. arXiv:2402.10963  [pdf, other]

    cs.CL cs.LG

    GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

    Authors: Alex Havrilla, Sharath Raparthy, Christoforus Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu

    Abstract: State-of-the-art language models can exhibit impressive reasoning refinement capabilities on math, science or coding tasks. However, recent work demonstrates that even the best models struggle to identify when and where to refine without access to external feedback. Outcome-based Reward Models (ORMs), trained to predict correctness of the final answer indicating when to refine, o…

    Submitted 24 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.