Skip to main content

Showing 1–4 of 4 results for author: Vassilieva, N

Searching in archive cs. Search in all archives.
.
  1. arXiv:2404.14779  [pdf, other

    cs.CL

    Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches

    Authors: Clément Christophe, Praveen K Kanithi, Prateek Munjal, Tathagata Raha, Nasir Hayat, Ronnie Rajan, Ahmed Al-Mahrooqi, Avani Gupta, Muhammad Umar Salman, Gurpreet Gosal, Bhargav Kanakiya, Charles Chen, Natalia Vassilieva, Boulbaba Ben Amor, Marco AF Pimentel, Shadab Khan

    Abstract: This study presents a comprehensive analysis and comparison of two predominant fine-tuning methodologies - full-parameter fine-tuning and parameter-efficient tuning - within the context of medical Large Language Models (LLMs). We developed and refined a series of LLMs, based on the Llama-2 architecture, specifically designed to enhance medical knowledge retrieval, reasoning, and question-answering… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Published at AAAI 2024 Spring Symposium - Clinical Foundation Models

  2. arXiv:2309.11568  [pdf, other

    cs.AI cs.CL cs.LG

    BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

    Authors: Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming, Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness

    Abstract: We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models by 2-5.5% across downstream tasks. BTLM-3B-8K is even competitive with some 7B parameter mod… ▽ More

    Submitted 20 September, 2023; originally announced September 2023.

  3. arXiv:2309.10818  [pdf, other

    cs.CL cs.AI

    SlimPajama-DC: Understanding Data Combinations for LLM Training

    Authors: Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing

    Abstract: This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the pretraining of large language models using SlimPajama. SlimPajama is a rigorously deduplicated, multi-source dataset, which has been refined and further deduplicated to 627B tokens from the extensive 1.2T token RedPajama dataset contributed by Together. We have termed our resear… ▽ More

    Submitted 9 May, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Technical report. Models at: https://huggingface.co/MBZUAI-LLM/SlimPajama-DC and dataset at: https://huggingface.co/datasets/MBZUAI-LLM/SlimPajama-627B-DC

  4. arXiv:2308.16149  [pdf, other

    cs.CL cs.AI cs.LG

    Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

    Authors: Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock , et al. (7 additional authors not shown)

    Abstract: We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. With 13 billion parameters, they demonstrate better knowledge and reasoning… ▽ More

    Submitted 29 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Arabic-centric, foundation model, large-language model, LLM, generative model, instruction-tuned, Jais, Jais-chat

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7