Skip to main content

Showing 1–6 of 6 results for author: Jha, A H

.
  1. arXiv:2402.00838  [pdf, other

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam , et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models… ▽ More

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  2. arXiv:2402.00159  [pdf, other

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen , et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat… ▽ More

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  3. arXiv:2312.10523  [pdf, other

    cs.CL cs.AI cs.LG

    Paloma: A Benchmark for Evaluating Language Model Fit

    Authors: Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge

    Abstract: Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains$\unicode{x2013}$varying distributions of language. Rather than assuming perplexity on one distribution extrapolates to others, Perplexity Analysis for Language Model Assessment (Paloma), measures LM fit to 585 text domains, ranging from nytimes.com… ▽ More

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Project Page: https://paloma.allen.ai/

  4. arXiv:2305.14864  [pdf, other

    cs.CL

    How To Train Your (Compressed) Large Language Model

    Authors: Ananya Harsh Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy

    Abstract: With the increase in the size of large language models (LLMs), we need compression methods that can reduce the model size while preserving the generality and zero-shot promptability of the model. This goal is more ambitious than the typical compression setup, which reduces the model's size at the expense of specializing it to a specific end-task. To study this, we develop a task-agnostic compressi… ▽ More

    Submitted 18 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 13 pages, 6 figures, 5 tables

  5. arXiv:2107.12329  [pdf, other

    cs.LG

    AASAE: Augmentation-Augmented Stochastic Autoencoders

    Authors: William Falcon, Ananya Harsh Jha, Teddy Koker, Kyunghyun Cho

    Abstract: Recent methods for self-supervised learning can be grouped into two paradigms: contrastive and non-contrastive approaches. Their success can largely be attributed to data augmentation pipelines which generate multiple views of a single input that preserve the underlying semantics. In this work, we introduce augmentation-augmented stochastic autoencoders (AASAE), yet another alternative to self-sup… ▽ More

    Submitted 6 February, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: 17 pages, 5 figures, 3 table

  6. arXiv:1804.10469  [pdf, other

    cs.CV

    Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders

    Authors: Ananya Harsh Jha, Saket Anand, Maneesh Singh, V. S. R. Veeravasarapu

    Abstract: Generative models that learn disentangled representations for different factors of variation in an image can be very useful for targeted data augmentation. By sampling from the disentangled latent subspace of interest, we can efficiently generate new data necessary for a particular task. Learning disentangled representations is a challenging problem, especially when certain factors of variation ar… ▽ More

    Submitted 27 April, 2018; originally announced April 2018.