Showing 1–9 of 9 results for author: Jacobs, S A

Searching in archive cs.
  1. arXiv:2406.18820

    cs.DC cs.LG

    Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training

    Authors: Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang

    Abstract: Existing checkpointing approaches seem ill-suited for distributed training even though hardware limitations make model parallelism, i.e., sharding model state across multiple accelerators, a requirement for model scaling. Consolidating distributed model state into a single checkpoint unacceptably slows down training, and is impractical at extreme scales. Distributed checkpoints, in contrast, are t…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2404.14219

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  3. arXiv:2309.14509

    cs.LG cs.CL cs.DC

    DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

    Authors: Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He

    Abstract: Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, system works for accelerating LLM training have focused on the first three dimensions: data parallelism for batch size, tensor parallelism for hidden size and pipeline parallelism for model depth or layers. These widely studie…

    Submitted 4 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  4. arXiv:2306.10209

    cs.DC cs.AI cs.LG cs.PF

    ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

    Authors: Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He

    Abstract: Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale which forces batch size per GPU to be small, ZeRO's effective throughput is limited because of high communication volume from gathering weights in forward pass,…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 12 pages

  5. arXiv:2112.08645

    cs.LG cs.AI cs.NE

    Learning Interpretable Models Through Multi-Objective Neural Architecture Search

    Authors: Zachariah Carmichael, Tim Moon, Sam Ade Jacobs

    Abstract: Monumental advances in deep learning have led to unprecedented achievements across various domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. Research has been introduced to automate the design of neural network architectures through neural architecture search (NAS). Recent progress has made these meth…

    Submitted 4 July, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: International Conference on Automated Machine Learning (AutoML) Workshop

  6. arXiv:1912.02892

    cs.DC cs.LG physics.comp-ph physics.plasm-ph

    Enabling Machine Learning-Ready HPC Ensembles with Merlin

    Authors: J. Luc Peterson, Ben Bay, Joe Koning, Peter Robinson, Jessica Semler, Jeremy White, Rushil Anirudh, Kevin Athey, Peer-Timo Bremer, Francesco Di Natale, David Fox, Jim A. Gaffney, Sam A. Jacobs, Bhavya Kailkhura, Bogdan Kustowski, Steven Langer, Brian Spears, Jayaraman Thiagarajan, Brian Van Essen, Jae-Seung Yeom

    Abstract: With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large scale ensemble data. With complexities such as multi-component workflows, heterogeneous machine architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high performance computin…

    Submitted 1 July, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: 28 pages, 9 figures; Submitted to FGCS

    Report number: LLNL-JRNL-821884

  7. arXiv:1910.02270

    cs.DC cs.LG hep-ex physics.comp-ph

    Parallelizing Training of Deep Generative Models on Massive Scientific Datasets

    Authors: Sam Ade Jacobs, Brian Van Essen, David Hysom, Jae-Seung Yeom, Tim Moon, Rushil Anirudh, Jayaraman J. Thiagarajan, Shusen Liu, Peer-Timo Bremer, Jim Gaffney, Tom Benson, Peter Robinson, Luc Peterson, Brian Spears

    Abstract: Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train traditional as well as generative adversarial networks built on LBANN, a scalable deep learning framework optimized for HPC systems. LBANN combines multiple levels of par…

    Submitted 5 October, 2019; originally announced October 2019.

  8. arXiv:1907.08325

    cs.LG cs.HC cs.NE stat.ML

    Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications

    Authors: Shusen Liu, Di Wang, Dan Maljovec, Rushil Anirudh, Jayaraman J. Thiagarajan, Sam Ade Jacobs, Brian C. Van Essen, David Hysom, Jae-Seung Yeom, Jim Gaffney, Luc Peterson, Peter B. Robinson, Harsh Bhatia, Valerio Pascucci, Brian K. Spears, Peer-Timo Bremer

    Abstract: With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the utilization of black box models (e.g., deep neural networks) calls for advanced techniques in exploring and interpreting model behaviors. Second, the rapid growth in computing has produced enormous datasets that re…

    Submitted 18 July, 2019; originally announced July 2019.

  9. arXiv:1901.11152

    cs.LG stat.ML

    Distinguishing between Normal and Cancer Cells Using Autoencoder Node Saliency

    Authors: Ya Ju Fan, Jonathan E. Allen, Sam Ade Jacobs, Brian C. Van Essen

    Abstract: Gene expression profiles have been widely used to characterize patterns of cellular responses to diseases. As data becomes available, scalable learning toolkits become essential to processing large datasets using deep learning models to model complex biological processes. We present an autoencoder to capture nonlinear relationships recovered from gene expression profiles. The autoencoder is a nonl…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: Second Workshop on HPC Applications in Precision Medicine, June 2018