Showing 1–2 of 2 results for author: Vainbrand, D

  1. arXiv:2203.04910  [pdf, other]

    cs.DC cs.AR cs.OS cs.PF

    GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture

    Authors: Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seung Won Min, Amna Masood, Jeongmin Park, Jinjun Xiong, CJ Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William Dally, Wen-mei Hwu

    Abstract: Graphics Processing Units (GPUs) have traditionally relied on the host CPU to initiate access to the data storage. This approach is well-suited for GPU applications with known data access patterns that enable partitioning of their dataset to be processed in a pipelined fashion in the GPU. However, emerging applications such as graph and data analytics, recommender systems, or graph neural networks…

    Submitted 6 February, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: This is an extension to the published conference paper at ASPLOS'23: https://dl.acm.org/doi/abs/10.1145/3575693.3575748

    Journal ref: ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2

  2. arXiv:2104.04473  [pdf, other]

    cs.CL cs.DC

    Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

    Authors: Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

    Abstract: Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models on even a multi-GPU server, and b) the number of compute operations required to train these models can result in unrealistically long training times. Consequently…

    Submitted 23 August, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted to SC 2021