Showing 1–1 of 1 results for author: Langman, J

  1. Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training

    Authors: Mark Zhao, Niket Agarwal, Aarti Basant, Bugra Gedik, Satadru Pan, Mustafa Ozdal, Rakesh Komuravelli, Jerry Pan, Tianshu Bao, Haowei Lu, Sundaram Narayanan, Jack Langman, Kevin Wilfong, Harsha Rastogi, Carole-Jean Wu, Christos Kozyrakis, Parik Pol

    Abstract: Datacenter-scale AI training clusters consisting of thousands of domain-specific accelerators (DSA) are used to train increasingly-complex deep learning models. These clusters rely on a data storage and ingestion (DSI) pipeline, responsible for storing exabytes of training data and serving it at tens of terabytes per second. As DSAs continue to push training efficiency and throughput, the DSI pipe…

    Submitted 22 April, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

    Comments: In The 49th Annual International Symposium on Computer Architecture (ISCA 2022)