Showing 1–1 of 1 results for author: Sabet, A H N

  1. arXiv:2105.05720  [pdf, other]

    cs.DC cs.LG cs.PL

    Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads

    Authors: Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Sarikivi

    Abstract: The recent trend towards increasingly large machine learning models requires both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain the best performance. However, the current logical separation between computation and communication kernels in deep learning frameworks misses the op…

    Submitted 26 March, 2022; v1 submitted 12 May, 2021; originally announced May 2021.