Showing 1–2 of 2 results for author: Fantl, J

  1. arXiv:2309.13541  [pdf, other]

    cs.DC cs.NI

    Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies

    Authors: Prithwish Basu, Liangyu Zhao, Jason Fantl, Siddharth Pal, Arvind Krishnamurthy, Joud Khoury

    Abstract: The all-to-all collective communications primitive is widely used in machine learning (ML) and high performance computing (HPC) workloads, and optimizing its performance is of interest to both ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This paper takes a holistic approach to optimize the perform…

    Submitted 25 April, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: HPDC '24

  2. arXiv:2202.03356  [pdf, other]

    cs.NI cs.DC cs.LG

    Efficient Direct-Connect Topologies for Collective Communications

    Authors: Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, Prithwish Basu, Joud Khoury, Arvind Krishnamurthy

    Abstract: We consider the problem of distilling efficient network topologies for collective communications. We provide an algorithmic framework for constructing direct-connect topologies optimized for the latency vs. bandwidth trade-off associated with the workload. Our approach synthesizes many different topologies and schedules for a given cluster size and degree and then identifies the appropriate topolo…

    Submitted 12 May, 2024; v1 submitted 7 February, 2022; originally announced February 2022.