Showing 1–11 of 11 results for author: Won, W

Searching in archive cs.
  1. arXiv:2406.19580

    cs.AR cs.LG

    FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

    Authors: Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna

    Abstract: Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks into multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating h…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2307.14549

    cs.LG cs.AI

    Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application

    Authors: Jianjun Yuan, Wei Lee Woon, Ludovik Coba

    Abstract: This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with reg…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted by RecSys 2023 conference

  3. arXiv:2304.05301

    cs.DC cs.LG

    TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning

    Authors: William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Ajaya Durg, Samvit Kaul, Swati Gupta, Tushar Krishna

    Abstract: The surge of artificial intelligence, specifically large language models, has led to a rapid advent towards the development of large-scale machine learning training clusters. Collective communications within these clusters tend to be heavily bandwidth-bound, necessitating techniques to optimally utilize the available network bandwidth. This puts the routing algorithm for the collective at the fore…

    Submitted 29 March, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

  4. arXiv:2303.14006

    cs.DC cs.LG

    ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

    Authors: William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna

    Abstract: As deep learning models and input data are scaling at an unprecedented rate, it is inevitable to move towards distributed training platforms to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale nodes, multi-dimensional network topologies, disaggregated memory systems, and parallelization strategies, have been actively adopted by emergin…

    Submitted 24 March, 2023; originally announced March 2023.

  5. arXiv:2110.04478

    cs.DC cs.AR cs.LG cs.NI

    Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models

    Authors: Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, Tushar Krishna

    Abstract: Distributed training is a solution to reduce DNN training time by splitting the task across multiple NPUs (e.g., GPU/TPU). However, distributed training adds communication overhead between the NPUs in order to synchronize the gradients and/or activation, depending on the parallelization strategy. In next-generation platforms for training at scale, NPUs will be connected through multi-dimensional n…

    Submitted 7 July, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  6. LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models

    Authors: William Won, Saeed Rashidi, Sudarshan Srinivasan, Tushar Krishna

    Abstract: As model sizes in machine learning continue to scale, distributed training is necessary to accommodate model weights within each device and to reduce training time. However, this comes with the expense of increased communication overhead due to the exchange of gradients and activations, which become the critical bottleneck of the end-to-end training process. In this work, we motivate the design of…

    Submitted 5 May, 2024; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: Contains 10 main pages, 21 figures, 3 tables

    Journal ref: Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '24)

  7. arXiv:2103.10452

    cs.DC

    Extending Sparse Tensor Accelerators to Support Multiple Compression Formats

    Authors: Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon E. Moon, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings. These applications use data stored in a variety of compression formats. We demonstrate that both the compactness of different compression formats and the compute efficiency of the algorithms enab…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted for publication at the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)

  8. Explainable AI as a Social Microscope: A Case Study on Academic Performance

    Authors: Anahit Sargsyan, Areg Karapetyan, Wei Lee Woon, Aamena Alshamsi

    Abstract: Academic performance is perceived as a product of complex interactions between students' overall experience, personal characteristics and upbringing. Data science techniques, most commonly involving regression analysis and related approaches, serve as a viable means to explore this interplay. However, these tend to extract factors with wide-ranging impact, while overlooking variations specific to…

    Submitted 4 June, 2020; v1 submitted 7 June, 2018; originally announced June 2018.

  9. arXiv:1803.02282

    cs.DL cs.SI physics.soc-ph

    The Preeminence of Ethnic Diversity in Scientific Collaboration

    Authors: Bedoor K AlShebli, Talal Rahwan, Wei Lee Woon

    Abstract: Inspired by the social and economic benefits of diversity, we analyze over 9 million papers and 6 million scientists to study the relationship between research impact and five classes of diversity: ethnicity, discipline, gender, affiliation, and academic age. Using randomized baseline models, we establish the presence of homophily in ethnicity, gender and affiliation. We then study the effect of d…

    Submitted 20 November, 2020; v1 submitted 6 March, 2018; originally announced March 2018.

    Journal ref: Nature Communications, 9(1), 2018, 5163

  10. arXiv:1802.06964

    cs.CV

    Co-occurrence matrix analysis-based semi-supervised training for object detection

    Authors: Min-Kook Choi, Jaehyeong Park, Jihun Jung, Heechul Jung, **-Hee Lee, Woong Jae Won, Woo Young Jung, **cheol Kim, Soon Kwon

    Abstract: One of the most important factors in training object recognition networks using convolutional neural networks (CNNs) is the provision of annotated data accompanying human judgment. Particularly, in object detection or semantic segmentation, the annotation process requires considerable human effort. In this paper, we propose a semi-supervised learning (SSL)-based training methodology for object det…

    Submitted 19 February, 2018; originally announced February 2018.

    Comments: Submitted to International Conference on Image Processing (ICIP) 2018

  11. arXiv:1006.2570

    math.GR cs.CC

    Power Circuits, Exponential Algebra, and Time Complexity

    Authors: Alexei G. Myasnikov, Alexander Ushakov, Dong Wook Won

    Abstract: Motivated by algorithmic problems from combinatorial group theory we study computational properties of integers equipped with binary operations +, -, z = x 2^y, z = x 2^{-y} (the former two are partial) and predicates < and =. Notice that in this case very large numbers, which are obtained as n towers of exponentiation in the base 2 can be realized as n applications of the operation x2^y, so worki…

    Submitted 13 June, 2010; originally announced June 2010.
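Entry 7 notes that the compactness of sparse data depends on the compression format used. As a rough illustration only (not the accelerator or the set of formats studied in that paper), the following sketch converts a small dense matrix into two common formats, coordinate (COO) and compressed sparse row (CSR), and counts the stored entries. The helper names and the 4×4 example matrix are hypothetical choices made for this sketch.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration; not taken from the paper in entry 7.
Dense = List[List[float]]


def to_coo(dense: Dense) -> Tuple[List[int], List[int], List[float]]:
    """Coordinate format: one (row, col, value) triple per nonzero."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def to_csr(dense: Dense) -> Tuple[List[int], List[int], List[float]]:
    """Compressed sparse row: row i's nonzeros sit at row_ptr[i]:row_ptr[i+1]."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))  # running count of nonzeros after this row
    return row_ptr, col_idx, vals


def csr_to_dense(row_ptr, col_idx, vals, n_cols: int) -> Dense:
    """Rebuild the dense matrix from CSR to check that the conversion round-trips."""
    dense = []
    for i in range(len(row_ptr) - 1):
        row = [0] * n_cols
        for k in range(row_ptr[i], row_ptr[i + 1]):
            row[col_idx[k]] = vals[k]
        dense.append(row)
    return dense


def stored_words(*arrays) -> int:
    """Total number of stored entries (indices plus values)."""
    return sum(len(a) for a in arrays)


A = [
    [0, 0, 3, 0],
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 5, 0, 6],
]

coo = to_coo(A)
csr = to_csr(A)
print("dense:", stored_words(*A))  # 16 entries
print("COO:", stored_words(*coo))  # 3 * nnz = 12
print("CSR:", stored_words(*csr))  # 2 * nnz + rows + 1 = 13
assert csr_to_dense(*csr, n_cols=4) == A
```

COO stores 3·nnz entries, while CSR stores 2·nnz + rows + 1, so CSR is the more compact of the two only when the number of nonzeros exceeds rows + 1. In this example, with four nonzeros spread over four rows, COO comes out slightly smaller.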
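Entry 11 observes that a tower of n exponentiations in base 2 can be produced by n applications of the operation x·2^y. A minimal sketch of that observation, using Python's arbitrary-precision integers, follows. The names mul_pow2 and tower are hypothetical, and the sketch covers only y ≥ 0; it does not model the paper's full structure (x·2^{-y}, the partial operations, or the predicates).

```python
def mul_pow2(x: int, y: int) -> int:
    """z = x * 2**y for y >= 0, computed with a left shift."""
    if y < 0:
        raise ValueError("this sketch only handles y >= 0")
    return x << y


def tower(n: int) -> int:
    """Apply z -> 1 * 2**z n times starting from z = 1, giving a tower of n twos."""
    z = 1
    for _ in range(n):
        z = mul_pow2(1, z)
    return z


for n in range(5):
    print(n, tower(n))  # values: 1, 2, 4, 16, 65536

# Five applications of the operation already give a 65,537-bit number.
print(tower(5).bit_length())
```

The number of operations grows linearly in n, while the bit length of the result grows as a tower, which is why explicit binary expansions of such numbers quickly become impractical.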