Showing 1–18 of 18 results for author: Rashidi, S

  1. arXiv:2406.19580  [pdf, other]

    cs.AR cs.LG

    FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

    Authors: Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna

    Abstract: Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks into multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating h…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2403.14728  [pdf, other]

    q-bio.OT

    Investigation of Genomic Effect of Zirconium Oxide Nanoparticles in Escherichia coli Bacteria

    Authors: Simin Rashidi, Bahram Golestani Eimani

    Abstract: Due to society's concerns about the increase in antibiotic-resistant infections, many studies have been conducted on nanoparticles and applications of nano-biotechnology. Zirconium oxide ($\text{ZrO}_{2}$), also called zirconia, is a white oxide of zirconium metal with a diameter of 20 nm. The colloidal size of these particles is often smaller than bacterial and eukaryotic ce…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 18 Pages, 6 Figures, 6 Tables

  3. arXiv:2309.04902  [pdf, other]

    cs.CV

    Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art

    Authors: Aref Miri Rekavandi, Shima Rashidi, Farid Boussaid, Stephen Hoefs, Emre Akbas, Mohammed Bennamoun

    Abstract: Transformers have rapidly gained popularity in computer vision, especially in the field of object recognition and detection. Upon examining the outcomes of state-of-the-art object detection methods, we noticed that transformers consistently outperformed well-established CNN-based detectors in almost every video or image dataset. While transformer-based approaches remain at the forefront of small o…

    Submitted 9 September, 2023; originally announced September 2023.

  4. arXiv:2308.10872  [pdf, ps, other]

    math.CO

    A study of $4$-cycle systems

    Authors: B. Bagheri Gh., M. Khosravi, E. S. Mahmoodian, S. Rashidi

    Abstract: A $4$-cycle system is a partition of the edges of the complete graph $K_n$ into $4$-cycles. Let $C$ be a collection of cycles of length 4 whose edges partition the edges of $K_n$. A set of 4-cycles $T_1 \subset C$ is called a 4-cycle trade if there exists a set $T_2$ of edge-disjoint 4-cycles on the same vertices, such that $(C \setminus T_1)\cup T_2$ also is a collection of cycles of length…

    Submitted 21 August, 2023; originally announced August 2023.

    MSC Class: 05B30; 05B05

  5. arXiv:2305.14516  [pdf, other]

    cs.LG cs.DC

    Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

    Authors: Srinivas Sridharan, Taekyung Heo, Louis Feng, Zhaodong Wang, Matt Bergeron, Wenyin Fu, Shengbao Zheng, Brian Coutinho, Saeed Rashidi, Changhai Man, Tushar Krishna

    Abstract: Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g. MLPerf, play an essential role in enabling fair comparison across different software and hardware stacks especially once systems are fully designed and deployed. However, the pace of AI innovation demands a more agile methodol…

    Submitted 26 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  6. arXiv:2303.14006  [pdf, other]

    cs.DC cs.LG

    ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

    Authors: William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna

    Abstract: As deep learning models and input data are scaling at an unprecedented rate, it is inevitable to move towards distributed training platforms to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale nodes, multi-dimensional network topologies, disaggregated memory systems, and parallelization strategies, have been actively adopted by emergin…

    Submitted 24 March, 2023; originally announced March 2023.

  7. arXiv:2211.16648  [pdf, other]

    cs.DC cs.AI cs.LG

    COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training

    Authors: Divya Kiran Kadiyala, Saeed Rashidi, Taekyung Heo, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexandros Daglis

    Abstract: Modern Deep Learning (DL) models have grown to sizes requiring massive clusters of specialized, high-end nodes to train. Designing such clusters to maximize both performance and utilization--to amortize their steep cost--is a challenging task requiring careful balance of compute, memory, and network resources. Moreover, a plethora of each model's tuning knobs drastically affect the performance, wi…

    Submitted 14 March, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

  8. arXiv:2210.12947  [pdf, other]

    cs.LG cs.CV

    IT-RUDA: Information Theory Assisted Robust Unsupervised Domain Adaptation

    Authors: Shima Rashidi, Ruwan Tennakoon, Aref Miri Rekavandi, Papangkorn Jessadatavornwong, Amanda Freis, Garret Huff, Mark Easton, Adrian Mouritz, Reza Hoseinnezhad, Alireza Bab-Hadiashar

    Abstract: Distribution shift between train (source) and test (target) datasets is a common problem encountered in machine learning applications. One approach to resolve this issue is to use the Unsupervised Domain Adaptation (UDA) technique that carries out knowledge transfer from a label-rich source domain to an unlabeled target domain. Outliers that exist in either source or target datasets can introduce…

    Submitted 24 October, 2022; originally announced October 2022.

  9. arXiv:2207.10898  [pdf, other]

    cs.NI cs.AI

    Impact of RoCE Congestion Control Policies on Distributed Training of DNNs

    Authors: Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, Tushar Krishna

    Abstract: RDMA over Converged Ethernet (RoCE) has gained significant traction for datacenter networks due to its compatibility with conventional Ethernet-based fabric. However, the RDMA protocol is efficient only on (nearly) lossless networks, emphasizing the vital role of congestion control on RoCE networks. Unfortunately, the native RoCE congestion control scheme, based on Priority Flow Control (PFC), s…

    Submitted 22 July, 2022; originally announced July 2022.

  10. arXiv:2110.04478  [pdf, other]

    cs.DC cs.AR cs.LG cs.NI

    Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models

    Authors: Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, Tushar Krishna

    Abstract: Distributed training is a solution to reduce DNN training time by splitting the task across multiple NPUs (e.g., GPU/TPU). However, distributed training adds communication overhead between the NPUs in order to synchronize the gradients and/or activation, depending on the parallelization strategy. In next-generation platforms for training at scale, NPUs will be connected through multi-dimensional n…

    Submitted 7 July, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  11. LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models

    Authors: William Won, Saeed Rashidi, Sudarshan Srinivasan, Tushar Krishna

    Abstract: As model sizes in machine learning continue to scale, distributed training is necessary to accommodate model weights within each device and to reduce training time. However, this comes with the expense of increased communication overhead due to the exchange of gradients and activations, which become the critical bottleneck of the end-to-end training process. In this work, we motivate the design of…

    Submitted 5 May, 2024; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: Contains 10 main pages, 21 figures, 3 tables

    Journal ref: Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '24)

  12. Dynamical Equilibrium States of a Class of Irrotational Non-Orthogonally Transitive $G_{2}$ Cosmologies II: Models With One Hypersurface-Orthogonal Killing Vector Field

    Authors: Sepehr Rashidi, C. G. Hewitt, Benoit Charbonneau

    Abstract: We consider a class of inhomogeneous self-similar cosmological models in which the perfect fluid flow is tangential to the orbits of a three-parameter similarity group. We restrict the similarity group to possess both an Abelian $G_{2}$, and a single hypersurface orthogonal Killing vector field, and we restrict the fluid flow to be orthogonal to the orbits of the Abelian $G_{2}$. The temporal evol…

    Submitted 30 March, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Only change is reference to companion paper arXiv:2103.16431 added

  13. arXiv:2008.08289  [pdf, other]

    cs.LG cs.DC stat.ML

    Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference

    Authors: Afshin Abdi, Saeed Rashidi, Faramarz Fekri, Tushar Krishna

    Abstract: Using multiple nodes and parallel computing algorithms has become a principal tool to improve training and execution times of deep neural networks as well as effective collective intelligence in sensor networks. In this paper, we consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers) where the deep model is divided into several parallel…

    Submitted 19 August, 2020; originally announced August 2020.

  14. Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms

    Authors: Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Ni, Tushar Krishna

    Abstract: Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth. However, as we identify in this work, driving this bandwidth is quite challenging. This is because there is a pernicious balance between using the accelerator's compute and memory for both DL computations and commu…

    Submitted 4 May, 2022; v1 submitted 30 June, 2020; originally announced July 2020.

  15. arXiv:2004.01542  [pdf, ps, other]

    math.CO

    A Completion of the spectrum of 3-way $(v,k,2)$ Steiner trades

    Authors: Saeedeh Rashidi, Nasrin Soltankhah

    Abstract: A 3-way $(v,k,t)$ trade $T$ of volume $m$ consists of three pairwise disjoint collections $T_1$, $T_2$ and $T_3$, each of $m$ blocks of size $k$, such that for every $t$-subset of $v$-set $V$, the number of blocks containing this $t$-subset is the same in each $T_i$ for $1\leq i\leq 3$. If any $t$-subset of found($T$) occurs at most once in each $T_i$ for $1\leq i\leq 3$, then $T$ is called 3-way…

    Submitted 3 April, 2020; originally announced April 2020.

  16. Machine Learning Distinguishes Neurosurgical Skill Levels in a Virtual Reality Tumor Resection Task

    Authors: Samaneh Siyar, Hamed Azarnoush, Saeid Rashidi, Alexandre Winkler-Schwartz, Vincent Bissonnette, Nirros Ponnudurai, Rolando F. Del Maestro

    Abstract: Background: Virtual reality simulators and machine learning have the potential to augment understanding, assessment and training of psychomotor performance in neurosurgery residents. Objective: This study outlines the first application of machine learning to distinguish "skilled" and "novice" psychomotor performance during a virtual reality neurosurgical task. Methods: Twenty-three neurosurgeons a…

    Submitted 20 November, 2018; originally announced November 2018.

  17. arXiv:1310.7759  [pdf, ps, other]

    math.CO

    On the possible volume of $μ$-$(v,k,t)$ trades

    Authors: Saeedeh Rashidi, Nasrin Soltankhah

    Abstract: A $μ$-way $(v,k,t)$ $trade$ of volume $m$ consists of $μ$ disjoint collections $T_1$, $T_2, \dots T_μ$, each of $m$ blocks, such that for every $t$-subset of $v$-set $V$ the number of blocks containing this $t$-subset is the same in each $T_i\ (1\leq i\leq μ)$. In other words any pair of collections $\{T_i,T_j\}$, $1\leq i<j \leq μ$ is a $(v,k,t)$ trade of volume $m$. In this paper we investigate…

    Submitted 29 October, 2013; originally announced October 2013.

    Comments: 12 pages, (accepted). Bull. Iranian Math. Soc., 2013

    MSC Class: 05B30; 05B05

  18. arXiv:1301.4764  [pdf, ps, other]

    math.CO

    The 3-way intersection problem for S(2, 4, v) designs

    Authors: Saeedeh Rashidi, Nasrin Soltankhah

    Abstract: In this paper the 3-way intersection problem for $S(2,4,v)$ designs is investigated. Let $b_{v}=\frac {v(v-1)}{12}$ and $I_{3}[v]=\{0,1,...,b_{v}\}\setminus\{b_{v}-7,b_{v}-6,b_{v}-5,b_{v}-4,b_{v}-3,b_{v}-2,b_{v}-1\}$. Let $J_{3}[v]=\{k|$ there exist three $S(2,4,v)$ designs with $k$ same common blocks$\}$. We show that $J_{3}[v]\subseteq I_{3}[v]$ for any positive integer…

    Submitted 21 January, 2013; originally announced January 2013.

    Comments: Accepted in Utilitas Mathematica

    MSC Class: 05BXX