Showing 1–12 of 12 results for author: Low, T M

  1. arXiv:2303.04769  [pdf, other]

    cs.MS cs.PF

    SMaLL: A Software Framework for portable Machine Learning Libraries

    Authors: Upasana Sridhar, Nicholai Tukanov, Elliott Binder, Tze Meng Low, Scott McMillan, Martin D. Schatz

    Abstract: Interest in deploying Deep Neural Network (DNN) inference on edge devices has resulted in an explosion of the number and types of hardware platforms to use. While the high-level programming interface, such as TensorFlow, can be readily ported across different devices, high-performance inference implementations rely on a good mapping of the high-level interface to the target hardware platform. Comm…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: 14 pages, 12 figures

  2. arXiv:2110.01409  [pdf, other]

    cs.DC cs.PF

    Delayed Asynchronous Iterative Graph Algorithms

    Authors: Mark P. Blanco, Scott McMillan, Tze Meng Low

    Abstract: Iterative graph algorithms often compute intermediate values and update them as computation progresses. Updated output values are used as inputs for computations in current or subsequent iterations; hence the number of iterations required for values to converge can potentially reduce if the newest values are asynchronously made available to other updates computed in the same iteration. In a multi-…

    Submitted 29 September, 2021; originally announced October 2021.

    Comments: 6 pages, 6 figures, 2 tables, IEEE High Performance Extreme Computing (HPEC) Conference 2021

  3. Towards an Objective Metric for the Performance of Exact Triangle Count

    Authors: Mark P. Blanco, Scott McMillan, Tze Meng Low

    Abstract: The performance of graph algorithms is often measured in terms of the number of traversed edges per second (TEPS). However, this performance metric is inadequate for a graph operation such as exact triangle counting. In triangle counting, execution times on graphs with a similar number of edges can be distinctly different as demonstrated by results from the past Graph Challenge entries. We discuss…

    Submitted 29 September, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: 6 pages, 2020 IEEE High Performance Extreme Computing Conference (HPEC)

  4. Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU

    Authors: Mark Blanco, Tze Meng Low, Kyungjoo Kim

    Abstract: In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: 2019 IEEE High Performance Extreme Computing Conference (HPEC)

  5. Delta-stepping SSSP: from Vertices and Edges to GraphBLAS Implementations

    Authors: Upasana Sridhar, Mark Blanco, Rahul Mayuranath, Daniele G. Spampinato, Tze Meng Low, Scott McMillan

    Abstract: GraphBLAS is an interface for implementing graph algorithms. Algorithms implemented using the GraphBLAS interface are cast in terms of linear algebra-like operations. However, many graph algorithms are canonically described in terms of operations on vertices and/or edges. Despite the known duality between these two representations, the differences in the way algorithms are described using the two…

    Submitted 16 September, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

    Comments: 10 pages, 4 figures, IPDPSW GRAPL 2019 Workshop

    Journal ref: IEEE International Parallel and Distributed Processing Symposium Workshops, 2019, pp 241 to 250

  6. arXiv:1904.10119  [pdf, other]

    cs.MS cs.DC

    A Flexible Framework for Parallel Multi-Dimensional DFTs

    Authors: Doru Thom Popovici, Martin D. Schatz, Franz Franchetti, Tze Meng Low

    Abstract: Multi-dimensional discrete Fourier transforms (DFT) are typically decomposed into multiple 1D transforms. Hence, parallel implementations of any multi-dimensional DFT focus on parallelizing within or across the 1D DFT. Existing DFT packages exploit the inherent parallelism across the 1D DFTs and offer rigid frameworks that cannot be extended to incorporate both forms of parallelism and various da…

    Submitted 22 December, 2019; v1 submitted 22 April, 2019; originally announced April 2019.

  7. arXiv:1903.01042  [pdf, other]

    cs.IT cs.DC cs.LG cs.PF

    CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors

    Authors: Sanghamitra Dutta, Ziqian Bai, Tze Meng Low, Pulkit Grover

    Abstract: This work proposes the first strategy to make distributed training of neural networks resilient to computing errors, a problem that has remained unsolved despite being first posed in 1956 by von Neumann. He also speculated that the efficiency and reliability of the human brain is obtained by allowing for low power but error-prone components with redundancy for error-resilience. It is surprising th…

    Submitted 3 March, 2019; originally announced March 2019.

    Comments: Currently under review

  8. A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication

    Authors: Sanghamitra Dutta, Ziqian Bai, Haewon Jeong, Tze Meng Low, Pulkit Grover

    Abstract: This paper has two contributions. First, we propose a novel coded matrix multiplication technique called Generalized PolyDot codes that advances on existing methods for coded matrix multiplication under storage and communication constraints. This technique uses "garbage alignment," i.e., aligning computations in coded computing that are not a part of the desired output. Generalized PolyDot codes b…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: Presented in part at the IEEE International Symposium on Information Theory 2018 (Submission Date: Jan 12 2018); Currently under review at the IEEE Transactions on Information Theory

  9. arXiv:1809.10170  [pdf, other]

    cs.LG cs.DC stat.ML

    High Performance Zero-Memory Overhead Direct Convolutions

    Authors: Jiyuan Zhang, Franz Franchetti, Tze Meng Low

    Abstract: The computation of convolution layers in deep neural networks typically relies on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve performance. The problems with such an approach are two-fold. First, these routines incur additional memory overhead which reduces the overall size of the network…

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: The 35th International Conference on Machine Learning (ICML 2018), camera ready

  10. arXiv:1805.09891  [pdf, other]

    cs.IT cs.DC

    Coded FFT and Its Communication Overhead

    Authors: Haewon Jeong, Tze Meng Low, Pulkit Grover

    Abstract: We propose a coded computing strategy and examine communication costs of coded computing algorithms to make distributed Fast Fourier Transform (FFT) resilient to errors during the computation. We apply maximum distance separable (MDS) codes to a widely used "Transpose" algorithm for parallel FFT. In the uncoded distributed FFT algorithm, the most expensive step is a single "all-to-all" communicati…

    Submitted 24 May, 2018; originally announced May 2018.

  11. arXiv:1611.08035  [pdf, other]

    cs.MS

    Automating the Last-Mile for High Performance Dense Linear Algebra

    Authors: Richard Michael Veras, Tze Meng Low, Tyler Michael Smith, Robert van de Geijn, Franz Franchetti

    Abstract: High performance dense linear algebra (DLA) libraries often rely on a general matrix multiply (Gemm) kernel that is implemented using assembly or with vector intrinsics. In particular, the real-valued Gemm kernels provide the overwhelming fraction of performance for the complex-valued Gemm kernels, along with the entire level-3 BLAS and many of the real and complex LAPACK routines. Thus, achieving…

    Submitted 28 April, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

  12. arXiv:1301.7744  [pdf, ps, other]

    math.NA cs.MS

    Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors

    Authors: Martin D. Schatz, Tze Meng Low, Robert A. van de Geijn, Tamara G. Kolda

    Abstract: Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation are in conflict with a desire to simplify memory access patterns. In this paper, we propose a blocked data structure (Blocked Compact Symmetric Storage) wherein we consider the tensor by blocks and store only the unique blocks of a symmetric te…

    Submitted 9 April, 2014; v1 submitted 31 January, 2013; originally announced January 2013.

    MSC Class: 15-02 (Primary)

    Journal ref: SIAM Journal on Scientific Computing, Vol. 36, No. 5, pp. C453-C479, September 2014