Showing 1–17 of 17 results for author: Vuduc, R

  1. arXiv:2206.10581  [pdf, other]

    cs.LG

    Nimble GNN Embedding with Tensor-Train Decomposition

    Authors: Chunxing Yin, Da Zheng, Israt Nisa, Christos Faloutos, George Karypis, Richard Vuduc

    Abstract: This paper describes a new method for representing embedding tables of graph neural networks (GNNs) more compactly via tensor-train (TT) decomposition. We consider the scenario where (a) the graph data lack node features, thereby requiring the learning of embeddings during training; and (b) we wish to exploit GPU platforms, where smaller tables are needed to reduce host-to-GPU communication e…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: To appear in the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 22)
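
As a rough illustration of the tensor-train idea mentioned in the abstract above, the sketch below rebuilds one embedding row from three small TT cores instead of storing the full table. The nested-list core layout, the helper names `to_digits`, `tt_row`, and `random_core`, and the toy sizes are all assumptions made for illustration; this is a generic TT lookup, not the paper's implementation.

```python
import random

def to_digits(index, shape):
    """Split a flat row index into mixed-radix digits, one per TT core."""
    digits = []
    for n in reversed(shape):
        digits.append(index % n)
        index //= n
    return tuple(reversed(digits))

def tt_row(cores, digits):
    """Rebuild one embedding row from TT cores.

    cores[k][r_prev][i_k][j_k][r_next] is a nested list; boundary ranks are 1.
    """
    acc = [[1.0]]  # acc[j][r]: partial row entry j for rank index r
    for core, digit in zip(cores, digits):
        d_k = len(core[0][digit])
        r_next = len(core[0][digit][0])
        new_acc = []
        for partial in acc:
            for jk in range(d_k):
                new_acc.append([
                    sum(partial[rp] * core[rp][digit][jk][r]
                        for rp in range(len(partial)))
                    for r in range(r_next)
                ])
        acc = new_acc
    return [entry[0] for entry in acc]

def random_core(r_prev, n, d, r_next, rng):
    """Hypothetical helper: a randomly initialised core of the given shape."""
    return [[[[rng.uniform(-1, 1) for _ in range(r_next)]
              for _ in range(d)] for _ in range(n)] for _ in range(r_prev)]

# Toy table (assumed sizes): 4*5*6 = 120 rows, 2*2*2 = 8 columns, TT ranks (1, 3, 3, 1).
rng = random.Random(0)
row_shape, col_shape, ranks = (4, 5, 6), (2, 2, 2), (1, 3, 3, 1)
cores = [random_core(ranks[k], row_shape[k], col_shape[k], ranks[k + 1], rng)
         for k in range(3)]
row = tt_row(cores, to_digits(57, row_shape))
stored = sum(ranks[k] * row_shape[k] * col_shape[k] * ranks[k + 1] for k in range(3))
print(len(row), stored, 120 * 8)  # prints: 8 150 960
```

In this toy setting the cores hold 150 parameters where the dense table would hold 960, which is the kind of saving that makes smaller host-to-GPU transfers possible.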

  2. arXiv:2204.05959  [pdf]

    cs.DC cs.PF

    "Smarter" NICs for faster molecular dynamics: a case study

    Authors: Sara Karamati, Clayton Hughes, K. Scott Hemmert, Ryan E. Grant, W. Whit Schonbein, Scott Levy, Thomas M. Conte, Jeffrey Young, Richard W. Vuduc

    Abstract: This work evaluates the benefits of using a "smart" network interface card (SmartNIC) as a compute accelerator for the example of the MiniMD molecular dynamics proxy application. The accelerator is NVIDIA's BlueField-2 card, which includes an 8-core Arm processor along with a small amount of DRAM and storage. We test the networking and data movement performance of these cards compared to a standar…

    Submitted 12 April, 2022; originally announced April 2022.

  3. arXiv:2012.01571  [pdf, other]

    cs.AR

    Online Model Swapping in Architectural Simulation

    Authors: Patrick Lavin, Jeffrey Young, Rich Vuduc, Jonathan Beard

    Abstract: As systems and applications grow more complex, detailed simulation takes an ever increasing amount of time. The prospect of increased simulation time resulting in slower design iteration forces architects to use simpler models, such as spreadsheets, when they want to iterate quickly on a design. However, the task of migrating from a simple simulation to one with more detail often requires multiple…

    Submitted 2 December, 2020; originally announced December 2020.

  4. arXiv:1911.08630  [pdf, other]

    cs.CV

    CUP: Cluster Pruning for Compressing Deep Neural Networks

    Authors: Rahul Duggal, Cao Xiao, Richard Vuduc, Jimeng Sun

    Abstract: We propose Cluster Pruning (CUP) for compressing and accelerating deep neural networks. Our approach prunes similar filters by clustering them based on features derived from both the incoming and outgoing weight connections. With CUP, we overcome two limitations of prior work: (1) non-uniform pruning: CUP can efficiently determine the ideal number of filters to prune in each layer of a neural netwo…

    Submitted 19 November, 2019; originally announced November 2019.

  5. arXiv:1904.03329  [pdf, other]

    cs.DC

    Load-Balanced Sparse MTTKRP on GPUs

    Authors: Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Richard Vuduc, P. Sadayappan

    Abstract: Sparse matricized tensor times Khatri-Rao product (MTTKRP) is one of the most computationally expensive kernels in sparse tensor computations. This work focuses on optimizing the MTTKRP operation on GPUs, addressing both performance and storage requirements. We begin by identifying the performance bottlenecks in directly extending the state-of-the-art CSF (compressed sparse fiber) format from CPUs…

    Submitted 5 April, 2019; originally announced April 2019.
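
For readers unfamiliar with the kernel named in the abstract above, the following is a minimal sketch of a mode-1 MTTKRP on a third-order sparse tensor. It assumes a plain coordinate (COO) list and sequential Python, not the CSF format or the GPU schemes the paper studies; the function name `mttkrp_mode1` and the toy data are hypothetical.

```python
def mttkrp_mode1(coords, vals, B, C, num_rows):
    """Mode-1 MTTKRP for a third-order sparse tensor stored in COO form.

    For each nonzero X[i, j, k], adds X[i, j, k] * B[j][r] * C[k][r] to M[i][r].
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    for (i, j, k), x in zip(coords, vals):
        for r in range(rank):
            M[i][r] += x * B[j][r] * C[k][r]
    return M

# Toy 2 x 2 x 2 tensor with three nonzeros and rank-2 factor matrices (assumed data).
coords = [(0, 0, 0), (0, 1, 1), (1, 0, 1)]
vals = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
M = mttkrp_mode1(coords, vals, B, C, num_rows=2)
print(M)  # prints: [[47.0, 76.0], [21.0, 48.0]]
```

Each output row receives one update per nonzero in the corresponding slice, so slices with uneven nonzero counts do uneven amounts of work.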

  6. arXiv:1901.02775  [pdf, other]

    cs.DC

    Programming Strategies for Irregular Algorithms on the Emu Chick

    Authors: Eric Hein, Srinivas Eswar, Abdurrahman Yaşar, Jiajia Li, Jeffrey S. Young, Thomas M. Conte, Ümit V. Çatalyürek, Rich Vuduc, Jason Riedy, Bora Uçar

    Abstract: The Emu Chick prototype implements migratory memory-side processing in a novel hardware system. Rather than transferring large amounts of data across the system interconnect, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each remote memory read. Previous work has characterized the performance of the Chick prototype in terms of memory bandwidth and pro…

    Submitted 3 December, 2018; originally announced January 2019.

  7. arXiv:1811.03743  [pdf, other]

    cs.PF

    Spatter: A Tool for Evaluating Gather / Scatter Performance

    Authors: Patrick Lavin, Jeffrey Young, Jason Riedy, Richard Vuduc, Aaron Vose, Dan Ernst

    Abstract: This paper describes a new benchmark tool, Spatter, for assessing memory system architectures in the context of a specific category of indexed accesses known as gather and scatter. These types of operations are increasingly used to express sparse and irregular data access patterns, and they have widespread utility in many modern HPC applications including scientific simulations, data mining and an…

    Submitted 7 July, 2020; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: Updated paper results and text to reflect longer conference submission limit
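
The two indexed access patterns named in the abstract above can be written down in a few lines. This is only a semantic sketch in plain Python, with hypothetical function names and a made-up stride-2 index buffer; it says nothing about how Spatter itself is implemented or configured.

```python
def gather(src, idx):
    """dst[i] = src[idx[i]]: read scattered locations into a dense buffer."""
    return [src[j] for j in idx]

def scatter(dst, idx, src):
    """dst[idx[i]] = src[i]: write a dense buffer out to scattered locations."""
    for j, value in zip(idx, src):
        dst[j] = value
    return dst

data = [10, 11, 12, 13, 14, 15, 16, 17]
stride2 = [0, 2, 4, 6]  # assumed uniform-stride index pattern
packed = gather(data, stride2)
out = scatter([0] * 8, stride2, packed)
print(packed, out)  # prints: [10, 12, 14, 16] [10, 0, 12, 0, 14, 0, 16, 0]
```

Changing only the index buffer changes the memory access pattern while the loop body stays the same.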

  8. A Microbenchmark Characterization of the Emu Chick

    Authors: Jeffrey S. Young, Eric Hein, Srinivas Eswar, Patrick Lavin, Jiajia Li, Jason Riedy, Richard Vuduc, Thomas M. Conte

    Abstract: The Emu Chick is a prototype system designed around the concept of migratory memory-side processing. Rather than transferring large amounts of data across power-hungry, high-latency interconnects, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each memory read. The current prototype hardware uses FPGAs to implement cache-less "Gossamer" cores for doing…

    Submitted 31 May, 2019; v1 submitted 7 September, 2018; originally announced September 2018.

    Journal ref: Parallel Computing, 2019, ISSN 0167-8191

  9. arXiv:1808.07832  [pdf, ps, other]

    cs.PL cs.LO cs.MS

    A Simple Methodology for Computing Families of Algorithms

    Authors: Devangi N. Parikh, Margaret E. Myers, Richard Vuduc, Robert A. van de Geijn

    Abstract: Discovering "good" algorithms for an operation is often considered an art best left to experts. What if there is a simple methodology, an algorithm, for systematically deriving a family of algorithms as well as their cost analyses, so that the best algorithm can be chosen? We discuss such an approach for deriving loop-based algorithms. The example used to illustrate this methodology, evaluation of…

    Submitted 20 August, 2018; originally announced August 2018.

    Report number: FLAME Working Note #87, The University of Texas at Austin, Department of Computer Science, Technical Report TR-18-06

  10. Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems

    Authors: Yang You, James Demmel, Cho-Jui Hsieh, Richard Vuduc

    Abstract: We propose two new methods to address the weak scaling problems of KRR: the Balanced KRR (BKRR) and K-means KRR (KKRR). These methods consider alternative ways to partition the input dataset into p different parts, generating p different models, and then selecting the best model among them. Compared to a conventional implementation, KKRR2 (optimized version of KKRR) improves the weak scaling effic…

    Submitted 1 May, 2018; originally announced May 2018.

    Comments: This paper has been accepted by ACM International Conference on Supercomputing (ICS) 2018

  11. arXiv:1803.05473  [pdf, other]

    cs.LG

    SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping

    Authors: Ioakeim Perros, Evangelos E. Papalexakis, Haesun Park, Richard Vuduc, Xiaowei Yan, Christopher Defilippi, Walter F. Stewart, Jimeng Sun

    Abstract: This paper presents a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics…

    Submitted 14 March, 2018; originally announced March 2018.

  12. arXiv:1703.04219  [pdf, other]

    cs.LG math.NA

    SPARTan: Scalable PARAFAC2 for Large & Sparse Data

    Authors: Ioakeim Perros, Evangelos E. Papalexakis, Fei Wang, Richard Vuduc, Elizabeth Searles, Michael Thompson, Jimeng Sun

    Abstract: In exploratory tensor mining, a common problem is how to analyze a set of variables across a set of subjects whose observations do not align naturally. For example, when modeling medical features across a set of patients, the number and duration of treatments may vary widely in time, meaning there is no meaningful way to align their clinical records across time points for analysis purposes. To han…

    Submitted 12 March, 2017; originally announced March 2017.

  13. arXiv:1611.04255   

    cs.DC

    Efficient Communications in Training Large Scale Neural Networks

    Authors: Linnan Wang, Wei Wu, George Bosilca, Richard Vuduc, Zenglin Xu

    Abstract: We consider the problem of how to reduce the cost of communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, like broadcasts of parameters or reductions for sub-gradient aggregations, which for large messages quickly dominates overal…

    Submitted 15 April, 2017; v1 submitted 14 November, 2016; originally announced November 2016.

    Comments: This paper has been withdrawn by the author due to a crucial sign error in equation 1

  14. arXiv:1610.07722  [pdf, other]

    cs.LG math.NA

    Sparse Hierarchical Tucker Factorization and its Application to Healthcare

    Authors: Ioakeim Perros, Robert Chen, Richard Vuduc, Jimeng Sun

    Abstract: We propose a new tensor factorization method, called the Sparse Hierarchical-Tucker (Sparse H-Tucker), for sparse and high-order data tensors. Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker method, which aims to compute a tree-structured factorization of an input data set that may be readily interpreted by a domain expert. However, Sparse H-Tucker uses a nested samp…

    Submitted 25 October, 2016; originally announced October 2016.

    Comments: This is an extended version of a paper presented at the 15th IEEE International Conference on Data Mining (ICDM 2015)

  15. arXiv:1603.00491  [pdf, other]

    math.NA cs.PF

    Wanted: Floating-Point Add Round-off Error instruction

    Authors: Marat Dukhan, Richard Vuduc, Jason Riedy

    Abstract: We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient. Performance estimates on Intel Haswell, Intel Skylake, and AMD Steamroller processors, as well as Intel Knights Corner co-processor, demonstrate that such an instr…

    Submitted 1 March, 2016; originally announced March 2016.
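
To make the quantity in the abstract above concrete: in software, the round-off error of a floating-point addition is commonly recovered with Knuth's TwoSum, which costs six floating-point operations. The sketch below shows that software baseline, not the proposed FPADDRE instruction; it assumes IEEE 754 binary64 with round-to-nearest and no overflow, and the function name `two_sum` is illustrative.

```python
from fractions import Fraction

def two_sum(a, b):
    """Return (s, e) where s = fl(a + b) and e is the round-off error of that addition."""
    s = a + b
    b_virtual = s - a           # the part of b that actually made it into s
    a_virtual = s - b_virtual   # the part of a that actually made it into s
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

s, e = two_sum(1.0, 1e-16)
# Exact rational arithmetic confirms that s + e equals a + b with nothing lost.
assert Fraction(1.0) + Fraction(1e-16) == Fraction(s) + Fraction(e)
print(s, e)  # prints: 1.0 1e-16
```

Here the small addend is lost entirely from the rounded sum `s` and reappears in `e`, which is the term that extended-precision arithmetic needs to carry along.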

  16. arXiv:1411.1460  [pdf, other]

    cs.DC cs.PF

    Branch-Avoiding Graph Algorithms

    Authors: Oded Green, Marat Dukhan, Richard Vuduc

    Abstract: This paper quantifies the impact of branches and branch mispredictions on the single-core performance for two classes of graph problems. Specifically, we consider classical algorithms for computing connected components and breadth-first search (BFS). We show that branch mispredictions are costly and can reduce performance by as much as 30%-50%. This insight suggests that one should seek graph algo…

    Submitted 5 November, 2014; originally announced November 2014.

    ACM Class: C.0; C.4; E.1

  17. arXiv:1309.1828  [pdf]

    cs.CE cs.DC

    Sustainable Software Development for Next-Gen Sequencing (NGS) Bioinformatics on Emerging Platforms

    Authors: Shel Swenson, Yogesh Simmhan, Viktor Prasanna, Manish Parashar, Jason Riedy, David Bader, Richard Vuduc

    Abstract: DNA sequence analysis is fundamental to life science research. The rapid development of next generation sequencing (NGS) technologies, and the richness and diversity of applications it makes feasible, have created an enormous gulf between the potential of this technology and the development of computational methods to realize this potential. Bridging this gap holds possibilities for broad impacts…

    Submitted 26 October, 2013; v1 submitted 7 September, 2013; originally announced September 2013.

    Comments: 4 pages