
Showing 1–10 of 10 results for author: Herbordt, M

  1. arXiv:2305.19946  [pdf, other]

    cs.DC

    A Survey of Potential MPI Complex Collectives: Large-Scale Mining and Analysis of HPC Applications

    Authors: Pouya Haghi, Ryan Marshall, Po Hao Chen, Anthony Skjellum, Martin Herbordt

    Abstract: Offload of MPI collectives to network devices, e.g., NICs and switches, is being implemented as an effective mechanism to improve application performance by reducing inter- and intra-node communication and bypassing MPI software layers. Given the rich deployment of accelerators and programmable NICs/switches in data centers, we posit that there is an opportunity to further improve performance by e…

    Submitted 31 May, 2023; originally announced May 2023.

  2. arXiv:2206.13734  [pdf, other]

    cs.AR cs.LG

    H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture

    Authors: Chengming Zhang, Tong Geng, Anqi Guo, Jiannan Tian, Martin Herbordt, Ang Li, Dingwen Tao

    Abstract: Graph Neural Networks (GNNs) have drawn tremendous attention due to their unique capability to extend Machine Learning (ML) approaches to applications broadly-defined as having unstructured data, especially graphs. Compared with other Machine Learning (ML) modalities, the acceleration of Graph Neural Networks (GNNs) is more challenging due to the irregularity and heterogeneity derived from graph t…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 8 pages, 8 figures, 4 tables, accepted by FPL'22

  3. arXiv:2204.04816  [pdf, other]

    cs.CR

    Distributed Hardware Accelerated Secure Joint Computation on the COPA Framework

    Authors: Rushi Patel, Pouya Haghi, Shweta Jain, Andriy Kot, Venkata Krishnan, Mayank Varia, Martin Herbordt

    Abstract: Performance of distributed data center applications can be improved through use of FPGA-based SmartNICs, which provide additional functionality and enable higher bandwidth communication. Until lately, however, the lack of a simple approach for customizing SmartNICs to application requirements has limited the potential benefits. Intel's Configurable Network Protocol Accelerator (COPA) provides a cu…

    Submitted 10 April, 2022; originally announced April 2022.

  4. arXiv:2203.03606  [pdf, other]

    cs.AR cs.LG

    I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

    Authors: Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran You, Martin C. Herbordt, Yingyan Lin, Ang Li

    Abstract: Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is as critical but even more challenging. The hurdles arise from the poor data locality and redundant computation due to the large size, high sparsity, and irregular non-zero distribution of real-world graphs. In…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Published in MICRO 2022

  5. arXiv:2009.12617  [pdf, other]

    cs.AR cs.DC

    Particle Mesh Ewald for Molecular Dynamics in OpenCL on an FPGA Cluster

    Authors: Lawrence C. Stewart, Carlo Pascoe, Brian W. Sherman, Martin Herbordt, Vipin Sachdeva

    Abstract: Molecular Dynamics (MD) simulations play a central role in physics-driven drug discovery. MD applications often use the Particle Mesh Ewald (PME) algorithm to accelerate electrostatic force computations, but efficient parallelization has proven difficult due to the high communication requirements of distributed 3D FFTs. In this paper, we present the design and implementation of a scalable PME algo…

    Submitted 5 April, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

    Comments: Accepted as a poster at FCCM21

  6. arXiv:2007.00826  [pdf, ps, other]

    cs.CR

    Secret Sharing MPC on FPGAs in the Datacenter

    Authors: Pierre-Francois Wolfe, Rushi Patel, Robert Munafo, Mayank Varia, Martin Herbordt

    Abstract: Multi-Party Computation (MPC) is a technique enabling data from several sources to be used in a secure computation revealing only the result while protecting the original data, facilitating shared utilization of data sets gathered by different entities. The presence of Field Programmable Gate Array (FPGA) hardware in datacenters can provide accelerated computing as well as low latency, high bandwi…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: 7 pages, 6 figures

  7. CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks

    Authors: Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden K.-H. So, Martin Herbordt, Ang Li, Yanzhi Wang

    Abstract: Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from heavy computational workload as the model often comes with large weight matrices. Pruning schemes have been proposed for RNNs to eliminate the redundant (close-to-zero) weight values. On one hand, the non-structured pruning methods achiev…

    Submitted 11 May, 2020; originally announced May 2020.

    ACM Class: C.1.4

  8. arXiv:1908.10834  [pdf, other]

    cs.DC cs.LG

    AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing

    Authors: Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, Martin Herbordt

    Abstract: Deep learning systems have been successfully applied to Euclidean data such as images, video, and audio. In many applications, however, information and their relationships are better expressed with graphs. Graph Convolutional Networks (GCNs) appear to be a promising approach to efficiently learn from graph data structures, having shown advantages in many critical applications. As with other deep l…

    Submitted 10 September, 2020; v1 submitted 23 August, 2019; originally announced August 2019.

  9. arXiv:1905.05359  [pdf, other]

    cs.DC cs.AR

    Fully Integrated On-FPGA Molecular Dynamics Simulations

    Authors: Chen Yang, Tong Geng, Tianqi Wang, Rushi Patel, Qingqing Xiong, Ahmed Sanaullah, Jiayi Sheng, Charles Lin, Vipin Sachdeva, Woody Sherman, Martin C. Herbordt

    Abstract: The implementation of Molecular Dynamics (MD) on FPGAs has received substantial attention. Previous work, however, has consisted of either proof-of-concept implementations of components, usually the range-limited force; full systems, but with much of the work shared by the host CPU; or prototype demonstrations, e.g., using OpenCL, that neither implement a whole system nor have competitive performa…

    Submitted 13 May, 2019; originally announced May 2019.

    Comments: 13 pages, 17 figures

  10. arXiv:1901.01007  [pdf, other]

    cs.LG cs.AR cs.DC stat.ML

    FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters

    Authors: Tong Geng, Tianqi Wang, Ang Li, Xi Jin, Martin Herbordt

    Abstract: Deep Neural Networks (DNNs) have revolutionized numerous applications, but the demand for ever more performance remains unabated. Scaling DNN computations to larger clusters is generally done by distributing tasks in batch mode using methods such as distributed synchronous SGD. Among the issues with this approach is that to make the distributed cluster work with high utilization, the workload dist…

    Submitted 21 June, 2020; v1 submitted 4 January, 2019; originally announced January 2019.

    Comments: Accepted by IEEE TRANSACTIONS ON COMPUTERS (TC)