Showing 1–24 of 24 results for author: Kaeli, D

  1. arXiv:2404.15510  [pdf, other]

    cs.AR cs.DC cs.LG cs.NE

    NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator

    Authors: Kaustubh Shivdikar, Nicolas Bohm Agostini, Malith Jayaweera, Gilbert Jonatan, Jose L. Abellan, Ajay Joshi, John Kim, David Kaeli

    Abstract: Graph Neural Networks (GNNs) are emerging as a formidable tool for processing non-euclidean data across various domains, ranging from social network analysis to bioinformatics. Despite their effectiveness, their adoption has not been pervasive because of scalability challenges associated with large-scale graph datasets, particularly when leveraging message passing. To tackle these challenges, we…

    Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Visit https://neurachip.us for WebGUI based simulations

  2. arXiv:2402.19184  [pdf, other]

    cs.PL

    Data Transfer Optimizations for Host-CPU and Accelerators in AXI4MLIR

    Authors: Jude Haris, Nicolas Bohm Agostini, Antonino Tumeo, David Kaeli, José Cano

    Abstract: As custom hardware accelerators become more prevalent, it becomes increasingly important to automatically generate efficient host-driver code that can fully leverage the capabilities of these accelerators. This approach saves time and reduces the likelihood of errors that can occur during manual implementation. AXI4MLIR extends the MLIR compiler framework to generate host-driver code for custom ac…

    Submitted 29 February, 2024; originally announced February 2024.

  3. AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators

    Authors: Nicolas Bohm Agostini, Jude Haris, Perry Gibson, Malith Jayaweera, Norm Rubin, Antonino Tumeo, José L. Abellán, José Cano, David Kaeli

    Abstract: This paper addresses the need for automatic and efficient generation of host driver code for arbitrary custom AXI-based accelerators targeting linear algebra algorithms, an important workload in various applications, including machine learning and scientific computing. While existing tools have focused on automating accelerator prototyping, little attention has been paid to the host-accelerator in…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: 13 pages, 17 figures, to appear in CGO2024

    ACM Class: D.3.3

  4. arXiv:2312.08656  [pdf, other]

    cs.LG cs.AI cs.DC

    MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training

    Authors: Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding

    Abstract: In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware. Existing solutions such as PyG, DGL with cuSPARSE, and GNNAdvisor frameworks partially address these challenges but memory traffic is still significant. We argue t…

    Submitted 18 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: ASPLOS 2024 accepted publication

    ACM Class: I.2; C.5

  5. arXiv:2310.17746  [pdf]

    cs.PF cs.DC

    Memory Efficient Multithreaded Incremental Segmented Sieve Algorithm

    Authors: Evan Ning, David Kaeli

    Abstract: Prime numbers are fundamental in number theory and play a significant role in various areas, from pure mathematics to practical applications, including cryptography. In this contribution, we introduce a multithreaded implementation of the Segmented Sieve algorithm. In our implementation, instead of handling large prime ranges in one iteration, the sieving process is broken down incrementally, whic…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 10 pages

  6. GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

    Authors: Kaustubh Shivdikar, Yuhui Bao, Rashmi Agrawal, Michael Shen, Gilbert Jonatan, Evelio Mora, Alexander Ingare, Neal Livesay, José L. Abellán, John Kim, Ajay Joshi, David Kaeli

    Abstract: Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computatio…

    Submitted 19 September, 2023; originally announced September 2023.

  7. Thought Bubbles: A Proxy into Players' Mental Model Development

    Authors: Omid Mohaddesi, Noah Chicoine, Min Gong, Ozlem Ergun, Jacqueline Griffin, David Kaeli, Stacy Marsella, Casper Harteveld

    Abstract: Studying mental models has recently received more attention, aiming to understand the cognitive aspects of human-computer interaction. However, there is not enough research on the elicitation of mental models in complex dynamic systems. We present Thought Bubbles as an approach for eliciting mental models and an avenue for understanding players' mental model development in interactive virtual envi…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23–28, 2023, Hamburg, Germany

  8. arXiv:2209.01290  [pdf, other]

    cs.CR cs.AR cs.DC cs.PF

    Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs

    Authors: Kaustubh Shivdikar, Gilbert Jonatan, Evelio Mora, Neal Livesay, Rashmi Agrawal, Ajay Joshi, Jose Abellan, John Kim, David Kaeli

    Abstract: Homomorphic Encryption (HE) enables users to securely outsource both the storage and computation of sensitive data to untrusted servers. Not only does HE offer an attractive solution for security in cloud systems, but lattice-based HE systems are also believed to be resistant to attacks by quantum computers. However, current HE implementations suffer from prohibitively high latency. For lattice-ba…

    Submitted 2 September, 2022; originally announced September 2022.

    Comments: Accepted, to be published at SEED 2022 conference (IEEE International Symposium on Secure and Private Execution Environment Design)

  9. To Trust or to Stockpile: Modeling Human-Simulation Interaction in Supply Chain Shortages

    Authors: Omid Mohaddesi, Jacqueline Griffin, Ozlem Ergun, David Kaeli, Stacy Marsella, Casper Harteveld

    Abstract: Understanding decision-making in dynamic and complex settings is a challenge yet essential for preventing, mitigating, and responding to adverse events (e.g., disasters, financial crises). Simulation games have shown promise to advance our understanding of decision-making in such settings. However, an open question remains on how we extract useful information from these games. We contribute an app…

    Submitted 7 January, 2022; originally announced January 2022.

  10. arXiv:2110.00478  [pdf, other]

    cs.AR cs.DC cs.LG

    SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference

    Authors: Jude Haris, Perry Gibson, José Cano, Nicolas Bohm Agostini, David Kaeli

    Abstract: Edge computing devices inherently face tight resource constraints, which is especially apparent when deploying Deep Neural Networks (DNN) with high memory and compute demands. FPGAs are commonly available in edge devices. Since these reconfigurable circuits can achieve higher throughput and lower power consumption than general purpose processors, they are especially well-suited for DNN acceleratio…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: This paper is accepted to SBAC-PAD 2021

  11. arXiv:2108.08910  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG cs.NE

    Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

    Authors: Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang

    Abstract: Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices. To overcome the challenge and facilitate the real-time deploymen…

    Submitted 14 February, 2023; v1 submitted 18 August, 2021; originally announced August 2021.

  12. arXiv:2104.00828  [pdf, other]

    cs.DC cs.AR cs.HC

    Daisen: A Framework for Visualizing Detailed GPU Execution

    Authors: Yifan Sun, Yixuan Zhang, Ali Mosallaei, Michael D. Shah, Cody Dunne, David Kaeli

    Abstract: Graphics Processing Units (GPUs) have been widely used to accelerate artificial intelligence, physics simulation, medical imaging, and information visualization applications. To improve GPU performance, GPU hardware designers need to identify performance issues by inspecting a huge amount of simulator-generated traces. Visualizing the execution traces can reduce the cognitive burden of users and f…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: EuroVis Camera Ready

  13. arXiv:2009.11242  [pdf, other]

    stat.AP stat.ME

    Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth

    Authors: Shi Dong, Zlatan Feric, Guangyu Li, Chieh Wu, April Z. Gu, Jennifer Dy, John Meeker, Ingrid Y. Padilla, Jose Cordero, Carmen Velez Vega, Zaira Rosario, Akram Alshawabkeh, David Kaeli

    Abstract: In this paper, we propose Ensemble Learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by a NIEHS P42 Center that is trying to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models addressing two major challenges present in the dataset: 1) the significant a…

    Submitted 23 September, 2020; originally announced September 2020.

    Journal ref: ICMLA 2020

  14. arXiv:2008.02300  [pdf, other]

    cs.AR

    MGPU-TSM: A Multi-GPU System with Truly Shared Memory

    Authors: Saiful A. Mojumder, Yifan Sun, Leila Delshadtehrani, Yenai Ma, Trinayan Baruah, José L. Abellán, John Kim, David Kaeli, Ajay Joshi

    Abstract: The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with GPU count because of the overhead of data movement across multiple GPUs. Moreover, a lack of hardware support for coherency exacerbates the problem because a prog…

    Submitted 8 August, 2020; v1 submitted 5 August, 2020; originally announced August 2020.

    Comments: 4 pages, 3 figures

  15. arXiv:2007.16175  [pdf, other]

    cs.CR cs.AR

    Hardware/Software Obfuscation against Timing Side-channel Attack on a GPU

    Authors: Elmira Karimi, Yunsi Fei, David Kaeli

    Abstract: GPUs are increasingly being used in security applications, especially for accelerating encryption/decryption. While GPUs are an attractive platform in terms of performance, the security of these devices raises a number of concerns. One vulnerability is the data-dependent timing information, which can be exploited by adversary to recover the encryption key. Memory system features are frequently exp…

    Submitted 31 July, 2020; originally announced July 2020.

    Comments: 2020 IEEE International Symposium on Hardware Oriented Security and Trust (HOST)

  16. arXiv:2007.04292  [pdf, other]

    cs.AR

    HALCONE : A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems

    Authors: Saiful A. Mojumder, Yifan Sun, Leila Delshadtehrani, Yenai Ma, Trinayan Baruah, José L. Abellán, John Kim, David Kaeli, Ajay Joshi

    Abstract: While multi-GPU (MGPU) systems are extremely popular for compute-intensive workloads, several inefficiencies in the memory hierarchy and data movement result in a waste of GPU resources and difficulties in programming MGPU systems. First, due to the lack of hardware-level coherence, the MGPU programming model requires the programmer to replicate and repeatedly transfer data between the GPUs' memor…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: 13 pages, 9 figures

  17. arXiv:2006.01402  [pdf, other]

    cs.DC cs.LG

    A Smart Background Scheduler for Storage Systems

    Authors: Maher Kachmar, David Kaeli

    Abstract: In today's enterprise storage systems, supported data services such as snapshot delete or drive rebuild can cause tremendous performance interference if executed inline along with heavy foreground IO, often leading to missing SLOs (Service Level Objectives). Typical storage system applications such as web or VDI (Virtual Desktop Infrastructure) follow a repetitive high/low workload pattern that ca…

    Submitted 2 June, 2020; originally announced June 2020.

  18. arXiv:1911.11313  [pdf, other]

    cs.DC

    Summarizing CPU and GPU Design Trends with Product Data

    Authors: Yifan Sun, Nicolas Bohm Agostini, Shi Dong, David Kaeli

    Abstract: Moore's Law and Dennard Scaling have guided the semiconductor industry for the past few decades. Recently, both laws have faced validity challenges as transistor sizes approach the practical limits of physics. We are interested in testing the validity of these laws and reflect on the reasons responsible. In this work, we collect data of more than 4000 publicly-available CPU and GPU products. We fi…

    Submitted 13 July, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: Fix flops/watt error

  19. arXiv:1909.03441  [pdf, other]

    stat.ML cs.LG stat.AP

    Iterative Spectral Method for Alternative Clustering

    Authors: Chieh Wu, Stratis Ioannidis, Mario Sznaier, Xiangyu Li, David Kaeli, Jennifer G. Dy

    Abstract: Given a dataset and an existing clustering as input, alternative clustering aims to find an alternative partition. One of the state-of-the-art approaches is Kernel Dimension Alternative Clustering (KDAC). We propose a novel Iterative Spectral Method (ISM) that greatly improves the scalability of KDAC. Our algorithm is intuitive, relies on easily implementable spectral decompositions, and comes wit…

    Submitted 8 September, 2019; originally announced September 2019.

  20. arXiv:1811.02884  [pdf, other]

    cs.DC cs.AR

    MGSim + MGMark: A Framework for Multi-GPU System Research

    Authors: Yifan Sun, Trinayan Baruah, Saiful A. Mojumder, Shi Dong, Rafael Ubal, Xiang Gong, Shane Treadway, Yuhui Bao, Vincent Zhao, José L. Abellán, John Kim, Ajay Joshi, David Kaeli

    Abstract: The rapidly growing popularity and scale of data-parallel workloads demand a corresponding increase in raw computational power of GPUs (Graphics Processing Units). As single-GPU systems struggle to satisfy the performance demands, multi-GPU systems have begun to dominate the high-performance computing world. The advent of such systems raises a number of design challenges, including the GPU microar…

    Submitted 13 November, 2018; v1 submitted 15 October, 2018; originally announced November 2018.

    Comments: Updated typo

  21. arXiv:1809.05165  [pdf, other]

    cs.CR cs.CV cs.LG stat.ML

    Defensive Dropout for Hardening Deep Neural Networks under Adversarial Attacks

    Authors: Siyue Wang, Xiao Wang, Pu Zhao, Wujie Wen, David Kaeli, Peter Chin, Xue Lin

    Abstract: Deep neural networks (DNNs) are known vulnerable to adversarial attacks. That is, adversarial examples, obtained by adding delicately crafted distortions onto original legal inputs, can mislead a DNN to classify them as any target labels. This work provides a solution to hardening DNNs under adversarial attacks through defensive dropout. Besides using dropout during training for the best test accu…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: Accepted as conference paper on ICCAD 2018

  22. arXiv:1711.03244  [pdf, other]

    cs.DC physics.comp-ph

    Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

    Authors: Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang

    Abstract: We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language (OpenCL) framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterog…

    Submitted 25 January, 2018; v1 submitted 8 November, 2017; originally announced November 2017.

    Comments: Accepted for Publication in Journal of Biomedical Optics Letters on Jan 4, 2018, to appear in Volume 23, Issue 2

    Journal ref: J. Biomed. Opt. 23(1), 010504 (2018)

  23. arXiv:1609.06756  [pdf]

    cs.CY

    21st Century Computer Architecture

    Authors: Mark D. Hill, Sarita Adve, Luis Ceze, Mary Jane Irwin, David Kaeli, Margaret Martonosi, Josep Torrellas, Thomas F. Wenisch, David Wood, Katherine Yelick

    Abstract: Because most technology and computer architecture innovations were (intentionally) invisible to higher layers, application and other software developers could reap the benefits of this progress without engaging in it. Higher performance has both made more computationally demanding applications feasible (e.g., virtual assistants, computer vision) and made less demanding applications easier to devel…

    Submitted 21 September, 2016; originally announced September 2016.

    Comments: A Computing Community Consortium (CCC) white paper, 16 pages

  24. Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education

    Authors: Renato Figueiredo, P. Oscar Boykin, Jose A. B. Fortes, Tao Li, Jie-Kwon Peir, David Wolinsky, Lizy John, David Kaeli, David Lilja, Sally McKee, Gokhan Memik, Alain Roy, Gary Tyson

    Abstract: This paper introduces Archer, a community-based computing resource for computer architecture research and education. The Archer infrastructure integrates virtualization and batch scheduling middleware to deliver high-throughput computing resources aggregated from resources distributed across wide-area networks and owned by different participating entities in a seamless manner. The paper discusse…

    Submitted 10 July, 2008; originally announced July 2008.

    Comments: 11 pages, 2 figures. Describes the Archer project, http://archer-project.org

    ACM Class: C.0; I.6.3; C.2.4