Showing 1–19 of 19 results for author: Kastner, R

Searching in archive cs.
  1. arXiv:2406.19522  [pdf, other]

    cs.LG

    Reliable edge machine learning hardware for scientific applications

    Authors: Tommaso Baldi, Javier Campos, Ben Hawks, Jennifer Ngadiuba, Nhan Tran, Daniel Diaz, Javier Duarte, Ryan Kastner, Andres Meza, Melissa Quinnan, Olivia Weng, Caleb Geniesse, Amir Gholami, Michael W. Mahoney, Vladimir Loncar, Philip Harris, Joshua Agar, Shuyu Qin

    Abstract: Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: IEEE VLSI Test Symposium 2024 (VTS)

    Report number: FERMILAB-CONF-24-0116-CSAID

  2. arXiv:2403.08980  [pdf, other]

    cs.LG cs.AR

    Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications

    Authors: Olivia Weng, Alexander Redding, Nhan Tran, Javier Mauricio Duarte, Ryan Kastner

    Abstract: With more scientific fields relying on neural networks (NNs) to process data incoming at extreme throughputs and latencies, it is crucial to develop NNs with all their parameters stored on-chip. In many of these applications, there is not enough time to go off-chip and retrieve weights. Even more so, off-chip memory such as DRAM does not have the bandwidth required to process these NNs as fast as…

    Submitted 13 March, 2024; originally announced March 2024.

  3. arXiv:2401.15639  [pdf, other]

    cs.AR

    TOP: Towards Open & Predictable Heterogeneous SoCs

    Authors: Luca Valente, Francesco Restuccia, Davide Rossi, Ryan Kastner, Luca Benini

    Abstract: Ensuring predictability in modern real-time Systems-on-Chip (SoCs) is an increasingly critical concern for many application domains such as automotive, robotics, and industrial automation. An effective approach involves the modeling and development of hardware components, such as interconnects and shared memory resources, to evaluate or enforce their deterministic behavior. Unfortunately, these IP…

    Submitted 7 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  4. arXiv:2304.08263  [pdf, other]

    cs.CR cs.AR

    Information Flow Coverage Metrics for Hardware Security Verification

    Authors: Andres Meza, Ryan Kastner

    Abstract: Security graphs model attacks, defenses, mitigations, and vulnerabilities on computer networks and systems. With proper attributes, they provide security metrics using standard graph algorithms. A hyperflow graph is a register-transfer level (RTL) hardware security graph that facilitates security verification. A hyperflow graph models information flows and is annotated with attributes that allow s…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 6 pages, 3 Figures

  5. arXiv:2304.07543  [pdf, other]

    cs.CV

    Within-Camera Multilayer Perceptron DVS Denoising

    Authors: A. Rios-Navarro, S. Guo, G Abarajithan, K. Vijayakumar, A. Linares-Barranco, T. Aarrestad, R. Kastner, T. Delbruck

    Abstract: In-camera event denoising reduces the data rate of event cameras by filtering out noise at the source. A lightweight multilayer perceptron denoising filter (MLPF) provides state-of-the-art low-cost denoising accuracy. It processes a small neighborhood of pixels from the timestamp image around each event to discriminate signal and noise events. This paper proposes two digital logic implementations…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: Accepted to 2023 CVPRW Workshop on Event-Based Vision

  6. arXiv:2303.17881  [pdf, other]

    cs.CR cs.AR

    Pentimento: Data Remanence in Cloud FPGAs

    Authors: Colin Drewes, Olivia Weng, Andres Meza, Alric Althoff, David Kohlbrenner, Ryan Kastner, Dustin Richmond

    Abstract: Cloud FPGAs strike an alluring balance between computational efficiency, energy efficiency, and cost. It is the flexibility of the FPGA architecture that enables these benefits, but it is that very same flexibility that exposes new security vulnerabilities. We show that a remote attacker can recover "FPGA pentimenti" - long-removed secret data belonging to a prior user of a cloud FPGA. The sensitive dat…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: 17 Pages, 8 Figures

  7. arXiv:2301.07247  [pdf, other]

    cs.CV cs.LG cs.NE

    Tailor: Altering Skip Connections for Resource-Efficient Inference

    Authors: Olivia Weng, Gabriel Marcano, Vladimir Loncar, Alireza Khodamoradi, Nojan Sheybani, Andres Meza, Farinaz Koushanfar, Kristof Denolf, Javier Mauricio Duarte, Ryan Kastner

    Abstract: Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this paper, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network's sk…

    Submitted 15 September, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

  8. arXiv:2206.11791  [pdf, other]

    cs.LG cs.AR

    Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

    Authors: Hendrik Borras, Giuseppe Di Guglielmo, Javier Duarte, Nicolò Ghielmetti, Ben Hawks, Scott Hauck, Shih-Chieh Hsu, Ryan Kastner, Jason Liang, Andres Meza, Jules Muhizi, Tai Nguyen, Rushil Roy, Nhan Tran, Yaman Umuroglu, Olivia Weng, Aidan Yokuda, Michaela Blott

    Abstract: We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classificatio…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 15 pages, 7 figures, Contribution to 3rd Workshop on Benchmarking Machine Learning Workloads on Emerging Hardware (MLBench) at 5th Conference on Machine Learning and Systems (MLSys)

    Report number: FERMILAB-CONF-22-479-SCD

  9. arXiv:2110.13041  [pdf, other]

    cs.LG cs.AR physics.data-an physics.ins-det

    Applications and Techniques for Fast Machine Learning in Science

    Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood, et al. (62 additional authors not shown)

    Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML ac…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: 66 pages, 13 figures, 5 tables

    Report number: FERMILAB-PUB-21-502-AD-E-SCD

    Journal ref: Front. Big Data 5, 787421 (2022)

  10. arXiv:2110.06870  [pdf, ps, other]

    cs.DC

    Junkyard Computing: Repurposing Discarded Smartphones to Minimize Carbon

    Authors: Jennifer Switzer, Gabriel Marcano, Ryan Kastner, Pat Pannuto

    Abstract: 1.5 billion smartphones are sold annually, and most are decommissioned less than two years later. Most of these unwanted smartphones are neither discarded nor recycled but languish in junk drawers and storage units. This computational stockpile represents a substantial wasted potential: modern smartphones have increasingly high-performance and energy-efficient processors, extensive networking capa…

    Submitted 25 October, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

  11. arXiv:2106.13263  [pdf, other]

    cs.CR cs.AR

    AKER: A Design and Verification Framework for Safe and Secure SoC Access Control

    Authors: Francesco Restuccia, Andres Meza, Ryan Kastner

    Abstract: Modern systems on a chip (SoCs) utilize heterogeneous architectures where multiple IP cores have concurrent access to on-chip shared resources. In security-critical applications, IP cores have different privilege levels for accessing shared resources, which must be regulated by an access control system. AKER is a design and verification framework for SoC access control. AKER builds upon the Access…

    Submitted 24 June, 2021; originally announced June 2021.

  12. Isadora: Automated Information Flow Property Generation for Hardware Designs

    Authors: Calvin Deutschbein, Andres Meza, Francesco Restuccia, Ryan Kastner, Cynthia Sturton

    Abstract: Isadora is a methodology for creating information flow specifications of hardware designs. The methodology combines information flow tracking and specification mining to produce a set of information flow properties that are suitable for use during the security validation process, and which support a better understanding of the security posture of the design. Isadora is fully automated; the user pr…

    Submitted 2 October, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: 10 pages, 4 figures, accepted at ASHES 2021

  13. arXiv:2102.01351  [pdf]

    cs.CV cs.AR

    Hardware-efficient Residual Networks for FPGAs

    Authors: Olivia Weng, Alireza Khodamoradi, Ryan Kastner

    Abstract: Residual networks (ResNets) employ skip connections in their networks -- reusing activations from previous layers -- to improve training convergence, but these skip connections create challenges for hardware implementations of ResNets. The hardware must either wait for skip connections to be processed before processing more incoming data or buffer them elsewhere. Without skip connections, ResNets…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: Presented at DATE Friday Workshop on System-level Design Methods for Deep Learning on Heterogeneous Architectures (SLOHA 2021) (arXiv:2102.00818)

    Report number: SLOHA/2021/10

  14. arXiv:2012.02791  [pdf, other]

    cs.AR

    A Unified Model for Gate Level Propagation Analysis

    Authors: Jeremy Blackstone, Wei Hu, Alric Althoff, Armaiti Ardeshiricham, Lu Zhang, Ryan Kastner

    Abstract: Classic hardware verification techniques (e.g., X-propagation and fault-propagation) and more recent hardware security verification techniques based on information flow tracking (IFT) aim to understand how information passes, affects, and otherwise modifies a circuit. These techniques all have separate usage scenarios, but when dissected into their core functionality, they relate in a fundamental…

    Submitted 7 December, 2020; originally announced December 2020.

  15. FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA

    Authors: Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, Farinaz Koushanfar

    Abstract: Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. While WaveNet produces state-of-the-art audio generation results, the naive inferen…

    Submitted 9 February, 2020; originally announced February 2020.

    Comments: Published as a conference paper at ICCAD 2019

    Journal ref: IEEE/ACM 2019 International Conference On Computer Aided Design (ICCAD), November 2019

  16. arXiv:2001.10717  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Patient Specific Biomechanics Are Clinically Significant In Accurate Computer Aided Surgical Image Guidance

    Authors: Michael Barrow, Alice Chao, Qizhi He, Sonia Ramamoorthy, Claude Sirlin, Ryan Kastner

    Abstract: Augmented Reality is used in Image Guided surgery (AR IG) to fuse surgical landmarks from preoperative images into a video overlay. Physical simulation is essential to maintaining accurate position of the landmarks as surgery progresses and ensuring patient safety by avoiding accidental damage to vessels etc. In liver procedures, AR IG simulation accuracy is hampered by an inability to model stiff…

    Submitted 29 January, 2020; originally announced January 2020.

    Comments: 7 pages, 8 figures

  17. arXiv:1909.00713  [pdf, other]

    cs.CV cs.RO

    Estimation of Absolute Scale in Monocular SLAM Using Synthetic Data

    Authors: Danila Rukhovich, Daniel Mouritzen, Ralf Kaestner, Martin Rufli, Alexander Velizhev

    Abstract: This paper addresses the problem of scale estimation in monocular SLAM by estimating absolute distances between camera centers of consecutive image frames. These estimates would improve the overall performance of classical (not deep) SLAM systems and allow metric feature locations to be recovered from a single monocular camera. We propose several network architectures that lead to an improvement o…

    Submitted 2 September, 2019; originally announced September 2019.

  18. arXiv:1805.03648  [pdf, other]

    cs.AR

    Parallel Programming for FPGAs

    Authors: Ryan Kastner, Janarbek Matai, Stephen Neuendorffer

    Abstract: This book focuses on the use of algorithmic high-level synthesis (HLS) to build application-specific FPGA systems. Our goal is to give the reader an appreciation of the process of creating an optimized hardware design using HLS. Although the details are, of necessity, different from parallel programming for multicore processors or GPUs, many of the fundamental concepts are similar. For example, de…

    Submitted 9 May, 2018; originally announced May 2018.

  19. arXiv:1408.5870  [pdf]

    cs.SE cs.PL

    Enabling FPGAs for the Masses

    Authors: Janarbek Matai, Dustin Richmond, Dajung Lee, Ryan Kastner

    Abstract: Implementing an application on an FPGA remains a difficult, non-intuitive task that often requires hardware design expertise in a hardware description language (HDL). High-level synthesis (HLS) raises the design abstraction from HDL to languages such as C/C++/Scala/Java. Despite this, in order to get a good quality of result (QoR), a designer must carefully craft the HLS code. In other words, HLS d…

    Submitted 20 August, 2014; originally announced August 2014.

    Comments: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)

    Report number: FSP/2014/03