Search | arXiv e-print repository

arXiv:2406.19522 [pdf, other]

Reliable edge machine learning hardware for scientific applications

Authors: Tommaso Baldi, Javier Campos, Ben Hawks, Jennifer Ngadiuba, Nhan Tran, Daniel Diaz, Javier Duarte, Ryan Kastner, Andres Meza, Melissa Quinnan, Olivia Weng, Caleb Geniesse, Amir Gholami, Michael W. Mahoney, Vladimir Loncar, Philip Harris, Joshua Agar, Shuyu Qin

Abstract: Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling… ▽ More Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling ultra-fine-grained model inspection for efficient fault tolerance. We discuss approaches to develo** and validating reliable algorithms at the scientific edge under such strict latency, resource, power, and area requirements in extreme experimental environments. We study metrics for develo** robust algorithms, present preliminary results and mitigation strategies, and conclude with an outlook of these and future directions of research towards the longer-term goal of develo** autonomous scientific experimentation methods for accelerated scientific discovery. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: IEEE VLSI Test Symposium 2024 (VTS)

Report number: FERMILAB-CONF-24-0116-CSAID

arXiv:2310.16447 [pdf, other]

ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors

Authors: Xiaoxuan Ma, Stephan P. Kaufhold, Jiajun Su, Wentao Zhu, Jack Terwilliger, Andres Meza, Yixin Zhu, Federico Rossano, Yizhou Wang

Abstract: Understanding the behavior of non-human primates is crucial for improving animal welfare, modeling social behavior, and gaining insights into distinctively human and phylogenetically shared behaviors. However, the lack of datasets on non-human primate behavior hinders in-depth exploration of primate social interactions, posing challenges to research on our closest living relatives. To address thes… ▽ More Understanding the behavior of non-human primates is crucial for improving animal welfare, modeling social behavior, and gaining insights into distinctively human and phylogenetically shared behaviors. However, the lack of datasets on non-human primate behavior hinders in-depth exploration of primate social interactions, posing challenges to research on our closest living relatives. To address these limitations, we present ChimpACT, a comprehensive dataset for quantifying the longitudinal behavior and social relations of chimpanzees within a social group. Spanning from 2015 to 2018, ChimpACT features videos of a group of over 20 chimpanzees residing at the Leipzig Zoo, Germany, with a particular focus on documenting the developmental trajectory of one young male, Azibo. ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels. We benchmark representative methods of three tracks on ChimpACT: (i) tracking and identification, (ii) pose estimation, and (iii) spatiotemporal action detection of the chimpanzees. Our experiments reveal that ChimpACT offers ample opportunities for both devising new methods and adapting existing ones to solve fundamental computer vision tasks applied to chimpanzee groups, such as detection, pose estimation, and behavior analysis, ultimately deepening our comprehension of communication and sociality in non-human primates. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2304.08263 [pdf, other]

Information Flow Coverage Metrics for Hardware Security Verification

Authors: Andres Meza, Ryan Kastner

Abstract: Security graphs model attacks, defenses, mitigations, and vulnerabilities on computer networks and systems. With proper attributes, they provide security metrics using standard graph algorithms. A hyperflow graph is a register-transfer level (RTL) hardware security graph that facilitates security verification. A hyperflow graph models information flows and is annotated with attributes that allow s… ▽ More Security graphs model attacks, defenses, mitigations, and vulnerabilities on computer networks and systems. With proper attributes, they provide security metrics using standard graph algorithms. A hyperflow graph is a register-transfer level (RTL) hardware security graph that facilitates security verification. A hyperflow graph models information flows and is annotated with attributes that allow security metrics to measure flow paths, flow conditions, and flow rates. Hyperflow graphs enable the understanding of hardware vulnerabilities related to confidentiality, integrity, and availability, as shown on the OpenTitan hardware root of trust under several threat models. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: 6 pages, 3 Figures

arXiv:2303.17881 [pdf, other]

Pentimento: Data Remanence in Cloud FPGAs

Authors: Colin Drewes, Olivia Weng, Andres Meza, Alric Althoff, David Kohlbrenner, Ryan Kastner, Dustin Richmond

Abstract: Cloud FPGAs strike an alluring balance between computational efficiency, energy efficiency, and cost. It is the flexibility of the FPGA architecture that enables these benefits, but that very same flexibility that exposes new security vulnerabilities. We show that a remote attacker can recover "FPGA pentimenti" - long-removed secret data belonging to a prior user of a cloud FPGA. The sensitive dat… ▽ More Cloud FPGAs strike an alluring balance between computational efficiency, energy efficiency, and cost. It is the flexibility of the FPGA architecture that enables these benefits, but that very same flexibility that exposes new security vulnerabilities. We show that a remote attacker can recover "FPGA pentimenti" - long-removed secret data belonging to a prior user of a cloud FPGA. The sensitive data constituting an FPGA pentimento is an analog imprint from bias temperature instability (BTI) effects on the underlying transistors. We demonstrate how this slight degradation can be measured using a time-to-digital (TDC) converter when an adversary programs one into the target cloud FPGA. This technique allows an attacker to ascertain previously safe information on cloud FPGAs, even after it is no longer explicitly present. Notably, it can allow an attacker who knows a non-secret "skeleton" (the physical structure, but not the contents) of the victim's design to (1) extract proprietary details from an encrypted FPGA design image available on the AWS marketplace and (2) recover data loaded at runtime by a previous user of a cloud FPGA using a known design. Our experiments show that BTI degradation (burn-in) and recovery are measurable and constitute a security threat to commercial cloud FPGAs. △ Less

Submitted 31 March, 2023; originally announced March 2023.

Comments: 17 Pages, 8 Figures

arXiv:2301.07247 [pdf, other]

Tailor: Altering Skip Connections for Resource-Efficient Inference

Authors: Olivia Weng, Gabriel Marcano, Vladimir Loncar, Alireza Khodamoradi, Nojan Sheybani, Andres Meza, Farinaz Koushanfar, Kristof Denolf, Javier Mauricio Duarte, Ryan Kastner

Abstract: Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this paper, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network's sk… ▽ More Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this paper, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network's skip connections are needed for the network to learn, they can later be removed or shortened to provide a more hardware efficient implementation with minimal to no accuracy loss. We introduce Tailor, a codesign tool whose hardware-aware training algorithm gradually removes or shortens a fully trained network's skip connections to lower their hardware cost. Tailor improves resource utilization by up to 34% for BRAMs, 13% for FFs, and 16% for LUTs for on-chip, dataflow-style architectures. Tailor increases performance by 30% and reduces memory bandwidth by 45% for a 2D processing element array architecture. △ Less

Submitted 15 September, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

arXiv:2206.11791 [pdf, other]

Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

Authors: Hendrik Borras, Giuseppe Di Guglielmo, Javier Duarte, Nicolò Ghielmetti, Ben Hawks, Scott Hauck, Shih-Chieh Hsu, Ryan Kastner, Jason Liang, Andres Meza, Jules Muhizi, Tai Nguyen, Rushil Roy, Nhan Tran, Yaman Umuroglu, Olivia Weng, Aidan Yokuda, Michaela Blott

Abstract: We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classificatio… ▽ More We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency and introduce new generic optimizations and common workflows developed as a part of this work. The full workflow is presented from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 $μ$s and energy consumption as low as 30 $μ$J per inference. We demonstrate how emerging ML benchmarks on heterogeneous hardware platforms can catalyze collaboration and the development of new techniques and more accessible tools. △ Less

Submitted 23 June, 2022; originally announced June 2022.

Comments: 15 pages, 7 figures, Contribution to 3rd Workshop on Benchmarking Machine Learning Workloads on Emerging Hardware (MLBench) at 5th Conference on Machine Learning and Systems (MLSys)

Report number: FERMILAB-CONF-22-479-SCD

arXiv:2110.13041 [pdf, other]

doi 10.3389/fdata.2022.787421

Applications and Techniques for Fast Machine Learning in Science

Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood , et al. (62 additional authors not shown)

Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML ac… ▽ More In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlap** challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs. △ Less

Submitted 25 October, 2021; originally announced October 2021.

Comments: 66 pages, 13 figures, 5 tables

Report number: FERMILAB-PUB-21-502-AD-E-SCD

Journal ref: Front. Big Data 5, 787421 (2022)

arXiv:2106.13263 [pdf, other]

AKER: A Design and Verification Framework for Safe andSecure SoC Access Control

Authors: Francesco Restuccia, Andres Meza, Ryan Kastner

Abstract: Modern systems on a chip (SoCs) utilize heterogeneous architectures where multiple IP cores have concurrent access to on-chip shared resources. In security-critical applications, IP cores have different privilege levels for accessing shared resources, which must be regulated by an access control system. AKER is a design and verification framework for SoC access control. AKER builds upon the Access… ▽ More Modern systems on a chip (SoCs) utilize heterogeneous architectures where multiple IP cores have concurrent access to on-chip shared resources. In security-critical applications, IP cores have different privilege levels for accessing shared resources, which must be regulated by an access control system. AKER is a design and verification framework for SoC access control. AKER builds upon the Access Control Wrapper (ACW) -- a high performance and easy-to-integrate hardware module that dynamically manages access to shared resources. To build an SoC access control system, AKER distributes the ACWs throughout the SoC, wrap** controller IP cores, and configuring the ACWs to perform local access control. To ensure the access control system is functioning correctly and securely, AKER provides a property-driven security verification using MITRE common weakness enumerations. AKER verifies the SoC access control at the IP level to ensure the absence of bugs in the functionalities of the ACW module, at the firmware level to confirm the secure operation of the ACW when integrated with a hardware root-of-trust (HRoT), and at the system level to evaluate security threats due to the interactions among shared resources. The performance, resource usage, and security of access control systems implemented through AKER is experimentally evaluated on a Xilinx UltraScale+ programmable SoC, it is integrated with the OpenTitan hardware root-of-trust, and it is used to design an access control system for the OpenPULP multicore architecture. △ Less

Submitted 24 June, 2021; originally announced June 2021.

arXiv:2106.07449 [pdf, other]

doi 10.1145/3474376.3487286

Isadora: Automated Information Flow Property Generation for Hardware Designs

Authors: Calvin Deutschbein, Andres Meza, Francesco Restuccia, Ryan Kastner, Cynthia Sturton

Abstract: Isadora is a methodology for creating information flow specifications of hardware designs. The methodology combines information flow tracking and specification mining to produce a set of information flow properties that are suitable for use during the security validation process, and which support a better understanding of the security posture of the design. Isadora is fully automated; the user pr… ▽ More Isadora is a methodology for creating information flow specifications of hardware designs. The methodology combines information flow tracking and specification mining to produce a set of information flow properties that are suitable for use during the security validation process, and which support a better understanding of the security posture of the design. Isadora is fully automated; the user provides only the design under consideration and a testbench and need not supply a threat model nor security specifications. We evaluate Isadora on a RISC-V processor plus two designs related to SoC access control. Isadora generates security properties that align with those suggested by the Common Weakness Enumerations (CWEs), and in the case of the SoC designs, align with the properties written manually by security experts. △ Less

Submitted 2 October, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: 10 pages, 4 figures, accepted at ASHES 2021

arXiv:1706.04254 [pdf, ps, other]

Automatic Localization of Deep Stimulation Electrodes Using Trajectory-based Segmentation Approach

Authors: Roger Gomez Nieto, Andres Marino Alvarez Meza, Julian David Echeverry Correa, Alvaro Angel Orozco Gutierrez

Abstract: Parkinson's disease (PD) is a degenerative condition of the nervous system, which manifests itself primarily as muscle stiffness, hypokinesia, bradykinesia, and tremor. In patients suffering from advanced stages of PD, Deep Brain Stimulation neurosurgery (DBS) is the best alternative to medical treatment, especially when they become tolerant to the drugs. This surgery produces a neuronal activity,… ▽ More Parkinson's disease (PD) is a degenerative condition of the nervous system, which manifests itself primarily as muscle stiffness, hypokinesia, bradykinesia, and tremor. In patients suffering from advanced stages of PD, Deep Brain Stimulation neurosurgery (DBS) is the best alternative to medical treatment, especially when they become tolerant to the drugs. This surgery produces a neuronal activity, a result from electrical stimulation, whose quantification is known as Volume of Tissue Activated (VTA). To locate correctly the VTA in the cerebral volume space, one should be aware exactly the location of the tip of the DBS electrodes, as well as their spatial projection. In this paper, we automatically locate DBS electrodes using a threshold-based medical imaging segmentation methodology, determining the optimal value of this threshold adaptively. The proposed methodology allows the localization of DBS electrodes in Computed Tomography (CT) images, with high noise tolerance, using automatic threshold detection methods. △ Less

Submitted 13 June, 2017; originally announced June 2017.

Comments: 13 pages, 5 figures

Showing 1–10 of 10 results for author: Meza, A