Search | arXiv e-print repository

arXiv:2406.19522 [pdf, other]

Reliable edge machine learning hardware for scientific applications

Authors: Tommaso Baldi, Javier Campos, Ben Hawks, Jennifer Ngadiuba, Nhan Tran, Daniel Diaz, Javier Duarte, Ryan Kastner, Andres Meza, Melissa Quinnan, Olivia Weng, Caleb Geniesse, Amir Gholami, Michael W. Mahoney, Vladimir Loncar, Philip Harris, Joshua Agar, Shuyu Qin

Abstract: Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling ultra-fine-grained model inspection for efficient fault tolerance. We discuss approaches to developing and validating reliable algorithms at the scientific edge under such strict latency, resource, power, and area requirements in extreme experimental environments. We study metrics for developing robust algorithms, present preliminary results and mitigation strategies, and conclude with an outlook of these and future directions of research towards the longer-term goal of developing autonomous scientific experimentation methods for accelerated scientific discovery.

Submitted 27 June, 2024; originally announced June 2024.

Comments: IEEE VLSI Test Symposium 2024 (VTS)

Report number: FERMILAB-CONF-24-0116-CSAID

arXiv:2403.07066 [pdf, other]

Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models

Authors: Philip Harris, Michael Kagan, Jeffrey Krupa, Benedikt Maier, Nathaniel Woodward

Abstract: Self-Supervised Learning (SSL) is at the core of training modern large machine learning models, providing a scheme for learning powerful representations that can be used in a variety of downstream tasks. However, SSL strategies must be adapted to the type of training data and downstream tasks required. We propose RS3L, a novel simulation-based SSL strategy that employs a method of re-simulation to drive data augmentation for contrastive learning. By intervening in the middle of the simulation process and re-running simulation components downstream of the intervention, we generate multiple realizations of an event, thus producing a set of augmentations covering all physics-driven variations available in the simulator. Using experiments from high-energy physics, we explore how this strategy may enable the development of a foundation model; we show how RS3L pre-training enables powerful performance in downstream tasks such as discrimination of a variety of objects and uncertainty mitigation. In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 24 pages, 9 figures

arXiv:2402.01047 [pdf, other]

Ultra Fast Transformers on FPGAs for Particle Physics Experiments

Authors: Zhixing Jiang, Dennis Yin, Elham E Khoda, Vladimir Loncar, Ekaterina Govorkova, Eric Moreno, Philip Harris, Scott Hauck, Shih-Chieh Hsu

Abstract: This work introduces a highly efficient implementation of the transformer architecture on a Field-Programmable Gate Array (FPGA) by using the hls4ml tool. Given the demonstrated effectiveness of transformer models in addressing a wide range of problems, their application in experimental triggers within particle physics becomes a subject of significant interest. In this work, we have implemented critical components of a transformer model, such as multi-head attention and softmax layers. To evaluate the effectiveness of our implementation, we have focused on a particle physics jet flavor tagging problem, employing a public dataset. We recorded latency under 2 μs on the Xilinx UltraScale+ FPGA, which is compatible with hardware trigger requirements at the CERN Large Hadron Collider experiments.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 6 pages, 2 figures

Journal ref: Machine Learning and the Physical Sciences Workshop, NeurIPS 2023

arXiv:2401.09949 [pdf, other]

SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning

Authors: Ho Fung Tsoi, Vladimir Loncar, Sridhara Dasu, Philip Harris

Abstract: Contrary to the use of genetic programming, the neural network approach to symbolic regression can scale well with high input dimension and leverage gradient methods for faster equation searching. Common ways of constraining expression complexity have relied on multistage pruning methods with fine-tuning, but these often lead to significant performance loss. In this work, we propose SymbolNet, a neural network approach to symbolic regression in a novel framework that enables dynamic pruning of model weights, input features, and mathematical operators in a single training, where both training loss and expression complexity are optimized simultaneously. We introduce a sparsity regularization term per pruning type, which can adaptively adjust its own strength and lead to convergence to a target sparsity level. In contrast to most existing symbolic regression methods that cannot efficiently handle datasets with more than O(10) inputs, we demonstrate the effectiveness of our model on the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs).

Submitted 18 January, 2024; originally announced January 2024.

Comments: 11 pages. Submitted to IEEE TNNLS, under review

arXiv:2312.07615 [pdf, other]

Optimizing Likelihood-free Inference using Self-supervised Neural Symmetry Embeddings

Authors: Deep Chatterjee, Philip C. Harris, Maanas Goel, Malina Desai, Michael W. Coughlin, Erik Katsavounidis

Abstract: Likelihood-free inference is quickly emerging as a powerful tool to perform fast/effective parameter estimation. We demonstrate a technique of optimizing likelihood-free inference to make it even faster by marginalizing symmetries in a physical problem. In this approach, physical symmetries (for example, time translation) are learned using joint-embedding via self-supervised learning with symmetry data augmentations. Subsequently, parameter inference is performed using a normalizing flow where the embedding network is used to summarize the data before conditioning the parameters. We present this approach on two simple physical problems and we show faster convergence in a smaller number of parameters compared to a normalizing flow that does not use a pre-trained symmetry-informed representation.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted for Machine Learning and the Physical Sciences Workshop (submission 69) at NeurIPS 2023; for codes, see https://github.com/ML4GW/summer-projects-2023/blob/neurips-2023/symmetry-informed-flows/README.md

arXiv:2310.06047 [pdf, other]

Knowledge Distillation for Anomaly Detection

Authors: Adrian Alan Pol, Ekaterina Govorkova, Sonja Gronroos, Nadezda Chernyavskaya, Philip Harris, Maurizio Pierini, Isobel Ojalvo, Peter Elmer

Abstract: Unsupervised deep learning techniques are widely used to identify anomalous behaviour. The performance of such methods is a product of the amount of training data and the model size. However, the size is often a limiting factor for the deployment on resource-constrained devices. We present a novel procedure based on knowledge distillation for compressing an unsupervised anomaly detection model into a supervised deployable one and we suggest a set of techniques to improve the detection sensitivity. Compressed models perform comparably to their larger counterparts while significantly reducing the size and memory footprint.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.12265 [pdf, ps, other]

Cost-sharing in Parking Games

Authors: Jennifer Elder, Pamela E. Harris, Jan Kretschmann, J. Carlos Martínez Mori

Abstract: In this paper, we study the total displacement statistic of parking functions from the perspective of cooperative game theory. We introduce parking games, which are coalitional cost-sharing games in characteristic function form derived from the total displacement statistic. We show that parking games are supermodular cost-sharing games, indicating that cooperation is difficult (i.e., their core is empty). Next, we study their Shapley value, which formalizes a notion of "fair" cost-sharing and amounts to charging each car for its expected marginal displacement under a random arrival order. Our main contribution is a polynomial-time algorithm to compute the Shapley value of parking games, in contrast with known hardness results on computing the Shapley value of arbitrary games. The algorithm leverages the permutation-invariance of total displacement, combinatorial enumeration, and dynamic programming. We conclude with open questions around alternative solution concepts for supermodular cost-sharing games and connections to other areas in combinatorics.

Submitted 14 November, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 12 pages

MSC Class: 05A05; 91A12; 91A46

arXiv:2306.12656 [pdf, other]

doi 10.1007/s41666-023-00155-0

Identifying and Extracting Rare Disease Phenotypes with Large Language Models

Authors: Cathy Shyr, Yan Hu, Paul A. Harris, Hua Xu

Abstract: Rare diseases (RDs) are collectively common and affect 300 million people worldwide. Accurate phenotyping is critical for informing diagnosis and treatment, but RD phenotypes are often embedded in unstructured text and time-consuming to extract manually. While natural language processing (NLP) models can perform named entity recognition (NER) to automate extraction, a major bottleneck is the development of a large, annotated corpus for model training. Recently, prompt learning emerged as an NLP paradigm that can lead to more generalizable results without any (zero-shot) or few labeled samples (few-shot). Despite growing interest in ChatGPT, a revolutionary large language model capable of following complex human prompts and generating high-quality responses, none have studied its NER performance for RDs in the zero- and few-shot settings. To this end, we engineered novel prompts aimed at extracting RD phenotypes and, to the best of our knowledge, are the first to establish a benchmark for evaluating ChatGPT's performance in these settings. We compared its performance to the traditional fine-tuning approach and conducted an in-depth error analysis. Overall, fine-tuning BioClinicalBERT resulted in higher performance (F1 of 0.689) than ChatGPT (F1 of 0.472 and 0.591 in the zero- and few-shot settings, respectively). Despite this, ChatGPT achieved similar or higher accuracy for certain entities (i.e., rare diseases and signs) in the one-shot setting (F1 of 0.776 and 0.725). This suggests that with appropriate prompt engineering, ChatGPT has the potential to match or outperform fine-tuned language models for certain entity types with just one labeled sample. While the proliferation of large language models may provide opportunities for supporting RD diagnosis and treatment, researchers and clinicians should critically evaluate model outputs and be well-informed of their limitations.

Submitted 21 June, 2023; originally announced June 2023.

Journal ref: J Healthc Inform Res 8, 438-461 (2024)

arXiv:2305.04099 [pdf, other]

doi 10.1051/epjconf/202429509036

Symbolic Regression on FPGAs for Fast Machine Learning Inference

Authors: Ho Fung Tsoi, Adrian Alan Pol, Vladimir Loncar, Ekaterina Govorkova, Miles Cranmer, Sridhara Dasu, Peter Elmer, Philip Harris, Isobel Ojalvo, Maurizio Pierini

Abstract: The high-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to enhance physics sensitivity while still meeting data processing time constraints. In this contribution, we introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR). It searches the equation space to discover algebraic relations approximating a dataset. We use PySR (a software to uncover these expressions based on an evolutionary algorithm) and extend the functionality of hls4ml (a package for machine learning inference in FPGAs) to support PySR-generated expressions for resource-constrained production environments. Deep learning models often optimize the top metric by pinning the network size because the vast hyperparameter space prevents an extensive search for neural architecture. Conversely, SR selects a set of models on the Pareto front, which allows for optimizing the performance-resource trade-off directly. By embedding symbolic forms, our implementation can dramatically reduce the computational resources needed to perform critical tasks. We validate our method on a physics benchmark: the multiclass classification of jets produced in simulated proton-proton collisions at the CERN Large Hadron Collider. We show that our approach can approximate a 3-layer neural network using an inference model that achieves up to a 13-fold decrease in execution time, down to 5 ns, while still preserving more than 90% approximation accuracy.

Submitted 17 January, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

Comments: 9 pages. Accepted to 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023)

Journal ref: EPJ Web of Conferences 295, 09036 (2024)

arXiv:2304.02577 [pdf, other]

ECG Feature Importance Rankings: Cardiologists vs. Algorithms

Authors: Temesgen Mehari, Ashish Sundar, Alen Bosnjakovic, Peter Harris, Steven E. Williams, Axel Loewe, Olaf Doessel, Claudia Nagel, Nils Strodthoff, Philip J. Aston

Abstract: Feature importance methods promise to provide a ranking of features according to importance for a given classification task. A wide range of methods exist but their rankings often disagree and they are inherently difficult to evaluate due to a lack of ground truth beyond synthetic datasets. In this work, we put feature importance methods to the test on real-world data in the domain of cardiology, where we try to distinguish three specific pathologies from healthy subjects based on ECG features comparing to features used in cardiologists' decision rules as ground truth. Some methods generally performed well and others performed poorly, while some methods did well on some but not all of the problems considered.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2301.04633 [pdf, ps, other]

doi 10.1007/s41781-023-00101-0

Accelerating Machine Learning Inference with GPUs in ProtoDUNE Data Processing

Authors: Tejin Cai, Kenneth Herner, Tingjun Yang, Michael Wang, Maria Acosta Flechas, Philip Harris, Burt Holzman, Kevin Pedro, Nhan Tran

Abstract: We study the performance of a cloud-based GPU-accelerated inference server to speed up event reconstruction in neutrino data batch jobs. Using detector data from the ProtoDUNE experiment and employing the standard DUNE grid job submission tools, we attempt to reprocess the data by running several thousand concurrent grid jobs, a rate we expect to be typical of current and future neutrino physics experiments. We process most of the dataset with the GPU version of our processing algorithm and the remainder with the CPU version for timing comparisons. We find that a 100-GPU cloud-based server is able to easily meet the processing demand, and that using the GPU version of the event processing algorithm is two times faster than processing these data with the CPU version when comparing to the newest CPUs in our sample. The amount of data transferred to the inference server during the GPU runs can overwhelm even the highest-bandwidth network switches, however, unless care is taken to observe network facility limits or otherwise distribute the jobs to multiple sites. We discuss the lessons learned from this processing campaign and several avenues for future improvements.

Submitted 27 October, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: 13 pages, 9 figures, matches accepted version

Report number: FERMILAB-PUB-22-944-ND-PPD-SCD

Journal ref: Comput Softw Big Sci 7, 11 (2023)

arXiv:2212.05081 [pdf, other]

doi 10.1088/2632-2153/ad12e3

FAIR AI Models in High Energy Physics

Authors: Javier Duarte, Haoyang Li, Avik Roy, Ruike Zhu, E. A. Huerta, Daniel Diaz, Philip Harris, Raghav Kansal, Daniel S. Katz, Ishaan H. Kavoori, Volodymyr V. Kindratenko, Farouk Mokhtar, Mark S. Neubauer, Sang Eon Park, Melissa Quinnan, Roger Rusack, Zhizhen Zhao

Abstract: The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.

Submitted 29 December, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 34 pages, 9 figures, 10 tables

Journal ref: Mach. Learn.: Sci. Technol. 4 (2023) 045062

arXiv:2210.08973 [pdf, ps, other]

doi 10.1038/s41597-023-02298-6

FAIR for AI: An interdisciplinary and international community building perspective

Authors: E. A. Huerta, Ben Blaiszik, L. Catherine Brinson, Kristofer E. Bouchard, Daniel Diaz, Caterina Doglioni, Javier M. Duarte, Murali Emani, Ian Foster, Geoffrey Fox, Philip Harris, Lukas Heinrich, Shantenu Jha, Daniel S. Katz, Volodymyr Kindratenko, Christine R. Kirkpatrick, Kati Lassila-Perini, Ravi K. Madduri, Mark S. Neubauer, Fotis E. Psomopoulos, Avik Roy, Oliver Rübel, Zhizhen Zhao, Ruike Zhu

Abstract: A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.

Submitted 1 August, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

Comments: 10 pages, comments welcome!; v2: 12 pages, accepted to Scientific Data

ACM Class: I.2.0; E.0

Journal ref: Scientific Data 10, 487 (2023)

arXiv:2208.05484 [pdf, other]

doi 10.1007/JHEP07(2023)108

Neural Embedding: Learning the Embedding of the Manifold of Physics Data

Authors: Sang Eon Park, Philip Harris, Bryan Ostdiek

Abstract: In this paper, we present a method of embedding physics data manifolds with metric structure into lower dimensional spaces with simpler metrics, such as Euclidean and Hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. With the notion of volume in Euclidean spaces, we provide for the first time a viable solution to quantifying the true search capability of model agnostic search algorithms in collider physics (i.e. anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from information in complex high dimensional datasets.

Submitted 14 August, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2207.09060 [pdf, other]

Data Science and Machine Learning in Education

Authors: Gabriele Benelli, Thomas Y. Chen, Javier Duarte, Matthew Feickert, Matthew Graham, Lindsey Gray, Dan Hackett, Phil Harris, Shih-Chieh Hsu, Gregor Kasieczka, Elham E. Khoda, Matthias Komm, Mia Liu, Mark S. Neubauer, Scarlet Norberg, Alexx Perloff, Marcel Rieger, Claire Savard, Kazuhiro Terao, Savannah Thais, Avik Roy, Jean-Roch Vlimant, Grigorios Chachamis

Abstract: The growing role of data science (DS) and machine learning (ML) in high-energy physics (HEP) is well established and pertinent given the complex detectors, large data sets, and sophisticated analyses at the heart of HEP research. Moreover, exploiting symmetries inherent in physics data has inspired physics-informed ML as a vibrant sub-field of computer science research. HEP researchers benefit greatly from widely available materials for use in education, training and workforce development. They are also contributing to these materials and providing software to DS/ML-related fields. Increasingly, physics departments are offering courses at the intersection of DS, ML and physics, often using curricula developed by HEP researchers and involving open software and data used in HEP. In this white paper, we explore synergies between HEP research and DS/ML education, discuss opportunities and challenges at this intersection, and propose community activities that will be mutually beneficial.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Contribution to Snowmass 2021

arXiv:2207.00559 [pdf, other]

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Authors: Elham E Khoda, Dylan Rankin, Rafael Teixeira de Lima, Philip Harris, Scott Hauck, Shih-Chieh Hsu, Michael Kagan, Vladimir Loncar, Chaitanya Paikara, Richa Rao, Sioni Summers, Caterina Vernieri, Aaron Wang

Abstract: Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.

Submitted 1 July, 2022; originally announced July 2022.

Comments: 12 pages, 6 figures, 5 tables

arXiv:2205.07690 [pdf, other]

Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Authors: Nicolò Ghielmetti, Vladimir Loncar, Maurizio Pierini, Marcel Roed, Sioni Summers, Thea Aarrestad, Christoffer Petersson, Hampus Linander, Jennifer Ngadiuba, Kelvin Lin, Philip Harris

Abstract: In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. We show, through aggressive filter reduction and heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, that the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset.

Submitted 16 May, 2022; originally announced May 2022.

Comments: 11 pages, 6 tables, 5 figures

arXiv:2203.16255 [pdf, other]

Physics Community Needs, Tools, and Resources for Machine Learning

Authors: Philip Harris, Erik Katsavounidis, William Patrick McCormack, Dylan Rankin, Yongbin Feng, Abhijith Gandrakota, Christian Herwig, Burt Holzman, Kevin Pedro, Nhan Tran, Tingjun Yang, Jennifer Ngadiuba, Michael Coughlin, Scott Hauck, Shih-Chieh Hsu, Elham E Khoda, Deming Chen, Mark Neubauer, Javier Duarte, Georgia Karagiorgi, Mia Liu

Abstract: Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utilized and accessed in the coming years.

Submitted 30 March, 2022; originally announced March 2022.

Comments: Contribution to Snowmass 2021, 33 pages, 5 figures

arXiv:2110.13041 [pdf, other]

doi 10.3389/fdata.2022.787421

Applications and Techniques for Fast Machine Learning in Science

Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood , et al. (62 additional authors not shown)

Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

Submitted 25 October, 2021; originally announced October 2021.

Comments: 66 pages, 13 figures, 5 tables

Report number: FERMILAB-PUB-21-502-AD-E-SCD

Journal ref: Front. Big Data 5, 787421 (2022)

arXiv:2108.02214 [pdf, other]

doi 10.1038/s41597-021-01109-0

A FAIR and AI-ready Higgs boson decay dataset

Authors: Yifan Chen, E. A. Huerta, Javier Duarte, Philip Harris, Daniel S. Katz, Mark S. Neubauer, Daniel Diaz, Farouk Mokhtar, Raghav Kansal, Sang Eon Park, Volodymyr V. Kindratenko, Zhizhen Zhao, Roger Rusack

Abstract: To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.

Submitted 16 February, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: 13 pages, 3 figures. v2: Accepted to Nature Scientific Data. Learn about the FAIR4HEP project at https://fair4hep.github.io. See our invited Behind the Paper Blog in Springer Nature Research Data Community at https://go.nature.com/3oMVYxo

ACM Class: I.2; J.2

Journal ref: Scientific Data volume 9, Article number: 31 (2022)

arXiv:2108.01995 [pdf, other]

doi 10.1098/rsta.2020.0262

Robustness of convolutional neural networks to physiological ECG noise

Authors: J. Venton, P. M. Harris, A. Sundar, N. A. S. Smith, P. J. Aston

Abstract: The electrocardiogram (ECG) is one of the most widespread diagnostic tools in healthcare and supports the diagnosis of cardiovascular disorders. Deep learning methods are a successful and popular technique to detect indications of disorders from an ECG signal. However, there are open questions around the robustness of these methods to various factors, including physiological ECG noise. In this study we generate clean and noisy versions of an ECG dataset before applying Symmetric Projection Attractor Reconstruction (SPAR) and scalogram image transformations. A pretrained convolutional neural network is trained using transfer learning to classify these image transforms. For the clean ECG dataset, F1 scores for SPAR attractor and scalogram transforms were 0.70 and 0.79, respectively, and the scores decreased by less than 0.05 for the noisy ECG datasets. Notably, when the network trained on clean data was used to classify the noisy datasets, performance decreases of up to 0.18 in F1 scores were seen. However, when the network trained on the noisy data was used to classify the clean dataset, the performance decrease was less than 0.05. We conclude that physiological ECG noise impacts classification using deep learning methods and careful consideration should be given to the inclusion of noisy ECG signals in the training data when developing supervised networks for ECG classification.

Submitted 2 August, 2021; originally announced August 2021.

Comments: 16 pages, 7 figures

arXiv:2105.01683 [pdf, other]

doi 10.1109/TNS.2021.3087100

A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

Authors: Giuseppe Di Guglielmo, Farah Fahim, Christian Herwig, Manuel Blanco Valentin, Javier Duarte, Cristian Gingu, Philip Harris, James Hirschauer, Martin Kwok, Vladimir Loncar, Yingyi Luo, Llovizna Miranda, Jennifer Ngadiuba, Daniel Noonan, Seda Ogrenci-Memik, Maurizio Pierini, Sioni Summers, Nhan Tran

Abstract: Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile. For our application, we consider the high-granularity calorimeter from the CMS experiment at the CERN Large Hadron Collider. The advantage of the machine learning approach is in the flexibility and configurability of the algorithm. By changing the neural network weights, a unique data compression algorithm can be deployed for each sensor in different detector regions, and changing detector or collider conditions. To meet area, performance, and power constraints, we perform a quantization-aware training to create an optimized neural network hardware implementation. The design is achieved through the use of high-level synthesis tools and the hls4ml framework, and was processed through synthesis and physical layout flows based on a LP CMOS 65 nm technology node. The flow anticipates 200 Mrad of ionizing radiation to select gates, and reports a total area of 3.6 mm^2 and consumes 95 mW of power. The simulated energy consumption per inference is 2.4 nJ. This is the first radiation tolerant on-detector ASIC implementation of a neural network that has been designed for particle physics applications.

Submitted 4 May, 2021; originally announced May 2021.

Comments: 9 pages, 8 figures, 3 tables

Report number: FERMILAB-PUB-21-217-CMS-E-SCD

Journal ref: IEEE Trans. Nucl. Sci. 68, 2179 (2021)
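
As a rough consistency check on the figures quoted in the abstract above, dividing the simulated energy per inference (2.4 nJ) by the power (95 mW) gives an implied time per inference. The sketch below does only that arithmetic. It assumes the quoted power is drawn continuously while inferring, which the abstract does not state, and the function name is illustrative rather than taken from the paper.

```python
def implied_inference_time_ns(energy_nj: float, power_mw: float) -> float:
    """Implied time per inference in nanoseconds.

    Assumes constant power draw during inference (an assumption, not a
    statement from the abstract).
    """
    # nJ / mW = 1e-9 J / 1e-3 W = 1e-6 s, i.e. 1000 ns per (nJ/mW)
    return energy_nj / power_mw * 1000.0


t_ns = implied_inference_time_ns(energy_nj=2.4, power_mw=95.0)
print(f"{t_ns:.1f} ns per inference")
```

Under that assumption the result is about 25 ns, which is the same scale as the 25 ns bunch spacing of the LHC; the abstract itself does not draw this connection.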

arXiv:2103.05579 [pdf, other]

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Authors: Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin, Josiah Hester, Yingyi Luo, John Mamish, Seda Ogrenci-Memik, Thea Aarrestad, Hamza Javed, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, Javier Duarte, Scott Hauck, Shih-Chieh Hsu , et al. (5 additional authors not shown)

Abstract: Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.

Submitted 23 March, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: 10 pages, 8 figures, TinyML Research Symposium 2021

Report number: FERMILAB-CONF-21-080-SCD

arXiv:2103.00560 [pdf, other]

doi 10.1093/icb/icab107

Perspectives on individual animal identification from biology and computer vision

Authors: Maxime Vidal, Nathan Wolf, Beth Rosenberg, Bradley P. Harris, Alexander Mathis

Abstract: Identifying individual animals is crucial for many biological investigations. In response to some of the limitations of current identification methods, new automated computer vision approaches have emerged with strong performance. Here, we review current advances of computer vision identification techniques to provide both computer scientists and biologists with an overview of the available tools and discuss their applications. We conclude by offering recommendations for starting an animal identification project, illustrate current limitations and propose how they might be addressed in the future.

Submitted 28 February, 2021; originally announced March 2021.

Comments: 12 pages, 1 figure, 2 boxes and 1 table

Journal ref: Integr Comp Biol. 2021 Oct 4;61(3):900-916

arXiv:2101.05108 [pdf, other]

doi 10.1088/2632-2153/ac0ea1

Fast convolutional neural networks on FPGAs with hls4ml

Authors: Thea Aarrestad, Vladimir Loncar, Nicolò Ghielmetti, Maurizio Pierini, Sioni Summers, Jennifer Ngadiuba, Christoffer Petersson, Hampus Linander, Yutaro Iiyama, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Dylan Rankin, Sergo Jindariani, Kevin Pedro, Nhan Tran, Mia Liu, Edward Kreinar, Zhenbin Wu, Duc Hoang

Abstract: We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of $5\,μ$s using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.

Submitted 29 April, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: 18 pages, 18 figures, 4 tables

Journal ref: Mach. Learn.: Sci. Technol. 2 045015 (2021)

arXiv:2012.01563 [pdf, other]

Accelerated Charged Particle Tracking with Graph Neural Networks on FPGAs

Authors: Aneesh Heintz, Vesal Razavimaleki, Javier Duarte, Gage DeZoort, Isobel Ojalvo, Savannah Thais, Markus Atkinson, Mark Neubauer, Lindsey Gray, Sergo Jindariani, Nhan Tran, Philip Harris, Dylan Rankin, Thea Aarrestad, Vladimir Loncar, Maurizio Pierini, Sioni Summers, Jennifer Ngadiuba, Mia Liu, Edward Kreinar, Zhenbin Wu

Abstract: We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks. The two complementary FPGA designs are based on OpenCL, a framework for writing programs that execute across heterogeneous platforms, and hls4ml, a high-level-synthesis-based compiler for neural network to firmware conversion. We evaluate and compare the resource usage, latency, and tracking performance of our implementations based on a benchmark dataset. We find a considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used effectively in future computing workflows and the FPGA-based Level-1 trigger at the CERN Large Hadron Collider.

Submitted 30 November, 2020; originally announced December 2020.

Comments: 8 pages, 4 figures, To appear in Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020)

Report number: FERMILAB-CONF-20-622-CMS-SCD

arXiv:2010.08556 [pdf, other]

doi 10.1109/H2RC51942.2020.00010

FPGAs-as-a-Service Toolkit (FaaST)

Authors: Dylan Sheldon Rankin, Jeffrey Krupa, Philip Harris, Maria Acosta Flechas, Burt Holzman, Thomas Klijnsma, Kevin Pedro, Nhan Tran, Scott Hauck, Shih-Chieh Hsu, Matthew Trahms, Kelvin Lin, Yu Lou, Ta-Wei Ho, Javier Duarte, Mia Liu

Abstract: Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.

Submitted 16 October, 2020; originally announced October 2020.

Comments: 10 pages, 7 figures, to appear in proceedings of the 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing

Report number: FERMILAB-CONF-20-426-SCD

Journal ref: 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC), 2020, pp. 38-47

arXiv:2009.04509 [pdf, other]

doi 10.3389/fdata.2020.604083

GPU-accelerated machine learning inference as a service for computing in neutrino experiments

Authors: Michael Wang, Tingjun Yang, Maria Acosta Flechas, Philip Harris, Benjamin Hawks, Burt Holzman, Kyle Knoepfel, Jeffrey Krupa, Kevin Pedro, Nhan Tran

Abstract: Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the data volumes of such experiments are rapidly increasing. The demand to process billions of neutrino events with many machine learning algorithm inferences creates a computing challenge. We explore a computing model in which heterogeneous computing with GPU coprocessors is made available as a web service. The coprocessors can be efficiently and elastically deployed to provide the right amount of computing for a given processing task. With our approach, Services for Optimized Network Inference on Coprocessors (SONIC), we integrate GPU acceleration specifically for the ProtoDUNE-SP reconstruction chain without disrupting the native computing workflow. With our integrated framework, we accelerate the most time-consuming task, track and particle shower hit identification, by a factor of 17. This results in a factor of 2.7 reduction in the total processing time when compared with CPU-only production. For this particular task, only 1 GPU is required for every 68 CPU threads, providing a cost-effective solution.

Submitted 22 March, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

Comments: 15 pages, 7 figures, 2 tables

Report number: FERMILAB-PUB-20-428-ND-SCD
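
The two speedup figures in the abstract above (a factor of 17 on one task, a factor of 2.7 overall) can be related through Amdahl's law. The sketch below inverts that law to estimate what fraction of the original CPU-only time the accelerated task would have to occupy. This is an illustrative back-of-the-envelope model: it assumes the rest of the workflow is unchanged and ignores any service overhead, and the resulting fraction is not a number reported in the abstract.

```python
def amdahl_speedup(fraction: float, task_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime is sped up by `task_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / task_speedup)


def implied_fraction(total_speedup: float, task_speedup: float) -> float:
    """Invert Amdahl's law: the runtime fraction implied by the two speedups."""
    return (1.0 - 1.0 / total_speedup) / (1.0 - 1.0 / task_speedup)


# Figures quoted in the abstract: 17x on the task, 2.7x in total.
f = implied_fraction(total_speedup=2.7, task_speedup=17.0)
print(f"implied task fraction: {f:.3f}")
print(f"check, overall speedup: {amdahl_speedup(f, 17.0):.2f}")
```

Under these assumptions the accelerated task would account for roughly two thirds of the original processing time.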

arXiv:2008.03601 [pdf, other]

doi 10.3389/fdata.2020.598927

Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics

Authors: Yutaro Iiyama, Gianluca Cerminara, Abhijay Gupta, Jan Kieseler, Vladimir Loncar, Maurizio Pierini, Shah Rukh Qasim, Marcel Rieger, Sioni Summers, Gerrit Van Onsem, Kinga Wozniak, Jennifer Ngadiuba, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Dylan Rankin, Sergo Jindariani, Mia Liu, Kevin Pedro, Nhan Tran, Edward Kreinar, Zhenbin Wu

Abstract: Graph neural networks have been shown to achieve excellent performance for several crucial tasks in particle physics, such as charged particle tracking, jet tagging, and clustering. An important domain for the application of these networks is the FPGA-based first layer of real-time data filtering at the CERN Large Hadron Collider, which has strict latency and resource constraints. We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1$μ\mathrm{s}$ on an FPGA. To do so, we consider a representative task associated to particle reconstruction and identification in a next-generation calorimeter operating at a particle collider. We use a graph network architecture developed for such purposes, and apply additional simplifications to match the computing constraints of Level-1 trigger systems, including weight quantization. Using the $\mathtt{hls4ml}$ library, we convert the compressed models into firmware to be implemented on an FPGA. Performance of the synthesized models is presented both in terms of inference accuracy and resource usage.

Submitted 3 February, 2021; v1 submitted 8 August, 2020; originally announced August 2020.

Comments: 15 pages, 4 figures

Report number: FERMILAB-PUB-20-405-E-SCD

Journal ref: Frontiers in Big Data 3 (2021) 44

arXiv:2007.10359 [pdf, other]

doi 10.1088/2632-2153/abec21

GPU coprocessors as a service for deep learning inference in high energy physics

Authors: Jeffrey Krupa, Kelvin Lin, Maria Acosta Flechas, Jack Dinsmore, Javier Duarte, Philip Harris, Scott Hauck, Burt Holzman, Shih-Chieh Hsu, Thomas Klijnsma, Mia Liu, Kevin Pedro, Dylan Rankin, Natchanon Suaysom, Matt Trahms, Nhan Tran

Abstract: In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolve this confrontation provided that algorithms can be sufficiently accelerated. In many cases, algorithmic speedups are found to be largest through the adoption of deep learning algorithms. We present a comprehensive exploration of the use of GPU-based hardware acceleration for deep learning inference within the data reconstruction workflow of high energy physics. We present several realistic examples and discuss a strategy for the seamless integration of coprocessors so that the LHC can maintain, if not exceed, its current performance throughout its running.

Submitted 23 April, 2021; v1 submitted 20 July, 2020; originally announced July 2020.

Comments: 26 pages, 7 figures, 2 tables

Report number: FERMILAB-PUB-20-338-E-SCD

Journal ref: Mach. Learn.: Sci. Technol. 2 (2021) 035005

arXiv:2004.00606 [pdf, ps, other]

Tipsy cop and drunken robber: a variant of the cop and robber game on graphs

Authors: Pamela Harris, Erik Insko, Alicia Prieto-Langarica, Rade Stoisavljevic, Shaun Sullivan

Abstract: Motivated by a biological scenario illustrated in the YouTube video https://www.youtube.com/watch?v=Z_mXDvZQ6dU where a neutrophil chases a bacteria cell moving in random directions, we present a variant of the cop and robber game on graphs called the tipsy cop and drunken robber game. In this game, we place a tipsy cop and a drunken robber at different vertices of a finite connected graph $G$. The game consists of independent moves where the robber begins the game by moving to an adjacent vertex from where he began; this is then followed by the cop moving to an adjacent vertex from where she began. Since the robber is inebriated, he takes random walks on the graph, while the cop being tipsy means that her movements are sometimes random and sometimes intentional. Our main results give formulas for the probability that the robber is still free from capture after $m$ moves of this game on highly symmetric graphs, such as the complete graphs, complete bipartite graphs, and cycle graphs. We also give the expected encounter time between the cop and robber for these families of graphs. We end the manuscript by presenting a general method for computing such probabilities and also detail a variety of directions for future research.

Submitted 1 April, 2020; originally announced April 2020.

Comments: 18 pages

MSC Class: 05A05; 05C25; 05C30; 05C78; 05C85
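
The game described in the abstract above can be explored numerically. The following is a minimal Monte Carlo sketch on a cycle graph, not the paper's closed-form method. Several modelling choices are assumptions made here for illustration: the cop moves randomly with probability `p_random` and otherwise steps along a shortest path toward the robber, a "round" is one robber move followed by one cop move, the players start on roughly opposite sides of the cycle, and all function names are hypothetical.

```python
import random


def step_toward(cop: int, robber: int, n: int) -> int:
    """Step (+1 or -1) that moves the cop along a shortest path on the cycle."""
    forward = (robber - cop) % n
    return 1 if forward <= n - forward else -1


def robber_survives(n: int, rounds: int, p_random: float, rng: random.Random) -> bool:
    """Play one game on the cycle C_n; True if the robber is still free at the end."""
    cop, robber = 0, n // 2  # assumed starting positions
    for _ in range(rounds):
        # Drunken robber: uniformly random step to a neighbour.
        robber = (robber + rng.choice((-1, 1))) % n
        if robber == cop:
            return False
        # Tipsy cop: random step with probability p_random, else intentional.
        if rng.random() < p_random:
            step = rng.choice((-1, 1))
        else:
            step = step_toward(cop, robber, n)
        cop = (cop + step) % n
        if cop == robber:
            return False
    return True


def survival_probability(n: int, rounds: int, p_random: float,
                         trials: int = 20000, seed: int = 0) -> float:
    """Estimate the probability that the robber is free after `rounds` rounds."""
    rng = random.Random(seed)
    free = sum(robber_survives(n, rounds, p_random, rng) for _ in range(trials))
    return free / trials


if __name__ == "__main__":
    for m in (1, 5, 10):
        print(m, survival_probability(n=8, rounds=m, p_random=0.5))
```

An estimate of this kind is only a sanity check against exact formulas such as those the paper derives; the numbers it produces depend on the assumed move rule and starting positions.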

arXiv:2003.06308 [pdf, other]

doi 10.1088/2632-2153/aba042

Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML

Authors: Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Duc Hoang, Sergo Jindariani, Edward Kreinar, Mia Liu, Vladimir Loncar, Jennifer Ngadiuba, Kevin Pedro, Maurizio Pierini, Dylan Rankin, Sheila Sagear, Sioni Summers, Nhan Tran, Zhenbin Wu

Abstract: We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.

Submitted 29 June, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

Comments: Update to MLST journal version

Report number: FERMILAB-PUB-20-167-PPD-SCD

Journal ref: Mach. Learn.: Sci. Technol. 2, 015001 (2020)
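
To make "binary or ternary" precision in the abstract above concrete, the sketch below maps floating-point weights to {-1, +1} and to {-1, 0, +1}. It is a generic illustration of the idea, not the scheme used in the paper or in hls4ml: the function names are hypothetical, the ternary threshold is a free parameter chosen here by hand, and sending a zero weight to +1 in the binary case is an arbitrary convention.

```python
from typing import List, Sequence


def binarize(weights: Sequence[float]) -> List[int]:
    """Map each weight to +1 or -1 by sign (zero goes to +1 by convention here)."""
    return [1 if w >= 0 else -1 for w in weights]


def ternarize(weights: Sequence[float], threshold: float) -> List[int]:
    """Map each weight to -1, 0, or +1; small magnitudes are zeroed out."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


weights = [0.8, -0.05, 0.3, -0.6, 0.02]
print(binarize(weights))        # [1, -1, 1, -1, 1]
print(ternarize(weights, 0.1))  # [1, 0, 1, -1, 0]
```

With weights restricted to these values, a multiplication by a weight reduces to a sign flip or a skip, which is one intuition for why such networks need fewer hardware resources.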

arXiv:2002.02534 [pdf, other]

doi 10.1088/1748-0221/15/05/p05026

Fast inference of Boosted Decision Trees in FPGAs for particle physics

Authors: Sioni Summers, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Duc Hoang, Sergo Jindariani, Edward Kreinar, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Dylan Rankin, Nhan Tran, Zhenbin Wu

Abstract: We describe the implementation of Boosted Decision Trees in the hls4ml library, which allows the translation of a trained model into FPGA firmware through an automated conversion process. Thanks to its fully on-chip implementation, hls4ml performs inference of Boosted Decision Tree models with extremely low latency. With a typical latency less than 100 ns, this solution is suitable for FPGA-based real-time processing, such as in the Level-1 Trigger system of a collider experiment. These developments open up prospects for physicists to deploy BDTs in FPGAs for identifying the origin of jets, better reconstructing the energies of muons, and enabling better selection of rare signal processes.

Submitted 19 February, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

Journal ref: JINST 15 P05026 (2020)

arXiv:1911.05796 [pdf, ps, other]

Response to NITRD, NCO, NSF Request for Information on "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan"

Authors: J. Amundson, J. Annis, C. Avestruz, D. Bowring, J. Caldeira, G. Cerati, C. Chang, S. Dodelson, D. Elvira, A. Farahi, K. Genser, L. Gray, O. Gutsche, P. Harris, J. Kinney, J. B. Kowalkowski, R. Kutschke, S. Mrenna, B. Nord, A. Para, K. Pedro, G. N. Perdue, A. Scheinker, P. Spentzouris, J. St. John, et al. (5 additional authors not shown)

Abstract: We present a response to the 2018 Request for Information (RFI) from the NITRD, NCO, NSF regarding the "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan." Through this document, we provide a response to the question of whether and how the National Artificial Intelligence Research and Development Strategic Plan (NAIRDSP) should be updated from the perspective of Fermilab, America's premier national laboratory for High Energy Physics (HEP). We believe the NAIRDSP should be extended in light of the rapid pace of development and innovation in the field of Artificial Intelligence (AI) since 2016, and present our recommendations below. AI has profoundly impacted many areas of human life, promising to dramatically reshape society --- e.g., economy, education, science --- in the coming years. We are still early in this process. It is critical to invest now in this technology to ensure it is safe and deployed ethically. Science and society both have a strong need for accuracy, efficiency, transparency, and accountability in algorithms, making investments in scientific AI particularly valuable. Thus far the US has been a leader in AI technologies, and we believe as a national Laboratory it is crucial to help maintain and extend this leadership. Moreover, investments in AI will be important for maintaining US leadership in the physical sciences.

Submitted 4 November, 2019; originally announced November 2019.

Report number: FERMILAB-FN-1092-SCD

arXiv:1804.06913 [pdf, other]

doi 10.1088/1748-0221/13/07/P07027

Fast inference of deep neural networks in FPGAs for particle physics

Authors: Javier Duarte, Song Han, Philip Harris, Sergo Jindariani, Edward Kreinar, Benjamin Kreis, Jennifer Ngadiuba, Maurizio Pierini, Ryan Rivera, Nhan Tran, Zhenbin Wu

Abstract: Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through the improvement of the real-time event processing techniques. Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of the use of such techniques in low-latency, low-power FPGA hardware has only just begun. FPGA-based trigger and data acquisition (DAQ) systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson. While we focus on a specific example, the lessons are far-reaching. We develop a package based on High-Level Synthesis (HLS) called hls4ml to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs. For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.

Submitted 28 June, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

Comments: 22 pages, 17 figures, 2 tables, JINST revision

Report number: FERMILAB-PUB-18-089-E

Journal ref: JINST 13 P07027 (2018)

arXiv:1711.03477 [pdf, other]

doi 10.1109/LWC.2018.2799863

Achievable Rates and Training Overheads for a Measured LOS Massive MIMO Channel

Authors: Paul Harris, Wael Boukley Hasan, Liang Liu, Steffen Malkowsky, Mark Beach, Simon Armour, Fredrik Tufvesson, Ove Edfors

Abstract: This paper presents achievable uplink (UL) sumrate predictions for a measured line-of-sight (LOS) massive multiple-input, multiple-output (MIMO) (MMIMO) scenario and illustrates the trade-off between spatial multiplexing performance and channel de-coherence rate for an increasing number of base station (BS) antennas. In addition, an orthogonal frequency division multiplexing (OFDM) case study is formed which considers the 90% coherence time to evaluate the impact of MMIMO channel training overheads in high-speed LOS scenarios. It is shown that whilst 25% of the achievable zero-forcing (ZF) sumrate is lost when the resounding interval is increased by a factor of 4, the OFDM training overheads for a 100-antenna MMIMO BS using an LTE-like physical layer could be as low as 2% for a terminal speed of 90 m/s.

Submitted 22 February, 2018; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: 4 pages, 5 figures

Journal ref: IEEE Wireless Communications Letters 2018

arXiv:1705.07540 [pdf, other]

doi 10.1049/ic.2016.0064

An Overview of Massive MIMO Research at the University of Bristol

Authors: Paul Harris, Wael Boukley Hasan, Henry Brice, Benny Chitambira, Mark Beach, Evangelos Mellios, Andrew Nix, Simon Armour, Angela Doufexi

Abstract: Massive MIMO has rapidly gained popularity as a technology crucial to the capacity advances required for 5G wireless systems. Since its theoretical conception six years ago, research activity has grown exponentially, and there is now a developing industrial interest to commercialise the technology. For this to happen effectively, we believe it is crucial that further pragmatic research is conducted with a view to establish how reality differs from theoretical ideals. This paper presents an overview of the massive MIMO research activities occurring within the Communication Systems & Networks Group at the University of Bristol centred around our 128-antenna real-time testbed, which has been developed through the BIO programmable city initiative in collaboration with NI and Lund University. Through recent preliminary trials, we achieved a world first spectral efficiency of 79.4 bits/s/Hz, and subsequently demonstrated that this could be increased to 145.6 bits/s/Hz. We provide a summary of this work here along with some of our ongoing research directions such as large-scale array wave-front analysis, optimised power control and localisation techniques.

Submitted 21 May, 2017; originally announced May 2017.

Comments: Presented at the IET Radio Propagation and Technologies for 5G Conference (2016). 5 pages

arXiv:1703.04723 [pdf, other]

Temporal Analysis of Measured LOS Massive MIMO Channels with Mobility

Authors: Paul Harris, Steffen Malkowsky, Joao Vieira, Fredrik Tufvesson, Wael Boukley Hasan, Liang Liu, Mark Beach, Simon Armour, Ove Edfors

Abstract: The first measured results for massive multiple-input, multiple-output (MIMO) performance in a line-of-sight (LOS) scenario with moderate mobility are presented, with 8 users served by a 100 antenna base Station (BS) at 3.7 GHz. When such a large number of channels dynamically change, the inherent propagation and processing delay has a critical relationship with the rate of change, as the use of outdated channel information can result in severe detection and precoding inaccuracies. For the downlink (DL) in particular, a time division duplex (TDD) configuration synonymous with massive MIMO deployments could mean only the uplink (UL) is usable in extreme cases. Therefore, it is of great interest to investigate the impact of mobility on massive MIMO performance and consider ways to combat the potential limitations. In a mobile scenario with moving cars and pedestrians, the correlation of the MIMO channel vector over time is inspected for vehicles moving up to 29 km/h. For a 100 antenna system, it is found that the channel state information (CSI) update rate requirement may increase by 7 times when compared to an 8 antenna system, whilst the power control update rate could be decreased by at least 5 times relative to a single antenna system.

Submitted 14 March, 2017; originally announced March 2017.

Comments: Accepted for presentation at the 85th IEEE Vehicular Technology Conference in Sydney. 5 Pages. arXiv admin note: substantial text overlap with arXiv:1701.08818

arXiv:1701.08818 [pdf, other]

doi 10.1109/JSAC.2017.2686678

Performance Characterization of a Real-Time Massive MIMO System with LOS Mobile Channels

Authors: Paul Harris, Steffen Malkowsky, Joao Vieira, Fredrik Tufvesson, Wael Boukley Hassan, Liang Liu, Mark Beach, Simon Armour, Ove Edfors

Abstract: The first measured results for massive MIMO performance in a line-of-sight (LOS) scenario with moderate mobility are presented, with 8 users served in real-time using a 100-antenna base Station (BS) at 3.7 GHz. When such a large number of channels dynamically change, the inherent propagation and processing delay has a critical relationship with the rate of change, as the use of outdated channel information can result in severe detection and precoding inaccuracies. For the downlink (DL) in particular, a time division duplex (TDD) configuration synonymous with massive multiple-input, multiple-output (MIMO) deployments could mean only the uplink (UL) is usable in extreme cases. Therefore, it is of great interest to investigate the impact of mobility on massive MIMO performance and consider ways to combat the potential limitations. In a mobile scenario with moving cars and pedestrians, the massive MIMO channel is sampled across many points in space to build a picture of the overall user orthogonality, and the impact of both azimuth and elevation array configurations are considered. Temporal analysis is also conducted for vehicles moving up to 29 km/h and real-time bit error rates (BERs) for both the UL and DL without power control are presented. For a 100-antenna system, it is found that the channel state information (CSI) update rate requirement may increase by 7 times when compared to an 8-antenna system, whilst the power control update rate could be decreased by at least 5 times relative to a single antenna system.

Submitted 19 May, 2017; v1 submitted 30 January, 2017; originally announced January 2017.

Comments: Submitted to the 2017 IEEE JSAC Special Issue on Deployment Issues and Performance Challenges for 5G, IEEE Journal on Selected Areas in Communications, 2017, vol.PP, no.99, pp.1-1

arXiv:1701.01161 [pdf, other]

The World's First Real-Time Testbed for Massive MIMO: Design, Implementation, and Validation

Authors: Steffen Malkowsky, Joao Vieira, Liang Liu, Paul Harris, Karl Nieman, Nikhil Kundargi, Ian Wong, Fredrik Tufvesson, Viktor Öwall, Ove Edfors

Abstract: This paper sets up a framework for designing a massive multiple-input multiple-output (MIMO) testbed by investigating hardware (HW) and system-level requirements such as processing complexity, duplexing mode and frame structure. Taking these into account, a generic system and processing partitioning is proposed which allows flexible scaling and processing distribution onto a multitude of physically separated devices. Based on the given HW constraints such as maximum number of links and maximum throughput for peer-to-peer interconnections combined with processing capabilities, the framework allows to evaluate modular HW components. To verify our design approach, we present the LuMaMi (Lund University Massive MIMO) testbed which constitutes the first reconfigurable real-time HW platform for prototyping massive MIMO. Utilizing up to 100 base station antennas and more than 50 Field Programmable Gate Arrays, up to 12 user equipments are served on the same time/frequency resource using an LTE-like Orthogonal Frequency Division Multiplexing time-division duplex-based transmission scheme. Proof-of-concept tests with this system show that massive MIMO can simultaneously serve a multitude of users in a static indoor and static outdoor environment utilizing the same time/frequency resource.

Submitted 16 May, 2017; v1 submitted 20 December, 2016; originally announced January 2017.

Comments: 15 pages, accepted for publication in IEEE Access

Showing 1–40 of 40 results for author: Harris, P