-
R2 Indicator and Deep Reinforcement Learning Enhanced Adaptive Multi-Objective Evolutionary Algorithm
Authors:
Farajollah Tahernezhad-Javazm,
Debbie Rankin,
Naomi Du Bois,
Alice E. Smith,
Damien Coyle
Abstract:
Choosing an appropriate optimization algorithm is essential to achieving success in optimization challenges. Here we present a new evolutionary algorithm structure that utilizes a reinforcement learning-based agent aimed at addressing these issues. The agent employs a double deep Q-network to choose a specific evolutionary operator based on feedback it receives from the environment during optimization. The algorithm's structure contains five single-objective evolutionary algorithm operators. This single-objective structure is transformed into a multi-objective one using the R2 indicator. This indicator serves two purposes within our structure: first, it renders the algorithm multi-objective, and second, it provides a means to evaluate each algorithm's performance in each generation to facilitate constructing the reinforcement learning-based reward function. The proposed R2-reinforcement learning multi-objective evolutionary algorithm (R2-RLMOEA) is compared with six other multi-objective algorithms that are based on R2 indicators. These six algorithms include the operators used in R2-RLMOEA as well as an R2 indicator-based algorithm that randomly selects operators during optimization. We benchmark performance using the CEC09 functions, with performance measured by inverted generational distance and spacing. The R2-RLMOEA algorithm outperforms all other algorithms with strong statistical significance (p<0.001) when compared with the average spacing metric across all ten benchmarks.
Submitted 17 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
A machine-learning pipeline for real-time detection of gravitational waves from compact binary coalescences
Authors:
Ethan Marx,
William Benoit,
Alec Gunny,
Rafia Omer,
Deep Chatterjee,
Ricco C. Venterea,
Lauren Wills,
Muhammed Saleem,
Eric Moreno,
Ryan Raikman,
Ekaterina Govorkova,
Dylan Rankin,
Michael W. Coughlin,
Philip Harris,
Erik Katsavounidis
Abstract:
The promise of multi-messenger astronomy relies on the rapid detection of gravitational waves at very low latencies ($\mathcal{O}$(1\,s)) in order to maximize the amount of time available for follow-up observations. In recent years, neural networks have demonstrated robust non-linear modeling capabilities and millisecond-scale inference at a comparatively small computational footprint, making them an attractive family of algorithms in this context. However, integration of these algorithms into the gravitational-wave astrophysics research ecosystem has proven non-trivial. Here, we present the first fully machine learning-based pipeline for the detection of gravitational waves from compact binary coalescences (CBCs) running at low latency. We demonstrate this pipeline to have a fraction of the latency of traditional matched filtering search pipelines while achieving state-of-the-art sensitivity to higher-mass stellar binary black holes.
Submitted 27 March, 2024;
originally announced March 2024.
-
Graph Neural Network-based Tracking as a Service
Authors:
Haoran Zhao,
Andrew Naylor,
Shih-Chieh Hsu,
Paolo Calafiura,
Steven Farrell,
Yongbing Feng,
Philip Coleman Harris,
Elham E Khoda,
William Patrick Mccormack,
Dylan Sheldon Rankin,
Xiangyang Ju
Abstract:
Recent studies have shown promising results for track finding in dense environments using Graph Neural Network (GNN)-based algorithms. However, GNN-based track finding is computationally slow on CPUs, necessitating the use of coprocessors to accelerate the inference time. Additionally, the large input graph size demands a large device memory for efficient computation, a requirement not met by all computing facilities used for particle physics experiments, particularly those lacking advanced GPUs. Furthermore, deploying the GNN-based track-finding algorithm in a production environment requires the installation of all dependent software packages, which are used exclusively by this algorithm. These computing challenges must be addressed for the successful deployment of GNN-based track-finding algorithms in production settings. In response, we introduce a ``GNN-based tracking as a service'' approach, incorporating a custom backend within the NVIDIA Triton inference server to facilitate GNN-based tracking. This paper presents the performance of this approach using the Perlmutter supercomputer at NERSC.
Submitted 14 February, 2024;
originally announced February 2024.
-
Co-Clustering Multi-View Data Using the Latent Block Model
Authors:
Joshua Tobin,
Michaela Black,
James Ng,
Debbie Rankin,
Jonathan Wallace,
Catherine Hughes,
Leane Hoey,
Adrian Moore,
**ling Wang,
Geraldine Horigan,
Paul Carlin,
Helene McNulty,
Anne M Molloy,
Mimi Zhang
Abstract:
The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block cluster and allowing the use of well-grounded model selection methods. The LBM, while adapted in the literature to handle different feature types, cannot be applied to datasets consisting of multiple disjoint sets of features, termed views, for a common set of observations. In this work, we introduce the multi-view LBM, extending the LBM method to multi-view data, where each view marginally follows an LBM. In the case of two views, the dependence between them is captured by a cluster membership matrix, and we aim to learn the structure of this matrix. We develop a likelihood-based approach in which parameter estimation uses a stochastic EM algorithm integrating a Gibbs sampler, and an ICL criterion is derived to determine the number of row and column clusters in each view. To motivate the application of multi-view methods, we extend recent work developing hypothesis tests for the null hypothesis that clusters of observations in each view are independent of each other. The testing procedure is integrated into the model estimation strategy. Furthermore, we introduce a penalty scheme to generate sparse row clusterings. We verify the performance of the developed algorithm using synthetic datasets, and provide guidance for optimal parameter selection. Finally, the multi-view co-clustering method is applied to a complex genomics dataset, and is shown to provide new insights for high-dimension multi-view problems.
Submitted 9 January, 2024;
originally announced January 2024.
-
Non-equilibrium molecular dynamics of steady-state fluid transport through a 2D membrane driven by a concentration gradient
Authors:
Daniel J. Rankin,
David M. Huang
Abstract:
We use a novel non-equilibrium algorithm to simulate steady-state fluid transport through a two-dimensional (2D) membrane due to a concentration gradient by molecular dynamics (MD) for the first time. We confirm that, as required by the Onsager reciprocal relations in the linear-response regime, the solution flux obtained using this algorithm agrees with the excess solute flux obtained from an established non-equilibrium MD algorithm for pressure-driven flow. In addition, we show that the concentration-gradient solution flux in this regime is quantified far more efficiently by explicitly applying a transmembrane concentration difference using our algorithm than by applying Onsager reciprocity to pressure-driven flow. The simulated fluid fluxes are captured with reasonable quantitative accuracy by our previously derived continuum theory of concentration-gradient-driven fluid transport through a 2D membrane [J. Chem. Phys. 151, 044705 (2019)] for a wide range of solution and membrane parameters even though the simulated pore sizes are only several times the size of the fluid particles. The simulations deviate from the theory especially for strong solute--membrane interactions relative to the thermal energy, for which the theoretical approximations break down. Our findings will be beneficial for molecular-level understanding of fluid transport driven by concentration gradients through membranes made from 2D materials, which have diverse applications in energy harvesting, molecular separations, and biosensing.
Submitted 27 September, 2023;
originally announced September 2023.
-
GWAK: Gravitational-Wave Anomalous Knowledge with Recurrent Autoencoders
Authors:
Ryan Raikman,
Eric A. Moreno,
Ekaterina Govorkova,
Ethan J Marx,
Alec Gunny,
William Benoit,
Deep Chatterjee,
Rafia Omer,
Muhammed Saleem,
Dylan S Rankin,
Michael W Coughlin,
Philip C Harris,
Erik Katsavounidis
Abstract:
Matched-filtering detection techniques for gravitational-wave (GW) signals in ground-based interferometers rely on having well-modeled templates of the GW emission. Such techniques have been traditionally used in searches for compact binary coalescences (CBCs), and have been employed in all known GW detections so far. However, interesting science cases aside from compact mergers do not yet have accurate enough modeling to make matched filtering possible, including core-collapse supernovae and sources where stochasticity may be involved. Therefore the development of techniques to identify sources of these types is of significant interest. In this paper, we present a method of anomaly detection based on deep recurrent autoencoders to enhance the search region to unmodeled transients. We use a semi-supervised strategy that we name Gravitational Wave Anomalous Knowledge (GWAK). While the semi-supervised nature of the problem comes with a cost in terms of accuracy as compared to supervised techniques, there is a qualitative advantage in generalizing experimental sensitivity beyond pre-computed signal templates. We construct a low-dimensional embedded space using the GWAK method, capturing the physical signatures of distinct signals on each axis of the space. By introducing signal priors that capture some of the salient features of GW signals, we allow for the recovery of sensitivity even when an unmodeled anomaly is encountered. We show that regions of the GWAK space can identify CBCs, detector glitches and also a variety of unmodeled astrophysical sources.
Submitted 20 September, 2023;
originally announced September 2023.
-
Demonstration of Machine Learning-assisted real-time noise regression in gravitational wave detectors
Authors:
Muhammed Saleem,
Alec Gunny,
Chia-Jui Chou,
Li-Cheng Yang,
Shu-Wei Yeh,
Andy H. Y. Chen,
Ryan Magee,
William Benoit,
Tri Nguyen,
Pinchen Fan,
Deep Chatterjee,
Ethan Marx,
Eric Moreno,
Rafia Omer,
Ryan Raikman,
Dylan Rankin,
Ritwik Sharma,
Michael Coughlin,
Philip Harris,
Erik Katsavounidis
Abstract:
Real-time noise regression algorithms are crucial for maximizing the science outcomes of the LIGO, Virgo, and KAGRA gravitational-wave detectors. This includes improvements in the detectability, source localization and pre-merger detectability of signals, thereby enabling rapid multi-messenger follow-up. In this paper, we demonstrate the effectiveness of \textit{DeepClean}, a convolutional neural network architecture that uses witness sensors to estimate and subtract non-linear and non-stationary noise from gravitational-wave strain data. Our study uses LIGO data from the third observing run with injected compact binary signals. As a demonstration, we use \textit{DeepClean} to subtract the noise at 60 Hz due to the power mains and their sidebands arising from non-linear coupling with other instrumental noise sources. Our parameter estimation study on the injected signals shows that \textit{DeepClean} does not harm the underlying astrophysical signals in the data, while it can enhance the signal-to-noise ratio of potential signals. We show that \textit{DeepClean} can be used for low-latency noise regression to produce cleaned output data at latencies $\sim 1-2$\,s. We also discuss various considerations that may be made while training \textit{DeepClean} for low-latency applications.
Submitted 20 June, 2023;
originally announced June 2023.
-
Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Authors:
Elham E Khoda,
Dylan Rankin,
Rafael Teixeira de Lima,
Philip Harris,
Scott Hauck,
Shih-Chieh Hsu,
Michael Kagan,
Vladimir Loncar,
Chaitanya Paikara,
Richa Rao,
Sioni Summers,
Caterina Vernieri,
Aaron Wang
Abstract:
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.
Submitted 1 July, 2022;
originally announced July 2022.
-
Smart sensors using artificial intelligence for on-detector electronics and ASICs
Authors:
Gabriella Carini,
Grzegorz Deptuch,
Jennet Dickinson,
Dionisio Doering,
Angelo Dragone,
Farah Fahim,
Philip Harris,
Ryan Herbst,
Christian Herwig,
Jin Huang,
Soumyajit Mandal,
Cristina Mantilla Suarez,
Allison McCarn Deiana,
Sandeep Miryala,
F. Mitchell Newcomer,
Benjamin Parpillon,
Veljko Radeka,
Dylan Rankin,
Yihui Ren,
Lorenzo Rota,
Larry Ruckman,
Nhan Tran
Abstract:
Cutting-edge detectors push sensing technology by further improving spatial and temporal resolution, increasing detector area and volume, and generally reducing backgrounds and noise. This has led to an explosion of data being generated in next-generation experiments. Therefore, the need for near-sensor, at the data source, processing with more powerful algorithms is becoming increasingly important to more efficiently capture the right experimental data, reduce downstream system complexity, and enable faster and lower-power feedback loops. In this paper, we discuss the motivations and potential applications for on-detector AI. Furthermore, the unique requirements of particle physics can uniquely drive the development of novel AI hardware and design tools. We describe existing modern work for particle physics in this area. Finally, we outline a number of areas of opportunity where we can advance machine learning techniques, codesign workflows, and future microelectronics technologies which will accelerate design, performance, and implementations for next-generation experiments.
Submitted 27 April, 2022;
originally announced April 2022.
-
Physics Community Needs, Tools, and Resources for Machine Learning
Authors:
Philip Harris,
Erik Katsavounidis,
William Patrick McCormack,
Dylan Rankin,
Yongbin Feng,
Abhijith Gandrakota,
Christian Herwig,
Burt Holzman,
Kevin Pedro,
Nhan Tran,
Tingjun Yang,
Jennifer Ngadiuba,
Michael Coughlin,
Scott Hauck,
Shih-Chieh Hsu,
Elham E Khoda,
Deming Chen,
Mark Neubauer,
Javier Duarte,
Georgia Karagiorgi,
Mia Liu
Abstract:
Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utilized and accessed in the coming years.
Submitted 30 March, 2022;
originally announced March 2022.
-
Applications and Techniques for Fast Machine Learning in Science
Authors:
Allison McCarn Deiana,
Nhan Tran,
Joshua Agar,
Michaela Blott,
Giuseppe Di Guglielmo,
Javier Duarte,
Philip Harris,
Scott Hauck,
Mia Liu,
Mark S. Neubauer,
Jennifer Ngadiuba,
Seda Ogrenci-Memik,
Maurizio Pierini,
Thea Aarrestad,
Steffen Bahr,
Jurgen Becker,
Anne-Sophie Berthold,
Richard J. Bonventre,
Tomas E. Muller Bravo,
Markus Diefenthaler,
Zhen Dong,
Nick Fritzsche,
Amir Gholami,
Ekaterina Govorkova,
Kyle J Hazelwood
, et al. (62 additional authors not shown)
Abstract:
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.
Submitted 25 October, 2021;
originally announced October 2021.
-
Hardware-accelerated Inference for Real-Time Gravitational-Wave Astronomy
Authors:
Alec Gunny,
Dylan Rankin,
Jeffrey Krupa,
Muhammed Saleem,
Tri Nguyen,
Michael Coughlin,
Philip Harris,
Erik Katsavounidis,
Steven Timm,
Burt Holzman
Abstract:
The field of transient astronomy has seen a revolution with the first gravitational-wave detections and the arrival of multi-messenger observations they enabled. Transformed by the first detection of binary black hole and binary neutron star mergers, computational demands in gravitational-wave astronomy are expected to grow by at least a factor of two over the next five years as the global network of kilometer-scale interferometers is brought to design sensitivity. With the increase in detector sensitivity, real-time delivery of gravitational-wave alerts will become increasingly important as an enabler of multi-messenger follow-up. In this work, we report a novel implementation and deployment of deep learning inference for real-time gravitational-wave data denoising and astrophysical source identification. This is accomplished using a generic Inference-as-a-Service model that is capable of adapting to the future needs of gravitational-wave data analysis. Our implementation allows seamless incorporation of hardware accelerators and also enables the use of commercial or private (dedicated) as-a-service computing. Based on our results, we propose a paradigm shift in low-latency and offline computing in gravitational-wave astronomy. Such a shift can address key challenges in peak usage, scalability and reliability, and provide a data analysis platform particularly optimized for deep learning applications. The achieved sub-millisecond scale latency will also be relevant for any machine learning-based real-time control systems that may be invoked in the operation of near-future and next-generation ground-based laser interferometers, as well as the front-end collection, distribution and processing of data from such instruments.
Submitted 27 August, 2021;
originally announced August 2021.
-
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Authors:
Farah Fahim,
Benjamin Hawks,
Christian Herwig,
James Hirschauer,
Sergo Jindariani,
Nhan Tran,
Luca P. Carloni,
Giuseppe Di Guglielmo,
Philip Harris,
Jeffrey Krupa,
Dylan Rankin,
Manuel Blanco Valentin,
Josiah Hester,
Yingyi Luo,
John Mamish,
Seda Ogrenci-Memik,
Thea Aarrestad,
Hamza Javed,
Vladimir Loncar,
Maurizio Pierini,
Adrian Alan Pol,
Sioni Summers,
Javier Duarte,
Scott Hauck,
Shih-Chieh Hsu
, et al. (5 additional authors not shown)
Abstract:
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.
Submitted 23 March, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics
Authors:
Gregor Kasieczka,
Benjamin Nachman,
David Shih,
Oz Amram,
Anders Andreassen,
Kees Benkendorfer,
Blaz Bortolato,
Gustaaf Brooijmans,
Florencia Canelli,
Jack H. Collins,
Biwei Dai,
Felipe F. De Freitas,
Barry M. Dillon,
Ioan-Mihail Dinu,
Zhongtian Dong,
Julien Donini,
Javier Duarte,
D. A. Faroughy,
Julia Gonski,
Philip Harris,
Alan Kahn,
Jernej F. Kamenik,
Charanjit K. Khosa,
Patrick Komiske,
Luc Le Pottier
, et al. (22 additional authors not shown)
Abstract:
A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). This paper will review the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.
Submitted 20 January, 2021;
originally announced January 2021.
-
Fast convolutional neural networks on FPGAs with hls4ml
Authors:
Thea Aarrestad,
Vladimir Loncar,
Nicolò Ghielmetti,
Maurizio Pierini,
Sioni Summers,
Jennifer Ngadiuba,
Christoffer Petersson,
Hampus Linander,
Yutaro Iiyama,
Giuseppe Di Guglielmo,
Javier Duarte,
Philip Harris,
Dylan Rankin,
Sergo Jindariani,
Kevin Pedro,
Nhan Tran,
Mia Liu,
Edward Kreinar,
Zhenbin Wu,
Duc Hoang
Abstract:
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of $5\,μ$s using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
Submitted 29 April, 2021; v1 submitted 13 January, 2021;
originally announced January 2021.
-
Accelerated Charged Particle Tracking with Graph Neural Networks on FPGAs
Authors:
Aneesh Heintz,
Vesal Razavimaleki,
Javier Duarte,
Gage DeZoort,
Isobel Ojalvo,
Savannah Thais,
Markus Atkinson,
Mark Neubauer,
Lindsey Gray,
Sergo Jindariani,
Nhan Tran,
Philip Harris,
Dylan Rankin,
Thea Aarrestad,
Vladimir Loncar,
Maurizio Pierini,
Sioni Summers,
Jennifer Ngadiuba,
Mia Liu,
Edward Kreinar,
Zhenbin Wu
Abstract:
We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks. The two complementary FPGA designs are based on OpenCL, a framework for writing programs that execute across heterogeneous platforms, and hls4ml, a high-level-synthesis-based compiler for neural network to firmware conversion. We evaluate and compare the resource usage, latency, and tracking performance of our implementations based on a benchmark dataset. We find a considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used effectively in future computing workflows and the FPGA-based Level-1 trigger at the CERN Large Hadron Collider.
Submitted 30 November, 2020;
originally announced December 2020.
-
Quasi Anomalous Knowledge: Searching for new physics with embedded knowledge
Authors:
Sang Eon Park,
Dylan Rankin,
Silviu-Marian Udrescu,
Mikaeel Yunus,
Philip Harris
Abstract:
Discoveries of new phenomena often involve a dedicated search for a hypothetical physics signature. Recently, novel deep learning techniques have emerged for anomaly detection in the absence of a signal prior. However, by ignoring signal priors, the sensitivity of these approaches is significantly reduced. We present a new strategy dubbed Quasi Anomalous Knowledge (QUAK), whereby we introduce alternative signal priors that capture some of the salient features of new physics signatures, allowing for the recovery of sensitivity even when the alternative signal is incorrect. This approach can be applied to a broad range of physics models and neural network architectures. In this paper, we apply QUAK to anomaly detection of new physics events at the CERN Large Hadron Collider utilizing variational autoencoders with normalizing flow.
Submitted 11 June, 2021; v1 submitted 6 November, 2020;
originally announced November 2020.
-
FPGAs-as-a-Service Toolkit (FaaST)
Authors:
Dylan Sheldon Rankin,
Jeffrey Krupa,
Philip Harris,
Maria Acosta Flechas,
Burt Holzman,
Thomas Klijnsma,
Kevin Pedro,
Nhan Tran,
Scott Hauck,
Shih-Chieh Hsu,
Matthew Trahms,
Kelvin Lin,
Yu Lou,
Ta-Wei Ho,
Javier Duarte,
Mia Liu
Abstract:
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.
Submitted 16 October, 2020;
originally announced October 2020.
-
Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics
Authors:
Yutaro Iiyama,
Gianluca Cerminara,
Abhijay Gupta,
Jan Kieseler,
Vladimir Loncar,
Maurizio Pierini,
Shah Rukh Qasim,
Marcel Rieger,
Sioni Summers,
Gerrit Van Onsem,
Kinga Wozniak,
Jennifer Ngadiuba,
Giuseppe Di Guglielmo,
Javier Duarte,
Philip Harris,
Dylan Rankin,
Sergo Jindariani,
Mia Liu,
Kevin Pedro,
Nhan Tran,
Edward Kreinar,
Zhenbin Wu
Abstract:
Graph neural networks have been shown to achieve excellent performance for several crucial tasks in particle physics, such as charged particle tracking, jet tagging, and clustering. An important domain for the application of these networks is the FPGA-based first layer of real-time data filtering at the CERN Large Hadron Collider, which has strict latency and resource constraints. We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1 $\mu\mathrm{s}$ on an FPGA. To do so, we consider a representative task associated with particle reconstruction and identification in a next-generation calorimeter operating at a particle collider. We use a graph network architecture developed for such purposes, and apply additional simplifications to match the computing constraints of Level-1 trigger systems, including weight quantization. Using the $\mathtt{hls4ml}$ library, we convert the compressed models into firmware to be implemented on an FPGA. Performance of the synthesized models is presented both in terms of inference accuracy and resource usage.
Submitted 3 February, 2021; v1 submitted 8 August, 2020;
originally announced August 2020.
-
GPU coprocessors as a service for deep learning inference in high energy physics
Authors:
Jeffrey Krupa,
Kelvin Lin,
Maria Acosta Flechas,
Jack Dinsmore,
Javier Duarte,
Philip Harris,
Scott Hauck,
Burt Holzman,
Shih-Chieh Hsu,
Thomas Klijnsma,
Mia Liu,
Kevin Pedro,
Dylan Rankin,
Natchanon Suaysom,
Matt Trahms,
Nhan Tran
Abstract:
In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolve this confrontation provided that algorithms can be sufficiently accelerated. In many cases, algorithmic speedups are found to be largest through the adoption of deep learning algorithms. We present a comprehensive exploration of the use of GPU-based hardware acceleration for deep learning inference within the data reconstruction workflow of high energy physics. We present several realistic examples and discuss a strategy for the seamless integration of coprocessors so that the LHC can maintain, if not exceed, its current performance throughout its running.
Submitted 23 April, 2021; v1 submitted 20 July, 2020;
originally announced July 2020.
-
Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML
Authors:
Giuseppe Di Guglielmo,
Javier Duarte,
Philip Harris,
Duc Hoang,
Sergo Jindariani,
Edward Kreinar,
Mia Liu,
Vladimir Loncar,
Jennifer Ngadiuba,
Kevin Pedro,
Maurizio Pierini,
Dylan Rankin,
Sheila Sagear,
Sioni Summers,
Nhan Tran,
Zhenbin Wu
Abstract:
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
Submitted 29 June, 2020; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Fast inference of Boosted Decision Trees in FPGAs for particle physics
Authors:
Sioni Summers,
Giuseppe Di Guglielmo,
Javier Duarte,
Philip Harris,
Duc Hoang,
Sergo Jindariani,
Edward Kreinar,
Vladimir Loncar,
Jennifer Ngadiuba,
Maurizio Pierini,
Dylan Rankin,
Nhan Tran,
Zhenbin Wu
Abstract:
We describe the implementation of Boosted Decision Trees in the hls4ml library, which allows the translation of a trained model into FPGA firmware through an automated conversion process. Thanks to its fully on-chip implementation, hls4ml performs inference of Boosted Decision Tree models with extremely low latency. With a typical latency less than 100 ns, this solution is suitable for FPGA-based real-time processing, such as in the Level-1 Trigger system of a collider experiment. These developments open up prospects for physicists to deploy BDTs in FPGAs for identifying the origin of jets, better reconstructing the energies of muons, and enabling better selection of rare signal processes.
Submitted 19 February, 2020; v1 submitted 5 February, 2020;
originally announced February 2020.
-
An approach to constraining the Higgs width at the LHC and HL-LHC
Authors:
Philip Coleman Harris,
Dylan Sheldon Rankin,
Cristina Mantilla Suarez
Abstract:
Despite the discovery of the Higgs boson decay in five separate channels, many parameters of the Higgs boson remain largely unconstrained. In this paper, we present a new approach to constraining the Higgs total width by requiring the Higgs to be resolved as a single high-$p_T$ jet and measuring the inclusive Higgs boson cross section. To measure the inclusive Higgs boson cross section, we rely on new approaches from machine learning and a modified jet reconstruction. This approach is found to be complementary to the existing off-shell width measurement and, with the full HL-LHC luminosity, is capable of yielding similar sensitivity to the off-shell projections. We outline the theoretical and experimental limitations and present a path towards making this approach a truly model-independent measurement of the Higgs boson total width.
Submitted 4 October, 2019;
originally announced October 2019.
-
Entrance Effects in Concentration-Gradient-Driven Flow Through an Ultrathin Porous Membrane
Authors:
Daniel J. Rankin,
Lydéric Bocquet,
David M. Huang
Abstract:
Transport of liquid mixtures through porous membranes is central to processes such as desalination, chemical separations and energy harvesting, with ultrathin membranes made from novel 2D nanomaterials showing exceptional promise. Here we derive, for the first time, general equations for the solution and solute fluxes through a circular pore in an ultrathin planar membrane induced by a solute concentration gradient. We show that the equations accurately capture the fluid fluxes measured in finite-element numerical simulations for weak solute-membrane interactions. We also derive scaling laws for these fluxes as a function of the pore size and the strength and range of solute-membrane interactions. These scaling relationships differ markedly from those for concentration-gradient-driven flow through a long cylindrical pore or for flow induced by a pressure gradient or electric field through a pore in an ultrathin membrane. These results have broad implications for transport of liquid mixtures through membranes with a thickness on the order of the characteristic pore size.
Submitted 25 June, 2019; v1 submitted 24 April, 2019;
originally announced April 2019.
-
FPGA-accelerated machine learning inference as a service for particle physics computing
Authors:
Javier Duarte,
Philip Harris,
Scott Hauck,
Burt Holzman,
Shih-Chieh Hsu,
Sergo Jindariani,
Suffian Khan,
Benjamin Kreis,
Brian Lee,
Mia Liu,
Vladimir Lončar,
Jennifer Ngadiuba,
Kevin Pedro,
Brandon Perez,
Maurizio Pierini,
Dylan Rankin,
Nhan Tran,
Matthew Trahms,
Aristeidis Tsaris,
Colin Versteeg,
Ted W. Way,
Dustin Werran,
Zhenbin Wu
Abstract:
New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600--700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
Submitted 16 October, 2019; v1 submitted 18 April, 2019;
originally announced April 2019.