-
Neural Embedding: Learning the Embedding of the Manifold of Physics Data
Authors:
Sang Eon Park,
Philip Harris,
Bryan Ostdiek
Abstract:
In this paper, we present a method of embedding physics data manifolds with metric structure into lower-dimensional spaces with simpler metrics, such as Euclidean and hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. With the notion of volume in Euclidean spaces, we provide for the first time a viable solution to quantifying the true search capability of model-agnostic search algorithms in collider physics (i.e., anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from information in complex high-dimensional datasets.
Submitted 14 August, 2022; v1 submitted 10 August, 2022;
originally announced August 2022.
-
A FAIR and AI-ready Higgs boson decay dataset
Authors:
Yifan Chen,
E. A. Huerta,
Javier Duarte,
Philip Harris,
Daniel S. Katz,
Mark S. Neubauer,
Daniel Diaz,
Farouk Mokhtar,
Raghav Kansal,
Sang Eon Park,
Volodymyr V. Kindratenko,
Zhizhen Zhao,
Roger Rusack
Abstract:
To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.
Submitted 16 February, 2022; v1 submitted 4 August, 2021;
originally announced August 2021.
-
The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics
Authors:
Gregor Kasieczka,
Benjamin Nachman,
David Shih,
Oz Amram,
Anders Andreassen,
Kees Benkendorfer,
Blaz Bortolato,
Gustaaf Brooijmans,
Florencia Canelli,
Jack H. Collins,
Biwei Dai,
Felipe F. De Freitas,
Barry M. Dillon,
Ioan-Mihail Dinu,
Zhongtian Dong,
Julien Donini,
Javier Duarte,
D. A. Faroughy,
Julia Gonski,
Philip Harris,
Alan Kahn,
Jernej F. Kamenik,
Charanjit K. Khosa,
Patrick Komiske,
Luc Le Pottier,
et al. (22 additional authors not shown)
Abstract:
A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). This paper will review the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.
Submitted 20 January, 2021;
originally announced January 2021.
-
Quasi Anomalous Knowledge: Searching for new physics with embedded knowledge
Authors:
Sang Eon Park,
Dylan Rankin,
Silviu-Marian Udrescu,
Mikaeel Yunus,
Philip Harris
Abstract:
Discoveries of new phenomena often involve a dedicated search for a hypothetical physics signature. Recently, novel deep learning techniques have emerged for anomaly detection in the absence of a signal prior. However, by ignoring signal priors, the sensitivity of these approaches is significantly reduced. We present a new strategy dubbed Quasi Anomalous Knowledge (QUAK), whereby we introduce alternative signal priors that capture some of the salient features of new physics signatures, allowing for the recovery of sensitivity even when the alternative signal is incorrect. This approach can be applied to a broad range of physics models and neural network architectures. In this paper, we apply QUAK to anomaly detection of new physics events at the CERN Large Hadron Collider utilizing variational autoencoders with normalizing flows.
Submitted 11 June, 2021; v1 submitted 6 November, 2020;
originally announced November 2020.