Search | arXiv e-print repository

doi 10.1007/JHEP07(2023)108

Neural Embedding: Learning the Embedding of the Manifold of Physics Data

Authors: Sang Eon Park, Philip Harris, Bryan Ostdiek

Abstract: In this paper, we present a method of embedding physics data manifolds with metric structure into lower dimensional spaces with simpler metrics, such as Euclidean and Hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedd… ▽ More In this paper, we present a method of embedding physics data manifolds with metric structure into lower dimensional spaces with simpler metrics, such as Euclidean and Hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. With the notion of volume in Euclidean spaces, we provide for the first time a viable solution to quantifying the true search capability of model agnostic search algorithms in collider physics (i.e. anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from information in complex high dimensional datasets. △ Less

Submitted 14 August, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2108.02214 [pdf, other]

doi 10.1038/s41597-021-01109-0

A FAIR and AI-ready Higgs boson decay dataset

Authors: Yifan Chen, E. A. Huerta, Javier Duarte, Philip Harris, Daniel S. Katz, Mark S. Neubauer, Daniel Diaz, Farouk Mokhtar, Raghav Kansal, Sang Eon Park, Volodymyr V. Kindratenko, Zhizhen Zhao, Roger Rusack

Abstract: To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate… ▽ More To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics. △ Less

Submitted 16 February, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: 13 pages, 3 figures. v2: Accepted to Nature Scientific Data. Learn about the FAIR4HEP project at https://fair4hep.github.io. See our invited Behind the Paper Blog in Springer Nature Research Data Community at https://go.nature.com/3oMVYxo

ACM Class: I.2; J.2

Journal ref: Scientific Data volume 9, Article number: 31 (2022)

arXiv:2101.08320 [pdf, other]

doi 10.1088/1361-6633/ac36b9

The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics

Authors: Gregor Kasieczka, Benjamin Nachman, David Shih, Oz Amram, Anders Andreassen, Kees Benkendorfer, Blaz Bortolato, Gustaaf Brooijmans, Florencia Canelli, Jack H. Collins, Biwei Dai, Felipe F. De Freitas, Barry M. Dillon, Ioan-Mihail Dinu, Zhongtian Dong, Julien Donini, Javier Duarte, D. A. Faroughy, Julia Gonski, Philip Harris, Alan Kahn, Jernej F. Kamenik, Charanjit K. Khosa, Patrick Komiske, Luc Le Pottier , et al. (22 additional authors not shown)

Abstract: A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a… ▽ More A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). This paper will review the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders. △ Less

Submitted 20 January, 2021; originally announced January 2021.

Comments: 108 pages, 53 figures, 3 tables

arXiv:2011.03550 [pdf, other]

doi 10.1007/JHEP06(2021)030

Quasi Anomalous Knowledge: Searching for new physics with embedded knowledge

Authors: Sang Eon Park, Dylan Rankin, Silviu-Marian Udrescu, Mikaeel Yunus, Philip Harris

Abstract: Discoveries of new phenomena often involve a dedicated search for a hypothetical physics signature. Recently, novel deep learning techniques have emerged for anomaly detection in the absence of a signal prior. However, by ignoring signal priors, the sensitivity of these approaches is significantly reduced. We present a new strategy dubbed Quasi Anomalous Knowledge (QUAK), whereby we introduce alte… ▽ More Discoveries of new phenomena often involve a dedicated search for a hypothetical physics signature. Recently, novel deep learning techniques have emerged for anomaly detection in the absence of a signal prior. However, by ignoring signal priors, the sensitivity of these approaches is significantly reduced. We present a new strategy dubbed Quasi Anomalous Knowledge (QUAK), whereby we introduce alternative signal priors that capture some of the salient features of new physics signatures, allowing for the recovery of sensitivity even when the alternative signal is incorrect. This approach can be applied to a broad range of physics models and neural network architectures. In this paper, we apply QUAK to anomaly detection of new physics events at the CERN Large Hadron Collider utilizing variational autoencoders with normalizing flow. △ Less

Submitted 11 June, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: 25 pages, 9 figures

Journal ref: J. High Energ. Phys. 2021, 30 (2021)

Showing 1–4 of 4 results for author: Park, S E