-
Variational Pseudo Marginal Methods for Jet Reconstruction in Particle Physics
Authors:
Hanming Yang,
Antonio Khalil Moretti,
Sebastian Macaluso,
Philippe Chlenski,
Christian A. Naesseth,
Itsik Pe'er
Abstract:
Reconstructing jets, which provide vital insights into the properties and histories of subatomic particles produced in high-energy collisions, is a central problem in data analysis in collider physics. This intricate task deals with estimating the latent structure of a jet (binary tree) and involves parameters such as particle energy, momentum, and types. While Bayesian methods offer a natural approach for handling uncertainty and leveraging prior knowledge, they face significant challenges due to the super-exponential growth of potential jet topologies as the number of observed particles increases. To address this, we introduce a Combinatorial Sequential Monte Carlo approach for inferring jet latent structures. As a second contribution, we leverage the resulting estimator to develop a variational inference algorithm for parameter learning. Building on this, we introduce a variational family using a pseudo-marginal framework for a fully Bayesian treatment of all variables, unifying the generative model with the inference process. We illustrate our method's effectiveness through experiments using data generated with a collider physics generative model, highlighting superior speed and accuracy across a range of tasks.
Submitted 5 June, 2024;
originally announced June 2024.
-
The Quantum Trellis: A classical algorithm for sampling the parton shower with interference effects
Authors:
Sebastian Macaluso,
Kyle Cranmer
Abstract:
Simulations of high-energy particle collisions, such as those used at the Large Hadron Collider, are based on quantum field theory; however, many approximations are made in practice. For example, the simulation of the parton shower, which gives rise to objects called 'jets', is based on a semi-classical approximation that neglects various interference effects. While there is a desire to incorporate interference effects, new computational techniques are needed to cope with the exponential growth in complexity associated with quantum processes. We present a classical algorithm called the quantum trellis to efficiently compute the un-normalized probability density over N-body phase space including all interference effects, and we pair this with an MCMC-based sampling strategy. This provides a potential path forward for classical computers and a strong baseline for approaches based on quantum computing.
Submitted 23 December, 2021;
originally announced December 2021.
-
Exact and Approximate Hierarchical Clustering Using A*
Authors:
Craig S. Greenberg,
Sebastian Macaluso,
Nicholas Monath,
Avinava Dubey,
Patrick Flaherty,
Manzil Zaheer,
Amr Ahmed,
Kyle Cranmer,
Andrew McCallum
Abstract:
Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This combination results in an exact algorithm that scales beyond previous state of the art, from a search space with $10^{12}$ trees to $10^{15}$ trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than $10^{1000}$ trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
Submitted 14 April, 2021;
originally announced April 2021.
-
Hierarchical clustering in particle physics through reinforcement learning
Authors:
Johann Brehmer,
Sebastian Macaluso,
Duccio Pappadopulo,
Kyle Cranmer
Abstract:
Particle physics experiments often require the reconstruction of decay patterns through a hierarchical clustering of the observed final-state particles. We show that this task can be phrased as a Markov Decision Process and adapt reinforcement learning algorithms to solve it. In particular, we show that Monte-Carlo Tree Search guided by a neural policy can construct high-quality hierarchical clusterings and outperform established greedy and beam search baselines.
Submitted 18 December, 2020; v1 submitted 16 November, 2020;
originally announced November 2020.
-
Data Structures & Algorithms for Exact Inference in Hierarchical Clustering
Authors:
Craig S. Greenberg,
Sebastian Macaluso,
Nicholas Monath,
Ji-Ah Lee,
Patrick Flaherty,
Kyle Cranmer,
Andrew McGregor,
Andrew McCallum
Abstract:
Hierarchical clustering is a fundamental task often used to discover meaningful structures in data, such as phylogenetic trees, taxonomies of concepts, subtypes of cancer, and cascades of particle decays in particle physics. Typically, approximate algorithms are used for inference due to the combinatorial number of possible hierarchical clusterings. In contrast to existing methods, we present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure, and we prove that we can exactly compute the partition function, maximum likelihood hierarchy, and marginal probabilities of sub-hierarchies and clusters. Our algorithms scale in time and space proportional to the powerset of $N$ elements, which is super-exponentially more efficient than explicitly considering each of the $(2N-3)!!$ possible hierarchies. Also, for larger datasets where our exact algorithms become infeasible, we introduce an approximate algorithm based on a sparse trellis that compares well to other benchmarks. Exact methods are relevant to data analyses in particle physics and for finding correlations among gene expression in cancer genomics, and we give examples in both areas, where our algorithms outperform greedy and beam search baselines. In addition, we consider Dasgupta's cost with synthetic data.
Submitted 22 October, 2020; v1 submitted 26 February, 2020;
originally announced February 2020.
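The last abstract above contrasts the $(2N-3)!!$ possible hierarchies over $N$ elements with a cost proportional to the powerset of $N$ elements. The following is a minimal sketch of that counting comparison only, not of the authors' algorithms. The function names are hypothetical, and $2^N$ (the number of subsets of $N$ elements) is used here as an assumed stand-in for the powerset-proportional cost.

```python
def double_factorial(n: int) -> int:
    """Return n!! = n * (n - 2) * (n - 4) * ..., with (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def num_hierarchies(n_leaves: int) -> int:
    """Count binary hierarchies over n_leaves labeled elements: (2N-3)!!."""
    if n_leaves < 1:
        raise ValueError("n_leaves must be at least 1")
    return double_factorial(2 * n_leaves - 3)


def num_subsets(n_leaves: int) -> int:
    """Size of the powerset of n_leaves elements: 2^N (assumed cost proxy)."""
    return 2 ** n_leaves


if __name__ == "__main__":
    # Compare the two growth rates for a few small values of N.
    for n in (4, 8, 16, 32):
        print(f"N={n}: hierarchies={num_hierarchies(n)}, subsets={num_subsets(n)}")
```

For small $N$ the two counts are similar (15 hierarchies versus 16 subsets at $N=4$), but the double factorial quickly dominates as $N$ grows, which is the gap the abstract refers to.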