-
Streaming Belief Propagation for Community Detection
Authors:
Yuchen Wu,
MohammadHossein Bateni,
Andre Linhares,
Filipe Miguel Goncalves de Almeida,
Andrea Montanari,
Ashkan Norouzi-Fard,
Jakab Tardos
Abstract:
The community detection problem requires to cluster the nodes of a network into a small number of well-connected "communities". There has been substantial recent progress in characterizing the fundamental statistical limits of community detection under simple stochastic block models. However, in real-world applications, the network structure is typically dynamic, with nodes that join over time. In this setting, we would like a detection algorithm to perform only a limited number of updates at each node arrival. While standard voting approaches satisfy this constraint, it is unclear whether they exploit the network information optimally. We introduce a simple model for networks growing over time which we refer to as streaming stochastic block model (StSBM). Within this model, we prove that voting algorithms have fundamental limitations. We also develop a streaming belief-propagation (StreamBP) approach, for which we prove optimality in certain regimes. We validate our theoretical findings on synthetic and real data.
Submitted 10 June, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
-
$L^\infty$-estimates in optimal transport for non quadratic costs
Authors:
Cristian E. Gutiérrez,
Annamaria Montanari
Abstract:
For cost functions $c(x,y)=h(x-y)$ with $h\in C^2$ homogeneous of degree $p\geq 2$, we show $L^\infty$-estimates of $Tx-x$ on balls, where $T$ is an $h$-monotone map. Estimates for the interpolating mappings $T_t=t(T-I)+I$ are deduced from this.
Submitted 18 October, 2021; v1 submitted 5 June, 2021;
originally announced June 2021.
-
Search for dark matter annihilation signals from unidentified Fermi-LAT objects with H.E.S.S
Authors:
H. E. S. S. Collaboration,
H. Abdallah,
F. Aharonian,
F. Ait Benkhali,
E. O. Angüner,
C. Arcaro,
C. Armand,
T. Armstrong,
H. Ashkar,
M. Backes,
V. Baghmanyan,
V. Barbosa Martins,
A. Barnacka,
M. Barnard,
Y. Becherini,
D. Berge,
K. Bernlöhr,
B. Bi,
M. Böttcher,
C. Boisson,
J. Bolmont,
M. de Bony de Lavergne,
M. Breuhaus,
R. Brose,
F. Brun
, et al. (205 additional authors not shown)
Abstract:
Cosmological $N$-body simulations show that Milky Way-sized galaxies harbor a population of unmerged dark matter subhalos. These subhalos could shine in gamma-rays and be eventually detected in gamma-ray surveys as unidentified sources. We performed a thorough selection among unidentified Fermi-LAT Objects (UFOs) to identify them as possible TeV-scale dark matter subhalo candidates. We search for very-high-energy (E $\gtrsim$ 100 GeV) gamma-ray emissions using H.E.S.S. observations towards four selected UFOs. Since no significant very-high-energy gamma-ray emission is detected in any dataset of the four observed UFOs nor in the combined UFO dataset, strong constraints are derived on the product of the velocity-weighted annihilation cross section $\langle σv \rangle$ by the $J$-factor for the dark matter models. The 95% C.L. observed upper limits derived from combined H.E.S.S. observations reach $\langle σv \rangle J$ values of 3.7$\times$10$^{-5}$ and 8.1$\times$10$^{-6}$ GeV$^2$cm$^{-2}$s$^{-1}$ in the $W^+W^-$ and $τ^+τ^-$ channels, respectively, for a 1 TeV dark matter mass. Focusing on thermal WIMPs, the H.E.S.S. constraints restrict the $J$-factors to lie in the range 6.1$\times$10$^{19}$ - 2.0$\times$10$^{21}$ GeV$^2$cm$^{-5}$, and the masses to lie between 0.2 and 6 TeV in the $W^+W^-$ channel. For the $τ^+τ^-$ channel, the $J$-factors lie in the range 7.0$\times$10$^{19}$ - 7.1$\times$10$^{20}$ GeV$^2$cm$^{-5}$ and the masses lie between 0.2 and 0.5 TeV. Assuming model-dependent predictions from cosmological N-body simulations on the $J$-factor distribution for Milky Way-sized galaxies, the dark matter models with masses greater than 0.3 TeV for the UFO emissions can be ruled out at high confidence level.
Submitted 15 June, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Coded masks for imaging of neutrino events
Authors:
M. Andreotti,
P. Bernardini,
A. Bersani,
S. Bertolucci,
S. Biagi,
A. Branca,
C. Brizzolari,
G. Brunetti,
I. Cagnoli,
R. Calabrese,
A. Caminata,
A. Campani,
P. Carniti,
R. Cataldo,
C. Cattadori,
S. Cherubini,
V. Cicero,
M. Citterio,
S. Copello,
P. Cova,
E. Cristaldo Morales,
S. Davini,
N. Delmonte,
G. De Matteis,
S. Di Domizio
, et al. (54 additional authors not shown)
Abstract:
The capture of scintillation light emitted by liquid Argon and Xenon under molecular excitations by charged particles is still a challenging task. Here we present a first attempt to design a device able to grab sufficiently high luminosity in order to reconstruct the path of ionizing particles. This preliminary study is based on the use of masks to encode the light signal combined with single-photon detectors. In this respect, the proposed system is able to detect tracks over focal distances of about tens of centimeters. From numerical simulations it emerges that it is possible to successfully decode and recognize signals, even complex, with a relatively limited number of acquisition channels. Such innovative technique can be very fruitful in a new generation of detectors devoted to neutrino physics and dark matter search. Indeed the introduction of coded masks combined with SiPM detectors is proposed for a liquid-Argon target in the Near Detector of the DUNE experiment.
Submitted 21 November, 2021; v1 submitted 22 May, 2021;
originally announced May 2021.
-
Search for dark matter annihilation in the dwarf irregular galaxy WLM with H.E.S.S
Authors:
H. E. S. S. Collaboration,
H. Abdallah,
R. Adam,
F. Aharonian,
F. Ait Benkhali,
E. O. Angüner,
C. Arcaro,
C. Armand,
T. Armstrong,
H. Ashkar,
M. Backes,
V. Baghmanyan,
V. Barbosa Martins,
A. Barnacka,
M. Barnard,
Y. Becherini,
D. Berge,
K. Bernlöhr,
B. Bi,
M. Böttcher,
C. Boisson,
J. Bolmont,
M. de Bony de Lavergne,
M. Breuhaus,
F. Brun
, et al. (211 additional authors not shown)
Abstract:
We search for an indirect signal of dark matter through very high-energy gamma rays from the Wolf-Lundmark-Melotte (WLM) dwarf irregular galaxy. The pair annihilation of dark matter particles would produce Standard Model particles in the final state such as gamma rays, which might be detected by ground-based Cherenkov telescopes. Dwarf irregular galaxies represent promising targets as they are dark matter dominated objects with well measured kinematics and small uncertainties on their dark matter distribution profiles. In 2018, the H.E.S.S. five-telescope array observed the dwarf irregular galaxy WLM for 18 hours. We present the first analysis based on data obtained from an imaging atmospheric Cherenkov telescope for this subclass of dwarf galaxy. As we do not observe any significant excess in the direction of WLM, we interpret the result in terms of constraints on the velocity-weighted cross section for dark matter pair annihilation as a function of the dark matter particle mass for various continuum channels as well as the prompt gamma-gamma emission. For the $τ^+τ^-$ channel the limits reach a $\langle σv \rangle$ value of about $4\times 10^{-22}$ cm$^3$s$^{-1}$ for a dark matter particle mass of 1 TeV. For the prompt gamma-gamma channel, the upper limit reaches a $\langle σv \rangle$ value of about $5 \times10^{-24}$ cm$^3$s$^{-1}$ for a mass of 370 GeV. These limits represent an improvement of up to a factor 200 with respect to previous results for the dwarf irregular galaxies for TeV dark matter search.
Submitted 10 May, 2021;
originally announced May 2021.
-
Large factor model estimation by nuclear norm plus $l_1$ norm penalization
Authors:
Matteo Farnè,
Angela Montanari
Abstract:
This paper provides a comprehensive estimation framework via nuclear norm plus $l_1$ norm penalization for high-dimensional approximate factor models with a sparse residual covariance. The underlying assumptions allow for non-pervasive latent eigenvalues and a prominent residual covariance pattern. In that context, existing approaches based on principal components may lead to misestimate the latent rank, due to the numerical instability of sample eigenvalues. On the contrary, the proposed optimization problem retrieves the latent covariance structure and exactly recovers the latent rank and the residual sparsity pattern. Conditioning on them, the asymptotic rates of the subsequent ordinary least squares estimates of loadings and factor scores are provided, the recovered latent eigenvalues are shown to be maximally concentrated and the estimates of factor scores via Bartlett's and Thompson's methods are proved to be the most precise given the data. The validity of outlined results is highlighted in an exhaustive simulation study and in a real financial data example.
Submitted 6 April, 2021;
originally announced April 2021.
-
Minimum complexity interpolation in random features models
Authors:
Michael Celentano,
Theodor Misiakiewicz,
Andrea Montanari
Abstract:
Despite their many appealing properties, kernel methods are heavily affected by the curse of dimensionality. For instance, in the case of inner product kernels in $\mathbb{R}^d$, the Reproducing Kernel Hilbert Space (RKHS) norm is often very large for functions that depend strongly on a small subset of directions (ridge functions). Correspondingly, such functions are difficult to learn using kernel methods. This observation has motivated the study of generalizations of kernel methods, whereby the RKHS norm -- which is equivalent to a weighted $\ell_2$ norm -- is replaced by a weighted functional $\ell_p$ norm, which we refer to as $\mathcal{F}_p$ norm. Unfortunately, tractability of these approaches is unclear. The kernel trick is not available and minimizing these norms requires to solve an infinite-dimensional convex problem.
We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size. Hence, learning with $\mathcal{F}_p$ norms is tractable in these cases. We introduce a proof technique based on uniform concentration in the dual, which can be of broader interest in the study of overparametrized models. For $p= 1$, our guarantees for the random features approximation break down. We prove instead that learning with the $\mathcal{F}_1$ norm is $\mathsf{NP}$-hard under a randomized reduction based on the problem of learning halfspaces with noise.
Submitted 5 November, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
Deep Underground Neutrino Experiment (DUNE) Near Detector Conceptual Design Report
Authors:
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
A. Aduszkiewicz,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
M. Alrashed,
C. Alt,
A. Alton,
P. Amedo,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
N. Anfimov,
A. Ankowski,
M. Antonova,
S. Antusch
, et al. (1041 additional authors not shown)
Abstract:
This report describes the conceptual design of the DUNE near detector
Submitted 25 March, 2021;
originally announced March 2021.
-
Deep learning: a statistical viewpoint
Authors:
Peter L. Bartlett,
Andrea Montanari,
Alexander Rakhlin
Abstract:
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
Submitted 16 March, 2021;
originally announced March 2021.
-
Experiment Simulation Configurations Approximating DUNE TDR
Authors:
DUNE Collaboration,
B. Abi,
R. Acciarri,
M. A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
M. Antonova,
S. Antusch,
A. Aranda-Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (949 additional authors not shown)
Abstract:
The Deep Underground Neutrino Experiment (DUNE) is a next-generation long-baseline neutrino oscillation experiment consisting of a high-power, broadband neutrino beam, a highly capable near detector located on site at Fermilab, in Batavia, Illinois, and a massive liquid argon time projection chamber (LArTPC) far detector located at the 4850L of Sanford Underground Research Facility in Lead, South Dakota. The long-baseline physics sensitivity calculations presented in the DUNE Physics TDR, and in a related physics paper, rely upon simulation of the neutrino beam line, simulation of neutrino interactions in the near and far detectors, fully automated event reconstruction and neutrino classification, and detailed implementation of systematic uncertainties. The purpose of this posting is to provide a simplified summary of the simulations that went into this analysis to the community, in order to facilitate phenomenological studies of long-baseline oscillation at DUNE. Simulated neutrino flux files and a GLoBES configuration describing the far detector reconstruction and selection performance are included as ancillary files to this posting. A simple analysis using these configurations in GLoBES produces sensitivity that is similar, but not identical, to the official DUNE sensitivity. DUNE welcomes those interested in performing phenomenological work as members of the collaboration, but also recognizes the benefit of making these configurations readily available to the wider community.
Submitted 18 March, 2021; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Learning with invariances in random features and kernel models
Authors:
Song Mei,
Theodor Misiakiewicz,
Andrea Montanari
Abstract:
A number of machine learning tasks entail a high degree of invariance: the data distribution does not change if we act on the data with a certain group of transformations. For instance, labels of images are invariant under translations of the images. Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties. With the objective of quantifying the gain achieved by invariant architectures, we introduce two classes of models: invariant random features and invariant kernel methods. The latter includes, as a special case, the neural tangent kernel for convolutional networks with global average pooling. We consider uniform covariates distributions on the sphere and hypercube and a general invariant target function. We characterize the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension, for a class of groups that we call `degeneracy $α$', with $α\leq 1$. We show that exploiting invariance in the architecture saves a $d^α$ factor ($d$ stands for the dimension) in sample size and number of hidden units to achieve the same test error as for unstructured architectures.
Finally, we show that output symmetrization of an unstructured kernel estimator does not give a significant statistical improvement; on the other hand, data augmentation with an unstructured kernel estimator is equivalent to an invariant kernel estimator and enjoys the same improvement in statistical efficiency.
Submitted 25 February, 2021;
originally announced February 2021.
-
Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration
Authors:
Song Mei,
Theodor Misiakiewicz,
Andrea Montanari
Abstract:
Consider the classical supervised learning problem: we are given data $(y_i,{\boldsymbol x}_i)$, $i\le n$, with $y_i$ a response and ${\boldsymbol x}_i\in {\mathcal X}$ a covariates vector, and try to learn a model $f:{\mathcal X}\to{\mathbb R}$ to predict future responses. Random features methods map the covariates vector ${\boldsymbol x}_i$ to a point ${\boldsymbol φ}({\boldsymbol x}_i)$ in a higher dimensional space ${\mathbb R}^N$, via a random featurization map ${\boldsymbol φ}$. We study the use of random features methods in conjunction with ridge regression in the feature space ${\mathbb R}^N$. This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so called lazy training regime.
We define a class of problems satisfying certain spectral conditions on the underlying kernels, and a hypercontractivity assumption on the associated eigenfunctions. These conditions are verified by classical high-dimensional examples. Under these conditions, we prove a sharp characterization of the error of random features ridge regression. In particular, we address two fundamental questions: $(1)$~What is the generalization error of KRR? $(2)$~How big $N$ should be for the random features approximation to achieve the same error as KRR?
In this setting, we prove that KRR is well approximated by a projection onto the top $\ell$ eigenfunctions of the kernel, where $\ell$ depends on the sample size $n$. We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-δ}$ for some $δ>0$. We characterize this gap. For $N\ge n^{1+δ}$, random features achieve the same error as the corresponding KRR, and further increasing $N$ does not lead to a significant change in test error.
Submitted 26 January, 2021;
originally announced January 2021.
-
Observation of a sudden cessation of a very-high-energy gamma-ray flare in PKS 1510-089 with H.E.S.S. and MAGIC in May 2016
Authors:
H. E. S. S. Collaboration,
H. Abdalla,
R. Adam,
F. Aharonian,
F. Ait Benkhali,
E. O. Angüner,
C. Arcaro,
C. Armand,
T. Armstrong,
H. Ashkar,
M. Backes,
V. Baghmanyan,
V. Barbosa Martins,
A. Barnacka,
M. Barnard,
Y. Becherini,
D. Berge,
K. Bernlöhr,
B. Bi,
M. Böttcher,
C. Boisson,
J. Bolmont,
S. Bonnefoy,
M. de Bony de Lavergne,
J. Bregeon
, et al. (409 additional authors not shown)
Abstract:
The flat spectrum radio quasar (FSRQ) PKS 1510-089 is known for its complex multiwavelength behavior, and is one of only a few FSRQs detected at very high energy (VHE, $E>100\,$GeV) $γ$-rays. VHE $γ$-ray observations with H.E.S.S. and MAGIC during late May and early June 2016 resulted in the detection of an unprecedented flare, which reveals for the first time VHE $γ$-ray intranight variability in this source. While a common variability timescale of $1.5\,$hr is found, there is a significant deviation near the end of the flare with a timescale of $\sim 20\,$min marking the cessation of the event. The peak flux is nearly two orders of magnitude above the low-level emission. For the first time, curvature is detected in the VHE $γ$-ray spectrum of PKS 1510-089, which is fully explained through absorption by the extragalactic background light. Optical R-band observations with ATOM reveal a counterpart of the $γ$-ray flare, even though the detailed flux evolution differs from the VHE lightcurve. Interestingly, a steep flux decrease is observed at the same time as the cessation of the VHE flare. In the high energy (HE, $E>100\,$MeV) $γ$-ray band only a moderate flux increase is observed with Fermi-LAT, while the HE $γ$-ray spectrum significantly hardens up to a photon index of 1.6. A search for broad-line region (BLR) absorption features in the $γ$-ray spectrum indicates that the emission region is located outside of the BLR. Radio VLBI observations reveal a fast moving knot interacting with a standing jet feature around the time of the flare. As the standing feature is located $\sim 50\,$pc from the black hole, the emission region of the flare may have been located at a significant distance from the black hole. If this correlation is indeed true, VHE $γ$ rays have been produced far down the jet where turbulent plasma crosses a standing shock.
Submitted 18 December, 2020;
originally announced December 2020.
-
SensiX: A Platform for Collaborative Machine Learning on the Edge
Authors:
Chulhong Min,
Akhil Mathur,
Alessandro Montanari,
Utku Gunay Acer,
Fahim Kawsar
Abstract:
The emergence of multiple sensory devices on or near a human body is uncovering new dynamics of extreme edge computing. In this, a powerful and resource-rich edge device such as a smartphone or a Wi-Fi gateway is transformed into a personal edge, collaborating with multiple devices to offer remarkable sensory applications, while harnessing the power of locality, availability, and proximity. Naturally, this transformation pushes us to rethink how to construct accurate, robust, and efficient sensory systems at personal edge. For instance, how do we build a reliable activity tracker with multiple on-body IMU-equipped devices? While the accuracy of sensing models is improving, their runtime performance still suffers, especially under these emerging multi-device, personal edge environments. Two prime caveats that impact their performance are device and data variabilities, contributed by several runtime factors, including device availability, data quality, and device placement. To this end, we present SensiX, a personal edge platform that stays between sensor data and sensing models, and ensures best-effort inference under any condition while coping with device and data variabilities without demanding model engineering. SensiX externalises model execution away from applications, and comprises two essential functions, a translation operator for principled mapping of device-to-device data and a quality-aware selection operator to systematically choose the right execution path as a function of model accuracy. We report the design and implementation of SensiX and demonstrate its efficacy in developing motion and audio-based multi-device sensing systems. Our evaluation shows that SensiX offers a 7-13% increase in overall accuracy and up to 30% increase across different environment dynamics at the expense of 3mW power overhead.
Submitted 4 December, 2020;
originally announced December 2020.
-
Sensitivity of the SHiP experiment to dark photons decaying to a pair of charged particles
Authors:
SHiP Collaboration,
C. Ahdida,
A. Akmete,
R. Albanese,
A. Alexandrov,
A. Anokhina,
S. Aoki,
G. Arduini,
E. Atkin,
N. Azorskiy,
J. J. Back,
A. Bagulya,
F. Baltasar Dos Santos,
A. Baranov,
F. Bardou,
G. J. Barker,
M. Battistin,
J. Bauche,
A. Bay,
V. Bayliss,
G. Bencivenni,
A. Y. Berdnikov,
Y. A. Berdnikov,
M. Bertani,
C. Betancourt
, et al. (309 additional authors not shown)
Abstract:
Dark photons are hypothetical massive vector particles that could mix with ordinary photons. The simplest theoretical model is fully characterised by only two parameters: the mass of the dark photon m$_{γ^{\mathrm{D}}}$ and its mixing parameter with the photon, $\varepsilon$. The sensitivity of the SHiP detector is reviewed for dark photons in the mass range between 0.002 and 10 GeV. Different production mechanisms are simulated, with the dark photons decaying to pairs of visible fermions, including both leptons and quarks. Exclusion contours are presented and compared with those of past experiments. The SHiP detector is expected to have a unique sensitivity for m$_{γ^{\mathrm{D}}}$ ranging between 0.8 and 3.3$^{+0.2}_{-0.5}$ GeV, and $\varepsilon^2$ ranging between $10^{-11}$ and $10^{-17}$.
Submitted 1 March, 2021; v1 submitted 10 November, 2020;
originally announced November 2020.
-
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Authors:
Alexander D'Amour,
Katherine Heller,
Dan Moldovan,
Ben Adlam,
Babak Alipanahi,
Alex Beutel,
Christina Chen,
Jonathan Deaton,
Jacob Eisenstein,
Matthew D. Hoffman,
Farhad Hormozdiari,
Neil Houlsby,
Shaobo Hou,
Ghassen Jerfel,
Alan Karthikesalingam,
Mario Lucic,
Yian Ma,
Cory McLean,
Diana Mincu,
Akinori Mitani,
Andrea Montanari,
Zachary Nado,
Vivek Natarajan,
Christopher Nielson,
Thomas F. Osborne
, et al. (15 additional authors not shown)
Abstract:
ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
Submitted 24 November, 2020; v1 submitted 6 November, 2020;
originally announced November 2020.
-
An extreme particle accelerator in the Galactic plane: HESS J1826$-$130
Authors:
H. E. S. S. Collaboration,
H. Abdalla,
R. Adam,
F. Aharonian,
F. Ait Benkhali,
E. O. Angüner,
C. Arcaro,
C. Armand,
T. Armstrong,
H. Ashkar,
M. Backes,
V. Baghmanyan,
V. Barbosa Martins,
A. Barnacka,
M. Barnard,
Y. Becherini,
D. Berge,
K. Bernlöhr,
B. Bi,
M. Böttcher,
C. Boisson,
J. Bolmont,
M. de Bony de Lavergne,
P. Bordas,
M. Breuhaus
, et al. (215 additional authors not shown)
Abstract:
The unidentified very-high-energy (VHE; E $>$ 0.1 TeV) $γ$-ray source, HESS J1826$-$130, was discovered with the High Energy Stereoscopic System (HESS) in the Galactic plane. The analysis of 215 h of HESS data has revealed a steady $γ$-ray flux from HESS J1826$-$130, which appears extended with a half-width of 0.21$^{\circ}$ $\pm$ 0.02$^{\circ}_{\text{stat}}$ $\pm$ 0.05$^{\circ}_{\text{sys}}$. The source spectrum is best fit with either a power-law function with a spectral index $Γ$ = 1.78 $\pm$ 0.10$_{\text{stat}}$ $\pm$ 0.20$_{\text{sys}}$ and an exponential cut-off at 15.2$^{+5.5}_{-3.2}$ TeV, or a broken power-law with $Γ_{1}$ = 1.96 $\pm$ 0.06$_{\text{stat}}$ $\pm$ 0.20$_{\text{sys}}$, $Γ_{2}$ = 3.59 $\pm$ 0.69$_{\text{stat}}$ $\pm$ 0.20$_{\text{sys}}$ for energies below and above $E_{\rm{br}}$ = 11.2 $\pm$ 2.7 TeV, respectively. The VHE flux from HESS J1826$-$130 is contaminated by the extended emission of the bright, nearby pulsar wind nebula (PWN), HESS J1825$-$137, particularly at the low end of the energy spectrum. Leptonic scenarios for the origin of HESS J1826$-$130 VHE emission related to PSR J1826$-$1256 are confronted by our spectral and morphological analysis. In a hadronic framework, taking into account the properties of dense gas regions surrounding HESS J1826$-$130, the source spectrum would imply an astrophysical object capable of accelerating the parent particle population up to $\gtrsim$200 TeV. Our results are also discussed in a multiwavelength context, accounting for both the presence of nearby supernova remnants (SNRs), molecular clouds, and counterparts detected in radio, X-rays, and TeV energies.
Submitted 25 October, 2020;
originally announced October 2020.
-
Algorithmic Thresholds in Mean Field Spin Glasses
Authors:
Ahmed El Alaoui,
Andrea Montanari
Abstract:
Optimizing a high-dimensional non-convex function is, in general, computationally hard and many problems of this type are hard to solve even approximately. Complexity theory characterizes the optimal approximation ratios achievable in polynomial time in the worst case. On the other hand, when the objective function is random, worst case approximation ratios are overly pessimistic. Mean field spin glasses are canonical families of random energy functions over the discrete hypercube $\{-1,+1\}^N$. The near-optima of these energy landscapes are organized according to an ultrametric tree-like structure, which enjoys a high degree of universality. Recently, a precise connection has begun to emerge between this ultrametric structure and the optimal approximation ratio achievable in polynomial time in the typical case. A new approximate message passing (AMP) algorithm has been proposed that leverages this connection. The asymptotic behavior of this algorithm has been analyzed, conditional on the nature of the solution of a certain variational problem.
In this paper we describe the first implementation of this algorithm and the first numerical solution of the associated variational problem. We test our approach on two prototypical mean-field spin glasses: the Sherrington-Kirkpatrick (SK) model, and the $3$-spin Ising spin glass. We observe that the algorithm works well already at moderate sizes ($N\gtrsim 1000$) and its behavior is consistent with theoretical expectations. For the SK model it asymptotically achieves arbitrarily good approximations of the global optimum. For the $3$-spin model, it achieves a constant approximation ratio that is predicted by the theory, and it appears to beat the `threshold energy' achieved by Glauber dynamics. Finally, we observe numerically that the intermediate states generated by the algorithm have the properties of ancestor states in the ultrametric tree.
Submitted 24 September, 2020;
originally announced September 2020.
-
Prospects for Beyond the Standard Model Physics Searches at the Deep Underground Neutrino Experiment
Authors:
DUNE Collaboration,
B. Abi,
R. Acciarri,
M. A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
M. Antonova,
S. Antusch,
A. Aranda-Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (953 additional authors not shown)
Abstract:
The Deep Underground Neutrino Experiment (DUNE) will be a powerful tool for a variety of physics topics. The high-intensity proton beams provide a large neutrino flux, sampled by a near detector system consisting of a combination of capable precision detectors, and by the massive far detector system located deep underground. This configuration sets up DUNE as a machine for discovery, as it enables opportunities not only to perform precision neutrino measurements that may uncover deviations from the present three-flavor mixing paradigm, but also to discover new particles and unveil new interactions and symmetries beyond those predicted in the Standard Model (SM). Of the many potential beyond the Standard Model (BSM) topics DUNE will probe, this paper presents a selection of studies quantifying DUNE's sensitivities to sterile neutrino mixing, heavy neutral leptons, non-standard interactions, CPT symmetry violation, Lorentz invariance violation, neutrino trident production, dark matter from both beam induced and cosmogenic sources, baryon number violation, and other new physics topics that complement those at high-energy colliders and significantly extend the present reach.
Submitted 23 April, 2021; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Supernova Neutrino Burst Detection with the Deep Underground Neutrino Experiment
Authors:
DUNE collaboration,
B. Abi,
R. Acciarri,
M. A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
M. Antonova,
S. Antusch,
A. Aranda-Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (949 additional authors not shown)
Abstract:
The Deep Underground Neutrino Experiment (DUNE), a 40-kton underground liquid argon time projection chamber experiment, will be sensitive to the electron-neutrino flavor component of the burst of neutrinos expected from the next Galactic core-collapse supernova. Such an observation will bring unique insight into the astrophysics of core collapse as well as into the properties of neutrinos. The general capabilities of DUNE for neutrino detection in the relevant few- to few-tens-of-MeV neutrino energy range will be described. As an example, DUNE's ability to constrain the $ν_e$ spectral parameters of the neutrino burst will be considered.
Submitted 29 May, 2021; v1 submitted 15 August, 2020;
originally announced August 2020.
-
Reactive Synthesis from Extended Bounded Response LTL Specifications
Authors:
Alessandro Cimatti,
Luca Geatti,
Nicola Gigante,
Angelo Montanari,
Stefano Tonetta
Abstract:
Reactive synthesis is a key technique for the design of correct-by-construction systems and has been thoroughly investigated in recent decades. It consists in the synthesis of a controller that reacts to the environment's inputs while satisfying a given temporal logic specification. Common approaches are based on the explicit construction of automata and on their determinization, which limit their scalability.
In this paper, we introduce a new fragment of Linear Temporal Logic, called Extended Bounded Response LTL (LTL$_{\mathsf{EBR}}$), that allows one to combine bounded and universal unbounded temporal operators (thus covering a large set of practical cases), and we show that reactive synthesis from LTL$_{\mathsf{EBR}}$ specifications can be reduced to solving a safety game over a deterministic symbolic automaton built directly from the specification. We prove the correctness of the proposed approach and we successfully evaluate it on various benchmarks.
△ Less
Submitted 12 August, 2020;
originally announced August 2020.
-
The Lasso with general Gaussian designs with applications to hypothesis testing
Authors:
Michael Celentano,
Andrea Montanari,
Yuting Wei
Abstract:
The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ The regularized risk is non-smooth; $(2)$ The distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail.
On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates: here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler ``fixed-design'' model. We establish non-asymptotic bounds on the distance between the distribution of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class and over values of the regularization parameter.
As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
Submitted 19 September, 2023; v1 submitted 27 July, 2020;
originally announced July 2020.
-
The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training
Authors:
Andrea Montanari,
Yiqiao Zhong
Abstract:
Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in $d$ dimensions, and $N$ hidden neurons. We assume that both the sample size $n$ and the dimension $d$ are large, and they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime $Nd\gg n$. This characterization implies as a corollary that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as $Nd\gg n$, and therefore the network can exactly interpolate arbitrary labels in the same regime.
Our second main result is a characterization of the generalization error of NT ridge regression including, as a special case, min-$\ell_2$ norm interpolation. We prove that, as soon as $Nd\gg n$, the test error is well approximated by the one of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a `self-induced' term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular on $\log n/\log d$).
Submitted 8 June, 2022; v1 submitted 24 July, 2020;
originally announced July 2020.
-
First results on ProtoDUNE-SP liquid argon time projection chamber performance from a beam test at the CERN Neutrino Platform
Authors:
DUNE Collaboration,
B. Abi,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
G. Adamov,
M. Adamowski,
D. Adams,
P. Adrien,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
M. Antonova,
S. Antusch,
A. Aranda-Fernandez,
A. Ariga
, et al. (970 additional authors not shown)
Abstract:
The ProtoDUNE-SP detector is a single-phase liquid argon time projection chamber with an active volume of $7.2\times 6.0\times 6.9$ m$^3$. It is installed at the CERN Neutrino Platform in a specially-constructed beam that delivers charged pions, kaons, protons, muons and electrons with momenta in the range 0.3 GeV$/c$ to 7 GeV/$c$. Beam line instrumentation provides accurate momentum measurements and particle identification. The ProtoDUNE-SP detector is a prototype for the first far detector module of the Deep Underground Neutrino Experiment, and it incorporates full-size components as designed for that module. This paper describes the beam line, the time projection chamber, the photon detectors, the cosmic-ray tagger, the signal processing and particle reconstruction. It presents the first results on ProtoDUNE-SP's performance, including noise and gain measurements, $dE/dx$ calibration for muons, protons, pions and electrons, drift electron lifetime measurements, and photon detector noise, signal sensitivity and time resolution measurements. The measured values meet or exceed the specifications for the DUNE far detector, in several cases by large margins. ProtoDUNE-SP's successful operation starting in 2018 and its production of large samples of high-quality data demonstrate the effectiveness of the single-phase far detector design.
Submitted 3 June, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Long-baseline neutrino oscillation physics potential of the DUNE experiment
Authors:
DUNE Collaboration,
B. Abi,
R. Acciarri,
M. A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
M. Antonova,
S. Antusch,
A. Aranda-Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (949 additional authors not shown)
Abstract:
The sensitivity of the Deep Underground Neutrino Experiment (DUNE) to neutrino oscillation is determined, based on a full simulation, reconstruction, and event selection of the far detector and a full simulation and parameterized analysis of the near detector. Detailed uncertainties due to the flux prediction, neutrino interaction model, and detector effects are included. DUNE will resolve the neutrino mass ordering to a precision of 5$σ$, for all $δ_{\mathrm{CP}}$ values, after 2 years of running with the nominal detector design and beam configuration. It has the potential to observe charge-parity violation in the neutrino sector to a precision of 3$σ$ (5$σ$) after an exposure of 5 (10) years, for 50\% of all $δ_{\mathrm{CP}}$ values. It will also make precise measurements of other parameters governing long-baseline neutrino oscillation, and after an exposure of 15 years will achieve a similar sensitivity to $\sin^{2} 2θ_{13}$ to current reactor experiments.
Submitted 6 December, 2021; v1 submitted 26 June, 2020;
originally announced June 2020.
-
Neutrino interaction classification with a convolutional neural network in the DUNE far detector
Authors:
DUNE Collaboration,
B. Abi,
R. Acciarri,
M. A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
M. Antonova,
S. Antusch,
A. Aranda-Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (951 additional authors not shown)
Abstract:
The Deep Underground Neutrino Experiment is a next-generation neutrino oscillation experiment that aims to measure $CP$-violation in the neutrino sector as part of a wider physics program. A deep learning approach based on a convolutional neural network has been developed to provide highly efficient and pure selections of electron neutrino and muon neutrino charged-current interactions. The electron neutrino (antineutrino) selection efficiency peaks at 90% (94%) and exceeds 85% (90%) for reconstructed neutrino energies between 2-5 GeV. The muon neutrino (antineutrino) event selection is found to have a maximum efficiency of 96% (97%) and exceeds 90% (95%) efficiency for reconstructed neutrino energies above 2 GeV. When considering all electron neutrino and antineutrino interactions as signal, a selection purity of 90% is achieved. These event selections are critical to maximize the sensitivity of the experiment to $CP$-violating effects.
Submitted 10 November, 2020; v1 submitted 26 June, 2020;
originally announced June 2020.
-
When Do Neural Networks Outperform Kernel Methods?
Authors:
Behrooz Ghorbani,
Song Mei,
Theodor Misiakiewicz,
Andrea Montanari
Abstract:
For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS and we know of special examples for which SGD-trained NNs provably outperform RKHS. This is true even in the wide network limit, for a different scaling of the initialization.
How can we reconcile the above claims? For which tasks do NNs outperform RKHS? If covariates are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality becomes milder if the covariates display the same low-dimensional structure as the target function, and we precisely characterize this tradeoff. Building on these results, we present the spiked covariates model that can capture in a unified framework both behaviors observed in earlier work.
We hypothesize that such a latent low-dimensional structure is present in image classification. We test this hypothesis numerically by showing that specific perturbations of the training distribution degrade the performance of RKHS methods much more significantly than that of NNs.
Submitted 9 November, 2021; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Satisfiability and Model Checking for the Logic of Sub-Intervals under the Homogeneity Assumption
Authors:
Laura Bozzelli,
Alberto Molinari,
Angelo Montanari,
Adriano Peron,
Pietro Sala
Abstract:
The expressive power of interval temporal logics (ITLs) makes them one of the most natural choices in a number of application domains, ranging from the specification and verification of complex reactive systems to automated planning. However, for a long time, because of their high computational complexity, they were considered not suitable for practical purposes. The recent discovery of several computationally well-behaved ITLs has finally changed the scenario.
In this paper, we investigate the finite satisfiability and model checking problems for the ITL D, which has a single modality for the sub-interval relation, under the homogeneity assumption (which constrains a proposition letter to hold over an interval if and only if it holds over all its points). We first prove that the satisfiability problem for D, over finite linear orders, is PSPACE-complete, and then we show that the same holds for its model checking problem, over finite Kripke structures. In this way, we enrich the set of tractable interval temporal logics with a new meaningful representative.
Submitted 31 January, 2022; v1 submitted 8 June, 2020;
originally announced June 2020.
-
The estimation error of general first order methods
Authors:
Michael Celentano,
Andrea Montanari,
Yuchen Wu
Abstract:
Modern large-scale statistical models require estimating thousands to millions of parameters. This is often accomplished by iterative algorithms such as gradient descent, projected gradient descent or their accelerated versions. What are the fundamental limits to these approaches? This question is well understood from an optimization viewpoint when the underlying objective is convex. Work in this area characterizes the gap to global optimality as a function of the number of iterations. However, these results have only indirect implications in terms of the gap to statistical optimality.
Here we consider two families of high-dimensional estimation problems: high-dimensional regression and low-rank matrix estimation, and introduce a class of `general first order methods' that aim at efficiently estimating the underlying parameters. This class of algorithms is broad enough to include classical first order optimization (for convex and non-convex objectives), but also other types of algorithms. Under a random design assumption, we derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge. These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches the lower bounds up to asymptotically negligible terms. We illustrate our general results through applications to sparse phase retrieval and sparse principal component analysis.
Submitted 3 March, 2020; v1 submitted 28 February, 2020;
originally announced February 2020.
-
SND@LHC
Authors:
SHiP Collaboration,
C. Ahdida,
A. Akmete,
R. Albanese,
A. Alexandrov,
M. Andreini,
A. Anokhina,
S. Aoki,
G. Arduini,
E. Atkin,
N. Azorskiy,
J. J. Back,
A. Bagulya,
F. Baaltasar Dos Santos,
A. Baranov,
F. Bardou,
G. J. Barker,
M. Battistin,
J. Bauche,
A. Bay,
V. Bayliss,
G. Bencivenni,
A. Y. Berdnikov,
Y. A. Berdnikov,
M. Bertani
, et al. (319 additional authors not shown)
Abstract:
We propose to build and operate a detector that, for the first time, will measure the process $pp\to\nu X$ at the LHC and search for feebly interacting particles (FIPs) in an unexplored domain. The TI18 tunnel has been identified as a suitable site to perform these measurements due to very low machine-induced background. The detector will be off-axis with respect to the ATLAS interaction point (IP1) and, given the pseudo-rapidity range accessible, the corresponding neutrinos will mostly come from charm decays: the proposed experiment will thus make the first test of the heavy flavour production in a pseudo-rapidity range that is not accessible by the current LHC detectors. In order to efficiently reconstruct neutrino interactions and identify their flavour, the detector will combine in the target region nuclear emulsion technology with scintillating fibre tracking layers and it will adopt a muon identification system based on scintillating bars that will also play the role of a hadronic calorimeter. The time of flight measurement will be achieved thanks to a dedicated timing detector. The detector will be a small-scale prototype of the scattering and neutrino detector (SND) of the SHiP experiment: the operation of this detector will provide an important test of the neutrino reconstruction in a high occupancy environment.
Submitted 20 February, 2020;
originally announced February 2020.
-
Deep Underground Neutrino Experiment (DUNE), Far Detector Technical Design Report, Volume IV: Far Detector Single-phase Technology
Authors:
B. Abi,
R. Acciarri,
Mario A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
J. Anthony,
M. Antonova,
S. Antusch,
A. Aranda Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (941 additional authors not shown)
Abstract:
The preponderance of matter over antimatter in the early universe, the dynamics of the supernovae that produced the heavy elements necessary for life, and whether protons eventually decay -- these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our universe, its current state, and its eventual fate. DUNE is an international world-class experiment dedicated to addressing these questions as it searches for leptonic charge-parity symmetry violation, stands ready to capture supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model.
Central to achieving DUNE's physics program is a far detector that combines the many tens-of-kiloton fiducial mass necessary for rare event searches with sub-centimeter spatial resolution in its ability to image those events, allowing identification of the physics signatures among the numerous backgrounds. In the single-phase liquid argon time-projection chamber (LArTPC) technology, ionization charges drift horizontally in the liquid argon under the influence of an electric field towards a vertical anode, where they are read out with fine granularity. A photon detection system supplements the TPC, directly enhancing physics capabilities for all three DUNE physics drivers and opening up prospects for further physics explorations.
The DUNE far detector technical design report (TDR) describes the DUNE physics program and the technical designs of the single- and dual-phase DUNE liquid argon TPC far detector modules. Volume IV presents an overview of the basic operating principles of a single-phase LArTPC, followed by a description of the DUNE implementation. Each of the subsystems is described in detail, connecting the high-level design requirements and decisions to the overriding physics goals of DUNE.
Submitted 8 September, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Deep Underground Neutrino Experiment (DUNE), Far Detector Technical Design Report, Volume III: DUNE Far Detector Technical Coordination
Authors:
B. Abi,
R. Acciarri,
Mario A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
J. Anthony,
M. Antonova,
S. Antusch,
A. Aranda Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (941 additional authors not shown)
Abstract:
The preponderance of matter over antimatter in the early universe, the dynamics of the supernovae that produced the heavy elements necessary for life, and whether protons eventually decay -- these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our universe, its current state, and its eventual fate. The Deep Underground Neutrino Experiment (DUNE) is an international world-class experiment dedicated to addressing these questions as it searches for leptonic charge-parity symmetry violation, stands ready to capture supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model.
The DUNE far detector technical design report (TDR) describes the DUNE physics program and the technical designs of the single- and dual-phase DUNE liquid argon TPC far detector modules. Volume III of this TDR describes how the activities required to design, construct, fabricate, install, and commission the DUNE far detector modules are organized and managed.
This volume details the organizational structures that will carry out and/or oversee the planned far detector activities safely, successfully, on time, and on budget. It presents overviews of the facilities, supporting infrastructure, and detectors for context, and it outlines the project-related functions and methodologies used by the DUNE technical coordination organization, focusing on the areas of integration engineering, technical reviews, quality assurance and control, and safety oversight. Because of its more advanced stage of development, functional examples presented in this volume focus primarily on the single-phase (SP) detector module.
Submitted 8 September, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Deep Underground Neutrino Experiment (DUNE), Far Detector Technical Design Report, Volume II: DUNE Physics
Authors:
B. Abi,
R. Acciarri,
Mario A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
J. Anthony,
M. Antonova,
S. Antusch,
A. Aranda Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (941 additional authors not shown)
Abstract:
The preponderance of matter over antimatter in the early universe, the dynamics of the supernovae that produced the heavy elements necessary for life, and whether protons eventually decay -- these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our universe, its current state, and its eventual fate. DUNE is an international world-class experiment dedicated to addressing these questions as it searches for leptonic charge-parity symmetry violation, stands ready to capture supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model.
The DUNE far detector technical design report (TDR) describes the DUNE physics program and the technical designs of the single- and dual-phase DUNE liquid argon TPC far detector modules. Volume II of this TDR, DUNE Physics, describes the array of identified scientific opportunities and key goals. Crucially, we also report our best current understanding of the capability of DUNE to realize these goals, along with the detailed arguments and investigations on which this understanding is based.
This TDR volume documents the scientific basis underlying the conception and design of the LBNF/DUNE experimental configurations. As a result, the description of DUNE's experimental capabilities constitutes the bulk of the document. Key linkages between requirements for successful execution of the physics program and primary specifications of the experimental configurations are drawn and summarized.
This document also serves a wider purpose as a statement on the scientific potential of DUNE as a central component within a global program of frontier theoretical and experimental particle physics research. Thus, the presentation also aims to serve as a resource for the particle physics community at large.
Submitted 25 March, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Deep Underground Neutrino Experiment (DUNE), Far Detector Technical Design Report, Volume I: Introduction to DUNE
Authors:
B. Abi,
R. Acciarri,
Mario A. Acero,
G. Adamov,
D. Adams,
M. Adinolfi,
Z. Ahmad,
J. Ahmed,
T. Alion,
S. Alonso Monsalve,
C. Alt,
J. Anderson,
C. Andreopoulos,
M. P. Andrews,
F. Andrianala,
S. Andringa,
A. Ankowski,
J. Anthony,
M. Antonova,
S. Antusch,
A. Aranda Fernandez,
A. Ariga,
L. O. Arnold,
M. A. Arroyave,
J. Asaadi
, et al. (941 additional authors not shown)
Abstract:
The preponderance of matter over antimatter in the early universe, the dynamics of the supernovae that produced the heavy elements necessary for life, and whether protons eventually decay -- these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our universe, its current state, and its eventual fate. The Deep Underground Neutrino Experiment (DUNE) is an international world-class experiment dedicated to addressing these questions as it searches for leptonic charge-parity symmetry violation, stands ready to capture supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model.
The DUNE far detector technical design report (TDR) describes the DUNE physics program and the technical designs of the single- and dual-phase DUNE liquid argon TPC far detector modules. This TDR is intended to justify the technical choices for the far detector that flow down from the high-level physics goals through requirements at all levels of the Project. Volume I contains an executive summary that introduces the DUNE science program, the far detector and the strategy for its modular designs, and the organization and management of the Project. The remainder of Volume I provides more detail on the science program that drives the choice of detector technologies and on the technologies themselves. It also introduces the designs for the DUNE near detector and the DUNE computing model, for which DUNE is planning design reports.
Volume II of this TDR describes DUNE's physics program in detail. Volume III describes the technical coordination required for the far detector design, construction, installation, and integration, and its organizational structure. Volume IV describes the single-phase far detector technology. A planned Volume V will describe the dual-phase technology.
Submitted 8 September, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Imputation for High-Dimensional Linear Regression
Authors:
Kabir Aladin Chandrasekher,
Ahmed El Alaoui,
Andrea Montanari
Abstract:
We study high-dimensional regression with missing entries in the covariates. A common strategy in practice is to \emph{impute} the missing entries with an appropriate substitute and then implement a standard statistical procedure acting as if the covariates were fully observed. Recent literature on this subject proposes instead to design a specific, often complicated or non-convex, algorithm tailored to the case of missing covariates. We investigate a simpler approach where we fill in the missing entries with their conditional mean given the observed covariates. We show that this imputation scheme coupled with standard off-the-shelf procedures such as the LASSO and square-root LASSO retains the minimax estimation rate in the random-design setting where the covariates are i.i.d. sub-Gaussian. We further show that the square-root LASSO remains \emph{pivotal} in this setting.
It is often the case that the conditional expectation cannot be computed exactly and must be approximated from data. We study two cases where the covariates either follow an autoregressive (AR) process, or are jointly Gaussian with sparse precision matrix. We propose tractable estimators for the conditional expectation and then perform linear regression via LASSO, and show similar estimation rates in both cases. We complement our theoretical results with simulations on synthetic and semi-synthetic examples, illustrating not only the sharpness of our bounds, but also the broader utility of this strategy beyond our theoretical assumptions.
Submitted 24 January, 2020;
originally announced January 2020.
-
Cryogenic SiPM arrays for the DUNE photon detection system
Authors:
A. Falcone,
A. Andreani,
S. Bertolucci,
C. Brizzolari,
N. Buckanamd,
M. Capasso,
C. Cattadori,
P. Carniti,
M. Citterio,
K. Francis,
N. Gallice,
A. Gola,
C. Gotti,
I. Lax,
P. Litrico,
A. Mazzi,
M. Mellinato,
A. Montanari,
L. Patrizii,
L. Pasqualini,
G. Pessina,
M. Pozzato,
S. Riboldi,
P. Sala,
G. Sirri
, et al. (7 additional authors not shown)
Abstract:
In this paper we report on the characterization of SiPM tiles developed for the R&D on the DUNE Photon Detection System. The tiles were produced by Fondazione Bruno Kessler (FBK) employing NUV-HD-SF SiPMs. Special emphasis is placed on the cryo-reliability of the sensors, i.e. the stability of electrical and mechanical properties after thermal cycles between room temperature and 77 K. The characterization includes the determination of the I-V curve, a high-sensitivity measurement of the Dark Count Rate at different overvoltages, and correlated noise. The single p.e. sensitivity is measured as a function of the number of sensors connected to a single electronic channel, after amplification at 77 K using a dedicated cold amplifier.
Submitted 24 January, 2020;
originally announced January 2020.
-
Optimization of Mean-field Spin Glasses
Authors:
Ahmed El Alaoui,
Andrea Montanari,
Mark Sellke
Abstract:
Mean-field spin glasses are families of random energy functions (Hamiltonians) on high-dimensional product spaces. In this paper we consider the case of Ising mixed $p$-spin models, namely Hamiltonians $H_N:Σ_N\to {\mathbb R}$ on the Hamming hypercube $Σ_N = \{\pm 1\}^N$, which are defined by the property that $\{H_N({\boldsymbol σ})\}_{{\boldsymbol σ}\in Σ_N}$ is a centered Gaussian process with covariance ${\mathbb E}\{H_N({\boldsymbol σ}_1)H_N({\boldsymbol σ}_2)\}$ depending only on the scalar product $\langle {\boldsymbol σ}_1,{\boldsymbol σ}_2\rangle$. The asymptotic value of the optimum $\max_{{\boldsymbol σ}\in Σ_N}H_N({\boldsymbol σ})$ was characterized in terms of a variational principle known as the Parisi formula, first proved by Talagrand and, in a more general setting, by Panchenko. The structure of superlevel sets is extremely rich and has been studied by a number of authors. Here we ask whether a near-optimal configuration ${\boldsymbol σ}$ can be computed in polynomial time. We develop a message passing algorithm whose per-iteration complexity is of the same order as the complexity of evaluating the gradient of $H_N$, and characterize the typical energy value it achieves. When the $p$-spin model $H_N$ satisfies a certain no-overlap gap assumption, for any $\varepsilon>0$, the algorithm outputs ${\boldsymbol σ}\in Σ_N$ such that $H_N({\boldsymbol σ})\ge (1-\varepsilon)\max_{{\boldsymbol σ}'} H_N({\boldsymbol σ}')$, with high probability. The number of iterations is bounded in $N$ and depends only on $\varepsilon$. More generally, regardless of whether the no-overlap gap assumption holds, the energy achieved is given by an extended variational principle, which generalizes the Parisi formula.
Submitted 3 January, 2020;
originally announced January 2020.
-
Anisotropic estimates of subelliptic type
Authors:
Annamaria Montanari,
Daniele Morbidelli
Abstract:
We discuss some estimates of subelliptic type related to vector fields satisfying the Hörmander condition. Our approach makes use of a class of approximate exponential maps. Estimates of this kind arise naturally in the regularity theory of weak solutions of degenerate elliptic equations.
Submitted 7 December, 2019;
originally announced December 2019.
-
Matrix sketching for supervised classification with imbalanced classes
Authors:
Roberta Falcone,
Angela Montanari,
Laura Anderlucci
Abstract:
Matrix sketching is a recently developed data compression technique. An input matrix A is efficiently approximated with a smaller matrix B, so that B preserves most of the properties of A up to some guaranteed approximation ratio. As a result, numerical operations on big data sets become faster. Sketching algorithms generally use random projections to compress the original dataset, and this stochastic generation process makes them amenable to statistical analysis. The statistical properties of sketching algorithms have been widely studied in the context of multiple linear regression. In this paper we propose matrix sketching as a tool for rebalancing class sizes in supervised classification with imbalanced classes. It is well known that class imbalance may lead to poor classification performance, especially for the minority class.
Submitted 2 December, 2019;
originally announced December 2019.
-
The generalization error of max-margin linear classifiers: Benign overfitting and high dimensional asymptotics in the overparametrized regime
Authors:
Andrea Montanari,
Feng Ruan,
Youngtak Sohn,
Jun Yan
Abstract:
Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes.
Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data. We consider a stylized setting in which data $(y_i,{\boldsymbol x}_i)$, $i\le n$ are i.i.d. with ${\boldsymbol x}_i\sim\mathsf{N}({\boldsymbol 0},{\boldsymbol Σ})$ a $p$-dimensional Gaussian feature vector, and $y_i \in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle {\boldsymbol θ}_*,{\boldsymbol x}_i \rangle$. While the Gaussian model might appear extremely simplistic, universality arguments can be used to show that the results derived in this setting also apply to the output of certain nonlinear featurization maps.
We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to ψ$, and derive exact expressions for the limiting generalization error. We use this theory to derive two results of independent interest: $(i)$ Sufficient conditions on $({\boldsymbol Σ},{\boldsymbol θ}_*)$ for `benign overfitting' that parallel previously derived conditions in the case of linear regression; $(ii)$ An asymptotically exact expression for the generalization error when max-margin classification is used in conjunction with feature vectors produced by random one-layer neural networks.
Submitted 22 March, 2023; v1 submitted 4 November, 2019;
originally announced November 2019.
-
High-dimensional clustering via Random Projections
Authors:
Laura Anderlucci,
Francesca Fortunato,
Angela Montanari
Abstract:
In this work, we address the unsupervised classification issue by exploiting the general idea of Random Projection Ensemble. Specifically, we propose to generate a set of low dimensional independent random projections and to perform model-based clustering on each of them. The top $B^*$ projections, i.e. the projections which show the best grouping structure, are then retained. The final partition is obtained by aggregating the clusters found in the projections via consensus. The performance of the method is assessed on both real and simulated datasets. The obtained results suggest that the proposal represents a promising tool for high-dimensional clustering.
Submitted 23 November, 2020; v1 submitted 24 September, 2019;
originally announced September 2019.
-
Quantification of predictive uncertainty in hydrological modelling by harnessing the wisdom of the crowd: A large-sample experiment at monthly timescale
Authors:
Georgia Papacharalampous,
Hristos Tyralis,
Demetris Koutsoyiannis,
Alberto Montanari
Abstract:
Predictive hydrological uncertainty can be quantified by using ensemble methods. If properly formulated, these methods can offer improved predictive performance by combining multiple predictions. In this work, we use 50-year-long monthly time series observed in 270 catchments in the United States to explore the performances provided by an ensemble learning post-processing methodology for issuing probabilistic hydrological predictions. This methodology allows the utilization of flexible quantile regression models for exploiting information about the hydrological model's error. Its key differences with respect to basic two-stage hydrological post-processing methodologies using the same type of regression models are that (a) instead of a single point hydrological prediction it generates a large number of "sister predictions" (yet using a single hydrological model), and that (b) it relies on the concept of combining probabilistic predictions via simple quantile averaging. A major hydrological modelling challenge is obtaining probabilistic predictions that are simultaneously reliable and associated to prediction bands that are as narrow as possible; therefore, we assess both these desired properties of the predictions by computing their coverage probabilities, average widths and average interval scores. The results confirm the usefulness of the proposed methodology and its larger robustness with respect to basic two-stage post-processing methodologies. Finally, this methodology is empirically proven to harness the "wisdom of the crowd" in terms of average interval score, i.e., the average of the individual predictions combined by this methodology scores no worse -- usually better -- than the average of the scores of the individual predictions.
Submitted 3 January, 2020; v1 submitted 31 August, 2019;
originally announced September 2019.
-
Quantification of predictive uncertainty in hydrological modelling by harnessing the wisdom of the crowd: Methodology development and investigation using toy models
Authors:
Georgia Papacharalampous,
Demetris Koutsoyiannis,
Alberto Montanari
Abstract:
We introduce an ensemble learning post-processing methodology for probabilistic hydrological modelling. This methodology generates numerous point predictions by applying a single hydrological model, yet with different parameter values drawn from the respective simulated posterior distribution. We call these predictions "sister predictions". Each sister prediction extending in the period of interest is converted into a probabilistic prediction using information about the hydrological model's errors. This information is obtained from a preceding period for which observations are available, and is exploited using a flexible quantile regression model. All probabilistic predictions are finally combined via simple quantile averaging to produce the output probabilistic prediction. The idea is inspired by the ensemble learning methods originating from the machine learning literature. The proposed methodology offers larger robustness in performance than basic post-processing methodologies using a single hydrological point prediction. It is also empirically proven to "harness the wisdom of the crowd" in terms of average interval score, i.e., the obtained quantile predictions score no worse -- usually better -- than the average score of the combined individual predictions. This proof is provided within toy examples, which can be used for gaining insight on how the methodology works and under which conditions it can optimally convert point hydrological predictions to probabilistic ones. A large-scale hydrological application is made in a companion paper.
Submitted 3 January, 2020; v1 submitted 31 August, 2019;
originally announced September 2019.
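The quantile-averaging step described in the abstract above can be illustrated with a minimal Python sketch. It assumes each sister prediction has already been converted into a central prediction interval (a lower and an upper quantile) at a common level. The function names `average_quantiles` and `interval_score` and the toy numbers are hypothetical and not taken from the paper. The score used is the standard interval score for a central (1 - alpha) interval.

```python
from statistics import mean


def average_quantiles(intervals):
    """Combine (lower, upper) quantile pairs by simple quantile averaging."""
    lowers, uppers = zip(*intervals)
    return mean(lowers), mean(uppers)


def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) interval; lower is better."""
    score = upper - lower  # width of the interval
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # penalty: observation below interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # penalty: observation above interval
    return score


# Toy example (assumed values): two sister predictions given as 80% intervals.
sisters = [(1.0, 5.0), (3.0, 9.0)]
observation = 2.0
alpha = 0.2

combined = average_quantiles(sisters)
combined_score = interval_score(*combined, observation, alpha)
mean_individual_score = mean(
    interval_score(low, up, observation, alpha) for low, up in sisters
)
```

In this toy case the combined interval is (2.0, 7.0) and scores 5.0, against an average of 10.0 for the two individual intervals. Because the interval score is convex in the interval endpoints, the averaged interval can never score worse than the average of the individual scores, which matches the "no worse, usually better" behaviour the abstract reports.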
-
The generalization error of random features regression: Precise asymptotics and double descent curve
Authors:
Song Mei,
Andrea Montanari
Abstract:
Deep learning methods operate in regimes that defy the traditional statistical mindset. Neural network architectures often contain more parameters than training samples, and are so rich that they can interpolate the observed labels, even if the latter are replaced by pure noise. Despite their huge complexity, the same architectures achieve small generalization error on real data.
This phenomenon has been rationalized in terms of a so-called `double descent' curve. As the model complexity increases, the test error follows the usual U-shaped curve at the beginning, first decreasing and then peaking around the interpolation threshold (when the model achieves vanishing training error). However, it descends again as model complexity exceeds this threshold. The global minimum of the test error is found above the interpolation threshold, often in the extreme overparametrization regime in which the number of parameters is much larger than the number of samples. Far from being a peculiar property of deep neural networks, elements of this behavior have been demonstrated in much simpler settings, including linear regression with random covariates.
In this paper we consider the problem of learning an unknown function over the $d$-dimensional sphere $\mathbb S^{d-1}$, from $n$ i.i.d. samples $(\boldsymbol x_i, y_i)\in \mathbb S^{d-1} \times \mathbb R$, $i\le n$. We perform ridge regression on $N$ random features of the form $σ(\boldsymbol w_a^{\mathsf T} \boldsymbol x)$, $a\le N$. This can be equivalently described as a two-layers neural network with random first-layer weights. We compute the precise asymptotics of the test error, in the limit $N,n,d\to \infty$ with $N/d$ and $n/d$ fixed. This provides the first analytically tractable model that captures all the features of the double descent phenomenon without assuming ad hoc misspecification structures.
Submitted 10 December, 2020; v1 submitted 14 August, 2019;
originally announced August 2019.
-
Multiexponential maps in Carnot groups with applications to convexity and differentiability
Authors:
Annamaria Montanari,
Daniele Morbidelli
Abstract:
We analyze some properties of a class of multiexponential maps appearing naturally in the geometric analysis of Carnot groups. We will see that such maps can be useful in at least two interesting problems: first, in the analysis of some regularity properties of horizontally convex sets; second, in proving the Pansu differentiability of the subRiemannian distance from a fixed point.
Submitted 8 May, 2020; v1 submitted 3 August, 2019;
originally announced August 2019.
-
Limitations of Lazy Training of Two-layers Neural Networks
Authors:
Behrooz Ghorbani,
Song Mei,
Theodor Misiakiewicz,
Andrea Montanari
Abstract:
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels. We use two-layers neural networks with quadratic activations, and compare three different learning regimes: the random features (RF) regime in which we only train the second-layer weights; the neural tangent (NT) regime in which we train a linearization of the neural network around its initialization; the fully trained neural network (NN) regime in which we train all the weights in the network. We prove that, even for the simple quadratic model of point (1), there is a potentially unbounded gap between the prediction risk achieved in these three training regimes, when the number of neurons is smaller than the ambient dimension. When the number of neurons is larger than the number of dimensions, the problem is significantly easier and both NT and NN learning achieve zero risk.
Submitted 20 June, 2019;
originally announced June 2019.
-
One-class classification with application to forensic analysis
Authors:
Laura Anderlucci,
Francesca Fortunato,
Angela Montanari
Abstract:
The analysis of broken glass is forensically important to reconstruct the events of a criminal act. In particular, the comparison between the glass fragments found on a suspect (recovered cases) and those collected on the crime scene (control cases) may help the police to correctly identify the offender(s). The forensic issue can be framed as a one-class classification problem. One-class classification is a recently emerging and special classification task, where only one class is fully known (the so-called target class), while information on the others is completely missing. We propose to consider classic Gini's transvariation probability as a measure of typicality, i.e. a measure of resemblance between an observation and a set of well-known objects (the control cases). The aim of the proposed Transvariation-based One-Class Classifier (TOCC) is to identify the best boundary around the target class, that is, to recognise as many target objects as possible while rejecting all those deviating from this class.
Submitted 7 May, 2019;
originally announced May 2019.
-
Linearized two-layers neural networks in high dimension
Authors:
Behrooz Ghorbani,
Song Mei,
Theodor Misiakiewicz,
Andrea Montanari
Abstract:
We consider the problem of learning an unknown function $f_{\star}$ on the $d$-dimensional sphere with respect to the square loss, given i.i.d. samples $\{(y_i,{\boldsymbol x}_i)\}_{i\le n}$ where ${\boldsymbol x}_i$ is a feature vector uniformly distributed on the sphere and $y_i=f_{\star}({\boldsymbol x}_i)+\varepsilon_i$. We study two popular classes of models that can be regarded as linearizations of two-layer neural networks around a random initialization: the random features model of Rahimi-Recht (RF); the neural tangent kernel model of Jacot-Gabriel-Hongler (NT). Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.
We consider two specific regimes: the approximation-limited regime, in which $n=\infty$ while $d$ and $N$ are large but finite; and the sample size-limited regime, in which $N=\infty$ while $d$ and $n$ are large but finite. In the first regime we prove that if $d^{\ell + δ} \le N\le d^{\ell+1-δ}$ for small $δ> 0$, then RF effectively fits a degree-$\ell$ polynomial in the raw features, and NT fits a degree-$(\ell+1)$ polynomial. In the second regime, both RF and NT reduce to kernel methods with rotationally invariant kernels. We prove that, if the number of samples is $d^{\ell + δ} \le n \le d^{\ell +1-δ}$, then kernel methods can fit at most a degree-$\ell$ polynomial in the raw features. This lower bound is achieved by kernel ridge regression. Optimal prediction error is achieved for vanishing ridge regularization.
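As a rough illustration of the two linearized model classes named in this abstract, the sketch below builds RF and NT feature maps for a two-layer network and fits them by ridge regression. It is not the authors' code. The ReLU activation, the sphere radius $\sqrt{d}$, the unit-norm first-layer weights, and all function names (`sample_sphere`, `rf_features`, `nt_features`, `ridge_fit`) are assumptions made for the example.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly on the sphere of radius sqrt(d) in R^d (radius is an assumption)."""
    x = rng.standard_normal((n, d))
    return np.sqrt(d) * x / np.linalg.norm(x, axis=1, keepdims=True)

def rf_features(X, W):
    """Random features sigma(<w_i, x>) with sigma = ReLU (assumed); output shape (n, N)."""
    return np.maximum(X @ W.T, 0.0)

def nt_features(X, W):
    """Neural tangent features x * sigma'(<w_i, x>), flattened to shape (n, N * d)."""
    gate = (X @ W.T > 0).astype(X.dtype)  # ReLU derivative at each pre-activation, shape (n, N)
    return (gate[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)

def ridge_fit(Phi, y, lam):
    """Ridge regression coefficients for the linear model y ~ Phi @ coef."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

# Hypothetical usage: fit a linear target with both feature maps.
rng = np.random.default_rng(0)
n, d, N = 200, 10, 30
X = sample_sphere(n, d, rng)
W = sample_sphere(N, d, rng) / np.sqrt(d)  # fixed random first-layer weights on the unit sphere
y = X[:, 0]                                # toy target f(x) = x_1, chosen only for illustration
coef_rf = ridge_fit(rf_features(X, W), y, lam=1e-3)
coef_nt = ridge_fit(nt_features(X, W), y, lam=1e-3)
```

In this sketch only the linear coefficients are trained, with `W` held fixed, which is what makes both models linear in their parameters. RF has $N$ trainable coefficients and NT has $Nd$, so for the same number of neurons NT is the larger model class.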
Submitted 16 February, 2020; v1 submitted 27 April, 2019;
originally announced April 2019.
-
Undecidability of future timeline-based planning over dense temporal domains
Authors:
Laura Bozzelli,
Alberto Molinari,
Angelo Montanari,
Adriano Peron
Abstract:
Planning is one of the most studied problems in computer science. In this paper, we consider the timeline-based approach, where the domain is modeled by a set of independent, but interacting, components, identified by a set of state variables, whose behavior over time (timelines) is governed by a set of temporal constraints (synchronization rules). Timeline-based planning in the dense-time setting has recently been shown to be undecidable in the general case, and undecidability relies on the high expressiveness of the trigger synchronization rules. In this paper, we strengthen the previous negative result by showing that undecidability already holds under the future semantics of the trigger rules, which limits the comparison to temporal contexts in the future with respect to the trigger.
Submitted 18 April, 2019;
originally announced April 2019.
-
On the computational tractability of statistical estimation on amenable graphs
Authors:
Ahmed El Alaoui,
Andrea Montanari
Abstract:
We consider the problem of estimating a vector of discrete variables $(θ_1,\cdots,θ_n)$, based on noisy observations $Y_{uv}$ of the pairs $(θ_u,θ_v)$ on the edges of a graph $G=([n],E)$. This setting comprises a broad family of statistical estimation problems, including group synchronization on graphs, community detection, and low-rank matrix estimation.
A large body of theoretical work has established sharp thresholds for weak and exact recovery, and sharp characterizations of the optimal reconstruction accuracy in such models, focusing, however, on the special case of Erdős--Rényi-type random graphs. The single most important finding of this line of work is the ubiquity of an information-computation gap. Namely, for many models of interest, a large gap is found between the optimal accuracy achievable by any statistical method and the optimal accuracy achieved by known polynomial-time algorithms. Moreover, this gap is generally believed to be robust to small amounts of additional side information revealed about the $θ_i$'s.
How does the structure of the graph $G$ affect this picture? Is the information-computation gap a general phenomenon or does it only apply to specific families of graphs?
We prove that the picture is dramatically different for graph sequences converging to amenable graphs (including, for instance, $d$-dimensional grids). We consider a model in which an arbitrarily small fraction of the vertex labels is revealed, and show that a linear-time local algorithm can achieve reconstruction accuracy that is arbitrarily close to the information-theoretic optimum. We contrast this to the case of random graphs. Indeed, focusing on group synchronization on random regular graphs, we prove that the information-computation gap still persists even when a small amount of side information is revealed.
Submitted 22 September, 2019; v1 submitted 5 April, 2019;
originally announced April 2019.