-
Linear Transformers are Versatile In-Context Learners
Authors:
Max Vladymyrov,
Johannes von Oswald,
Mark Sandler,
Rong Ge
Abstract:
Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step. However, their capability in handling more complex problems remains unexplored. In this paper, we prove that any linear transformer maintains an implicit linear model and can be interpreted as performing a variant of preconditioned gradient descent. We also investigate the use of linear transformers in a challenging scenario where the training data is corrupted with different levels of noise. Remarkably, we demonstrate that for this problem linear transformers discover an intricate and highly effective optimization algorithm, surpassing or matching in performance many reasonable baselines. We reverse-engineer this algorithm and show that it is a novel approach incorporating momentum and adaptive rescaling based on noise levels. Our findings show that even linear transformers possess the surprising ability to discover sophisticated optimization strategies.
Submitted 21 February, 2024;
originally announced February 2024.
-
Uncovering mesa-optimization algorithms in Transformers
Authors:
Johannes von Oswald,
Eyvind Niklasson,
Maximilian Schlegel,
Seijin Kobayashi,
Nicolas Zucchet,
Nino Scherrer,
Nolan Miller,
Mark Sandler,
Blaise Agüera y Arcas,
Max Vladymyrov,
Razvan Pascanu,
João Sacramento
Abstract:
Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering underlying gradient-based mesa-optimization algorithms driving the generation of predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can lead to improved performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers.
Submitted 11 September, 2023;
originally announced September 2023.
-
Continual Few-Shot Learning Using HyperTransformers
Authors:
Max Vladymyrov,
Andrey Zhmoginov,
Mark Sandler
Abstract:
We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for a variety of scenarios, including learning from mini-batches, and task-incremental and class-incremental learning scenarios.
Submitted 12 January, 2023; v1 submitted 11 January, 2023;
originally announced January 2023.
-
Training trajectories, mini-batch losses and the curious role of the learning rate
Authors:
Mark Sandler,
Andrey Zhmoginov,
Max Vladymyrov,
Nolan Miller
Abstract:
Stochastic gradient descent plays a fundamental role in nearly all applications of deep learning. However, its ability to converge to a global minimum remains shrouded in mystery. In this paper we propose to study the behavior of the loss function on fixed mini-batches along SGD trajectories. We show that the loss function on a fixed batch appears to be remarkably convex-like. In particular, for ResNet the loss for any fixed mini-batch can be accurately modeled by a quadratic function and a very low loss value can be reached in just one step of gradient descent with a sufficiently large learning rate. We propose a simple model that allows us to analyze the relationship between the gradients of stochastic mini-batches and the full batch. Our analysis allows us to discover the equivalency between iterate aggregates and specific learning rate schedules. In particular, for Exponential Moving Average (EMA) and Stochastic Weight Averaging we show that our proposed model matches the observed training trajectories on ImageNet. Our theoretical model predicts that an even simpler averaging technique, averaging just two points many steps apart, significantly improves accuracy compared to the baseline. We validated our findings on ImageNet and other datasets using the ResNet architecture.
Submitted 1 February, 2023; v1 submitted 5 January, 2023;
originally announced January 2023.
-
Transformers learn in-context by gradient descent
Authors:
Johannes von Oswald,
Eyvind Niklasson,
Ettore Randazzo,
João Sacramento,
Alexander Mordvintsev,
Andrey Zhmoginov,
Max Vladymyrov
Abstract:
At present, the mechanisms of in-context learning in Transformers are not well understood and remain mostly an intuition. In this paper, we suggest that training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient-descent (GD) on a regression loss. Motivated by that construction, we show empirically that when training self-attention-only Transformers on simple regression tasks either the models learned by GD and Transformers show great similarity or, remarkably, the weights found by optimization match the construction. Thus we show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass. This allows us, at least in the domain of regression problems, to mechanistically understand the inner workings of in-context learning in optimized Transformers. Building on this insight, we furthermore identify how Transformers surpass the performance of plain gradient descent by learning an iterative curvature correction and learn linear models on deep data representations to solve non-linear regression tasks. Finally, we discuss intriguing parallels to a mechanism identified to be crucial for in-context learning termed induction-head (Olsson et al., 2022) and show how it could be understood as a specific case of in-context learning by gradient descent learning within Transformers. Code to reproduce the experiments can be found at https://github.com/google-research/self-organising-systems/tree/master/transformers_learn_icl_by_gd .
Submitted 31 May, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Decentralized Learning with Multi-Headed Distillation
Authors:
Andrey Zhmoginov,
Mark Sandler,
Nolan Miller,
Gus Kristiansen,
Max Vladymyrov
Abstract:
Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxiliary heads for each client, greatly improving training efficiency in the case of heterogeneous data. This approach allows individual models to preserve and enhance performance on their private tasks while also dramatically improving their performance on the global aggregated data distribution. We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve their performance compared to learning in isolation.
Submitted 28 November, 2022;
originally announced November 2022.
-
Fine-tuning Image Transformers using Learnable Memory
Authors:
Mark Sandler,
Andrey Zhmoginov,
Max Vladymyrov,
Andrew Jackson
Abstract:
In this paper we propose augmenting Vision Transformer models with learnable memory tokens. Our approach allows the model to adapt to new tasks, using few parameters, while optionally preserving its capabilities on previously learned tasks. At each layer we introduce a set of learnable embedding vectors that provide contextual information useful for specific datasets. We call these "memory tokens". We show that augmenting a model with just a handful of such tokens per layer significantly improves accuracy when compared to conventional head-only fine-tuning, and performs only slightly below the significantly more expensive full fine-tuning. We then propose an attention-masking approach that enables extension to new downstream tasks with computation reuse. In this setup, in addition to being parameter-efficient, models can execute both old and new tasks as part of a single inference at a small incremental cost.
Submitted 29 March, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
GradMax: Growing Neural Networks using Gradient Information
Authors:
Utku Evci,
Bart van Merriënboer,
Thomas Unterthiner,
Max Vladymyrov,
Fabian Pedregosa
Abstract:
The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We achieve the latter by maximizing the gradients of the new weights and find the optimal initialization efficiently by means of the singular value decomposition (SVD). We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
Submitted 7 June, 2022; v1 submitted 13 January, 2022;
originally announced January 2022.
-
HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
Authors:
Andrey Zhmoginov,
Mark Sandler,
Max Vladymyrov
Abstract:
In this work we propose a HyperTransformer, a Transformer-based model for supervised and semi-supervised few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity Transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable.
Submitted 13 July, 2022; v1 submitted 11 January, 2022;
originally announced January 2022.
-
Meta-Learning Bidirectional Update Rules
Authors:
Mark Sandler,
Max Vladymyrov,
Andrey Zhmoginov,
Nolan Miller,
Andrew Jackson,
Tom Madams,
Blaise Aguera y Arcas
Abstract:
In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks have neither explicit notion of nor ever receive gradients. The synapses and neurons are updated using a bidirectional Hebb-style update rule parameterized by a shared low-dimensional "genome". We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting update rules generalize to unseen tasks and train faster than gradient descent based optimizers for several standard computer vision and synthetic tasks.
Submitted 11 June, 2021; v1 submitted 9 April, 2021;
originally announced April 2021.
-
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Authors:
Alexander D'Amour,
Katherine Heller,
Dan Moldovan,
Ben Adlam,
Babak Alipanahi,
Alex Beutel,
Christina Chen,
Jonathan Deaton,
Jacob Eisenstein,
Matthew D. Hoffman,
Farhad Hormozdiari,
Neil Houlsby,
Shaobo Hou,
Ghassen Jerfel,
Alan Karthikesalingam,
Mario Lucic,
Yian Ma,
Cory McLean,
Diana Mincu,
Akinori Mitani,
Andrea Montanari,
Zachary Nado,
Vivek Natarajan,
Christopher Nielson,
Thomas F. Osborne
, et al. (15 additional authors not shown)
Abstract:
ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
Submitted 24 November, 2020; v1 submitted 6 November, 2020;
originally announced November 2020.
-
Novel tracking approach based on fully-unsupervised disentanglement of the geometrical factors of variation
Authors:
Mykhailo Vladymyrov,
Akitaka Ariga
Abstract:
Efficient tracking algorithms are a crucial part of particle tracking detectors. While a lot of work has been done in designing a plethora of algorithms, these usually require tedious tuning for each use case. (Weakly) supervised Machine Learning-based approaches can leverage the actual raw data for maximal performance. Yet in realistic scenarios, sufficient high-quality labeled data is not available. While training might be performed on simulated data, the reproduction of realistic signal and noise in the detector requires substantial effort, compromising this approach.
Here we propose a novel, fully unsupervised, approach to track reconstruction. The introduced model for learning to disentangle the factors of variation in a geometrically meaningful way employs geometrical space invariances. We train it through constraints on the equivariance between the image space and the latent representation in a Deep Convolutional Autoencoder. Using experimental results on synthetic data we show that a combination of different space transformations is required for meaningful disentanglement of factors of variation. We also demonstrate the performance of our model on real data from tracking detectors.
Submitted 13 February, 2020; v1 submitted 10 September, 2019;
originally announced September 2019.
-
No Pressure! Addressing the Problem of Local Minima in Manifold Learning Algorithms
Authors:
Max Vladymyrov
Abstract:
Nonlinear embedding manifold learning methods provide invaluable visual insights into the structure of high-dimensional data. However, due to a complicated nonconvex objective function, these methods can easily get stuck in local minima and their embedding quality can be poor. We propose a natural extension to several manifold learning methods aimed at identifying pressured points, i.e. points that are stuck in poor local minima and have poor embedding quality. We show that the objective function can be decreased by temporarily allowing these points to make use of an extra dimension in the embedding space. Our method is able to improve the objective function value of existing methods even after they get stuck in a poor local minimum.
Submitted 27 December, 2019; v1 submitted 26 June, 2019;
originally announced June 2019.
-
DsTau: Study of tau neutrino production with 400 GeV protons from the CERN-SPS
Authors:
Shigeki Aoki,
Akitaka Ariga,
Tomoko Ariga,
Sergey Dmitrievsky,
Elena Firu,
Dean Forshaw,
Tsutomu Fukuda,
Yuri Gornushkin,
Ali Murat Guler,
Maria Haiduc,
Koichi Kodama,
Masahiro Komatsu,
Muhtesem Akif Korkmaz,
Umut Kose,
Madalina Miloi,
Antonio Miucci,
Motoaki Miyanishi,
Mitsuhiro Nakamura,
Toshiyuki Nakano,
Alina Neagu,
Hiroki Rokujo,
Osamu Sato,
Elizaveta Sitnikova,
Yosuke Suzuki,
Tomoki Takao
, et al. (5 additional authors not shown)
Abstract:
In the DsTau experiment at the CERN SPS, an independent and direct way to measure tau neutrino production following high energy proton interactions was proposed. As the main source of tau neutrinos is the decay of Ds mesons produced in proton-nucleus interactions, the project aims at measuring the differential cross section of this reaction. The experimental method is based on the use of high resolution emulsion detectors for effective registration of events with short lived particle decays. Here we present the motivation of the study, details of the experimental technique, and the first results of the analysis of the data collected during test runs, which prove the feasibility of a full scale study of the process in the future.
Submitted 8 June, 2019;
originally announced June 2019.
-
Nuclear emulsions for the detection of micrometric-scale fringe patterns: an application to positron interferometry
Authors:
S. Aghion,
A. Ariga,
M. Bollani,
A. Ereditato,
R. Ferragut,
M. Giammarchi,
M. Lodari,
C. Pistillo,
S. Sala,
P. Scampoli,
M. Vladymyrov
Abstract:
Nuclear emulsions are capable of very high position resolution in the detection of ionizing particles. This feature can be exploited to directly resolve the micrometric-scale fringe pattern produced by a matter-wave interferometer for low energy positrons (in the 10-20 keV range). We have tested the performance of emulsion films in this specific scenario. Exploiting silicon nitride diffraction gratings as absorption masks, we produced periodic patterns with features comparable to the expected interferometer signal. Test samples with periodicities of 6, 7 and 20 μm were exposed to the positron beam, and the patterns clearly reconstructed. Our results support the feasibility of matter-wave interferometry experiments with positrons.
Submitted 11 April, 2018; v1 submitted 12 February, 2018;
originally announced February 2018.
-
Study of tau-neutrino production at the CERN SPS
Authors:
S. Aoki,
A. Ariga,
T. Ariga,
E. Firu,
T. Fukuda,
Y. Gornushkin,
A. M. Guler,
M. Haiduc,
K. Kodama,
M. A. Korkmaz,
U. Kose,
M. Nakamura,
T. Nakano,
A. T. Neagu,
H. Rokujo,
O. Sato,
S. Vasina,
M. Vladymyrov,
M. Yoshimoto
Abstract:
The DsTau project proposes to study tau-neutrino production in high-energy proton interactions. The outcomes of this experiment are a prerequisite for measuring the $ν_τ$ charged-current cross section, which has never been well measured. Precisely measuring the cross section would enable testing of lepton universality in $ν_τ$ scattering and it also has practical implications for neutrino oscillation experiments and high-energy astrophysical $ν_τ$ observations. $D_s$ mesons, the source of tau neutrinos, following high-energy proton interactions will be studied by a novel approach to detect the double-kink topology of the decays $D_s \rightarrow τ ν_τ$ and $τ \rightarrow ν_τ X$. Directly measuring $D_s \rightarrow τ$ decays will provide an inclusive measurement of the $D_s$ production rate and decay branching ratio to $τ$. The momentum reconstruction of $D_s$ will be performed by combining topological variables. This project aims to detect 1,000 $D_s \rightarrow τ$ decays in $2.3 \times 10^8$ proton interactions in a tungsten target to study the differential production cross section of $D_s$ mesons. To achieve this, state-of-the-art emulsion detectors with a nanometric-precision readout will be used. The data generated by this project will enable the $ν_τ$ cross section from DONUT to be re-evaluated, and this should significantly reduce the total systematic uncertainty. Furthermore, these results will provide essential data for future $ν_τ$ experiments such as the $ν_τ$ program in the SHiP project at CERN. In addition, the analysis of $2.3 \times 10^8$ proton interactions, combined with the expected high yield of $10^5$ charmed decays as by-products, will enable the extraction of additional physical quantities.
Submitted 29 August, 2017;
originally announced August 2017.
-
Measurement of antiproton annihilation on Cu, Ag and Au with emulsion films
Authors:
S. Aghion,
C. Amsler,
A. Ariga,
T. Ariga,
G. Bonomi,
P. Braunig,
R. S. Brusa,
L. Cabaret,
M. Caccia,
R. Caravita,
F. Castelli,
G. Cerchiari,
D. Comparat,
G. Consolati,
A. Demetrio,
L. Di Noto,
M. Doser,
A. Ereditato,
C. Evans,
R. Ferragut,
J. Fesel,
A. Fontana,
S. Gerber,
M. Giammarchi,
A. Gligorova
, et al. (47 additional authors not shown)
Abstract:
The characteristics of low energy antiproton annihilations on nuclei (e.g. hadronization and product multiplicities) are not well known, and Monte Carlo simulation packages that use different models provide different descriptions of the annihilation events. In this study, we measured the particle multiplicities resulting from antiproton annihilations on nuclei. The results were compared with predictions obtained using different models in the simulation tools GEANT4 and FLUKA. For this study, we exposed thin targets (Cu, Ag and Au) to a very low energy antiproton beam from CERN's Antiproton Decelerator, exploiting the secondary beamline available in the AEgIS experimental zone. The antiproton annihilation products were detected using emulsion films developed at the Laboratory of High Energy Physics in Bern, where they were analysed at the automatic microscope facility. The fragment multiplicity measured in this study is in good agreement with results obtained with FLUKA simulations for both minimally and heavily ionizing particles.
Submitted 23 April, 2017; v1 submitted 23 January, 2017;
originally announced January 2017.
-
NEWS: Nuclear Emulsions for WIMP Search
Authors:
A. Aleksandrov,
A. Anokhina,
T. Asada,
D. Bender,
I. Bodnarchuk,
A. Buonaura,
S. Buontempo,
M. Chernyavskii,
A. Chukanov,
L. Consiglio,
N. D'Ambrosio,
G. De Lellis,
M. De Serio,
A. Di Crescenzo,
N. Di Marco,
S. Dmitrievski,
T. Dzhatdoev,
R. A. Fini,
S. Furuya,
G. Galati,
V. Gentile,
S. Gorbunov,
Y. Gornushkin,
A. M. Guler,
H. Ichiki
, et al. (34 additional authors not shown)
Abstract:
Nowadays there is compelling evidence for the existence of dark matter in the Universe. A general consensus has been expressed on the need for a directional sensitive detector to confirm, with a complementary approach, the candidates found in conventional searches and to finally extend their sensitivity beyond the limit of neutrino-induced background. We propose here the use of a detector based on nuclear emulsions to measure the direction of WIMP-induced nuclear recoils. The production of nuclear emulsion films with nanometric grains is established. Several measurement campaigns have demonstrated the capability of detecting sub-micrometric tracks left by low energy ions in such emulsion films. Innovative analysis technologies with fully automated optical microscopes have made it possible to achieve the track reconstruction for path lengths down to one hundred nanometers and there are good prospects to further exceed this limit. The detector concept we propose foresees the use of a bulk of nuclear emulsion films surrounded by a shield from environmental radioactivity, to be placed on an equatorial telescope in order to cancel out the effect of the Earth rotation, thus keeping the detector at a fixed orientation toward the expected direction of galactic WIMPs. We report the schedule and cost estimate for a one-kilogram mass pilot experiment, aiming at delivering the first results on the time scale of six years.
Submitted 14 April, 2016;
originally announced April 2016.
-
Extra-large crystal emulsion detectors for future large-scale experiments
Authors:
T. Ariga,
A. Ariga,
K. Kuwabara,
K. Morishima,
M. Moto,
A. Nishio,
P. Scampoli,
M. Vladymyrov
Abstract:
Photographic emulsion is a particle tracking device which features the best spatial resolution among particle detectors. For certain applications, for example muon radiography, large-scale detectors are required. Therefore, a huge surface has to be analyzed by means of automated optical microscopes. An improvement of the readout speed is then a crucial point to make these applications possible and the availability of a new type of photographic emulsions featuring crystals of larger size is a way to pursue this program. This would allow a lower magnification for the microscopes, a consequent larger field of view resulting in a faster data analysis. In this framework, we developed new kinds of emulsion detectors with a crystal size of 600-1000 nm, namely 3-5 times larger than conventional ones, allowing a 25 times faster data readout. The new photographic emulsions have shown a sufficient sensitivity and a good signal to noise ratio. The proposed development opens the way to future large-scale applications of the technology, e.g. 3D imaging of glacier bedrocks or future neutrino experiments.
Submitted 11 February, 2016; v1 submitted 4 December, 2015;
originally announced December 2015.
-
Discovery of tau neutrino appearance in the CNGS neutrino beam with the OPERA experiment
Authors:
OPERA Collaboration,
N. Agafonova,
A. Aleksandrov,
A. Anokhina,
S. Aoki,
A. Ariga,
T. Ariga,
D. Bender,
A. Bertolin,
I. Bodnarchuk,
C. Bozza,
R. Brugnera,
A. Buonaura,
S. Buontempo,
B. Büttner,
M. Chernyavsky,
A. Chukanov,
L. Consiglio,
N. D'Ambrosio,
G. De Lellis,
M. De Serio,
P. Del Amo Sanchez,
A. Di Crescenzo,
D. Di Ferdinando,
N. Di Marco
, et al. (117 additional authors not shown)
Abstract:
The OPERA experiment was designed to search for $ν_μ \rightarrow ν_τ$ oscillations in appearance mode, i.e. by detecting the $τ$-leptons produced in charged current $ν_τ$ interactions. The experiment took data from 2008 to 2012 in the CERN Neutrinos to Gran Sasso beam. The observation of $ν_μ \rightarrow ν_τ$ appearance, achieved with four candidate events in a sub-sample of the data, was previously reported. In this paper, a fifth $ν_τ$ candidate event, found in an enlarged data sample, is described. Together with a further reduction of the expected background, the candidate events detected so far allow assessing the discovery of $ν_μ\rightarrow ν_τ$ oscillations in appearance mode with a significance larger than 5 $σ$.
Submitted 2 November, 2015; v1 submitted 6 July, 2015;
originally announced July 2015.
-
A facility to Search for Hidden Particles (SHiP) at the CERN SPS
Authors:
SHiP Collaboration,
M. Anelli,
S. Aoki,
G. Arduini,
J. J. Back,
A. Bagulya,
W. Baldini,
A. Baranov,
G. J. Barker,
S. Barsuk,
M. Battistin,
J. Bauche,
A. Bay,
V. Bayliss,
L. Bellagamba,
G. Bencivenni,
M. Bertani,
O. Bezshyyko,
D. Bick,
N. Bingefors,
A. Blondel,
M. Bogomilov,
A. Boyarsky,
D. Bonacorsi,
D. Bondarenko
, et al. (211 additional authors not shown)
Abstract:
A new general purpose fixed target facility is proposed at the CERN SPS accelerator which is aimed at exploring the domain of hidden particles and make measurements with tau neutrinos. Hidden particles are predicted by a large number of models beyond the Standard Model. The high intensity of the SPS 400 GeV beam allows probing a wide variety of models containing light long-lived exotic particles with masses below $\mathcal{O}(10)$ GeV/$c^2$, including very weakly interacting low-energy SUSY states. The experimental programme of the proposed facility is capable of being extended in the future, e.g. to include direct searches for Dark Matter and Lepton Flavour Violation.
Submitted 20 April, 2015;
originally announced April 2015.
-
Search for Sterile Neutrinos in the Muon Neutrino Disappearance Mode at FNAL
Authors:
A. Anokhina,
A. Bagulya,
M. Benettoni,
P. Bernardini,
R. Brugnera,
M. Calabrese,
A. Cecchetti,
S. Cecchini,
M. Chernyavskiy,
F. Dal Corso,
O. Dalkarov,
A. Del Prete,
G. De Robertis,
M. De Serio,
D. Di Ferdinando,
S. Dusini,
T. Dzhatdoev,
R. A. Fini,
G. Fiore,
A. Garfagnini,
M. Guerzoni,
B. Klicek,
U. Kose,
K. Jakovcic,
G. Laurenti
, et al. (39 additional authors not shown)
Abstract:
The NESSiE Collaboration has been setup to undertake a conclusive experiment to clarify the muon-neutrino disappearance measurements at short baselines in order to put severe constraints to models with more than the three-standard neutrinos. To this aim the current FNAL-Booster neutrino beam for a Short-Baseline experiment was carefully evaluated by considering the use of magnetic spectrometers at two sites, near and far ones. The detector locations were studied, together with the achievable performances of two OPERA-like spectrometers. The study was constrained by the availability of existing hardware and a time-schedule compatible with the undergoing project of multi-site Liquid-Argon detectors at FNAL.
The settled physics case and the kind of proposed experiment on the Booster neutrino beam would definitively clarify the existing tension between the $ν_μ$ disappearance and the $ν_e$ appearance/disappearance at the eV mass scale. In the context of neutrino oscillations the measurement of $ν_μ$ disappearance is a robust and fast approach to either reject or discover new neutrino states at the eV mass scale. We discuss an experimental program able to extend by more than one order of magnitude (for neutrino disappearance) and by almost one order of magnitude (for antineutrino disappearance) the present range of sensitivity for the mixing angle between standard and sterile neutrinos. These extensions are larger than those achieved in any other proposal presented so far.
Submitted 2 February, 2017; v1 submitted 25 March, 2015;
originally announced March 2015.
-
Limits on muon-neutrino to tau-neutrino oscillations induced by a sterile neutrino state obtained by OPERA at the CNGS beam
Authors:
OPERA Collaboration,
N. Agafonova,
A. Aleksandrov,
A. Anokhina,
S. Aoki,
A. Ariga,
T. Ariga,
D. Bender,
A. Bertolin,
I. Bodnarchuk,
C. Bozza,
R. Brugnera,
A. Buonaura,
S. Buontempo,
B. Büttner,
M. Chernyavsky,
A. Chukanov,
L. Consiglio,
N. D'Ambrosio,
G. De Lellis,
M. De Serio,
P. Del Amo Sanchez,
A. Di Crescenzo,
D. Di Ferdinando,
N. Di Marco
, et al. (106 additional authors not shown)
Abstract:
The OPERA experiment, exposed to the CERN to Gran Sasso $ν_μ$ beam, collected data from 2008 to 2012. Four oscillated $ν_τ$ Charged Current interaction candidates have been detected in appearance mode, which are consistent with $ν_μ\to ν_τ$ oscillations at the atmospheric $Δm^2$ within the "standard" three-neutrino framework. In this paper, the OPERA $ν_τ$ appearance results are used to derive limits on the mixing parameters of a massive sterile neutrino.
Submitted 14 March, 2015; v1 submitted 6 March, 2015;
originally announced March 2015.
-
The NESSiE way to searches for sterile neutrinos at FNAL
Authors:
L. Stanco,
A. Anokhina,
A. Bagulya,
M. Benettoni,
P. Bernardini,
R. Brugnera,
M. Calabrese,
A. Cecchetti,
S. Cecchini,
M. Chernyavskiy,
P. Creti,
F. Dal Corso,
O. Dalkarov,
A. Del Prete,
G. De Robertis,
M. De Serio,
L. Degli Esposti,
D. Di Ferdinando,
S. Dusini,
T. Dzhatdoev,
C. Fanin,
R. A. Fini,
G. Fiore,
A. Garfagnini,
S. Golovanov
, et al. (44 additional authors not shown)
Abstract:
Neutrino physics is nowadays receiving more and more attention as a possible source of information for the long-standing problem of new physics beyond the Standard Model. The recent measurement of the mixing angle $θ_{13}$ in the standard mixing oscillation scenario encourages us to pursue the still missing results on leptonic CP violation and absolute neutrino masses. However, puzzling measurements exist that deserve an exhaustive evaluation.
The NESSiE Collaboration has been setup to undertake conclusive experiments to clarify the muon-neutrino disappearance measurements at small $L/E$, which will be able to put severe constraints to models with more than the three-standard neutrinos, or even to robustly measure the presence of a new kind of neutrino oscillation for the first time. To this aim the use of the current FNAL-Booster neutrino beam for a Short-Baseline experiment has been carefully evaluated. Its recent proposal refers to the use of magnetic spectrometers at two different sites, Near and Far ones. Their positions have been extensively studied, together with the possible performances of two OPERA-like spectrometers. The proposal is constrained by availability of existing hardware and a time-schedule compatible with the undergoing project of a multi-site Liquid-Argon detectors at FNAL.
The experiment to be possibly setup at Booster will allow to definitively clarify the current $ν_μ$ disappearance tension with $ν_{e}$ appearance and disappearance at the eV mass scale.
Submitted 15 October, 2014;
originally announced October 2014.
-
Prospects for the measurement of muon-neutrino disappearance at the FNAL-Booster
Authors:
A. Anokhina,
A. Bagulya,
M. Benettoni,
P. Bernardini,
R. Brugnera,
M. Calabrese,
A. Cecchetti,
S. Cecchini,
M. Chernyavskiy,
P. Creti,
F. Dal Corso,
O. Dalkarov,
A. Del Prete,
G. De Robertis,
M. De Serio,
L. Degli Esposti,
D. Di Ferdinando,
S. Dusini,
T. Dzhatdoev,
C. Fanin,
R. A. Fini,
G. Fiore,
A. Garfagnini,
S. Golovanov,
M. Guerzoni
, et al. (44 additional authors not shown)
Abstract:
Neutrino physics is nowadays receiving more and more attention as a possible source of information for the long-standing problem of new physics beyond the Standard Model. The recent measurement of the mixing angle $θ_{13}$ in the standard mixing oscillation scenario encourages us to pursue the still missing results on leptonic CP violation and absolute neutrino masses. However, puzzling measurements exist that deserve an exhaustive evaluation. The NESSiE Collaboration has been setup to undertake conclusive experiments to clarify the muon-neutrino disappearance measurements at small $L/E$, which will be able to put severe constraints to models with more than the three-standard neutrinos, or even to robustly measure the presence of a new kind of neutrino oscillation for the first time. To this aim the use of the current FNAL-Booster neutrino beam for a Short-Baseline experiment has been carefully evaluated. This proposal refers to the use of magnetic spectrometers at two different sites, Near and Far. Their positions have been extensively studied, together with the possible performances of two OPERA-like spectrometers. The proposal is constrained by availability of existing hardware and a time-schedule compatible with the CERN project for a new more performant neutrino beam, which will nicely extend the physics results achievable at the Booster. The possible FNAL experiment will allow to clarify the current $ν_μ$ disappearance tension with $ν_e$ appearance and disappearance at the eV mass scale. Instead, a new CERN neutrino beam would allow a further span in the parameter space together with a refined control of systematics and, more relevant, the measurement of the antineutrino sector, by upgrading the spectrometer with detectors currently under R&D study.
Submitted 9 April, 2014;
originally announced April 2014.
-
The NESSiE Concept for Sterile Neutrinos
Authors:
L. Stanco,
A. Anokhina,
A. Bagulya,
M. Benettoni,
P. Bernardini,
A. Bertolin,
R. Brugnera,
M. Calabrese,
A. Cecchetti,
S. Cecchini,
M. Chernyavskiy,
G. Collazuol,
P. Creti,
F. Dal Corso,
O. Dalkarov,
A. Del Prete,
I. De Mitri,
G. De Robertis,
M. De Serio,
L. Degli Esposti,
D. Di Ferdinando,
U. Dore,
S. Dusini,
T. Dzhatdoev,
C. Fanin
, et al. (56 additional authors not shown)
Abstract:
Neutrino physics is nowadays receiving more and more attention as a possible source of information for the long-standing problem of new physics beyond the Standard Model. The recent measurement of the third mixing angle theta13 in the standard mixing oscillation scenario encourages us to pursue the still missing results on leptonic CP violation and absolute neutrino masses. However, several puzzling measurements exist, which deserve an exhaustive evaluation. The NESSiE Collaboration has been setup to undertake a definitive experiment to clarify the muon disappearance measurements at small L/E, which will be able to put severe constraints to any model with more than the three-standard neutrinos, or even to robustly measure the presence of a new kind of neutrino oscillation for the first time. Within the context of the current CERN project, aimed to revitalize the neutrino field in Europe, we will illustrate the achievements that can be obtained by a double muon-spectrometer system, with emphasis on the search for sterile neutrinos.
Submitted 4 December, 2013;
originally announced December 2013.
-
Partial-Hessian Strategies for Fast Learning of Nonlinear Embeddings
Authors:
Max Vladymyrov,
Miguel Carreira-Perpinan
Abstract:
Stochastic neighbor embedding (SNE) and related nonlinear manifold learning algorithms achieve high-quality low-dimensional representations of similarity data, but are notoriously slow to train. We propose a generic formulation of embedding algorithms that includes SNE and other existing algorithms, and study their relation with spectral methods and graph Laplacians. This allows us to define several partial-Hessian optimization strategies, characterize their global and local convergence, and evaluate them empirically. We achieve up to two orders of magnitude speedup over existing training methods with a strategy (which we call the spectral direction) that adds nearly no overhead to the gradient and yet is simple, scalable and applicable to several existing and future embedding algorithms.
Submitted 18 June, 2012;
originally announced June 2012.