-
Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
H. Al-Ta'ani,
J. Alexander,
A. Angerami,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. C. Aschenauer,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
B. Bannier,
K. N. Barish,
B. Bassalleck,
S. Bathe
, et al. (377 additional authors not shown)
Abstract:
The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter.
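The abstract names the three fit parameters but does not write out the fitted function. As a reading aid, a commonly used form of a Lévy-stable two-particle correlation function, assuming final-state interactions are neglected and natural units ($\hbar=c=1$), is sketched below. The symbols $C_2^{(0)}$ and $q$ (relative momentum) are notation assumed here, not taken from the abstract.

$$
C_2^{(0)}(q) = 1 + λ\, e^{-|qR|^{α}}
$$

In this form, $α=2$ gives the Gaussian case and $α=1$ the Cauchy case mentioned in the abstract.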
Submitted 11 July, 2024;
originally announced July 2024.
-
Expressivity of Neural Networks with Random Weights and Learned Biases
Authors:
Ezekiel Williams,
Avery Hee-Woon Ryoo,
Thomas Jiralerspong,
Alexandre Payeur,
Matthew G. Perich,
Luca Mazzucato,
Guillaume Lajoie
Abstract:
Landmark universal function approximation results for neural networks with trained weights and biases provided impetus for the ubiquitous use of neural networks as learning models in Artificial Intelligence (AI) and neuroscience. Recent work has pushed the bounds of universal approximation by showing that arbitrary functions can similarly be learned by tuning smaller subsets of parameters, for example the output weights, within randomly initialized networks. Motivated by the fact that biases can be interpreted as biologically plausible mechanisms for adjusting unit outputs in neural networks, such as tonic inputs or activation thresholds, we investigate the expressivity of neural networks with random weights where only biases are optimized. We provide theoretical and numerical evidence demonstrating that feedforward neural networks with fixed random weights can be trained to perform multiple tasks by learning biases only. We further show that an equivalent result holds for recurrent neural networks predicting dynamical system trajectories. Our results are relevant to neuroscience, where they demonstrate the potential for behaviourally relevant changes in dynamics without modifying synaptic weights, as well as for AI, where they shed light on multi-task methods such as bias fine-tuning and unit masking.
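The idea of optimizing only biases in a network with fixed random weights can be illustrated with a small sketch. This is not the authors' code or experimental setup: the helper name `train_biases_only`, the single hidden layer, the tanh nonlinearity, the toy cosine target, and all hyperparameters are assumptions chosen for illustration, and it covers a single regression task rather than the multi-task setting the abstract describes.

```python
import numpy as np


def train_biases_only(x, y, hidden=64, steps=500, lr=0.05, seed=0):
    """Fit y from x by gradient descent on hidden biases only (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Random weights, drawn once and never updated.
    w_in = rng.normal(size=(x.shape[1], hidden))
    w_out = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)
    b = np.zeros(hidden)  # the only trainable parameters
    losses = []
    for _ in range(steps):
        h = np.tanh(x @ w_in + b)    # hidden activations, shape (n, hidden)
        err = h @ w_out - y          # residuals, shape (n, 1)
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error with respect to each hidden bias.
        grad_b = np.mean(2.0 * err * (1.0 - h ** 2) * w_out.T, axis=0)
        b -= lr * grad_b
    return b, losses


x = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
y = np.cos(x)  # toy target, assumed for illustration
b, losses = train_biases_only(x, y)
print(f"MSE before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Because `w_in` and `w_out` stay fixed, any drop in the error comes from the bias vector `b` alone.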
Submitted 2 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
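The two observables named in the abstract, the ratio and the difference of yields in Au$+$Au and $p$$+$$p$ collisions, can be written out as follows. The yield symbol $Y$ is notation assumed here for illustration; the abstract does not give the explicit normalization.

$$
I_{AA}(Δφ) = \frac{Y_{AA}(Δφ)}{Y_{pp}(Δφ)}, \qquad Δ_{AA}(Δφ) = Y_{AA}(Δφ) - Y_{pp}(Δφ)
$$

With these definitions, $I_{AA}<1$ (or $Δ_{AA}<0$) corresponds to suppression and $I_{AA}>1$ (or $Δ_{AA}>0$) to enhancement relative to $p$$+$$p$.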
Submitted 12 June, 2024;
originally announced June 2024.
-
Does learning the right latent variables necessarily improve in-context learning?
Authors:
Sarthak Mittal,
Eric Elmoznino,
Leo Gagnon,
Sangnie Bhardwaj,
Dhanya Sridhar,
Guillaume Lajoie
Abstract:
Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.
Submitted 29 May, 2024;
originally announced May 2024.
-
Identified charged-hadron production in $p$$+$Al, $^3$He$+$Au, and Cu$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and in U$+$U collisions at $\sqrt{s_{_{NN}}}=193$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
J. Alexander,
M. Alfred,
V. Andrieux,
K. Aoki,
N. Apadula,
H. Asano,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
X. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
V. Baublis
, et al. (456 additional authors not shown)
Abstract:
The PHENIX experiment has performed a systematic study of identified charged-hadron ($π^\pm$, $K^\pm$, $p$, $\bar{p}$) production at midrapidity in $p$$+$Al, $^3$He$+$Au, Cu$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and U$+$U collisions at $\sqrt{s_{_{NN}}}=193$ GeV. Identified charged-hadron invariant transverse-momentum ($p_T$) and transverse-mass ($m_T$) spectra are presented and interpreted in terms of radially expanding thermalized systems. The particle ratios of $K/π$ and $p/π$ have been measured in different centrality ranges of large (Cu$+$Au, U$+$U) and small ($p$$+$Al, $^3$He$+$Au) collision systems. The values of $K/π$ ratios measured in all considered collision systems were found to be consistent with those measured in $p$$+$$p$ collisions. However the values of $p/π$ ratios measured in large collision systems reach the values of $\approx0.6$, which is $\approx2$ times larger than in $p$$+$$p$ collisions. These results can be qualitatively understood in terms of the baryon enhancement expected from hadronization by recombination. Identified charged-hadron nuclear-modification factors ($R_{AB}$) are also presented. Enhancement of proton $R_{AB}$ values over meson $R_{AB}$ values was observed in central $^3$He$+$Au, Cu$+$Au, and U$+$U collisions. The proton $R_{AB}$ values measured in $p$$+$Al collision system were found to be consistent with $R_{AB}$ values of $φ$, $π^\pm$, $K^\pm$, and $π^0$ mesons, which may indicate that the size of the system produced in $p$$+$Al collisions is too small for recombination to cause a noticeable increase in proton production.
Submitted 22 May, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
A Unified, Scalable Framework for Neural Population Decoding
Authors:
Mehdi Azabou,
Vinam Arora,
Venkataramana Ganesh,
Ximeng Mao,
Santosh Nachimuthu,
Michael J. Mendelson,
Blake Richards,
Matthew G. Perich,
Guillaume Lajoie,
Eva L. Dyer
Abstract:
Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.
Submitted 24 October, 2023;
originally announced October 2023.
-
How connectivity structure shapes rich and lazy learning in neural circuits
Authors:
Yuhan Helena Liu,
Aristide Baratin,
Jonathan Cornford,
Stefan Mihalas,
Eric Shea-Brown,
Guillaume Lajoie
Abstract:
In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biology, neural circuit connectivity could exhibit a low-rank structure and therefore differs markedly from the random initializations generally used for these studies. As such, here we investigate how the structure of the initial weights -- in particular their effective rank -- influences the network learning regime. Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks. Conversely, low-rank initialization biases learning towards richer learning. Importantly, however, as an exception to this rule, we find lazier learning can still occur with a low-rank initialization that aligns with task and data statistics. Our research highlights the pivotal role of initial weight structures in shaping learning regimes, with implications for metabolic costs of plasticity and risks of catastrophic forgetting.
Submitted 19 February, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Amortizing intractable inference in large language models
Authors:
Edward J. Hu,
Moksh Jain,
Eric Elmoznino,
Younesse Kaddar,
Guillaume Lajoie,
Yoshua Bengio,
Nikolay Malkin
Abstract:
Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use.
Submitted 13 March, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Authors:
Tianhong Li,
Sangnie Bhardwaj,
Yonglong Tian,
Han Zhang,
Jarred Barber,
Dina Katabi,
Guillaume Lajoie,
Huiwen Chang,
Dilip Krishnan
Abstract:
Current vision-language generative models rely on expansive corpora of paired image-text data to attain optimal performance and generalization capabilities. However, automatically collecting such data (e.g. via large-scale web scraping) leads to low quality and poor image-text correlation, while human annotation is more accurate but requires significant manual effort and expense. We introduce $\textbf{ITIT}$ ($\textbf{I}$n$\textbf{T}$egrating $\textbf{I}$mage $\textbf{T}$ext): an innovative training paradigm grounded in the concept of cycle consistency which allows vision-language training on unpaired image and text data. ITIT is comprised of a joint image-text encoder with disjoint image and text decoders that enable bidirectional image-to-text and text-to-image generation in a single framework. During training, ITIT leverages a small set of paired image-text data to ensure its output matches the input reasonably well in both directions. Simultaneously, the model is also trained on much larger datasets containing only images or texts. This is achieved by enforcing cycle consistency between the original unpaired samples and the cycle-generated counterparts. For instance, it generates a caption for a given input image and then uses the caption to create an output image, and enforces similarity between the input and output images. Our experiments show that ITIT with unpaired datasets exhibits similar scaling behavior as using high-quality paired data. We demonstrate image generation and captioning performance on par with state-of-the-art text-to-image and image-to-text models with orders of magnitude fewer (only 3M) paired image-text data.
Submitted 5 October, 2023;
originally announced October 2023.
-
Delta-AI: Local objectives for amortized inference in sparse graphical models
Authors:
Jean-Pierre Falet,
Hae Beom Lee,
Nikolay Malkin,
Chen Sun,
Dragos Secrieru,
Thomas Jiralerspong,
Dinghuai Zhang,
Guillaume Lajoie,
Yoshua Bengio
Abstract:
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs), which we call $Δ$-amortized inference ($Δ$-AI). Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective. This yields a local constraint that can be turned into a local loss in the style of generative flow networks (GFlowNets) that enables off-policy training but avoids the need to instantiate all the random variables for each parameter update, thus speeding up training considerably. The $Δ$-AI objective matches the conditional distribution of a variable given its Markov blanket in a tractable learned sampler, which has the structure of a Bayesian network, with the same conditional distribution under the target PGM. As such, the trained sampler recovers marginals and conditional distributions of interest and enables inference of partial subsets of variables. We illustrate $Δ$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
Submitted 13 March, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Discrete, compositional, and symbolic representations through attractor dynamics
Authors:
Andrew Nam,
Eric Elmoznino,
Nikolay Malkin,
Chen Sun,
Yoshua Bengio,
Guillaume Lajoie
Abstract:
Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and in AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.
Submitted 3 October, 2023;
originally announced October 2023.
-
Synaptic Weight Distributions Depend on the Geometry of Plasticity
Authors:
Roman Pogodin,
Jonathan Cornford,
Arna Ghosh,
Gauthier Gidel,
Guillaume Lajoie,
Blake Richards
Abstract:
A growing literature in computational neuroscience leverages gradient descent and learning algorithms that approximate it to study synaptic plasticity in the brain. However, the vast majority of this work ignores a critical underlying assumption: the choice of distance for synaptic changes - i.e. the geometry of synaptic plasticity. Gradient descent assumes that the distance is Euclidean, but many other distances are possible, and there is no reason that biology necessarily uses Euclidean geometry. Here, using the theoretical tools provided by mirror descent, we show that the distribution of synaptic weights will depend on the geometry of synaptic plasticity. We use these results to show that experimentally-observed log-normal weight distributions found in several brain areas are not consistent with standard gradient descent (i.e. a Euclidean geometry), but rather with non-Euclidean distances. Finally, we show that it should be possible to experimentally test for different synaptic geometries by comparing synaptic weight distributions before and after learning. Overall, our work shows that the current paradigm in theoretical work on synaptic plasticity that assumes Euclidean synaptic geometry may be misguided and that it should be possible to experimentally determine the true geometry of synaptic plasticity in the brain.
Submitted 4 March, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Hot QCD White Paper
Authors:
M. Arslandok,
S. A. Bass,
A. A. Baty,
I. Bautista,
C. Beattie,
F. Becattini,
R. Bellwied,
Y. Berdnikov,
A. Berdnikov,
J. Bielcik,
J. T. Blair,
F. Bock,
B. Boimska,
H. Bossi,
H. Caines,
Y. Chen,
Y. -T. Chien,
M. Chiu,
M. E. Connors,
M. Csanád,
C. L. da Silva,
A. P. Dash,
G. David,
K. Dehmelt,
V. Dexheimer
, et al. (149 additional authors not shown)
Abstract:
Hot QCD physics studies the nuclear strong force under extreme temperature and densities. Experimentally these conditions are achieved via high-energy collisions of heavy ions at the Relativistic Heavy Ion Collider (RHIC) and the Large Hadron Collider (LHC). In the past decade, a unique and substantial suite of data was collected at RHIC and the LHC, probing hydrodynamics at the nucleon scale, the temperature dependence of the transport properties of quark-gluon plasma, the phase diagram of nuclear matter, the interaction of quarks and gluons at different scales and much more. This document, as part of the 2023 nuclear science long range planning process, was written to review the progress in hot QCD since the 2015 Long Range Plan for Nuclear Science, as well as highlight the realization of previous recommendations, and present opportunities for the next decade, building on the accomplishments and investments made in theoretical developments and the construction of new detectors. Furthermore, this document provides additional context to support the recommendations voted on at the Joint Hot and Cold QCD Town Hall Meeting, which are reported in a separate document.
Submitted 30 March, 2023;
originally announced March 2023.
-
Disentangling centrality bias and final-state effects in the production of high-$p_T$ $π^0$ using direct $γ$ in $d$$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
N. J. Abdulameer,
U. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
K. Aoki,
N. Apadula,
C. Ayuso,
V. Babintsev,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
M. Boer,
J. S. Bok,
V. Borisov,
M. L. Brooks,
J. Bryslawskyj,
V. Bumazhnov,
C. Butler
, et al. (253 additional authors not shown)
Abstract:
PHENIX presents a simultaneous measurement of the production of direct $γ$ and $π^0$ in $d$$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV over a $p_T$ range of 7.5 to 18 GeV/$c$ for different event samples selected by event activity, i.e. charged-particle multiplicity detected at forward rapidity. Direct-photon yields are used to empirically estimate the contribution of hard-scattering processes in the different event samples. Using this estimate, the average nuclear-modification factor $R_{d\rm Au,EXP}^{γ^{\rm dir}}$ is $0.925{\pm}0.023({\rm stat}){\pm}0.15^{\rm (scale)}$, consistent with unity for minimum-bias (MB) $d$$+$Au events. For event classes with moderate event activity, $R_{d\rm Au,EXP}^{γ^{\rm dir}}$ is consistent with the MB value within 5\% uncertainty. These results confirm that the previously observed enhancement of high-$p_T$ $π^0$ production found in small-system collisions with low event activity is a result of a bias in interpreting event activity within the Glauber framework. In contrast, for the top 5\% of events with the highest event activity, $R_{d\rm Au,EXP}^{γ^{\rm dir}}$ is suppressed by 20\% relative to the MB value with a significance of $4.5σ$, which may be due to final-state effects.
Submitted 22 March, 2023;
originally announced March 2023.
-
Transverse single-spin asymmetry of charged hadrons at forward and backward rapidity in polarized $p$+$p$, $p$+Al, and $p$+Au collisions at $\sqrt{s_{NN}}=200$ GeV
Authors:
N. J. Abdulameer,
U. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
V. Andrieux,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
N. S. Bandara,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
J. S. Bok,
V. Borisov,
M. L. Brooks,
J. Bryslawskyj
, et al. (297 additional authors not shown)
Abstract:
Reported here are transverse single-spin asymmetries ($A_{N}$) in the production of charged hadrons as a function of transverse momentum ($p_T$) and Feynman-$x$ ($x_F$) in polarized $p^{\uparrow}$+$p$, $p^{\uparrow}$+Al, and $p^{\uparrow}$+Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. The measurements have been performed at forward and backward rapidity ($1.4<|η|<2.4$) over the range of $1.5<p_{T}<7.0~{\rm GeV}/c$ and $0.04<|x_{F}|<0.2$. A nonzero asymmetry is observed for positively charged hadrons at forward rapidity ($x_F>0$) in $p^{\uparrow}$+$p$ collisions, whereas the $p^{\uparrow}$+Al and $p^{\uparrow}$+Au results show smaller asymmetries. This finding provides new opportunities to investigate the origin of transverse single-spin asymmetries and a tool to study nuclear effects in $p$+$A$ collisions.
Submitted 31 October, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Transverse single-spin asymmetry of midrapidity $π^{0}$ and $η$ mesons in $p$+Au and $p$+Al collisions at $\sqrt{s_{_{NN}}}=$ 200 GeV
Authors:
N. J. Abdulameer,
U. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
V. Andrieux,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
N. S. Bandara,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
J. S. Bok,
V. Borisov,
M. L. Brooks,
J. Bryslawskyj
, et al. (297 additional authors not shown)
Abstract:
Presented are the first measurements of the transverse single-spin asymmetries ($A_N$) for neutral pions and eta mesons in $p$+Au and $p$+Al collisions at $\sqrt{s_{_{NN}}}=200$ GeV in the pseudorapidity range $|η|<$0.35 with the PHENIX detector at the Relativistic Heavy Ion Collider. The asymmetries are consistent with zero, similar to those for midrapidity neutral pions and eta mesons produced in $p$+$p$ collisions. These measurements show no evidence of additional effects that could potentially arise from the more complex partonic environment present in proton-nucleus collisions.
Submitted 6 June, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
The Present and Future of QCD
Authors:
P. Achenbach,
D. Adhikari,
A. Afanasev,
F. Afzal,
C. A. Aidala,
A. Al-bataineh,
D. K. Almaalol,
M. Amaryan,
D. Androić,
W. R. Armstrong,
M. Arratia,
J. Arrington,
A. Asaturyan,
E. C. Aschenauer,
H. Atac,
H. Avakian,
T. Averett,
C. Ayerbe Gayoso,
X. Bai,
K. N. Barish,
N. Barnea,
G. Basar,
M. Battaglieri,
A. A. Baty,
I. Bautista
, et al. (378 additional authors not shown)
Abstract:
This White Paper presents the community inputs and scientific conclusions from the Hot and Cold QCD Town Meeting that took place September 23-25, 2022 at MIT, as part of the Nuclear Science Advisory Committee (NSAC) 2023 Long Range Planning process. A total of 424 physicists registered for the meeting. The meeting highlighted progress in Quantum Chromodynamics (QCD) nuclear physics since the 2015 LRP (LRP15) and identified key questions and plausible paths to obtaining answers to those questions, defining priorities for our research over the coming decade. In defining the priority of outstanding physics opportunities for the future, both prospects for the short (~ 5 years) and longer term (5-10 years and beyond) are identified together with the facilities, personnel and other resources needed to maximize the discovery potential and maintain United States leadership in QCD physics worldwide. This White Paper is organized as follows: In the Executive Summary, we detail the Recommendations and Initiatives that were presented and discussed at the Town Meeting, and their supporting rationales. Section 2 highlights major progress and accomplishments of the past seven years. It is followed, in Section 3, by an overview of the physics opportunities for the immediate future, and in relation with the next QCD frontier: the EIC. Section 4 provides an overview of the physics motivations and goals associated with the EIC. Section 5 is devoted to the workforce development and support of diversity, equity and inclusion. This is followed by a dedicated section on computing in Section 6. Section 7 describes the national need for nuclear data science and the relevance to QCD research.
Submitted 4 March, 2023;
originally announced March 2023.
-
Flexible Phase Dynamics for Bio-Plausible Contrastive Learning
Authors:
Ezekiel Williams,
Colin Bredenberg,
Guillaume Lajoie
Abstract:
Many learning algorithms used as normative models in neuroscience or as candidate approaches for learning on neuromorphic chips learn by contrasting one set of network states with another. These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neuromorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. Thanks to a set of general theorems corroborated by numerical experiments across several CL models, our results provide theoretical foundations for the study and development of CL methods for biological and neuromorphic neural networks.
Submitted 30 August, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
Steerable Equivariant Representation Learning
Authors:
Sangnie Bhardwaj,
Willie McClinton,
Tongzhou Wang,
Guillaume Lajoie,
Chen Sun,
Phillip Isola,
Dilip Krishnan
Abstract:
Pre-trained deep image representations are useful for post-training tasks such as classification through transfer learning, image retrieval, and object detection. Data augmentations are a crucial aspect of pre-training robust representations in both supervised and self-supervised settings. Data augmentations explicitly or implicitly promote invariance in the embedding space to the input image transformations. This invariance reduces generalization to those downstream tasks which rely on sensitivity to these particular data augmentations. In this paper, we propose a method of learning representations that are instead equivariant to data augmentations. We achieve this equivariance through the use of steerable representations. Our representations can be manipulated directly in embedding space via learned linear maps. We demonstrate that our resulting steerable and equivariant representations lead to better performance on transfer learning and robustness: e.g. we improve linear probe top-1 accuracy by between 1% and 3% for transfer; and ImageNet-C accuracy by up to 3.4%. We further show that the steerability of our representations provides significant speedup (nearly 50x) for test-time augmentations; by applying a large number of augmentations for out-of-distribution detection, we significantly improve OOD AUC on the ImageNet-C dataset over an invariant representation.
Submitted 22 February, 2023;
originally announced February 2023.
-
Sources of Richness and Ineffability for Phenomenally Conscious States
Authors:
Xu Ji,
Eric Elmoznino,
George Deane,
Axel Constant,
Guillaume Dumas,
Guillaume Lajoie,
Jonathan Simon,
Yoshua Bengio
Abstract:
Conscious states (states that there is something it is like to be in) seem both rich or full of detail, and ineffable or hard to fully describe or recall. The problem of ineffability, in particular, is a longstanding issue in philosophy that partly motivates the explanatory gap: the belief that consciousness cannot be reduced to underlying physical processes. Here, we provide an information theoretic dynamical systems perspective on the richness and ineffability of consciousness. In our framework, the richness of conscious experience corresponds to the amount of information in a conscious state and ineffability corresponds to the amount of information lost at different stages of processing. We describe how attractor dynamics in working memory would induce impoverished recollections of our original experiences, how the discrete symbolic nature of language is insufficient for describing the rich and high-dimensional structure of experiences, and how similarity in the cognitive function of two individuals relates to improved communicability of their experiences to each other. While our model may not settle all questions relating to the explanatory gap, it makes progress toward a fully physicalist explanation of the richness and ineffability of conscious experience: two important aspects that seem to be part of what makes qualitative character so puzzling.
Submitted 20 June, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer
Authors:
Damjan Kalajdzievski,
Ximeng Mao,
Pascal Fortier-Poisson,
Guillaume Lajoie,
Blake Richards
Abstract:
When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. As well, there are often situations where the target stream may have previously been modelled independently and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.
Submitted 8 March, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
Reliability of CKA as a Similarity Measure in Deep Learning
Authors:
MohammadReza Davari,
Stefan Horoi,
Amine Natik,
Guillaume Lajoie,
Guy Wolf,
Eugene Belilovsky
Abstract:
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of conclusions about similarity and dissimilarity of these various representations have been made using CKA. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.
Submitted 16 November, 2022; v1 submitted 28 October, 2022;
originally announced October 2022.
-
From Points to Functions: Infinite-dimensional Representations in Diffusion Models
Authors:
Sarthak Mittal,
Guillaume Lajoie,
Stefan Bauer,
Arash Mehrjou
Abstract:
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution as opposed to Generative Adversarial Networks (GANs) or the decoder of Variational Autoencoders (VAEs) which produce samples from the target distribution in a single step. Thus, in diffusion models every sample is naturally connected to a random trajectory which is a solution to a learned stochastic differential equation (SDE). Generative models are only concerned with the final state of this trajectory that delivers samples from the desired distribution. Abstreiter et al. showed that these stochastic trajectories can be seen as continuous filters that wash out information along the way. Consequently, it is reasonable to ask if there is an intermediate time step at which the preserved information is optimal for a given downstream task. In this work, we show that a combination of information content from different time steps gives a strictly better representation for the downstream task. We introduce attention and recurrence based modules that ``learn to mix'' information content of various time-steps such that the resultant representation leads to superior performance in downstream tasks.
Submitted 25 October, 2022;
originally announced October 2022.
-
Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty
Authors:
Thomas George,
Guillaume Lajoie,
Aristide Baratin
Abstract:
Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called lazy training regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in feature learning mode, resulting in faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and in the presence of easy-to-learn spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.
Submitted 21 November, 2022; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Measurement of $φ$-meson production in Cu$+$Au at $\sqrt{s_{_{NN}}}=200$ GeV and U$+$U at $\sqrt{s_{_{NN}}}=193$ GeV
Authors:
N. J. Abdulameer,
U. Acharya,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
J. Alexander,
M. Alfred,
M. Alibordi,
K. Aoki,
N. Apadula,
H. Asano,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
X. Bai,
B. Bannier,
K. N. Barish,
S. Bathe,
V. Baublis,
C. Baumann,
S. Baumgart,
A. Bazilevsky
, et al. (387 additional authors not shown)
Abstract:
The PHENIX experiment reports systematic measurements at the Relativistic Heavy Ion Collider of $φ$-meson production in asymmetric Cu$+$Au collisions at $\sqrt{s_{_{NN}}}$=200 GeV and in U$+$U collisions at $\sqrt{s_{_{NN}}}$=193 GeV. Measurements were performed via the $φ\rightarrow K^{+}K^{-}$ decay channel at midrapidity $|η|<0.35$. Features of $φ$-meson production measured in Cu$+$Cu, Cu$+$Au, Au$+$Au, and U$+$U collisions were found to not depend on the collision geometry, which was expected because the yields are averaged over the azimuthal angle and follow the expected scaling with nuclear-overlap size. The elliptic flow of the $φ$ meson in Cu$+$Au, Au$+$Au, and U$+$U collisions scales with second-order-participant eccentricity and the length scale of the nuclear-overlap region (estimated with the number of participating nucleons). At moderate $p_T$, $φ$-meson production measured in Cu$+$Au and U$+$U collisions is consistent with coalescence-model predictions, whereas at high $p_T$ the production is in agreement with expectations for in-medium energy loss of parent partons prior to their fragmentation. The elliptic flow for $φ$ mesons measured in Cu$+$Au and U$+$U collisions is well described by a (2+1)D viscous-hydrodynamic model with specific-shear viscosity $η/s=1/4π$.
Submitted 13 January, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
On Neural Architecture Inductive Biases for Relational Tasks
Authors:
Giancarlo Kerg,
Sarthak Mittal,
David Rolnick,
Yoshua Bengio,
Blake Richards,
Guillaume Lajoie
Abstract:
Current deep learning approaches have shown good in-distribution generalization performance, but struggle with out-of-distribution generalization. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as we find in many intelligence tests. Recent work has explored how forcing relational representations to remain distinct from sensory representations, as seems to be the case in the brain, can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by 'partitioned' representations of relations and sensory details, and how this inductive bias can help recompose learned relational structure in newly encountered settings. We introduce a simple architecture based on similarity scores which we name Compositional Relational Network (CoRelNet). Using this model, we investigate a series of inductive biases that ensure abstract relations are learned and represented distinctly from sensory data, and explore their effects on out-of-distribution generalization for a series of relational psychophysics tasks. We find that simple architectural choices can outperform existing models in out-of-distribution generalization. Together, these results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing out-of-distribution relational computations.
Submitted 9 June, 2022;
originally announced June 2022.
-
Is a Modular Architecture Enough?
Authors:
Sarthak Mittal,
Yoshua Bengio,
Guillaume Lajoie
Abstract:
Inspired by human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures. Recent work demonstrates that not only do some modular architectures generalize well, but they also lead to better out-of-distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data generating system for most real-world settings is considered to consist of sparsely interacting parts, and endowing models with similar inductive biases will be helpful. However, the field has been lacking in a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential.
Submitted 6 June, 2022;
originally announced June 2022.
-
Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules
Authors:
Yuhan Helena Liu,
Arna Ghosh,
Blake A. Richards,
Eric Shea-Brown,
Guillaume Lajoie
Abstract:
To unveil how the brain learns, ongoing work seeks biologically-plausible approximations of gradient descent algorithms for training recurrent neural networks (RNNs). Yet, beyond task accuracy, it is unclear if such learning rules converge to solutions that exhibit different levels of generalization than their nonbiologically-plausible counterparts. Leveraging results from deep learning theory based on loss landscape curvature, we ask: how do biologically-plausible gradient approximations affect generalization? We first demonstrate that state-of-the-art biologically-plausible learning rules for training RNNs exhibit worse and more variable generalization performance compared to their machine learning counterparts that follow the true gradient more closely. Next, we verify that such generalization performance is correlated significantly with loss landscape curvature, and we show that biologically-plausible learning rules tend to approach high-curvature regions in synaptic weight space. Using tools from dynamical systems, we derive theoretical arguments and present a theorem explaining this phenomenon. This predicts our numerical results, and explains why biologically-plausible rules lead to worse and more variable generalization properties. Finally, we suggest potential remedies that could be used by the brain to mitigate this effect. To our knowledge, our analysis is the first to identify the reason for this generalization gap between artificial and biologically-plausible learning rules, which can help guide future investigations into how the brain learns solutions that generalize.
Submitted 13 January, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Improving constraints on gluon spin-momentum correlations in transversely polarized protons via midrapidity open-heavy-flavor electrons in $p^{\uparrow}+p$ collisions at $\sqrt{s}=200$ GeV
Authors:
N. J. Abdulameer,
U. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
V. Andrieux,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
N. S. Bandara,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
J. S. Bok,
V. Borisov,
M. L. Brooks,
J. Bryslawskyj
, et al. (299 additional authors not shown)
Abstract:
Polarized proton-proton collisions provide leading-order access to gluons, presenting an opportunity to constrain gluon spin-momentum correlations within transversely polarized protons and enhance our understanding of the three-dimensional structure of the proton. Midrapidity open-heavy-flavor production at $\sqrt{s}=200$ GeV is dominated by gluon-gluon fusion, providing heightened sensitivity to gluon dynamics relative to other production channels. Transverse single-spin asymmetries of positrons and electrons from heavy-flavor hadron decays are measured at midrapidity using the PHENIX detector at the Relativistic Heavy Ion Collider. These charge-separated measurements are sensitive to gluon correlators that can in principle be related to gluon orbital angular momentum via model calculations. Explicit constraints on gluon correlators are extracted for two separate models, one of which had not been constrained previously.
Submitted 7 March, 2023; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Nonprompt direct-photon production in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
U. A. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
J. S. Bok
, et al. (311 additional authors not shown)
Abstract:
The measurement of the direct-photon spectrum from Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV is presented by the PHENIX collaboration using the external-photon-conversion technique for 0\%--93\% central collisions in a transverse-momentum ($p_T$) range of 0.8--10 GeV/$c$. An excess of direct photons, above prompt-photon production from hard-scattering processes, is observed for $p_T<6$ GeV/$c$. Nonprompt direct photons are measured by subtracting the prompt component, which is estimated as $N_{\rm coll}$-scaled direct photons from $p$$+$$p$ collisions at 200 GeV, from the direct-photon spectrum. Results are obtained for $0.8<p_T<6.0$ GeV/$c$ and suggest that the spectrum has an increasing inverse slope from ${\approx}0.2$ to 0.4 GeV/$c$ with increasing $p_T$, which indicates a possible sensitivity of the measurement to photons from earlier stages of the evolution of the collision. In addition, like the direct-photon production, the $p_T$-integrated nonprompt direct-photon yields also follow a power-law scaling behavior as a function of collision-system size. The exponent, $α$, for the nonprompt component is found to be consistent with 1.1 with no apparent $p_T$ dependence.
Submitted 19 April, 2024; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Charm- and Bottom-Quark Production in Au$+$Au Collisions at $\sqrt{s_{_{NN}}}$ = 200 GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship
, et al. (321 additional authors not shown)
Abstract:
The invariant yield of electrons from open-heavy-flavor decays for $1<p_T<8$ GeV/$c$ at midrapidity $|y|<0.35$ in Au$+$Au collisions at $\sqrt{s_{_{NN}}}$ = 200 GeV has been measured by the PHENIX experiment at the Relativistic Heavy Ion Collider. A displaced-vertex analysis with the PHENIX silicon-vertex detector enables extraction of the fraction of charm and bottom hadron decays and unfolding of the invariant yield of parent charm and bottom hadrons. The nuclear-modification factors $R_{AA}$ for electrons from charm and bottom hadron decays and heavy-flavor hadrons show both a centrality and a quark-mass dependence, indicating suppression in the quark-gluon plasma produced in these collisions that is medium sized and quark-mass dependent.
Submitted 11 April, 2024; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Low-$p_T$ direct-photon production in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=39$ and 62.4 GeV
Authors:
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
H. Al-Ta'ani,
J. Alexander,
M. Alfred,
A. Angerami,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. C. Aschenauer,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
B. Bannier,
K. N. Barish,
B. Bassalleck,
S. Bathe
, et al. (409 additional authors not shown)
Abstract:
The measurement of direct photons from Au$+$Au collisions at $\sqrt{s_{_{NN}}}=39$ and 62.4 GeV in the transverse-momentum range $0.4<p_T<3$ GeV/$c$ is presented by the PHENIX collaboration at the Relativistic Heavy Ion Collider. A significant direct-photon yield is observed in both collision systems. A universal scaling is observed when the direct-photon $p_T$ spectra for different center-of-mass energies and for different centrality selections at $\sqrt{s_{_{NN}}}=62.4$ GeV are scaled with $(dN_{\rm ch}/dη)^α$ for $α=1.21{\pm}0.04$. This scaling also holds true for direct-photon spectra from Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV measured earlier by PHENIX, as well as the spectra from Pb$+$Pb at $\sqrt{s_{_{NN}}}=2760$ GeV published by ALICE. The scaling power $α$ seems to be independent of $p_T$, center-of-mass energy, and collision centrality. The spectra from different collision energies have a similar shape up to $p_T$ of 2 GeV/$c$. The spectra have a local inverse slope $T_{\rm eff}$ that increases with $p_T$, from $0.174\pm0.018$ GeV/$c$ in the range $0.4<p_T<1.3$ GeV/$c$ to $0.289\pm0.024$ GeV/$c$ for $0.9<p_T<2.1$ GeV/$c$. The observed similarity of low-$p_T$ direct-photon production from $\sqrt{s_{_{NN}}}= 39$ to 2760 GeV suggests a common source of direct photons for the different collision energies and event centrality selections, and suggests a comparable space-time evolution of direct-photon emission.
Submitted 24 February, 2023; v1 submitted 23 March, 2022;
originally announced March 2022.
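The multiplicity scaling quoted in the abstract above can be illustrated with a short numerical sketch. It assumes only the stated power law, yield proportional to $(dN_{\rm ch}/dη)^α$ with $α=1.21$; the function names and all input numbers are hypothetical placeholders, not PHENIX data or analysis code.

```python
ALPHA = 1.21  # scaling power quoted in the abstract (uncertainty +/- 0.04)


def scaled_yield(photon_yield, dnch_deta, alpha=ALPHA):
    """Divide a direct-photon yield by (dN_ch/deta)**alpha."""
    if dnch_deta <= 0:
        raise ValueError("dN_ch/deta must be positive")
    return photon_yield / dnch_deta ** alpha


def expected_yield_ratio(dnch_a, dnch_b, alpha=ALPHA):
    """Yield ratio between two event classes implied by the power law."""
    return (dnch_a / dnch_b) ** alpha


# Hypothetical yields that follow the power law exactly (not measured values).
yield_low = 3.0 * 100.0 ** ALPHA   # event class with dN_ch/deta = 100
yield_high = 3.0 * 400.0 ** ALPHA  # event class with dN_ch/deta = 400

# After scaling, both classes collapse onto the same value.
collapsed_low = scaled_yield(yield_low, 100.0)
collapsed_high = scaled_yield(yield_high, 400.0)
```

If the scaling holds, spectra from different energies and centralities fall on one curve after this division, which is the sense of "universal scaling" used in the abstract.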
-
Measurements of second-harmonic Fourier coefficients from azimuthal anisotropies in $p$$+$$p$, $p$$+$Au, $d$$+$Au, and $^3$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
V. Andrieux,
K. Aoki,
N. Apadula,
H. Asano,
C. Ayuso,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont,
A. Berdnikov,
Y. Berdnikov
, et al. (368 additional authors not shown)
Abstract:
Recently, the PHENIX Collaboration has published second- and third-harmonic Fourier coefficients $v_2$ and $v_3$ for midrapidity ($|η|<0.35$) charged hadrons in 0\%--5\% central $p$$+$Au, $d$$+$Au, and $^3$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV utilizing three sets of two-particle correlations for two detector combinations with different pseudorapidity acceptance [Phys. Rev. C {\bf 105}, 024901 (2022)]. This paper extends these measurements of $v_2$ to all centralities in $p$$+$Au, $d$$+$Au, and $^3$He$+$Au collisions, as well as $p$$+$$p$ collisions, as a function of transverse momentum ($p_T$) and event multiplicity. The kinematic dependence of $v_2$ is quantified as the ratio $R$ of $v_2$ between the two detector combinations as a function of event multiplicity for $0.5$$<$$p_T$$<$$1$ and $2$$<$$p_T$$<$$2.5$ GeV/$c$. A multiphase-transport (AMPT) model can reproduce the observed $v_2$ in most-central to midcentral $d$$+$Au and $^3$He$+$Au collisions. However, the AMPT model systematically overestimates the measurements in $p$$+$$p$, $p$$+$Au, and peripheral $d$$+$Au and $^3$He$+$Au collisions, indicating a higher nonflow contribution in AMPT than in the experimental data. The AMPT model fails to describe the observed $R$ for $0.5$$<$$p_T$$<$$1$ GeV/$c$, but there is qualitative agreement with the measurements for $2$$<$$p_T$$<$$2.5$ GeV/$c$.
Submitted 4 March, 2023; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Study of $φ$-meson production in $p$$+$Al, $p$$+$Au, $d$$+$Au, and $^3$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
V. Andrieux,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau
, et al. (346 additional authors not shown)
Abstract:
Small nuclear collisions are mainly sensitive to cold-nuclear-matter effects; however, the collective behavior observed in these collisions shows a hint of hot-nuclear-matter effects. The identified-particle spectra, especially the $φ$ mesons which contain strange and antistrange quarks and have a relatively small hadronic-interaction cross section, are a good tool to study these effects. The PHENIX experiment has measured $φ$ mesons in a specific set of small collision systems $p$$+$Al, $p$$+$Au, and $^3$He$+$Au, as well as $d$$+$Au [Phys. Rev. C {\bf 83}, 024909 (2011)], at $\sqrt{s_{_{NN}}}=200$ GeV. The transverse-momentum spectra and nuclear-modification factors are presented and compared to theoretical-model predictions. The comparisons with different calculations suggest that quark-gluon plasma may be formed in these small collision systems at $\sqrt{s_{_{NN}}}=200$ GeV. However, the volume and the lifetime of the produced medium may be insufficient for observing strangeness-enhancement and jet-quenching effects. Comparison with calculations suggests that the main production mechanisms of $φ$ mesons at midrapidity may be different in $p$$+$Al versus $p/d/$$^3$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. While thermal quark recombination seems to dominate in $p/d/$$^3$He$+$Au collisions, fragmentation seems to be the main production mechanism in $p$$+$Al collisions.
Submitted 26 July, 2022; v1 submitted 11 March, 2022;
originally announced March 2022.
-
Continuous-Time Meta-Learning with Forward Mode Differentiation
Authors:
Tristan Deleu,
David Kanaa,
Leo Feng,
Giancarlo Kerg,
Yoshua Bengio,
Guillaume Lajoie,
Pierre-Luc Bacon
Abstract:
Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed and discrete number of gradient steps. As a consequence, we can optimize the amount of adaptation necessary to solve a new task using stochastic gradient descent, in addition to learning the initial conditions as is standard practice in gradient-based meta-learning. Importantly, in order to compute the exact meta-gradients required for the outer-loop updates, we devise an efficient algorithm based on forward mode differentiation, whose memory requirements do not scale with the length of the learning trajectory, thus allowing longer adaptation in constant memory. We provide analytical guarantees for the stability of COMLN, we show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
Submitted 2 March, 2022;
originally announced March 2022.
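The idea in the abstract above of treating adaptation as an ODE with a continuous trajectory length can be sketched as follows. This is a minimal illustration, not the COMLN algorithm: it assumes a least-squares loss on fixed features in place of the paper's linear classifier, uses explicit Euler integration, and does not compute meta-gradients. All names and data are hypothetical.

```python
def loss(w, feats, targets):
    """Half mean squared error of the linear model with weights w."""
    total = 0.0
    for row, y in zip(feats, targets):
        pred = sum(wi * xi for wi, xi in zip(w, row))
        total += 0.5 * (pred - y) ** 2
    return total / len(targets)


def grad(w, feats, targets):
    """Gradient of the loss above with respect to w."""
    g = [0.0] * len(w)
    for row, y in zip(feats, targets):
        err = sum(wi * xi for wi, xi in zip(w, row)) - y
        for k, xi in enumerate(row):
            g[k] += err * xi / len(targets)
    return g


def adapt(w0, feats, targets, horizon, dt=0.01):
    """Explicit-Euler integration of dw/dt = -grad L(w) over [0, horizon]."""
    w = list(w0)
    for _ in range(int(round(horizon / dt))):
        g = grad(w, feats, targets)
        w = [wi - dt * gi for wi, gi in zip(w, g)]
    return w


# Toy task (assumed data): targets are generated by w* = [1, -2].
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
targets = [1.0, -2.0, -1.0, 3.0]
w0 = [0.0, 0.0]

short_run = adapt(w0, feats, targets, horizon=0.5)
long_run = adapt(w0, feats, targets, horizon=5.0)
```

Here `horizon` is a real number, so the amount of adaptation varies continuously; this is the quantity the abstract describes as learnable alongside the initial conditions.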
-
Clarifying MCMC-based training of modern EBMs : Contrastive Divergence versus Maximum Likelihood
Authors:
Léo Gagnon,
Guillaume Lajoie
Abstract:
The Energy-Based Model (EBM) framework is a very general approach to generative modeling that tries to learn and exploit probability distributions only defined through unnormalized scores. It has risen in popularity recently thanks to the impressive results obtained in image generation by parameterizing the distribution with Convolutional Neural Networks (CNN). However, the motivation and theoretical foundations behind modern EBMs are often absent from recent papers and this sometimes results in some confusion. In particular, the theoretical justifications behind the popular MCMC-based learning algorithm Contrastive Divergence (CD) are often glossed over and we find that this leads to theoretical errors in recent influential papers (Du & Mordatch, 2019; Du et al., 2020). After offering a first-principles introduction of MCMC-based training, we argue that the learning algorithm they use can in fact not be described as CD and reinterpret their methods in light of a new interpretation. Finally, we discuss the implications of our new interpretation and provide some illustrative experiments.
Submitted 24 February, 2022;
originally announced February 2022.
-
Measurement of Direct-Photon Cross Section and Double-Helicity Asymmetry at $\sqrt{s}=510$ GeV in $\vec{p}+\vec{p}$ Collisions
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
M. Alfred,
N. Apadula,
Y. Aramaki,
H. Asano,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont
, et al. (336 additional authors not shown)
Abstract:
We present measurements of the cross section and double-helicity asymmetry $A_{LL}$ of direct-photon production in $\vec{p}+\vec{p}$ collisions at $\sqrt{s}=510$ GeV. The measurements have been performed at midrapidity ($|η|<0.25$) with the PHENIX detector at the Relativistic Heavy Ion Collider. At relativistic energies, direct photons are dominantly produced from the initial quark-gluon hard scattering and do not interact via the strong force at leading order. Therefore, at $\sqrt{s}=510$ GeV, where leading-order effects dominate, these measurements provide clean and direct access to the gluon helicity in the polarized proton in the gluon-momentum-fraction range $0.02<x<0.08$, with direct sensitivity to the sign of the gluon contribution.
Submitted 6 May, 2023; v1 submitted 16 February, 2022;
originally announced February 2022.
-
Measurement of $ψ(2S)$ nuclear modification at backward and forward rapidity in $p$$+$$p$, $p$$+$Al, and $p$$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
U. A. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
V. Andrieux,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
N. S. Bandara,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
J. S. Bok,
V. Borisov,
M. L. Brooks,
J. Bryslawskyj,
V. Bumazhnov
, et al. (291 additional authors not shown)
Abstract:
Suppression of the $J/ψ$ nuclear-modification factor has been seen as a trademark signature of final-state effects in large collision systems for decades. In small systems, the nuclear modification was attributed to cold-nuclear-matter effects until the observation of strong differential suppression of the $ψ(2S)$ state in $p/d$$+$$A$ collisions suggested the presence of final-state effects. Results of $J/ψ$ and $ψ(2S)$ measurements in the dimuon decay channel are presented here for $p$$+$$p$, $p$$+$Al, and $p$$+$Au collision systems at $\sqrt{s_{_{NN}}}=200$ GeV. The results are predominantly shown in the form of the nuclear-modification factor, $R_{pA}$, the ratio of the $ψ(2S)$ invariant yield per nucleon-nucleon collision in collisions of proton on target nucleus to that in $p$$+$$p$ collisions. Measurements of the $J/ψ$ and $ψ(2S)$ nuclear-modification factor are compared with shadowing and transport-model predictions, as well as to complementary measurements at Large-Hadron-Collider energies.
Submitted 30 June, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
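The nuclear-modification factor defined in words in the abstract above (the invariant yield per nucleon-nucleon collision in $p$$+$$A$ divided by the yield in $p$$+$$p$) can be written as a one-line calculation. The sketch below is illustrative only: the function name is hypothetical and the inputs are placeholder numbers, not measured yields or Glauber-model values.

```python
def nuclear_modification_factor(yield_pa, n_coll, yield_pp):
    """R_pA = (yield_pA / <N_coll>) / yield_pp.

    yield_pa : invariant yield in p+A collisions
    n_coll   : mean number of binary nucleon-nucleon collisions
    yield_pp : invariant yield in p+p collisions (same kinematic bin)
    """
    if n_coll <= 0 or yield_pp <= 0:
        raise ValueError("n_coll and yield_pp must be positive")
    return (yield_pa / n_coll) / yield_pp


# Placeholder inputs: R_pA = 1 means no nuclear modification,
# R_pA < 1 means suppression relative to scaled p+p.
r_unmodified = nuclear_modification_factor(9.0, 4.5, 2.0)
r_suppressed = nuclear_modification_factor(4.5, 4.5, 2.0)
```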
-
Learning shared neural manifolds from multi-subject FMRI data
Authors:
Jessie Huang,
Erica L. Busch,
Tom Wallenstein,
Michal Gerasimiuk,
Andrew Benz,
Guillaume Lajoie,
Guy Wolf,
Nicholas B. Turk-Browne,
Smita Krishnaswamy
Abstract:
Functional magnetic resonance imaging (fMRI) is a notoriously noisy measurement of brain activity because of the large variations between individuals, signals marred by environmental differences during collection, and spatiotemporal averaging required by the measurement resolution. In addition, the data is extremely high dimensional, with the space of the activity typically having much lower intrinsic dimension. In order to understand the connection between stimuli of interest and brain activity, and analyze differences and commonalities between subjects, it becomes important to learn a meaningful embedding of the data that denoises and reveals its intrinsic structure. Specifically, we assume that while noise varies significantly between individuals, true responses to stimuli will share common, low-dimensional features between subjects which are jointly discoverable. Similar approaches have been exploited previously but they have mainly used linear methods such as PCA and shared response modeling (SRM). In contrast, we propose a neural network called MRMD-AE (manifold-regularized multiple decoder, autoencoder), that learns a common embedding from multiple subjects in an experiment while retaining the ability to decode to individual raw fMRI signals. We show that our learned common space represents an extensible manifold (where new points not seen during training can be mapped), improves the classification accuracy of stimulus features of unseen timepoints, and improves cross-subject translation of fMRI signals. We believe this framework can be used for many downstream applications such as guided brain-computer interface (BCI) training in the future.
Submitted 22 December, 2021;
originally announced January 2022.
-
Transverse-single-spin asymmetries of charged pions at midrapidity in transversely polarized $p{+}p$ collisions at $\sqrt{s}=200$ GeV
Authors:
U. A. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
V. Andrieux,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
N. S. Bandara,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
J. S. Bok,
V. Borisov,
M. L. Brooks,
J. Bryslawskyj,
V. Bumazhnov
, et al. (286 additional authors not shown)
Abstract:
In 2015, the PHENIX collaboration measured single-spin asymmetries for charged pions in transversely polarized proton-proton collisions at a center-of-mass energy of $\sqrt{s}=200$ GeV. The pions were detected at central rapidities of $|η|<0.35$. The single-spin asymmetries are consistent with zero for each charge individually, as well as consistent with the previously published neutral-pion asymmetries in the same rapidity range. However, they show a slight indication of charge-dependent differences which may suggest a flavor dependence in the underlying mechanisms that create these asymmetries.
Submitted 9 February, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
-
Multi-scale Feature Learning Dynamics: Insights for Double Descent
Authors:
Mohammad Pezeshki,
Amartya Mitra,
Yoshua Bengio,
Guillaume Lajoie
Abstract:
A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent in which the test error undergoes two non-monotonous transitions, or descents, as the training time increases. By leveraging tools from statistical physics, we study a linear teacher-student setup exhibiting epoch-wise double descent similar to that in deep neural networks. In this setting, we derive closed-form analytical expressions for the evolution of generalization error over training. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical experiments where our theory accurately predicts empirical findings and remains consistent with observations in deep neural networks.
Submitted 6 December, 2021;
originally announced December 2021.
-
Systematic study of nuclear effects in $p$$+$Al, $p$$+$Au, $d$$+$Au, and $^{3}$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV using $π^0$ production
Authors:
U. A. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
V. Andrieux,
A. Angerami,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
N. S. Bandara,
B. Bannier,
K. N. Barish
, et al. (529 additional authors not shown)
Abstract:
The PHENIX collaboration presents a systematic study of $π^0$ production from $p$$+$$p$, $p$$+$Al, $p$$+$Au, $d$$+$Au, and $^{3}$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. Measurements were performed with different centrality selections as well as the total inelastic, 0%--100%, selection for all collision systems. For 0%--100% collisions, the nuclear modification factors, $R_{xA}$, are consistent with unity for $p_T$ above 8 GeV/$c$, but exhibit an enhancement in peripheral collisions and a suppression in central collisions. The enhancement and suppression characteristics are similar for all systems for the same centrality class. It is shown that for high-$p_T$-$π^0$ production, the nucleons in the $d$ and $^3$He interact mostly independently with the Au nucleus and that the counterintuitive centrality dependence is likely due to a physical correlation between multiplicity and the presence of a hard scattering process. These observations disfavor models where parton energy loss has a significant contribution to nuclear modifications in small systems. Nuclear modifications at lower $p_T$ resemble the Cronin effect -- an increase followed by a peak in central or inelastic collisions and a plateau in peripheral collisions. The peak height has a characteristic ordering by system size as $p$$+$Au $>$ $d$$+$Au $>$ $^{3}$He$+$Au $>$ $p$$+$Al. For collisions with Au ions, current calculations based on initial state cold nuclear matter effects result in the opposite order, suggesting the presence of other contributions to nuclear modifications, in particular at lower $p_T$.
Submitted 6 June, 2022; v1 submitted 10 November, 2021;
originally announced November 2021.
-
Compositional Attention: Disentangling Search and Retrieval
Authors:
Sarthak Mittal,
Sharath Chandra Raparthy,
Irina Rish,
Yoshua Bengio,
Guillaume Lajoie
Abstract:
Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Importantly, standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner through an additional soft competition stage between the query-key combination and value pairing. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval, and can easily be implemented in lieu of standard attention heads in any network architecture.
Submitted 13 February, 2022; v1 submitted 18 October, 2021;
originally announced October 2021.
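The decoupling of search and retrieval described in the abstract above can be sketched in plain Python. This is a simplified reading of that description, not the authors' implementation: each search (query-key attention pattern) is applied to every retrieval (value projection), and a second softmax lets each token choose softly among the retrieved candidates. The weight names (`wq`, `wk`, `wq_r`, `wk_r`), the single shared retrieval-key matrix, and all shapes are assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols):
    """Random Gaussian matrix standing in for learned weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def compositional_attention(x, searches, value_maps, wk_r, d):
    """Return per-search outputs and the soft retrieval weights per token."""
    scale = math.sqrt(d)
    values = [matmul(x, wv) for wv in value_maps]  # one value matrix per retrieval
    outputs, weights = [], []
    for s in searches:
        q, k = matmul(x, s["wq"]), matmul(x, s["wk"])
        k_t = [list(col) for col in zip(*k)]
        # Search: where each token looks (tokens x tokens attention pattern).
        attn = [softmax([v / scale for v in row]) for row in matmul(q, k_t)]
        # Apply this search to every retrieval, not just one paired value matrix.
        cands = [matmul(attn, v) for v in values]
        q_r = matmul(x, s["wq_r"])                  # retrieval queries
        keys_r = [matmul(c, wk_r) for c in cands]   # retrieval keys
        out_s, w_s = [], []
        for t in range(len(x)):
            logits = [sum(a * b for a, b in zip(q_r[t], keys_r[j][t])) / scale
                      for j in range(len(values))]
            w = softmax(logits)  # soft competition among retrievals
            out_s.append([sum(w[j] * cands[j][t][c] for j in range(len(values)))
                          for c in range(d)])
            w_s.append(w)
        outputs.append(out_s)
        weights.append(w_s)
    return outputs, weights


rng = random.Random(0)
n_tokens, d_model, d = 4, 6, 3
x = rand_matrix(rng, n_tokens, d_model)
searches = [{"wq": rand_matrix(rng, d_model, d),
             "wk": rand_matrix(rng, d_model, d),
             "wq_r": rand_matrix(rng, d_model, d)} for _ in range(2)]
value_maps = [rand_matrix(rng, d_model, d) for _ in range(3)]
wk_r = rand_matrix(rng, d, d)
outputs, weights = compositional_attention(x, searches, value_maps, wk_r, d)
```

With 2 searches and 3 retrievals the sketch covers 6 search-retrieval pairings using 5 projection sets, which is one way to read the abstract's claim that search and retrieval can be scaled independently.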
-
Transverse single spin asymmetries of forward neutrons in $p$$+$$p$, $p$$+$Al, and $p$$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV as a function of transverse and longitudinal momenta
Authors:
U. A. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
V. Andrieux,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
N. S. Bandara,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
J. S. Bok,
V. Borisov,
M. L. Brooks,
J. Bryslawskyj,
V. Bumazhnov
, et al. (286 additional authors not shown)
Abstract:
In 2015 the PHENIX collaboration at the Relativistic Heavy Ion Collider recorded $p$$+$$p$, $p$$+$Al, and $p$$+$Au collision data at center of mass energies of $\sqrt{s_{_{NN}}}=200$ GeV with the proton beam(s) transversely polarized. At very forward rapidities $η>6.8$ relative to the polarized proton beam, neutrons were detected either inclusively or in (anti)correlation with detector activity related to hard collisions. The resulting single spin asymmetries, which were previously reported, have now been extracted as a function of the transverse momentum of the neutron as well as its longitudinal momentum fraction $x_F$. The explicit kinematic dependence, combined with the correlation information, allows for a closer look at the interplay of different mechanisms suggested to describe these asymmetries, such as hadronic interactions or electromagnetic interactions in ultra-peripheral collisions (UPC). Events that are correlated with a hard collision indeed display a mostly negative asymmetry that increases in magnitude as a function of transverse momentum with only little dependence on $x_F$. In contrast, events that are not likely to have emerged from a hard collision display positive asymmetries for the nuclear collisions with a kinematic dependence that resembles that of a UPC-based model. Because the UPC interaction depends strongly on the charge of the nucleus, those effects are very small for $p$$+$$p$ collisions, moderate for $p$$+$Al collisions, and large for $p$$+$Au collisions.
Submitted 9 February, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance
Authors:
Alexander Tong,
Guillaume Huguet,
Dennis Shung,
Amine Natik,
Manik Kuchroo,
Guillaume Lajoie,
Guy Wolf,
Smita Krishnaswamy
Abstract:
In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph. Typically, EMD is computed by optimizing over the cost of transporting one probability distribution to another over an underlying metric space. However, this is inefficient when computing the EMD between many signals. Here, we propose an unbalanced graph EMD that efficiently embeds the unbalanced EMD on an underlying graph into an $L^1$ space, whose metric we call unbalanced diffusion earth mover's distance (UDEMD). Next, we show how this gives distances between graph signals that are robust to noise. Finally, we apply this to organizing patients based on clinical notes, embedding cells modeled as signals on a gene graph, and organizing genes modeled as signals over a large cell graph. In each case, we show that UDEMD-based embeddings find accurate distances that are highly efficient compared to other methods.
Submitted 28 March, 2022; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Kinematic dependence of azimuthal anisotropies in $p$$+$Au, $d$$+$Au, $^3$He+Au at $\sqrt{s_{_{NN}}}$ = 200 GeV
Authors:
U. A. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
V. Andrieux,
K. Aoki,
N. Apadula,
H. Asano,
C. Ayuso,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon
, et al. (360 additional authors not shown)
Abstract:
There is strong evidence for the formation of small droplets of quark-gluon plasma in $p/d/^{3}$He+Au collisions at the Relativistic Heavy Ion Collider (RHIC) and in $p$+$p$/Pb collisions at the Large Hadron Collider. In particular, the analysis of data at RHIC for different geometries obtained by varying the projectile size and shape has proven insightful. In the present analysis, we find excellent agreement with the previously published PHENIX at RHIC results on elliptical and triangular flow with an independent analysis via the two-particle correlation method, which has quite different systematic uncertainties and an independent code base. In addition, the results are extended to other detector combinations with different kinematic (pseudorapidity) coverage. These results provide additional constraints on contributions from nonflow and longitudinal decorrelations.
Submitted 3 February, 2022; v1 submitted 14 July, 2021;
originally announced July 2021.
-
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning
Authors:
Nan Rosemary Ke,
Aniket Didolkar,
Sarthak Mittal,
Anirudh Goyal,
Guillaume Lajoie,
Stefan Bauer,
Danilo Rezende,
Yoshua Bengio,
Michael Mozer,
Christopher Pal
Abstract:
Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.
Submitted 2 July, 2021;
originally announced July 2021.
-
Efficient and robust multi-task learning in the brain with modular latent primitives
Authors:
Christian David Márton,
Léo Gagnon,
Guillaume Lajoie,
Kanaka Rajan
Abstract:
Biological agents do not have infinite resources to learn new things. For this reason, a central aspect of human learning is the ability to recycle previously acquired knowledge in a way that allows for faster, less resource-intensive acquisition of new skills. In spite of that, how neural networks in the brain leverage existing knowledge to learn new computations is not well understood. In this work, we study this question in artificial recurrent neural networks (RNNs) trained on a corpus of commonly used neuroscience tasks. Combining brain-inspired inductive biases we call functional and structural, we propose a system that learns new tasks by building on top of pre-trained latent dynamics organised into separate recurrent modules. These modules, acting as prior knowledge acquired previously through evolution or development, are pre-trained on the statistics of the full corpus of tasks so as to be independent and maximally informative. The resulting model, which we call a Modular Latent Primitives (MoLaP) network, allows for learning multiple tasks while keeping parameter counts and updates low. We also show that the skills acquired with our approach are more robust to a broad range of perturbations compared to those acquired with other multi-task learning strategies, and that generalisation to new tasks is facilitated. This work offers a new perspective on achieving efficient multi-task learning in the brain, illustrating the benefits of leveraging pre-trained latent dynamical primitives.
Submitted 25 May, 2022; v1 submitted 28 May, 2021;
originally announced May 2021.
-
Probing gluon spin-momentum correlations in transversely polarized protons through midrapidity isolated direct photons in $p^\uparrow+p$ collisions at $\sqrt{s}=200$ GeV
Authors:
U. A. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
V. Andrieux,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
N. S. Bandara,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
J. S. Bok,
M. L. Brooks,
J. Bryslawskyj,
V. Bumazhnov,
S. Campbell
, et al. (286 additional authors not shown)
Abstract:
Studying spin-momentum correlations in hadronic collisions offers a glimpse into a three-dimensional picture of proton structure. The transverse single-spin asymmetry for midrapidity isolated direct photons in $p^\uparrow+p$ collisions at $\sqrt{s}=200$ GeV is measured with the PHENIX detector at the Relativistic Heavy Ion Collider (RHIC). Because direct photons in particular are produced from the hard scattering and do not interact via the strong force, this measurement is a clean probe of initial-state spin-momentum correlations inside the proton and is in particular sensitive to gluon interference effects within the proton. This is the first time direct photons have been used as a probe of spin-momentum correlations at RHIC. The uncertainties on the results are a fifty-fold improvement with respect to those of the one prior measurement for the same observable, from the Fermilab E704 experiment. These results constrain gluon spin-momentum correlations in transversely polarized protons.
Submitted 20 August, 2021; v1 submitted 26 February, 2021;
originally announced February 2021.
-
Exploring the Geometry and Topology of Neural Network Loss Landscapes
Authors:
Stefan Horoi,
Jessie Huang,
Bastian Rieck,
Guillaume Lajoie,
Guy Wolf,
Smita Krishnaswamy
Abstract:
Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have proposed visualizing the loss landscape through the use of simple dimensionality reduction techniques. However, such visualization methods have been limited by their linear nature and only capture features in one or two dimensions, thus restricting sampling of the loss landscape to lines or planes. Here, we expand and improve upon these in three ways. First, we present a novel "jump and retrain" procedure for sampling relevant portions of the loss landscape. We show that the resulting sampled data holds more meaningful information about the network's ability to generalize. Next, we show that non-linear dimensionality reduction of the jump and retrain trajectories via PHATE, a trajectory and manifold-preserving method, allows us to visualize differences between networks that are generalizing well vs poorly. Finally, we combine PHATE trajectories with a computational homology characterization to quantify trajectory differences.
Submitted 26 January, 2022; v1 submitted 31 January, 2021;
originally announced February 2021.