-
In-Context In-Context Learning with Transformer Neural Processes
Authors:
Matthew Ashman,
Cristiana Diaconu,
Adrian Weller,
Richard E. Turner
Abstract:
Neural processes (NPs) are a powerful family of meta-learning models that seek to approximate the posterior predictive map of the ground-truth stochastic process from which each dataset in a meta-dataset is sampled. There are many cases in which practitioners, besides having access to the dataset of interest, may also have access to other datasets that share similarities with it. In this case, integrating these datasets into the NP can improve predictions. We equip NPs with this functionality and describe this paradigm as in-context in-context learning. Standard NP architectures, such as the convolutional conditional NP (ConvCNP) or the family of transformer neural processes (TNPs), are not capable of in-context in-context learning, as they are only able to condition on a single dataset. We address this shortcoming by developing the in-context in-context learning pseudo-token TNP (ICICL-TNP). The ICICL-TNP builds on the family of PT-TNPs, which utilise pseudo-token-based transformer architectures to sidestep the quadratic computational complexity associated with regular transformer architectures. Importantly, the ICICL-TNP is capable of conditioning on both sets of datapoints and sets of datasets, enabling it to perform in-context in-context learning. We demonstrate the importance of in-context in-context learning and the effectiveness of the ICICL-TNP in a number of experiments.
Submitted 19 June, 2024;
originally announced June 2024.
-
Approximately Equivariant Neural Processes
Authors:
Matthew Ashman,
Cristiana Diaconu,
Adrian Weller,
Wessel Bruinsma,
Richard E. Turner
Abstract:
Equivariant deep learning architectures exploit symmetries in learning problems to improve the sample efficiency of neural-network-based models and their ability to generalise. However, when modelling real-world data, learning problems are often not exactly equivariant, but only approximately. For example, when estimating the global temperature field from weather station observations, local topographical features like mountains break translation equivariance. In these scenarios, it is desirable to construct architectures that can flexibly depart from exact equivariance in a data-driven way. In this paper, we develop a general approach to achieving this using existing equivariant architectures. Our approach is agnostic to both the choice of symmetry group and model architecture, making it widely applicable. We consider the use of approximately equivariant architectures in neural processes (NPs), a popular family of meta-learning models. We demonstrate the effectiveness of our approach on a number of synthetic and real-world regression experiments, demonstrating that approximately equivariant NP models can outperform both their non-equivariant and strictly equivariant counterparts.
Submitted 19 June, 2024;
originally announced June 2024.
-
von Mises Quasi-Processes for Bayesian Circular Regression
Authors:
Yarden Cohen,
Alexandre Khae Wu Navarro,
Jes Frellsen,
Richard E. Turner,
Raziel Riemer,
Ari Pakman
Abstract:
The need for regression models to predict circular values arises in many scientific fields. In this work we explore a family of expressive and interpretable distributions over circle-valued random functions related to Gaussian processes targeting two Euclidean dimensions conditioned on the unit circle. The resulting probability model has connections with continuous spin models in statistical physics. Moreover, its density is very simple and has maximum-entropy, unlike previous Gaussian process-based approaches, which use wrapping or radial marginalization. For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling. We argue that transductive learning in these models favors a Bayesian approach to the parameters. We present experiments applying this model to the prediction of (i) wind directions and (ii) the percentage of the running gait cycle as a function of joint angles.
Submitted 18 June, 2024;
originally announced June 2024.
-
Translation Equivariant Transformer Neural Processes
Authors:
Matthew Ashman,
Cristiana Diaconu,
Junhyuck Kim,
Lakee Sivaraya,
Stratis Markou,
James Requeima,
Wessel P. Bruinsma,
Richard E. Turner
Abstract:
The effectiveness of neural processes (NPs) in modelling posterior prediction maps -- the mapping from data to posterior predictive distributions -- has significantly improved since their inception. This improvement can be attributed to two principal factors: (1) advancements in the architecture of permutation invariant set functions, which are intrinsic to all NPs; and (2) leveraging symmetries present in the true posterior predictive map, which are problem dependent. Transformers are a notable development in permutation invariant set functions, and their utility within NPs has been demonstrated through the family of models we refer to as TNPs. Despite significant interest in TNPs, little attention has been given to incorporating symmetries. Notably, the posterior prediction maps for data that are stationary -- a common assumption in spatio-temporal modelling -- exhibit translation equivariance. In this paper, we introduce a new family of translation equivariant TNPs (TE-TNPs) that incorporate translation equivariance. Through an extensive range of experiments on synthetic and real-world spatio-temporal data, we demonstrate the effectiveness of TE-TNPs relative to their non-translation-equivariant counterparts and other NP baselines.
Submitted 18 June, 2024;
originally announced June 2024.
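As a rough illustration of the translation equivariance property mentioned in the abstract above, the sketch below uses a simple kernel smoother in place of a TE-TNP. This substitution is an assumption made purely for illustration, and the names `kernel_smoother` and `shift` are hypothetical. Because the prediction depends on the inputs only through differences between target and context locations, shifting both by the same amount leaves the predicted values unchanged at the shifted targets.

```python
import math


def kernel_smoother(context_x, context_y, target_x, lengthscale=1.0):
    """Predict at each target location with a normalised Gaussian-weighted average.

    The weights depend only on (target - context) differences, which is the
    source of translation equivariance in this toy predictor.
    """
    predictions = []
    for xt in target_x:
        weights = [
            math.exp(-((xt - xc) ** 2) / (2.0 * lengthscale ** 2))
            for xc in context_x
        ]
        total = sum(weights)
        predictions.append(sum(w * y for w, y in zip(weights, context_y)) / total)
    return predictions


def shift(xs, delta):
    """Translate every input location by the same offset."""
    return [x + delta for x in xs]
```

Calling `kernel_smoother` on the original inputs and on inputs passed through `shift` with the same offset should give matching predictions up to floating-point rounding.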
-
Noise-Aware Differentially Private Regression via Meta-Learning
Authors:
Ossi Räisä,
Stratis Markou,
Matthew Ashman,
Wessel P. Bruinsma,
Marlon Tobaben,
Antti Honkela,
Richard E. Turner
Abstract:
Many high-stakes applications require machine learning models that protect user privacy and provide well-calibrated, accurate predictions. While Differential Privacy (DP) is the gold standard for protecting user privacy, standard DP mechanisms typically significantly impair performance. One approach to mitigating this issue is pre-training models on simulated data before DP learning on the private data. In this work we go a step further, using simulated data to train a meta-learning model that combines the Convolutional Conditional Neural Process (ConvCNP) with an improved functional DP mechanism of Hall et al. [2013] yielding the DPConvCNP. DPConvCNP learns from simulated data how to map private data to a DP predictive model in one forward pass, and then provides accurate, well-calibrated predictions. We compare DPConvCNP with a DP Gaussian Process (GP) baseline with carefully tuned hyperparameters. The DPConvCNP outperforms the GP baseline, especially on non-Gaussian data, yet is much faster at test time and requires less tuning.
Submitted 12 June, 2024;
originally announced June 2024.
-
Fearless Stochasticity in Expectation Propagation
Authors:
Jonathan So,
Richard E. Turner
Abstract:
Expectation propagation (EP) is a family of algorithms for performing approximate inference in probabilistic models. The updates of EP involve the evaluation of moments -- expectations of certain functions -- which can be estimated from Monte Carlo (MC) samples. However, the updates are not robust to MC noise when performed naively, and various prior works have attempted to address this issue in different ways. In this work, we provide a novel perspective on the moment-matching updates of EP; namely, that they perform natural-gradient-based optimisation of a variational objective. We use this insight to motivate two new EP variants, with updates that are particularly well-suited to MC estimation; they remain stable and are most sample-efficient when estimated with just a single sample. These new variants combine the benefits of their predecessors and address key weaknesses. In particular, they are easier to tune, offer an improved speed-accuracy trade-off, and do not rely on the use of debiasing estimators. We demonstrate their efficacy on a variety of probabilistic inference tasks.
Submitted 3 June, 2024;
originally announced June 2024.
-
The Pulsar Science Collaboratory: Multi-Epoch Scintillation Studies of Pulsars
Authors:
Jacob E. Turner,
Juan G. Lebron Medina,
Zachary Zelensky,
Kathleen A. Gustavso,
Jeffrey Marx,
Manvith Kothapalli,
Luis D. Cruz Vega,
Alexander Lee,
Caryelis B. Figueroa,
Daniel E. Reichart,
Joshua B. Haislip,
Vladimir V. Kouprianov,
Steve White,
Frank Ghigo,
Sue Ann Heatherly,
Maura A. McLaughlin
Abstract:
We report on findings from scintillation analyses using high-cadence observations of nine canonical pulsars with observing baselines ranging from one to three years. We obtain scintillation bandwidth and timescale measurements for all pulsars in our survey and obtain scintillation arc curvature measurements for four pulsars, detecting multiple arcs for two of them. Using updated pulsar distance estimates, we find evidence of previously undocumented scattering screens along the line of sight (LOS) of PSRs J1645$-$0317 and J2022$+$5154, as well as evidence that one of the arcs along the LOS to PSR J2313$+$4253 may reside somewhere within the Orion-Cygnus arm of the Milky Way. By augmenting the results of previous studies, we find general agreement with estimations of scattering delays from pulsar observations and those predicted by the NE2001 electron density model. In a similar manner, we find additional evidence of a correlation between a pulsar's dispersion measure and the overall variability of its scattering delays over time. The plethora of interesting science obtained through these observations demonstrates the capabilities of the Green Bank Observatory's 20m telescope to contribute to pulsar-based studies of the interstellar medium.
Submitted 29 May, 2024;
originally announced May 2024.
-
Variance-Reducing Couplings for Random Features: Perspectives from Optimal Transport
Authors:
Isaac Reid,
Stratis Markou,
Krzysztof Choromanski,
Richard E. Turner,
Adrian Weller
Abstract:
Random features (RFs) are a popular technique to scale up kernel methods in machine learning, replacing exact kernel evaluations with stochastic Monte Carlo estimates. They underpin models as diverse as efficient transformers (by approximating attention) to sparse spectrum Gaussian processes (by approximating the covariance function). Efficiency can be further improved by speeding up the convergence of these estimates: a variance reduction problem. We tackle this through the unifying framework of optimal transport, using theoretical insights and numerical algorithms to develop novel, high-performing RF couplings for kernels defined on Euclidean and discrete input spaces. They enjoy concrete theoretical performance guarantees and sometimes provide strong empirical downstream gains, including for scalable approximate inference on graphs. We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
Submitted 26 May, 2024;
originally announced May 2024.
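To make the idea of replacing exact kernel evaluations with Monte Carlo estimates concrete, here is a minimal sketch of random Fourier features for the Gaussian (RBF) kernel. It draws frequencies independently, so it is only the uncoupled baseline and not the variance-reducing couplings the abstract describes. The function names, the default of 2000 features, and the fixed seed are illustrative assumptions.

```python
import math
import random


def rbf_kernel(x, y, lengthscale=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def rff_estimate(x, y, num_features=2000, lengthscale=1.0, seed=0):
    """Monte Carlo estimate of the RBF kernel using i.i.d. random frequencies.

    Uses the identity k(x, y) = E[cos(w . (x - y))] with
    w ~ N(0, I / lengthscale^2).
    """
    rng = random.Random(seed)
    dim = len(x)
    total = 0.0
    for _ in range(num_features):
        w = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        projection = sum(wi * (a - b) for wi, a, b in zip(w, x, y))
        total += math.cos(projection)
    return total / num_features
```

The estimate converges to the exact kernel value as `num_features` grows; the speed of that convergence is the quantity that variance reduction aims to improve.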
-
Aurora: A Foundation Model of the Atmosphere
Authors:
Cristian Bodnar,
Wessel P. Bruinsma,
Ana Lucic,
Megan Stanley,
Johannes Brandstetter,
Patrick Garvan,
Maik Riechert,
Jonathan Weyn,
Haiyu Dong,
Anna Vaughan,
Jayesh K. Gupta,
Kit Tambiratnam,
Alex Archibald,
Elizabeth Heider,
Max Welling,
Richard E. Turner,
Paris Perdikaris
Abstract:
Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-scale foundation model of the atmosphere trained on over a million hours of diverse weather and climate data. Aurora leverages the strengths of the foundation modelling approach to produce operational forecasts for a wide variety of atmospheric prediction problems, including those with limited training data, heterogeneous variables, and extreme events. In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts that outperform state-of-the-art classical simulation tools and the best specialized deep learning models. Taken together, these results indicate that foundation models can transform environmental forecasting.
Submitted 28 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language
Authors:
James Requeima,
John Bronskill,
Dami Choi,
Richard E. Turner,
David Duvenaud
Abstract:
Machine learning practitioners often face significant challenges in formally integrating their prior knowledge and beliefs into predictive models, limiting the potential for nuanced and context-aware analyses. Moreover, the expertise needed to integrate this prior knowledge into probabilistic modeling typically limits the application of these models to specialists. Our goal is to build a regression model that can process numerical data and make probabilistic predictions at arbitrary locations, guided by natural language text which describes a user's prior knowledge. Large Language Models (LLMs) provide a useful starting point for designing such a tool since they 1) provide an interface where users can incorporate expert insights in natural language and 2) provide an opportunity for leveraging latent problem-relevant knowledge encoded in LLMs that users may not have themselves. We start by exploring strategies for eliciting explicit, coherent numerical predictive distributions from LLMs. We examine these joint predictive distributions, which we call LLM Processes, over arbitrarily-many quantities in settings such as forecasting, multi-dimensional regression, black-box optimization, and image modeling. We investigate the practical details of prompting to elicit coherent predictive distributions, and demonstrate their effectiveness at regression. Finally, we demonstrate the ability to usefully incorporate text into numerical predictions, improving predictive performance and giving quantitative structure that reflects qualitative descriptions. This lets us begin to explore the rich, grounded hypothesis space that LLMs implicitly encode.
Submitted 25 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
A Cyclic Spectroscopy Scintillation Study of PSR B1937+21 I. Demonstration of Improved Scintillometry
Authors:
Jacob E. Turner,
Timothy Dolch,
James M. Cordes,
Stella K. Ocker,
Daniel R. Stinebring,
Shami Chatterjee,
Maura A. McLaughlin,
Victoria E. Catlett,
Cody Jessup,
Nathaniel Jones,
Christopher Scheithauer
Abstract:
We use cyclic spectroscopy to perform high frequency-resolution analyses of multi-hour baseband Arecibo observations of the millisecond pulsar PSR B1937+21. This technique allows for the examination of scintillation features in far greater detail than is otherwise possible under most pulsar timing array observing setups. We measure scintillation bandwidths and timescales in each of eight subbands across a 200 MHz observing band in each observation. Through these measurements we obtain intra-epoch estimates of the frequency scalings for scintillation bandwidth and timescale. Thanks to our high frequency resolution and the narrow scintles of this pulsar, we resolve scintillation arcs in the secondary spectra due to the increased Nyquist limit, which would not have been resolved at the same observing frequency with a traditional filterbank spectrum using NANOGrav's current time and frequency resolutions, and the frequency-dependent evolution of scintillation arc features within individual observations. We observe the dimming of prominent arc features at higher frequencies, possibly due to a combination of decreasing flux density and the frequency dependence of the plasma refractive index of the interstellar medium. We also find agreement with arc curvature frequency dependence predicted by Stinebring et al. (2001) in some epochs. Thanks to the frequency resolution improvement provided by cyclic spectroscopy, these results show strong promise for future such analyses with millisecond pulsars, particularly for pulsar timing arrays, where such techniques can allow for detailed studies of the interstellar medium in highly scattered pulsars without sacrificing the timing resolution that is crucial to their gravitational wave detection efforts.
Submitted 21 June, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
The NANOGrav 15 yr Data Set: Looking for Signs of Discreteness in the Gravitational-wave Background
Authors:
Gabriella Agazie,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Adam Brazier,
Paul R. Brook,
Lucas Brown,
Sarah Burke-Spolaor,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Megan E. DeCesar,
Paul B. Demorest,
Heling Deng,
Timothy Dolch,
Elizabeth C. Ferrara,
William Fiore,
Emmanuel Fonseca,
Gabriel E. Freedman,
Nate Garver-Daniels
, et al. (58 additional authors not shown)
Abstract:
The cosmic merger history of supermassive black hole binaries (SMBHBs) is expected to produce a low-frequency gravitational wave background (GWB). Here we investigate how signs of the discrete nature of this GWB can manifest in pulsar timing arrays through excursions from, and breaks in, the expected $f_{\mathrm{GW}}^{-2/3}$ power-law of the GWB strain spectrum. To do this, we create a semi-analytic SMBHB population model, fit to NANOGrav's 15 yr GWB amplitude, and with 1,000 realizations we study the populations' characteristic strain and residual spectra. Comparing our models to the NANOGrav 15 yr spectrum, we find two interesting excursions from the power-law. The first, at $2 \; \mathrm{nHz}$, is below our GWB realizations with $p$-value significance $p = 0.05$ to $0.06$ ($\approx 1.8\sigma - 1.9\sigma$). The second, at $16 \; \mathrm{nHz}$, is above our GWB realizations with $p = 0.04$ to $0.15$ ($\approx 1.4\sigma - 2.1\sigma$). We explore the properties of a loud SMBHB which could cause such an excursion. Our simulations also show that the expected number of SMBHBs decreases by three orders of magnitude, from $\sim 10^6$ to $\sim 10^3$, between $2\; \mathrm{nHz}$ and $20 \; \mathrm{nHz}$. This causes a break in the strain spectrum as the stochasticity of the background breaks down at $26^{+28}_{-19} \; \mathrm{nHz}$, consistent with predictions pre-dating GWB measurements. The diminished GWB signal from SMBHBs at frequencies above the $26 \; \mathrm{nHz}$ break opens a window for PTAs to detect continuous GWs from individual SMBHBs or GWs from the early universe.
Submitted 10 April, 2024;
originally announced April 2024.
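The $f_{\mathrm{GW}}^{-2/3}$ power law referred to in the abstract above can be written as a one-line function. The sketch below assumes the common parameterisation of characteristic strain as an amplitude at a reference frequency of one cycle per year; the function name and the choice of reference frequency are assumptions, and no amplitude value is taken from the paper.

```python
SECONDS_PER_YEAR = 365.25 * 86400.0


def characteristic_strain(freq_hz, amplitude, ref_freq_hz=1.0 / SECONDS_PER_YEAR):
    """Power-law characteristic strain h_c(f) = A * (f / f_ref) ** (-2/3).

    `amplitude` is the strain at the reference frequency; it is a free
    parameter here, not a measured value.
    """
    return amplitude * (freq_hz / ref_freq_hz) ** (-2.0 / 3.0)
```

Under this power law, the strain at 2 nHz exceeds the strain at 16 nHz by a factor of 8^(2/3) = 4, regardless of the amplitude chosen.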
-
End-to-end data-driven weather forecasting
Authors:
Anna Vaughan,
Stratis Markou,
Will Tebbutt,
James Requeima,
Wessel P. Bruinsma,
Tom R. Andersson,
Michael Herzog,
Nicholas D. Lane,
Matthew Chantry,
J. Scott Hosking,
Richard E. Turner
Abstract:
Weather forecasting is critical for a range of human activities including transportation, agriculture, industry, as well as the safety of the general public. Machine learning models have the potential to transform the complex weather prediction pipeline, but current approaches still rely on numerical weather prediction (NWP) systems, limiting forecast speed and accuracy. Here we demonstrate that a machine learning model can replace the entire operational NWP pipeline. Aardvark Weather, an end-to-end data-driven weather prediction system, ingests raw observations and outputs global gridded forecasts and local station forecasts. Further, it can be optimised end-to-end to maximise performance over quantities of interest. Global forecasts outperform an operational NWP baseline for multiple variables and lead times. Local station forecasts are skillful up to ten days lead time and achieve comparable and often lower errors than a post-processed global NWP baseline and a state-of-the-art end-to-end forecasting system with input from human forecasters. These forecasts are produced with a remarkably simple neural process model using just 8% of the input data and three orders of magnitude less compute than existing NWP and hybrid AI-NWP methods. We anticipate that Aardvark Weather will be the starting point for a new generation of end-to-end machine learning models for medium-range forecasting that will reduce computational costs by orders of magnitude and enable the rapid and cheap creation of bespoke models for users in a variety of fields, including for the developing world where state-of-the-art local models are not currently available.
Submitted 10 July, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
SportsNGEN: Sustained Generation of Multi-player Sports Gameplay
Authors:
Lachlan Thorpe,
Lewis Bawden,
Karanjot Vendal,
John Bronskill,
Richard E. Turner
Abstract:
We present a transformer decoder based model, SportsNGEN, that is trained on sports player and ball tracking sequences that is capable of generating realistic and sustained gameplay. We train and evaluate SportsNGEN on a large database of professional tennis tracking data and demonstrate that by combining the generated simulations with a shot classifier and logic to start and end rallies, the system is capable of simulating an entire tennis match. In addition, a generic version of SportsNGEN can be customized to a specific player by fine-tuning on match data that includes that player. We show that our model is well calibrated and can be used to derive insights for coaches and broadcasters by evaluating counterfactual or what if options. Finally, we show qualitative results indicating the same approach works for football.
Submitted 9 February, 2024;
originally announced March 2024.
-
A Generative Model of Symmetry Transformations
Authors:
James Urquhart Allingham,
Bruno Kacper Mlodozeniec,
Shreyas Padhy,
Javier Antorán,
David Krueger,
Richard E. Turner,
Eric Nalisnick,
José Miguel Hernández-Lobato
Abstract:
Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge. While recent advancements have been made in learning those symmetries directly from the dataset, most of this work has focused on the discriminative setting. In this paper, we take inspiration from group theoretic ideas to construct a generative model that explicitly aims to capture the data's approximate symmetries. This results in a model that, given a prespecified broad set of possible symmetries, learns to what extent, if at all, those symmetries are actually present. Our model can be seen as a generative process for data augmentation. We provide a simple algorithm for learning our generative model and empirically demonstrate its ability to capture symmetries under affine and color transformations, in an interpretable way. Combining our symmetry model with standard generative models results in higher marginal test-log-likelihoods and improved data efficiency.
Submitted 20 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Denoising Diffusion Probabilistic Models in Six Simple Steps
Authors:
Richard E. Turner,
Cristiana-Diana Diaconu,
Stratis Markou,
Aliaksandra Shysheya,
Andrew Y. K. Foong,
Bruno Mlodozeniec
Abstract:
Denoising Diffusion Probabilistic Models (DDPMs) are a very popular class of deep generative model that have been successfully applied to a diverse range of problems including image and video generation, protein and material synthesis, weather forecasting, and neural surrogates of partial differential equations. Despite their ubiquity it is hard to find an introduction to DDPMs which is simple, comprehensive, clean and clear. The compact explanations necessary in research papers are not able to elucidate all of the different design steps taken to formulate the DDPM and the rationale of the steps that are presented is often omitted to save space. Moreover, the expositions are typically presented from the variational lower bound perspective which is unnecessary and arguably harmful as it obfuscates why the method is working and suggests generalisations that do not perform well in practice. On the other hand, perspectives that take the continuous time-limit are beautiful and general, but they have a high barrier-to-entry as they require background knowledge of stochastic differential equations and probability flow. In this note, we distill down the formulation of the DDPM into six simple steps each of which comes with a clear rationale. We assume that the reader is familiar with fundamental topics in machine learning including basic probabilistic modelling, Gaussian distributions, maximum likelihood estimation, and deep learning.
Submitted 10 February, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
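For readers who want a concrete anchor, the sketch below shows the forward (noising) process of a DDPM as it is commonly presented in the general literature. It is not taken from the note itself and may not match its six steps. The linear noise schedule, its endpoint values, and all function names are illustrative assumptions.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), often written alpha-bar_t."""
    products = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def noise_sample(x0, t, alpha_bars, rng):
    """Draw x_t = sqrt(alpha-bar_t) * x0 + sqrt(1 - alpha-bar_t) * eps, eps ~ N(0, I)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x0]
```

With a long enough schedule the cumulative product approaches zero, so the final noised sample is close to pure Gaussian noise; a denoising network is then trained to reverse this corruption step by step.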
-
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Authors:
Wu Lin,
Felix Dangel,
Runa Eschenhagen,
Juhan Bae,
Richard E. Turner,
Alireza Makhzani
Abstract:
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter update via a square root. While these methods are often motivated as approximate second-order methods, the square root represents a fundamental difference. In this work, we investigate how the behavior of adaptive methods changes when we remove the root, i.e., strengthen their second-order motivation. Surprisingly, we find that such square-root-free adaptive methods close the generalization gap to SGD on convolutional architectures, while maintaining their root-based counterpart's performance on transformers. The second-order perspective also has practical benefits for developing non-diagonal methods that can incorporate arbitrary curvature approximations through the concept of preconditioner invariance. In contrast to root-based methods like Shampoo, root-free counterparts work well and fast with half-precision since they do not require numerically unstable matrix root decompositions and inversions. Overall, our findings provide new insights into the development of adaptive methods and raise important questions regarding the overlooked role of adaptivity in their success. (experiment code: https://github.com/yorkerlin/remove-the-square-root optimizer code: https://github.com/f-dangel/sirfshampoo)
Submitted 20 June, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Transformer Neural Autoregressive Flows
Authors:
Massimiliano Patacchiola,
Aliaksandra Shysheya,
Katja Hofmann,
Richard E. Turner
Abstract:
Density estimation, a central problem in machine learning, can be performed using Normalizing Flows (NFs). NFs comprise a sequence of invertible transformations that turn a complex target distribution into a simple one by exploiting the change of variables theorem. Neural Autoregressive Flows (NAFs) and Block Neural Autoregressive Flows (B-NAFs) are arguably the most performant members of the NF family. However, they suffer from scalability issues and training instability due to the constraints imposed on the network structure. In this paper, we propose a novel solution to these challenges by exploiting transformers to define a new class of neural flows called Transformer Neural Autoregressive Flows (T-NAFs). T-NAFs treat each dimension of a random variable as a separate input token, using attention masking to enforce an autoregressive constraint. We take an amortization-inspired approach where the transformer outputs the parameters of an invertible transformation. The experimental results demonstrate that T-NAFs consistently match or outperform NAFs and B-NAFs across multiple datasets from the UCI benchmark. Remarkably, T-NAFs achieve these results using an order of magnitude fewer parameters than previous approaches, without composing multiple flows.
Submitted 3 January, 2024;
originally announced January 2024.
-
Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
Authors:
Wu Lin,
Felix Dangel,
Runa Eschenhagen,
Kirill Neklyudov,
Agustinus Kristiadi,
Richard E. Turner,
Alireza Makhzani
Abstract:
Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.
Submitted 15 June, 2024; v1 submitted 9 December, 2023;
originally announced December 2023.
-
Identifiable Feature Learning for Spatial Data with Nonlinear ICA
Authors:
Hermanni Hälvä,
Jonathan So,
Richard E. Turner,
Aapo Hyvärinen
Abstract:
Recently, nonlinear ICA has surfaced as a popular alternative to the many heuristic models used in deep representation learning and disentanglement. An advantage of nonlinear ICA is that a sophisticated identifiability theory has been developed; in particular, it has been proven that the original components can be recovered under sufficiently strong latent dependencies. Despite this general theory, practical nonlinear ICA algorithms have so far been mainly limited to data with one-dimensional latent dependencies, especially time-series data. In this paper, we introduce a new nonlinear ICA framework that employs $t$-process (TP) latent components which apply naturally to data with higher-dimensional dependency structures, such as spatial and spatio-temporal data. In particular, we develop a new learning and inference algorithm that extends variational inference methods to handle the combination of a deep neural network mixing function with the TP prior, and employs the method of inducing points for computational efficacy. On the theoretical side, we show that such TP independent components are identifiable under very general conditions. Further, Gaussian Process (GP) nonlinear ICA is established as a limit of the TP Nonlinear ICA model, and we prove that the identifiability of the latent components at this GP limit is more restricted. Namely, those components are identifiable if and only if they have distinctly different covariance kernels. Our algorithm and identifiability theorems are explored on simulated spatial data and real world spatio-temporal data.
Submitted 28 November, 2023;
originally announced November 2023.
-
Diffusion-Augmented Neural Processes
Authors:
Lorenzo Bonito,
James Requeima,
Aliaksandra Shysheya,
Richard E. Turner
Abstract:
Over the last few years, Neural Processes have become a useful modelling tool in many application areas, such as healthcare and climate sciences, in which data are scarce and prediction uncertainty estimates are indispensable. However, the current state of the art in the field (AR CNPs; Bruinsma et al., 2023) presents a few issues that prevent its widespread deployment. This work proposes an alternative, diffusion-based approach to NPs which, through conditioning on noised datasets, addresses many of these limitations, whilst also exceeding SOTA performance.
Submitted 16 November, 2023;
originally announced November 2023.
-
The context-specificity of virulence evolution revealed through evolutionary invasion analysis
Authors:
Sudam Surasinghe,
Ketty Kabengele,
Paul E. Turner,
C. Brandon Ogbunugafor
Abstract:
Models are often employed to integrate knowledge about epidemics across scales and simulate disease dynamics. While these approaches have played a central role in studying the mechanics underlying epidemics, we lack ways to reliably predict how the relationship between virulence (the harm to hosts caused by an infection) and transmission will evolve in certain virus-host contexts. In this study, we invoke evolutionary invasion analysis -- a method used to identify the evolution of uninvadable strategies in dynamical systems -- to examine how the virulence-transmission dichotomy can evolve in models of virus infections defined by different natural histories. We reveal that peculiar ecologies drive different evolved relationships between virulence and transmission. Specifically, we discover patterns of virulence evolution between epidemics of various kinds (SARS-CoV-2 and hepatitis C virus) and that varying definitions of virulence alter our predictions for how viruses will evolve. We discuss the findings in light of contemporary conversations in the public health sector around the possibility of predicting virus evolution and in more extensive theoretical discussions involving virulence evolution in emerging infectious diseases.
Submitted 7 November, 2023;
originally announced November 2023.
-
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
Authors:
Runa Eschenhagen,
Alexander Immer,
Richard E. Turner,
Frank Schneider,
Philipp Hennig
Abstract:
The core components of many modern neural network architectures, such as transformers, convolutional, or graph neural networks, can be expressed as linear layers with $\textit{weight-sharing}$. Kronecker-Factored Approximate Curvature (K-FAC), a second-order optimisation method, has shown promise to speed up neural network training and thereby reduce computational costs. However, there is currently no framework to apply it to generic architectures, specifically ones with linear weight-sharing layers. In this work, we identify two different settings of linear weight-sharing layers which motivate two flavours of K-FAC -- $\textit{expand}$ and $\textit{reduce}$. We show that they are exact for deep linear networks with weight-sharing in their respective setting. Notably, K-FAC-reduce is generally faster than K-FAC-expand, which we leverage to speed up automatic hyperparameter selection via optimising the marginal likelihood for a Wide ResNet. Finally, we observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer. However, both variations are able to reach a fixed validation metric target in $50$-$75\%$ of the number of steps of a first-order reference run, which translates into a comparable improvement in wall-clock time. This highlights the potential of applying K-FAC to modern neural network architectures.
Submitted 11 January, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Sim2Real for Environmental Neural Processes
Authors:
Jonas Scholz,
Tom R. Andersson,
Anna Vaughan,
James Requeima,
Richard E. Turner
Abstract:
Machine learning (ML)-based weather models have recently undergone rapid improvements. These models are typically trained on gridded reanalysis data from numerical data assimilation systems. However, reanalysis data comes with limitations, such as assumptions about physical laws and low spatiotemporal resolution. The gap between reanalysis and reality has sparked growing interest in training ML models directly on observations such as weather stations. Modelling scattered and sparse environmental observations requires scalable and flexible ML architectures, one of which is the convolutional conditional neural process (ConvCNP). ConvCNPs can learn to condition on both gridded and off-the-grid context data to make uncertainty-aware predictions at target locations. However, the sparsity of real observations presents a challenge for data-hungry deep learning models like the ConvCNP. One potential solution is 'Sim2Real': pre-training on reanalysis and fine-tuning on observational data. We analyse Sim2Real with a ConvCNP trained to interpolate surface air temperature over Germany, using varying numbers of weather stations for fine-tuning. On held-out weather stations, Sim2Real training substantially outperforms the same model architecture trained only with reanalysis data or only with station data, showing that reanalysis data can serve as a stepping stone for learning from real observations. Sim2Real could thus enable more accurate models for weather prediction and climate monitoring.
Submitted 30 October, 2023;
originally announced October 2023.
-
The NANOGrav 15-year data set: Search for Transverse Polarization Modes in the Gravitational-Wave Background
Authors:
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Jeremy Baier,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Rand Burnette,
Robin Case,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Kathryn Crowter,
Megan E. DeCesar,
Dallas DeGan,
Paul B. Demorest
, et al. (74 additional authors not shown)
Abstract:
Recently we found compelling evidence for a gravitational wave background with Hellings and Downs (HD) correlations in our 15-year data set. These correlations describe gravitational waves as predicted by general relativity, which has two transverse polarization modes. However, more general metric theories of gravity can have additional polarization modes which produce different interpulsar correlations. In this work we search the NANOGrav 15-year data set for evidence of a gravitational wave background with quadrupolar Hellings and Downs (HD) and Scalar Transverse (ST) correlations. We find that HD correlations are the best fit to the data, and no significant evidence in favor of ST correlations. While Bayes factors show strong evidence for a correlated signal, the data does not strongly prefer either correlation signature, with Bayes factors $\sim 2$ when comparing HD to ST correlations, and $\sim 1$ for HD plus ST correlations to HD correlations alone. However, when modeled alongside HD correlations, the amplitude and spectral index posteriors for ST correlations are uninformative, with the HD process accounting for the vast majority of the total signal. Using the optimal statistic, a frequentist technique that focuses on the pulsar-pair cross-correlations, we find median signal-to-noise-ratios of 5.0 for HD and 4.6 for ST correlations when fit for separately, and median signal-to-noise-ratios of 3.5 for HD and 3.0 for ST correlations when fit for simultaneously. While the signal-to-noise-ratios for each of the correlations are comparable, the estimated amplitude and spectral index for HD are a significantly better fit to the total signal, in agreement with our Bayesian analysis.
Submitted 18 October, 2023;
originally announced October 2023.
-
Optimising Distributions with Natural Gradient Surrogates
Authors:
Jonathan So,
Richard E. Turner
Abstract:
Natural gradient methods have been used to optimise the parameters of probability distributions in a variety of settings, often resulting in fast-converging procedures. Unfortunately, for many distributions of interest, computing the natural gradient has a number of challenges. In this work we propose a novel technique for tackling such issues, which involves reframing the optimisation as one with respect to the parameters of a surrogate distribution, for which computing the natural gradient is easy. We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems. Our method expands the set of distributions that can be efficiently targeted with natural gradients. Furthermore, it is fast, easy to understand, simple to implement using standard autodiff software, and does not require lengthy model-specific derivations. We demonstrate our method on maximum likelihood estimation and variational inference tasks.
Submitted 4 March, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
The NANOGrav 12.5-year data set: A computationally efficient eccentric binary search pipeline and constraints on an eccentric supermassive binary candidate in 3C 66B
Authors:
Gabriella Agazie,
Zaven Arzoumanian,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Harsha Blumer,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Belinda D. Cheeseboro,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Megan E. DeCesar,
Paul B. Demorest,
Lankeswar Dey,
Timothy Dolch,
Justin A. Ellis,
Robert D. Ferdman,
Elizabeth C. Ferrara
, et al. (63 additional authors not shown)
Abstract:
The radio galaxy 3C 66B has been hypothesized to host a supermassive black hole binary (SMBHB) at its center based on electromagnetic observations. Its apparent 1.05-year period and low redshift ($\sim0.02$) make it an interesting testbed to search for low-frequency gravitational waves (GWs) using Pulsar Timing Array (PTA) experiments. This source has been subjected to multiple searches for continuous GWs from a circular SMBHB, resulting in progressively more stringent constraints on its GW amplitude and chirp mass. In this paper, we develop a pipeline for performing Bayesian targeted searches for eccentric SMBHBs in PTA data sets, and test its efficacy by applying it on simulated data sets with varying injected signal strengths. We also search for a realistic eccentric SMBHB source in 3C 66B using the NANOGrav 12.5-year data set employing PTA signal models containing Earth term-only as well as Earth+Pulsar term contributions using this pipeline. Due to limitations in our PTA signal model, we get meaningful results only when the initial eccentricity $e_0<0.5$ and the symmetric mass ratio $η>0.1$. We find no evidence for an eccentric SMBHB signal in our data, and therefore place 95% upper limits on the PTA signal amplitude of $88.1\pm3.7$ ns for the Earth term-only and $81.74\pm0.86$ ns for the Earth+Pulsar term searches for $e_0<0.5$ and $η>0.1$. Similar 95% upper limits on the chirp mass are $(1.98 \pm 0.05) \times 10^9\,M_{\odot}$ and $(1.81 \pm 0.01) \times 10^9\,M_{\odot}$. These upper limits, while less stringent than those calculated from a circular binary search in the NANOGrav 12.5-year data set, are consistent with the SMBHB model of 3C 66B developed from electromagnetic observations.
Submitted 15 January, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
How to Detect an Astrophysical Nanohertz Gravitational-Wave Background
Authors:
Bence Bécsy,
Neil J. Cornish,
Patrick M. Meyers,
Luke Zoltan Kelley,
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Paul T. Baker,
Laura Blecha,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Katerina Chatziioannou,
Tyler Cohen,
James M. Cordes,
Fronefield Crawford,
H. Thankful Cromartie,
Kathryn Crowter,
Megan E. DeCesar,
Paul B. Demorest,
Timothy Dolch
, et al. (71 additional authors not shown)
Abstract:
Analysis of pulsar timing data has provided evidence for a stochastic gravitational wave background in the nHz frequency band. The most plausible source of such a background is the superposition of signals from millions of supermassive black hole binaries. The standard statistical techniques used to search for such a background and assess its significance make several simplifying assumptions, namely: i) Gaussianity; ii) isotropy; and most often iii) a power-law spectrum. However, a stochastic background from a finite collection of binaries does not exactly satisfy any of these assumptions. To understand the effect of these assumptions, we test standard analysis techniques on a large collection of realistic simulated datasets. The dataset length, observing schedule, and noise levels were chosen to emulate the NANOGrav 15-year dataset. Simulated signals from millions of binaries drawn from models based on the Illustris cosmological hydrodynamical simulation were added to the data. We find that the standard statistical methods perform remarkably well on these simulated datasets, despite their fundamental assumptions not being strictly met. They are able to achieve a confident detection of the background. However, even for a fixed set of astrophysical parameters, different realizations of the universe result in a large variance in the significance and recovered parameters of the background. We also find that the presence of loud individual binaries can bias the spectral recovery of the background if we do not account for them.
Submitted 1 December, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Comparing recent PTA results on the nanohertz stochastic gravitational wave background
Authors:
The International Pulsar Timing Array Collaboration,
G. Agazie,
J. Antoniadis,
A. Anumarlapudi,
A. M. Archibald,
P. Arumugam,
S. Arumugam,
Z. Arzoumanian,
J. Askew,
S. Babak,
M. Bagchi,
M. Bailes,
A. -S. Bak Nielsen,
P. T. Baker,
C. G. Bassa,
A. Bathula,
B. Bécsy,
A. Berthereau,
N. D. R. Bhat,
L. Blecha,
M. Bonetti,
E. Bortolas,
A. Brazier,
P. R. Brook,
M. Burgay
, et al. (220 additional authors not shown)
Abstract:
The Australian, Chinese, European, Indian, and North American pulsar timing array (PTA) collaborations recently reported, at varying levels, evidence for the presence of a nanohertz gravitational wave background (GWB). Given that each PTA made different choices in modeling their data, we perform a comparison of the GWB and individual pulsar noise parameters across the results reported from the PTAs that constitute the International Pulsar Timing Array (IPTA). We show that despite making different modeling choices, there is no significant difference in the GWB parameters that are measured by the different PTAs, agreeing within $1σ$. The pulsar noise parameters are also consistent between different PTAs for the majority of the pulsars included in these analyses. We bridge the differences in modeling choices by adopting a standardized noise model for all pulsars and PTAs, finding that under this model there is a reduction in the tension in the pulsar noise parameters. As part of this reanalysis, we "extended" each PTA's data set by adding extra pulsars that were not timed by that PTA. Under these extensions, we find better constraints on the GWB amplitude and a higher signal-to-noise ratio for the Hellings and Downs correlations. These extensions serve as a prelude to the benefits offered by a full combination of data across all pulsars in the IPTA, i.e., the IPTA's Data Release 3, which will involve not just adding in additional pulsars, but also including data from all three PTAs where any given pulsar is timed by more than a single PTA.
Submitted 1 September, 2023;
originally announced September 2023.
-
PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers
Authors:
Phillip Lippe,
Bastiaan S. Veeling,
Paris Perdikaris,
Richard E. Turner,
Johannes Brandstetter
Abstract:
Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem. In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Based on these insights, we draw inspiration from recent advances in diffusion models to introduce PDE-Refiner; a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate.
Submitted 21 October, 2023; v1 submitted 10 August, 2023;
originally announced August 2023.
-
The NANOGrav 12.5-year Data Set: Search for Gravitational Wave Memory
Authors:
Gabriella Agazie,
Zaven Arzoumanian,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Harsha Blumer,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Rand Burnette,
Robin Case,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Megan E. DeCesar,
Dallas DeGan,
Paul B. Demorest,
Timothy Dolch,
Brendan Drachler,
Justin A. Ellis
, et al. (65 additional authors not shown)
Abstract:
We present the results of a Bayesian search for gravitational wave (GW) memory in the NANOGrav 12.5-yr data set. We find no convincing evidence for any gravitational wave memory signals in this data set (Bayes factor = 2.8). As such, we go on to place upper limits on the strain amplitude of GW memory events as a function of sky location and event epoch. These upper limits are computed using a signal model that assumes the existence of a common, spatially uncorrelated red noise in addition to a GW memory signal. The median strain upper limit as a function of sky position is approximately $3.3 \times 10^{-14}$. We also find that there are some differences in the upper limits as a function of sky position centered around PSR J0613$-$0200. This suggests that this pulsar has some excess noise which can be confounded with GW memory. Finally, the upper limits as a function of burst epoch continue to improve at later epochs. This improvement is attributable to the continued growth of the pulsar timing array.
Submitted 25 July, 2023;
originally announced July 2023.
-
Geometric Neural Diffusion Processes
Authors:
Emile Mathieu,
Vincent Dutordoir,
Michael J. Hutchinson,
Valentin De Bortoli,
Yee Whye Teh,
Richard E. Turner
Abstract:
Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite-dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimensional modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that with these conditions, the generative functional model admits the same symmetry. We demonstrate the scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
Submitted 11 July, 2023;
originally announced July 2023.
-
Beyond Intuition, a Framework for Applying GPs to Real-World Data
Authors:
Kenza Tazi,
Jihao Andreas Lin,
Ross Viljoen,
Alex Gardner,
ST John,
Hong Ge,
Richard E. Turner
Abstract:
Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and options for computational scalability. The framework is then applied to a case study of glacier elevation change yielding more accurate results at test time.
Submitted 17 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
The NANOGrav 15-year Gravitational-Wave Background Analysis Pipeline
Authors:
Aaron D. Johnson,
Patrick M. Meyers,
Paul T. Baker,
Neil J. Cornish,
Jeffrey S. Hazboun,
Tyson B. Littenberg,
Joseph D. Romano,
Stephen R. Taylor,
Michele Vallisneri,
Sarah J. Vigeland,
Ken D. Olum,
Xavier Siemens,
Justin A. Ellis,
Rutger van Haasteren,
Sophie Hourihane,
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Laura Blecha,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Bence Bécsy,
J. Andrew Casey-Clyde
, et al. (71 additional authors not shown)
Abstract:
This paper presents rigorous tests of pulsar timing array methods and software, examining their consistency across a wide range of injected parameters and signal strength. We discuss updates to the 15-year isotropic gravitational-wave background analyses and their corresponding code representations. Descriptions of the internal structure of the flagship algorithms \texttt{Enterprise} and \texttt{PTMCMCSampler} are given to facilitate understanding of the PTA likelihood structure, how models are built, and what methods are currently used in sampling the high-dimensional PTA parameter space. We introduce a novel version of the PTA likelihood that uses a two-step marginalization procedure that performs much faster when the white noise parameters remain fixed. We perform stringent tests of consistency and correctness of the Bayesian and frequentist analysis software. For the Bayesian analysis, we test prior recovery, injection recovery, and Bayes factors. For the frequentist analysis, we test that the cross-correlation-based optimal statistic, when modified to account for a non-negligible gravitational-wave background, accurately recovers the amplitude of the background. We also summarize recent advances and tests performed on the optimal statistic in the literature from both GWB detection and parameter estimation perspectives. The tests presented here validate current and future analyses of PTA data.
Submitted 7 July, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
The NANOGrav 15-year Data Set: Bayesian Limits on Gravitational Waves from Individual Supermassive Black Hole Binaries
Authors:
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Robin Case,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Tyler Cohen,
James M. Cordes,
Neil Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Kathryn Crowter,
Megan DeCesar,
Paul B. Demorest,
Matthew C. Digman,
Timothy Dolch,
Brendan Drachler
, et al. (74 additional authors not shown)
Abstract:
Evidence for a low-frequency stochastic gravitational wave background has recently been reported based on analyses of pulsar timing array data. The most likely source of such a background is a population of supermassive black hole binaries, the loudest of which may be individually detected in these datasets. Here we present the search for individual supermassive black hole binaries in the NANOGrav 15-year dataset. We introduce several new techniques, which enhance the efficiency and modeling accuracy of the analysis. The search uncovered weak evidence for two candidate signals, one with a gravitational-wave frequency of $\sim$4 nHz, and another at $\sim$170 nHz. The significance of the low-frequency candidate was greatly diminished when Hellings-Downs correlations were included in the background model. The high-frequency candidate was discounted due to the lack of a plausible host galaxy, the unlikely astrophysical prior odds of finding such a source, and since most of its support comes from a single pulsar with a commensurate binary period. Finding no compelling evidence for signals from individual binary systems, we place upper limits on the strain amplitude of gravitational waves emitted by such systems.
Submitted 28 June, 2023;
originally announced June 2023.
-
The NANOGrav 15-year Data Set: Search for Anisotropy in the Gravitational-Wave Background
Authors:
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Kathryn Crowter,
Megan E. DeCesar,
Paul B. Demorest,
Timothy Dolch,
Brendan Drachler,
Elizabeth C. Ferrara,
William Fiore
, et al. (68 additional authors not shown)
Abstract:
The North American Nanohertz Observatory for Gravitational Waves (NANOGrav) has reported evidence for the presence of an isotropic nanohertz gravitational wave background (GWB) in its 15 yr dataset. However, if the GWB is produced by a population of inspiraling supermassive black hole binary (SMBHB) systems, then the background is predicted to be anisotropic, depending on the distribution of these systems in the local Universe and the statistical properties of the SMBHB population. In this work, we search for anisotropy in the GWB using multiple methods and bases to describe the distribution of the GWB power on the sky. We do not find significant evidence of anisotropy, and place a Bayesian $95\%$ upper limit on the level of broadband anisotropy such that $(C_{l>0} / C_{l=0}) < 20\%$. We also derive conservative estimates on the anisotropy expected from a random distribution of SMBHB systems using astrophysical simulations conditioned on the isotropic GWB inferred in the 15-yr dataset, and show that this dataset has sufficient sensitivity to probe a large fraction of the predicted level of anisotropy. We end by highlighting the opportunities and challenges in searching for anisotropy in pulsar timing array data.
Submitted 28 June, 2023;
originally announced June 2023.
-
The NANOGrav 15-year Data Set: Constraints on Supermassive Black Hole Binaries from the Gravitational Wave Background
Authors:
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Alexander Bonilla,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Rand Burnette,
Robin Case,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Katerina Chatziioannou,
Belinda D. Cheeseboro,
Siyuan Chen,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Kathryn Crowter,
Curt J. Cutler
, et al. (89 additional authors not shown)
Abstract:
The NANOGrav 15-year data set shows evidence for the presence of a low-frequency gravitational-wave background (GWB). While many physical processes can source such low-frequency gravitational waves, here we analyze the signal as coming from a population of supermassive black hole (SMBH) binaries distributed throughout the Universe. We show that astrophysically motivated models of SMBH binary populations are able to reproduce both the amplitude and shape of the observed low-frequency gravitational-wave spectrum. While multiple model variations are able to reproduce the GWB spectrum at our current measurement precision, our results highlight the importance of accurately modeling binary evolution for producing realistic GWB spectra. Additionally, while reasonable parameters are able to reproduce the 15-year observations, the implied GWB amplitude necessitates either a large number of parameters to be at the edges of expected values, or a small number of parameters to be notably different from standard expectations. While we are not yet able to definitively establish the origin of the inferred GWB signal, the consistency of the signal with astrophysical expectations offers a tantalizing prospect for confirming that SMBH binaries are able to form, reach sub-parsec separations, and eventually coalesce. As the significance grows over time, higher-order features of the GWB spectrum will definitively determine the nature of the GWB and allow for novel constraints on SMBH populations.
Submitted 18 July, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
The NANOGrav 15-year Data Set: Search for Signals from New Physics
Authors:
Adeela Afzal,
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Paul T. Baker,
Bence Bécsy,
Jose Juan Blanco-Pillado,
Laura Blecha,
Kimberly K. Boddy,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Rand Burnette,
Robin Case,
Maria Charisi,
Shami Chatterjee,
Katerina Chatziioannou,
Belinda D. Cheeseboro,
Siyuan Chen,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie
, et al. (98 additional authors not shown)
Abstract:
The 15-year pulsar timing data set collected by the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) shows positive evidence for the presence of a low-frequency gravitational-wave (GW) background. In this paper, we investigate potential cosmological interpretations of this signal, specifically cosmic inflation, scalar-induced GWs, first-order phase transitions, cosmic strings, and domain walls. We find that, with the exception of stable cosmic strings of field theory origin, all these models can reproduce the observed signal. When compared to the standard interpretation in terms of inspiraling supermassive black hole binaries (SMBHBs), many cosmological models seem to provide a better fit resulting in Bayes factors in the range from 10 to 100. However, these results strongly depend on modeling assumptions about the cosmic SMBHB population and, at this stage, should not be regarded as evidence for new physics. Furthermore, we identify excluded parameter regions where the predicted GW signal from cosmological sources significantly exceeds the NANOGrav signal. These parameter constraints are independent of the origin of the NANOGrav signal and illustrate how pulsar timing data provide a new way to constrain the parameter space of these models. Finally, we search for deterministic signals produced by models of ultralight dark matter (ULDM) and dark matter substructures in the Milky Way. We find no evidence for either of these signals and thus report updated constraints on these models. In the case of ULDM, these constraints outperform torsion balance and atomic clock constraints for ULDM coupled to electrons, muons, or gluons.
Submitted 28 June, 2023;
originally announced June 2023.
-
The NANOGrav 15-Year Data Set: Detector Characterization and Noise Budget
Authors:
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Maria Charisi,
Shami Chatterjee,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Kathryn Crowter,
Megan E. DeCesar,
Paul B. Demorest,
Timothy Dolch,
Brendan Drachler,
Elizabeth C. Ferrara,
William Fiore,
Emmanuel Fonseca
, et al. (66 additional authors not shown)
Abstract:
Pulsar timing arrays (PTAs) are galactic-scale gravitational wave detectors. Each individual arm, composed of a millisecond pulsar, a radio telescope, and a kiloparsecs-long path, differs in its properties but, in aggregate, can be used to extract low-frequency gravitational wave (GW) signals. We present a noise and sensitivity analysis to accompany the NANOGrav 15-year data release and associated papers, along with an in-depth introduction to PTA noise models. As a first step in our analysis, we characterize each individual pulsar data set with three types of white noise parameters and two red noise parameters. These parameters, along with the timing model and, particularly, a piecewise-constant model for the time-variable dispersion measure, determine the sensitivity curve over the low-frequency GW band we are searching. We tabulate information for all of the pulsars in this data release and present some representative sensitivity curves. We then combine the individual pulsar sensitivities using a signal-to-noise-ratio statistic to calculate the global sensitivity of the PTA to a stochastic background of GWs, obtaining a minimum noise characteristic strain of $7\times 10^{-15}$ at 5 nHz. A power law-integrated analysis shows rough agreement with the amplitudes recovered in NANOGrav's 15-year GW background analysis. While our phenomenological noise model does not model all known physical effects explicitly, it provides an accurate characterization of the noise in the data while preserving sensitivity to multiple classes of GW signals.
Submitted 28 June, 2023;
originally announced June 2023.
-
The NANOGrav 15-year Data Set: Observations and Timing of 68 Millisecond Pulsars
Authors:
Gabriella Agazie,
Md Faisal Alam,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Paul T. Baker,
Laura Blecha,
Victoria Bonidie,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Bence Bécsy,
Christopher Chapman,
Maria Charisi,
Shami Chatterjee,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Kathryn Crowter,
Megan E. DeCesar,
Paul B. Demorest,
Timothy Dolch,
Brendan Drachler
, et al. (75 additional authors not shown)
Abstract:
We present observations and timing analyses of 68 millisecond pulsars (MSPs) comprising the 15-year data set of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav). NANOGrav is a pulsar timing array (PTA) experiment that is sensitive to low-frequency gravitational waves. This is NANOGrav's fifth public data release, including both "narrowband" and "wideband" time-of-arrival (TOA) measurements and corresponding pulsar timing models. We have added 21 MSPs and extended our timing baselines by three years, now spanning nearly 16 years for some of our sources. The data were collected using the Arecibo Observatory, the Green Bank Telescope, and the Very Large Array between frequencies of 327 MHz and 3 GHz, with most sources observed approximately monthly. A number of notable methodological and procedural changes were made compared to our previous data sets. These improve the overall quality of the TOA data set and are part of the transition to new pulsar timing and PTA analysis software packages. For the first time, our data products are accompanied by a full suite of software to reproduce data reduction, analysis, and results. Our timing models include a variety of newly detected astrometric and binary pulsar parameters, including several significant improvements to pulsar mass constraints. We find that the time series of 23 pulsars contain detectable levels of red noise, 10 of which are new measurements. In this data set, we find evidence for a stochastic gravitational-wave background.
Submitted 28 June, 2023;
originally announced June 2023.
-
The NANOGrav 15-year Data Set: Evidence for a Gravitational-Wave Background
Authors:
Gabriella Agazie,
Akash Anumarlapudi,
Anne M. Archibald,
Zaven Arzoumanian,
Paul T. Baker,
Bence Bécsy,
Laura Blecha,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Rand Burnette,
Robin Case,
Maria Charisi,
Shami Chatterjee,
Katerina Chatziioannou,
Belinda D. Cheeseboro,
Siyuan Chen,
Tyler Cohen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Kathryn Crowter,
Curt J. Cutler,
Megan E. DeCesar
, et al. (89 additional authors not shown)
Abstract:
We report multiple lines of evidence for a stochastic signal that is correlated among 67 pulsars from the 15-year pulsar-timing data set collected by the North American Nanohertz Observatory for Gravitational Waves. The correlations follow the Hellings-Downs pattern expected for a stochastic gravitational-wave background. The presence of such a gravitational-wave background with a power-law spectrum is favored over a model with only independent pulsar noises with a Bayes factor in excess of $10^{14}$, and this same model is favored over an uncorrelated common power-law-spectrum model with Bayes factors of 200-1000, depending on spectral modeling choices. We have built a statistical background distribution for these latter Bayes factors using a method that removes inter-pulsar correlations from our data set, finding $p = 10^{-3}$ (approx. $3\sigma$) for the observed Bayes factors in the null no-correlation scenario. A frequentist test statistic built directly as a weighted sum of inter-pulsar correlations yields $p = 5 \times 10^{-5} - 1.9 \times 10^{-4}$ (approx. $3.5 - 4\sigma$). Assuming a fiducial $f^{-2/3}$ characteristic-strain spectrum, as appropriate for an ensemble of binary supermassive black-hole inspirals, the strain amplitude is $2.4^{+0.7}_{-0.6} \times 10^{-15}$ (median + 90% credible interval) at a reference frequency of 1/(1 yr). The inferred gravitational-wave background amplitude and spectrum are consistent with astrophysical expectations for a signal from a population of supermassive black-hole binaries, although more exotic cosmological and astrophysical sources cannot be excluded. The observation of Hellings-Downs correlations points to the gravitational-wave origin of this signal.
Submitted 28 June, 2023;
originally announced June 2023.
-
Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation
Authors:
Massimiliano Patacchiola,
Mingfei Sun,
Katja Hofmann,
Richard E. Turner
Abstract:
In this paper we explore few-shot imitation learning for control problems, which involves learning to imitate a target policy by accessing a limited set of offline rollouts. This setting has been relatively under-explored despite its relevance to robotics and control applications. State-of-the-art methods developed to tackle few-shot imitation rely on meta-learning, which is expensive to train as it requires access to a distribution over tasks (rollouts from many target policies and variations of the base environment). Given this limitation we investigate an alternative approach, fine-tuning, a family of methods that pretrain on a single dataset and then fine-tune on unseen domain-specific data. Recent work has shown that fine-tuners outperform meta-learners in few-shot image classification tasks, especially when the data is out-of-domain. Here we evaluate to what extent this is true for control problems, proposing a simple yet effective baseline which relies on two stages: (i) training a base policy online via reinforcement learning (e.g. Soft Actor-Critic) on a single base environment, (ii) fine-tuning the base policy via behavioral cloning on a few offline rollouts of the target policy. Despite its simplicity this baseline is competitive with meta-learning methods on a variety of conditions and is able to imitate target policies trained on unseen variations of the original environment. Importantly, the proposed approach is practical and easy to implement, as it does not need any complex meta-training protocol. As a further contribution, we release an open source dataset called iMuJoCo (iMitation MuJoCo) consisting of 154 variants of popular OpenAI-Gym MuJoCo environments with associated pretrained target policies and rollouts, which can be used by the community to study few-shot imitation learning and offline reinforcement learning.
Submitted 23 June, 2023;
originally announced June 2023.
-
An Introduction to Transformers
Authors:
Richard E. Turner
Abstract:
The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points. The transformer has driven recent advances in natural language processing, computer vision, and spatio-temporal modelling. There are many introductions to transformers, but most do not contain precise mathematical descriptions of the architecture and the intuitions behind the design choices are often also missing. Moreover, as research takes a winding path, the explanations for the components of the transformer can be idiosyncratic. In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture. We will not discuss training as this is rather standard. We assume that the reader is familiar with fundamental topics in machine learning including multi-layer perceptrons, linear transformations, softmax functions and basic probability.
Submitted 8 February, 2024; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Autoregressive Conditional Neural Processes
Authors:
Wessel P. Bruinsma,
Stratis Markou,
James Requeima,
Andrew Y. K. Foong,
Tom R. Andersson,
Anna Vaughan,
Anthony Buonomo,
J. Scott Hosking,
Richard E. Turner
Abstract:
Conditional neural processes (CNPs; Garnelo et al., 2018a) are attractive meta-learning models which produce well-calibrated predictions and are trainable via a simple maximum likelihood procedure. Although CNPs have many advantages, they are unable to model dependencies in their predictions. Various works propose solutions to this, but these come at the cost of either requiring approximate inference or being limited to Gaussian predictions. In this work, we instead propose to change how CNPs are deployed at test time, without any modifications to the model or training procedure. Instead of making predictions independently for every target point, we autoregressively define a joint predictive distribution using the chain rule of probability, taking inspiration from the neural autoregressive density estimator (NADE) literature. We show that this simple procedure allows factorised Gaussian CNPs to model highly dependent, non-Gaussian predictive distributions. Perhaps surprisingly, in an extensive range of tasks with synthetic and real data, we show that CNPs in autoregressive (AR) mode not only significantly outperform non-AR CNPs, but are also competitive with more sophisticated models that are significantly more computationally expensive and challenging to train. This performance is remarkable given that AR CNPs are not trained to model joint dependencies. Our work provides an example of how ideas from neural distribution estimation can benefit neural processes, and motivates research into the AR deployment of other neural process models.
Submitted 25 March, 2023;
originally announced March 2023.
-
First Session Adaptation: A Strong Replay-Free Baseline for Class-Incremental Learning
Authors:
Aristeidis Panos,
Yuriko Kobe,
Daniel Olmeda Reino,
Rahaf Aljundi,
Richard E. Turner
Abstract:
In Class-Incremental Learning (CIL) an image classification system is exposed to new classes in each learning session and must be updated incrementally. Methods approaching this problem have updated both the classification head and the feature extractor body at each session of CIL. In this work, we develop a baseline method, First Session Adaptation (FSA), that sheds light on the efficacy of existing CIL approaches and allows us to assess the relative performance contributions from head and body adaptation. FSA adapts a pre-trained neural network body only on the first learning session and fixes it thereafter; a head based on linear discriminant analysis (LDA) is then placed on top of the adapted body, allowing exact updates through CIL. FSA is replay-free, i.e., it does not memorize examples from previous sessions of continual learning. To empirically motivate FSA, we first consider a diverse selection of 22 image-classification datasets, evaluating different heads and body adaptation techniques in high/low-shot offline settings. We find that the LDA head performs well and supports CIL out-of-the-box. We also find that Featurewise Layer Modulation (FiLM) adapters are highly effective in the few-shot setting, and full-body adaptation in the high-shot setting. Second, we empirically investigate various CIL settings including high-shot CIL and few-shot CIL, including settings that have previously been used in the literature. We show that FSA significantly improves over the state-of-the-art in 15 of the 16 settings considered. FSA with FiLM adapters is especially performant in the few-shot setting. These results indicate that current approaches to continuous body adaptation are not working as expected. Finally, we propose a measure that can be applied to a set of unlabelled inputs which is predictive of the benefits of body adaptation.
Submitted 12 January, 2024; v1 submitted 23 March, 2023;
originally announced March 2023.
-
On the Efficacy of Differentially Private Few-shot Image Classification
Authors:
Marlon Tobaben,
Aliaksandra Shysheya,
John Bronskill,
Andrew Paverd,
Shruti Tople,
Santiago Zanella-Beguelin,
Richard E Turner,
Antti Honkela
Abstract:
There has been significant recent progress in training differentially private (DP) models which achieve accuracy that approaches the best non-private models. These DP models are typically pretrained on large public datasets and then fine-tuned on private downstream datasets that are relatively large and similar in distribution to the pretraining data. However, in many applications including personalization and federated learning, it is crucial to perform well (i) in the few-shot setting, as obtaining large amounts of labeled data may be problematic; and (ii) on datasets from a wide variety of domains for use in various specialist settings. To understand under which conditions few-shot DP can be effective, we perform an exhaustive set of experiments that reveals how the accuracy and vulnerability to attack of few-shot DP image classification models are affected as the number of shots per class, privacy level, model architecture, downstream dataset, and subset of learnable parameters in the model vary. We show that to achieve DP accuracy on par with non-private models, the shots per class must be increased as the privacy level increases. We also show that learning parameter-efficient FiLM adapters under DP is competitive with learning just the final classifier layer or learning all of the network parameters. Finally, we evaluate DP federated learning systems and establish state-of-the-art performance on the challenging FLAIR benchmark.
Submitted 19 December, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Scattering Delay Mitigation in High Accuracy Pulsar Timing: Cyclic Spectroscopy Techniques
Authors:
Jacob E. Turner,
Daniel R. Stinebring,
Maura A. McLaughlin,
Anne M. Archibald,
Timothy Dolch,
Ryan S. Lynch
Abstract:
We simulate scattering delays from the interstellar medium to examine the effectiveness of three estimators in recovering these delays in pulsar timing data. Two of these estimators use the more traditional process of fitting autocorrelation functions to pulsar dynamic spectra to extract scintillation bandwidths, while the third estimator uses the newer technique of cyclic spectroscopy on baseband pulsar data to recover the interstellar medium's impulse response function. We find that both fitting a Lorentzian or Gaussian distribution to an autocorrelation function and recovering the impulse response function from the cyclic spectrum are, on average, accurate in recovering scattering delays, although autocorrelation function estimators have a large variance, even at high signal-to-noise ratio (S/N). We find that, given sufficient S/N, cyclic spectroscopy is more accurate than both Gaussian and Lorentzian fitting for recovering scattering delays at specific epochs, suggesting that cyclic spectroscopy is a superior method for scattering estimation in high-quality data.
Submitted 27 February, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
A Simultaneous Dual-Frequency Scintillation Arc Survey of Six Bright Canonical Pulsars Using the Upgraded Giant Metrewave Radio Telescope
Authors:
Jacob E. Turner,
Bhal Chandra Joshi,
Maura A. McLaughlin,
Daniel R. Stinebring
Abstract:
We use the upgraded Giant Metrewave Radio Telescope to measure scintillation arc properties in six bright canonical pulsars with simultaneous dual-frequency coverage. These observations at frequencies from 300 to 750 MHz allowed for detailed analysis of arc evolution across frequency and epoch. We determine the frequency dependence of arc curvature, scintillation bandwidth, and scintillation timescale, and compare arc curvature with pseudo-curvature, more robustly than single-frequency-band-per-epoch measurements allow; our results agree with theory and previous literature. We find a strong correlation between arc asymmetry and arc curvature, which we have replicated using simulations, and attribute to a bias in the Hough transform approach to scintillation arc analysis. Possible evidence for an approximately week-long timescale over which a given scattering screen dominates signal propagation was found by tracking visible scintillation arcs in each epoch in PSR J1136+1551. The inclusion of a 155-minute observation allowed us to resolve the scale of scintillation variations on short timescales, which we find to be directly tied to the amount of ISM sampled over the observation. Some of our pulsars showed either consistent or emerging asymmetries in arc curvature, indicating instances of refraction across their lines of sight. Significant features in various pulsars, such as multiple scintillation arcs in PSR J1136+1551 and flat arclets in PSR J1509+5531, that have been found in previous works, were also detected. The simultaneous multi-band observing capability of the upgraded GMRT shows excellent promise for future pulsar scintillation work.
Submitted 20 October, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
The NANOGrav 12.5-year Data Set: Bayesian Limits on Gravitational Waves from Individual Supermassive Black Hole Binaries
Authors:
Zaven Arzoumanian,
Paul T. Baker,
Laura Blecha,
Harsha Blumer,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Bence Bécsy,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Siyuan Chen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Megan E. DeCesar,
Paul B. Demorest,
Timothy Dolch,
Brendan Drachler,
Justin A. Ellis,
E. C. Ferrara,
William Fiore,
Emmanuel Fonseca,
Gabriel E. Freedman
, et al. (53 additional authors not shown)
Abstract:
Pulsar timing array collaborations, such as the North American Nanohertz Observatory for Gravitational Waves (NANOGrav), are seeking to detect nanohertz gravitational waves emitted by supermassive black hole binaries formed in the aftermath of galaxy mergers. We have searched for continuous waves from individual circular supermassive black hole binaries using NANOGrav's recent 12.5-year data set. We created new methods to accurately model the uncertainties on pulsar distances in our analysis, and we implemented new techniques to account for a common red noise process in pulsar timing array data sets while searching for deterministic gravitational wave signals, including continuous waves. As we found no evidence for continuous waves in our data, we placed 95% upper limits on the strain amplitude of continuous waves emitted by these sources. At our most sensitive frequency of 7.65 nanohertz, we placed a sky-averaged limit of $h_0 < (6.82 \pm 0.35) \times 10^{-15}$, and $h_0 < (2.66 \pm 0.15) \times 10^{-15}$ in our most sensitive sky location. Finally, we placed a multi-messenger limit of $\mathcal{M} < (1.41 \pm 0.02) \times 10^9 M_\odot$ on the chirp mass of the supermassive black hole binary candidate 3C 66B.
Submitted 6 June, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners
Authors:
Elre T. Oldewage,
John Bronskill,
Richard E. Turner
Abstract:
This paper examines the robustness of deployed few-shot meta-learning systems when they are fed an imperceptibly perturbed few-shot dataset. We attack amortized meta-learners, which allows us to craft colluding sets of inputs that are tailored to fool the system's learning algorithm when used as training data. Jointly crafted adversarial inputs might be expected to synergistically manipulate a classifier, allowing for very strong data-poisoning attacks that would be hard to detect. We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance. However, in opposition to the well-known transferability of adversarial examples in general, the colluding sets do not transfer well to different classifiers. We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred. Regardless of the mitigation strategies suggested by these hypotheses, the colluding inputs transfer no better than adversarial inputs that are generated independently in the usual way.
Submitted 23 November, 2022;
originally announced November 2022.