Search | arXiv e-print repository

The Topos of Transformer Networks

Authors: Mattia Jacopo Villani, Peter McBurney

Abstract: The transformer neural network has significantly outshone all other neural network architectures as the engine behind large language models. We provide a theoretical analysis of the expressivity of the transformer architecture through the lens of topos theory. From this viewpoint, we show that many common neural network architectures, such as the convolutional, recurrent and graph convolutional networks, can be embedded in a pretopos of piecewise-linear functions, but that the transformer necessarily lives in its topos completion. In particular, this suggests that the two network families instantiate different fragments of logic: the former are first order, whereas transformers are higher-order reasoners. Furthermore, we draw parallels with architecture search and gradient descent, integrating our analysis in the framework of cybernetic agents.

Submitted 5 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: Requires major revision

arXiv:2402.07439 [pdf, other]

Joint estimation of the predictive ability of experts using a multi-output Gaussian process

Authors: Oscar Oelrich, Mattias Villani

Abstract: A multi-output Gaussian process (GP) is introduced as a model for the joint posterior distribution of the local predictive ability of a set of models and/or experts, conditional on a vector of covariates, from historical predictions in the form of log predictive scores. Following a power transformation of the log scores, a GP with Gaussian noise can be used, which allows faster computation by first using Hamiltonian Monte Carlo to sample the hyper-parameters of the GP from a model where the latent GP surface has been marginalized out, and then using these draws to generate draws of joint predictive ability conditional on a new vector of covariates. Linear pools based on learned joint local predictive ability are applied to predict daily bike usage in Washington DC.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 22 pages, 7 figures. This paper was included in the first author's PhD thesis: Oelrich, O. (2022) 'Learning Local Predictive Accuracy for Expert Evaluation and Forecast Combination' which can be found at https://su.diva-portal.org/smash/record.jsf?pid=diva2:1708601

arXiv:2402.02068 [pdf, other]

Modeling local predictive ability using power-transformed Gaussian processes

Authors: Oscar Oelrich, Mattias Villani

Abstract: A Gaussian process is proposed as a model for the posterior distribution of the local predictive ability of a model or expert, conditional on a vector of covariates, from historical predictions in the form of log predictive scores. Assuming Gaussian expert predictions and a Gaussian data generating process, a linear transformation of the predictive score follows a noncentral chi-squared distribution with one degree of freedom. Motivated by this we develop a non-central chi-squared Gaussian process regression to flexibly model local predictive ability, with the posterior distribution of the latent GP function and kernel hyperparameters sampled by Hamiltonian Monte Carlo. We show that a cube-root transformation of the log scores is approximately Gaussian with homoscedastic variance, which makes it possible to estimate the model much faster by marginalizing the latent GP function analytically. Linear pools based on learned local predictive ability are applied to predict daily bike usage in Washington DC.

Submitted 3 February, 2024; originally announced February 2024.

Comments: 27 pages, 15 figures. This paper was included in the first author's PhD thesis: Oelrich, O. (2022) 'Learning Local Predictive Accuracy for Expert Evaluation and Forecast Combination' which can be found at https://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-210919

arXiv:2310.16914 [pdf, ps, other]

Semi-analytical and numerical solutions to Teukolsky equations for large fermion mass over black hole mass ratio

Authors: Mattia Villani

Abstract: In a recent paper, we have studied the Teukolsky equations for fermions with mass $m_e\neq 0$ and a rotating black hole of mass $M$. There, we have studied two cases: $\tilde{m}_e=m_e\,M^{-1}\ll 1$ and $a\omega\ll 1$; $\tilde{m}_e\ll 1$ and $a\omega\gtrsim 1$. Here we study the two remaining cases: $\tilde{m}_e\gtrsim 1$ and $a\omega\ll 1$, using a semi-analytical approach, and $\tilde{m}_e\gtrsim 1$ and $a\omega\gtrsim 1$, using a numerical approach. These cases could be of some interest for the study of the interactions of fermions with small black holes, such as those formed in the last stages of the Hawking evaporation process.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 7 pages, two tables. arXiv admin note: text overlap with arXiv:2310.13645

arXiv:2310.13645 [pdf, ps, other]

Perturbative and semi-analytical solutions to Teukolsky equations for massive fermions

Authors: Mattia Villani

Abstract: In this work, we aim at solving the Teukolsky equations for a fermion with mass $m_e\neq 0$ in the presence of a rotating black hole with mass $M$. We consider two different regimes: $\tilde{m}_e= M^{-1} m_e\ll 1$ and $a\omega\ll 1$; $\tilde{m}_e\ll 1$ and $a\omega\gtrsim 1$. We treat each of these two regimes in different ways: we use a perturbative approach for the first, similar to the \emph{usual} one employed for spin 0, 1, 2 and mass-less 1/2 fields, but with two small parameters ($a\omega$ and $\tilde{m}_e$); as we shall see, the second can be treated with a semi-analytical approach. In a forthcoming paper we shall study the remaining two cases in which $\tilde{m}_e \gtrsim 1$, while $a\omega\ll 1$ or $a\omega\gtrsim 1$. The regime with $\tilde{m}_e \ll 1$, but $a\omega\gtrsim 1$ is probably the most interesting from the astrophysics point of view, but these last two cases might be of some interest for the study of the interaction of fermions with very small black holes, which may be formed, for example, in the last stages of the Hawking evaporation.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 17 pages

arXiv:2310.00495 [pdf]

Characterization of hydrogenated amorphous silicon sensors on polyimide flexible substrate

Authors: M. Menichelli, L. Antognini, S. Aziz, A. Bashiri, M. Bizzarri, L. Calcagnile, M. Caprai, D. Caputo, A. P. Caricato, R. Catalano, D. Chilà, G. A. P. Cirrone, T. Croci, G. Cuttone, G. De Cesare, S. Dunand, M. Fabi, L. Frontini, C. Grimani, M. Ionica, K. Kanxheri, M. Large, V. Liberali, N. Lovecchio, M. Martino , et al. (28 additional authors not shown)

Abstract: Hydrogenated amorphous silicon (a-Si:H) is a material having an intrinsically high radiation hardness that can be deposited on flexible substrates like Polyimide. For these properties a-Si:H can be used for the production of flexible sensors. a-Si:H sensors can be successfully utilized in dosimetry, beam monitoring for particle physics (x-ray, electron, gamma-ray and proton detection) and radiotherapy, radiation flux measurement for space applications (study of solar energetic particles and stellar events) and neutron flux measurements. In this paper we have studied the dosimetric x-ray response of n-i-p diodes deposited on Polyimide. We measured the linearity of the photocurrent response to x-rays versus dose-rate from which we have extracted the dosimetric x-ray sensitivity at various bias voltages. In particular low bias voltage operation has been studied to assess the high energy efficiency of this kind of sensor. A measurement of stability of x-ray response versus time has been shown. The effect of detector annealing has been studied. Operation under bending at various bending radii is also shown.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2307.11598 [pdf, other]

doi 10.1051/0004-6361/202346679

Particle monitoring capability of the Solar Orbiter Metis coronagraph through the increasing phase of solar cycle 25

Authors: Catia Grimani, Vincenzo Andretta, Ester Antonucci, Paolo Chioetto, Vania Da Deppo, Michele Fabi, Samuel Gissot, Giovanna Jerse, Mauro Messerotti, Giampiero Naletto, Maurizio Pancrazzi, Andrea Persici, Christina Plainaki, Marco Romoli, Federico Sabbatini, Daniele Spadaro, Marco Stangalini, Daniele Telloni, Luca Teriaca, Michela Uslenghi, Mattia Villani, Lucia Abbo, Aleksandr Burtovoi, Federica Frassati, Federico Landini , et al. (4 additional authors not shown)

Abstract: Context. Galactic cosmic rays (GCRs) and solar particles with energies greater than tens of MeV penetrate spacecraft and instruments hosted aboard space missions. The Solar Orbiter Metis coronagraph is aimed at observing the solar corona in both visible (VL) and ultraviolet (UV) light. Particle tracks are observed in the Metis images of the corona. An algorithm has been implemented in the Metis processing electronics to detect the VL image pixels crossed by cosmic rays. This algorithm was initially enabled for the VL instrument only, since the process of separating the particle tracks in the UV images has proven to be very challenging. Aims. We study the impact of the overall bulk of particles of galactic and solar origin on the Metis coronagraph images. We discuss the effects of the increasing solar activity after the Solar Orbiter mission launch on the secondary particle production in the spacecraft. Methods. We compared Monte Carlo simulations of GCRs crossing or interacting in the Metis VL CMOS sensor to observations gathered in 2020 and 2022. We also evaluated the impact of solar energetic particle events of different intensities on the Metis images. Results. The study of the role of abundant and rare cosmic rays in firing pixels in the Metis VL images of the corona allows us to estimate the efficiency of the algorithm applied for cosmic-ray track removal from the images and to demonstrate that the instrument performance had remained unchanged during the first two years of the Solar Orbiter operations.
The outcome of this work can be used to estimate the Solar Orbiter instrument's deep charging and the order of magnitude for energetic particles crossing the images of Metis and other instruments such as STIX and EUI.

Submitted 24 July, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: 8 pages, 6 figures

Journal ref: A&A 677, A45 (2023)

arXiv:2306.11827 [pdf, other]

Any Deep ReLU Network is Shallow

Authors: Mattia Jacopo Villani, Nandi Schoots

Abstract: We constructively prove that every deep ReLU network can be rewritten as a functionally identical three-layer network with weights valued in the extended reals. Based on this proof, we provide an algorithm that, given a deep ReLU network, finds the explicit weights of the corresponding shallow network. The resulting shallow network is transparent and used to generate explanations of the model's behaviour.

Submitted 20 June, 2023; originally announced June 2023.

Comments: 12 pages including bibliography and appendix

arXiv:2305.09424 [pdf, ps, other]

Unwrapping All ReLU Networks

Authors: Mattia Jacopo Villani, Peter McBurney

Abstract: Deep ReLU Networks can be decomposed into a collection of linear models, each defined in a region of a partition of the input space. This paper provides three results extending this theory. First, we extend this linear decomposition to Graph Neural networks and tensor convolutional networks, as well as networks with multiplicative interactions. Second, we provide proofs that neural networks can be understood as interpretable models such as Multivariate Decision trees and logical theories. Finally, we show how this model leads to computing cheap and exact SHAP values. We validate the theory through experiments on Graph Neural Networks.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2302.00339 [pdf, other]

A Hydrogenated amorphous silicon detector for Space Weather Applications

Authors: Catia Grimani, Michele Fabi, Federico Sabbatini, Mattia Villani, Lucio Calcagnile, Anna Paola Caricato, Roberto Catalano, Giuseppe Antonio Pablo Cirrone, Tommaso Croci, Giacomo Cuttone, Sylvain Dunand, Luca Frontini, Maria Ionica, Keida Kanxheri, Matthew Large, Valentino Liberali, Maurizio Martino, Giuseppe Maruccio, Giovanni Mazza, Mauro Menichelli, Anna Grazia Monteduro, Arianna Morozzi, Francesco Moscatelli, Stefania Pallotta, Daniele Passeri , et al. (13 additional authors not shown)

Abstract: The characteristics of a hydrogenated amorphous silicon (a-Si:H) detector are presented here for monitoring, in space, solar flares and the evolution of large energetic proton events up to hundreds of MeV. The a-Si:H presents an excellent radiation hardness and finds application in harsh radiation environments for medical purposes, for particle beam characterization and in space weather science and applications. The critical flux detection threshold for solar X rays, soft gamma rays, electrons and protons is discussed in detail.

Submitted 1 September, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

Comments: 32 pages, 13 figures, submitted to Experimental Astronomy

arXiv:2212.09671 [pdf, other]

doi 10.1007/978-3-031-45434-9_9

Bohmian Mechanics as a Practical Tool

Authors: Xabier Oianguren-Asua, Carlos F. Destefani, Matteo Villani, David K. Ferry, Xavier Oriols

Abstract: In this chapter, we will take a trip around several hot-spots where Bohmian mechanics and its capacity to describe the microscopic reality, even in the absence of measurements, can be harnessed as computational tools, in order to help in the prediction of phenomenologically accessible information (also useful for the followers of the Copenhagen theory). As a first example, we will see how a Stochastic Schrödinger Equation, when used to compute the reduced density matrix of a non-Markovian open quantum system, necessarily seems to employ the Bohmian concept of a conditional wavefunction. We will see that by dressing these conditional wavefunctions with an interpretation, the Bohmian theory can prove to be a useful tool to build general quantum frameworks, like a high-frequency electron transport model. As a second example, we will introduce how a Copenhagen "observable operator" can be derived from numerical properties of the Bohmian trajectories, which within Bohmian mechanics, are well-defined even for an "unmeasured" system. Most importantly in practice, even if these numbers are given no ontological meaning, not only will we be able to simulate (thus, predict and talk about) them, but we will see that they can be operationally determined in a weak value experiment. Therefore, they will be practical numbers to characterize a quantum system irrespective of the followed quantum theory.

Submitted 14 February, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: 13 pages, 1 figure, final version of the chapter of the book "Physics and the Nature of Reality: Essays in Memory of Detlef Dürr", published by Springer (2024), edited by Angelo Bassi, Sheldon Goldstein, Roderich Tumulka and Nino Zanghì

Journal ref: In: Bassi, A. et al. (eds) Physics and the Nature of Reality. Fundamental Theories of Physics, vol 215. Springer, Cham (2024)

arXiv:2212.00542 [pdf, other]

Graph Convolutional Neural Networks as Parametric CoKleisli morphisms

Authors: Bruno Gavranović, Mattia Villani

Abstract: We define the bicategory of Graph Convolutional Neural Networks $\mathbf{GCNN}_n$ for an arbitrary graph with $n$ nodes. We show it can be factored through the already existing categorical constructions for deep learning called $\mathbf{Para}$ and $\mathbf{Lens}$ with the base category set to the CoKleisli category of the product comonad. We prove that there exists an injective-on-objects, faithful 2-functor $\mathbf{GCNN}_n \to \mathbf{Para}(\mathsf{CoKl}(\mathbb{R}^{n \times n} \times -))$. We show that this construction allows us to treat the adjacency matrix of a GCNN as a global parameter instead of a local, layer-wise one. This gives us a high-level categorical characterisation of a particular kind of inductive bias GCNNs possess. Lastly, we hypothesize about possible generalisations of GCNNs to general message-passing graph neural networks, connections to equivariant learning, and the (lack of) functoriality of activation functions.

Submitted 1 December, 2022; originally announced December 2022.

Comments: 21 pages

arXiv:2211.17114 [pdf]

Development of thin hydrogenated amorphous silicon detectors on a flexible substrate

Authors: M. Menichelli, M. Bizzarri, L. Calcagnile, M. Caprai, A. P. Caricato, R. Catalano, G. A. P. Cirrone, T. Croci, G. Cuttone, S. Dunand, M. Fabi, L. Frontini, B. Gianfelici, C. Grimani, M. Ionica, K. Kanxheri, M. Large, V. Liberali, M. Martino, G. Maruccio, G. Mazza, A. G. Monteduro, A. Morozzi, F. Moscatelli, S. Pallotta , et al. (18 additional authors not shown)

Abstract: The HASPIDE (Hydrogenated Amorphous Silicon PIxels DEtectors) project aims at the development of thin hydrogenated amorphous silicon (a-Si:H) detectors on flexible substrates (mostly Polyimide) for beam monitoring, neutron detection and space applications. Since a-Si:H is a material with superior radiation hardness, the benefit for the above-mentioned applications can be appreciated mostly in radiation harsh environments. Furthermore, the possibility to deposit this material on flexible substrates like Polyimide (PI), polyethylene naphthalate (PEN) or polyethylene terephthalate (PET) facilitates the usage of these detectors in medical dosimetry, beam flux and beam profile measurements. Particularly interesting is its use when positioned directly on the flange of the vacuum-to-air separation interface in a beam line, as well as other applications where a thin self-standing radiation flux detector is envisaged. In this paper, the HASPIDE project will be described and some preliminary results on PI and glass substrates will be reported.

Submitted 30 November, 2022; originally announced November 2022.

arXiv:2210.02176 [pdf, other]

Feature Importance for Time Series Data: Improving KernelSHAP

Authors: Mattia Villani, Joshua Lockhart, Daniele Magazzeni

Abstract: Feature importance techniques have enjoyed widespread attention in the explainable AI literature as a means of determining how trained machine learning models make their predictions. We consider Shapley value based approaches to feature importance, applied in the context of time series data. We present closed form solutions for the SHAP values of a number of time series models, including VARMAX. We also show how KernelSHAP can be applied to time series tasks, and how the feature importances that come from this technique can be combined to perform "event detection". Finally, we explore the use of Time Consistent Shapley values for feature importance.

Submitted 5 October, 2022; originally announced October 2022.

Comments: Will appear at ICAIF Workshop on Explainable Artificial Intelligence in Finance, November 2, 2022

arXiv:2209.12791 [pdf, other]

doi 10.1088/1361-6382/acbadd

The role of low-energy electrons in the charging process of LISA test masses

Authors: Simone Taioli, Maurizio Dapor, Francesco Dimiccoli, Michele Fabi, Valerio Ferroni, Catia Grimani, Mattia Villani, William Joseph Weber

Abstract: The space environment encountered by operating spacecraft is populated by a continuous flux of charged particles that penetrate into electronic devices inducing phantom commands and loss of control, eventually leading to satellite failure. Moreover, electron static discharge that results from secondary electron emission of the device materials can also be responsible for satellite malfunction. In this regard, the estimate of the total electron yield is fundamental for our understanding of the test-mass charging associated with galactic cosmic rays in the LISA Pathfinder mission and in the forthcoming gravitational wave observatory LISA. To unveil the role of low energy electrons in this process owing to galactic and solar energetic particle events, in this work we study the interaction of keV and sub-keV electrons with a gold slab using a mixed Monte Carlo and ab-initio framework. We determine the energy spectrum of the electrons emerging from such a gold slab hit by a primary electron beam by considering the relevant energy loss mechanisms as well as the elastic scattering events. We also show that our results are consistent with experimental data and Monte Carlo simulations carried out with the GEANT4-DNA toolkit.

Submitted 21 February, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.12329 [pdf, other]

Interplanetary medium monitoring with LISA: lessons from LISA Pathfinder

Authors: A. Cesarini, C. Grimani, S. Benella, M. Fabi, F. Sabbatini, M. Villani, D. Telloni

Abstract: The Laser Interferometer Space Antenna (LISA) of the European Space Agency (ESA) will be the first low-frequency gravitational-wave observatory orbiting the Sun at 1 AU. The LISA Pathfinder (LPF) mission, which aimed at testing the instruments to be located on board the LISA spacecraft (S/C), hosted, among others, fluxgate magnetometers and a particle detector as parts of a diagnostics subsystem. These instruments allowed us to estimate the magnetic and Coulomb spurious forces acting on the test masses that constitute the mirrors of the interferometer. With these instruments we also had the possibility to study the galactic cosmic-ray short-term variations as a function of the particle energy and the associated interplanetary disturbances. Platform magnetometers and particle detectors will also be placed on board each LISA S/C. This work reports on an empirical method that allowed us to disentangle the interplanetary and onboard-generated components of the magnetic field by using the LPF magnetometer measurements. Moreover, we estimate the number and fluence of solar energetic particle events expected to be observed with the ESA Next Generation Radiation Monitor during the mission lifetime.
An additional cosmic-ray detector, similar to that designed for LPF, in combination with magnetometers, would permit observation of the evolution of recurrent and non-recurrent galactic cosmic-ray variations and associated increases of the interplanetary magnetic field at the transit of high-speed solar wind streams and interplanetary counterparts of coronal mass ejections. The diagnostics subsystem of LISA also makes this mission a natural multi-point observatory for space weather science investigations.

Submitted 25 September, 2022; originally announced September 2022.

Comments: Accepted for publication Journal of Space Weather and Space Climate (JSWSC)

arXiv:2208.08849 [pdf, ps, other]

doi 10.1051/0004-6361/202243984

Bridging the gap between Monte Carlo simulations and measurements of the LISA Pathfinder test-mass charging for LISA

Authors: Catia Grimani, Mattia Villani, Michele Fabi, Andrea Cesarini, Federico Sabbatini

Abstract: Cubic gold-platinum free-falling test masses (TMs) constitute the mirrors of future LISA and LISA-like interferometers for low-frequency gravitational wave detection in space. High-energy particles of Galactic and solar origin charge the TMs and thus induce spurious electrostatic and magnetic forces that limit the sensitivity of these interferometers. Prelaunch Monte Carlo simulations of the TM charging were carried out for the LISA Pathfinder (LPF) mission, which was planned to test the LISA instrumentation. Measurements and simulations were compared during the mission operations. The measured net TM charging agreed with simulation estimates, while the charging noise was three to four times higher. We aim to bridge the gap between LPF TM charging noise simulations and observations. New Monte Carlo simulations of the LPF TM charging due to both Galactic and solar particles were carried out with the FLUKA/LEI toolkit. This allowed propagating low-energy electrons down to a few electronvolts. These improved FLUKA/LEI simulations agree with observations gathered during the mission operations within statistical and Monte Carlo errors. The charging noise induced by Galactic cosmic rays is about one thousand charges per second. This value increases to tens of thousands of charges per second during solar energetic particle events. Similar results are expected for the LISA TM charging.

Submitted 18 August, 2022; originally announced August 2022.

Comments: 11 pages, 9 figures

Journal ref: A&A 666, A38 (2022)

arXiv:2204.14202 [pdf, other]

doi 10.1103/PhysRevB.106.205306

Resonant tunneling diodes in semiconductor microcavities: modeling polaritonic features in the THz displacement current

Authors: Carlos F. Destefani, Matteo Villani, Xavier Cartoixà, Michael Feiginov, Xavier Oriols

Abstract: We develop in this work a simple qualitative quantum electron transport model, in the strong light-matter coupling regime under dipole approximation, able to capture polaritonic signatures in the time-dependent electrical current. The effect of the quantized electromagnetic field in the displacement current of a resonant tunneling diode inside an optical cavity is analyzed. The original peaks of the bare electron transmission coefficient split into two new peaks due to the resonant electron-photon interaction, leading to coherent Rabi oscillations among the polaritonic states that are developed in the system in the strong coupling regime. This mimics known effects predicted by a Jaynes-Cummings model in closed systems, and shows how a full quantum treatment of electrons and electromagnetic fields may open interesting paths for engineering new THz electron devices. The computational burden involved in the multi-time measurements of THz currents is tackled by invoking a Bohmian description of the light-matter interaction. We also show that the traditional static transmission coefficient used to characterize DC quantum electron devices has to be substituted by a new displacement current coefficient in high-frequency AC scenarios.

Submitted 15 November, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: 26 pages, 8 figures

arXiv:2202.01569 [pdf, ps, other]

doi 10.3390/e23040408

Scattering in Terms of Bohmian Conditional Wave Functions for Scenarios with Non-Commuting Energy and Momentum Operators

Authors: Matteo Villani, Guillermo Albareda, Carlos Destefani, Xavier Cartoixà, Xavier Oriols

Abstract: Without access to the full quantum state, modeling quantum transport in mesoscopic systems requires dealing with a limited number of degrees of freedom. In this work, we analyze the possibility of modeling the perturbation induced by the non-simulated degrees of freedom on the simulated ones as a transition between single-particle pure states. First, we show that Bohmian conditional wave functions (BCWF) allow a rigorous discussion of the dynamics of electrons inside open quantum systems in terms of such single-particle pure states, either under Markovian or non-Markovian conditions. Second, we discuss the practical application of the method for modeling light-matter interaction phenomena in a resonant tunneling device (RTD), where a single photon is interacting with a single electron. Third, we emphasize the importance of interpreting such a scattering mechanism as a transition between initial and final single-particle BCWF with well-defined energies (rather than with well-defined momenta).

Submitted 3 February, 2022; originally announced February 2022.

Journal ref: Entropy 2021, 23(4), 408

arXiv:2201.07874 [pdf, ps, other]

Bayesian Prediction with Covariates Subject to Detection Limits

Authors: Caroline Svahn, Mattias Villani

Abstract: Missing values in covariates due to censoring by signal interference or lack of sensitivity in the measuring devices are common in industrial problems. We propose a full Bayesian solution to the prediction problem with an efficient Markov Chain Monte Carlo (MCMC) algorithm that updates all the censored covariate values jointly in a random scan Gibbs sampler. We show that the joint updating of missing covariate values can be at least two orders of magnitude more efficient than univariate updating. This increased efficiency is shown to be crucial for quickly learning the missing covariate values and their uncertainty in a real-time decision making context, in particular when there is substantial correlation in the posterior for the missing values. The approach is evaluated on simulated data and on data from the telecom sector. Our results show that the proposed Bayesian imputation gives substantially more accurate predictions than naïve imputation, and that the use of auxiliary variables in the imputation gives additional predictive power.

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2201.07574 [pdf, ps, other]

doi 10.1007/s10825-021-01798-1

Can Wigner distribution functions with collisions satisfy complete positivity and energy conservation?

Authors: Matteo Villani, Xavier Oriols

Abstract: To avoid the computational burden of many-body quantum simulation, the interaction of an electron with a photon (phonon) is typically accounted for by disregarding the explicit simulation of the photon (phonon) degree of freedom and just modelling its effect on the electron dynamics. For quantum models developed from the (reduced) density matrix or its Wigner-Weyl transformation, the modelling of collisions may violate complete positivity (precluding the typical probabilistic interpretation). In this paper, we show that such quantum transport models can also strongly violate energy conservation in the electron-photon (electron-phonon) interactions. After comparing collision models to exact results for an electron interacting with a photon, we conclude that there is no fundamental restriction that prevents a collision model developed within the (reduced) density matrix or Wigner formalisms from simultaneously satisfying complete positivity and energy conservation. However, at the practical level, the development of such a satisfactory collision model seems very complicated. Collision models with an explicit knowledge of the microscopic state ascribed to each electron seem recommendable, since they allow the collisions of each electron to be modelled individually in a controlled way satisfying both conditions.

Submitted 19 January, 2022; originally announced January 2022.

Journal ref: J. Comput. Electron 20, 2232 (2021)

arXiv:2112.09073 [pdf, other]

Local Prediction Pools

Authors: Oscar Oelrich, Mattias Villani, Sebastian Ankargren

Abstract: We propose local prediction pools as a method for combining the predictive distributions of a set of experts conditional on a set of variables believed to be related to the predictive accuracy of the experts. This is done in a two-step process where we first estimate the conditional predictive accuracy of each expert given a vector of covariates (or pooling variables) and then combine the predictive distributions of the experts conditional on this local predictive accuracy. To estimate the local predictive accuracy of each expert, we introduce the simple, fast, and interpretable caliper method. Expert pooling weights from the local prediction pool approach the equal weight solution whenever there is little data on local predictive performance, making the pools robust and adaptive. We also propose a local version of the widely used optimal prediction pools. Local prediction pools are shown to outperform the widely used optimal linear pools in a macroeconomic forecasting evaluation, and in predicting daily bike usage for a bike rental company.

Submitted 25 August, 2023; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 21 pages, 9 figures, 4 tables

arXiv:2112.04249 [pdf, other]

Bayesian Modeling of Effective and Functional Brain Connectivity using Hierarchical Vector Autoregressions

Authors: Bertil Wegmann, Anders Lundquist, Anders Eklund, Mattias Villani

Abstract: Analysis of brain connectivity is important for understanding how information is processed by the brain. We propose a novel Bayesian vector autoregression (VAR) hierarchical model for analyzing brain connectivity in a resting-state fMRI data set with autism spectrum disorder (ASD) patients and healthy controls. Our approach models functional and effective connectivity simultaneously, which is new in the VAR literature for brain connectivity, and allows for both group- and single-subject inference as well as group comparisons. We combine analytical marginalization with Hamiltonian Monte Carlo (HMC) to obtain highly efficient posterior sampling. The results from more simplified covariance settings are, in general, overly optimistic about functional connectivity between regions compared to our results. In addition, our modeling of heterogeneous subject-specific covariance matrices is shown to give smaller differences in effective connectivity compared to models with a common covariance matrix to all subjects.

Submitted 8 December, 2021; originally announced December 2021.

Comments: 21 pages, 5 figures

arXiv:2109.11449 [pdf, other]

Dynamic Mixture of Experts Models for Online Prediction

Authors: Parfait Munezero, Mattias Villani, Robert Kohn

Abstract: A mixture of experts models the conditional density of a response variable using a mixture of regression models with covariate-dependent mixture weights. We extend the finite mixture of experts model by allowing the parameters in both the mixture components and the weights to evolve in time by following random walk processes. Inference for time-varying parameters in richly parameterized mixture of experts models is challenging. We propose a sequential Monte Carlo algorithm for online inference, based on a tailored proposal distribution built on ideas from linear Bayes methods and the EM algorithm. The method gives a unified treatment for mixtures with time-varying parameters, including the special case of static parameters. We assess the properties of the method on simulated data and on industrial data where the aim is to predict software faults in a continuously upgraded large-scale software project.

Submitted 13 October, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

Comments: To appear in Technometrics

arXiv:2106.14188 [pdf, other]

doi 10.1088/1361-6382/ac0e1a

Including topology change in Loop Quantum Gravity with topspin network formalism with application to homogeneous and isotropic cosmology

Authors: Mattia Villani

Abstract: We apply topspin network formalism to Loop Quantum Gravity in order to include in the theory the possibility of changes in the topology of spacetime. We apply this formalism to three toy models: with the first, we find that the topology can actually change due to the action of the Hamiltonian constraint and with the second we find that the final state might be a superposition of states with different topologies. In the third and last application, we consider a homogeneous and isotropic Universe, calculating the difference equation that describes the evolution of the system and determining the final topological states after the action of the Hamiltonian constraint. For this last case, we also calculate the transition amplitudes and probabilities from the initial to the final states.

Submitted 27 June, 2021; originally announced June 2021.

Comments: Accepted for publication in Class. Quantum Grav.

arXiv:2106.13576 [pdf, other]

Robust Real-Time Delay Predictions in a Network of High-Frequency Urban Buses

Authors: Hector Rodriguez-Deniz, Mattias Villani

Abstract: Providing transport users and operators with accurate forecasts of travel times is challenging due to a highly stochastic traffic environment. Public transport users are particularly sensitive to unexpected waiting times, which negatively affect their perception of the system's reliability. In this paper we develop a robust model for real-time bus travel time prediction that departs from Gaussian assumptions by using Student-$t$ errors. The proposed approach uses spatiotemporal characteristics from the route and previous bus trips to model short-term effects, and date/time variables and Gaussian processes for long-run forecasts. The model allows for flexible modeling of mean, variance and kurtosis spaces. We propose algorithms for Bayesian inference and for computing probabilistic forecast distributions. Experiments are performed using data from high-frequency buses in Stockholm, Sweden. Results show that Student-$t$ models outperform Gaussian ones in terms of log-posterior predictive power to forecast bus delays at specific stops, which reveals the importance of accounting for predictive uncertainty in model selection. Estimated Student-$t$ regressions capture typical temporal variability between within-day hours and different weekdays. Strong spatiotemporal effects are detected for incoming buses from immediately previous stops, which is in line with many recently developed models. We finally show how Bayesian inference naturally allows for predictive uncertainty quantification, e.g. by returning the predictive probability that the delay of an incoming bus exceeds a given threshold.

Submitted 24 February, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

arXiv:2104.02134 [pdf, other]

Spectral Subsampling MCMC for Stationary Multivariate Time Series with Applications to Vector ARTFIMA Processes

Authors: Mattias Villani, Matias Quiroz, Robert Kohn, Robert Salomone

Abstract: Spectral subsampling MCMC was recently proposed to speed up Markov chain Monte Carlo (MCMC) for long stationary univariate time series by subsampling periodogram observations in the frequency domain. This article extends the approach to multivariate time series using a multivariate generalisation of the Whittle likelihood. To assess the computational gains from spectral subsampling in challenging problems, a multivariate generalisation of the autoregressive tempered fractionally integrated moving average model (ARTFIMA) is introduced and some of its properties derived. Bayesian inference based on the Whittle likelihood is demonstrated to be a fast and accurate alternative to the exact time domain likelihood. Spectral subsampling is shown to provide up to two orders of magnitude additional speed-up, while retaining MCMC sampling efficiency and accuracy, compared to spectral methods using the full dataset. Keywords: Bayesian, Markov chain Monte Carlo, Semi-long memory, Spectral analysis, Whittle likelihood.

Submitted 18 September, 2022; v1 submitted 5 April, 2021; originally announced April 2021.

arXiv:2103.11808 [pdf, ps, other]

doi 10.1109/LED.2021.3049229

There is Plenty of Room for THz Tunneling Electron Devices Beyond the Transit Time Limit

Authors: Matteo Villani, Simone Clochiatti, Werner Prost, Nils Weimann, Xavier Oriols

Abstract: The traditional transmission coefficient present in the original Landauer formulation, which is valid for quasi-static scenarios with working frequencies below the inverse of the electron transit time, is substituted by a novel time-dependent displacement current coefficient valid for frequencies above this limit. Our model captures in a simple way the displacement current component of the total current, which at frequencies larger than the inverse of the electron transit time can be more relevant than the particle component. The proposed model is applied to compute the response of a resonant tunneling diode from 10$\,$GHz up to 5$\,$THz. We show that tunneling electron devices are intrinsically nonlinear at such high frequencies, even under small-signal conditions, due to memory effects related to the displacement current. We show that these intrinsic nonlinearities (anharmonicities) represent an advantage, rather than a drawback, as they open the path for tunneling devices in many THz applications, and avoid further device downscaling.

Submitted 22 March, 2021; originally announced March 2021.

Journal ref: IEEE Electron Device Letters Vol. 42, Num. 2, Feb. 2021, pp. 224 - 227

arXiv:2012.02690 [pdf, other]

doi 10.1088/1361-6382/abd142

Low-energy electromagnetic processes affecting free-falling test-mass charging for LISA and future space interferometers

Authors: Catia Grimani, Andrea Cesarini, Michele Fabi, Mattia Villani

Abstract: Galactic cosmic rays and solar energetic particles charge gold-platinum, free-falling test masses (TMs) on board interferometers for the detection of gravitational waves in space. The charging process induces spurious forces on the test masses that affect the sensitivity of these instruments mainly below $10^{-3}$ Hz. Geant4 and FLUKA Monte Carlo simulations were carried out to study the TM charging process on board the LISA Pathfinder mission that remained in orbit around the Sun-Earth Lagrange point L1 between 2016 and 2017. While a good agreement was observed between simulations and measurements of the TM net charging, the shot noise associated with charging fluctuations of both positive and negative particles was 3-4 times higher than predicted. The origin of this mismatch was attributed to the propagation of electrons and photons only above 100 eV in the simulations. In this paper, low-energy electromagnetic processes to be included in the future Monte Carlo simulations for LISA and LISA-like space interferometers TM charging are considered. It is found that electrons and photons below 100 eV give a contribution to the effective charging comparable to that of the whole sample of particles above this energy. In particular, for incident protons ionization contributes twice as much as low energy kinetic emission and electron backscattering. The other processes are found to play a negligible role. For heavy nuclei only sputtering must be considered.

Submitted 4 December, 2020; originally announced December 2020.

Comments: Accepted for publication in Classical and Quantum Gravity

arXiv:2012.01152 [pdf, ps, other]

doi 10.3847/1538-4357/abbb90

Recurrent galactic cosmic-ray flux modulation in L1 and geomagnetic activity during the declining phase of the solar cycle 24

Authors: Catia Grimani, Andrea Cesarini, Michele Fabi, Federico Sabbatini, Daniele Telloni, Mattia Villani

Abstract: Galactic cosmic-ray (GCR) flux short-term variations ($<$1 month) in the inner heliosphere are mainly associated with the passage of high-speed solar wind streams (HSS) and interplanetary (IP) counterparts of coronal mass ejections (ICMEs). Data gathered with a particle detector flown on board the ESA LISA Pathfinder (LPF) spacecraft, during the declining part of the solar cycle 24 (February 2016 - July 2017) around the Lagrange point L1, have allowed the study of the characteristics of recurrent cosmic-ray flux modulations above 70 MeV n$^{-1}$. These modulations are observed when the solar wind speed is $>$ 400 km s$^{-1}$ and/or the IP magnetic field intensity $>$ 10 nT. It is shown that the amplitude and evolution of individual modulations depend in a unique way on both IP plasma parameters and particle flux intensity before HSS and ICMEs transit. By comparing the LPF data with those gathered contemporaneously with the magnetic spectrometer experiment AMS-02 on board the International Space Station and with those of Earth polar neutron monitors, the GCR flux modulation was studied at different energies during recurrent short-term variations. A further aim is to set the near real-time particle observation requirements to disentangle the role of long and short-term variations of the GCR flux to evaluate the performance of high-sensitivity instruments in space such as the future interferometers for gravitational wave detection. Finally, the association between recurrent GCR flux variation observations in L1 and weak to moderate geomagnetic activity in 2016-2017 is discussed. Short-term recurrent GCR flux variations are good proxies of recurrent geomagnetic activity when the B$_z$ component of the IP magnetic field is directed northward.

Submitted 3 December, 2020; v1 submitted 2 December, 2020; originally announced December 2020.

Journal ref: ApJ 904 64 (2020)

arXiv:2004.10092 [pdf, other]

Bayesian Optimization of Hyperparameters from Noisy Marginal Likelihood Estimates

Authors: Oskar Gustafsson, Mattias Villani, Pär Stockhammar

Abstract: Bayesian models often involve a small set of hyperparameters determined by maximizing the marginal likelihood. Bayesian optimization is a popular iterative method where a Gaussian process posterior of the underlying function is sequentially updated by new function evaluations. An acquisition strategy uses this posterior distribution to decide where to place the next function evaluation. We propose a novel Bayesian optimization framework for situations where the user controls the computational effort, and therefore the precision of the function evaluations. This is a common situation in econometrics where the marginal likelihood is often computed by Markov chain Monte Carlo (MCMC) or importance sampling methods, with the precision of the marginal likelihood estimator determined by the number of samples. The new acquisition strategy gives the optimizer the option to explore the function with cheap noisy evaluations and therefore find the optimum faster. The method is applied to estimating the prior hyperparameters in two popular models on US macroeconomic time series data: the steady-state Bayesian vector autoregressive (BVAR) and the time-varying parameter BVAR with stochastic volatility. The proposed method is shown to find the optimum much quicker than traditional Bayesian optimization or grid search.

Submitted 17 August, 2022; v1 submitted 21 April, 2020; originally announced April 2020.

arXiv:2003.04026 [pdf, other]

When are Bayesian model probabilities overconfident?

Authors: Oscar Oelrich, Shutong Ding, Måns Magnusson, Aki Vehtari, Mattias Villani

Abstract: Bayesian model comparison is often based on the posterior distribution over the set of compared models. This distribution is often observed to concentrate on a single model even when other measures of model fit or forecasting ability indicate no strong preference. Furthermore, a moderate change in the data sample can easily shift the posterior model probabilities to concentrate on another model. We document overconfidence in two high-profile applications in economics and neuroscience. To shed more light on the sources of overconfidence we derive the sampling variance of the Bayes factor in univariate and multivariate linear regression. The results show that overconfidence is likely to happen when i) the compared models give very different approximations of the data-generating process, ii) the models are very flexible with large degrees of freedom that are not shared between the models, and iii) the models underestimate the true variability in the data.

Submitted 9 March, 2020; originally announced March 2020.

Comments: 6 pages + 4 pages appendix, 6 figures

arXiv:1911.13136 [pdf, other]

A Multilayered Block Network Model to Forecast Large Dynamic Transportation Graphs: an Application to US Air Transport

Authors: Hector Rodriguez-Deniz, Mattias Villani, Augusto Voltes-Dorta

Abstract: Dynamic transportation networks have been analyzed for years by means of static graph-based indicators in order to study the temporal evolution of relevant network components, and to reveal complex dependencies that would not be easily detected by a direct inspection of the data. This paper presents a state-of-the-art probabilistic latent network model to forecast multilayer dynamic graphs that are increasingly common in transportation and proposes a community-based extension to reduce the computational burden. Flexible time series analysis is obtained by modeling the probability of edges between vertices through latent Gaussian processes. The models and Bayesian inference are illustrated on a sample of 10-year data from four major airlines within the US air transportation system. Results show how the estimated latent parameters from the models are related to the airline's connectivity dynamics, and their ability to project the multilayer graph into the future for out-of-sample full network forecasts, while stochastic blockmodeling allows for the identification of relevant communities. Reliable network predictions would allow policy-makers to better understand the dynamics of the transport system, and help in their planning on e.g. route development, or the deployment of new regulations.

Submitted 24 February, 2022; v1 submitted 29 November, 2019; originally announced November 2019.

arXiv:1910.13627 [pdf, other]

Spectral Subsampling MCMC for Stationary Time Series

Authors: Robert Salomone, Matias Quiroz, Robert Kohn, Mattias Villani, Minh-Ngoc Tran

Abstract: Bayesian inference using Markov Chain Monte Carlo (MCMC) on large datasets has developed rapidly in recent years. However, the underlying methods are generally limited to relatively simple settings where the data have specific forms of independence. We propose a novel technique for speeding up MCMC for time series data by efficient data subsampling in the frequency domain. For several challenging time series models, we demonstrate a speedup of up to two orders of magnitude while incurring negligible bias compared to MCMC on the full dataset. We also propose alternative control variates for variance reduction based on data grouping and coreset constructions.

Submitted 15 February, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: Empirical section significantly revised and extended

arXiv:1910.08415 [pdf, other]

Anatomically informed Bayesian spatial priors for fMRI analysis

Authors: David Abramian, Per Sidén, Hans Knutsson, Mattias Villani, Anders Eklund

Abstract: Existing Bayesian spatial priors for functional magnetic resonance imaging (fMRI) data correspond to stationary isotropic smoothing filters that may oversmooth at anatomical boundaries. We propose two anatomically informed Bayesian spatial models for fMRI data with local smoothing in each voxel based on a tensor field estimated from a T1-weighted anatomical image. We show that our anatomically informed Bayesian spatial models result in posterior probability maps that follow the anatomical structure.

Submitted 18 October, 2019; originally announced October 2019.

arXiv:1906.10591 [pdf, other]

Spatial 3D Matérn priors for fast whole-brain fMRI analysis

Authors: Per Sidén, Finn Lindgren, David Bolin, Anders Eklund, Mattias Villani

Abstract: Bayesian whole-brain functional magnetic resonance imaging (fMRI) analysis with three-dimensional spatial smoothing priors has been shown to produce state-of-the-art activity maps without pre-smoothing the data. The proposed inference algorithms are computationally demanding however, and the proposed spatial priors have several less appealing properties, such as being improper and having infinite spatial range. We propose a statistical inference framework for whole-brain fMRI analysis based on the class of Matérn covariance functions. The framework uses the Gaussian Markov random field (GMRF) representation of possibly anisotropic spatial Matérn fields via the stochastic partial differential equation (SPDE) approach of Lindgren et al. (2011). This allows for more flexible and interpretable spatial priors, while maintaining the sparsity required for fast inference in the high-dimensional whole-brain setting. We develop an accelerated stochastic gradient descent (SGD) optimization algorithm for empirical Bayes (EB) inference of the spatial hyperparameters. Conditionally on the inferred hyperparameters, we make a fully Bayesian treatment of the brain activity. The Matérn prior is applied to both simulated and experimental task-fMRI data and clearly demonstrates that it is a more reasonable choice than the previously used priors, using comparisons of activity maps, prior simulation and cross-validation.

Submitted 1 October, 2020; v1 submitted 25 June, 2019; originally announced June 2019.

arXiv:1904.04694 [pdf, ps, other]

doi 10.3847/1538-4357/ab0c99

Forbush decreases and $<$ 2-day GCR flux non-recurrent variations studied with LISA Pathfinder

Authors: C. Grimani, M. Armano, H. Audley, J. Baird, S. Benella, P. Binetruy, M. Born, D. Bortoluzzi, E. Castelli, A. Cavalleri, A. Cesarini, A. M. Cruise, K. Danzmann, M. de Deus Silva, I. Diepholz, G. Dixon, R. Dolesi, M. Fabi, L. Ferraioli, V. Ferroni, N. Finetti, E. D. Fitzsimons, M. Freschi, L. Gesa, F. Gibert, et al. (60 additional authors not shown)

Abstract: Non-recurrent short-term variations of the galactic cosmic-ray (GCR) flux above 70 MeV n$^{-1}$ were observed between 2016 February 18 and 2017 July 3 aboard the European Space Agency LISA Pathfinder (LPF) mission orbiting around the Lagrange point L1 at 1.5$\times$10$^6$ km from Earth. The energy dependence of three Forbush decreases (FDs) is studied and reported here. A comparison of these observations with others carried out in space down to the energy of a few tens of MeV n$^{-1}$ shows that the same GCR flux parameterization applies to events of different intensity during the main phase. FD observations in L1 with LPF and geomagnetic storm occurrence are also presented. Finally, the characteristics of GCR flux non-recurrent variations (peaks and depressions) of duration $<$ 2 days and their association with interplanetary structures are investigated. It is found that, most likely, plasma compression regions between subsequent corotating high-speed streams cause peaks, while heliospheric current sheet crossings cause the majority of the depressions.

Submitted 9 April, 2019; originally announced April 2019.

Journal ref: M. Armano et al 2019 ApJ 874 167

arXiv:1903.10443 [pdf, other]

Real-Time Robotic Search using Hierarchical Spatial Point Processes

Authors: Olov Andersson, Per Sidén, Johan Dahlin, Patrick Doherty, Mattias Villani

Abstract: Aerial robots hold great potential for aiding Search and Rescue (SAR) efforts over large areas. Traditional approaches typically search an area exhaustively, thereby ignoring that the density of victims varies based on predictable factors, such as the terrain, population density and the type of disaster. We present a probabilistic model to automate SAR planning, with explicit minimization of the expected time to discovery. The proposed model is a hierarchical spatial point process with three interacting spatial fields for i) the point patterns of persons in the area, ii) the probability of detecting persons and iii) the probability of injury. This structure allows inclusion of informative priors from e.g. geographic or cell phone traffic data, while falling back to latent Gaussian processes when priors are missing or inaccurate. To solve this problem in real-time, we propose a combination of fast approximate inference using Integrated Nested Laplace Approximation (INLA), and a novel Monte Carlo tree search tailored to the problem. Experiments using data simulated from real world GIS maps show that the framework outperforms traditional search strategies, and finds up to ten times more injured in the crucial first hours.

Submitted 25 March, 2019; originally announced March 2019.

arXiv:1811.04653 [pdf, other]

Modeling Text Complexity using a Multi-Scale Probit

Authors: Johan Falkenjack, Mattias Villani, Arne Jönsson

Abstract: We present a novel model for text complexity analysis which can be fitted to ordered categorical data measured on multiple scales, e.g. a corpus with binary responses mixed with a corpus with more than two ordered outcomes. The multiple scales are assumed to be driven by the same underlying latent variable describing the complexity of the text. We propose an easily implemented Gibbs sampler to sample from the posterior distribution by a direct extension of established data augmentation schemes. By being able to combine multiple corpora with different annotation schemes we can get around the common problem of having more text features than annotated documents, i.e. an example of the $p>n$ problem. The predictive performance of the model is evaluated using both simulated and real world readability data with very promising results.

Submitted 12 November, 2018; originally announced November 2018.

Comments: 21 pages, 19 figures

arXiv:1810.08493 [pdf, other]

doi 10.1088/1361-6641/aae85c

Implications of the Klein tunneling times on high frequency graphene devices using Bohmian trajectories

Authors: Devashish Pandey, Matteo Villani, Enrique Colomés, Zhen Zhan, Xavier Oriols

Abstract: Because of its large Fermi velocity, leading to a great mobility, graphene is expected to play an important role in (small signal) radio frequency electronics. Among others, graphene devices based on Klein tunneling phenomena are already envisioned. The connection between the Klein tunneling times of electrons and cut-off frequencies of graphene devices is not obvious. We argue in this paper that the trajectory-based Bohmian approach gives a very natural framework to quantify Klein tunneling times in linear band graphene devices because of its ability to distinguish, not only between transmitted and reflected electrons, but also between reflected electrons that spend time in the barrier and those that do not. Without such distinction, typical expressions found in the literature to compute dwell times can give unphysical results when applied to predict cut-off frequencies. In particular, we study Klein tunneling times for electrons in a two-terminal graphene device constituted by a potential barrier between two metallic contacts. We show that for a zero incident angle (and positive or negative kinetic energy), the transmission coefficient is equal to one, and the dwell time is roughly equal to the barrier distance divided by the Fermi velocity. For electrons incident with a non-zero angle smaller than the critical angle, the transmission coefficient decreases and dwell time can still be easily predicted in the Bohmian framework.
The main conclusion of this work is that, contrary to tunneling devices with parabolic bands, the high graphene mobility is roughly independent of the presence of Klein tunneling phenomena in the active device region.

Submitted 6 March, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

Report number: 34 034002

Journal ref: Semiconductor Science and Technology, Volume 34, Number 3, 2018

arXiv:1807.08409 [pdf, ps, other]

Subsampling MCMC - An introduction for the survey statistician

Authors: Matias Quiroz, Mattias Villani, Robert Kohn, Minh-Ngoc Tran, Khue-Dung Dang

Abstract: The rapid development of computing power and efficient Markov Chain Monte Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics, making it a highly practical inference method in applied work. However, MCMC algorithms tend to be computationally demanding, and are particularly slow for large datasets. Data subsampling has recently been suggested as a way to make MCMC methods scalable on massively large data, utilizing efficient sampling schemes and estimators from the survey sampling literature. These developments tend to be unknown to many survey statisticians who traditionally work with non-Bayesian methods, and rarely use MCMC. Our article explains the idea of data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a so-called pseudo-marginal MCMC approach to speeding up MCMC through data subsampling. The review is written for a survey statistician without previous knowledge of MCMC methods since our aim is to motivate survey sampling experts to contribute to the growing Subsampling MCMC literature.

Submitted 20 September, 2018; v1 submitted 22 July, 2018; originally announced July 2018.

Comments: Accepted for publication in Sankhya A. Previous uploaded version contained a bug in generating the figures and references

arXiv:1708.06152 [pdf, other]

doi 10.1016/j.jneumeth.2020.108778

Physiological Gaussian Process Priors for the Hemodynamics in fMRI Analysis

Authors: Josef Wilzén, Anders Eklund, Mattias Villani

Abstract: Background: Inference from fMRI data faces the challenge that the hemodynamic system that relates neural activity to the observed BOLD fMRI signal is unknown. New Method: We propose a new Bayesian model for task fMRI data with the following features: (i) joint estimation of brain activity and the underlying hemodynamics, (ii) the hemodynamics is modeled nonparametrically with a Gaussian process (GP) prior guided by physiological information and (iii) the predicted BOLD is not necessarily generated by a linear time-invariant (LTI) system. We place a GP prior directly on the predicted BOLD response, rather than on the hemodynamic response function as in previous literature. This allows us to incorporate physiological information via the GP prior mean in a flexible way, and simultaneously gives us the nonparametric flexibility of the GP. Results: Results on simulated data show that the proposed model is able to discriminate between active and non-active voxels also when the GP prior deviates from the true hemodynamics. Our model finds time varying dynamics when applied to real fMRI data. Comparison with Existing Method(s): The proposed model is better at detecting activity in simulated data than standard models, without inflating the false positive rate. When applied to real fMRI data, our GP model in several cases finds brain activity where previously proposed LTI models do not.
Conclusions: We have proposed a new non-linear model for the hemodynamics in task fMRI, that is able to detect active voxels, and gives the opportunity to ask new kinds of questions related to hemodynamics.

Submitted 18 May, 2020; v1 submitted 21 August, 2017; originally announced August 2017.

Comments: 18 pages, 14 figures

arXiv:1708.00955 [pdf, other]

Hamiltonian Monte Carlo with Energy Conserving Subsampling

Authors: Khue-Dung Dang, Matias Quiroz, Robert Kohn, Minh-Ngoc Tran, Mattias Villani

Abstract: Hamiltonian Monte Carlo (HMC) samples efficiently from high-dimensional posterior distributions with proposed parameter draws obtained by iterating on a discretized version of the Hamiltonian dynamics. The iterations make HMC computationally costly, especially in problems with large datasets, since it is necessary to compute posterior densities and their derivatives with respect to the parameters. Naively computing the Hamiltonian dynamics on a subset of the data causes HMC to lose its key ability to generate distant parameter proposals with high acceptance probability. The key insight in our article is that efficient subsampling HMC for the parameters is possible if both the dynamics and the acceptance probability are computed from the same data subsample in each complete HMC iteration. We show that this is possible to do in a principled way in an HMC-within-Gibbs framework where the subsample is updated using a pseudo-marginal MH step and the parameters are then updated using an HMC step, based on the current subsample. We show that our subsampling methods are fast and compare favorably to two popular sampling algorithms that utilize gradient estimates from data subsampling. We also explore the current limitations of subsampling HMC algorithms by varying the quality of the variance reducing control variates used in the estimators of the posterior density and its gradients.

Submitted 1 May, 2019; v1 submitted 2 August, 2017; originally announced August 2017.

Comments: Includes an experiment on the scalability of the method. Text has been revised too

arXiv:1705.08656 [pdf, other]

Efficient Covariance Approximations for Large Sparse Precision Matrices

Authors: Per Sidén, Finn Lindgren, David Bolin, Mattias Villani

Abstract: The use of sparse precision (inverse covariance) matrices has become popular because they allow for efficient algorithms for joint inference in high-dimensional models. Many applications require the computation of certain elements of the covariance matrix, such as the marginal variances, which may be non-trivial to obtain when the dimension is large. This paper introduces a fast Rao-Blackwellized Monte Carlo sampling based method for efficiently approximating selected elements of the covariance matrix. The variance and confidence bounds of the approximations can be precisely estimated without additional computational costs. Furthermore, a method that iterates over subdomains is introduced, and is shown to additionally reduce the approximation errors to practically negligible levels in an application on functional magnetic resonance imaging data. Both methods have low memory requirements, which is typically the bottleneck for competing direct methods.

Submitted 5 December, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

arXiv:1702.05008 [pdf, other]

Tree Ensembles with Rule Structured Horseshoe Regularization

Authors: Malte Nalenz, Mattias Villani

Abstract: We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu (2008) where rules from decision trees and linear terms are used in an L1-regularized regression. We modify RuleFit by replacing the L1-regularization by a horseshoe prior, which is well known to give aggressive shrinkage of noise predictors while leaving the important signal essentially untouched. This is especially important when a large number of rules are used as predictors as many of them only contribute noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are only satisfied by a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in Friedman and Popescu (2008) with an additional set of trees from random forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and random forest on 16 datasets. The model and its interpretation are demonstrated on the well known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.

Submitted 15 February, 2018; v1 submitted 16 February, 2017; originally announced February 2017.

Comments: 24 pages. R package

arXiv:1612.07034 [pdf, other]

Bayesian Non-Central Chi Regression For Neuroimaging

Authors: Bertil Wegmann, Anders Eklund, Mattias Villani

Abstract: We propose a regression model for non-central $χ$ (NC-$χ$) distributed functional magnetic resonance imaging (fMRI) and diffusion weighted imaging (DWI) data, with the heteroscedastic Rician regression model as a prominent special case. The model allows both parameters in the NC-$χ$ distribution to be linked to explanatory variables, with the relevant covariates automatically chosen by Bayesian variable selection. A highly efficient Markov chain Monte Carlo (MCMC) algorithm is proposed for simulating from the joint Bayesian posterior distribution of all model parameters and the binary covariate selection indicators. Simulated fMRI data is used to demonstrate that the Rician model is able to localize brain activity much more accurately than the traditionally used Gaussian model at low signal-to-noise ratios. Using a diffusion dataset from the Human Connectome Project, it is also shown that the commonly used approximate Gaussian noise model underestimates the mean diffusivity (MD) and the fractional anisotropy (FA) in the single-diffusion tensor model compared to the theoretically correct Rician model.

Submitted 21 December, 2016; originally announced December 2016.

arXiv:1612.00690 [pdf, other]

doi 10.1016/j.neuroimage.2017.04.069

A Bayesian Heteroscedastic GLM with Application to fMRI Data with Motion Spikes

Authors: Anders Eklund, Martin A. Lindquist, Mattias Villani

Abstract: We propose a voxel-wise general linear model with autoregressive noise and heteroscedastic noise innovations (GLMH) for analyzing functional magnetic resonance imaging (fMRI) data. The model is analyzed from a Bayesian perspective and has the benefit of automatically down-weighting time points close to motion spikes in a data-driven manner. We develop a highly efficient Markov Chain Monte Carlo (MCMC) algorithm that allows for Bayesian variable selection among the regressors to model both the mean (i.e., the design matrix) and variance. This makes it possible to include a broad range of explanatory variables in both the mean and variance (e.g., time trends, activation stimuli, head motion parameters and their temporal derivatives), and to compute the posterior probability of inclusion from the MCMC output. Variable selection is also applied to the lags in the autoregressive noise process, making it possible to infer the lag order from the data simultaneously with all other model parameters. We use both simulated data and real fMRI data from OpenfMRI to illustrate the importance of proper modeling of heteroscedasticity in fMRI data analysis. Our results show that the GLMH tends to detect more brain activity, compared to its homoscedastic counterpart, by allowing the variance to change over time depending on the degree of head motion.

Submitted 11 March, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

Journal ref: NeuroImage, Volume 155, 354-369 (2017)

arXiv:1606.00980 [pdf, ps, other]

doi 10.1016/j.neuroimage.2016.11.040

Fast Bayesian whole-brain fMRI analysis with spatial 3D priors

Authors: Per Sidén, Anders Eklund, David Bolin, Mattias Villani

Abstract: Spatial whole-brain Bayesian modeling of task-related functional magnetic resonance imaging (fMRI) is a great computational challenge. Most of the currently proposed methods therefore do inference in subregions of the brain separately or do approximate inference without comparison to the true posterior distribution. A popular such method, which is now the standard method for Bayesian single subject analysis in the SPM software, is introduced in Penny et al. (2005b). The method processes the data slice-by-slice and uses an approximate variational Bayes (VB) estimation algorithm that enforces posterior independence between activity coefficients in different voxels. We introduce a fast and practical Markov chain Monte Carlo (MCMC) scheme for exact inference in the same model, both slice-wise and for the whole brain using a 3D prior on activity coefficients. The algorithm exploits sparsity and uses modern techniques for efficient sampling from high-dimensional Gaussian distributions, leading to speed-ups without which MCMC would not be a practical option. Using MCMC, we are for the first time able to evaluate the approximate VB posterior against the exact MCMC posterior, and show that VB can lead to spurious activation. In addition, we develop an improved VB method that drops the assumption of independent voxels a posteriori. This algorithm is shown to be much faster than both MCMC and the original VB for large datasets, with negligible error compared to the MCMC posterior.

Submitted 29 September, 2016; v1 submitted 3 June, 2016; originally announced June 2016.

Journal ref: NeuroImage (2017), vol. 146, 211-225

arXiv:1603.08232 [pdf, other]

The block-Poisson estimator for optimally tuned exact subsampling MCMC

Authors: Matias Quiroz, Minh-Ngoc Tran, Mattias Villani, Robert Kohn, Khue-Dung Dang

Abstract: Speeding up Markov Chain Monte Carlo (MCMC) for datasets with many observations by data subsampling has recently received considerable attention. A pseudo-marginal MCMC method is proposed that estimates the likelihood by data subsampling using a block-Poisson estimator. The estimator is a product of Poisson estimators, allowing us to update a single block of subsample indicators in each MCMC iteration so that a desired correlation is achieved between the logs of successive likelihood estimates. This is important since pseudo-marginal MCMC with positively correlated likelihood estimates can use substantially smaller subsamples without adversely affecting the sampling efficiency. The block-Poisson estimator is unbiased but not necessarily positive, so the algorithm runs the MCMC on the absolute value of the likelihood estimator and uses an importance sampling correction to obtain consistent estimates of the posterior mean of any function of the parameters. Our article derives guidelines to select the optimal tuning parameters for our method and shows that it compares very favourably to regular MCMC without subsampling, and to two other recently proposed exact subsampling approaches in the literature.

Submitted 6 April, 2020; v1 submitted 27 March, 2016; originally announced March 2016.

Comments: The main paper is 28 pages. The supplementary material is 28 pages

arXiv:1603.02485 [pdf, other]

The Block Pseudo-Marginal Sampler

Authors: M. -N. Tran, R. Kohn, M. Quiroz, M. Villani

Abstract: The pseudo-marginal (PM) approach is increasingly used for Bayesian inference in statistical models, where the likelihood is intractable but can be estimated unbiasedly. Examples include random effect models, state-space models and data subsampling in big-data settings. Deligiannidis et al. (2016) show how the PM approach can be made much more efficient by correlating the underlying Monte Carlo (MC) random numbers used to form the estimate of the likelihood at the current and proposed values of the unknown parameters. Their approach greatly speeds up the standard PM algorithm, as it requires a much smaller number of samples or particles to form the optimal likelihood estimate. Our paper presents an alternative implementation of the correlated PM approach, called the block PM, which divides the underlying random numbers into blocks so that the likelihood estimates for the proposed and current values of the parameters only differ by the random numbers in one block. We show that this implementation of the correlated PM can be much more efficient for some specific problems than the implementation in Deligiannidis et al. (2016); for example when the likelihood is estimated by subsampling or the likelihood is a product of terms each of which is given by an integral which can be estimated unbiasedly by randomised quasi-Monte Carlo. Our article provides methodology and guidelines for efficiently implementing the block PM.
A second advantage of the block PM is that it provides a more direct way to control the correlation between the logarithms of the estimates of the likelihood at the current and proposed values of the parameters than the implementation in Deligiannidis et al. (2016). We obtain methods and guidelines for selecting the optimal number of samples based on idealized but realistic assumptions.

Submitted 9 September, 2017; v1 submitted 8 March, 2016; originally announced March 2016.

Comments: 41 pages, 6 tables, 4 figures

Showing 1–50 of 72 results for author: Villani, M