-
Evaluation of data imputation strategies in complex, deeply-phenotyped data sets: the case of the EU-AIMS Longitudinal European Autism Project
Authors:
A. Llera,
M. Brammer,
B. Oakley,
J. Tillmann,
M. Zabihi,
T. Mei,
T. Charman,
C. Ecker,
F. Dell Acqua,
T. Banaschewski,
C. Moessnang,
S. Baron-Cohen,
R. Holt,
S. Durston,
D. Murphy,
E. Loth,
J. K. Buitelaar,
D. L. Floris,
C. F. Beckmann
Abstract:
An increasing number of large-scale multi-modal research initiatives have been conducted in the typically developing population, as well as in psychiatric cohorts. Missing data is a common problem in such datasets due to the difficulty of assessing multiple measures on a large number of participants. The consequences of missing data accumulate when researchers aim to explore relationships between multiple measures. Here we aim to evaluate different imputation strategies to fill in missing values in clinical data from a large (total N=764) and deeply characterised (i.e. a range of clinical and cognitive instruments administered) sample of N=453 autistic individuals and N=311 control individuals recruited as part of the EU-AIMS Longitudinal European Autism Project (LEAP) consortium. In particular, we consider a total of 160 clinical measures divided into 15 overlapping subsets of participants. We use two simple but common univariate strategies, mean and median imputation, as well as a Round Robin regression approach involving four independent multivariate regression models: a linear model (Bayesian Ridge regression) and three non-linear models (Decision Trees, Extra Trees and K-Neighbours regression). We evaluate the models using the traditional mean squared error computed on available data that were deliberately removed, and additionally consider the KL divergence between the observed and the imputed distributions. We show that all of the multivariate approaches tested provide a substantial improvement over typical univariate approaches. Further, our analyses reveal that across all 15 data subsets tested, an Extra Trees regression approach provided the best global results. This allows the selection of a unique model to impute missing data for the LEAP project and the delivery of a fixed set of imputed clinical data to be used by researchers working with the LEAP dataset in the future.
Submitted 20 January, 2022;
originally announced January 2022.
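The Round Robin scheme described in this abstract cycles through the variables, regressing each incompletely observed measure on all the others and overwriting its missing entries with the predictions; evaluation hides values that were actually observed and scores the imputations against them. The sketch below illustrates that loop and the masking-based mean squared error under stated assumptions: it uses NumPy, an ordinary least-squares regressor as a stand-in for the Bayesian Ridge, tree and nearest-neighbour models evaluated in the paper, and a small synthetic matrix rather than LEAP data. The helper names (`mean_impute`, `round_robin_impute`, `masked_mse`) are hypothetical. scikit-learn's `IterativeImputer` implements a comparable round-robin strategy with a pluggable `estimator`.

```python
import numpy as np


def mean_impute(X):
    """Univariate baseline: replace each NaN with its column mean."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def round_robin_impute(X, n_iter=10):
    """Cycle over columns, regressing each one on all others (OLS stand-in)."""
    missing = np.isnan(X)
    filled = mean_impute(X)  # initial guess for every missing entry
    n_rows, n_cols = X.shape
    for _ in range(n_iter):
        for j in range(n_cols):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Design matrix: intercept plus the current values of the other columns.
            design = np.column_stack([np.ones(n_rows), np.delete(filled, j, axis=1)])
            # Fit only on rows where column j was actually observed.
            coef, *_ = np.linalg.lstsq(design[~miss_j], filled[~miss_j, j], rcond=None)
            # Overwrite the missing entries of column j with the predictions.
            filled[miss_j, j] = design[miss_j] @ coef
    return filled


def masked_mse(X_true, X_imputed, mask):
    """Mean squared error restricted to the entries that were hidden."""
    return float(np.mean((X_true[mask] - X_imputed[mask]) ** 2))


# Synthetic example: three correlated "measures" driven by one latent factor.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X_true = np.column_stack([
    z + 0.1 * rng.normal(size=n),
    2.0 * z + 0.1 * rng.normal(size=n),
    -z + 0.1 * rng.normal(size=n),
])

# Hide roughly 15% of the (fully observed) entries, then impute them.
mask = rng.random(X_true.shape) < 0.15
X_obs = X_true.copy()
X_obs[mask] = np.nan

mse_mean = masked_mse(X_true, mean_impute(X_obs), mask)
mse_rr = masked_mse(X_true, round_robin_impute(X_obs), mask)
```

On correlated measures like these, the multivariate loop should give a much lower masked error than column-mean filling, which is the qualitative pattern the abstract reports; the KL-divergence criterion is not reproduced here.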
-
Variational Mixture Models with Gamma or inverse-Gamma components
Authors:
A. Llera,
D. Vidaurre,
R. H. R. Pruim,
C. F. Beckmann
Abstract:
Mixture models with Gamma and/or inverse-Gamma distributed mixture components are useful for medical image tissue segmentation, or as post-hoc models for regression coefficients obtained from linear regression within a Generalised Linear Modeling (GLM) framework, where they are used to separate stochastic (Gaussian) noise from some kind of positive or negative "activation" (modeled as Gamma or inverse-Gamma distributed). To date, the most common choice in this context is the Gaussian/Gamma mixture model learned through a maximum likelihood (ML) approach; we recently extended this algorithm to mixture models with inverse-Gamma components. Here, we introduce a fully analytical Variational Bayes (VB) learning framework for both Gamma and inverse-Gamma components. We use synthetic and resting-state fMRI data to compare the performance of the ML and VB algorithms in terms of area under the curve and computational cost. We observed that the ML Gaussian/Gamma model is very expensive, especially for high-resolution images; furthermore, its solutions are highly variable and can occasionally overestimate the activations severely. The Bayesian Gaussian/Gamma model is in general the fastest algorithm but provides overly dense solutions. The maximum likelihood Gaussian/inverse-Gamma model is also very fast but generally provides very sparse solutions. The variational Gaussian/inverse-Gamma mixture model is the most robust, and its cost is acceptable even for high-resolution images. Further, the presented methodology is an essential building block that can be used directly in more complex inference tasks designed specifically to analyse MRI and fMRI data, such as analytical variational mixture models with adaptive spatial regularization, or better source models for new spatial blind source separation approaches.
Submitted 26 July, 2016;
originally announced July 2016.
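The mixtures in this abstract separate a Gaussian noise component from positive and negative "activation" tails. One way to write such a model is p(x) = π0 N(x; μ, σ²) + π1 f(x) + π2 f(−x), where f is a Gamma or inverse-Gamma density supported on (0, ∞). The sketch below computes, for fixed parameters, the posterior probability that a value belongs to each component, the quantity an EM-style ML fit iterates on. It is a minimal illustration under assumptions: the sign-flipped negative tail, the shape-rate Gamma and shape-scale inverse-Gamma parameterizations, and the function names are hypothetical, and the paper's VB updates are not reproduced.

```python
import math


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_gamma_pdf(x, shape, rate):
    """Log density of Gamma(shape, rate); -inf outside the support x > 0."""
    if x <= 0:
        return -math.inf
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1.0) * math.log(x) - rate * x


def log_inv_gamma_pdf(x, shape, scale):
    """Log density of InvGamma(shape, scale); -inf outside the support x > 0."""
    if x <= 0:
        return -math.inf
    return shape * math.log(scale) - math.lgamma(shape) - (shape + 1.0) * math.log(x) - scale / x


def responsibilities(x, weights, noise, pos, neg, tail_pdf=log_gamma_pdf):
    """Posterior probabilities of (noise, positive, negative) for one value x."""
    logs = [
        math.log(weights[0]) + log_normal(x, *noise),
        math.log(weights[1]) + tail_pdf(x, *pos),
        math.log(weights[2]) + tail_pdf(-x, *neg),  # negative tail models -x
    ]
    top = max(logs)  # finite, because the Gaussian term is always finite
    exps = [math.exp(v - top) for v in logs]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Usage: a strongly positive value is assigned to the positive Gamma tail.
r = responsibilities(8.0, (0.8, 0.1, 0.1), (0.0, 1.0), (4.0, 1.0), (4.0, 1.0))
```

Swapping `tail_pdf=log_inv_gamma_pdf` gives the Gaussian/inverse-Gamma variant of the same model; working in log space avoids underflow for values far out in the tails.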
-
Bayesian estimators of the Gamma distribution
Authors:
A. Llera,
C. F. Beckmann
Abstract:
In this paper we introduce two Bayesian estimators for learning the parameters of the Gamma distribution. The first algorithm uses a well-known unnormalized conjugate prior for the Gamma shape, and the second uses a non-linear approximation to the likelihood and a prior on the shape that is conjugate to the approximated likelihood. In both cases we use the Laplace approximation to compute the required expectations. We perform a theoretical comparison between maximum likelihood and the presented Bayesian algorithms, which allows us to provide non-informative values for the priors' hyperparameters. We also provide a numerical comparison using synthetic data. The introduction of these novel Bayesian estimators opens the possibility of including Gamma distributions in more complex Bayesian structures, e.g. variational Bayesian mixture models.
Submitted 12 July, 2016;
originally announced July 2016.
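The maximum likelihood baseline that the Bayesian estimators are compared against has no closed form for the Gamma shape: the shape α solves log α − ψ(α) = s with s = log(mean) − mean(log x), after which the ML rate is α / mean. The sketch below uses a standard closed-form approximation to that root (the expression often used to initialise Newton iterations) rather than the paper's likelihood approximation or its Bayesian estimators; the function name and the synthetic check are hypothetical.

```python
import math
import random


def gamma_ml_approx(samples):
    """Approximate ML estimate (shape, rate) of a Gamma distribution."""
    n = len(samples)
    mean = sum(samples) / n
    mean_log = sum(math.log(v) for v in samples) / n
    # s > 0 by Jensen's inequality unless all samples are identical.
    s = math.log(mean) - mean_log
    # Closed-form approximation to the root of log(a) - digamma(a) = s.
    shape = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    # For a fixed shape, the ML rate is exactly shape / mean.
    rate = shape / mean
    return shape, rate


# Usage: random.gammavariate takes (shape, scale), so scale 2.0 means rate 0.5.
random.seed(0)
samples = [random.gammavariate(3.0, 2.0) for _ in range(20000)]
shape_hat, rate_hat = gamma_ml_approx(samples)
```

If higher precision is needed, a few Newton steps on log α − ψ(α) − s = 0 can refine this starting value.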
-
Increasing robustness of pairwise methods for effective connectivity in Magnetic Resonance Imaging by using fractional moment series of BOLD signal distributions
Authors:
Natalia Bielczyk,
Alberto Llera,
Jan Buitelaar,
Jeffrey Glennon,
Christian Beckmann
Abstract:
Estimating causal interactions in the brain from functional magnetic resonance imaging (fMRI) data remains a challenging task. Multiple studies have demonstrated that all current approaches to determining the direction of connectivity perform poorly even when applied to synthetic fMRI datasets. Recent advances in this field include methods for pairwise inference, which involve creating a sparse connectome in the first step and then using a classifier to determine the directionality of the connection between every pair of nodes in the second step. In this work, we introduce an advance to the second step of this procedure by building a classifier based on fractional moments of the BOLD distribution combined into cumulants. The classifier is trained on datasets generated under the Dynamic Causal Modeling (DCM) generative model. The directionality is inferred from statistical dependencies between the two node time series, e.g. assigning a causal link from the time series of low variance to the time series of high variance. Our approach performs as well as or better than other methods for effective connectivity when applied to the benchmark datasets. Crucially, it is also more resilient to confounding effects such as differential noise levels across different areas of the connectome.
Submitted 30 May, 2019; v1 submitted 28 June, 2016;
originally announced June 2016.
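The classifier in this abstract is built on fractional moments of the BOLD signal distribution, that is E|x − μ|^p for non-integer as well as integer orders p, and the abstract cites a low-to-high-variance rule as one example of a dependency used to infer direction. The sketch below computes such a moment series for two time series and applies only that illustrative variance rule. The cumulant combination and the DCM-trained classifier are not reproduced, and the chosen orders and function names are assumptions.

```python
def fractional_moments(x, orders=(0.5, 1.0, 1.5, 2.0)):
    """Central absolute moments E|x - mean|^p for each order p."""
    mu = sum(x) / len(x)
    return {p: sum(abs(v - mu) ** p for v in x) / len(x) for p in orders}


def variance_rule_direction(x, y):
    """Illustrative rule only: link from the lower- to the higher-variance series."""
    # The order-2 central moment is the population variance.
    var_x = fractional_moments(x, (2.0,))[2.0]
    var_y = fractional_moments(y, (2.0,))[2.0]
    return "x->y" if var_x < var_y else "y->x"  # ties default to "y->x"


# Usage on two toy series; in practice these would be node time series.
x = [1.0, -1.0, 1.0, -1.0]
y = [2.0, -2.0, 2.0, -2.0]
features = (fractional_moments(x), fractional_moments(y))
direction = variance_rule_direction(x, y)
```

In a learned pairwise classifier, the per-node moment series (or quantities derived from them) would serve as input features instead of a single hand-written rule.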
-
Estimating an Inverse Gamma distribution
Authors:
A. Llera,
C. F. Beckmann
Abstract:
In this paper we introduce five different algorithms based on method of moments, maximum likelihood and full Bayesian estimation for learning the parameters of the Inverse Gamma distribution. We also provide an expression for the KL divergence for Inverse Gamma distributions which allows us to quantify the estimation accuracy of each of the algorithms. All the presented algorithms are novel. The most relevant novelties include the first conjugate prior for the Inverse Gamma shape parameter which allows analytical Bayesian inference, and two very fast algorithms, a maximum likelihood and a Bayesian one, both based on likelihood approximation. In order to compute expectations under the proposed distributions we use the Laplace approximation. The introduction of these novel Bayesian estimators opens the possibility of including Inverse Gamma distributions into more complex Bayesian structures, e.g. variational Bayesian mixture models. The algorithms introduced in this paper are computationally compared using synthetic data and interesting relationships between the maximum likelihood and the Bayesian approaches are derived.
Submitted 7 July, 2016; v1 submitted 3 May, 2016;
originally announced May 2016.
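Two of the ingredients mentioned in this abstract, a method-of-moments estimator and a KL divergence between Inverse Gamma distributions, have short closed forms. For X ~ IG(α, β), E[X] = β/(α−1) (α > 1) and Var[X] = β²/((α−1)²(α−2)) (α > 2), which invert to α = m²/v + 2 and β = m(α−1). Because 1/X is Gamma distributed with shape α and rate β, and the KL divergence is invariant under this invertible change of variables, the divergence between two Inverse Gamma distributions equals the standard shape-rate Gamma KL. The sketch below is a standard-library illustration under those textbook results; it is not claimed to match the paper's estimators or its expression term by term, and the function names and the series-based `digamma` helper are hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def inv_gamma_mom(samples):
    """Method-of-moments (shape, scale) for an Inverse Gamma distribution."""
    n = len(samples)
    m = sum(samples) / n
    v = sum((s - m) ** 2 for s in samples) / (n - 1)
    shape = m * m / v + 2.0
    scale = m * (shape - 1.0)
    return shape, scale


def kl_inv_gamma(a_p, b_p, a_q, b_q):
    """KL(IG(a_p, b_p) || IG(a_q, b_q)), equal to the KL of the reciprocal Gammas."""
    return ((a_p - a_q) * digamma(a_p) - math.lgamma(a_p) + math.lgamma(a_q)
            + a_q * (math.log(b_p) - math.log(b_q)) + a_p * (b_q - b_p) / b_p)


# Usage: 1/X with X ~ Gamma(shape 10, rate 9) is IG(10, 9); gammavariate takes a scale.
random.seed(1)
data = [1.0 / random.gammavariate(10.0, 1.0 / 9.0) for _ in range(50000)]
shape_hat, scale_hat = inv_gamma_mom(data)
kl_to_truth = kl_inv_gamma(10.0, 9.0, shape_hat, scale_hat)
```

Using the KL to the generating parameters as an accuracy score, as the abstract describes, rewards estimates that match the whole distribution rather than each parameter separately.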