Search | arXiv e-print repository

Bayesian temporal biclustering with applications to multi-subject neuroscience studies

Authors: Federica Zoe Ricci, Erik B. Sudderth, Jaylen Lee, Megan A. K. Peters, Marina Vannucci, Michele Guindani

Abstract: We consider the problem of analyzing multivariate time series collected on multiple subjects, with the goal of identifying groups of subjects exhibiting similar trends in their recorded measurements over time as well as time-varying groups of associated measurements. To this end, we propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of subjects induces a time-varying partition of measurements. Our approach allows for data-driven determination of the number of subject and measurement clusters as well as estimation of the number and location of changepoints in measurement partitions. To efficiently perform model fitting and posterior estimation with Markov Chain Monte Carlo, we derive a blocked update of measurements' cluster-assignment sequences. We illustrate the performance of our model in two applications to functional magnetic resonance imaging data and to an electroencephalogram dataset. The results indicate that the proposed model can combine information from potentially many subjects to discover a set of interpretable, dynamic patterns. Experiments on simulated data compare the estimation performance of the proposed model against ground-truth values and other statistical methods, showing that it performs well at identifying ground-truth subject and measurement clusters even when no subject or time dependence is present.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.03385 [pdf, other]

Discrete Autoregressive Switching Processes in Sparse Graphical Modeling of Multivariate Time Series Data

Authors: Beniamino Hadj-Amar, Aaron M. Bornstein, Michele Guindani, Marina Vannucci

Abstract: We propose a flexible Bayesian approach for sparse Gaussian graphical modeling of multivariate time series. We account for temporal correlation in the data by assuming that observations are characterized by an underlying and unobserved hidden discrete autoregressive process. We assume multivariate Gaussian emission distributions and capture spatial dependencies by modeling the state-specific precision matrices via graphical horseshoe priors. We characterize the mixing probabilities of the hidden process via a cumulative shrinkage prior that accommodates zero-inflated parameters for non-active components, and further incorporate a sparsity-inducing Dirichlet prior to estimate the effective number of states from the data. For posterior inference, we develop a sampling procedure that allows estimation of the number of discrete autoregressive lags and the number of states, and that cleverly avoids having to deal with the changing dimensions of the parameter space. We thoroughly investigate performance of our proposed methodology through several simulation studies. We further illustrate the use of our approach for the estimation of dynamic brain connectivity based on fMRI data collected on a subject performing a task-based experiment on latent learning.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2401.10235 [pdf, other]

Semi-parametric local variable selection under misspecification

Authors: David Rossell, Arnold Kisuk Kseung, Ignacio Saez, Michele Guindani

Abstract: Local variable selection aims to discover localized effects by assessing the impact of covariates on outcomes within specific regions defined by other covariates. We outline some challenges of local variable selection in the presence of non-linear relationships and model misspecification. Specifically, we highlight a potential drawback of common semi-parametric methods: even slight model misspecification can result in a high rate of false positives. To address these shortcomings, we propose a methodology based on orthogonal cut splines that achieves consistent local variable selection in high-dimensional scenarios. Our approach offers simplicity, handles both continuous and discrete covariates, and provides theory for high-dimensional covariates and model misspecification. We discuss settings with either independent or dependent data. Our proposal allows including adjustment covariates that do not undergo selection, enhancing flexibility in modeling complex scenarios. We illustrate its application in simulation studies with both independent and functional data, as well as with two real datasets. One dataset evaluates salary gaps associated with discrimination factors at different ages, while the other examines the effects of covariates on brain activation over time. The approach is implemented in the R package mombf.

Submitted 19 February, 2024; v1 submitted 14 November, 2023; originally announced January 2024.

arXiv:2210.01281 [pdf, other]

A Predictor-Informed Multi-Subject Bayesian Approach for Dynamic Functional Connectivity

Authors: Jaylen Lee, Sana Hussain, Ryan Warnick, Marina Vannucci, Isaac Menchaca, Aaron R. Seitz, ** Hu, Megan A. K. Peters, Michele Guindani

Abstract: Time Varying Functional Connectivity (TVFC) investigates how the interactions among brain regions vary over the course of an fMRI experiment. The transitions between different individual connectivity states can be modulated by changes in underlying physiological mechanisms that drive functional network dynamics, e.g., changes in attention or cognitive effort as measured by pupil dilation. In this paper, we develop a multi-subject Bayesian framework for estimating dynamic functional networks as a function of time-varying exogenous physiological covariates that are simultaneously recorded in each subject during the fMRI experiment. More specifically, we consider a dynamic Gaussian graphical model approach, where a non-homogeneous hidden Markov model is employed to classify the fMRI time series into latent neurological states, borrowing strength over the entire time course of the experiment. The state-transition probabilities are assumed to vary over time and across subjects, as a function of the underlying covariates, allowing for the estimation of recurrent connectivity patterns and the sharing of networks among the subjects. Our modeling approach further assumes sparsity in the network structures, via shrinkage priors. We achieve edge selection in the estimated graph structures, by introducing a multi-comparison procedure for shrinkage-based inferences with Bayesian false discovery rate control.
We apply our modeling framework on a resting-state experiment where fMRI data have been collected concurrently with pupillometry measurements, leading us to assess the heterogeneity of the effects of changes in pupil dilation, previously linked to changes in norepinephrine-containing locus coeruleus, on the subjects' propensity to change connectivity states.

Submitted 9 January, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2205.00930 [pdf, other]

Multiple hypothesis screening using mixtures of non-local distributions with applications to genomic studies

Authors: Francesco Denti, Stefano Peluso, Michele Guindani, Antonietta Mira

Abstract: The analysis of large-scale datasets, especially in biomedical contexts, frequently involves a principled screening of multiple hypotheses. The celebrated two-group model jointly models the distribution of the test statistics with mixtures of two competing densities, the null and the alternative distributions. We investigate the use of weighted densities and, in particular, non-local densities as working alternative distributions, to enforce separation from the null and thus refine the screening procedure. We show how these weighted alternatives improve various operating characteristics, such as the Bayesian False Discovery rate, of the resulting tests for a fixed mixture proportion with respect to a local, unweighted likelihood approach. Parametric and nonparametric model specifications are proposed, along with efficient samplers for posterior inference. By means of a simulation study, we exhibit how our model compares with both well-established and state-of-the-art alternatives in terms of various operating characteristics. Finally, to illustrate the versatility of our method, we conduct three differential expression analyses with publicly-available datasets from genomic studies of heterogeneous nature.

Submitted 9 March, 2023; v1 submitted 2 May, 2022; originally announced May 2022.

arXiv:2106.14083 [pdf, other]

Bayesian Time-Varying Tensor Vector Autoregressive Models for Dynamic Effective Connectivity

Authors: Wei Zhang, Ivor Cribben, Sonia Petrone, Michele Guindani

Abstract: In contemporary neuroscience, a key area of interest is dynamic effective connectivity, which is crucial for understanding the dynamic interactions and causal relationships between different brain regions. Dynamic effective connectivity can provide insights into how brain network interactions are altered in neurological disorders such as dyslexia. Time-varying vector autoregressive (TV-VAR) models have been employed to draw inferences for this purpose. However, their significant computational requirements pose challenges, since the number of parameters to be estimated increases quadratically with the number of time series. In this paper, we propose a computationally efficient Bayesian time-varying VAR approach. For dealing with large-dimensional time series, the proposed framework employs a tensor decomposition for the VAR coefficient matrices at different lags. Dynamically varying connectivity patterns are captured by assuming that at any given time only a subset of components in the tensor decomposition is active. Latent binary time series select the active components at each time via an innovative and parsimonious Ising model in the time-domain. Furthermore, we propose sparsity-inducing priors to achieve global-local shrinkage of the VAR coefficients, determine automatically the rank of the tensor decomposition and guide the selection of the lags of the auto-regression. We show the performance of our model formulation via simulation studies and data from a real fMRI study involving a book reading experiment.

Submitted 28 May, 2024; v1 submitted 26 June, 2021; originally announced June 2021.

arXiv:2106.08281 [pdf, other]

A Horseshoe mixture model for Bayesian screening with an application to light sheet fluorescence microscopy in brain imaging

Authors: Francesco Denti, Ricardo Azevedo, Chelsie Lo, Damian Wheeler, Sunil P. Gandhi, Michele Guindani, Babak Shahbaba

Abstract: In this paper, we focus on identifying differentially activated brain regions using light sheet fluorescence microscopy, a recently developed technique for whole-brain imaging. Most existing statistical methods solve this problem by partitioning the brain regions into two classes: significantly and non-significantly activated. However, for the brain imaging problem at the center of our study, such binary grouping may provide overly simplistic discoveries by filtering out weak but important signals that are typically adulterated by the noise present in the data. To overcome this limitation, we introduce a new Bayesian approach that allows classifying the brain regions into several tiers with varying degrees of relevance. Our approach is based on a combination of shrinkage priors, widely used in regression and multiple hypothesis testing problems, and mixture models, commonly used in model-based clustering. In contrast to the existing regularizing prior distributions, which use either the spike-and-slab prior or continuous scale mixtures, our class of priors is based on a discrete mixture of continuous scale mixtures and devises a cluster-shrinkage version of the Horseshoe prior. As a result, our approach provides a more general setting for Bayesian sparse estimation, drastically reduces the number of shrinkage parameters needed, and creates a framework for sharing information across units of interest.
We show that this approach leads to more biologically meaningful and interpretable results in our brain imaging problem, since it allows the discrimination between active and inactive regions, while at the same time ranking the discoveries into clusters representing tiers of similar importance.

Submitted 27 January, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

arXiv:2103.03818 [pdf, other]

Time-varying $\ell_0$ optimization for Spike Inference from Multi-Trial Calcium Recordings

Authors: Tong Shen, Kevin Johnston, Gyorgy Lur, Michele Guindani, Hernando Ombao, Zhaoxia Yu

Abstract: Optical imaging of genetically encoded calcium indicators is a powerful tool to record the activity of a large number of neurons simultaneously over a long period of time from freely behaving animals. However, determining the exact time at which a neuron spikes and estimating the underlying firing rate from calcium fluorescence data remains challenging, especially for calcium imaging data obtained from a longitudinal study. We propose a multi-trial time-varying $\ell_0$ penalized method to jointly detect spikes and estimate firing rates by robustly integrating evolving neural dynamics across trials. Our simulation study shows that the proposed method performs well in both spike detection and firing rate estimation. We demonstrate the usefulness of our method on calcium fluorescence trace data from two studies, with the first study showing differential firing rate functions between two behaviors and the second study showing evolving firing rate function across trials due to learning.

Submitted 1 March, 2021; originally announced March 2021.

arXiv:2102.09403 [pdf, other]

Bayesian nonparametric analysis for the detection of spikes in noisy calcium imaging data

Authors: Laura D'Angelo, Antonio Canale, Zhaoxia Yu, Michele Guindani

Abstract: Recent advancements in miniaturized fluorescence microscopy have made it possible to investigate neuronal responses to external stimuli in awake behaving animals through the analysis of intra-cellular calcium signals. An on-going challenge is deconvolving the temporal signals to extract the spike trains from the noisy calcium signals' time-series. In this manuscript, we propose a nested Bayesian finite mixture specification that allows the estimation of spiking activity and, simultaneously, reconstructing the distributions of the calcium transient spikes' amplitudes under different experimental conditions. The proposed model leverages two nested layers of random discrete mixture priors to borrow information between experiments and discover similarities in the distributional patterns of neuronal responses to different stimuli. Furthermore, the spikes' intensity values are also clustered within and between experimental conditions to determine the existence of common (recurring) response amplitudes. Simulation studies and the analysis of a data set from the Allen Brain Observatory show the effectiveness of the method in clustering and detecting neuronal activities.

Submitted 27 January, 2022; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: 18 pages, 5 figures

arXiv:2011.05548 [pdf, other]

A Bayesian Nonparametric model for textural pattern heterogeneity

Authors: Xiao Li, Michele Guindani, Chaan S. Ng, Brian P. Hobbs

Abstract: Cancer radiomics is an emerging discipline promising to elucidate lesion phenotypes and tumor heterogeneity through patterns of enhancement, texture, morphology, and shape. The prevailing technique for image texture analysis relies on the construction and synthesis of Gray-Level Co-occurrence Matrices (GLCM). Practice currently reduces the structured count data of a GLCM to reductive and redundant summary statistics for which analysis requires variable selection and multiple comparisons for each application, thus limiting reproducibility. In this article, we develop a Bayesian multivariate probabilistic framework for the analysis and unsupervised clustering of a sample of GLCM objects. By appropriately accounting for skewness and zero-inflation of the observed counts and simultaneously adjusting for existing spatial autocorrelation at nearby cells, the methodology facilitates estimation of texture pattern distributions within the GLCM lattice itself. The techniques are applied to cluster images of adrenal lesions obtained from CT scans with and without administration of contrast. We further assess whether the resultant subtypes are clinically oriented by investigating their correspondence with pathological diagnoses. Additionally, we compare performance to a class of machine-learning approaches currently used in cancer radiomics with simulation studies.

Submitted 11 November, 2020; originally announced November 2020.

Comments: 45 pages, 7 figures, 1 Table

arXiv:2008.07077 [pdf, other]

A Common Atom Model for the Bayesian Nonparametric Analysis of Nested Data

Authors: Francesco Denti, Federico Camerlenghi, Michele Guindani, Antonietta Mira

Abstract: The use of high-dimensional data for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested Common Atoms Model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice-sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.

Submitted 17 August, 2020; originally announced August 2020.

arXiv:1404.3560 [pdf, ps, other]

doi 10.1214/13-AOAS705

A hierarchical Bayesian model for inference of copy number variants and their association to gene expression

Authors: Alberto Cassese, Michele Guindani, Mahlet G. Tadesse, Francesco Falciani, Marina Vannucci

Abstract: A number of statistical models have been successfully developed for the analysis of high-throughput data from a single source, but few methods are available for integrating data from different sources. Here we focus on integrating gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. We specify a measurement error model that relates the gene expression levels to latent copy number states which, in turn, are related to the observed surrogate CGH measurements via a hidden Markov model. We employ selection priors that exploit the dependencies across adjacent copy number states and investigate MCMC stochastic search techniques for posterior inference. Our approach results in a unified modeling framework for simultaneously inferring copy number variants (CNV) and identifying their significant associations with mRNA transcripts abundance. We show performance on simulated data and illustrate an application to data from a genomic study on human cancer cell lines.

Submitted 14 April, 2014; originally announced April 2014.

Comments: Published at http://dx.doi.org/10.1214/13-AOAS705 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS705

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 1, 148-175

arXiv:1202.6172 [pdf, ps, other]

doi 10.1214/11-AOAS482

A class of covariate-dependent spatiotemporal covariance functions for the analysis of daily ozone concentration

Authors: Brian J. Reich, Jo Eidsvik, Michele Guindani, Amy J. Nail, Alexandra M. Schmidt

Abstract: In geostatistics, it is common to model spatially distributed phenomena through an underlying stationary and isotropic spatial process. However, these assumptions are often untenable in practice because of the influence of local effects in the correlation structure. Therefore, it has been of prolonged interest in the literature to provide flexible and effective ways to model nonstationarity in the spatial effects. Arguably, due to the local nature of the problem, we might envision that the correlation structure would be highly dependent on local characteristics of the domain of study, namely, the latitude, longitude and altitude of the observation sites, as well as other locally defined covariate information. In this work, we provide a flexible and computationally feasible way for allowing the correlation structure of the underlying processes to depend on local covariate information. We discuss the properties of the induced covariance functions and methods to assess its dependence on local covariate information. The proposed method is used to analyze daily ozone in the southeast United States.

Submitted 28 February, 2012; originally announced February 2012.

Comments: Published at http://dx.doi.org/10.1214/11-AOAS482 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS482

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 4, 2425-2447

arXiv:1012.0866 [pdf, other]

Generalized Species Sampling Priors with Latent Beta reinforcements

Authors: Edoardo M. Airoldi, Thiago Costa, Federico Bassetti, Fabrizio Leisen, Michele Guindani

Abstract: Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. However, in some applications, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function with weights driven by a sequence of independent Beta random variables. We compare their theoretical clustering properties with those of the Dirichlet Process and the two-parameter Poisson-Dirichlet process. The proposed construction provides a complete characterization of the joint process, differently from existing work. We then propose the use of such process as prior distribution in a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte Carlo sampler for posterior inference. We evaluate the performance of the prior and the robustness of the resulting inference in a simulation study, providing a comparison with popular Dirichlet Processes mixtures and Hidden Markov Models. Finally, we develop an application to the detection of chromosomal aberrations in breast cancer by leveraging array CGH data.

Submitted 1 August, 2014; v1 submitted 3 December, 2010; originally announced December 2010.

Comments: For correspondence purposes, Edoardo M. Airoldi's email is [email protected]; Federico Bassetti's email is [email protected]; Michele Guindani's email is [email protected]; Fabrizio Leisen's email is [email protected]. To appear in the Journal of the American Statistical Association

Showing 1–14 of 14 results for author: Guindani, M