Spatially Structured Regression for Non-conformable Spaces: Integrating Pathology Imaging and Genomics Data in Cancer

Authors: Nathaniel Osher, Jian Kang, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: The spatial composition and cellular heterogeneity of the tumor microenvironment play a critical role in cancer development and progression. High-definition pathology imaging of tumor biopsies provides a high-resolution view of the spatial organization of different types of cells. This allows for systematic assessment of intra- and inter-patient spatial cellular interactions and heterogeneity by integrating accompanying patient-level genomics data. However, joint modeling across tumor biopsies presents unique challenges due to non-conformability (lack of a common spatial domain across biopsies) as well as high-dimensionality. To address this problem, we propose the Dual random effect and main effect selection model for Spatially structured regression model (DreameSpase). DreameSpase employs a Bayesian variable selection framework that facilitates the assessment of spatial heterogeneity with respect to covariates both within (through fixed effects) and between spaces (through spatial random effects) for non-conformable spatial domains. We demonstrate the efficacy of DreameSpase via simulations and integrative analyses of pathology imaging and gene expression data obtained from $335$ melanoma biopsies. Our findings confirm several existing relationships, e.g. neutrophil genes being associated with both inter- and intra-patient spatial heterogeneity, as well as discovering novel associations. We also provide freely available and computationally efficient software for implementing DreameSpase.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2401.11515 [pdf, other]

Geometry-driven Bayesian Inference for Ultrametric Covariance Matrices

Authors: Tsung-Hung Yao, Zhenke Wu, Karthik Bharath, Veerabhadran Baladandayuthapani

Abstract: Ultrametric matrices arise as covariance matrices in latent tree models for multivariate data with hierarchically correlated components. As a parameter space in a model, the set of ultrametric matrices is neither convex nor a smooth manifold, and focus in literature has hitherto mainly been restricted to estimation through projections and relaxation-based techniques. Leveraging the link between an ultrametric matrix and a rooted tree, we equip the set of ultrametric matrices with a convenient geometry based on the well-known geometry of phylogenetic trees, whose attractive properties (e.g. unique geodesics and Fréchet means) the set of ultrametric matrices inherits. This results in a novel representation of an ultrametric matrix by coordinates of the tree space, which we then use to define a class of Markovian and consistent prior distributions on the set of ultrametric matrices in a Bayesian model, and develop an efficient algorithm to sample from the posterior distribution that generates updates by making intrinsic local moves along geodesics within the set of ultrametric matrices. In simulation studies, our proposed algorithm restores the underlying matrices with posterior samples that recover the tree topology with a high frequency of true topology and generate element-wise credible intervals with a high nominal coverage rate. We use the proposed algorithm on the pre-clinical cancer data to investigate the mechanism similarity by constructing the underlying treatment tree and identify treatments with high mechanism similarity also target correlated pathways in biological literature.

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2311.08484 [pdf, other]

Covariance Assisted Multivariate Penalized Additive Regression (CoMPAdRe)

Authors: Neel Desai, Veerabhadran Baladandayuthapani, Russell T. Shinohara, Jeffrey S. Morris

Abstract: We propose a new method for the simultaneous selection and estimation of multivariate sparse additive models with correlated errors. Our method called Covariance Assisted Multivariate Penalized Additive Regression (CoMPAdRe) simultaneously selects among null, linear, and smooth non-linear effects for each predictor while incorporating joint estimation of the sparse residual structure among responses, with the motivation that accounting for inter-response correlation structure can lead to improved accuracy in variable selection and estimation efficiency. CoMPAdRe is constructed in a computationally efficient way that allows the selection and estimation of linear and non-linear covariates to be conducted in parallel across responses. Compared to single-response approaches that marginally select linear and non-linear covariate effects, we demonstrate in simulation studies that the joint multivariate modeling leads to gains in both estimation efficiency and selection accuracy, of greater magnitude in settings where signal is moderate relative to the level of noise. We apply our approach to protein-mRNA expression levels from multiple breast cancer pathways obtained from The Cancer Proteome Atlas and characterize both mRNA-protein associations and protein-protein subnetworks for each pathway. We find non-linear mRNA-protein associations for the Core Reactive, EMT, PIK-AKT, and RTK pathways.

Submitted 18 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.18474 [pdf, other]

Robust Bayesian Graphical Regression Models for Assessing Tumor Heterogeneity in Proteomic Networks

Authors: Tsung-Hung Yao, Yang Ni, Anindya Bhadra, Jian Kang, Veerabhadran Baladandayuthapani

Abstract: Graphical models are powerful tools to investigate complex dependency structures in high-throughput datasets. However, most existing graphical models make one of the two canonical assumptions: (i) a homogeneous graph with a common network for all subjects; or (ii) an assumption of normality especially in the context of Gaussian graphical models. Both assumptions are restrictive and can fail to hold in certain applications such as proteomic networks in cancer. To this end, we propose an approach termed robust Bayesian graphical regression (rBGR) to estimate heterogeneous graphs for non-normally distributed data. rBGR is a flexible framework that accommodates non-normality through random marginal transformations and constructs covariate-dependent graphs to accommodate heterogeneity through graphical regression techniques. We formulate a new characterization of edge dependencies in such models called conditional sign independence with covariates along with an efficient posterior sampling algorithm. In simulation studies, we demonstrate that rBGR outperforms existing graphical regression models for data generated under various levels of non-normality in both edge and covariate selection. We use rBGR to assess proteomic networks across two cancers: lung and ovarian, to systematically investigate the effects of immunogenic heterogeneity within tumors. Our analyses reveal several important protein-protein interactions that are differentially impacted by the immune cell abundance; some corroborate existing biological knowledge whereas others are novel findings.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2212.14165 [pdf, other]

Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data

Authors: Rupam Bhattacharyya, Nicholas Henderson, Veerabhadran Baladandayuthapani

Abstract: Rapid advancements in collection and dissemination of multi-platform molecular and genomics data have resulted in enormous opportunities to aggregate such data in order to understand, prevent, and treat human diseases. While significant improvements have been made in multi-omic data integration methods to discover biological markers and mechanisms underlying both prognosis and treatment, the precise cellular functions governing these complex mechanisms still need detailed and data-driven de-novo evaluations. We propose a framework called Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data (fiBAG), that allows simultaneous identification of upstream functional evidence of proteogenomic biomarkers and the incorporation of such knowledge in Bayesian variable selection models to improve signal detection. fiBAG employs a conflation of Gaussian process models to quantify (possibly non-linear) functional evidence via Bayes factors, which are then mapped to a novel calibrated spike-and-slab prior, thus guiding selection and providing functional relevance to the associations with patient outcomes. Using simulations, we illustrate how integrative methods with functional calibration have higher power to detect disease related markers than non-integrative approaches. We demonstrate the profitability of fiBAG via a pan-cancer analysis of 14 cancer types to identify and assess the cellular mechanisms of proteogenomic markers associated with cancer stemness and patient survival.

Submitted 28 December, 2022; originally announced December 2022.

Comments: 41 pages manuscript, 5 figures; 49 pages supplementary materials, 37 supplementary figures

MSC Class: 62-08 ACM Class: G.3

arXiv:2210.08096 [pdf, other]

Bayesian Covariate-Dependent Quantile Directed Acyclic Graphical Models for Individualized Inference

Authors: Ksheera Sagar, Yang Ni, Veerabhadran Baladandayuthapani, Anindya Bhadra

Abstract: We propose an approach termed "qDAGx" for Bayesian covariate-dependent quantile directed acyclic graphs (DAGs) where these DAGs are individualized, in the sense that they depend on individual-specific covariates. The individualized DAG structure of the proposed approach can be uniquely identified at any given quantile, based on purely observational data without strong assumptions such as a known topological ordering. To scale the proposed method to a large number of variables and covariates, we propose for the model parameters a novel parameter expanded horseshoe prior that affords a number of attractive theoretical and computational benefits to our approach. By modeling the conditional quantiles, qDAGx overcomes the common limitations of mean regression for DAGs, which can be sensitive to the choice of likelihood, e.g., an assumption of multivariate normality, as well as to the choice of priors. We demonstrate the performance of qDAGx through extensive numerical simulations and via an application in precision medicine, which infers patient-specific protein-protein interaction networks in lung cancer.

Submitted 22 May, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 35 pages, 5 figures

arXiv:2204.04840 [pdf, other]

Nonparametric Bayes Differential Analysis of Multigroup DNA Methylation Data

Authors: Chiyu Gu, Veerabhadran Baladandayuthapani, Subharup Guha

Abstract: DNA methylation datasets in cancer studies are comprised of measurements on a large number of genomic locations called cytosine-phosphate-guanine (CpG) sites with complex correlation structures. A fundamental goal of these studies is the development of statistical techniques that can identify disease genomic signatures across multiple patient groups defined by different experimental or biological conditions. We propose BayesDiff, a nonparametric Bayesian approach for differential analysis relying on a novel class of first order mixture models called the Sticky Pitman-Yor process or two-restaurant two-cuisine franchise (2R2CF). The BayesDiff methodology flexibly utilizes information from all CpG sites or probes, adaptively accommodates any serial dependence due to the widely varying inter-probe distances and performs simultaneous inferences about the differential genomic signature of the patient groups. Using simulation studies, we demonstrate the effectiveness of the BayesDiff procedure relative to existing statistical techniques for differential DNA methylation. The methodology is applied to analyze a gastrointestinal (GI) cancer dataset that displays both serial correlations and interaction patterns. The results support and complement known aspects of DNA methylation and gene association in upper GI cancers.

Submitted 4 May, 2023; v1 submitted 10 April, 2022; originally announced April 2022.

arXiv:2111.11529 [pdf, other]

Bayesian Robust Learning in Chain Graph Models for Integrative Pharmacogenomics

Authors: Moumita Chakraborty, Veerabhadran Baladandayuthapani, Anindya Bhadra, Min Jin Ha

Abstract: Integrative analysis of multi-level pharmacogenomic data for modeling dependencies across various biological domains is crucial for developing genomic-testing based treatments. Chain graphs characterize conditional dependence structures of such multi-level data where variables are naturally partitioned into multiple ordered layers, consisting of both directed and undirected edges. Existing literature mostly focuses on Gaussian chain graphs, which are ill-suited for non-normal distributions with heavy-tailed marginals, potentially leading to inaccurate inferences. We propose a Bayesian robust chain graph model (RCGM) based on random transformations of marginals using Gaussian scale mixtures to account for node-level non-normality in continuous multivariate data. This flexible modeling strategy facilitates identification of conditional sign dependencies among non-normal nodes while still being able to infer conditional dependencies among normal nodes. In simulations, we demonstrate that RCGM outperforms existing Gaussian chain graph inference methods in data generated from various non-normal mechanisms. We apply our method to genomic, transcriptomic and proteomic data to understand underlying biological processes holistically for drug response and resistance in lung cancer cell lines. Our analysis reveals inter- and intra- platform dependencies of key signaling pathways to monotherapies of icotinib, erlotinib and osimertinib among other drugs, along with shared patterns of molecular mechanisms behind drug actions.

Submitted 22 November, 2021; originally announced November 2021.

Comments: 35 pages, 5 figures; Supplementary material follows after the main document

arXiv:2108.05034 [pdf, ps, other]

Bayesian functional graphical models

Authors: Lin Zhang, Veera Baladandayuthapani, Quinton Neville, Karina Quevedo, Jeffrey S. Morris

Abstract: We develop a Bayesian graphical modeling framework for functional data for correlated multivariate random variables observed over a continuous domain. Our method leads to graphical Markov models for functional data which allows the graphs to vary over the functional domain. The model involves estimation of graphical models that evolve functionally in a nonparametric fashion while accounting for within-functional correlations and borrowing strength across functional positions so contiguous locations are encouraged but not forced to have similar graph structure and edge strength. We utilize a strategy that combines nonparametric basis function modeling with modified Bayesian graphical regularization techniques, which induces a new class of hypoexponential normal scale mixture distributions that not only leads to adaptively shrunken estimators of the conditional cross-covariance but also facilitates a thorough theoretical investigation of the shrinkage properties. Our approach scales up to large functional datasets collected on a fine grid. We show through simulations and real data analysis that the Bayesian functional graphical model can efficiently reconstruct the functionally-evolving graphical models by accounting for within-function correlations.

Submitted 11 August, 2021; originally announced August 2021.

arXiv:2106.10941 [pdf, other]

Tumor Radiogenomics with Bayesian Layered Variable Selection

Authors: Shariq Mohammed, Sebastian Kurtek, Karthik Bharath, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: We propose a statistical framework to integrate radiological magnetic resonance imaging (MRI) and genomic data to identify the underlying radiogenomic associations in lower grade gliomas (LGG). We devise a novel imaging phenotype by dividing the tumor region into concentric spherical layers that mimics the tumor evolution process. MRI data within each layer is represented by voxel-intensity-based probability density functions which capture the complete information about tumor heterogeneity. Under a Riemannian-geometric framework these densities are mapped to a vector of principal component scores which act as imaging phenotypes. Subsequently, we build Bayesian variable selection models for each layer with the imaging phenotypes as the response and the genomic markers as predictors. Our novel hierarchical prior formulation incorporates the interior-to-exterior structure of the layers, and the correlation between the genomic markers. We employ a computationally-efficient Expectation-Maximization-based strategy for estimation. Simulation studies demonstrate the superior performance of our approach compared to other approaches. With a focus on the cancer driver genes in LGG, we discuss some biologically relevant findings. Genes implicated with survival and oncogenesis are identified as being associated with the spherical layers, which could potentially serve as early-stage diagnostic markers for disease monitoring, prior to routine invasive approaches.

Submitted 21 June, 2021; originally announced June 2021.

arXiv:2104.00510 [pdf, other]

RADIOHEAD: Radiogenomic Analysis Incorporating Tumor Heterogeneity in Imaging Through Densities

Authors: Shariq Mohammed, Karthik Bharath, Sebastian Kurtek, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: Recent technological advancements have enabled detailed investigation of associations between the molecular architecture and tumor heterogeneity, through multi-source integration of radiological imaging and genomic (radiogenomic) data. In this paper, we integrate and harness radiogenomic data in patients with lower grade gliomas (LGG), a type of brain cancer, in order to develop a regression framework called RADIOHEAD (RADIOgenomic analysis incorporating tumor HEterogeneity in imAging through Densities) to identify radiogenomic associations. Imaging data is represented through voxel intensity probability density functions of tumor sub-regions obtained from multimodal magnetic resonance imaging, and genomic data through molecular signatures in the form of pathway enrichment scores corresponding to their gene expression profiles. Employing a Riemannian-geometric framework for principal component analysis on the set of probability density functions, we map each probability density to a vector of principal component scores, which are then included as predictors in a Bayesian regression model with the pathway enrichment scores as the response. Variable selection compatible with the grouping structure amongst the predictors induced through the tumor sub-regions is carried out under a group spike-and-slab prior. A Bayesian false discovery rate mechanism is then used to infer significant associations based on the posterior distribution of the regression coefficients. Our analyses reveal several pathways relevant to LGG etiology (such as synaptic transmission, nerve impulse and neurotransmitter pathways), to have significant associations with the corresponding imaging-based predictors.

Submitted 7 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

arXiv:2010.14638 [pdf, ps, other]

Bayesian Variable Selection in Multivariate Nonlinear Regression with Graph Structures

Authors: Yabo Niu, Nilabja Guha, Debkumar De, Anindya Bhadra, Veerabhadran Baladandayuthapani, Bani K. Mallick

Abstract: Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices. We develop a Bayesian method to incorporate covariate information in this GGMs setup in a nonlinear seemingly unrelated regression framework. We propose a joint predictor and graph selection model and develop an efficient collapsed Gibbs sampler algorithm to search the joint model space. Furthermore, we investigate its theoretical variable selection properties. We demonstrate our method on a variety of simulated data, concluding with a real data set from the TCPA project.

Submitted 30 July, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

arXiv:2004.12012 [pdf, other]

Integrative Bayesian models using Post-selective Inference: a case study in Radiogenomics

Authors: Snigdha Panigrahi, Shariq Mohammed, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: Integrative analyses based on statistically relevant associations between genomics and a wealth of intermediary phenotypes (such as imaging) provide vital insights into their clinical relevance in terms of the disease mechanisms. Estimates for uncertainty in the resulting integrative models are however unreliable unless inference accounts for the selection of these associations with accuracy. In this article, we develop selection-aware Bayesian methods which: (i) counteract the impact of model selection bias through a "selection-aware posterior" in a flexible class of integrative Bayesian models post a selection of promising variables via $\ell_1$-regularized algorithms; (ii) strike an inevitable tradeoff between the quality of model selection and inferential power when the same dataset is used for both selection and uncertainty estimation. Central to our methodological development, a carefully constructed conditional likelihood function deployed with a reparameterization mapping provides notably tractable updates when gradient-based MCMC sampling is used for estimating uncertainties from the selection-aware posterior. Applying our methods to a radiogenomic analysis, we successfully recover several important gene pathways and estimate uncertainties for their associations with patient survival times.

Submitted 12 August, 2022; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: 45 pages, 7 Figures

arXiv:2002.07122 [pdf, other]

Bayesian Structure Learning in Multi-layered Genomic Networks

Authors: Min Jin Ha, Francesco Stingo, Veerabhadran Baladandayuthapani

Abstract: Integrative network modeling of data arising from multiple genomic platforms provides insight into the holistic picture of the interactive system, as well as the flow of information across many disease domains including cancer. The basic data structure consists of a sequence of hierarchically ordered datasets for each individual subject, which facilitates integration of diverse inputs, such as genomic, transcriptomic, and proteomic data. A primary analytical task in such contexts is to model the layered architecture of networks where the vertices can be naturally partitioned into ordered layers, dictated by multiple platforms, and exhibit both undirected and directed relationships. We propose a multi-layered Gaussian graphical model (mlGGM) to investigate conditional independence structures in such multi-level genomic networks in human cancers. We implement a Bayesian node-wise selection (BANS) approach based on variable selection techniques that coherently accounts for the multiple types of dependencies in mlGGM; this flexible strategy exploits edge-specific prior knowledge and selects sparse and interpretable models. Through simulated data generated under various scenarios, we demonstrate that BANS outperforms other existing multivariate regression-based methodologies. Our integrative genomic network analysis for key signaling pathways across multiple cancer types highlights commonalities and differences of p53 integrative networks and epigenetic effects of BRCA2 on p53 and its interaction with T68 phosphorylated CHK2, that may have translational utilities of finding biomarkers and therapeutic targets.

Submitted 17 February, 2020; originally announced February 2020.

Comments: 39 pages with 8 figures and 1 table

arXiv:1811.05405 [pdf, ps, other]

doi 10.1093/bioinformatics/btz636

NExUS: Bayesian simultaneous network estimation across unequal sample sizes

Authors: Priyam Das, Christine Peterson, Kim-Anh Do, Rehan Akbani, Veerabhadran Baladandayuthapani

Abstract: Network-based analyses of high-throughput genomics data provide a holistic, systems-level understanding of various biological mechanisms for a common population. However, when estimating multiple networks across heterogeneous sub-populations, varying sample sizes pose a challenge in the estimation and inference, as network differences may be driven by differences in power. We are particularly interested in addressing this challenge in the context of proteomic networks for related cancers, as the number of subjects available for rare cancer (sub-)types is often limited. We develop NExUS (Network Estimation across Unequal Sample sizes), a Bayesian method that enables joint learning of multiple networks while avoiding artefactual relationship between sample size and network sparsity. We demonstrate through simulations that NExUS outperforms existing network estimation methods in this context, and apply it to learn network similarity and shared pathway activity for groups of cancers with related origins represented in The Cancer Genome Atlas (TCGA) proteomic data.

Submitted 6 November, 2018; originally announced November 2018.

Comments: 8 pages, 8 figures

arXiv:1810.03496 [pdf, other]

Regression Analyses of Distributions using Quantile Functional Regression

Authors: Hojin Yang, Veerabhadran Baladandayuthapani, Arvind U. K. Rao, Jeffrey S. Morris

Abstract: Radiomics involves the study of tumor images to identify quantitative markers explaining cancer heterogeneity. The predominant approach is to extract hundreds to thousands of image features, including histogram features comprised of summaries of the marginal distribution of pixel intensities, which leads to multiple testing problems and can miss out on insights not contained in the selected features. In this paper, we present methods to model the entire marginal distribution of pixel intensities via the quantile function as functional data, regressed on a set of demographic, clinical, and genetic predictors. We call this approach quantile functional regression, regressing subject-specific marginal distributions across repeated measurements on a set of covariates, allowing us to assess which covariates are associated with the distribution in a global sense, as well as to identify distributional features characterizing these differences, including mean, variance, skewness, and various upper and lower quantiles. To account for smoothness in the quantile functions, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set. We fit this model using a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and provides fully Bayesian inference after fitting a Markov chain Monte Carlo. We demonstrate the benefit of the basis space modeling through simulation studies, and apply the method to a magnetic resonance imaging (MRI) based radiomic dataset from Glioblastoma Multiforme to relate imaging-based quantile functions to demographic, clinical, and genetic predictors, finding specific differences in tumor pixel intensity distribution between males and females and between tumors with and without DDIT3 mutations.

Submitted 4 October, 2018; originally announced October 2018.

Comments: 83 pages, 32 figures. arXiv admin note: substantial text overlap with arXiv:1711.00031

arXiv:1802.08727 [pdf, other]

Bayesian Semiparametric Functional Mixed Models for Serially Correlated Functional Data, with Application to Glaucoma Data

Authors: Wonyul Lee, Michelle F. Miranda, Philip Rausch, Veerabhadran Baladandayuthapani, Massimo Fazio, J. Crawford Downs, Jeffrey S. Morris

Abstract: Glaucoma, a leading cause of blindness, is characterized by optic nerve damage related to intraocular pressure (IOP), but its full etiology is unknown. Researchers at UAB have devised a custom device to measure scleral strain continuously around the eye under fixed levels of IOP, which here is used to assess how strain varies around the posterior pole, with IOP, and across glaucoma risk factors such as age. The hypothesis is that scleral strain decreases with age, which could alter biomechanics of the optic nerve head and cause damage that could eventually lead to glaucoma. To evaluate this hypothesis, we adapted Bayesian Functional Mixed Models to model these complex data consisting of correlated functions on the spherical scleral surface, with nonparametric age effects allowed to vary in magnitude and smoothness across the scleral surface, multi-level random effect functions to capture within-subject correlation, and functional growth curve terms to capture serial correlation across IOPs that can vary around the scleral surface. Our method yields fully Bayesian inference on the scleral surface or any aggregation or transformation thereof, and reveals interesting insights into the biomechanical etiology of glaucoma. The general modeling framework described is very flexible and applicable to many complex, high-dimensional functional data.

Submitted 7 May, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

Comments: paper accepted in Journal of the American Statistical Association, 2018 -- to appear

arXiv:1711.00031 [pdf, other]

Quantile Functional Regression using Quantlets

Authors: Hojin Yang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris

Abstract: In this paper, we develop a quantile functional regression modeling framework that models the distribution of a set of common repeated observations from a subject through the quantile function, which is regressed on a set of covariates to determine how these factors affect various aspects of the underlying subject-specific distribution. To account for smoothness in the quantile functions, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set and containing a Gaussian subspace so non-Gaussianness can be assessed. While these quantlets could be used within various functional regression frameworks, we build a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and allows fully Bayesian inferences after fitting via Markov chain Monte Carlo. Specifically, we apply global tests to assess which covariates have any effect on the distribution at all, followed by local tests to identify at which specific quantiles the differences lie while adjusting for multiple testing, and to assess whether the covariate affects certain major aspects of the distribution, including location, scale, skewness, Gaussianness, or tails. If the difference lies in these commonly used summaries, our approach can still detect it, but our systematic modeling strategy can also detect effects on other aspects of the distribution that might be missed if one restricted attention to pre-chosen summaries. We demonstrate the benefit of the basis space modeling through simulation studies, and illustrate the method using a biomedical imaging data set in which we relate the distribution of pixel intensities from a tumor image to various demographic, clinical, and genetic characteristics.

Submitted 31 October, 2017; originally announced November 2017.

Comments: 41 pages, 8 figures

arXiv:1710.10713

Nonparametric Bayes Differential Analysis for Dependent Multigroup Data with Application to DNA Methylation Analyses in Cancer

Authors: Chiyu Gu, Veerabhadran Baladandayuthapani, Subharup Guha

Abstract: Modern cancer genomics datasets involve widely varying sizes and scales, measurement variables, and correlation structures. A fundamental analytical goal in these high-throughput studies is the development of general statistical techniques that can cleanly sift the signal from noise in identifying disease-specific genomic signatures across a set of experimental or biological conditions. We propose BayesDiff, a nonparametric Bayesian approach based on a novel class of first order mixture models, called the Sticky Poisson-Dirichlet process or multicuisine restaurant franchise. The BayesDiff methodology flexibly utilizes information from all the measurements and adaptively accommodates any serial dependence in the data, accounting for the inter-probe distances, to perform simultaneous inferences on the variables. The technique is applied to analyze a DNA methylation gastrointestinal (GI) cancer dataset, which displays both serial correlations and complex interaction patterns. Our analyses and results both support and complement known aspects of DNA methylation and gene association in upper GI cancers. In simulation studies, we demonstrate the effectiveness of the BayesDiff procedure relative to existing techniques for differential DNA methylation.

Submitted 10 April, 2022; v1 submitted 29 October, 2017; originally announced October 2017.

Comments: An article with overlapping content but different focus has been submitted, thus this article is no longer appropriate

arXiv:1702.01191 [pdf, other]

Radiologic Image-based Statistical Shape Analysis of Brain Tumors

Authors: Karthik Bharath, Sebastian Kurtek, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: We propose a curve-based Riemannian-geometric approach for general shape-based statistical analyses of tumors obtained from radiologic images. A key component of the framework is a suitable metric that (1) enables comparisons of tumor shapes, (2) provides tools for computing descriptive statistics and implementing principal component analysis on the space of tumor shapes, and (3) allows for a rich class of continuous deformations of a tumor shape. The utility of the framework is illustrated through specific statistical tasks on a dataset of radiologic images of patients diagnosed with glioblastoma multiforme, a malignant brain tumor with poor prognosis. In particular, our analysis discovers two patient clusters with very different survival, subtype and genomic characteristics. Furthermore, it is demonstrated that adding tumor shape information into survival models containing clinical and genomic variables results in a significant increase in predictive power.

Submitted 3 February, 2017; originally announced February 2017.

arXiv:1611.02480 [pdf, other]

Quantile Graphical Models: Bayesian Approaches

Authors: Nilabja Guha, Veera Baladandayuthapani, Bani K. Mallick

Abstract: Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings since they are based on Gaussian distribution assumptions. In this article, we propose a Bayesian quantile based approach for sparse estimation of graphs. We demonstrate that the resulting graph estimation is robust to outliers and applicable under general distributional assumptions. Furthermore, we develop efficient variational Bayes approximations to scale the methods for large data sets. Our methods are applied to a novel cancer proteomics dataset wherein multiple proteomic antibodies are simultaneously assessed on tumor samples using reverse-phase protein arrays (RPPA) technology.

Submitted 8 January, 2020; v1 submitted 8 November, 2016; originally announced November 2016.

arXiv:1604.03615 [pdf, other]

doi 10.1214/16-EJS1184

A Nonparametric Bayesian Technique for High-Dimensional Regression

Authors: Subharup Guha, Veerabhadran Baladandayuthapani

Abstract: This paper proposes a nonparametric Bayesian framework called VariScan for simultaneous clustering, variable selection, and prediction in high-throughput regression settings. Poisson-Dirichlet processes are utilized to detect lower-dimensional latent clusters of covariates. An adaptive nonlinear prediction model is constructed for the response, achieving a balance between model parsimony and flexibility. Contrary to conventional belief, cluster detection is shown to be a posteriori consistent for a general class of models as the number of covariates and subjects grows. Simulation studies and data analyses demonstrate that VariScan often outperforms several well-known statistical methods.

Submitted 12 April, 2016; originally announced April 2016.

Comments: arXiv admin note: substantial text overlap with arXiv:1407.5472

arXiv:1604.00376 [pdf, other]

Inferring network structure in non-normal and mixed discrete-continuous genomic data

Authors: Anindya Bhadra, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach.

Submitted 1 April, 2016; originally announced April 2016.

arXiv:1509.07535 [pdf, other]

Bayesian Nonparametric Graph Clustering

Authors: Sayantan Banerjee, Rehan Akbani, Veerabhadran Baladandayuthapani

Abstract: We present clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables. As opposed to standard approaches that assume known graph structures, we first estimate the edge structure of the unknown graph using Bayesian neighborhood selection approaches, wherein we account for the uncertainty of graphical structure learning through model-averaged estimates of the suitable parameters. Subsequently, we develop a nonparametric graph clustering model on the lower dimensional projections of the graph based on Laplacian embeddings using Dirichlet process mixture models. In contrast to standard algorithmic approaches, this fully probabilistic approach allows incorporation of uncertainty in estimation and inference for both graph structure learning and clustering. More importantly, we formalize the arguments for Laplacian embeddings as suitable projections for graph clustering by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. We develop fast computational algorithms that allow our methods to scale to a large number of nodes. Through extensive simulations we compare our clustering performance with standard clustering methods. We apply our methods to a novel pan-cancer proteomic data set, and evaluate protein networks and clusters across multiple different cancer types.

Submitted 24 September, 2015; originally announced September 2015.

arXiv:1508.02803 [pdf, other]

Bayesian Variable Selection with Structure Learning: Applications in Integrative Genomics

Authors: Suprateek Kundu, Minsuk Shin, Yichen Cheng, Ganiraju Manyam, Bani K. Mallick, Veera Baladandayuthapani

Abstract: Significant advances in biotechnology have allowed for simultaneous measurement of molecular data points across multiple genomic and transcriptomic levels from a single tumor/cancer sample. This has motivated systematic approaches to integrate multi-dimensional structured datasets since cancer development and progression is driven by numerous co-ordinated molecular alterations and the interactions between them. We propose a novel two-step Bayesian approach that combines a variable selection framework with integrative structure learning between multiple sources of data. The structure learning in the first step is accomplished through novel joint graphical models for heterogeneous (mixed scale) data allowing for flexible incorporation of prior knowledge. This structure learning subsequently informs the variable selection in the second step to identify groups of molecular features within and across platforms associated with outcomes of cancer progression. The variable selection strategy adjusts for collinearity and multiplicity, and also has theoretical justifications. We evaluate our methods through simulations and apply them to motivating genomic (DNA copy number and methylation) and transcriptomic (mRNA expression) data for assessing important markers associated with glioblastoma progression.

Submitted 11 August, 2015; originally announced August 2015.

arXiv:1407.5472

Nonparametric Variable Selection, Clustering and Prediction for High-Dimensional Regression

Authors: Subharup Guha, Veerabhadran Baladandayuthapani

Abstract: The development of parsimonious models for reliable inference and prediction of responses in high-dimensional regression settings is often challenging due to relatively small sample sizes and the presence of complex interaction patterns between a large number of covariates. We propose an efficient, nonparametric framework for simultaneous variable selection, clustering and prediction in high-throughput regression settings with continuous or discrete outcomes, called VariScan. The VariScan model utilizes the sparsity induced by Poisson-Dirichlet processes (PDPs) to group the covariates into lower-dimensional latent clusters consisting of covariates with similar patterns among the samples. The data are permitted to direct the choice of a suitable cluster allocation scheme, choosing between PDPs and their special case, a Dirichlet process. Subsequently, the latent clusters are used to build a nonlinear prediction model for the responses using an adaptive mixture of linear and nonlinear elements, thus achieving a balance between model parsimony and flexibility. We investigate theoretical properties of the VariScan procedure that differentiate the allocation patterns of PDPs and Dirichlet processes both in terms of the number and relative sizes of their clusters. Additional theoretical results guarantee the high accuracy of the model-based clustering procedure, and establish model selection and prediction consistency. Through simulation studies and analyses of benchmark data sets, we demonstrate the reliability of VariScan's clustering mechanism and show that the technique compares favorably to, and often outperforms, existing methodologies in terms of the prediction accuracies of the subject-specific responses.

Submitted 13 April, 2016; v1 submitted 21 July, 2014; originally announced July 2014.

Comments: Note: this version has been substantially revised; please see the new version of the article at the following link: [arXiv:1604.03615]

arXiv:1404.2910 [pdf, other]

Statistical Tests for Large Tree-structured Data

Authors: Karthik Bharath, Prabhanjan Kambadur, Dipak K. Dey, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton--Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as $\chi^2$ and $F$ random variables. We illustrate our methods on an important application of detecting tumour heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.

Submitted 20 September, 2016; v1 submitted 10 April, 2014; originally announced April 2014.

arXiv:1403.7672 [pdf, ps, other]

doi 10.1214/14-AOAS722

Bayesian sparse graphical models for classification with application to protein expression data

Authors: Veerabhadran Baladandayuthapani, Rajesh Talluri, Yuan Ji, Kevin R. Coombes, Yiling Lu, Bryan T. Hennessy, Michael A. Davies, Bani K. Mallick

Abstract: Reverse-phase protein array (RPPA) analysis is a powerful, relatively new platform that allows for high-throughput, quantitative analysis of protein networks. One of the challenges that currently limit the potential of this technology is the lack of methods that allow for accurate data modeling and identification of related networks and samples. Such models may improve the accuracy of biological sample classification based on patterns of protein network activation and provide insight into the distinct biological relationships underlying different types of cancer. Motivated by RPPA data, we propose a Bayesian sparse graphical modeling approach that uses selection priors on the conditional relationships in the presence of class information. The novelty of our Bayesian model lies in the ability to draw information from the network data as well as from the associated categorical outcome in a unified hierarchical model for classification. In addition, our method allows for intuitive integration of a priori network information directly in the model and allows for posterior inference on the network topologies both within and between classes. Applying our methodology to an RPPA data set generated from panels of human breast cancer and ovarian cancer cell lines, we demonstrate that the model is able to distinguish the different cancer cell types more accurately than several existing models and to identify differential regulation of components of a critical signaling network (the PI3K-AKT pathway) between these two types of cancer. This approach represents a powerful new tool that can be used to improve our understanding of protein networks in cancer.

Submitted 21 November, 2014; v1 submitted 29 March, 2014; originally announced March 2014.

Comments: Published at http://dx.doi.org/10.1214/14-AOAS722 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS722

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 3, 1443-1468

arXiv:1310.1127 [pdf, other]

Bayesian sparse graphical models and their mixtures using lasso selection priors

Authors: Rajesh Talluri, Veerabhadran Baladandayuthapani, Bani K. Mallick

Abstract: We propose Bayesian methods for Gaussian graphical models that lead to sparse and adaptively shrunk estimators of the precision (inverse covariance) matrix. Our methods are based on lasso-type regularization priors leading to parsimonious parameterization of the precision matrix, which is essential in several applications involving learning relationships among the variables. In this context, we introduce a novel type of selection prior that develops a sparse structure on the precision matrix by making most of the elements exactly zero, in addition to ensuring positive definiteness -- thus conducting model selection and estimation simultaneously. We extend these methods to finite and infinite mixtures of Gaussian graphical models for clustered data using Dirichlet process priors. We discuss appropriate posterior simulation schemes to implement posterior inference in the proposed models, including the evaluation of normalizing constants that are functions of parameters of interest which result from the restrictions on the correlation matrix. We evaluate the operating characteristics of our method via several simulations and in application to real data sets.

Submitted 3 October, 2013; originally announced October 2013.

Comments: under revision

arXiv:1308.3915 [pdf, other]

Bayes Regularized Graphical Model Estimation in High Dimensions

Authors: Suprateek Kundu, Veera Baladandayuthapani, Bani K. Mallick

Abstract: There has been an intense development of Bayes graphical model estimation approaches over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel approach suitable for high dimensional settings, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of continuous shrinkage priors on the precision matrix elements, which induces shrinkage under an equivalence with Cholesky-based regularization while enabling conjugate updates of entire precision matrices. Subsequently, we propose a post-fitting graphical model estimation step which proceeds using penalized joint credible regions to perform neighborhood selection sequentially for each node. The posterior computation proceeds using straightforward, fully Gibbs sampling, and the approach is scalable to high dimensions. The proposed approach is shown to be asymptotically consistent in estimating the graph structure for fixed $p$ when the truth is a Gaussian graphical model. Simulations show that our approach compares favorably with Bayesian competitors both in terms of graphical model estimation and computational efficiency. We apply our methods to high dimensional gene expression and microRNA datasets in cancer genomics.

Submitted 18 August, 2013; originally announced August 2013.

Comments: 42 Pages, 4 figures, 5 tables

arXiv:1108.3910 [pdf, ps, other]

doi 10.1214/10-AOAS407

Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data

Authors: Jeffrey S. Morris, Veerabhadran Baladandayuthapani, Richard C. Herrick, Pietro Sanna, Howard Gutstein

Abstract: Image data are increasingly encountered and are of growing importance in many areas of science. Much of these data are quantitative image data, which are characterized by intensities that represent some measurement of interest in the scanned images. The data typically consist of multiple images on the same domain and the goal of the research is to combine the quantitative information across images to make inference about populations or interventions. In this paper we present a unified analysis framework for the analysis of quantitative image data using a Bayesian functional mixed model approach. This framework is flexible enough to handle complex, irregular images with many local features, and can model the simultaneous effects of multiple factors on the image intensities and account for the correlation between images induced by the design. We introduce a general isomorphic modeling approach to fitting the functional mixed model, of which the wavelet-based functional mixed model is one special case. With suitable modeling choices, this approach leads to efficient calculations and can result in flexible modeling and adaptive smoothing of the salient features in the data. The proposed method has the following advantages: it can be run automatically, it produces inferential plots indicating which regions of the image are associated with each factor, it simultaneously considers the practical and statistical significance of findings, and it controls the false discovery rate.

Submitted 19 August, 2011; originally announced August 2011.

Comments: Published at http://dx.doi.org/10.1214/10-AOAS407 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS407

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 2A, 894-923

Showing 1–31 of 31 results for author: Baladandayuthapani, V