Showing 1–23 of 23 results for author: Lock, E F

  1. arXiv:2308.16333  [pdf, other]

    stat.ME q-bio.QM stat.ML

    Multiple Augmented Reduced Rank Regression for Pan-Cancer Analysis

    Authors: Jiuzhou Wang, Eric F. Lock

    Abstract: Statistical approaches that successfully combine multiple datasets are more powerful, efficient, and scientifically informative than separate analyses. To address variation architectures correctly and comprehensively for high-dimensional data across multiple sample sets (i.e., cohorts), we propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization me…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 38 pages, 7 figures, 10 tables

  2. arXiv:2211.16403  [pdf, other]

    stat.ME stat.ML

    Bayesian Simultaneous Factorization and Prediction Using Multi-Omic Data

    Authors: Sarah Samorodnitsky, Chris H. Wendt, Eric F. Lock

    Abstract: Understanding of the pathophysiology of obstructive lung disease (OLD) is limited by available methods to examine the relationship between multi-omic molecular phenomena and clinical outcomes. Integrative factorization methods for multi-omic data can reveal latent patterns of variation describing important biological signal. However, most methods do not provide a framework for inference on the est…

    Submitted 29 November, 2022; v1 submitted 29 November, 2022; originally announced November 2022.

  3. arXiv:2208.03396  [pdf, other]

    stat.ME q-bio.QM stat.ML

    Bayesian predictive modeling of multi-source multi-way data

    Authors: Jonathan Kim, Brian J. Sandri, Raghavendra B. Rao, Eric F. Lock

    Abstract: We develop a Bayesian approach to predict a continuous or binary outcome from data that are collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure. As a motivating example we consider molecular data from multiple 'omics sources, each measured over multiple developmental time points, as predictors of early-life iron deficiency (ID) in a rhesus monkey model. We us…

    Submitted 5 August, 2022; originally announced August 2022.

    Comments: 23 pages, 2 figures, 13 tables

  4. arXiv:2110.05377  [pdf, other]

    stat.ME q-bio.QM stat.ML

    Multiway sparse distance weighted discrimination

    Authors: Bin Guo, Lynn E. Eberly, Pierre-Gilles Henry, Christophe Lenglet, Eric F. Lock

    Abstract: Modern data often take the form of a multiway array. However, most classification methods are designed for vectors, i.e., 1-way arrays. Distance weighted discrimination (DWD) is a popular high-dimensional classification method that has been extended to the multiway context, with dramatic improvements in performance when data have multiway structure. However, the previous implementation of multiway…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: 46 pages, 8 figures

  5. arXiv:2103.00629  [pdf, other]

    stat.AP

    A Hierarchical Spike-and-Slab Model for Pan-Cancer Survival Using Pan-Omic Data

    Authors: Sarah Samorodnitsky, Katherine A. Hoadley, Eric F. Lock

    Abstract: Pan-omics, pan-cancer analysis has advanced our understanding of the molecular heterogeneity of cancer, expanding what was known from single-cancer or single-omics studies. However, pan-cancer, pan-omics analyses have been limited in their ability to use information from multiple sources of data (e.g., omics platforms) and multiple sample sets (e.g., cancer types) to predict important clinical out…

    Submitted 28 February, 2021; originally announced March 2021.

  6. arXiv:2102.13278  [pdf, other]

    stat.ML cs.LG q-bio.QM stat.ME

    sJIVE: Supervised Joint and Individual Variation Explained

    Authors: Elise F. Palzer, Christine Wendt, Russell Bowler, Craig P. Hersh, Sandra E. Safo, Eric F. Lock

    Abstract: Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Recent methods have sought to uncover underlying structure and relationships within and/or between the data sources, and other methods have sought to build a predictive model for an outcome using all sources. However, existing methods that do both are…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: 23 pages, 8 tables, 3 figures

  7. arXiv:2010.03111  [pdf, other]

    stat.ME cs.LG stat.ML

    Bayesian Distance Weighted Discrimination

    Authors: Eric F. Lock

    Abstract: Distance weighted discrimination (DWD) is a linear discrimination method that is particularly well-suited for classification tasks with high-dimensional data. The DWD coefficients minimize an intuitive objective function, which can be solved very efficiently using state-of-the-art optimization techniques. However, DWD has not yet been cast into a model-based framework for statistical inference. In th…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 27 pages, 8 figures

  8. arXiv:2002.02601  [pdf, other]

    stat.ML cs.LG q-bio.QM stat.AP stat.ME

    Bidimensional linked matrix factorization for pan-omics pan-cancer analysis

    Authors: Eric F. Lock, Jun Young Park, Katherine A. Hoadley

    Abstract: Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogeneity beyond what was observed in single tumor and single platform studies. However, these studies…

    Submitted 7 April, 2022; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: 26 pages, 5 figures

    Journal ref: Annals of Applied Statistics 2022, Vol. 16, No. 1, 193-215

  9. arXiv:1910.07017  [pdf, other]

    stat.ME

    Bayesian variable selection in hierarchical difference-in-differences models

    Authors: James Normington, Eric F. Lock, Thomas A. Murray, Caroline S. Carlin

    Abstract: A popular method for estimating a causal treatment effect with observational data is the difference-in-differences (DiD) model. In this work, we consider an extension of the classical DiD setting to the hierarchical context in which data cannot be matched at the most granular level (e.g., individual-level differences are unobservable). We propose a Bayesian hierarchical difference-in-differences (…

    Submitted 15 October, 2019; originally announced October 2019.

  10. arXiv:1910.03447  [pdf, other]

    q-bio.QM stat.AP

    A Pan-Cancer and Polygenic Bayesian Hierarchical Model for the Effect of Somatic Mutations on Survival

    Authors: Sarah Samorodnitsky, Katherine A. Hoadley, Eric F. Lock

    Abstract: We built a novel Bayesian hierarchical survival model based on the somatic mutation profile of patients across 50 genes and 27 cancer types. The pan-cancer quality allows for the model to "borrow" information across cancer types, motivated by the assumption that similar mutation profiles may have similar (but not necessarily identical) effects on survival across different tissues-of-origin or tumo…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: 20 pages, 4 figures

  11. arXiv:1906.03722  [pdf, other]

    stat.ML cs.LG q-bio.QM stat.ME

    Integrative Factorization of Bidimensionally Linked Matrices

    Authors: Jun Young Park, Eric F. Lock

    Abstract: Advances in molecular "omics" technologies have motivated new methodology for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of b…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: 27 pages, 4 figures

    Journal ref: Biometrics, 2019

  12. arXiv:1901.11172  [pdf, other]

    stat.AP stat.ME stat.ML

    Bayesian nonparametric multiway regression for clustered binomial data

    Authors: Eric F. Lock, Dipankar Bandyopadhyay

    Abstract: We introduce a Bayesian nonparametric regression model for data with multiway (tensor) structure, motivated by an application to periodontal disease (PD) data. Our outcome is the number of diseased sites measured over four different tooth types for each subject, with subject-specific covariates available as predictors. The outcomes are not well-characterized by simple parametric models, so we use…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: 20 pages, 5 figures

  13. arXiv:1808.02193  [pdf, ps, other]

    stat.ME

    Generalized Integrative Principal Component Analysis for Multi-Type Data with Block-Wise Missing Structure

    Authors: Huichen Zhu, Gen Li, Eric F. Lock

    Abstract: High-dimensional multi-source data are encountered in many fields. Despite recent developments on the integrative dimension reduction of such data, most existing methods cannot easily accommodate data of multiple types (e.g., binary or count-valued). Moreover, multi-source data often have block-wise missing structure, i.e., data in one or more sources may be completely unobserved for a sample. The…

    Submitted 6 August, 2018; originally announced August 2018.

  14. Detecting Multiple Random Changepoints in Bayesian Piecewise Growth Mixture Models

    Authors: Eric F Lock, Nidhi Kohli, Maitreyee Bose

    Abstract: Piecewise growth mixture models (PGMM) are a flexible and useful class of methods for analyzing segmented trends in individual growth trajectory over time, where the individuals come from a mixture of two or more latent classes. These models allow each segment of the overall developmental process within each class to have a different functional form; examples include two linear phases of growth, o…

    Submitted 29 October, 2017; originally announced October 2017.

    Comments: 32 pages, 10 tables, 6 figures

    Journal ref: Psychometrika 83(3), 733-750, 2018

  15. arXiv:1710.02931  [pdf, other]

    stat.ME q-bio.QM

    Linked Matrix Factorization

    Authors: Michael J. O'Connell, Eric F. Lock

    Abstract: In recent years, a number of methods have been developed for the dimension reduction and decomposition of multiple linked high-content data matrices. Typically these methods assume that just one dimension, rows or columns, is shared among the data sources. This shared dimension may represent common features that are measured for different sample sets (i.e., horizontal integration) or a common set…

    Submitted 9 October, 2017; originally announced October 2017.

    Comments: 24 pages, 4 figures

    Journal ref: Biometrics 75 (2): 582-592, 2019

  16. arXiv:1704.02069  [pdf]

    q-bio.QM stat.AP

    Prediction with Dimension Reduction of Multiple Molecular Data Sources for Patient Survival

    Authors: Adam Kaplan, Eric F. Lock

    Abstract: Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal components analysis (PCA). However, the application of PCA is not straightforward for multi-source data, wherein multiple sources of 'omics data measure different but related biological components. In this article we utilize recent advances in the dimension reduction of multi-s…

    Submitted 17 July, 2017; v1 submitted 6 April, 2017; originally announced April 2017.

    Comments: 11 pages, 9 figures

    Journal ref: Cancer Informatics, 16: 1-11, 2017

  17. Tensor-on-tensor regression

    Authors: Eric F. Lock

    Abstract: We propose a framework for the linear prediction of a multi-way array (i.e., a tensor) from another multi-way array of arbitrary dimension, using the contracted tensor product. This framework generalizes several existing approaches, including methods to predict a scalar outcome from a tensor, a matrix from a matrix, or a tensor from a scalar. We describe an approach that exploits the multiway stru…

    Submitted 26 June, 2017; v1 submitted 4 January, 2017; originally announced January 2017.

    Comments: 33 pages, 3 figures

    Journal ref: Journal of Computational and Graphical Statistics 27 (3), 638-647, 2018

  18. arXiv:1609.03228  [pdf, other]

    stat.ME stat.ML

    Supervised multiway factorization

    Authors: Eric F. Lock, Gen Li

    Abstract: We describe a probabilistic PARAFAC/CANDECOMP (CP) factorization for multiway (i.e., tensor) data that incorporates auxiliary covariates, SupCP. SupCP generalizes the supervised singular value decomposition (SupSVD) for vector-valued observations, to allow for observations that have the form of a matrix or higher-order array. Such data are increasingly encountered in biomedical research and other…

    Submitted 31 March, 2018; v1 submitted 11 September, 2016; originally announced September 2016.

    Comments: 31 pages, 6 figures, 7 tables

    Journal ref: Electronic Journal of Statistics 2018, Vol. 12, No. 1, 1150-1180

  19. arXiv:1606.08046  [pdf, other]

    stat.ME q-bio.QM stat.ML

    Discriminating sample groups with multi-way data

    Authors: Tianmeng Lyu, Eric F. Lock, Lynn E. Eberly

    Abstract: High-dimensional linear classifiers, such as the support vector machine (SVM) and distance weighted discrimination (DWD), are commonly used in biomedical research to distinguish groups of subjects based on a large number of features. However, their use is limited to applications where a single vector of features is measured for each subject. In practice data are often multi-way, or measured over m…

    Submitted 26 June, 2016; originally announced June 2016.

    Comments: 25 pages, 5 figures, 2 tables

    Journal ref: Biostatistics 18(3), 434-450, 2017

  20. arXiv:1604.08654  [pdf, other]

    stat.ME q-bio.GN stat.AP

    Bayesian Genome- and Epigenome-wide Association Studies with Gene Level Dependence

    Authors: Eric F. Lock, David B. Dunson

    Abstract: High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with disease status. These genomic variables can naturally be grouped by the gene they encode, among other criteria. However, standard practice in such applications is in…

    Submitted 28 April, 2016; originally announced April 2016.

    Comments: 23 pages, 7 figures

    Journal ref: Biometrics 73(3), 1018-1028, 2017

  21. Shared kernel Bayesian screening

    Authors: Eric F. Lock, David B. Dunson

    Abstract: This article concerns testing for equality of distribution between groups. We focus on screening variables with shared distributional features such as common support, modes and patterns of skewness. We propose a Bayesian testing method using kernel mixtures, which improves performance by borrowing information across the different variables and groups through shared kernels and a common probability…

    Submitted 17 February, 2016; v1 submitted 1 November, 2013; originally announced November 2013.

    Comments: Author version of article published in Biometrika; 23 pages, 9 figures

    Journal ref: Biometrika 102(4), 829-842, 2015

  22. Bayesian Consensus Clustering

    Authors: Eric F. Lock, David B. Dunson

    Abstract: The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework…

    Submitted 28 February, 2013; originally announced February 2013.

    Comments: 32 pages, 13 figures

    Journal ref: Bioinformatics 29 (2013) 2610-2616

  23. arXiv:1102.4110  [pdf, ps, other]

    stat.ML stat.AP stat.ME

    Joint and individual variation explained (JIVE) for integrated analysis of multiple data types

    Authors: Eric F. Lock, Katherine A. Hoadley, J. S. Marron, Andrew B. Nobel

    Abstract: Research in several fields now requires the analysis of data sets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of…

    Submitted 28 May, 2013; v1 submitted 20 February, 2011; originally announced February 2011.

    Comments: Published at http://dx.doi.org/10.1214/12-AOAS597 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS597

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 1, 523-542