Semi-supervised empirical Bayes group-regularized factor regression

Authors: Magnus M. Münch, Mark A. van de Wiel, Aad W. van der Vaart, Carel F. W. Peeters

Abstract: The features in high dimensional biomedical prediction problems are often well described with lower dimensional manifolds. An example is genes that are organised in smaller functional networks. The outcome can then be described with the factor regression model. A benefit of the factor model is that it allows for straightforward inclusion of unlabeled observations in the estimation of the model, i.e., semi-supervised learning. In addition, the high dimensional features in biomedical prediction problems are often well characterised. Examples are genes, for which annotation is available, and metabolites with $p$-values from a previous study available. In this paper, the extra information on the features is included in the prior model for the features. The extra information is weighted and included in the estimation through empirical Bayes, with variational approximations to speed up the computation. The method is demonstrated in simulations and two applications. One application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.

Submitted 6 April, 2021; originally announced April 2021.

Comments: 19 pages, 5 figures, submitted to Biometrical Journal

arXiv:2010.05619 [pdf, other]

rags2ridges: A One-Stop-Shop for Graphical Modeling of High-Dimensional Precision Matrices

Authors: Carel F. W. Peeters, Anders Ellern Bilgrau, Wessel N. van Wieringen

Abstract: A graphical model is an undirected network representing the conditional independence properties between random variables. Graphical modeling has become part and parcel of systems or network approaches to multivariate data, in particular when the variable dimension exceeds the observation dimension. rags2ridges is an R package for graphical modeling of high-dimensional precision matrices. It provides a modular framework for the extraction, visualization, and analysis of Gaussian graphical models from high-dimensional data. Moreover, it can handle the incorporation of prior information as well as multiple heterogeneous data classes. As such, it provides a one-stop-shop for graphical modeling of high-dimensional precision matrices. The functionality of the package is illustrated with an example dataset pertaining to blood-based metabolite measurements in persons suffering from Alzheimer's Disease.

Submitted 12 October, 2020; originally announced October 2020.

Comments: 30 pages, 10 figures

arXiv:1909.11566 [pdf, other]

doi 10.1177/0049124110378099

A Note on a Simple and Practical Randomized Response Framework for Eliciting Sensitive Dichotomous & Quantitative Information

Authors: Carel F. W. Peeters, Gerty J. L. M. Lensvelt-Mulders, Karin Lasthuizen

Abstract: Many issues of interest to social scientists and policymakers are of a sensitive nature in the sense that they are intrusive, stigmatizing or incriminating to the respondent. This results in refusals to cooperate or evasive cooperation in studies using self-reports. In a seminal article Warner proposed to curb this problem by generating an artificial variability in responses to inoculate the individual meaning of answers to sensitive questions. This procedure was further developed and extended, and came to be known as the randomized response (RR) technique. Here, we propose a unified treatment for eliciting sensitive binary as well as quantitative information with RR based on a model where the inoculating elements are provided for by the randomization device. The procedure is simple and we will argue that its implementation in a computer-assisted setting may have superior practical capabilities.

Submitted 17 September, 2019; originally announced September 2019.

Comments: Postprint, 11 pages, 1 figure

Journal ref: Sociological Methods & Research, 39 (2010): 283-296
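The abstract above refers to Warner's original randomized response design. As a minimal sketch of that classic design only (not the unified framework the paper proposes), the Python below assumes each respondent is shown the sensitive statement with known probability p and its negation otherwise, and answers truthfully whether the shown statement applies. The probability of a "yes" is then p·π + (1 − p)(1 − π), which can be solved for the prevalence π. The function names and the simulation are illustrative assumptions.

```python
import random


def warner_estimate(yes_count, n, p):
    """Warner's moment estimator of the prevalence pi of a sensitive trait.

    P(yes) = p * pi + (1 - p) * (1 - pi), so
    pi = (P(yes) - (1 - p)) / (2 * p - 1), which requires p != 0.5.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p = 0.5 makes the prevalence unidentifiable")
    yes_rate = yes_count / n
    return (yes_rate - (1 - p)) / (2 * p - 1)


def simulate_warner(n, pi, p, seed=0):
    """Simulate n truthful respondents under Warner's design (hypothetical helper).

    Returns the number of 'yes' answers; individual answers do not reveal
    which statement a respondent was shown.
    """
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n):
        has_trait = rng.random() < pi
        shown_sensitive = rng.random() < p  # else the negated statement is shown
        answer_yes = has_trait if shown_sensitive else not has_trait
        yes_count += int(answer_yes)
    return yes_count


if __name__ == "__main__":
    n, true_pi, p = 20000, 0.2, 0.7
    yes = simulate_warner(n, true_pi, p)
    print(f"estimated prevalence: {warner_estimate(yes, n, p):.3f}")
```

The estimate can fall outside [0, 1] in small samples; truncating it to that interval is a common practical choice but is not part of the sketch.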

arXiv:1909.08022 [pdf, ps, other]

doi 10.1007/s11336-012-9259-3

Rotational Uniqueness Conditions Under Oblique Factor Correlation Metric

Authors: Carel F. W. Peeters

Abstract: In an addendum to his seminal 1969 article Jöreskog stated two sets of conditions for rotational identification of the oblique factor solution under utilization of fixed zero elements in the factor loadings matrix. These condition sets, formulated under factor correlation and factor covariance metrics, respectively, were claimed to be equivalent and to lead to global rotational uniqueness of the factor solution. It is shown here that the conditions for the oblique factor correlation structure need to be amended for global rotational uniqueness, and hence, that the condition sets are not equivalent in terms of unicity of the solution.

Submitted 17 September, 2019; originally announced September 2019.

Comments: Postprint, 5 pages

Journal ref: Psychometrika, 77 (2012): 288-292

arXiv:1909.07648 [pdf, other]

Social Network Analysis of Corruption Structures: Adjacency Matrices Supporting the Visualization and Quantification of Layeredness

Authors: Carel F. W. Peeters

Abstract: Often, corruption is described as taking place within or supported by a network: A collection of individuals structured in such a way as to enable the transaction of bribes for favors. Surprisingly, despite the network nomenclature, corruption is rarely analyzed from the network perspective using the tools of network science. Here, we will argue that analyzing corruption from the perspective of network science is beneficial to its understanding. In passing, this chapter, a contribution to the Liber Amicorum in honor of Leo Huberts, gives a very short introduction to social network analysis.

Submitted 17 September, 2019; originally announced September 2019.

Comments: 13 pages. Postprint of Chapter 20 in: G. de Graaf (Ed.). "It is all about integrity stupid: Studies on, about or inspired by the work of Leo Huberts." Eleven International Publishing, 2019: pp. 201-219

arXiv:1903.11696 [pdf, other]

Stable prediction with radiomics data

Authors: Carel F. W. Peeters, Caroline Übelhör, Steven W. Mes, Roland Martens, Thomas Koopman, Pim de Graaf, Floris H. P. van Velden, Ronald Boellaard, Jonas A. Castelijns, Dennis E. te Beest, Martijn W. Heymans, Mark A. van de Wiel

Abstract: Motivation: Radiomics refers to the high-throughput mining of quantitative features from radiographic images. It is a promising field in that it may provide a non-invasive solution for screening and classification. Standard machine learning classification and feature selection techniques, however, tend to display inferior performance in terms of (the stability of) predictive performance. This is due to the heavy multicollinearity present in radiomic data. We set out to provide an easy-to-use approach that deals with this problem. Results: We developed a four-step approach that projects the original high-dimensional feature space onto a lower-dimensional latent-feature space, while retaining most of the covariation in the data. It consists of (i) penalized maximum likelihood estimation of a redundancy filtered correlation matrix. The resulting matrix (ii) is the input for a maximum likelihood factor analysis procedure. This two-stage maximum-likelihood approach can be used to (iii) produce a compact set of stable features that (iv) can be directly used in any (regression-based) classifier or predictor. It outperforms other classification (and feature selection) techniques in both external and internal validation settings regarding survival in squamous cell cancers.

Submitted 27 March, 2019; originally announced March 2019.

Comments: 52 pages: 14 pages Main Text and 38 pages of Supplementary Material

arXiv:1805.00389 [pdf, other]

Adaptive group-regularized logistic elastic net regression

Authors: Magnus M. Münch, Carel F. W. Peeters, Aad W. van der Vaart, Mark A. van de Wiel

Abstract: In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (a) p-values from a previous study, (b) a summary of prior information, and (c) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection, but is not straightforward in the standard regression setting. As a solution to this problem, we propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical-variational Bayes framework. Simulations and an application to a colon cancer microRNA study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.

Submitted 1 May, 2018; originally announced May 2018.

Comments: 19 pages, 5 figures, supplementary material available from first author's personal website

arXiv:1801.01285 [pdf, ps, other]

doi 10.1007/978-0-387-09612-4_13

Inequality Constrained Multilevel Models

Authors: Bernet S. Kato, Carel F. W. Peeters

Abstract: Multilevel or hierarchical data structures can occur in many areas of research, including economics, psychology, sociology, agriculture, medicine, and public health. Over the last 25 years, there has been increasing interest in developing suitable techniques for the statistical analysis of multilevel data, and this has resulted in a broad class of models known under the generic name of multilevel models. Generally, multilevel models are useful for exploring how relationships vary across higher-level units taking into account the within and between cluster variations. Research scientists often have substantive theories in mind when evaluating data with statistical models. Substantive theories often involve inequality constraints among the parameters to translate a theory into a model. This chapter shows how the inequality constrained multilevel linear model can be given a Bayesian formulation, how the model parameters can be estimated using a so-called augmented Gibbs sampler, and how posterior probabilities can be computed to assist the researcher in model selection.

Submitted 4 January, 2018; originally announced January 2018.

Comments: 20 pages. Postprint of Chapter 13 in: H. Hoijtink, I. Klugkist, & P.A. Boelen (Eds.). "Bayesian Evaluation of Informative Hypotheses." New York: Springer, 2008: pp. 273-295

arXiv:1709.07285 [pdf, other]

doi 10.1016/j.dadm.2017.07.006

Blood-based metabolic signatures in Alzheimer's disease

Authors: Francisca A. de Leeuw, Carel F. W. Peeters, Maartje I. Kester, Amy C. Harms, Eduard A. Struys, Thomas Hankemeier, Herman W. T. van Vlijmen, Sven J. van der Lee, Cornelia M. van Duijn, Philip Scheltens, Ayşe Demirkan, Mark A. van de Wiel, Wiesje M. van der Flier, Charlotte E. Teunissen

Abstract: Introduction: Identification of blood-based metabolic changes might provide early and easy-to-obtain biomarkers. Methods: We included 127 AD patients and 121 controls with CSF-biomarker-confirmed diagnosis (cut-off tau/A$β_{42}$: 0.52). Mass spectrometry platforms determined the concentrations of 53 amine, 22 organic acid, 120 lipid, and 40 oxidative stress compounds. Multiple signatures were assessed: differential expression (nested linear models), classification (logistic regression), and regulatory (network extraction). Results: Twenty-six metabolites were differentially expressed. Metabolites improved the classification performance of clinical variables from 74% to 79%. Network models identified 5 hubs of metabolic dysregulation: Tyrosine, glycylglycine, glutamine, lysophosphatic acid C18:2 and platelet activating factor C16:0. The metabolite network for APOE $ε$4 negative AD patients was less cohesive compared to the network for APOE $ε$4 positive AD patients. Discussion: Multiple signatures point to various promising peripheral markers for further validation. The network differences in AD patients according to APOE genotype may reflect different pathways to AD.

Submitted 21 September, 2017; originally announced September 2017.

Comments: Postprint, 76 pages, 32 figures, includes supplementary material

Journal ref: Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, 8 (2017): 196-207

arXiv:1610.06762 [pdf, other]

doi 10.1016/j.dadm.2017.03.002

Detecting functional decline from normal ageing to dementia: development and validation of a short version of the Amsterdam IADL Questionnaire

Authors: Roos J. Jutten, Carel F. W. Peeters, Sophie M. J. Leijdesdorff, Pieter Jelle Visser, Andrea B. Maier, Caroline B. Terwee, Philip Scheltens, Sietske A. M. Sikkes

Abstract: INTRODUCTION: Detecting functional decline from normal ageing to dementia is relevant for diagnostic and prognostic purposes. Therefore, the Amsterdam IADL Questionnaire (A-IADL-Q) was developed: a 70-item proxy-based tool with good psychometric properties. We aimed to design a short version whilst preserving its psychometric quality. METHODS: Study partners of subjects (n=1355), ranging from cognitively normal to dementia subjects, completed the original A-IADL-Q. We selected the short version items using a stepwise procedure combining missing data, Item Response Theory and input from respondents and experts. We investigated internal consistency of the short version as well as concordance with the original version. To assess its construct validity, we additionally investigated concordance between the short version and the Mini-Mental State Examination (MMSE) and Disability Assessment for Dementia (DAD). Lastly, we investigated differences in IADL scores between diagnostic groups across the dementia spectrum. RESULTS: We selected 30 items covering the entire spectrum of IADL functioning. Internal consistency (.98) and concordance with the original version (.97) were very high. Concordance with the MMSE (.72) and DAD (.87) scores was high. IADL impairment scores increased across the spectrum from normal cognition to dementia. DISCUSSION: The A-IADL-Q Short Version (A-IADL-Q-SV) consists of 30 items. The A-IADL-Q-SV has maintained the psychometric quality of the original A-IADL-Q. As such, it is a concise measure of functional decline.

Submitted 21 March, 2017; v1 submitted 21 October, 2016; originally announced October 2016.

Comments: 14 pages, 3 tables, 4 figures

Journal ref: Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, 8 (2017): 26-35

arXiv:1609.02313 [pdf, other]

doi 10.1016/j.annepidem.2014.07.012

Pathophysiological Domains Underlying the Metabolic Syndrome: An Alternative Factor Analytic Strategy

Authors: Carel F. W. Peeters, James Dziura, Floryt van Wesel

Abstract: Purpose: Factor analysis (FA) has become part and parcel in metabolic syndrome (MBS) research. Both exploration- and confirmation-driven factor analyses are rampant. However, factor analytic results on MBS differ widely, a situation that is at least in part attributable to misapplication of FA. Here, our purpose is (i) to review factor analytic efforts in the study of MBS with emphasis on misuse of the FA model and (ii) to propose an alternative factor analytic strategy. Methods: The proposed factor analytic strategy consists of four steps and confronts weaknesses in application of the FA model. At its heart lies the explicit separation of dimensionality and pattern selection as well as the direct evaluation of competing inequality-constrained loading patterns. A high-profile MBS data set with anthropometric measurements on overweight children and adolescents is reanalyzed using this strategy. Results: The reanalysis implied a more parsimonious constellation of pathophysiological domains underlying phenotypic expressions of MBS than the original analysis (and many other analyses). The results emphasize correlated factors of impaired glucose metabolism and impaired lipid metabolism. Conclusions: Pathophysiological domains underlying phenotypic expressions of MBS included in the analysis are driven by multiple interrelated metabolic impairments. These findings indirectly point to the possible existence of a multifactorial aetiology.

Submitted 8 September, 2016; originally announced September 2016.

Comments: Postprint, 41 pages, includes supplementary material

Journal ref: Annals of Epidemiology, 24 (2014): 762-770

arXiv:1608.04123 [pdf, other]

doi 10.1007/s00180-019-00912-z

The Spectral Condition Number Plot for Regularization Parameter Determination

Authors: Carel F. W. Peeters, Mark A. van de Wiel, Wessel N. van Wieringen

Abstract: Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employs regularization to cope with the subsequent singularity of the sample covariance matrix. These estimators depend on a penalty parameter and choosing its value can be hard, in terms of being computationally unfeasible or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter selection. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators.

Submitted 14 August, 2016; originally announced August 2016.

Comments: 41 pages, 7 figures, includes supplementary material

Journal ref: Computational Statistics, 35(2):629-646, 2020
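To illustrate the quantity behind the abstract above, here is a minimal sketch that assumes the simplest ridge-type covariance estimator, S + λI, where S is the sample covariance matrix. This estimator is an illustrative assumption and is not necessarily one of the estimators treated in the paper. If S has eigenvalues d_1, ..., d_p, then S + λI has eigenvalues d_i + λ, so its spectral condition number is (d_max + λ) / (d_min + λ). The function name and the example eigenvalues are hypothetical.

```python
def ridge_condition_numbers(eigenvalues, penalties):
    """Spectral condition number of S + lambda * I over a grid of penalties.

    `eigenvalues` are those of the sample covariance matrix S (assumed to be
    precomputed and non-negative). With more variables than observations S is
    singular, so its smallest eigenvalue is 0 and the penalty alone keeps the
    condition number finite.
    """
    if not eigenvalues:
        raise ValueError("need at least one eigenvalue")
    d_max, d_min = max(eigenvalues), min(eigenvalues)
    if d_min < 0:
        raise ValueError("covariance eigenvalues must be non-negative")
    curve = []
    for lam in penalties:
        if lam <= 0:
            raise ValueError("penalties must be strictly positive")
        curve.append((lam, (d_max + lam) / (d_min + lam)))
    return curve


if __name__ == "__main__":
    # Hypothetical spectrum of a singular sample covariance matrix.
    spectrum = [4.0, 1.0, 0.0]
    for lam, cond in ridge_condition_numbers(spectrum, [0.01, 0.1, 1.0, 10.0]):
        print(f"lambda = {lam:>5}: condition number = {cond:.2f}")
```

Plotting these pairs shows the condition number falling as the penalty grows, which is the kind of trade-off a condition-number-based heuristic would inspect.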

arXiv:1603.05882 [pdf, other]

Bayesian Constrained-Model Selection for Factor Analytic Modeling

Authors: Carel F. W. Peeters

Abstract: My dissertation revolves around Bayesian approaches towards constrained statistical inference in the factor analysis (FA) model. Two interconnected types of restricted-model selection are considered. These types have a natural connection to selection problems in the exploratory FA (EFA) and confirmatory FA (CFA) model and are termed Type I and Type II model selection. Type I constrained-model selection is taken to mean the determination of the appropriate dimensionality of a model. This type of constrained-model selection connects with EFA in the sense of selecting the optimal dimensionality of the latent vector. Type II model selection is taken to mean the determination of appropriate inequality, order or shape restrictions on the parameter space. The dissertation connects Type II constrained-model selection to CFA by focusing on the determination of linear inequality constraints as expressions of the direction and (relative) strength of factor loadings. The figures accompanying this article are taken from the slides of my Division 5 Awards Symposium Invited address at the APA 2015 Annual Convention in Toronto. These slides can be retrieved from https://github.com/CFWP/ConventionTalk.

Submitted 12 April, 2016; v1 submitted 18 March, 2016; originally announced March 2016.

Comments: 8 pages, 3 figures; Preprint based on the first chapter of my unpublished PhD dissertation. Published version can be retrieved from URL: http://www.apadivisions.org/division-5/publications/score/2016/04/index.aspx, The Score, April 2016 Issue

arXiv:1509.07982 [pdf, other]

Targeted Fused Ridge Estimation of Inverse Covariance Matrices from Multiple High-Dimensional Data Classes

Authors: Anders Ellern Bilgrau, Carel F. W. Peeters, Poul Svante Eriksen, Martin Bøgsted, Wessel N. van Wieringen

Abstract: We consider the problem of jointly estimating multiple inverse covariance matrices from high-dimensional data consisting of distinct classes. An $\ell_2$-penalized maximum likelihood approach is employed. The suggested approach is flexible and generic, incorporating several other $\ell_2$-penalized estimators as special cases. In addition, the approach allows specification of target matrices through which prior knowledge may be incorporated and which can stabilize the estimation procedure in high-dimensional settings. The result is a targeted fused ridge estimator that is of use when the precision matrices of the constituent classes are believed to chiefly share the same structure while potentially differing in a number of locations of interest. It has many applications in (multi)factorial study designs. We focus on the graphical interpretation of precision matrices with the proposed estimator then serving as a basis for integrative or meta-analytic Gaussian graphical modeling. Situations are considered in which the classes are defined by data sets and subtypes of diseases. The performance of the proposed estimator in the graphical modeling setting is assessed through extensive simulation experiments. Its practical usability is illustrated by the differential network modeling of 12 large-scale gene expression data sets of diffuse large B-cell lymphoma subtypes. The estimator and its related procedures are incorporated into the R-package rags2ridges.

Submitted 26 March, 2020; v1 submitted 26 September, 2015; originally announced September 2015.

Comments: 52 pages, 11 figures

Journal ref: Journal of Machine Learning Research, 21(26):1-52, 2020

arXiv:1403.0904 [pdf, other]

doi 10.1016/j.csda.2016.05.012

Ridge Estimation of Inverse Covariance Matrices from High-Dimensional Data

Authors: Wessel N. van Wieringen, Carel F. W. Peeters

Abstract: We study ridge estimation of the precision matrix in the high-dimensional setting where the number of variables is large relative to the sample size. We first review two archetypal ridge estimators and note that their utilized penalties do not coincide with common ridge penalties. Subsequently, starting from a common ridge penalty, analytic expressions are derived for two alternative ridge estimators of the precision matrix. The alternative estimators are compared to the archetypes with regard to eigenvalue shrinkage and risk. The alternatives are also compared to the graphical lasso within the context of graphical modeling. The comparisons may give reason to prefer the proposed alternative estimators.

Submitted 24 September, 2015; v1 submitted 4 March, 2014; originally announced March 2014.

Journal ref: Computational Statistics & Data Analysis, 103 (2016): 284-303

Showing 1–15 of 15 results for author: Peeters, C F W