
Showing 1–27 of 27 results for author: van de Wiel, M A

  1. arXiv:2405.04917

    stat.ME stat.ML

    Guiding adaptive shrinkage by co-data to improve regression-based prediction and feature selection

    Authors: Mark A. van de Wiel, Wessel N. van Wieringen

    Abstract: The high dimensional nature of genomics data complicates feature selection, in particular in low sample size studies - not uncommon in clinical prediction settings. It is widely recognized that complementary data on the features, 'co-data', may improve results. Examples are prior feature groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availabili…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 19 pages, 7 figures. Including Supplementary Material

  2. arXiv:2311.09997

    stat.ML cs.LG

    Co-data Learning for Bayesian Additive Regression Trees

    Authors: Jeroen M. Goedhart, Thomas Klausch, Jurriaan Janssen, Mark A. van de Wiel

    Abstract: Medical prediction applications often need to deal with small sample sizes compared to the number of covariates. Such data pose problems for prediction and variable selection, especially when the covariate-response relationship is complicated. To address these challenges, we propose to incorporate co-data, i.e. external information on the covariates, into Bayesian additive regression trees (BART),…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 30 pages, 3 Figures, 2 Tables

  3. arXiv:2309.13998

    stat.ME stat.ML

    Linked shrinkage to improve estimation of interaction effects in regression models

    Authors: Mark A. van de Wiel, Matteo Amestoy, Jeroen Hoogland

    Abstract: We address a classical problem in statistics: adding two-way interaction terms to a regression model. As the covariate dimension increases quadratically, we develop an estimator that adapts well to this increase, while providing accurate estimates and appropriate inference. Existing strategies overcome the dimensionality problem by only allowing interactions between relevant main effects. Building…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 28 pages, 18 figures
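
The quadratic growth mentioned in item 3 can be made concrete: with p main effects there are p(p-1)/2 two-way interactions. The minimal NumPy sketch below only builds the expanded design matrix; the helper name `add_pairwise_interactions` is hypothetical, and the linked-shrinkage estimator proposed in the paper is not implemented.

```python
from itertools import combinations

import numpy as np


def add_pairwise_interactions(X: np.ndarray) -> np.ndarray:
    """Append all two-way products X[:, j] * X[:, k] (j < k) to the main effects."""
    n, p = X.shape
    pairs = list(combinations(range(p), 2))  # p * (p - 1) / 2 pairs
    inter = np.empty((n, len(pairs)))
    for col, (j, k) in enumerate(pairs):
        inter[:, col] = X[:, j] * X[:, k]
    return np.hstack([X, inter])


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))   # 50 samples, 10 main effects
Z = add_pairwise_interactions(X)
print(Z.shape)  # (50, 55): 10 main effects + 45 interactions
```

Doubling p to 20 would already give 190 interaction columns, which is the dimensionality problem the abstract refers to.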

  4. arXiv:2301.09890

    stat.ME stat.AP

    Think before you shrink: Alternatives to default shrinkage methods can improve prediction accuracy, calibration and coverage

    Authors: Mark A. van de Wiel, Gwenaël G. R. Leday, Jeroen Hoogland, Martijn W. Heymans, Erik W. van Zwet, Ailko H. Zwinderman

    Abstract: While shrinkage is essential in high-dimensional settings, its use for low-dimensional regression-based prediction has been debated. It reduces variance, often leading to improved prediction accuracy. However, it also inevitably introduces bias, which may harm two other measures of predictive performance: calibration and coverage of confidence intervals. Much of the criticism stems from the usage…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: 35 pages including Supplementary Information

  5. arXiv:2212.08581

    stat.ME stat.ML

    Penalised regression with multiple sources of prior effects

    Authors: Armin Rauschenberger, Zied Landoulsi, Mark A. van de Wiel, Enrico Glaab

    Abstract: In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provide an insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. We propose an approach…

    Submitted 16 December, 2022; originally announced December 2022.

  6. arXiv:2206.03825

    stat.ME stat.ML

    Estimation of Predictive Performance in High-Dimensional Data Settings using Learning Curves

    Authors: Jeroen M. Goedhart, Thomas Klausch, Mark A. van de Wiel

    Abstract: In high-dimensional prediction settings, it remains challenging to reliably estimate the test performance. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, is based on learning curves by fitting a smooth monotone curve depicting test performance as a function of the sample size. Learn2Evaluate has several advantages compared t…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: 19 pages, 2 figures, 2 tables
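
Item 6 describes estimating performance by fitting a smooth monotone curve of test performance against sample size. As a minimal sketch, the code below assumes an inverse power law, perf(n) = a - b·n^(-c), a common learning-curve shape; the curve family, the grid for c and the function name `fit_inverse_power_law` are illustrative assumptions, not Learn2Evaluate's actual choices. For fixed c the model is linear in (a, b), so each grid point needs only one least-squares solve.

```python
import numpy as np


def fit_inverse_power_law(n, perf, c_grid=None):
    """Fit perf(n) ~ a - b * n**(-c) by least squares over a grid of c values.

    For each fixed c the model is linear in (a, b), so a small least-squares
    solve suffices; the c with the smallest residual sum of squares is kept.
    With b >= 0 and c > 0 the curve increases monotonically in n
    (b >= 0 is not enforced here).
    """
    if c_grid is None:
        c_grid = np.linspace(0.05, 2.0, 40)  # assumed search range for c
    n = np.asarray(n, dtype=float)
    perf = np.asarray(perf, dtype=float)
    best = None
    for c in c_grid:
        A = np.column_stack([np.ones_like(n), -n ** (-c)])
        coef, *_ = np.linalg.lstsq(A, perf, rcond=None)
        rss = float(np.sum((A @ coef - perf) ** 2))
        if best is None or rss < best[0]:
            best = (rss, coef[0], coef[1], c)
    _, a, b, c = best
    return a, b, c


# Simulated performance estimates at increasing (sub)sample sizes
sizes = np.array([20, 40, 60, 80, 100])
rng = np.random.default_rng(1)
observed = 0.8 - 1.5 * sizes ** -0.5 + rng.normal(0.0, 0.005, sizes.size)
a, b, c = fit_inverse_power_law(sizes, observed)
print(a - b * 200 ** (-c))  # fitted curve evaluated at n = 200
```

Here `observed` is simulated; in practice the inputs would be test-performance estimates obtained on subsamples of increasing size.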

  7. arXiv:2205.07640

    stat.ME stat.ML

    ecpc: An R-package for generic co-data models for high-dimensional prediction

    Authors: Mirrelijn M. van Nee, Lodewyk F. A. Wessels, Mark A. van de Wiel

    Abstract: High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior information in the form of co-data, providing complementary data not on the samples, but on the variables. We consider adaptive ridge penalised generalised linear and Cox models, in which the variable…

    Submitted 16 May, 2022; originally announced May 2022.

  8. arXiv:2110.02649

    stat.ME

    A Bayesian accelerated failure time model for interval censored three-state screening outcomes

    Authors: Thomas Klausch, Eddymurphy U. Akwiwu, Mark A. van de Wiel, Veerle M. H. Coupe, Johannes Berkhof

    Abstract: Women infected by the human papillomavirus are at an increased risk of developing cervical intraepithelial neoplasia lesions (CIN). CIN are classified into three grades of increasing severity (CIN-1, CIN-2, and CIN-3) and can eventually develop into cervical cancer. The main purpose of screening is detecting CIN-2 and CIN-3 cases, which are usually surgically removed. Screening data from the POBASCAM…

    Submitted 3 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: 22 pages (Manuscript), 34 pages (Supplemental Material)

    MSC Class: 62N02

  9. arXiv:2104.02419

    stat.ME

    Semi-supervised empirical Bayes group-regularized factor regression

    Authors: Magnus M. Münch, Mark A. van de Wiel, Aad W. van der Vaart, Carel F. W. Peeters

    Abstract: The features in high dimensional biomedical prediction problems are often well described with lower dimensional manifolds. An example is genes that are organised in smaller functional networks. The outcome can then be described with the factor regression model. A benefit of the factor model is that it allows for straightforward inclusion of unlabeled observations in the estimation of the model, i.…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 19 pages, 5 figures, submitted to Biometrical Journal

  10. arXiv:2101.03875

    stat.ME stat.ML

    Fast marginal likelihood estimation of penalties for group-adaptive elastic net

    Authors: Mirrelijn M. van Nee, Tim van de Brug, Mark A. van de Wiel

    Abstract: Nowadays, clinical research routinely uses omics data, such as gene expression, for predicting clinical outcomes or selecting markers. Additionally, so-called co-data are often available, providing complementary information on the covariates, like p-values from previously published studies or groups of genes corresponding to pathways. Elastic net penalisation is widely used for prediction and cova…

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: 16 pages, 6 figures, 1 table

  11. arXiv:2005.09301

    stat.ME stat.CO stat.ML

    Fast cross-validation for multi-penalty ridge regression

    Authors: Mark A. van de Wiel, Mirrelijn M. van Nee, Armin Rauschenberger

    Abstract: High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data type specific penalties. The largest challenge for multi-penalty ridge is to optimize the…

    Submitted 1 April, 2021; v1 submitted 19 May, 2020; originally announced May 2020.
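
The data-type-specific penalties in item 11 correspond to a ridge estimator with a diagonal penalty matrix, beta = (X^T X + Lambda)^(-1) X^T y, where each diagonal entry of Lambda is the penalty of that column's data type. A minimal NumPy sketch for fixed penalties follows; the helper name `multi_penalty_ridge` and the column-to-type encoding `groups` are hypothetical, and the fast cross-validation scheme that is the paper's actual contribution is not reproduced.

```python
import numpy as np


def multi_penalty_ridge(X, y, groups, lambdas):
    """Ridge estimate with one penalty per data type (block of columns).

    Minimises ||y - X beta||^2 + sum_k lambda_k * ||beta_(k)||^2, where
    beta_(k) holds the coefficients of data type k; groups[j] is the
    data-type index of column j.
    """
    penalties = np.asarray(lambdas, dtype=float)[np.asarray(groups)]
    A = X.T @ X + np.diag(penalties)  # invertible when all penalties > 0
    return np.linalg.solve(A, X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 100))          # n = 40 samples, p = 100 variables
y = rng.normal(size=40)
groups = np.array([0] * 30 + [1] * 70)  # 30 columns of one data type, 70 of another
beta = multi_penalty_ridge(X, y, groups, lambdas=[10.0, 1.0])
```

With all penalties equal this reduces to ordinary ridge regression; a much larger penalty on one data type shrinks that block's coefficients towards zero.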

  12. arXiv:2005.04010

    stat.ME stat.ML

    Flexible co-data learning for high-dimensional prediction

    Authors: Mirrelijn M. van Nee, Lodewyk F. A. Wessels, Mark A. van de Wiel

    Abstract: Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data is high-dimensional, but additional information, like domain knowledge and previously published studies, may be helpful to improve predictions. Such complementary data, or co-data, provide information on the covariates, such as geno…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: Document consists of main content (20 pages, 10 figures) and supplementary material (14 pages, 13 figures)

  13. arXiv:1903.11696

    stat.ML cs.LG eess.IV q-bio.QM stat.AP stat.ME

    Stable prediction with radiomics data

    Authors: Carel F. W. Peeters, Caroline Übelhör, Steven W. Mes, Roland Martens, Thomas Koopman, Pim de Graaf, Floris H. P. van Velden, Ronald Boellaard, Jonas A. Castelijns, Dennis E. te Beest, Martijn W. Heymans, Mark A. van de Wiel

    Abstract: Motivation: Radiomics refers to the high-throughput mining of quantitative features from radiographic images. It is a promising field in that it may provide a non-invasive solution for screening and classification. Standard machine learning classification and feature selection techniques, however, tend to display inferior performance in terms of (the stability of) predictive performance. This is d…

    Submitted 27 March, 2019; originally announced March 2019.

    Comments: 52 pages: 14 pages Main Text and 38 pages of Supplementary Material

  14. arXiv:1902.02623

    stat.CO

    Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models

    Authors: Jurre R. Veerman, Gwenael G. R. Leday, Mark A. van de Wiel

    Abstract: For high-dimensional linear regression models, we review and compare several estimators of variances $τ^2$ and $σ^2$ of the random slopes and errors, respectively. These variances relate directly to ridge regression penalty $λ$ and heritability index $h^2$, often used in genetics. Direct and indirect estimators of these, either based on cross-validation (CV) or maximum marginal likelihood (MML), a…

    Submitted 7 February, 2019; originally announced February 2019.
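
The relation mentioned in item 14 can be written out under standard assumptions that are stated here rather than taken from the paper: random slopes $\beta_j \sim N(0, \tau^2)$ independent of errors $\epsilon \sim N(0, \sigma^2 I_n)$ in $y = X\beta + \epsilon$, a ridge criterion $\|y - X\beta\|^2 + \lambda\|\beta\|^2$ without $1/n$ scaling, and $p$ standardized (unit-variance) covariates. The paper's own parametrization may differ.

```latex
\begin{aligned}
\mathbb{E}[\beta \mid y] &= \bigl(X^\top X + \tfrac{\sigma^2}{\tau^2} I_p\bigr)^{-1} X^\top y
  \quad\Longrightarrow\quad \lambda = \frac{\sigma^2}{\tau^2}, \\
h^2 &= \frac{p\,\tau^2}{p\,\tau^2 + \sigma^2} = \frac{p}{p + \lambda}.
\end{aligned}
```

The first line is the Gaussian posterior mean, which coincides with the ridge estimate; the second follows by dividing numerator and denominator by $\tau^2$, so under these assumptions estimating $(\tau^2, \sigma^2)$ determines both $\lambda$ and $h^2$.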

  15. arXiv:1901.10217

    stat.ME

    Incorporating prior information and borrowing information in high-dimensional sparse regression using the horseshoe and variational Bayes

    Authors: Gino B. Kpogbezan, Mark A. van de Wiel, Wessel N. van Wieringen, Aad W. van der Vaart

    Abstract: We introduce a sparse high-dimensional regression approach that can incorporate prior information on the regression parameters and can borrow information across a set of similar datasets. Prior information may for instance come from previous studies or genomic databases, and information borrowed across a set of genes or genomic networks. The approach is based on prior modelling of the regression p…

    Submitted 29 January, 2019; originally announced January 2019.

  16. arXiv:1809.06679

    stat.ME stat.ML

    Estimating Bayesian Optimal Treatment Regimes for Dichotomous Outcomes using Observational Data

    Authors: Thomas Klausch, Peter van de Ven, Tim van de Brug, Mark A. van de Wiel, Johannes Berkhof

    Abstract: Optimal treatment regimes (OTR) are individualised treatment assignment strategies that identify a medical treatment as optimal given all background information available on the individual. We discuss Bayes optimal treatment regimes estimated using a loss function defined on the bivariate distribution of dichotomous potential outcomes. The proposed approach allows considering more general objectiv…

    Submitted 28 September, 2018; v1 submitted 18 September, 2018; originally announced September 2018.

    Comments: 30 pages, 8 figures

  17. arXiv:1805.09175

    stat.ME

    Detecting SNPs with interactive effects on a quantitative trait

    Authors: Armin Rauschenberger, Renee X. Menezes, Mark A. van de Wiel, Natasja M. van Schoor, Marianne A. Jonker

    Abstract: Here we propose a test to detect effects of single nucleotide polymorphisms (SNPs) on a quantitative trait. Significant SNP-SNP interactions are more difficult to detect than significant SNPs, partly due to the massive number of SNP-SNP combinations. We propose to move away from testing interaction terms, and move towards testing whether an individual SNP is involved in any interaction. This reduc…

    Submitted 23 May, 2018; originally announced May 2018.

  18. arXiv:1805.00389

    stat.ME

    Adaptive group-regularized logistic elastic net regression

    Authors: Magnus M. Münch, Carel F. W. Peeters, Aad W. van der Vaart, Mark A. van de Wiel

    Abstract: In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (a) p-values from a previous study, (b) a summary of prior information, and (c) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection, but is not straightforward in the s…

    Submitted 1 May, 2018; originally announced May 2018.

    Comments: 19 pages, 5 figures, supplementary material available from first author's personal website

  19. arXiv:1709.07285

    q-bio.NC q-bio.MN stat.AP

    Blood-based metabolic signatures in Alzheimer's disease

    Authors: Francisca A. de Leeuw, Carel F. W. Peeters, Maartje I. Kester, Amy C. Harms, Eduard A. Struys, Thomas Hankemeier, Herman W. T. van Vlijmen, Sven J. van der Lee, Cornelia M. van Duijn, Philip Scheltens, Ayşe Demirkan, Mark A. van de Wiel, Wiesje M. van der Flier, Charlotte E. Teunissen

    Abstract: Introduction: Identification of blood-based metabolic changes might provide early and easy-to-obtain biomarkers. Methods: We included 127 AD patients and 121 controls with CSF-biomarker-confirmed diagnosis (cut-off tau/A$β_{42}$: 0.52). Mass spectrometry platforms determined the concentrations of 53 amine, 22 organic acid, 120 lipid, and 40 oxidative stress compounds. Multiple signatures were as…

    Submitted 21 September, 2017; originally announced September 2017.

    Comments: Postprint, 76 pages, 32 figures, includes supplementary material

    Journal ref: Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, 8 (2017): 196-207

  20. arXiv:1709.04192

    stat.ME

    Learning from a lot: Empirical Bayes in high-dimensional prediction settings

    Authors: Mark A. van de Wiel, Dennis E. te Beest, Magnus Münch

    Abstract: Empirical Bayes is a versatile approach to 'learn from a lot' in two ways: first, from a large number of variables and second, from a potentially large amount of prior information, e.g. stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods including penalized regression, linear discriminant analysis, and B…

    Submitted 16 March, 2018; v1 submitted 13 September, 2017; originally announced September 2017.

  21. arXiv:1706.00641

    stat.AP

    Improved high-dimensional prediction with Random Forests by the use of co-data

    Authors: Dennis E. te Beest, Steven W. Mes, Ruud H. Brakenhoff, Mark A. van de Wiel

    Abstract: Prediction in high dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary "co-data" can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities (used to draw candidate variables, the default for a Random Forest) by c…

    Submitted 2 June, 2017; originally announced June 2017.

    Comments: 17 pages, 4 figures, to be published
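
Item 21 replaces the uniform probabilities with which a Random Forest draws candidate split variables by co-data-based probabilities. How those probabilities are derived from the co-data is cut off in the abstract above, so this sketch simply takes non-negative weights as given and shows only the per-node draw. The function `draw_candidate_variables` and the weights are illustrative assumptions; `mtry` is the conventional name for the number of candidates, and no forest is grown here.

```python
import numpy as np


def draw_candidate_variables(codata_weights, mtry, rng):
    """Draw `mtry` distinct candidate split variables for one tree node.

    A default Random Forest samples candidates uniformly; here each variable
    is drawn with probability proportional to its non-negative co-data weight.
    At least `mtry` weights must be positive, otherwise numpy raises an error.
    """
    w = np.asarray(codata_weights, dtype=float)
    probs = w / w.sum()
    return rng.choice(w.size, size=mtry, replace=False, p=probs)


rng = np.random.default_rng(0)
# Hypothetical co-data: variables 0-19 flagged as relevant get 4x the weight
weights = np.where(np.arange(100) < 20, 4.0, 1.0)
candidates = draw_candidate_variables(weights, mtry=10, rng=rng)
```

Setting all weights equal recovers the default uniform sampling, which is why this change slots into an otherwise standard forest.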

  22. The Spectral Condition Number Plot for Regularization Parameter Determination

    Authors: Carel F. W. Peeters, Mark A. van de Wiel, Wessel N. van Wieringen

    Abstract: Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employs regularization to cope with the subsequent singularity of the sample covariance matrix. These estimators depend on a penalty parameter and choosing its…

    Submitted 14 August, 2016; originally announced August 2016.

    Comments: 41 pages, 7 figures, includes supplementary material

    Journal ref: Computational Statistics, 35(2):629-646, 2020
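
For item 22, a minimal illustration of why the condition number is informative, assuming the simplest ridge-type estimator S + λI (the paper covers a broader class). When p > n the sample covariance S is singular, and for eigenvalues d_max and d_min of S the spectral condition number of S + λI is (d_max + λ)/(d_min + λ), which falls as λ grows. Evaluating it over a grid of λ values gives the quantity one would plot; the rule for choosing λ from such a plot is not stated in the truncated abstract and is not implemented. The function name is hypothetical.

```python
import numpy as np


def ridge_condition_numbers(S, lambdas):
    """Spectral condition number of S + lam * I for each lam in `lambdas`.

    For symmetric positive semi-definite S with largest eigenvalue d_max and
    smallest eigenvalue d_min >= 0, this equals (d_max + lam) / (d_min + lam).
    """
    d = np.clip(np.linalg.eigvalsh(S), 0.0, None)  # ascending; clip round-off negatives
    lambdas = np.asarray(lambdas, dtype=float)
    return (d[-1] + lambdas) / (d[0] + lambdas)


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))                # n = 10 observations, p = 50 variables
S = np.cov(X, rowvar=False)                  # 50 x 50, singular since p > n
lambdas = np.logspace(-3, 2, 30)
kappa = ridge_condition_numbers(S, lambdas)  # plot against log(lambdas) for the curve
```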

  23. arXiv:1605.07514

    stat.ME

    An empirical Bayes approach to network recovery using external knowledge

    Authors: Gino B. Kpogbezan, Aad W. van der Vaart, Wessel N. van Wieringen, Gwenaël G. R. Leday, Mark A. van de Wiel

    Abstract: Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowledge on the network topology. In the case of gene interaction networks such knowledge may come for instance from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian Simul…

    Submitted 24 May, 2016; originally announced May 2016.

  24. arXiv:1510.03771

    stat.ME

    Gene network reconstruction using global-local shrinkage priors

    Authors: Gwenaël G. R. Leday, Mathisca C. M. de Gunst, Gino B. Kpogbezan, Aad W. Van der Vaart, Wessel N. Van Wieringen, Mark A. Van de Wiel

    Abstract: Reconstructing a gene network from high-throughput molecular data is often a challenging task, as the number of parameters to estimate is easily much larger than the sample size. A conventional remedy is to regularize or penalize the model likelihood. In network models, this is often done locally in the neighbourhood of each node or gene. However, estimation of the many regularization parameters i…

    Submitted 13 October, 2015; originally announced October 2015.

    Comments: 27 pages, 5 figures

  25. arXiv:1411.3496

    stat.ME

    Better prediction by use of co-data: Adaptive group-regularized ridge regression

    Authors: Mark A. van de Wiel, Tonje G. Lien, Wina Verlaat, Wessel N. van Wieringen, Saskia M. Wilting

    Abstract: For many high-dimensional studies, additional information on the variables, like (genomic) annotation or external p-values, is available. In the context of binary and continuous prediction, we develop a method for adaptive group-regularized (logistic) ridge regression, which makes structural use of such 'co-data'. Here, 'groups' refer to a partition of the variables according to the co-data. We de…

    Submitted 18 May, 2015; v1 submitted 13 November, 2014; originally announced November 2014.

    Comments: 15 pages, 2 figures. Supplementary Information available on first author's web site

    MSC Class: 62J07

  26. Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines

    Authors: Gwenaël G. R. Leday, Aad W. van der Vaart, Wessel N. van Wieringen, Mark A. van de Wiel

    Abstract: DNA copy number and mRNA expression are widely used data types in cancer studies, which combined provide more insight than separately. Whereas in existing literature the form of the relationship between these two types of markers is fixed a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretation with sufficient flexi…

    Submitted 6 December, 2013; originally announced December 2013.

    Comments: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS605

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 2, 823-845

  27. A nonparametric control chart based on the Mann-Whitney statistic

    Authors: Subhabrata Chakraborti, Mark A. van de Wiel

    Abstract: Nonparametric or distribution-free charts can be useful in statistical process control problems when knowledge about the underlying process distribution is limited or lacking. In this paper, a phase II Shewhart-type chart is considered for location, based on reference data from phase I analysis and the well-known Mann-Whitney statistic. Control limits are computed using Lugannani-Rice-sadd…

    Submitted 15 May, 2008; originally announced May 2008.

    Comments: Published at http://dx.doi.org/10.1214/193940307000000112 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-COLL1-IMSCOLL112

    MSC Class: 62G30; 62-07; 62P30 (Primary)

    Journal ref: IMS Collections 2008, Vol. 1, 156-172
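
For item 27, a sketch of the charting statistic: each phase II sample is compared with the phase I reference sample through the Mann-Whitney count U. The paper computes control limits with a Lugannani-Rice-type approximation, which is not reproduced here. As a plainly swapped-in stand-in, the limits below use the normal approximation to the null distribution of U with an assumed 3-sigma width, ignoring ties and the dependence created by reusing one reference sample. Function names and the simulated data are hypothetical.

```python
import numpy as np


def mann_whitney_u(reference, sample):
    """U = #{(i, j): sample[j] > reference[i]} + 0.5 * #{ties}."""
    ref = np.asarray(reference, dtype=float)[:, None]
    new = np.asarray(sample, dtype=float)[None, :]
    return float(np.sum(new > ref) + 0.5 * np.sum(new == ref))


def normal_approx_limits(m, n, z=3.0):
    """Limits mean +/- z * sd of U when there is no shift (continuous data, no ties).

    Uses E[U] = m n / 2 and Var[U] = m n (m + n + 1) / 12; this is a normal
    approximation, NOT the method used in the paper.
    """
    mean = m * n / 2.0
    sd = np.sqrt(m * n * (m + n + 1) / 12.0)
    return mean - z * sd, mean + z * sd


rng = np.random.default_rng(0)
reference = rng.normal(size=100)  # simulated phase I data, m = 100
lo, hi = normal_approx_limits(m=100, n=5)
for sample in (rng.normal(size=5), rng.normal(loc=3.0, size=5)):
    u = mann_whitney_u(reference, sample)
    print(u, "signal" if not lo <= u <= hi else "in control")
```

Because U only counts pairwise orderings, the chart needs no assumption about the shape of the process distribution, which is the point of a distribution-free chart.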