Showing 1–6 of 6 results for author: Stephenson, W T

  1. arXiv:2107.09194

    stat.ML cs.LG stat.ME

    Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression

    Authors: William T. Stephenson, Zachary Frangella, Madeleine Udell, Tamara Broderick

    Abstract: Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Cross-validation (CV) is widely used for hyperparameter tuning in these models, but do practical optimization methods minimize the true out-of-sample loss? A recent line of research promises to show that the optimum of the CV loss matches the optimu…

    Submitted 1 November, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: Published in NeurIPS 2021

  2. arXiv:2106.06510

    stat.ML cs.LG stat.CO

    Measuring the robustness of Gaussian processes to kernel choice

    Authors: William T. Stephenson, Soumya Ghosh, Tin D. Nguyen, Mikhail Yurochkin, Sameer K. Deshpande, Tamara Broderick

    Abstract: Gaussian processes (GPs) are used to make medical and scientific decisions, including in cardiac care and monitoring of atmospheric carbon dioxide levels. Notably, the choice of GP kernel is often somewhat arbitrary. In particular, uncountably many kernels typically align with qualitative prior knowledge (e.g. function smoothness or stationarity). But in practice, data analysts choose among a han…

    Submitted 12 March, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: AISTATS 2022

  3. arXiv:2008.10547

    stat.ML cs.LG stat.CO stat.ME

    Approximate Cross-Validation with Low-Rank Data in High Dimensions

    Authors: William T. Stephenson, Madeleine Udell, Tamara Broderick

    Abstract: Many recent advances in machine learning are driven by a challenging trifecta: large data size $N$; high dimensions; and expensive algorithms. In this setting, cross-validation (CV) serves as an important tool for model assessment. Recent advances in approximate cross validation (ACV) provide accurate approximations to CV with only a single model fit, avoiding traditional CV's requirement for repe…

    Submitted 1 November, 2022; v1 submitted 24 August, 2020; originally announced August 2020.

    Comments: Published in NeurIPS 2020

    Journal ref: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

  4. arXiv:2006.12669

    stat.ML cs.LG stat.CO stat.ME

    Approximate Cross-Validation for Structured Models

    Authors: Soumya Ghosh, William T. Stephenson, Tin D. Nguyen, Sameer K. Deshpande, Tamara Broderick

    Abstract: Many modern data analyses benefit from explicitly modeling dependence structure in data -- such as measurements across time or space, ordered words in a sentence, or genes in a genome. A gold standard evaluation technique is structured cross-validation (CV), which leaves out some data subset (such as data within a time interval or data in a geographic region) in each fold. But CV here can be prohi…

    Submitted 1 December, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 25 pages, 8 figures. NeurIPS 2020 camera ready. v2 fixes typos and provides additional empirical results. Code: https://github.com/SoumyaTGhosh/structured-infinitesimal-jackknife

  5. arXiv:1905.13657

    stat.ML cs.LG stat.ME

    Approximate Cross-Validation in High Dimensions with Guarantees

    Authors: William T. Stephenson, Tamara Broderick

    Abstract: Leave-one-out cross-validation (LOOCV) can be particularly accurate among cross-validation (CV) variants for machine learning assessment tasks -- e.g., assessing methods' error or variability. But it is expensive to re-fit a model $N$ times for a dataset of size $N$. Previous work has shown that approximations to LOOCV can be both fast and accurate -- when the unknown parameter is of small, fixed…

    Submitted 22 June, 2020; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: Accepted to AISTATS 2020. 33 pages, 8 figures

  6. arXiv:1811.11790

    q-bio.QM cs.LG stat.ML

    Reconstructing probabilistic trees of cellular differentiation from single-cell RNA-seq data

    Authors: Miriam Shiffman, William T. Stephenson, Geoffrey Schiebinger, Jonathan Huggins, Trevor Campbell, Aviv Regev, Tamara Broderick

    Abstract: Until recently, transcriptomics was limited to bulk RNA sequencing, obscuring the underlying expression patterns of individual cells in favor of a global average. Thanks to technological advances, we can now profile gene expression across thousands or millions of individual cells in parallel. This new type of data has led to the intriguing discovery that individual cell profiles can reflect the im…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: 18 pages, 6 figures. Preliminary work appeared in the 2017 NeurIPS workshops in Advances in Approximate Bayesian Inference (http://approximateinference.org/2017) and Machine Learning for Computational Biology (https://mlcb.github.io)