-
Bayesian Transfer Learning
Authors:
Piotr M. Suder,
Jason Xu,
David B. Dunson
Abstract:
Transfer learning is a burgeoning concept in statistical machine learning that seeks to improve inference and/or predictive accuracy on a domain of interest by leveraging data from related domains. While the term "transfer learning" has garnered much recent interest, its foundational principles have existed for years under various guises. Prior literature reviews in computer science and electrical engineering have sought to bring these ideas into focus, primarily surveying general methodologies and works from these disciplines. This article highlights Bayesian approaches to transfer learning, which have received relatively limited attention despite their innate compatibility with the notion of drawing upon prior knowledge to guide new learning tasks. Our survey encompasses a wide range of Bayesian transfer learning frameworks applicable to a variety of practical settings. We discuss how these methods address the problem of finding the optimal information to transfer between domains, which is a central question in transfer learning. We illustrate the utility of Bayesian transfer learning methods via a simulation study where we compare performance against frequentist competitors.
Submitted 20 December, 2023;
originally announced December 2023.
-
Direct covariance matrix estimation with compositional data
Authors:
Aaron J. Molstad,
Karl Oskar Ekvall,
Piotr M. Suder
Abstract:
Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject's gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest indirectly, our estimator is direct, ensures positive definiteness, and is the solution to a convex optimization problem. We compute our estimator using a proximal-proximal gradient descent algorithm. Asymptotic properties of our estimator reveal that it can perform well in high-dimensional settings. Through simulation studies, we demonstrate that our estimator can outperform existing estimators. We show that our method provides more reliable estimates than competitors in an analysis of microbiome data from subjects with chronic fatigue syndrome.
Submitted 24 April, 2024; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Scalable algorithms for semiparametric accelerated failure time models in high dimensions
Authors:
Piotr M. Suder,
Aaron J. Molstad
Abstract:
Semiparametric accelerated failure time (AFT) models are a useful alternative to Cox proportional hazards models, especially when the assumption of constant hazard ratios is untenable. However, rank-based criteria for fitting AFT models are often non-differentiable, which poses a computational challenge in high-dimensional settings. In this article, we propose a new alternating direction method of multipliers algorithm for fitting semiparametric AFT models by minimizing a penalized rank-based loss function. Our algorithm scales well in both the number of subjects and the number of predictors, and it can easily accommodate a wide range of popular penalties. To improve the selection of tuning parameters, we propose a new criterion which avoids some common problems in cross-validation with censored responses. Through extensive simulation studies, we show that our algorithm and software are much faster than existing methods (which can only be applied to special cases), and we show that estimators which minimize a penalized rank-based criterion often outperform alternative estimators which minimize penalized weighted least squares criteria. Application to nine cancer datasets further demonstrates that rank-based estimators of semiparametric AFT models are competitive with estimators assuming a proportional hazards model in high-dimensional settings, whereas weighted least squares estimators are often not. A software package implementing the algorithm, along with a set of auxiliary functions, is available for download at github.com/ajmolstad/penAFT.
Submitted 18 January, 2022; v1 submitted 4 April, 2021;
originally announced April 2021.