-
Pretraining and the Lasso
Authors:
Erin Craig,
Mert Pilanci,
Thomas Le Menestrel,
Balasubramanian Narasimhan,
Manuel Rivas,
Roozbeh Dehghannasiri,
Julia Salzman,
Jonathan Taylor,
Robert Tibshirani
Abstract:
Pretraining is a popular and powerful paradigm in machine learning. As an example, suppose one has a modest-sized dataset of images of cats and dogs, and plans to fit a deep neural network to classify them from the pixel features. With pretraining, we start with a neural network trained on a large corpus of images, consisting of not just cats and dogs but hundreds of other image types. Then we fix all of the network weights except for the top layer (which makes the final classification) and train (or "fine tune") those weights on our dataset. This often results in dramatically better performance than the network trained solely on our smaller dataset.
In this paper, we ask the question "Can pretraining help the lasso?". We develop a framework for the lasso in which an overall model is fit to a large set of data, and then fine-tuned to a specific task on a smaller dataset. This latter dataset can be a subset of the original dataset, but does not need to be. We find that this framework has a wide variety of applications, including stratified models, multinomial targets, multi-response models, conditional average treatment effect estimation and even gradient boosting.
In the stratified model setting, the pretrained lasso pipeline estimates the coefficients common to all groups at the first stage, and then group-specific coefficients at the second "fine-tuning" stage. We show that under appropriate assumptions, the support recovery rate of the common coefficients is superior to that of the usual lasso trained only on individual groups. This separate identification of common and individual coefficients can also be useful for scientific understanding.
Submitted 18 April, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
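The two-stage pipeline described for the stratified setting can be sketched with a plain coordinate-descent lasso. This is a minimal sketch under stated assumptions, not the authors' implementation: we assume the fine-tuning stage passes the overall prediction, scaled by (1 − α), as a fixed offset, and penalizes features outside the overall support more heavily (penalty factor 1/α). The helper names (`lasso_cd`, `pretrained_lasso`), the mixing parameter `alpha`, and the simulated data are all hypothetical.

```python
import random

def soft_threshold(z, t):
    """Lasso shrinkage operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, offset=None, penalty_factor=None, n_iter=200):
    """Cyclic coordinate descent for
    (1/2n) * ||y - offset - X b||^2 + lam * sum_j pf_j * |b_j|   (no intercept)."""
    n, p = len(X), len(X[0])
    offset = offset if offset is not None else [0.0] * n
    pf = penalty_factor if penalty_factor is not None else [1.0] * p
    b = [0.0] * p
    r = [y[i] - offset[i] for i in range(n)]  # residual at b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of x_j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * pf[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def pretrained_lasso(X, y, groups, lam, alpha):
    """Hypothetical two-stage sketch: overall fit, then per-group fine-tuning."""
    p = len(X[0])
    # Stage 1: one lasso on all rows estimates the shared coefficients.
    beta_all = lasso_cd(X, y, lam)
    support = {j for j, bj in enumerate(beta_all) if bj != 0.0}
    # Assumption: features outside the overall support get penalty factor 1 / alpha.
    pf = [1.0 if j in support else 1.0 / alpha for j in range(p)]
    beta_groups = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        Xg = [X[i] for i in idx]
        yg = [y[i] for i in idx]
        # Stage 2: the (1 - alpha)-scaled overall prediction enters as a fixed offset.
        off = [(1 - alpha) * sum(a * c for a, c in zip(X[i], beta_all)) for i in idx]
        beta_groups[g] = lasso_cd(Xg, yg, lam, offset=off, penalty_factor=pf)
    return beta_all, beta_groups

# Simulated example: x0 has a shared effect; x1 matters only in group 1.
random.seed(0)
n, p = 200, 5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
groups = [i % 2 for i in range(n)]
y = [2 * X[i][0] + (1.5 * X[i][1] if groups[i] == 1 else 0.0) + random.gauss(0, 0.1)
     for i in range(n)]
alpha = 0.5
beta_all, beta_groups = pretrained_lasso(X, y, groups, lam=0.1, alpha=alpha)
# Effective coefficient on x1 in each group = (1 - alpha) * shared + group-specific part.
eff_x1 = {g: (1 - alpha) * beta_all[1] + beta_groups[g][1] for g in beta_groups}
```

With α = 1 the offset vanishes and each group is fit on its own; smaller α leans more on the shared stage-1 model. Here the shared fit picks up x0, and only group 1 ends up with a substantial effective coefficient on x1.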
-
Cooperative learning for multiview analysis
Authors:
Daisy Yi Ding,
Shuangning Li,
Balasubramanian Narasimhan,
Robert Tibshirani
Abstract:
We propose a new method for supervised learning with multiple sets of features ("views"). The multiview problem is especially important in biology and medicine, where "-omics" data such as genomics, proteomics and radiomics are measured on a common set of samples. Cooperative learning combines the usual squared error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. One version of our fitting procedure is modular: one can choose different fitting mechanisms (e.g., lasso, random forests, boosting, neural networks) appropriate for different data views. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity. The method can be especially powerful when the different data views share some underlying relationship in their signals that can be exploited to boost the signals. We show that cooperative learning achieves higher predictive accuracy on simulated data and a real multiomics example of labor onset prediction. Leveraging aligned signals and allowing flexible fitting mechanisms for different modalities, cooperative learning offers a powerful approach to multiomics data fusion.
Submitted 3 September, 2022; v1 submitted 22 December, 2021;
originally announced December 2021.
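For two linear views, one way to see how the agreement penalty folds into an ordinary fit: assuming the objective 0.5·‖y − Xθ_x − Zθ_z‖² + (ρ/2)·‖Xθ_x − Zθ_z‖² (our assumed scaling), stacking the rows [X, Z] over [−√ρ X, √ρ Z] and padding y with zeros gives a plain least-squares problem with the same value. Any standard lasso solver run on the stacked data then fits the cooperative lasso, and ρ = 0 reduces to early fusion (regression on the concatenated features). The sketch below only checks that identity numerically; the function names and data are hypothetical.

```python
import math
import random

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def cooperative_objective(y, X, Z, theta_x, theta_z, rho):
    """0.5*||y - X theta_x - Z theta_z||^2 + 0.5*rho*||X theta_x - Z theta_z||^2"""
    fx, fz = matvec(X, theta_x), matvec(Z, theta_z)
    fit = 0.5 * sum((yi - a - b) ** 2 for yi, a, b in zip(y, fx, fz))
    agree = 0.5 * rho * sum((a - b) ** 2 for a, b in zip(fx, fz))
    return fit + agree

def augment(y, X, Z, rho):
    """Stack [X, Z] over [-sqrt(rho) X, sqrt(rho) Z]; pad y with zeros."""
    s = math.sqrt(rho)
    top = [list(xr) + list(zr) for xr, zr in zip(X, Z)]
    bottom = [[-s * v for v in xr] + [s * v for v in zr] for xr, zr in zip(X, Z)]
    return top + bottom, list(y) + [0.0] * len(y)

def least_squares_objective(A, b, theta):
    return 0.5 * sum((bi - fi) ** 2 for bi, fi in zip(b, matvec(A, theta)))

random.seed(1)
n, px, pz = 30, 3, 2
X = [[random.gauss(0, 1) for _ in range(px)] for _ in range(n)]
Z = [[random.gauss(0, 1) for _ in range(pz)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
theta_x, theta_z, rho = [0.5, -1.0, 0.2], [1.5, 0.3], 0.7

direct = cooperative_objective(y, X, Z, theta_x, theta_z, rho)
A, b = augment(y, X, Z, rho)
stacked = least_squares_objective(A, b, theta_x + theta_z)  # same value as `direct`
```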
-
Elastic Net Regularization Paths for All Generalized Linear Models
Authors:
J. Kenneth Tay,
Balasubramanian Narasimhan,
Trevor Hastie
Abstract:
The lasso and elastic net are popular regularized regression models for supervised learning. Friedman, Hastie, and Tibshirani (2010) introduced a computationally efficient algorithm for computing the elastic net regularization path for ordinary least squares regression, logistic regression and multinomial logistic regression, while Simon, Friedman, Hastie, and Tibshirani (2011) extended this work to Cox models for right-censored data. We further extend the reach of the elastic net-regularized regression to all generalized linear model families, Cox models with (start, stop] data and strata, and a simplified version of the relaxed lasso. We also discuss convenient utility functions for measuring the performance of these fitted models.
Submitted 5 March, 2021;
originally announced March 2021.
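The core of a regularization-path algorithm is cyclic coordinate descent over a decreasing grid of λ values, with each fit warm-started from the previous one. The sketch below covers only the Gaussian family, using the objective (1/2n)‖y − Xb‖² + λ[(1 − α)/2·‖b‖² + α‖b‖₁]; the general-GLM case wraps such an inner loop in iteratively reweighted least squares, which is omitted here. The λ grid (log-spaced down from λ_max, the smallest value at which all coefficients are zero), the function name and its parameters are illustrative assumptions, not glmnet's API.

```python
import random

def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def enet_path(X, y, alpha=0.5, n_lambda=20, ratio=1e-3, n_iter=50):
    """Gaussian elastic net path by coordinate descent with warm starts:
    (1/2n)||y - X b||^2 + lam * ((1 - alpha)/2 * ||b||^2 + alpha * ||b||_1)."""
    n, p = len(X), len(X[0])
    # Center y and the columns of X so that no intercept is needed.
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    Xc = [[X[i][j] - means[j] for j in range(p)] for i in range(n)]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    # lambda_max: every coefficient is exactly zero at this penalty level.
    lam_max = max(abs(sum(Xc[i][j] * yc[i] for i in range(n))) / n for j in range(p)) / alpha
    lambdas = [lam_max * ratio ** (k / (n_lambda - 1)) for k in range(n_lambda)]
    b, r, path = [0.0] * p, yc[:], []
    for lam in lambdas:
        for _ in range(n_iter):  # b and r carry over from the previous lambda
            for j in range(p):
                rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
                new_bj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
                delta = new_bj - b[j]
                if delta != 0.0:
                    for i in range(n):
                        r[i] -= Xc[i][j] * delta
                    b[j] = new_bj
        path.append((lam, b[:]))
    return path

random.seed(3)
n, p = 100, 6
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * X[i][0] - 2 * X[i][1] + random.gauss(0, 0.5) for i in range(n)]
path = enet_path(X, y)
```

Warm starts are the reason computing a whole path costs little more than a single fit: each solution begins very close to the next one along the grid.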
-
Multi-resolution Networks For Flexible Irregular Time Series Modeling (Multi-FIT)
Authors:
Bhanu Pratap Singh,
Iman Deznabi,
Bharath Narasimhan,
Bryon Kucharski,
Rheeya Uppaal,
Akhila Josyula,
Madalina Fiterau
Abstract:
Missing values, irregularly collected samples, and multi-resolution signals commonly occur in multivariate time series data, making predictive tasks difficult. These challenges are especially prevalent in the healthcare domain, where patients' vital signs and electronic records are collected at different frequencies and occasionally have missing information due to imperfections in equipment or patient circumstances. Researchers have handled each of these issues differently, often handling missing data through mean value imputation and then using sequence models over the multivariate signals while ignoring the different resolutions of the signals. We propose a unified model named Multi-resolution Flexible Irregular Time series Network (Multi-FIT). The building block for Multi-FIT is the FIT network. The FIT network creates an informative dense representation at each time step using signal information such as last observed value, time difference since the last observed time stamp and overall mean for the signal. Vertical FIT (FIT-V) is a variant of FIT which also models the relationship between different temporal signals while creating the informative dense representations for the signal. The Multi-FIT model uses multiple FIT networks for sets of signals with different resolutions, further facilitating the construction of flexible representations. Our model has three main contributions: (a) it does not impute values but rather creates informative representations to provide flexibility to the model for creating task-specific representations; (b) it models the relationship between different signals in the form of support signals; and (c) it models different resolutions in parallel before merging them for the final prediction task. The FIT, FIT-V and Multi-FIT networks improve upon the state-of-the-art models for three predictive tasks, including the forecasting of patient survival.
Submitted 30 April, 2019;
originally announced May 2019.
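The per-time-step inputs that FIT builds from an irregular signal (last observed value, time since the last observation, overall signal mean) can be computed without any imputation. This is a minimal sketch of that feature-construction step only, not the network itself; the function names, the `start` reference time, and the convention of falling back to the mean before the first observation are hypothetical choices. In practice the mean would come from training data to avoid look-ahead.

```python
def fit_step_features(observations, query_times, start=0.0):
    """observations: (timestamp, value) pairs for ONE signal, sorted by time.
    query_times: sorted time steps. Returns [last_value, time_since_last, mean] per step."""
    if not observations:
        raise ValueError("need at least one observation")
    mean = sum(v for _, v in observations) / len(observations)
    features = []
    k = -1  # index of the most recent observation at or before the query time
    for t in query_times:
        while k + 1 < len(observations) and observations[k + 1][0] <= t:
            k += 1
        if k >= 0:
            last_t, last_v = observations[k]
        else:
            # Hypothetical convention: before the first observation, use the mean
            # and measure elapsed time from `start`.
            last_t, last_v = start, mean
        features.append([last_v, t - last_t, mean])
    return features

def fit_inputs(signals, query_times):
    """Concatenate the per-signal features at each time step (dict order)."""
    per_signal = [fit_step_features(obs, query_times) for obs in signals.values()]
    return [sum((f[t] for f in per_signal), []) for t in range(len(query_times))]

heart_rate = [(1.0, 10.0), (4.0, 20.0)]
grid = [0.5, 1.0, 3.0, 5.0]
feats = fit_step_features(heart_rate, grid)
rows = fit_inputs({"hr": heart_rate, "temp": [(0.0, 37.0), (2.5, 38.0)]}, grid)
```

Because the time-since-last-observation feature is carried explicitly, a downstream network can learn how much to trust a stale value rather than having a fixed imputation rule decide for it.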
-
A Scalable Discrete-Time Survival Model for Neural Networks
Authors:
Michael F. Gensheimer,
Balasubramanian Narasimhan
Abstract:
There is currently great interest in applying neural networks to prediction tasks in medicine. It is important for predictive models to be able to use survival data, where each patient has a known follow-up time and event/censoring indicator. This avoids information loss when training the model and enables generation of predicted survival curves. In this paper, we describe a discrete-time survival model that is designed to be used with neural networks, which we refer to as Nnet-survival. The model is trained with the maximum likelihood method using minibatch stochastic gradient descent (SGD). The use of SGD enables rapid convergence and application to large datasets that do not fit in memory. The model is flexible, so that the baseline hazard rate and the effect of the input data on hazard probability can vary with follow-up time. It has been implemented in the Keras deep learning framework, and source code for the model and several examples is available online. We demonstrate the performance of the model on both simulated and real data and compare it to the existing models Cox-nnet and DeepSurv.
Submitted 19 November, 2018; v1 submitted 2 May, 2018;
originally announced May 2018.
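In a discrete-time survival model, follow-up is split into intervals and each interval j has a conditional hazard h_j (the probability of the event in that interval given survival to its start). A patient with an event in interval k contributes log h_k plus log(1 − h_j) for every earlier interval; the predicted survival curve is the running product of (1 − h_j). The sketch below computes that per-patient loss, which minibatch SGD would minimize. Assumptions: hazards are passed in directly instead of coming from a network, the breakpoints are hypothetical, and a censored patient is credited only with intervals completed before censoring (a simplification; the paper's censoring convention may differ).

```python
import bisect
import math

def discrete_time_nll(hazards, breaks, time, event):
    """Negative log-likelihood for one patient.
    hazards[j]: conditional event probability in [breaks[j], breaks[j+1])."""
    n_int = len(breaks) - 1
    k = bisect.bisect_right(breaks, time) - 1  # interval containing `time`
    if event:
        if not 0 <= k < n_int:
            raise ValueError("event time outside the follow-up grid")
        ll = sum(math.log(1 - h) for h in hazards[:k]) + math.log(hazards[k])
    else:
        # Simplifying assumption: survival credited only for completed intervals.
        done = min(max(k, 0), n_int)
        ll = sum(math.log(1 - h) for h in hazards[:done])
    return -ll

def survival_curve(hazards):
    """S(end of interval j) = prod_{i <= j} (1 - h_i)."""
    s, out = 1.0, []
    for h in hazards:
        s *= 1 - h
        out.append(s)
    return out

hazards = [0.5, 0.5, 0.5]
breaks = [0.0, 1.0, 2.0, 3.0]
nll_event = discrete_time_nll(hazards, breaks, time=1.5, event=True)
nll_censored = discrete_time_nll(hazards, breaks, time=2.0, event=False)
curve = survival_curve(hazards)
```

Because the loss decomposes over patients, it can be averaged over minibatches, which is what makes SGD training on data that do not fit in memory possible.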
-
Imputation of mixed data with multilevel singular value decomposition
Authors:
François Husson,
Julie Josse,
Balasubramanian Narasimhan,
Geneviève Robin
Abstract:
Statistical analysis of large data sets offers new opportunities to better understand many processes. Yet, data accumulation often implies relaxing acquisition procedures or compounding diverse sources. As a consequence, such data sets often contain mixed data, i.e., both quantitative and qualitative variables, and many missing values. Furthermore, aggregated data present a natural multilevel structure, where individuals or samples are nested within different sites, such as countries or hospitals. Imputation of multilevel data has therefore drawn some attention recently, but current solutions are not designed to handle mixed data, and suffer from important drawbacks such as their computational cost. In this article, we propose a single imputation method for multilevel data, which can be used to complete either quantitative, categorical or mixed data. The method is based on multilevel singular value decomposition (SVD), which consists of decomposing the variability of the data into two components, the between-group and within-group variability, and performing SVD on both parts. We show in a simulation study that, compared with competitors, the method has the advantages of handling data sets of various sizes and of being computationally faster. Furthermore, it is the first such method to handle mixed data. We apply the method to impute a medical data set resulting from the aggregation of several data sets coming from different hospitals. This application falls in the framework of a larger project on trauma patients. To overcome obstacles associated with the aggregation of medical data, we turn to distributed computation. The method is implemented in an R package.
Submitted 30 April, 2018;
originally announced April 2018.
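The between/within decomposition behind multilevel SVD imputation can be illustrated with an iterative scheme for quantitative data: fill missing cells, split the matrix into group-mean rows (between part) and deviations from them (within part), take a low-rank approximation of the within part, and overwrite only the missing cells with the reconstruction. This is a simplified sketch, not the authors' R package: it handles quantitative columns only, keeps the between part as raw group means instead of applying an SVD to it, uses a rank-1 within part computed by power iteration, and all names and data are hypothetical.

```python
import math

def decompose(X, groups):
    """Split X into a between-group part (group-mean rows) and within-group residuals."""
    n, p = len(X), len(X[0])
    gmeans = {}
    for g in set(groups):
        rows = [X[i] for i in range(n) if groups[i] == g]
        gmeans[g] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    between = [gmeans[groups[i]][:] for i in range(n)]
    within = [[X[i][j] - between[i][j] for j in range(p)] for i in range(n)]
    return between, within

def rank1(W, n_power=50):
    """Rank-1 approximation W v v^T, with v from power iteration on W^T W."""
    n, p = len(W), len(W[0])
    v = [1.0] * p
    for _ in range(n_power):
        u = [sum(W[i][j] * v[j] for j in range(p)) for i in range(n)]
        v = [sum(W[i][j] * u[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return [[0.0] * p for _ in range(n)]
        v = [x / norm for x in v]
    u = [sum(W[i][j] * v[j] for j in range(p)) for i in range(n)]
    return [[u[i] * v[j] for j in range(p)] for i in range(n)]

def impute_multilevel(X, groups, n_iter=100):
    """X: rows with None marking missing cells (every column needs an observed value)."""
    n, p = len(X), len(X[0])
    missing = [(i, j) for i in range(n) for j in range(p) if X[i][j] is None]
    col_means = []
    for j in range(p):
        obs = [X[i][j] for i in range(n) if X[i][j] is not None]
        col_means.append(sum(obs) / len(obs))
    Y = [[col_means[j] if X[i][j] is None else X[i][j] for j in range(p)] for i in range(n)]
    for _ in range(n_iter):
        between, within = decompose(Y, groups)
        low = rank1(within)
        for i, j in missing:  # observed cells are never overwritten
            Y[i][j] = between[i][j] + low[i][j]
    return Y

X = [[1.0, 2.0, 3.0], [2.0, 4.1, 6.0], [None, 1.0, 1.4],
     [5.0, 5.5, 6.0], [6.0, 7.1, 8.2], [4.0, 4.0, 4.1]]
groups = [0, 0, 0, 1, 1, 1]
Y = impute_multilevel(X, groups)
```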
-
CVXR: An R Package for Disciplined Convex Optimization
Authors:
Anqi Fu,
Balasubramanian Narasimhan,
Stephen Boyd
Abstract:
CVXR is an R package that provides an object-oriented modeling language for convex optimization, similar to CVX, CVXPY, YALMIP, and Convex.jl. It allows the user to formulate convex optimization problems in a natural mathematical syntax rather than the restrictive form required by most solvers. The user specifies an objective and set of constraints by combining constants, variables, and parameters using a library of functions with known mathematical properties. CVXR then applies signed disciplined convex programming (DCP) to verify the problem's convexity. Once verified, the problem is converted into standard conic form using graph implementations and passed to a cone solver such as ECOS or SCS. We demonstrate CVXR's modeling framework with several applications.
Submitted 29 June, 2020; v1 submitted 20 November, 2017;
originally announced November 2017.
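DCP verification works by propagating curvature labels bottom-up through the expression tree using a fixed rule set (sums, scaling by constants, and composition with atoms of known curvature and monotonicity). The toy checker below illustrates that idea; it is not CVXR's implementation or API, and every name in it is hypothetical. Unlike signed DCP, it ignores the sign of subexpressions, so it is more conservative: for example, it cannot certify sqrt(x²) even though that equals |x|, which is convex.

```python
# Curvature labels used by this toy checker.
CONST, AFFINE, CONVEX, CONCAVE, UNKNOWN = "constant", "affine", "convex", "concave", "unknown"

def add(a, b):
    """Curvature of a sum of two expressions."""
    if a == CONST:
        return b
    if b == CONST:
        return a
    if UNKNOWN in (a, b):
        return UNKNOWN
    if a == AFFINE:
        return b
    if b == AFFINE:
        return a
    return a if a == b else UNKNOWN  # convex + concave cannot be certified

def scale(c, a):
    """Curvature of c * expr for a numeric constant c."""
    if c == 0 or a == CONST:
        return CONST
    if c > 0 or a in (AFFINE, UNKNOWN):
        return a
    return CONCAVE if a == CONVEX else CONVEX

def compose(f_curv, f_mono, arg):
    """Composition rule for f(arg); f_curv is CONVEX or CONCAVE,
    f_mono is 'nondecreasing', 'nonincreasing' or 'none'."""
    if arg == CONST:
        return CONST
    if f_curv == CONVEX:
        if (arg == AFFINE or (arg == CONVEX and f_mono == "nondecreasing")
                or (arg == CONCAVE and f_mono == "nonincreasing")):
            return CONVEX
        return UNKNOWN
    if f_curv == CONCAVE:
        if (arg == AFFINE or (arg == CONCAVE and f_mono == "nondecreasing")
                or (arg == CONVEX and f_mono == "nonincreasing")):
            return CONCAVE
        return UNKNOWN
    return UNKNOWN

def is_dcp_problem(sense, objective, constraints=()):
    """constraints: ('<=', lhs, rhs) needs lhs - rhs convex; ('==', lhs, rhs) needs both affine."""
    for op, lhs, rhs in constraints:
        if op == "<=" and add(lhs, scale(-1, rhs)) not in (CONST, AFFINE, CONVEX):
            return False
        if op == "==" and (lhs not in (CONST, AFFINE) or rhs not in (CONST, AFFINE)):
            return False
    allowed = (CONST, AFFINE, CONVEX) if sense == "minimize" else (CONST, AFFINE, CONCAVE)
    return objective in allowed

x = AFFINE                                          # a variable
sq = compose(CONVEX, "none", x)                     # square(x)
exp_sq = compose(CONVEX, "nondecreasing", sq)       # exp(square(x))
sqrt_sq = compose(CONCAVE, "nondecreasing", sq)     # sqrt(square(x)): rejected
neg_sq = scale(-1, sq)                              # -square(x)
```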
-
Software for Distributed Computation on Medical Databases: A Demonstration Project
Authors:
Balasubramanian Narasimhan,
Daniel L. Rubin,
Samuel M. Gross,
Marina Bendersky,
Philip W. Lavori
Abstract:
Bringing together the information latent in distributed medical databases promises to personalize medical care by enabling reliable, stable modeling of outcomes with rich feature sets (including patient characteristics and treatments received). However, there are barriers to aggregation of medical data, due to lack of standardization of ontologies, privacy concerns, proprietary attitudes toward data, and a reluctance to give up control over end use. Aggregation of data is not always necessary for model fitting. In models based on maximizing a likelihood, the computations can be distributed, with aggregation limited to the intermediate results of calculations on local data, rather than raw data. Distributed fitting is also possible for singular value decomposition. There has been work on the technical aspects of shared computation for particular applications, but little has been published on the software needed to support the "social networking" aspect of shared computing, to reduce the barriers to collaboration. We describe a set of software tools that allow the rapid assembly of a collaborative computational project, based on the flexible and extensible R statistical software and other open source packages, that can work across a heterogeneous collection of database environments, with full transparency to allow local officials concerned with privacy protections to validate the safety of the method. We describe the principles, architecture, and successful test results for the site-stratified Cox model and rank-k Singular Value Decomposition (SVD).
Submitted 9 February, 2017; v1 submitted 22 December, 2014;
originally announced December 2014.
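The key point that "the computations can be distributed, with aggregation limited to the intermediate results" can be seen in Newton-Raphson fitting: the log-likelihood gradient and Hessian are sums over patients, so each site can send only its local sums and the coordinator adds them. The paper's examples are the site-stratified Cox model and rank-k SVD; the sketch below substitutes logistic regression with one covariate as a simpler stand-in, and the site sizes, parameters and function names are hypothetical. The distributed fit matches the pooled fit up to floating-point rounding.

```python
import math
import random

def local_stats(rows, beta):
    """Run at one site: log-likelihood gradient and Hessian for logistic regression
    with an intercept. Only these aggregates leave the site, never the raw rows."""
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        z = (1.0, x)
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        w = p * (1.0 - p)
        for a in range(2):
            g[a] += (y - p) * z[a]
            for b in range(2):
                H[a][b] -= w * z[a] * z[b]
    return g, H

def newton_fit(sites, n_iter=25):
    """Coordinator: sums per-site gradients/Hessians and takes Newton steps."""
    beta = [0.0, 0.0]
    for _ in range(n_iter):
        g, H = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
        for rows in sites:
            gs, Hs = local_stats(rows, beta)
            for a in range(2):
                g[a] += gs[a]
                for b in range(2):
                    H[a][b] += Hs[a][b]
        # Solve H * step = g (2x2 system); Newton update is beta - step.
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        step = [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                (-H[1][0] * g[0] + H[0][0] * g[1]) / det]
        beta = [beta[0] - step[0], beta[1] - step[1]]
    return beta

def simulate_site(n, b0=-0.5, b1=1.2):
    rows = []
    for _ in range(n):
        x = random.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        rows.append((x, 1 if random.random() < p else 0))
    return rows

random.seed(2)
sites = [simulate_site(120), simulate_site(80), simulate_site(100)]
beta_distributed = newton_fit(sites)
beta_pooled = newton_fit([[row for rows in sites for row in rows]])
```

A similar additivity underlies distributed SVD: the cross-product matrix XᵀX is the sum of the per-site matrices X_sᵀX_s.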
-
A New Approach to Designing Phase I-II Cancer Trials for Cytotoxic Chemotherapies
Authors:
Jay Bartroff,
Tze Leung Lai,
Balasubramanian Narasimhan
Abstract:
Recently there has been much work on early phase cancer designs that incorporate both toxicity and efficacy data, called Phase I-II designs because they combine elements of both phases. However, they do not explicitly address the Phase II hypothesis test of $H_0: p\le p_0$, where $p$ is the probability of efficacy at the estimated maximum tolerated dose (MTD) $\widehat{\eta}$ from Phase I and $p_0$ is the baseline efficacy rate. Standard practice for Phase II remains to treat $p$ as a fixed, unknown parameter and to use Simon's 2-stage design with all patients dosed at $\widehat{\eta}$. We propose a Phase I-II design that addresses the uncertainty in the estimate $p=p(\widehat{\eta})$ in $H_0$ by using sequential generalized likelihood theory. Combining this with a Phase I design that incorporates efficacy data, the Phase I-II design provides a common framework that can be used all the way from the first dose of Phase I through the final accept/reject decision about $H_0$ at the end of Phase II, utilizing both toxicity and efficacy data throughout. Efficient group sequential testing is used in Phase II that allows for early stopping to show treatment effect or futility. The proposed Phase I-II design thus removes the artificial barrier between Phase I and Phase II, and fulfills the objectives of searching for the MTD and testing if the treatment has an acceptable response rate to enter into a Phase III trial.
Submitted 11 February, 2014;
originally announced February 2014.
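For reference, the standard-practice baseline the abstract mentions, Simon's two-stage design, has operating characteristics that follow directly from binomial probabilities when p is treated as fixed (exactly the simplification the proposed design avoids). The sketch computes the probability of early termination (PET), the probability of rejecting H0, and the expected sample size; it does not implement the authors' sequential GLR design, and the design parameters below are hypothetical, not taken from Simon's tables.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))

def simon_two_stage(n1, r1, n, r, p):
    """Operating characteristics at a fixed response rate p.
    Stage 1: treat n1 patients; stop for futility if responses <= r1.
    Stage 2: treat n - n1 more; reject H0 (promising) if total responses > r."""
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)  # probability of early termination
    accept = pet + sum(binom_pmf(x1, n1, p) * binom_cdf(r - x1, n2, p)
                       for x1 in range(r1 + 1, min(n1, r) + 1))
    return {"PET": pet, "P_reject_H0": 1.0 - accept, "EN": n1 + (1.0 - pet) * n2}

# Hypothetical design (n1=10, r1=0, n=29, r=3) evaluated at two response rates.
at_null = simon_two_stage(10, 0, 29, 3, 0.1)
at_alt = simon_two_stage(10, 0, 29, 3, 0.3)
```

Evaluated at p₀ the rejection probability is the design's type I error; at the alternative it is the power.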