-
Sparse Learning and Class Probability Estimation with Weighted Support Vector Machines
Authors:
Liyun Zeng,
Hao Helen Zhang
Abstract:
Classification and probability estimation have broad applications in modern machine learning and data science, including biology, medicine, engineering, and computer science. The recent development of a class of weighted Support Vector Machines (wSVMs) has shown great value in robustly predicting class probabilities and classifications for various problems with high accuracy. The current framework is based on the $\ell^2$-norm regularized binary wSVMs optimization problem, which only works with dense features and performs poorly on sparse features with redundant noise in most real applications. The sparse learning process requires prescreening the important variables for each binary wSVM to accurately estimate pairwise conditional probabilities. In this paper, we propose novel wSVMs frameworks that incorporate automatic variable selection with accurate probability estimation for sparse learning problems. We develop efficient algorithms for effective variable selection by solving either the $\ell^1$-norm or elastic net regularized binary wSVMs optimization problems. The binary class probability is then estimated either by the $\ell^2$-norm regularized wSVMs framework with selected variables or by elastic net regularized wSVMs directly. The two-step approach of $\ell^1$-norm followed by $\ell^2$-norm wSVMs shows a great advantage in both automatic variable selection and reliable probability estimation, with the shortest computation time. The elastic net regularized wSVMs offer the best performance in terms of variable selection and probability estimation, with the additional advantage of variable grouping, at the cost of more computation time for high dimensional problems. The proposed wSVMs-based sparse learning methods have wide applications and can be further extended to $K$-class problems through ensemble learning.
Submitted 17 December, 2023;
originally announced December 2023.
-
Robust Brain MRI Image Classification with SIBOW-SVM
Authors:
Liyun Zeng,
Hao Helen Zhang
Abstract:
The majority of primary Central Nervous System (CNS) tumors in the brain are among the most aggressive diseases affecting humans. Early detection of brain tumor types, whether benign or malignant, glial or non-glial, is critical for cancer prevention and treatment, ultimately improving human life expectancy. Magnetic Resonance Imaging (MRI) stands as the most effective technique to detect brain tumors by generating comprehensive brain images through scans. However, human examination can be error-prone and inefficient due to the complexity, size, and location variability of brain tumors. Recently, automated classification techniques using machine learning (ML) methods, such as Convolutional Neural Networks (CNNs), have demonstrated significantly higher accuracy than manual screening, while maintaining low computational costs. Nonetheless, deep learning-based image classification methods, including CNNs, face challenges in estimating class probabilities without proper model calibration. In this paper, we propose a novel brain tumor image classification method, called SIBOW-SVM, which integrates the Bag-of-Features (BoF) model with SIFT feature extraction and weighted Support Vector Machines (wSVMs). This new approach effectively captures hidden image features, enabling the differentiation of various tumor types and accurate label predictions. Additionally, the SIBOW-SVM is able to estimate the probabilities of images belonging to each class, thereby providing high-confidence classification decisions. We have also developed scalable and parallelizable algorithms to facilitate the practical implementation of SIBOW-SVM for massive image collections. As a benchmark, we apply the SIBOW-SVM to a public data set of brain tumor MRI images containing four classes: glioma, meningioma, pituitary, and normal. Our results show that the new method outperforms state-of-the-art methods, including CNNs.
Submitted 15 November, 2023;
originally announced November 2023.
-
Boosting Nyström Method
Authors:
Keaton Hamm,
Zhaoying Lu,
Wenbo Ouyang,
Hao Helen Zhang
Abstract:
The Nyström method is an effective tool to generate low-rank approximations of large matrices, and it is particularly useful for kernel-based learning. To improve the standard Nyström approximation, ensemble Nyström algorithms compute a mixture of Nyström approximations which are generated independently based on column resampling. We propose a new family of algorithms, boosting Nyström, which iteratively generate multiple ``weak'' Nyström approximations (each using a small number of columns) in an adaptive sequence, where each approximation aims to compensate for the weaknesses of its predecessor, and then combine them to form one strong approximation. We demonstrate that our boosting Nyström algorithms can yield more efficient and accurate low-rank approximations to kernel matrices. Improvements over the standard and ensemble Nyström methods are illustrated by simulation studies and real-world data analysis.
Submitted 21 February, 2023;
originally announced February 2023.
-
Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation
Authors:
Liyun Zeng,
Hao Helen Zhang
Abstract:
Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently, a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for $K$-class problems (Wu, Zhang and Liu, 2010; Wang, Zhang and Wu, 2019), where $K$ is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in $K$. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in $K$. Though not the most computationally efficient, the OVA offers the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate finite sample performance.
Submitted 22 September, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
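As a rough illustration of the complexity claim in the abstract above, the sketch below counts how many binary subproblems each learning scheme would need for $K$ classes. The function names are hypothetical, and the count of $K-1$ problems for baseline learning is an assumption (one binary problem per non-baseline class); the abstract itself only states that the cost is linear in $K$.

```python
def pairwise_problem_count(k):
    """Pairwise coupling: one binary problem per unordered pair of classes."""
    return k * (k - 1) // 2


def baseline_problem_count(k):
    """Baseline learning (assumed): each non-baseline class vs. one baseline class."""
    return k - 1


def ova_problem_count(k):
    """One-vs-All: one binary problem per class against all the others."""
    return k


if __name__ == "__main__":
    for k in (3, 10, 50):
        print(k, pairwise_problem_count(k), baseline_problem_count(k), ova_problem_count(k))
```

Under these assumptions, the pairwise count grows quadratically in $K$ while the other two grow linearly, which is the gap the abstract refers to.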
-
Nonparametric Trace Regression in High Dimensions via Sign Series Representation
Authors:
Chanwoo Lee,
Lexin Li,
Hao Helen Zhang,
Miaoyan Wang
Abstract:
Learning of matrix-valued data has recently surged in a range of scientific and business applications. Trace regression is a widely used method to model effects of matrix predictors and has shown great success in matrix learning. However, nearly all existing trace regression solutions rely on two assumptions: (i) a known functional form of the conditional mean, and (ii) a global low-rank structure in the entire range of the regression function, both of which may be violated in practice. In this article, we relax these assumptions by developing a general framework for nonparametric trace regression models via structured sign series representations of high dimensional functions. The new model embraces both linear and nonlinear trace effects, and enjoys rank invariance to order-preserving transformations of the response. In the context of matrix completion, our framework leads to a substantially richer model based on what we coin as the "sign rank" of a matrix. We show that the sign series can be statistically characterized by weighted classification tasks. Based on this connection, we propose a learning reduction approach to learn the regression model via a series of classifiers, and develop a parallelizable computation algorithm to implement sign series aggregations. We establish the excess risk bounds, estimation error rates, and sample complexities. Our proposal provides a broad nonparametric paradigm for many important matrix learning problems, including matrix regression, matrix completion, multi-task learning, and compressed sensing. We demonstrate the advantages of our method through simulations and two applications, one on a brain connectivity study and the other on high-rank image completion.
Submitted 4 May, 2021;
originally announced May 2021.
-
Spatial Heterogeneity Automatic Detection and Estimation
Authors:
Xin Wang,
Zhengyuan Zhu,
Hao Helen Zhang
Abstract:
Spatial regression is widely used for modeling the relationship between a dependent variable and explanatory covariates. Oftentimes, the linear relationships vary across space, when some covariates have location-specific effects on the response. One fundamental question is how to detect the systematic variation in the model and identify which locations share common regression coefficients and which do not. Only a correct model structure can assure unbiased estimation of coefficients and valid inferences. In this work, we propose a new procedure, called Spatial Heterogeneity Automatic Detection and Estimation (SHADE), for automatically and simultaneously subgrouping and estimating covariate effects for spatial regression models. The SHADE employs a class of spatially weighted fusion-type penalties on all pairs of observations, with location-specific weights adaptively constructed using spatial information, to cluster coefficients into subgroups. Under certain regularity conditions, the SHADE is shown to be able to identify the true model structure with probability approaching one and estimate regression coefficients consistently. We develop an alternating direction method of multipliers (ADMM) algorithm to compute the SHADE efficiently. In numerical studies, we demonstrate empirical performance of the SHADE by using different choices of weights and compare their accuracy. The results suggest that spatial information can enhance subgroup structure analysis in challenging situations when the spatial variation among regression coefficients is small or the number of repeated measures is small. Finally, the SHADE is applied to find the relationship between a natural resource survey and a land cover data layer to identify spatially interpretable groups.
Submitted 16 December, 2020; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Robust regression for optimal individualized treatment rules
Authors:
Wei Xiao,
Hao Helen Zhang,
Wenbin Lu
Abstract:
Because different patients may respond quite differently to the same drug or treatment, there is increasing interest in discovering individualized treatment rules. In particular, people are eager to find the optimal individualized treatment rules, which if followed by the whole patient population would lead to the "best" outcome. In this paper, we propose new estimators based on robust regression with general loss functions to estimate the optimal individualized treatment rules. The new estimators possess the following nice properties: first, they are robust against skewed, heterogeneous, heavy-tailed errors or outliers; second, they are robust against misspecification of the baseline function; third, under certain situations, the new estimator coupled with pinball loss approximately maximizes the outcome's conditional quantile instead of conditional mean, which leads to a different optimal individualized treatment rule compared with traditional Q- and A-learning. Consistency and asymptotic normality of the proposed estimators are established. Their empirical performance is demonstrated via extensive simulation studies and an analysis of an AIDS data set.
Submitted 13 April, 2016;
originally announced April 2016.
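The abstract above links the pinball loss to quantiles rather than means. A minimal sketch of that link, not the paper's estimator: the constant that minimizes the total pinball loss over a sample is a sample quantile, so a single large outlier barely moves it. The helper `best_constant` is hypothetical and uses a brute-force search over the observed values.

```python
def pinball_loss(residual, tau):
    """Pinball (check) loss at quantile level tau for residual = y - prediction."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def best_constant(values, tau):
    """Return the observed value that minimizes the total pinball loss.

    Brute force over the distinct observed values; fine for tiny samples.
    """
    candidates = sorted(set(values))
    return min(candidates, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 100]  # 100 plays the role of an outlier
    print(best_constant(sample, 0.5))   # median-type fit
    print(sum(sample) / len(sample))    # the mean, for comparison
```

On this sample, the minimizer at `tau = 0.5` is the median, while the mean is pulled far toward the outlier.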
-
Model Selection for High Dimensional Quadratic Regression via Regularization
Authors:
Ning Hao,
Yang Feng,
Hao Helen Zhang
Abstract:
Quadratic regression (QR) models naturally extend linear models by considering interaction effects between the covariates. To conduct model selection in QR, it is important to maintain the hierarchical model structure between main effects and interaction effects. Existing regularization methods generally achieve this goal by solving complex optimization problems, which usually demand high computational cost and hence are not feasible for high dimensional data. This paper focuses on scalable regularization methods for model selection in high dimensional QR. We first consider two-stage regularization methods and establish theoretical properties of the two-stage LASSO. Then, a new regularization method, called Regularization Algorithm under Marginality Principle (RAMP), is proposed to compute a hierarchy-preserving regularization solution path efficiently. Both methods are further extended to solve generalized QR models. Numerical results are also shown to demonstrate the performance of the methods.
Submitted 14 July, 2016; v1 submitted 30 December, 2014;
originally announced January 2015.
-
A Note on High Dimensional Linear Regression with Interactions
Authors:
Ning Hao,
Hao Helen Zhang
Abstract:
The problem of interaction selection has recently attracted much attention in high dimensional data analysis. This note aims to address and clarify several fundamental issues in interaction selection for linear regression models, especially when the input dimension p is much larger than the sample size n. We first discuss issues such as a valid way of defining importance for the main effects and interaction effects, the invariance principle, and the strong heredity condition. Then we focus on two-stage methods, which are computationally attractive for large p problems but regarded as heuristic in the literature. We revisit the counterexample of Turlach (2004) and provide new insight to justify two-stage methods from a theoretical perspective. In the end, we suggest some new strategies for interaction selection under the marginality principle, followed by a numerical example.
Submitted 7 October, 2015; v1 submitted 22 December, 2014;
originally announced December 2014.
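The strong heredity condition mentioned in the abstract above is commonly stated as: an interaction between two covariates may enter the model only if both of its parent main effects are already in the model. A minimal sketch of that check follows, with hypothetical function and variable names; the weak-heredity variant (at least one parent present) is included only for contrast and is not taken from the abstract.

```python
def satisfies_strong_heredity(main_effects, interactions):
    """True if every interaction (j, k) has both parent main effects selected."""
    mains = set(main_effects)
    return all(j in mains and k in mains for j, k in interactions)


def satisfies_weak_heredity(main_effects, interactions):
    """True if every interaction (j, k) has at least one parent main effect selected."""
    mains = set(main_effects)
    return all(j in mains or k in mains for j, k in interactions)


if __name__ == "__main__":
    selected_mains = ["x1", "x2"]
    print(satisfies_strong_heredity(selected_mains, [("x1", "x2")]))  # both parents present
    print(satisfies_strong_heredity(selected_mains, [("x1", "x3")]))  # x3 is missing
    print(satisfies_weak_heredity(selected_mains, [("x1", "x3")]))    # x1 alone suffices
```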
-
Sparse and Efficient Estimation for Partial Spline Models with Increasing Dimension
Authors:
Guang Cheng,
Hao Helen Zhang,
Zuofeng Shang
Abstract:
We consider model selection and estimation for partial spline models and propose a new regularization method in the context of smoothing splines. The regularization method has a simple yet elegant form, consisting of roughness penalty on the nonparametric component and shrinkage penalty on the parametric components, which can achieve function smoothing and sparse estimation simultaneously. We establish the convergence rate and oracle properties of the estimator under weak regularity conditions. Remarkably, the estimated parametric components are sparse and efficient, and the nonparametric component can be estimated with the optimal rate. The procedure also has attractive computational properties. Using the representer theory of smoothing splines, we reformulate the objective function as a LASSO-type problem, enabling us to use the LARS algorithm to compute the solution path. We then extend the procedure to situations when the number of predictors increases with the sample size and investigate its asymptotic properties in that context. Finite-sample performance is illustrated by simulations.
Submitted 21 November, 2013; v1 submitted 31 October, 2013;
originally announced October 2013.
-
Variable selection for the multicategory SVM via adaptive sup-norm regularization
Authors:
Hao Helen Zhang,
Yufeng Liu,
Yichao Wu,
Ji Zhu
Abstract:
The Support Vector Machine (SVM) is a popular classification paradigm in machine learning and has achieved great success in real applications. However, the standard SVM cannot select variables automatically and therefore its solution typically utilizes all the input variables without discrimination. This makes it difficult to identify important predictor variables, which is often one of the primary goals in data analysis. In this paper, we propose two novel types of regularization in the context of the multicategory SVM (MSVM) for simultaneous classification and variable selection. The MSVM generally requires estimation of multiple discriminating functions and applies the argmax rule for prediction. For each individual variable, we propose to characterize its importance by the supnorm of its coefficient vector associated with different functions, and then minimize the MSVM hinge loss function subject to a penalty on the sum of supnorms. To further improve the supnorm penalty, we propose the adaptive regularization, which allows different weights to be imposed on different variables according to their relative importance. Both types of regularization automate variable selection in the process of building classifiers, and lead to sparse multi-classifiers with enhanced interpretability and improved accuracy, especially for high dimensional low sample size data. One big advantage of the supnorm penalty is its easy implementation via standard linear programming. Several simulated examples and one real gene data analysis demonstrate the outstanding performance of the adaptive supnorm penalty in various data settings.
Submitted 26 March, 2008;
originally announced March 2008.