-
Guaranteed Tensor Recovery Fused Low-rankness and Smoothness
Authors:
Hailin Wang,
Jiangjun Peng,
Wen** Qin,
Jianjun Wang,
Deyu Meng
Abstract:
The tensor data recovery task has thus attracted much research attention in recent years. Solving such an ill-posed problem generally requires to explore intrinsic prior structures underlying tensor data, and formulate them as certain forms of regularization terms for guiding a sound estimate of the restored tensor. Recent research have made significant progress by adopting two insightful tensor p…
▽ More
The tensor data recovery task has thus attracted much research attention in recent years. Solving such an ill-posed problem generally requires to explore intrinsic prior structures underlying tensor data, and formulate them as certain forms of regularization terms for guiding a sound estimate of the restored tensor. Recent research have made significant progress by adopting two insightful tensor priors, i.e., global low-rankness (L) and local smoothness (S) across different tensor modes, which are always encoded as a sum of two separate regularization terms into the recovery models. However, unlike the primary theoretical developments on low-rank tensor recovery, these joint L+S models have no theoretical exact-recovery guarantees yet, making the methods lack reliability in real practice. To this crucial issue, in this work, we build a unique regularization term, which essentially encodes both L and S priors of a tensor simultaneously. Especially, by equip** this single regularizer into the recovery models, we can rigorously prove the exact recovery guarantees for two typical tensor recovery tasks, i.e., tensor completion (TC) and tensor robust principal component analysis (TRPCA). To the best of our knowledge, this should be the first exact-recovery results among all related L+S methods for tensor recovery. Significant recovery accuracy improvements over many other SOTA methods in several TC and TRPCA tasks with various kinds of visual tensor data are observed in extensive experiments. Typically, our method achieves a workable performance when the missing rate is extremely large, e.g., 99.5%, for the color image inpainting task, while all its peers totally fail in such challenging case.
△ Less
Submitted 4 February, 2023;
originally announced February 2023.
-
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction
Authors:
Ziyi Yang,
Jun Shu,
Yong Liang,
Deyu Meng,
Zongben Xu
Abstract:
Current machine learning has made great progress on computer vision and many other fields attributed to the large amount of high-quality training samples, while it does not work very well on genomic data analysis, since they are notoriously known as small data. In our work, we focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients that can guide treatment d…
▽ More
Current machine learning has made great progress on computer vision and many other fields attributed to the large amount of high-quality training samples, while it does not work very well on genomic data analysis, since they are notoriously known as small data. In our work, we focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients that can guide treatment decisions for a specific individual through training on small data. In fact, doctors and clinicians always address this problem by studying several interrelated clinical variables simultaneously. We attempt to simulate such clinical perspective, and introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks and transfer it to help address new tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification. Observing that gene expression data have specifically high dimensionality and high noise properties compared with image data, we proposed a new extension of it by appending two modules to address these issues. Concretely, we append a feature selection layer to automatically filter out the disease-irrelated genes and incorporate a sample reweighting strategy to adaptively remove noisy data, and meanwhile the extended model is capable of learning from a limited number of training examples and generalize well. Simulations and real gene expression data experiments substantiate the superiority of the proposed method for predicting the subtypes of disease and identifying potential disease-related genes.
△ Less
Submitted 3 September, 2020; v1 submitted 1 September, 2020;
originally announced September 2020.
-
MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
Authors:
Jun Shu,
Yanwen Zhu,
Qian Zhao,
Zongben Xu,
Deyu Meng
Abstract:
The learning rate (LR) is one of the most important hyper-parameters in stochastic gradient descent (SGD) algorithm for training deep neural networks (DNN). However, current hand-designed LR schedules need to manually pre-specify a fixed form, which limits their ability to adapt practical non-convex optimization problems due to the significant diversification of training dynamics. Meanwhile, it al…
▽ More
The learning rate (LR) is one of the most important hyper-parameters in stochastic gradient descent (SGD) algorithm for training deep neural networks (DNN). However, current hand-designed LR schedules need to manually pre-specify a fixed form, which limits their ability to adapt practical non-convex optimization problems due to the significant diversification of training dynamics. Meanwhile, it always needs to search proper LR schedules from scratch for new tasks, which, however, are often largely different with task variations, like data modalities, network architectures, or training data capacities. To address this learning-rate-schedule setting issues, we propose to parameterize LR schedules with an explicit map** formulation, called \textit{MLR-SNet}. The learnable parameterized structure brings more flexibility for MLR-SNet to learn a proper LR schedule to comply with the training dynamics of DNN. Image and text classification benchmark experiments substantiate the capability of our method for achieving proper LR schedules. Moreover, the explicit parameterized structure makes the meta-learned LR schedules capable of being transferable and plug-and-play, which can be easily generalized to new heterogeneous tasks. We transfer our meta-learned MLR-SNet to query tasks like different training epochs, network architectures, data modalities, dataset sizes from the training ones, and achieve comparable or even better performance compared with hand-designed LR schedules specifically designed for the query tasks. The robustness of MLR-SNet is also substantiated when the training data are biased with corrupted noise. We further prove the convergence of the SGD algorithm equipped with LR schedule produced by our MLR-Net, with the convergence rate comparable to the best-known ones of the algorithm for solving the problem.
△ Less
Submitted 13 May, 2021; v1 submitted 28 July, 2020;
originally announced July 2020.
-
Meta Transition Adaptation for Robust Deep Learning with Noisy Labels
Authors:
Jun Shu,
Qian Zhao,
Zongben Xu,
Deyu Meng
Abstract:
To discover intrinsic inter-class transition probabilities underlying data, learning with noise transition has become an important approach for robust deep learning on corrupted labels. Prior methods attempt to achieve such transition knowledge by pre-assuming strongly confident anchor points with 1-probability belonging to a specific class, generally infeasible in practice, or directly jointly es…
▽ More
To discover intrinsic inter-class transition probabilities underlying data, learning with noise transition has become an important approach for robust deep learning on corrupted labels. Prior methods attempt to achieve such transition knowledge by pre-assuming strongly confident anchor points with 1-probability belonging to a specific class, generally infeasible in practice, or directly jointly estimating the transition matrix and learning the classifier from the noisy samples, always leading to inaccurate estimation misguided by wrong annotation information especially in large noise cases. To alleviate these issues, this study proposes a new meta-transition-learning strategy for the task. Specifically, through the sound guidance of a small set of meta data with clean labels, the noise transition matrix and the classifier parameters can be mutually ameliorated to avoid being trapped by noisy training samples, and without need of any anchor point assumptions. Besides, we prove our method is with statistical consistency guarantee on correctly estimating the desired transition matrix. Extensive synthetic and real experiments validate that our method can more accurately extract the transition matrix, naturally following its more robust performance than prior arts. Its essential relationship with label distribution learning is also discussed, which explains its fine performance even under no-noise scenarios.
△ Less
Submitted 11 June, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Ball k-means
Authors:
Shuyin Xia,
Daowan Peng,
Deyu Meng,
Changqing Zhang,
Guoyin Wang,
Zizhong Chen,
Wei Wei
Abstract:
This paper presents a novel accelerated exact k-means algorithm called the Ball k-means algorithm, which uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation. The Ball k-means can accurately find the neighbor clusters for each cluster resulting distance computations only between a point and its neighbor clusters' centroids instead of all centroids. Moreov…
▽ More
This paper presents a novel accelerated exact k-means algorithm called the Ball k-means algorithm, which uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation. The Ball k-means can accurately find the neighbor clusters for each cluster resulting distance computations only between a point and its neighbor clusters' centroids instead of all centroids. Moreover, each cluster can be divided into a stable area and an active area, and the later one can be further divided into annulus areas. The assigned cluster of the points in the stable area is not changed in the current iteration while the points in the annulus area will be adjusted within a few neighbor clusters in the current iteration. Also, there are no upper or lower bounds in the proposed Ball k-means. Furthermore, reducing centroid-centroid distance computation between iterations makes it efficient for large k clustering. The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
△ Less
Submitted 2 May, 2020;
originally announced May 2020.
-
Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability
Authors:
Hojjat Aghakhani,
Dongyu Meng,
Yu-Xiang Wang,
Christopher Kruegel,
Giovanni Vigna
Abstract:
A recent source of concern for the security of neural networks is the emergence of clean-label dataset poisoning attacks, wherein correctly labeled poison samples are injected into the training dataset. While these poison samples look legitimate to the human observer, they contain malicious characteristics that trigger a targeted misclassification during inference. We propose a scalable and transf…
▽ More
A recent source of concern for the security of neural networks is the emergence of clean-label dataset poisoning attacks, wherein correctly labeled poison samples are injected into the training dataset. While these poison samples look legitimate to the human observer, they contain malicious characteristics that trigger a targeted misclassification during inference. We propose a scalable and transferable clean-label poisoning attack against transfer learning, which creates poison images with their center close to the target image in the feature space. Our attack, Bullseye Polytope, improves the attack success rate of the current state-of-the-art by 26.75% in end-to-end transfer learning, while increasing attack speed by a factor of 12. We further extend Bullseye Polytope to a more practical attack model by including multiple images of the same object (e.g., from different angles) when crafting the poison samples. We demonstrate that this extension improves attack transferability by over 16% to unseen images (of the same object) without using extra poison samples.
△ Less
Submitted 13 March, 2021; v1 submitted 30 April, 2020;
originally announced May 2020.
-
Learning Adaptive Loss for Robust Learning with Noisy Labels
Authors:
Jun Shu,
Qian Zhao,
Keyu Chen,
Zongben Xu,
Deyu Meng
Abstract:
Robust loss minimization is an important strategy for handling robust learning issue on noisy labels. Current robust loss functions, however, inevitably involve hyperparameter(s) to be tuned, manually or heuristically through cross validation, which makes them fairly hard to be generally applied in practice. Besides, the non-convexity brought by the loss as well as the complicated network architec…
▽ More
Robust loss minimization is an important strategy for handling robust learning issue on noisy labels. Current robust loss functions, however, inevitably involve hyperparameter(s) to be tuned, manually or heuristically through cross validation, which makes them fairly hard to be generally applied in practice. Besides, the non-convexity brought by the loss as well as the complicated network architecture makes it easily trapped into an unexpected solution with poor generalization capability. To address above issues, we propose a meta-learning method capable of adaptively learning hyperparameter in robust loss functions. Specifically, through mutual amelioration between robust loss hyperparameter and network parameters in our method, both of them can be simultaneously finely learned and coordinated to attain solutions with good generalization capability. Four kinds of SOTA robust loss functions are attempted to be integrated into our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its accuracy and generalization performance, as compared with conventional hyperparameter tuning strategy, even with carefully tuned hyperparameters.
△ Less
Submitted 15 February, 2020;
originally announced February 2020.
-
Meta-Weight-Net: Learning an Explicit Map** For Sample Weighting
Authors:
Jun Shu,
Qi Xie,
Lixuan Yi,
Qian Zhao,
San** Zhou,
Zongben Xu,
Deyu Meng
Abstract:
Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. Sample re-weighting strategy is commonly used to alleviate this issue by designing a weighting function map** from training loss to sample weight, and then iterating between weight recalculating and classifier updating. Current approaches, however, need manually pre-specify th…
▽ More
Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. Sample re-weighting strategy is commonly used to alleviate this issue by designing a weighting function map** from training loss to sample weight, and then iterating between weight recalculating and classifier updating. Current approaches, however, need manually pre-specify the weighting function as well as its additional hyper-parameters. It makes them fairly hard to be generally applied in practice due to the significant variation of proper weighting schemes relying on the investigated problem and training data. To address this issue, we propose a method capable of adaptively learning an explicit weighting function directly from data. The weighting function is an MLP with one hidden layer, constituting a universal approximator to almost any continuous functions, making the method able to fit a wide range of weighting functions including those assumed in conventional research. Guided by a small amount of unbiased meta-data, the parameters of the weighting function can be finely updated simultaneously with the learning process of the classifiers. Synthetic and real experiments substantiate the capability of our method for achieving proper weighting functions in class imbalance and noisy label cases, fully complying with the common settings in traditional methods, and more complicated scenarios beyond conventional cases. This naturally leads to its better accuracy than other state-of-the-art methods.
△ Less
Submitted 26 September, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
-
Model Inconsistent but Correlated Noise: Multi-view Subspace Learning with Regularized Mixture of Gaussians
Authors:
Hongwei Yong,
Deyu Meng,
**xing Li,
Wangmeng Zuo,
Lei Zhang
Abstract:
Multi-view subspace learning (MSL) aims to find a low-dimensional subspace of the data obtained from multiple views. Different from single view case, MSL should take both common and specific knowledge among different views into consideration. To enhance the robustness of model, the complexity, non-consistency and similarity of noise in multi-view data should be fully taken into consideration. Most…
▽ More
Multi-view subspace learning (MSL) aims to find a low-dimensional subspace of the data obtained from multiple views. Different from single view case, MSL should take both common and specific knowledge among different views into consideration. To enhance the robustness of model, the complexity, non-consistency and similarity of noise in multi-view data should be fully taken into consideration. Most current MSL methods only assume a simple Gaussian or Laplacian distribution for the noise while neglect the complex noise configurations in each view and noise correlations among different views of practical data. To this issue, this work initiates a MSL method by encoding the multi-view-shared and single-view-specific noise knowledge in data. Specifically, we model data noise in each view as a separated Mixture of Gaussians (MoG), which can fit a wider range of complex noise types than conventional Gaussian/Laplacian. Furthermore, we link all single-view-noise as a whole by regularizing them by a common MoG component, encoding the shared noise knowledge among them. Such regularization component can be formulated as a concise KL-divergence regularization term under a MAP framework, leading to good interpretation of our model and simple EM-based solving strategy to the problem. Experimental results substantiate the superiority of our method.
△ Less
Submitted 7 November, 2018;
originally announced November 2018.
-
Discovering Influential Factors in Variational Autoencoder
Authors:
Shiqi Liu,
**gxin Liu,
Qian Zhao,
Xiangyong Cao,
Huibin Li,
Hongying Meng,
Sheng Liu,
Deyu Meng
Abstract:
In the field of machine learning, it is still a critical issue to identify and supervise the learned representation without manually intervening or intuition assistance to extract useful knowledge or serve for the downstream tasks. In this work, we focus on supervising the influential factors extracted by the variational autoencoder(VAE). The VAE is proposed to learn independent low dimension repr…
▽ More
In the field of machine learning, it is still a critical issue to identify and supervise the learned representation without manually intervening or intuition assistance to extract useful knowledge or serve for the downstream tasks. In this work, we focus on supervising the influential factors extracted by the variational autoencoder(VAE). The VAE is proposed to learn independent low dimension representation while facing the problem that sometimes pre-set factors are ignored. We argue that the mutual information of the input and each learned factor of the representation plays a necessary indicator of discovering the influential factors. We find the VAE objective inclines to induce mutual information sparsity in factor dimension over the data intrinsic dimension and results in some non-influential factors whose function on data reconstruction could be ignored. We show mutual information also influences the lower bound of VAE's reconstruction error and downstream classification task. To make such indicator applicable, we design an algorithm for calculating the mutual information for VAE and prove its consistency. Experimental results on MNIST, CelebA and DEAP datasets show that mutual information can help determine influential factors, of which some are interpretable and can be used to further generation and classification tasks, and help discover the variant that connects with emotion on DEAP dataset.
△ Less
Submitted 5 April, 2019; v1 submitted 5 September, 2018;
originally announced September 2018.
-
Small Sample Learning in Big Data Era
Authors:
Jun Shu,
Zongben Xu,
Deyu Meng
Abstract:
As a promising area in artificial intelligence, a new learning paradigm, called Small Sample Learning (SSL), has been attracting prominent research attention in the recent years. In this paper, we aim to present a survey to comprehensively introduce the current techniques proposed on this topic. Specifically, current SSL techniques can be mainly divided into two categories. The first category of S…
▽ More
As a promising area in artificial intelligence, a new learning paradigm, called Small Sample Learning (SSL), has been attracting prominent research attention in the recent years. In this paper, we aim to present a survey to comprehensively introduce the current techniques proposed on this topic. Specifically, current SSL techniques can be mainly divided into two categories. The first category of SSL approaches can be called "concept learning", which emphasizes learning new concepts from only few related observations. The purpose is mainly to simulate human learning behaviors like recognition, generation, imagination, synthesis and analysis. The second category is called "experience learning", which usually co-exists with the large sample learning manner of conventional machine learning. This category mainly focuses on learning with insufficient samples, and can also be called small data learning in some literatures. More extensive surveys on both categories of SSL techniques are introduced and some neuroscience evidences are provided to clarify the rationality of the entire SSL regime, and the relationship with human learning process. Some discussions on the main challenges and possible future research directions along this line are also presented.
△ Less
Submitted 22 August, 2018; v1 submitted 14 August, 2018;
originally announced August 2018.
-
Understanding Self-Paced Learning under Concave Conjugacy Theory
Authors:
Shiqi Liu,
Zilu Ma,
Deyu Meng
Abstract:
By simulating the easy-to-hard learning manners of humans/animals, the learning regimes called curriculum learning~(CL) and self-paced learning~(SPL) have been recently investigated and invoked broad interests. However, the intrinsic mechanism for analyzing why such learning regimes can work has not been comprehensively investigated. To this issue, this paper proposes a concave conjugacy theory fo…
▽ More
By simulating the easy-to-hard learning manners of humans/animals, the learning regimes called curriculum learning~(CL) and self-paced learning~(SPL) have been recently investigated and invoked broad interests. However, the intrinsic mechanism for analyzing why such learning regimes can work has not been comprehensively investigated. To this issue, this paper proposes a concave conjugacy theory for looking into the insight of CL/SPL. Specifically, by using this theory, we prove the equivalence of the SPL regime and a latent concave objective, which is closely related to the known non-convex regularized penalty widely used in statistics and machine learning. Beyond the previous theory for explaining CL/SPL insights, this new theoretical framework on one hand facilitates two direct approaches for designing new SPL models for certain tasks, and on the other hand can help conduct the latent objective of self-paced curriculum learning, which is the advanced version of both CL/SPL and possess advantages of both learning regimes to a certain extent. This further facilitates a theoretical understanding for SPCL, instead of only CL/SPL as conventional. Under this theory, we attempt to attain intrinsic latent objectives of two curriculum forms, the partial order and group curriculums, which easily follow the theoretical understanding of the corresponding SPCL regimes.
△ Less
Submitted 21 May, 2018;
originally announced May 2018.
-
SPLBoost: An Improved Robust Boosting Algorithm Based on Self-paced Learning
Authors:
Kaidong Wang,
Yao Wang,
Qian Zhao,
Deyu Meng,
Zongben Xu
Abstract:
It is known that Boosting can be interpreted as a gradient descent technique to minimize an underlying loss function. Specifically, the underlying loss being minimized by the traditional AdaBoost is the exponential loss, which is proved to be very sensitive to random noise/outliers. Therefore, several Boosting algorithms, e.g., LogitBoost and SavageBoost, have been proposed to improve the robustne…
▽ More
It is known that Boosting can be interpreted as a gradient descent technique to minimize an underlying loss function. Specifically, the underlying loss being minimized by the traditional AdaBoost is the exponential loss, which is proved to be very sensitive to random noise/outliers. Therefore, several Boosting algorithms, e.g., LogitBoost and SavageBoost, have been proposed to improve the robustness of AdaBoost by replacing the exponential loss with some designed robust loss functions. In this work, we present a new way to robustify AdaBoost, i.e., incorporating the robust learning idea of Self-paced Learning (SPL) into Boosting framework. Specifically, we design a new robust Boosting algorithm based on SPL regime, i.e., SPLBoost, which can be easily implemented by slightly modifying off-the-shelf Boosting packages. Extensive experiments and a theoretical characterization are also carried out to illustrate the merits of the proposed SPLBoost.
△ Less
Submitted 23 June, 2017; v1 submitted 20 June, 2017;
originally announced June 2017.
-
On the Optimal Solution of Weighted Nuclear Norm Minimization
Authors:
Qi Xie,
Deyu Meng,
Shuhang Gu,
Lei Zhang,
Wangmeng Zuo,
Xiangchu Feng,
Zongben Xu
Abstract:
In recent years, the nuclear norm minimization (NNM) problem has been attracting much attention in computer vision and machine learning. The NNM problem is capitalized on its convexity and it can be solved efficiently. The standard nuclear norm regularizes all singular values equally, which is however not flexible enough to fit real scenarios. Weighted nuclear norm minimization (WNNM) is a natural…
▽ More
In recent years, the nuclear norm minimization (NNM) problem has been attracting much attention in computer vision and machine learning. The NNM problem is capitalized on its convexity and it can be solved efficiently. The standard nuclear norm regularizes all singular values equally, which is however not flexible enough to fit real scenarios. Weighted nuclear norm minimization (WNNM) is a natural extension and generalization of NNM. By assigning properly different weights to different singular values, WNNM can lead to state-of-the-art results in applications such as image denoising. Nevertheless, so far the global optimal solution of WNNM problem is not completely solved yet due to its non-convexity in general cases. In this article, we study the theoretical properties of WNNM and prove that WNNM can be equivalently transformed into a quadratic programming problem with linear constraints. This implies that WNNM is equivalent to a convex problem and its global optimum can be readily achieved by off-the-shelf convex optimization solvers. We further show that when the weights are non-descending, the globally optimal solution of WNNM can be obtained in closed-form.
△ Less
Submitted 23 May, 2014;
originally announced May 2014.
-
FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test
Authors:
Ji Zhao,
Deyu Meng
Abstract:
The maximum mean discrepancy (MMD) is a recently proposed test statistic for two-sample test. Its quadratic time complexity, however, greatly hampers its availability to large-scale applications. To accelerate the MMD calculation, in this study we propose an efficient method called FastMMD. The core idea of FastMMD is to equivalently transform the MMD with shift-invariant kernels into the amplitud…
▽ More
The maximum mean discrepancy (MMD) is a recently proposed test statistic for two-sample test. Its quadratic time complexity, however, greatly hampers its availability to large-scale applications. To accelerate the MMD calculation, in this study we propose an efficient method called FastMMD. The core idea of FastMMD is to equivalently transform the MMD with shift-invariant kernels into the amplitude expectation of a linear combination of sinusoid components based on Bochner's theorem and Fourier transform (Rahimi & Recht, 2007). Taking advantage of sampling of Fourier transform, FastMMD decreases the time complexity for MMD calculation from $O(N^2 d)$ to $O(L N d)$, where $N$ and $d$ are the size and dimension of the sample set, respectively. Here $L$ is the number of basis functions for approximating kernels which determines the approximation accuracy. For kernels that are spherically invariant, the computation can be further accelerated to $O(L N \log d)$ by using the Fastfood technique (Le et al., 2013). The uniform convergence of our method has also been theoretically proved in both unbiased and biased estimates. We have further provided a geometric explanation for our method, namely ensemble of circular discrepancy, which facilitates us to understand the insight of MMD, and is hopeful to help arouse more extensive metrics for assessing two-sample test. Experimental results substantiate that FastMMD is with similar accuracy as exact MMD, while with faster computation speed and lower variance than the existing MMD approximation methods.
△ Less
Submitted 18 June, 2015; v1 submitted 12 May, 2014;
originally announced May 2014.
-
A recursive divide-and-conquer approach for sparse principal component analysis
Authors:
Qian Zhao,
Deyu Meng,
Zongben Xu
Abstract:
In this paper, a new method is proposed for sparse PCA based on the recursive divide-and-conquer methodology. The main idea is to separate the original sparse PCA problem into a series of much simpler sub-problems, each having a closed-form solution. By recursively solving these sub-problems in an analytical way, an efficient algorithm is constructed to solve the sparse PCA problem. The algorithm…
▽ More
In this paper, a new method is proposed for sparse PCA based on the recursive divide-and-conquer methodology. The main idea is to separate the original sparse PCA problem into a series of much simpler sub-problems, each having a closed-form solution. By recursively solving these sub-problems in an analytical way, an efficient algorithm is constructed to solve the sparse PCA problem. The algorithm only involves simple computations and is thus easy to implement. The proposed method can also be very easily extended to other sparse PCA problems with certain constraints, such as the nonnegative sparse PCA problem. Furthermore, we have shown that the proposed algorithm converges to a stationary point of the problem, and its computational complexity is approximately linear in both data size and dimensionality. The effectiveness of the proposed method is substantiated by extensive experiments implemented on a series of synthetic and real data in both reconstruction-error-minimization and data-variance-maximization viewpoints.
△ Less
Submitted 30 November, 2012;
originally announced November 2012.