-
Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time
Authors:
Jerry Li,
Guanghao Ye
Abstract:
Robust covariance estimation is the following, well-studied problem in high dimensional statistics: given $N$ samples from a $d$-dimensional Gaussian $\mathcal{N}(\boldsymbol{0}, Σ)$, but where an $\varepsilon$-fraction of the samples have been arbitrarily corrupted, output $\widehatΣ$ minimizing the total variation distance between $\mathcal{N}(\boldsymbol{0}, Σ)$ and…
▽ More
Robust covariance estimation is the following, well-studied problem in high dimensional statistics: given $N$ samples from a $d$-dimensional Gaussian $\mathcal{N}(\boldsymbol{0}, Σ)$, but where an $\varepsilon$-fraction of the samples have been arbitrarily corrupted, output $\widehatΣ$ minimizing the total variation distance between $\mathcal{N}(\boldsymbol{0}, Σ)$ and $\mathcal{N}(\boldsymbol{0}, \widehatΣ)$. This corresponds to learning $Σ$ in a natural affine-invariant variant of the Frobenius norm known as the \emph{Mahalanobis norm}. Previous work of Cheng et al demonstrated an algorithm that, given $N = Ω(d^2 / \varepsilon^2)$ samples, achieved a near-optimal error of $O(\varepsilon \log 1 / \varepsilon)$, and moreover, their algorithm ran in time $\widetilde{O}(T(N, d) \log κ/ \mathrm{poly} (\varepsilon))$, where $T(N, d)$ is the time it takes to multiply a $d \times N$ matrix by its transpose, and $κ$ is the condition number of $Σ$. When $\varepsilon$ is relatively small, their polynomial dependence on $1/\varepsilon$ in the runtime is prohibitively large. In this paper, we demonstrate a novel algorithm that achieves the same statistical guarantees, but which runs in time $\widetilde{O} (T(N, d) \log κ)$. In particular, our runtime has no dependence on $\varepsilon$. When $Σ$ is reasonably conditioned, our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to poly-logarithmic factors, showing that we can get robustness essentially "for free."
△ Less
Submitted 23 June, 2020;
originally announced June 2020.
-
Multi-View Graph Neural Networks for Molecular Property Prediction
Authors:
Hehuan Ma,
Yatao Bian,
Yu Rong,
Wenbing Huang,
Tingyang Xu,
Weiyang Xie,
Geyan Ye,
Junzhou Huang
Abstract:
The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through Graph Neural Networks (GNNs). It is well known that both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model shall be able to exploit both node (atom) and edge (bond) information…
▽ More
The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through Graph Neural Networks (GNNs). It is well known that both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model shall be able to exploit both node (atom) and edge (bond) information simultaneously. Guided by this observation, we present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture to enable more accurate predictions of molecular properties. In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process. This readout component also renders the whole architecture interpretable. We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme that enhances information communication of the two views, which results in the MV-GNN^cross variant. Lastly, we theoretically justify the expressiveness of the two proposed models in terms of distinguishing non-isomorphism graphs. Extensive experiments demonstrate that MV-GNN models achieve remarkably superior performance over the state-of-the-art models on a variety of challenging benchmarks. Meanwhile, visualization results of the node importance are consistent with prior knowledge, which confirms the interpretability power of MV-GNN models.
△ Less
Submitted 12 June, 2020; v1 submitted 17 May, 2020;
originally announced May 2020.
-
Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees
Authors:
Binghong Chen,
Bo Dai,
Qinjie Lin,
Guo Ye,
Han Liu,
Le Song
Abstract:
We propose a meta path planning algorithm named \emph{Neural Exploration-Exploitation Trees~(NEXT)} for learning from prior experience for solving new path planning problems in high dimensional continuous state and action spaces. Compared to more classical sampling-based methods like RRT, our approach achieves much better sample efficiency in high-dimensions and can benefit from prior experience o…
▽ More
We propose a meta path planning algorithm named \emph{Neural Exploration-Exploitation Trees~(NEXT)} for learning from prior experience for solving new path planning problems in high dimensional continuous state and action spaces. Compared to more classical sampling-based methods like RRT, our approach achieves much better sample efficiency in high-dimensions and can benefit from prior experience of planning in similar environments. More specifically, NEXT exploits a novel neural architecture which can learn promising search directions from problem structures. The learned prior is then integrated into a UCB-type algorithm to achieve an online balance between \emph{exploration} and \emph{exploitation} when solving a new problem. We conduct thorough experiments to show that NEXT accomplishes new planning problems with more compact search trees and significantly outperforms state-of-the-art methods on several benchmarks.
△ Less
Submitted 23 February, 2020; v1 submitted 28 February, 2019;
originally announced March 2019.
-
Robust recovery of complex exponential signals from random Gaussian projections via low rank Hankel matrix reconstruction
Authors:
Jian-Feng Cai,
Xiaobo Qu,
Weiyu Xu,
Gui-Bo Ye
Abstract:
This paper explores robust recovery of a superposition of $R$ distinct complex exponential functions from a few random Gaussian projections. We assume that the signal of interest is of $2N-1$ dimensional and $R<<2N-1$. This framework covers a large class of signals arising from real applications in biology, automation, imaging science, etc. To reconstruct such a signal, our algorithm is to seek a…
▽ More
This paper explores robust recovery of a superposition of $R$ distinct complex exponential functions from a few random Gaussian projections. We assume that the signal of interest is of $2N-1$ dimensional and $R<<2N-1$. This framework covers a large class of signals arising from real applications in biology, automation, imaging science, etc. To reconstruct such a signal, our algorithm is to seek a low-rank Hankel matrix of the signal by minimizing its nuclear norm subject to the consistency on the sampled data. Our theoretical results show that a robust recovery is possible as long as the number of projections exceeds $O(R\ln^2N)$. No incoherence or separation condition is required in our proof. Our method can be applied to spectral compressed sensing where the signal of interest is a superposition of $R$ complex sinusoids. Compared to existing results, our result here does not need any separation condition on the frequencies, while achieving better or comparable bounds on the number of measurements. Furthermore, our method provides theoretical guidance on how many samples are required in the state-of-the-art non-uniform sampling in NMR spectroscopy. The performance of our algorithm is further demonstrated by numerical experiments.
△ Less
Submitted 10 March, 2015;
originally announced March 2015.
-
Efficient Latent Variable Graphical Model Selection via Split Bregman Method
Authors:
Gui-Bo Ye,
Yuanfeng Wang,
Yifei Chen,
Xiaohui Xie
Abstract:
We consider the problem of covariance matrix estimation in the presence of latent variables. Under suitable conditions, it is possible to learn the marginal covariance matrix of the observed variables via a tractable convex program, where the concentration matrix of the observed variables is decomposed into a sparse matrix (representing the graphical structure of the observed variables) and a low…
▽ More
We consider the problem of covariance matrix estimation in the presence of latent variables. Under suitable conditions, it is possible to learn the marginal covariance matrix of the observed variables via a tractable convex program, where the concentration matrix of the observed variables is decomposed into a sparse matrix (representing the graphical structure of the observed variables) and a low rank matrix (representing the marginalization effect of latent variables). We present an efficient first-order method based on split Bregman to solve the convex problem. The algorithm is guaranteed to converge under mild conditions. We show that our algorithm is significantly faster than the state-of-the-art algorithm on both artificial and real-world data. Applying the algorithm to a gene expression data involving thousands of genes, we show that most of the correlation between observed variables can be explained by only a few dozen latent factors.
△ Less
Submitted 13 October, 2011;
originally announced October 2011.
-
Split Bregman Method for Sparse Inverse Covariance Estimation with Matrix Iteration Acceleration
Authors:
Gui-Bo Ye,
Jian-Feng Cai,
Xiaohui Xie
Abstract:
We consider the problem of estimating the inverse covariance matrix by maximizing the likelihood function with a penalty added to encourage the sparsity of the resulting matrix. We propose a new approach based on the split Bregman method to solve the regularized maximum likelihood estimation problem. We show that our method is significantly faster than the widely used graphical lasso method, which…
▽ More
We consider the problem of estimating the inverse covariance matrix by maximizing the likelihood function with a penalty added to encourage the sparsity of the resulting matrix. We propose a new approach based on the split Bregman method to solve the regularized maximum likelihood estimation problem. We show that our method is significantly faster than the widely used graphical lasso method, which is based on blockwise coordinate descent, on both artificial and real-world data. More importantly, different from the graphical lasso, the split Bregman based method is much more general, and can be applied to a class of regularization terms other than the $\ell_1$ norm
△ Less
Submitted 23 December, 2010; v1 submitted 5 December, 2010;
originally announced December 2010.
-
Split Bregman method for large scale fused Lasso
Authors:
Gui-Bo Ye,
Xiaohui Xie
Abstract:
rdering of regression or classification coefficients occurs in many real-world applications. Fused Lasso exploits this ordering by explicitly regularizing the differences between neighboring coefficients through an $\ell_1$ norm regularizer. However, due to nonseparability and nonsmoothness of the regularization term, solving the fused Lasso problem is computationally demanding. Existing solvers c…
▽ More
rdering of regression or classification coefficients occurs in many real-world applications. Fused Lasso exploits this ordering by explicitly regularizing the differences between neighboring coefficients through an $\ell_1$ norm regularizer. However, due to nonseparability and nonsmoothness of the regularization term, solving the fused Lasso problem is computationally demanding. Existing solvers can only deal with problems of small or medium size, or a special case of the fused Lasso problem in which the predictor matrix is identity matrix. In this paper, we propose an iterative algorithm based on split Bregman method to solve a class of large-scale fused Lasso problems, including a generalized fused Lasso and a fused Lasso support vector classifier. We derive our algorithm using augmented Lagrangian method and prove its convergence properties. The performance of our method is tested on both artificial data and real-world applications including proteomic data from mass spectrometry and genomic data from array CGH. We demonstrate that our method is many times faster than the existing solvers, and show that it is especially efficient for large p, small n problems.
△ Less
Submitted 25 June, 2010;
originally announced June 2010.
-
Learning sparse gradients for variable selection and dimension reduction
Authors:
Gui-Bo Ye,
Xiaohui Xie
Abstract:
Variable selection and dimension reduction are two commonly adopted approaches for high-dimensional data analysis, but have traditionally been treated separately. Here we propose an integrated approach, called sparse gradient learning (SGL), for variable selection and dimension reduction via learning the gradients of the prediction function directly from samples. By imposing a sparsity constraint…
▽ More
Variable selection and dimension reduction are two commonly adopted approaches for high-dimensional data analysis, but have traditionally been treated separately. Here we propose an integrated approach, called sparse gradient learning (SGL), for variable selection and dimension reduction via learning the gradients of the prediction function directly from samples. By imposing a sparsity constraint on the gradients, variable selection is achieved by selecting variables corresponding to non-zero partial derivatives, and effective dimensions are extracted based on the eigenvectors of the derived sparse empirical gradient covariance matrix. An error analysis is given for the convergence of the estimated gradients to the true ones in both the Euclidean and the manifold setting. We also develop an efficient forward-backward splitting algorithm to solve the SGL problem, making the framework practically scalable for medium or large datasets. The utility of SGL for variable selection and feature extraction is explicitly given and illustrated on artificial data as well as real-world examples. The main advantages of our method include variable selection for both linear and nonlinear predictions, effective dimension reduction with sparse loadings, and an efficient algorithm for large p, small n problems.
△ Less
Submitted 1 July, 2010; v1 submitted 25 June, 2010;
originally announced June 2010.