Search | arXiv e-print repository

Approximation of RKHS Functionals by Neural Networks

Authors: Tian-Yi Zhou, Namjoon Suh, Guang Cheng, Xiaoming Huo

Abstract: Motivated by the abundance of functional data such as time series and images, there has been a growing interest in integrating such data into neural networks and learning maps from function spaces to R (i.e., functionals). In this paper, we study the approximation of functionals on reproducing kernel Hilbert spaces (RKHS's) using neural networks. We establish the universality of the approximation of functionals on the RKHS's. Specifically, we derive explicit error bounds for those induced by inverse multiquadric, Gaussian, and Sobolev kernels. Moreover, we apply our findings to functional regression, proving that neural networks can accurately approximate the regression maps in generalized functional linear models. Existing works on functional learning require integration-type basis function expansions with a set of pre-specified basis functions. By leveraging the interpolating orthogonal projections in RKHS's, our proposed network is much simpler in that we use point evaluations to replace basis function expansions.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2401.15262 [pdf, other]

Asymptotic Behavior of Adversarial Training Estimator under $\ell_\infty$-Perturbation

Authors: Yiling Xie, Xiaoming Huo

Abstract: Adversarial training has been proposed to hedge against adversarial attacks in machine learning and statistical models. This paper focuses on adversarial training under $\ell_\infty$-perturbation, which has recently attracted much research attention. The asymptotic behavior of the adversarial training estimator is investigated in the generalized linear model. The results imply that the limiting distribution of the adversarial training estimator under $\ell_\infty$-perturbation could put a positive probability mass at $0$ when the true parameter is $0$, providing a theoretical guarantee of the associated sparsity-recovery ability. Alternatively, a two-step procedure is proposed -- adaptive adversarial training, which could further improve the performance of adversarial training under $\ell_\infty$-perturbation. Specifically, the proposed procedure could achieve asymptotic unbiasedness and variable-selection consistency. Numerical experiments are conducted to show the sparsity-recovery ability of adversarial training under $\ell_\infty$-perturbation and to compare the empirical performance between classic adversarial training and adaptive adversarial training.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.04286 [pdf, ps, other]

Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes

Authors: Hyunouk Ko, Xiaoming Huo

Abstract: In this paper, we prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss. We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence. The result applies to a wide range of known function classes. In particular, while most previous works impose explicit smoothness assumptions on the regression function, our framework encompasses more general settings. The proposed neural networks are either the minimizers of the logistic loss or the $0$-$1$ loss. In the former case, they are interpolating classifiers that exhibit a benign overfitting behavior.

Submitted 30 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

arXiv:2310.10767 [pdf, ps, other]

Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models

Authors: Tianxiang Gao, Xiaokai Huo, Hailiang Liu, Hongyang Gao

Abstract: Neural networks with wide layers have attracted significant attention due to their equivalence to Gaussian processes, enabling perfect fitting of training data while maintaining generalization performance, known as benign overfitting. However, existing results mainly focus on shallow or finite-depth networks, necessitating a comprehensive analysis of wide neural networks with infinite-depth layers, such as neural ordinary differential equations (ODEs) and deep equilibrium models (DEQs). In this paper, we specifically investigate the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers. Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process, establishing what is known as the Neural Network and Gaussian Process (NNGP) correspondence. Remarkably, this convergence holds even when the limits of depth and width are interchanged, which is not observed in typical infinite-depth Multilayer Perceptron (MLP) networks. Furthermore, we demonstrate that the associated Gaussian vector remains non-degenerate for any pairwise distinct input data, ensuring a strictly positive smallest eigenvalue of the corresponding kernel matrix using the NNGP kernel. These findings serve as fundamental elements for studying the training and generalization of DEQs, laying the groundwork for future research in this area.

Submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2309.15075 [pdf, other]

On Excess Risk Convergence Rates of Neural Network Classifiers

Authors: Hyunouk Ko, Namjoon Suh, Xiaoming Huo

Abstract: The recent success of neural networks in pattern recognition and classification problems suggests that neural networks possess qualities distinct from other more classical classifiers such as SVMs or boosting classifiers. This paper studies the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks. Compared to the typical settings imposed in the literature, we consider a more general scenario that resembles actual practice in two respects: first, the function class to be approximated includes the Barron functions as a proper subset, and second, the neural network classifier constructed is the minimizer of a surrogate loss instead of the $0$-$1$ loss so that gradient descent-based numerical optimizations can be easily applied. While the class of functions we consider is so large that optimal rates cannot be faster than $n^{-\frac{1}{3}}$, it is a regime in which dimension-free rates are possible and the approximation power of neural networks can be taken advantage of. In particular, we analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence for the excess risk. Finally, we show that the rate obtained is in fact minimax optimal up to a logarithmic factor, and the minimax lower bound shows the effect of the margin assumption in this regime.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2308.08030 [pdf, other]

Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks

Authors: Tian-Yi Zhou, Xiaoming Huo

Abstract: This paper studies the binary classification of unbounded data from ${\mathbb R}^d$ generated under Gaussian Mixture Models (GMMs) using deep ReLU neural networks. We obtain -- for the first time -- non-asymptotic upper bounds and convergence rates of the excess risk (excess misclassification error) for the classification without restrictions on model parameters. The convergence rates we derive do not depend on dimension $d$, demonstrating that deep ReLU networks can overcome the curse of dimensionality in classification. While the majority of existing generalization analysis of classification algorithms relies on a bounded domain, we consider an unbounded domain by leveraging the analyticity and fast decay of Gaussian distributions. To facilitate our analysis, we give a novel approximation error bound for general analytic functions using ReLU networks, which may be of independent interest. Gaussian distributions can be adopted nicely to model data arising in applications, e.g., speeches, images, and texts; our results provide a theoretical verification of the observed efficiency of deep neural networks in practical classification problems.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2307.05109 [pdf, other]

Conformalization of Sparse Generalized Linear Models

Authors: Etash Kumar Guha, Eugene Ndiaye, Xiaoming Huo

Abstract: Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.

Submitted 11 July, 2023; originally announced July 2023.

Comments: ICML 2023

arXiv:2303.15579 [pdf, other]

Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning

Authors: Yiling Xie, Xiaoming Huo

Abstract: We propose an adjusted Wasserstein distributionally robust estimator -- based on a nonlinear transformation of the Wasserstein distributionally robust (WDRO) estimator in statistical learning. The classic WDRO estimator is asymptotically biased, while our adjusted WDRO estimator is asymptotically unbiased, resulting in a smaller asymptotic mean squared error. Further, under certain conditions, our proposed adjustment technique provides a general principle to de-bias asymptotically biased estimators. Specifically, we will investigate how the adjusted WDRO estimator is developed in the generalized linear model, including logistic regression, linear regression, and Poisson regression. Numerical experiments demonstrate the favorable practical performance of the adjusted estimator over the classic one.

Submitted 9 May, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.03576 [pdf, other]

A Survey of Numerical Algorithms that can Solve the Lasso Problems

Authors: Yujie Zhao, Xiaoming Huo

Abstract: In statistics, the least absolute shrinkage and selection operator (Lasso) is a regression method that performs both variable selection and regularization. There is a large literature discussing the statistical properties of the regression coefficients estimated by the Lasso method. However, a comprehensive review of the algorithms that solve the optimization problem in Lasso is lacking. In this review, we summarize five representative algorithms to optimize the objective function in Lasso, including the iterative shrinkage threshold algorithm (ISTA), fast iterative shrinkage-thresholding algorithms (FISTA), coordinate gradient descent algorithm (CGDA), smooth L1 algorithm (SLA), and path following algorithm (PFA). We also compare their convergence rates, as well as their potential strengths and weaknesses.

Submitted 6 March, 2023; originally announced March 2023.
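
Illustrative note on the entry above: the abstract names ISTA as one of the five surveyed algorithms. The following is a minimal textbook sketch of ISTA, not code from the survey. It assumes the objective $\frac{1}{2}\|y - Xb\|_2^2 + \lambda\|b\|_1$ (the scaling is an assumption), and the names `soft_threshold` and `ista` are hypothetical. The step size uses the squared Frobenius norm of $X$, a simple upper bound on the Lipschitz constant of the gradient.

```python
def soft_threshold(v, t):
    """Soft-thresholding: the proximal map of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(X, y, lam, n_iter=500):
    """Textbook ISTA for 0.5 * ||y - X b||^2 + lam * ||b||_1 (assumed scaling).

    X is a list of rows (n x p, not all zeros), y a list of length n.
    """
    n, p = len(X), len(X[0])
    # The squared Frobenius norm upper-bounds ||X||_2^2, the Lipschitz
    # constant of the gradient, so 1 / L is a safe (conservative) step size.
    L = sum(x * x for row in X for x in row)
    b = [0.0] * p
    for _ in range(n_iter):
        # Residual X b - y, then gradient X^T (X b - y) of the smooth part.
        resid = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by soft-thresholding (the proximal step).
        b = [soft_threshold(b[j] - grad[j] / L, lam / L) for j in range(p)]
    return b
```

With an identity design matrix the minimizer is the soft-thresholded response, which gives a quick sanity check: `ista(I, [3.0, -0.5, 0.0], 1.0)` should approach `[2.0, 0.0, 0.0]`.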

arXiv:2301.09675 [pdf, other]

Improved Rate of First Order Algorithms for Entropic Optimal Transport

Authors: Yiling Luo, Yiling Xie, Xiaoming Huo

Abstract: This paper improves the state-of-the-art rate of a first-order algorithm for solving entropy regularized optimal transport. The resulting rate for approximating the optimal transport (OT) has been improved from $\widetilde{O}({n^{2.5}}/ε)$ to $\widetilde{O}({n^2}/ε)$, where $n$ is the problem size and $ε$ is the accuracy level. In particular, we propose an accelerated primal-dual stochastic mirror descent algorithm with variance reduction. Such special design helps us improve the rate compared to other accelerated primal-dual algorithms. We further propose a batch version of our stochastic algorithm, which improves the computational performance through parallel computing. To compare, we prove that the computational complexity of the Stochastic Sinkhorn algorithm is $\widetilde{O}({n^2}/{ε^2})$, which is slower than our accelerated primal-dual stochastic mirror algorithm. Experiments are done using synthetic and real data, and the results match our theoretical rates. Our algorithm may inspire more research to develop accelerated primal-dual algorithms that have rate $\widetilde{O}({n^2}/ε)$ for solving OT.

Submitted 23 January, 2023; originally announced January 2023.

arXiv:2212.01259 [pdf, other]

Covariance Estimators for the ROOT-SGD Algorithm in Online Learning

Authors: Yiling Luo, Xiaoming Huo, Yajun Mei

Abstract: Online learning naturally arises in many statistical and machine learning problems. The most widely used methods in online learning are stochastic first-order algorithms. Among this family of algorithms, there is a recently developed algorithm, Recursive One-Over-T SGD (ROOT-SGD). ROOT-SGD is advantageous in that it converges at a non-asymptotically fast rate, and its estimator further converges to a normal distribution. However, this normal distribution has unknown asymptotic covariance; thus it cannot be directly applied to measure the uncertainty. To fill this gap, we develop two estimators for the asymptotic covariance of ROOT-SGD. Our covariance estimators are useful for statistical inference in ROOT-SGD. Our first estimator adopts the idea of plug-in. For each unknown component in the formula of the asymptotic covariance, we substitute it with its empirical counterpart. The plug-in estimator converges at the rate $\mathcal{O}(1/\sqrt{t})$, where $t$ is the sample size. Despite its quick convergence, the plug-in estimator has the limitation that it relies on the Hessian of the loss function, which might be unavailable in some cases. Our second estimator is a Hessian-free estimator that overcomes the aforementioned limitation. The Hessian-free estimator uses the random-scaling technique, and we show that it is an asymptotically consistent estimator of the true covariance.

Submitted 2 December, 2022; originally announced December 2022.

arXiv:2210.16645 [pdf, other]

Solving a Special Type of Optimal Transport Problem by a Modified Hungarian Algorithm

Authors: Yiling Xie, Yiling Luo, Xiaoming Huo

Abstract: Computing the empirical Wasserstein distance in the Wasserstein-distance-based independence test is an optimal transport (OT) problem with a special structure. This observation inspires us to study a special type of OT problem and propose a modified Hungarian algorithm to solve it exactly. For the OT problem involving two marginals with $m$ and $n$ atoms ($m\geq n$), respectively, the computational complexity of the proposed algorithm is $O(m^2n)$. Computing the empirical Wasserstein distance in the independence test requires solving this special type of OT problem, where $m=n^2$. The associated computational complexity of the proposed algorithm is $O(n^5)$, while the order of applying the classic Hungarian algorithm is $O(n^6)$. In addition to the aforementioned special type of OT problem, it is shown that the modified Hungarian algorithm could be adopted to solve a wider range of OT problems. Broader applications of the proposed algorithm are discussed -- solving the one-to-many assignment problem and the many-to-many assignment problem. We conduct numerical experiments to validate our theoretical results. The experiment results demonstrate that the proposed modified Hungarian algorithm compares favorably with the Hungarian algorithm, the well-known Sinkhorn algorithm, and the network simplex algorithm.

Submitted 28 February, 2023; v1 submitted 29 October, 2022; originally announced October 2022.
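
Illustrative note on the entry above: the two complexity figures quoted in the abstract can be checked by substituting $m = n^2$. The first identity uses only the $O(m^2 n)$ bound stated in the abstract. The second assumes the standard cubic cost $O(m^3)$ for the classic Hungarian algorithm on a problem with $m$ atoms per side, which the abstract does not state explicitly.

```latex
\[
m = n^2 \;\Longrightarrow\;
O(m^2 n) = O\big((n^2)^2 \cdot n\big) = O(n^5),
\qquad
O(m^3) = O\big((n^2)^3\big) = O(n^6).
\]
```

Under that assumption the modified algorithm saves a factor of $n$ in this setting.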

arXiv:2210.14184 [pdf, other]

Learning Ability of Interpolating Deep Convolutional Neural Networks

Authors: Tian-Yi Zhou, Xiaoming Huo

Abstract: It is frequently observed that overparameterized neural networks generalize well. Regarding such phenomena, existing theoretical work is mainly devoted to linear settings or fully-connected neural networks. This paper studies the learning ability of an important family of deep neural networks, deep convolutional neural networks (DCNNs), under both underparameterized and overparameterized settings. We establish the first learning rates of underparameterized DCNNs without parameter or function variable structure restrictions presented in the literature. We also show that by adding well-defined layers to a non-interpolating DCNN, we can obtain some interpolating DCNNs that maintain the good learning rates of the non-interpolating DCNN. This result is achieved by a novel network deepening scheme designed for DCNNs. Our work provides theoretical verification of how overfitted DCNNs generalize well.

Submitted 16 August, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

arXiv:2205.10447 [pdf, other]

doi 10.1080/02664763.2022.2112557

Hot-spots Detection in Count Data by Poisson Assisted Smooth Sparse Tensor Decomposition

Authors: Yujie Zhao, Xiaoming Huo, Yajun Mei

Abstract: Count data occur widely in many bio-surveillance and healthcare applications, e.g., the numbers of new patients of different types of infectious diseases from different cities/counties/states repeatedly over time, say, daily/weekly/monthly. For this type of count data, one important task is the quick detection and localization of hot-spots in terms of unusual infectious rates so that we can respond appropriately. In this paper, we develop a method called Poisson assisted Smooth Sparse Tensor Decomposition (PoSSTenD), which not only detects when hot-spots occur but also localizes where hot-spots occur. The main idea of our proposed PoSSTenD method is articulated as follows. First, we represent the observed count data as a three-dimensional tensor including (1) a spatial dimension for location patterns, e.g., different cities/counties/states; (2) a temporal domain for time patterns, e.g., daily/weekly/monthly; (3) a categorical dimension for different types of data sources, e.g., different types of diseases. Second, we fit this tensor into a Poisson regression model, and then we further decompose the infectious rate into two components: smooth global trend and local hot-spots. Third, we detect when hot-spots occur by building a cumulative sum (CUSUM) control chart and localize where hot-spots occur by their LASSO-type sparse estimation. The usefulness of our proposed methodology is validated through numerical simulation studies and a real-world dataset, which records the annual number of 10 different infectious diseases from 1993 to 2018 for 49 mainland states in the United States.

Submitted 1 June, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: 7 figures, 22 pages, 4 tables

Journal ref: Journal of Applied Statistics, 2022

arXiv:2205.00061 [pdf, other]

doi 10.1109/ISIT50566.2022.9834388

The Directional Bias Helps Stochastic Gradient Descent to Generalize in Kernel Regression Models

Authors: Yiling Luo, Xiaoming Huo, Yajun Mei

Abstract: We study the Stochastic Gradient Descent (SGD) algorithm in nonparametric statistics: kernel regression in particular. The directional bias property of SGD, which is known in the linear regression setting, is generalized to the kernel regression. More specifically, we prove that SGD with moderate and annealing step-size converges along the direction of the eigenvector that corresponds to the largest eigenvalue of the Gram matrix. In addition, the Gradient Descent (GD) with a moderate or small step-size converges along the direction that corresponds to the smallest eigenvalue. These facts are referred to as the directional bias properties; they may explain how an SGD-computed estimator has a potentially smaller generalization error than a GD-computed estimator. The application of our theory is demonstrated by simulation studies and a case study that is based on the FashionMNIST dataset.

Submitted 29 April, 2022; originally announced May 2022.

arXiv:2205.00058 [pdf, other]

doi 10.1109/ISIT50566.2022.9834827

Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent

Authors: Yiling Luo, Xiaoming Huo, Yajun Mei

Abstract: In machine learning and statistical data analysis, we often run into an objective function that is a summation: the number of terms in the summation possibly is equal to the sample size, which can be enormous. In such a setting, the stochastic mirror descent (SMD) algorithm is a numerically efficient method -- each iteration involving a very small subset of the data. The variance reduction version of SMD (VRSMD) can further improve SMD by inducing faster convergence. On the other hand, algorithms such as gradient descent and stochastic gradient descent have the implicit regularization property that leads to better performance in terms of the generalization errors. Little is known on whether such a property holds for VRSMD. We prove here that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in the linear regression. This establishes the implicit regularization property for VRSMD. As an application of the above result, we derive a model estimation accuracy result in the setting when the true model is sparse. We use numerical examples to illustrate the empirical power of VRSMD.

Submitted 29 April, 2022; originally announced May 2022.

arXiv:2203.00813 [pdf, other]

An Accelerated Stochastic Algorithm for Solving the Optimal Transport Problem

Authors: Yiling Xie, Yiling Luo, Xiaoming Huo

Abstract: A primal-dual accelerated stochastic gradient descent with variance reduction algorithm (PDASGD) is proposed to solve linear-constrained optimization problems. PDASGD could be applied to solve the discrete optimal transport (OT) problem and enjoys the best-known computational complexity -- $\widetilde{\mathcal{O}}(n^2/ε)$, where $n$ is the number of atoms, and $ε>0$ is the accuracy. In the literature, some primal-dual accelerated first-order algorithms, e.g., APDAGD, have been proposed and have the order of $\widetilde{\mathcal{O}}(n^{2.5}/ε)$ for solving the OT problem. To understand why our proposed algorithm could improve the rate by a factor of $\widetilde{\mathcal{O}}(\sqrt{n})$, the conditions under which our stochastic algorithm has a lower order of computational complexity for solving linear-constrained optimization problems are discussed. It is demonstrated that the OT problem could satisfy the aforementioned conditions. Numerical experiments demonstrate superior practical performances of the proposed PDASGD algorithm for solving the OT problem.

Submitted 29 May, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: Compared with previous versions, both theoretical complexity and numerical performances have been improved for solving the OT problem in this version

arXiv:2103.10231 [pdf, ps, other]

Identification of Partial-Differential-Equations-Based Models from Noisy Data via Splines

Authors: Yujie Zhao, Xiaoming Huo, Yajun Mei

Abstract: We propose a two-stage method called Spline Assisted Partial Differential Equation based Model Identification (SAPDEMI) to identify partial differential equation (PDE)-based models from noisy data. In the first stage, we employ the cubic splines to estimate unobservable derivatives. The underlying PDE is based on a subset of these derivatives. This stage is computationally efficient: its computational complexity is a product of a constant with the sample size; this is the lowest possible order of computational complexity. In the second stage, we apply the Least Absolute Shrinkage and Selection Operator (Lasso) to identify the underlying PDE-based model. Statistical properties are developed, including the model identification accuracy. We validate our theory through various numerical examples and a real data case study. The case study is based on a National Aeronautics and Space Administration (NASA) data set.

Submitted 6 March, 2023; v1 submitted 18 March, 2021; originally announced March 2021.

arXiv:2103.07045 [pdf, ps, other]

Asymptotic Theory of $\ell_1$-Regularized PDE Identification from a Single Noisy Trajectory

Authors: Yuchen He, Namjoon Suh, Xiaoming Huo, Sungha Kang, Yajun Mei

Abstract: We prove the support recovery for a general class of linear and nonlinear evolutionary partial differential equation (PDE) identification from a single noisy trajectory using $\ell_1$ regularized Pseudo-Least Squares model~($\ell_1$-PsLS). In any associative $\mathbb{R}$-algebra generated by finitely many differentiation operators that contain the unknown PDE operator, applying $\ell_1$-PsLS to a given data set yields a family of candidate models with coefficients $\mathbf{c}(λ)$ parameterized by the regularization weight $λ\geq 0$. The trace of $\{\mathbf{c}(λ)\}_{λ\geq 0}$ suffers from high variance due to data noises and finite difference approximation errors. We provide a set of sufficient conditions which guarantee that, from a single trajectory data denoised by a Local-Polynomial filter, the support of $\mathbf{c}(λ)$ asymptotically converges to the true signed-support associated with the underlying PDE for sufficiently many data and a certain range of $λ$. We also show various numerical experiments to validate our theory.

Submitted 11 March, 2021; originally announced March 2021.

Comments: 38 pages, 6 figures

arXiv:2010.13934 [pdf, other]

Accelerate the Warm-up Stage in the Lasso Computation via a Homotopic Approach

Authors: Yujie Zhao, Xiaoming Huo

Abstract: In optimization, it is known that when the objective functions are strictly convex and well-conditioned, gradient-based approaches can be extremely effective, e.g., achieving the exponential rate of convergence. On the other hand, the existing Lasso-type estimator in general cannot achieve the optimal rate due to the undesirable behavior of the absolute function at the origin. A homotopic method is to use a sequence of surrogate functions to approximate the $\ell_1$ penalty that is used in the Lasso-type of estimators. The surrogate functions will converge to the $\ell_1$ penalty in the Lasso estimator. At the same time, each surrogate function is strictly convex, which enables a provably faster numerical rate of convergence. In this paper, we demonstrate that by meticulously defining the surrogate functions, one can prove a faster numerical convergence rate than any existing methods in computing for the Lasso-type of estimators. Namely, the state-of-the-art algorithms can only guarantee $O(1/ε)$ or $O(1/\sqrt{ε})$ convergence rates, while we can prove an $O([\log(1/ε)]^2)$ rate for the newly proposed algorithm. Our numerical simulations show that the new algorithm also performs better empirically.

Submitted 6 March, 2023; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: 19 pages, 3 figures, 3 tables

arXiv:2009.09310 [pdf, other]

Fast and Asymptotically Powerful Detection for Filamentary Objects in Digital Images

Authors: Kai Ni, Shanshan Cao, Xiaoming Huo

Abstract: Given an inhomogeneous chain embedded in a noisy image, we consider the conditions under which such an embedded chain is detectable. Many applications, such as detecting moving objects and ship wakes, can be abstracted as detecting the existence of chains. In this work, we provide a detection algorithm with a low order of computational complexity to detect the chain, and the optimal theoretical detectability regarding SNR (signal to noise ratio) under the normal distribution model. Specifically, we derive an analytical threshold that specifies what is detectable. We design a longest significant chain detection algorithm, with computational complexity in the order of $O(n\log n)$. We also prove that our proposed algorithm is asymptotically powerful, which means, as the dimension $n \rightarrow \infty$, the probability of false detection vanishes. We further provide some simulated examples and a real data example, which validate our theory.

Submitted 19 September, 2020; originally announced September 2020.

Comments: 13 pages, 8 figures

arXiv:2001.00068 [pdf, other]

Asymptotic convergence rate of the longest run in an inflating Bernoulli net

Authors: Kai Ni, Shanshan Cao, Xiaoming Huo

Abstract: In image detection, one problem is to test whether the set, though mostly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve, for example, a curve with $C^α$-norm bounded by $β$. One approach is to analyze the data by counting membership in multiscale multianisotropic strips, which involves an algorithm that delves into the length of the path connecting many consecutive "significant" nodes. In this paper, we develop the mathematical formalism of this algorithm and analyze the statistical property of the length of the longest significant run. The rate of convergence is derived. Using percolation theory and random graph theory, we present a novel probabilistic model named pseudo-tree model. Based on the asymptotic results for pseudo-tree model, we further study the length of the longest significant run in an "inflating" Bernoulli net. We find that the probability parameter $p$ of a significant node plays an important role: there is a threshold $p_c$, such that in the cases of $p<p_c$ and $p>p_c$, very different asymptotic behaviors of the length of the longest significant run are observed. We apply our results to the detection of an underlying curvilinear feature and argue that we achieve the lowest possible detectable strength in theory.

Submitted 31 December, 2019; originally announced January 2020.

arXiv:1912.00524 [pdf, other]

Factor Analysis on Citation, Using a Combined Latent and Logistic Regression Model

Authors: Namjoon Suh, Xiaoming Huo, Eric Heim, Lee Seversky

Abstract: We propose a combined model, which integrates the latent factor model and the logistic regression model, for the citation network. It is noticed that neither a latent factor model nor a logistic regression model alone is sufficient to capture the structure of the data. The proposed model has a latent (i.e., factor analysis) model to represent the main technological trends (a.k.a., factors), and adds a sparse component that captures the remaining ad-hoc dependence. Parameter estimation is carried out through the construction of a joint-likelihood function of edges and properly chosen penalty terms. The convexity of the objective function allows us to develop an efficient algorithm, while the penalty terms push towards a low-dimensional latent component and a sparse graphical structure. Simulation results show that the proposed method works well in practical situations. The proposed method has been applied to a real application, which contains a citation network of statisticians (Ji and **, 2016). Some interesting findings are reported.

Submitted 1 December, 2019; originally announced December 2019.

Comments: Citation network, matrix decomposition, latent variable model, logistic regression model, convex optimization, alternating direction method of multiplier

arXiv:1911.03592 [pdf, other]

Optimal Shape Control via $L_\infty$ Loss for Composite Fuselage Assembly

Authors: Juan Du, Shanshan Cao, Jeffrey H. Hunt, Xiaoming Huo

Abstract: Shape control is critical to ensure the quality of composite fuselage assembly. In current practice, the structures are adjusted to the design shape in terms of the $\ell_2$ loss for further assembly without considering the existing dimensional gap between two structures. Such practice has two limitations: (1) the design shape may not be the optimal shape in terms of a pair of incoming fuselages with different incoming dimensions; (2) the maximum gap is the key concern during the fuselage assembly process. This paper proposes an optimal shape control methodology via the $\ell_\infty$ loss for composite fuselage assembly process by considering the existing dimensional gap between the incoming pair of fuselages. Besides, due to the limitation on the number of available actuators in practice, we face an important problem of finding the best locations for the actuators among many potential locations, which makes the problem a sparse estimation problem. We are the first to solve the optimal shape control in fuselage assembly process using the $\ell_\infty$ model under the framework of sparse estimation, where we use the $\ell_1$ penalty to control the sparsity of the resulting estimator. From a statistical point of view, this can be formulated as the $\ell_\infty$ loss based linear regression, and under some standard assumptions, such as the restricted eigenvalue (RE) conditions, and the light tailed noise, the non-asymptotic estimation error of the $\ell_1$ regularized $\ell_\infty$ linear model is derived to be of the order of $O(σ\sqrt{\frac{S\log p}{n}})$, which meets the upper-bound in the existing literature. Compared to the current practice, the case study shows that our proposed method significantly reduces the maximum gap between two fuselages after shape adjustments.

Submitted 8 November, 2019; originally announced November 2019.

Comments: 31 pages, 10 figures

arXiv:1911.02753 [pdf, other]

Optimal Projections in the Distance-Based Statistical Methods

Authors: Chuan** Yu, Xiaoming Huo

Abstract: This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project multidimensional variables onto these pre-specified projection directions; by subsequently utilizing the fast algorithm that is developed in Huo and Székely [2016] for the univariate variables, the computational complexity can be improved from $O(m^2)$ to $O(n m \cdot \log(m))$, where $n$ is the number of projection directions and $m$ is the sample size. When $n \ll m/\log(m)$, computational savings can be achieved. The key challenge is how to find the optimal pre-specified projection directions. This can be obtained by minimizing the worst-case difference between the true distance and the approximated distance, which can be formulated as a nonconvex optimization problem in a general setting. In this paper, we show that the exact solution of the nonconvex optimization problem can be derived in two special cases: the dimension of the data is equal to either $2$ or the number of projection directions. In the generic settings, we propose an algorithm to find some approximate solutions. Simulations confirm the advantage of our method, in comparison with the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated.

Submitted 6 November, 2019; originally announced November 2019.

arXiv:1903.00037 [pdf, other]

Distance-Based Independence Screening for Canonical Analysis

Authors: Yi** Ni, Chuan** Yu, Andy Ko, Xiaoming Huo

Abstract: This paper introduces a novel method called Distance-Based Independence Screening for Canonical Analysis (DISCA) that performs simultaneous dimension reduction for a pair of random variables by optimizing the distance covariance (dCov). dCov is a statistic first proposed by Székely et al. [2009] for independence testing. Compared with sufficient dimension reduction (SDR) and canonical correlation analysis (CCA)-based approaches, DISCA is a model-free approach that does not impose dimensional or distributional restrictions on variables and is more sensitive to nonlinear relationships. Theoretically, we establish a non-asymptotic error bound to provide a guarantee of our method's performance. Numerically, DISCA performs comparably to or better than other state-of-the-art algorithms and is computationally faster. All codes of our DISCA method can be found on GitHub https://github.com/Yi**911/DISCA.git, including an R package named DISCA.

Submitted 12 October, 2023; v1 submitted 28 February, 2019; originally announced March 2019.

Comments: 33 pages

arXiv:1707.04602 [pdf, other]

An Efficient and Distribution-Free Two-Sample Test Based on Energy Statistics and Random Projections

Authors: Cheng Huang, Xiaoming Huo

Abstract: A common disadvantage in existing distribution-free two-sample testing approaches is that the computational complexity could be high. Specifically, if the sample size is $N$, the computational complexity of those two-sample tests is at least $O(N^2)$. In this paper, we develop an efficient algorithm with complexity $O(N \log N)$ for computing energy statistics in univariate cases. For multivariate cases, we introduce a two-sample test based on energy statistics and random projections, which enjoys the $O(K N \log N)$ computational complexity, where $K$ is the number of random projections. We name our method for multivariate cases Randomly Projected Energy Statistics (RPES). We can show RPES achieves nearly the same test power as energy statistics both theoretically and empirically. Numerical experiments also demonstrate the efficiency of the proposed method over the competitors.

Submitted 14 July, 2017; originally announced July 2017.

Comments: 27 pages, 6 figures

arXiv:1701.06054 [pdf, ps, other]

A Statistically and Numerically Efficient Independence Test based on Random Projections and Distance Covariance

Authors: Cheng Huang, Xiaoming Huo

Abstract: Test of independence plays a fundamental role in many statistical techniques. Among the nonparametric approaches, the distance-based methods (such as the distance correlation based hypotheses testing for independence) have numerous advantages, compared with many other alternatives. A known limitation of the distance-based method is that its computational complexity can be high. In general, when the sample size is $n$, the order of computational complexity of a distance-based method, which typically requires computing of all pairwise distances, can be $O(n^2)$. Recent advances have discovered that in the {\it univariate} cases, a fast method with $O(n \log n)$ computational complexity and $O(n)$ memory requirement exists. In this paper, we introduce a test of independence method based on random projection and distance correlation, which achieves nearly the same power as the state-of-the-art distance-based approach, works in the {\it multivariate} cases, and enjoys the $O(n K \log n)$ computational complexity and $O(\max\{n,K\})$ memory requirement, where $K$ is the number of random projections. Note that saving is achieved when $K < n/\log n$. We name our method a Randomly Projected Distance Covariance (RPDC). The statistical theoretical analysis takes advantage of some techniques on random projection which are rooted in contemporary machine learning. Numerical experiments demonstrate the efficiency of the proposed method, relative to several competitors.

Submitted 21 January, 2017; originally announced January 2017.

Comments: 52 pages, 8 figures, technical paper

MSC Class: Primary 62G10; 62H20; 62H15; secondary 62G20

arXiv:1511.01443 [pdf, ps, other]

A Distributed One-Step Estimator

Authors: Cheng Huang, Xiaoming Huo

Abstract: Distributed statistical inference has recently attracted enormous attention. Much existing work focuses on the averaging estimator. We propose a one-step approach to enhance a simple-averaging based distributed estimator. We derive the corresponding asymptotic properties of the newly proposed estimator. We find that the proposed one-step estimator enjoys the same asymptotic properties as the centralized estimator. The proposed one-step approach merely requires one additional round of communication relative to the averaging estimator; so the extra communication burden is insignificant. In finite sample cases, numerical examples show that the proposed estimator outperforms the simple averaging estimator by a large margin in terms of the mean squared errors. A potential application of the one-step approach is that one can use multiple machines to speed up large scale statistical inference with little compromise in the quality of estimators. The proposed method becomes more valuable when data can only be available at distributed machines with limited communication bandwidth.

Submitted 10 November, 2015; v1 submitted 4 November, 2015; originally announced November 2015.

Comments: 31 pages

arXiv:1410.1503 [pdf, ps, other]

Fast Computing for Distance Covariance

Authors: Xiaoming Huo, Gabor J. Szekely

Abstract: Distance covariance and distance correlation have been widely adopted in measuring dependence of a pair of random variables or random vectors. If the computation of distance covariance and distance correlation is implemented directly according to its definition, then its computational complexity is $O(n^2)$, which is a disadvantage compared to other faster methods. In this paper we show that the computation of distance covariance and distance correlation of real valued random variables can be implemented by an $O(n \log n)$ algorithm and this is comparable to other computationally efficient algorithms. The new formula we derive for an unbiased estimator for squared distance covariance turns out to be a U-statistic. This fact implies some nice asymptotic properties that were derived before via more complex methods. We apply the fast computing algorithm to some synthetic data. Our work will make distance correlation applicable to a much wider class of applications.

Submitted 6 October, 2014; originally announced October 2014.

Comments: 38 pages, 6 tables, 5 figures. arXiv admin note: text overlap with arXiv:1205.4701 by other authors

Showing 1–30 of 30 results for author: Huo, X