-
DeltaPhi: Learning Physical Trajectory Residual for PDE Solving
Authors:
Xihang Yue,
Linchao Zhu,
Yi Yang
Abstract:
Although neural operator networks can theoretically approximate any operator mapping, their limited generalization capability prevents them from learning correct physical dynamics when potential data biases exist, particularly in practical PDE-solving scenarios where the amount of available data is restricted or the resolution is extremely low. To address this issue, we propose and formulate Physical Trajectory Residual Learning (DeltaPhi), which learns to predict the physical residuals between the trajectory to be solved and a known, similar auxiliary trajectory. First, we transform the direct operator mapping between input-output function fields in the original training data into a residual operator mapping between input function pairs and output function residuals. Next, we learn a surrogate model for the residual operator mapping based on existing neural operator networks. Additionally, we design customized auxiliary inputs for efficient optimization. Through extensive experiments, we conclude that, compared to direct learning, physical residual learning is preferable for PDE solving.
Submitted 14 June, 2024;
originally announced June 2024.
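A minimal pure-Python sketch of the residual reformulation described in the DeltaPhi abstract, under loudly stated assumptions: functions are discretized as plain lists, the auxiliary trajectory is picked as the nearest training input (a hypothetical retrieval rule), and a one-parameter least-squares fit stands in for the neural operator surrogate. It only illustrates the data transformation and the reconstruction `u = u_aux + residual`; it is not the authors' implementation.

```python
import math

def l2(u, v):
    """Euclidean distance between two discretized functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest(index, inputs):
    """Hypothetical auxiliary choice: index of the closest *other* training input."""
    best, best_d = None, float("inf")
    for j, a in enumerate(inputs):
        if j != index:
            d = l2(inputs[index], a)
            if d < best_d:
                best, best_d = j, d
    return best

def build_residual_pairs(inputs, outputs):
    """Turn direct pairs (a_i, u_i) into residual pairs (a_i - a_j, u_i - u_j)."""
    pairs = []
    for i in range(len(inputs)):
        j = nearest(i, inputs)
        da = [x - y for x, y in zip(inputs[i], inputs[j])]
        du = [x - y for x, y in zip(outputs[i], outputs[j])]
        pairs.append((da, du))
    return pairs

def fit_scalar_residual_model(pairs):
    """Hypothetical surrogate: least-squares k in du ~ k * da (stand-in for a neural operator)."""
    num = sum(x * y for da, du in pairs for x, y in zip(da, du))
    den = sum(x * x for da, _ in pairs for x in da)
    return num / den

def predict(a_query, inputs, outputs, k):
    """Predict u(a_query) = u_aux + k * (a_query - a_aux) using the nearest training sample."""
    j = min(range(len(inputs)), key=lambda m: l2(a_query, inputs[m]))
    return [u + k * (a - b) for u, a, b in zip(outputs[j], a_query, inputs[j])]
```

On a toy pointwise-linear operator such as `u = 2a`, the fitted scalar recovers the factor 2 and predictions for unseen inputs are exact; a real PDE operator would need a learned network in place of `fit_scalar_residual_model`.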
-
S-SOS: Stochastic Sum-Of-Squares for Parametric Polynomial Optimization
Authors:
Richard L. Zhu,
Mathias Oster,
Yuehaw Khoo
Abstract:
Global polynomial optimization is an important tool across applied mathematics, with many applications in operations research, engineering, and the physical sciences. In various settings, the polynomials depend on external parameters that may be random. We discuss a stochastic sum-of-squares (S-SOS) algorithm based on the sum-of-squares hierarchy that constructs a series of semidefinite programs to jointly find strict lower bounds on the global minimum and extract candidates for parameterized global minimizers. We prove quantitative convergence of the hierarchy as the degree increases and use it to solve unconstrained and constrained polynomial optimization problems parameterized by random variables. By employing $n$-body priors from condensed matter physics to induce sparsity, we can use S-SOS to produce solutions and uncertainty intervals for sensor network localization problems containing up to 40 variables and semidefinite matrix sizes surpassing $800 \times 800$.
Submitted 13 June, 2024;
originally announced June 2024.
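For intuition, the display below sketches, in our own notation (an assumption, not necessarily the paper's exact formulation), the kind of program a stochastic SOS lower bound solves: a parameter-dependent bound $c(\omega)$ is pushed up in expectation subject to an SOS certificate. Since a sum of squares is nonnegative, any feasible $c$ satisfies $c(\omega) \le \min_x p(x,\omega)$ for every $\omega$; the toy polynomial shows a case where the certificate is tight and also exposes the parameterized minimizer.

```latex
\begin{aligned}
&\max_{c}\ \ \mathbb{E}_{\omega\sim\nu}\,[c(\omega)]
\quad\text{s.t.}\quad p(x,\omega)-c(\omega)\ \text{is a sum of squares in }(x,\omega),\\[4pt]
&\text{Toy case: } p(x,\omega)=(x-\omega)^2+\omega^2,\qquad c(\omega)=\omega^2,\\
&p(x,\omega)-c(\omega)=(x-\omega)^2\ \ (\text{SOS}),\qquad
\min_x p(x,\omega)=\omega^2=c(\omega),\qquad x^\star(\omega)=\omega .
\end{aligned}
```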
-
Nonparametric estimation of FBSDEs with random terminal time
Authors:
Shaolin Ji,
Chenyao Yu,
Linlin Zhu
Abstract:
This paper investigates the nonparametric estimation of the functional coefficients of FBSDEs with random terminal time, including local constant and local linear estimators. We provide complete two-dimensional asymptotics in both the time span and the sampling interval, allowing for a precise characterization of their distributions. Moreover, we provide an empirical likelihood (EL) method to construct data-driven confidence intervals for these estimators. Numerical simulations investigate the finite-sample properties of the estimators and compare the EL method with the conventional method of constructing confidence intervals based on asymptotic normality.
Submitted 21 May, 2024;
originally announced May 2024.
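The local constant and local linear estimators named in this abstract are standard kernel smoothers; the sketch below shows both with a Gaussian kernel, plus a hypothetical helper that turns a discretely sampled path into (state, scaled increment) pairs, which is one common way such smoothers are applied to drift estimation. The kernel choice, bandwidth and helper are assumptions for illustration; the paper's FBSDE setting with random terminal time is not modelled here.

```python
import math

def gaussian_kernel(u):
    """Standard normal density used as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def local_constant(x0, xs, ys, h):
    """Nadaraya-Watson (local constant) estimate of E[Y | X = x0] with bandwidth h."""
    w = [gaussian_kernel((x - x0) / h) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def local_linear(x0, xs, ys, h):
    """Local linear estimate: intercept of a kernel-weighted line fit centred at x0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = gaussian_kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    # Solve the 2x2 weighted normal equations and keep the intercept.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)

def drift_responses(path, dt):
    """Hypothetical helper: pair X_t with the scaled increment (X_{t+dt} - X_t) / dt."""
    xs = path[:-1]
    ys = [(b - a) / dt for a, b in zip(path[:-1], path[1:])]
    return xs, ys
```

A design note: the local linear estimator reproduces linear functions exactly, which removes the boundary bias that the local constant estimator suffers from near the edges of the sampled range.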
-
Efficient Orthogonal Decomposition with Automatic Basis Extraction for Low-Rank Matrix Approximation
Authors:
Weijie Shen,
Weiwei Xu,
Lei Zhu
Abstract:
Low-rank matrix approximation plays a ubiquitous role in applications such as image processing, signal processing, and data analysis. Recently, randomized algorithms for low-rank matrix approximation have gained widespread adoption due to their speed, accuracy, and robustness, particularly in their improved implementations on modern computer architectures. Existing low-rank approximation algorithms often require prior knowledge of the rank of the matrix, which is typically unknown. To address this bottleneck, we propose a low-rank approximation algorithm termed efficient orthogonal decomposition with automatic basis extraction (EOD-ABE), tailored for the scenario where the rank of the matrix is unknown. Notably, we introduce a randomized algorithm to automatically extract a basis that reveals the rank. The efficacy of the proposed algorithms is validated theoretically and numerically, demonstrating superior speed, accuracy, and robustness compared to existing methods. Furthermore, we apply the algorithms to image reconstruction, achieving remarkable results.
Submitted 26 April, 2024;
originally announced April 2024.
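To illustrate what "randomized, rank-revealing basis extraction" means in general, here is a generic adaptive randomized range finder in pure Python. This is a swapped-in textbook technique, not EOD-ABE: it draws Gaussian test vectors one at a time, orthogonalizes each sample against the basis found so far, and stops when a new sample is numerically already in the span, so the basis size serves as a rank estimate. The tolerance `tol` and the single-sample stopping rule are assumptions; robust implementations usually require several consecutive negligible samples.

```python
import math
import random

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def adaptive_range_finder(A, tol=1e-10, seed=0):
    """Grow an orthonormal basis Q (list of column vectors) for range(A).

    Generic randomized sketch, not the EOD-ABE algorithm: stops when a fresh
    sample A @ omega has negligible component outside span(Q).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    Q = []
    while len(Q) < min(m, n):
        omega = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = matvec(A, omega)
        y_norm = math.sqrt(dot(y, y))
        if y_norm == 0.0:
            break  # zero matrix (or a degenerate draw)
        for _ in range(2):  # two Gram-Schmidt passes for numerical stability
            for q in Q:
                c = dot(q, y)
                y = [yi - c * qi for yi, qi in zip(y, q)]
        r = math.sqrt(dot(y, y))
        if r <= tol * y_norm:
            break  # new direction is negligible: rank found
        Q.append([yi / r for yi in y])
    return Q

def low_rank_approx(A, Q):
    """Return Q (Q^T A), the orthogonal projection of A onto span(Q)."""
    n = len(A[0])
    cols = [[row[j] for row in A] for j in range(n)]
    B = [[dot(q, col) for col in cols] for q in Q]  # B = Q^T A, shape k x n
    return [[sum(Q[l][i] * B[l][j] for l in range(len(Q))) for j in range(n)]
            for i in range(len(A))]
```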
-
Optimal estimate of electromagnetic field concentration between two nearly-touching inclusions in the quasi-static regime
Authors:
Youjun Deng,
Hongyu Liu,
Liyan Zhu
Abstract:
We investigate the electromagnetic field concentration between two nearly-touching inclusions with high-contrast electric permittivities in the quasi-static regime. Using layer potential techniques and asymptotic analysis in the low-frequency regime, we derive low-frequency expansions that provide integral representations for the solutions of the Maxwell equations. For the leading-order term $\mathbf{E}_0$ of the asymptotic expansion of the electric field, we prove that it blows up at the order $ε^{-1} |\ln ε|^{-1}$ in the radial geometry, where $ε$ denotes the asymptotic distance between the inclusions. By a delicate analysis of the integral operators involved, we further prove the boundedness of the first-order term $\mathbf{E}_1$. We also conduct extensive numerical experiments that not only corroborate the theoretical findings but also reveal further features of the field concentration in general geometric setups. Our study provides the first treatment in the literature of field concentration between nearly-touching material inclusions for the full Maxwell system.
Submitted 19 March, 2024;
originally announced March 2024.
-
Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations
Authors:
Umut Şimşekli,
Mert Gürbüzbalaban,
Sinan Yıldırım,
Lingjiong Zhu
Abstract:
Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed, mainly from learning theory and optimization perspectives, its privacy preservation properties have not yet been established. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD when the injected noise follows an $α$-stable distribution, a family that includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the Gaussian distribution. Considering the $(ε, δ)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$-DP for a broad class of loss functions, which can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, contrary to prior work that requires bounded sensitivity of the gradients or clipping of the iterates, our theory reveals that under mild assumptions such a projection step is not actually necessary. We illustrate that the heavy-tailed noising mechanism achieves DP guarantees similar to those of the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.
Submitted 4 March, 2024;
originally announced March 2024.
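A minimal sketch of SGD with injected $α$-stable noise, as studied in this abstract. Symmetric $α$-stable variables are drawn with the Chambers-Mallows-Stuck method (for $α = 2$ this gives $N(0, 2)$, for $α = 1$ a standard Cauchy variable). The update form, the names `step` and `noise_scale`, and the uniform sampling of data points are assumptions for illustration; the paper's exact noise scaling may differ, and no privacy accounting is performed here.

```python
import math
import random

def symmetric_stable(alpha, rng):
    """Standard symmetric alpha-stable draw (0 < alpha <= 2), Chambers-Mallows-Stuck."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def noisy_sgd(grad, theta0, data, step, noise_scale, alpha, n_iters, seed=0):
    """theta <- theta - step * grad(theta, x_i) + noise_scale * S_alpha.

    grad, step and noise_scale are hypothetical names for this sketch.
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(n_iters):
        x = data[rng.randrange(len(data))]  # uniformly sampled data point
        theta = theta - step * grad(theta, x) + noise_scale * symmetric_stable(alpha, rng)
    return theta
```

With `noise_scale = 0` the loop reduces to plain SGD, which is a convenient sanity check before switching the heavy-tailed perturbation on.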
-
Euler-Maruyama schemes for stochastic differential equations driven by stable Lévy processes with i.i.d. stable components
Authors:
Thanh Dang,
Lingjiong Zhu
Abstract:
We study Euler-Maruyama numerical schemes for stochastic differential equations driven by stable Lévy processes with i.i.d. stable components. We obtain a uniform-in-time approximation error in Wasserstein distance. Our approximation error depends linearly on the stepsize, which is expected to be tight, as an explicit calculation for the case of an Ornstein-Uhlenbeck process shows. We also obtain a uniform-in-time approximation error when Pareto noises are used in the discretization scheme.
Submitted 19 February, 2024;
originally announced February 2024.
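The Euler-Maruyama scheme for an SDE driven by a Lévy process with i.i.d. symmetric $α$-stable components can be sketched as follows. Over a step of length $η$ a stable increment scales like $η^{1/α}$ times a standard stable variable, so each coordinate receives its own independent draw with that factor. The Chambers-Mallows-Stuck sampler, the unit noise scale and the Ornstein-Uhlenbeck drift in the usage are assumptions for illustration; the Pareto-noise variant mentioned in the abstract is not implemented.

```python
import math
import random

def symmetric_stable(alpha, rng):
    """Standard symmetric alpha-stable draw (0 < alpha <= 2), Chambers-Mallows-Stuck."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def euler_maruyama_stable(drift, x0, alpha, step, n_steps, seed=0, noise=True):
    """X_{k+1} = X_k + step * drift(X_k) + step**(1/alpha) * xi_k.

    xi_k has i.i.d. standard symmetric alpha-stable coordinates; step**(1/alpha)
    is the self-similar scaling of stable increments. Returns the whole path.
    """
    rng = random.Random(seed)
    x = list(x0)
    scale = step ** (1.0 / alpha)
    path = [list(x)]
    for _ in range(n_steps):
        b = drift(x)
        x = [xi + step * bi + (scale * symmetric_stable(alpha, rng) if noise else 0.0)
             for xi, bi in zip(x, b)]
        path.append(list(x))
    return path
```

Usage with an Ornstein-Uhlenbeck drift `lambda x: [-xi for xi in x]`: switching `noise` off gives the deterministic recursion `x_k = (1 - step)**k * x0`, which is a quick check of the drift part.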
-
Convergence Analysis for General Probability Flow ODEs of Diffusion Models in Wasserstein Distances
Authors:
Xuefeng Gao,
Lingjiong Zhu
Abstract:
Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and employed in practice, theoretical understanding of the convergence properties of the probability flow ODE is still quite limited. In this paper, we provide the first non-asymptotic convergence analysis for a general class of probability flow ODE samplers in 2-Wasserstein distance, assuming accurate score estimates. We then consider various examples and establish results on the iteration complexity of the corresponding ODE-based samplers.
Submitted 31 January, 2024;
originally announced January 2024.
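A small, self-checking sketch of a probability flow ODE sampler, assuming an Ornstein-Uhlenbeck forward process $dX = -X\,dt + \sqrt{2}\,dW$ and Gaussian data $N(\mu, s^2)$, so that the score is known in closed form and "accurate score estimates" hold exactly. The ODE $dx/dt = f(x,t) - \tfrac12 g(t)^2 \nabla\log p_t(x)$ is integrated backward with explicit Euler; in exact arithmetic it maps $m_T + \sqrt{v_T}\,z$ to $\mu + s z$, so the remaining gap is pure discretization error. The forward process, horizon `T` and step count are assumptions, not choices taken from the paper; practical samplers start from $N(0,1)$ and use a learned score.

```python
import math

def ou_marginal(t, mu, s):
    """Mean and variance at time t of dX = -X dt + sqrt(2) dW started from N(mu, s^2)."""
    e = math.exp(-2.0 * t)
    return mu * math.exp(-t), s * s * e + (1.0 - e)

def pf_ode_velocity(x, t, mu, s):
    """dx/dt = f(x, t) - 0.5 * g^2 * score(x, t) with f = -x and g^2 = 2.

    The score -(x - m_t) / v_t is exact because the data are Gaussian
    (an assumption standing in for a learned score network).
    """
    m, v = ou_marginal(t, mu, s)
    score = -(x - m) / v
    return -x - score

def sample_pf_ode(z, mu, s, T=3.0, n_steps=3000):
    """Integrate the probability flow ODE backward from T to 0 with explicit Euler.

    The start x_T = m_T + sqrt(v_T) * z is a draw from p_T when z ~ N(0, 1).
    """
    m_T, v_T = ou_marginal(T, mu, s)
    x = m_T + math.sqrt(v_T) * z
    h = T / n_steps
    t = T
    for _ in range(n_steps):
        x -= h * pf_ode_velocity(x, t, mu, s)
        t -= h
    return x
```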
-
Inverse conductivity problem with one measurement: Uniqueness of multi-layer structures
Authors:
Lingzheng Kong,
Youjun Deng,
Liyan Zhu
Abstract:
In this paper, we study the recovery of multi-layer structures in the inverse conductivity problem using one measurement. First, we define Generalized Polarization Tensors (GPTs) for multi-layered media and show some important properties of the proposed GPTs. With the help of GPTs, we present the perturbation formula for general multi-layered media. Then we derive the perturbed electric potential for a multi-layer concentric disk structure in terms of the so-called generalized polarization matrix, whose dimension equals the number of layers. By delicate analysis, we derive an algebraic identity involving the geometric and material configurations of multi-layer concentric disks. This enables us to reconstruct multi-layer structures using only one partial-order measurement.
Submitted 4 December, 2023;
originally announced December 2023.
-
Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models
Authors:
Xuefeng Gao,
Hoang M. Nguyen,
Lingjiong Zhu
Abstract:
Score-based generative models (SGMs) are a recent class of deep generative models with state-of-the-art performance in many applications. In this paper, we establish convergence guarantees for a general class of SGMs in 2-Wasserstein distance, assuming accurate score estimates and a smooth log-concave data distribution. We specialize our result to several concrete SGMs with specific choices of forward processes modelled by stochastic differential equations, and obtain an upper bound on the iteration complexity for each model, which demonstrates the impact of different choices of forward process. We also provide a lower bound when the data distribution is Gaussian. Numerically, we experiment with SGMs using different forward processes, some of which are newly proposed in this paper, for unconditional image generation on CIFAR-10. We find that the experimental results agree well with our theoretical predictions on the iteration complexity, and that the models with our newly proposed forward processes can outperform existing models.
Submitted 18 November, 2023;
originally announced November 2023.
-
Fluctuations and moderate deviations for the mean fields of Hawkes processes
Authors:
Fuqing Gao,
Yunshi Gao,
Lingjiong Zhu
Abstract:
The Hawkes process is a counting process with self- and mutually-exciting features and many applications in various fields. In recent years, there has been much interest in mean-field results for the Hawkes process and its extensions. It is known that the mean-field limit of a multivariate nonlinear Hawkes process is a time-inhomogeneous Poisson process. In this paper, we study the fluctuations of the mean fields and the large deviations associated with these fluctuations, i.e., the moderate deviations.
Submitted 29 July, 2023;
originally announced July 2023.
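To make the self-exciting mechanism concrete, here is a sketch that simulates a univariate linear Hawkes process with exponential kernel by Ogata's thinning algorithm. This is a deliberately simpler object than the paper's multivariate nonlinear mean-field setting; the kernel form and the parameter names `mu`, `alpha`, `beta` are assumptions. When the branching ratio `alpha / beta` is below 1, the long-run event rate is `mu / (1 - alpha / beta)`, and with `alpha = 0` the process reduces to a Poisson process with rate `mu`.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    """Event times on [0, T] of a linear Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)), via Ogata thinning.

    The exponential kernel only decays between events, so the intensity at the
    current time is a valid upper bound until the next accepted event.
    """
    rng = random.Random(seed)
    t, excitation, events = 0.0, 0.0, []
    while True:
        lam_bar = mu + excitation          # upper bound on the intensity ahead
        w = rng.expovariate(lam_bar)       # candidate waiting time
        t += w
        if t > T:
            return events
        excitation *= math.exp(-beta * w)  # decay the self-excitation to time t
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)               # accept with prob lambda(t) / lam_bar
            excitation += alpha            # each event excites future intensity
```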
-
Byzantine-robust distributed one-step estimation
Authors:
Chuhan Wang,
Xuehu Zhu,
Lixing Zhu
Abstract:
This paper proposes a Robust One-Step Estimator (ROSE) to solve the Byzantine failure problem in distributed M-estimation when a moderate fraction of node machines experience Byzantine failures. To define ROSE, the algorithm uses the robust Variance Reduced Median Of the Local (VRMOL) estimator to determine the initial parameter value for iteration, and communicates between the node machines and the central processor in the Newton-Raphson iteration procedure to derive robust VRMOL estimators of the gradient and the Hessian matrix, so as to obtain the final estimator. ROSE has higher asymptotic relative efficiency than general median estimators without increasing the order of computational complexity. Moreover, this estimator can also cope with problems involving anomalous or missing samples on the central processor. We prove asymptotic normality when the parameter dimension p diverges as the sample size goes to infinity, and, under weaker assumptions, derive the convergence rate. Numerical simulations and a real-data application demonstrate the effectiveness and robustness of ROSE.
Submitted 15 July, 2023;
originally announced July 2023.
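A simplified sketch of the robust one-step idea: a median of local estimates gives the initial value, then each node reports its local gradient and Hessian at that point, the center aggregates them by median, and one Newton-Raphson step is taken. Assumptions, stated loudly: a plain median replaces the paper's VRMOL aggregator, the M-estimation problem is a hypothetical one-dimensional Huber location model, and Byzantine nodes are simulated as arbitrary reported triples.

```python
import statistics

def huber_psi(r, c=1.345):
    """Derivative of the Huber loss (clipped residual)."""
    return max(-c, min(c, r))

def huber_dpsi(r, c=1.345):
    """Second derivative of the Huber loss."""
    return 1.0 if abs(r) <= c else 0.0

def local_grad_hess(data, theta, c=1.345):
    """Local gradient and Hessian of the Huber location loss on one node."""
    g = -sum(huber_psi(x - theta, c) for x in data) / len(data)
    h = sum(huber_dpsi(x - theta, c) for x in data) / len(data)
    return g, h

def robust_one_step(honest_nodes, byzantine_reports, c=1.345):
    """honest_nodes: list of local samples; byzantine_reports: arbitrary
    (estimate, gradient, hessian) triples sent by failed machines."""
    # Round 1: median of local estimates as the initial value (plain median, not VRMOL).
    local_est = [statistics.median(d) for d in honest_nodes]
    local_est += [r[0] for r in byzantine_reports]
    theta0 = statistics.median(local_est)
    # Round 2: nodes report gradient and Hessian at theta0; aggregate by median.
    gh = [local_grad_hess(d, theta0, c) for d in honest_nodes]
    gh += [(r[1], r[2]) for r in byzantine_reports]
    g = statistics.median([x[0] for x in gh])
    h = statistics.median([x[1] for x in gh])
    return theta0 - g / h  # one Newton-Raphson step
```

Averaging the reports instead of taking medians would let two corrupted machines drag the estimate arbitrarily far, which is exactly the failure mode the median-based aggregation guards against.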
-
Rotation Group Synchronization via Quotient Manifold
Authors:
Linglingzhi Zhu,
Chong Li,
Anthony Man-Cho So
Abstract:
Rotation group $\mathcal{SO}(d)$ synchronization is an important inverse problem that has attracted intense attention from numerous application fields such as graph realization, computer vision, and robotics. In this paper, we focus on the least-squares estimator of rotation group synchronization with general additive noise models, which is a nonconvex optimization problem with manifold constraints. Unlike phase/orthogonal group synchronization, rotation group synchronization has few provable solution approaches. First, we derive improved estimation results for the least-squares/spectral estimator, illustrating their tightness and validating existing relaxation methods for rotation group synchronization through the optimum of the relaxed orthogonal group version under a near-optimal noise level for exact recovery. Moreover, departing from the standard approach of exploiting the geometry of the ambient Euclidean space, we adopt an intrinsic Riemannian approach to study orthogonal/rotation group synchronization. Benefiting from a quotient geometric view, we prove that the quotient Riemannian Hessian is positive definite around the optimum of the orthogonal group synchronization problem; consequently, a Riemannian local error bound property is established and used to analyze the convergence rates of various Riemannian algorithms. As a simple and feasible method, the (quotient) Riemannian gradient method for solving orthogonal/rotation group synchronization is studied with a sequential convergence guarantee, and we derive its global linear convergence rate to the optimum with spectral initialization. All results are deterministic, without any probabilistic model.
Submitted 22 June, 2023;
originally announced June 2023.
-
Asymptotics for the Laplace transform of the time integral of the geometric Brownian motion
Authors:
Dan Pirjol,
Lingjiong Zhu
Abstract:
We present an asymptotic result for the Laplace transform of the time integral of the geometric Brownian motion $F(θ,T) = \mathbb{E}[e^{-θX_T}]$ with $X_T = \int_0^T e^{σW_s + ( a - \frac12 σ^2)s} ds$, which is exact in the limit $σ^2 T \to 0$ at fixed $σ^2 θT^2$ and $aT$. This asymptotic result is applied to pricing zero coupon bonds in the Dothan model of stochastic interest rates. The asymptotic result provides an approximation for bond prices that is in good agreement with numerical evaluations over a wide range of model parameters. As a side result, we obtain the asymptotics of Asian option prices in the Black-Scholes model, taking into account interest rate and dividend yield contributions in the $σ^{2}T\to 0$ limit.
Submitted 15 June, 2023;
originally announced June 2023.
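A brute-force Monte Carlo estimate of $F(θ,T) = \mathbb{E}[e^{-θX_T}]$ is a natural baseline against which an asymptotic formula like the one in this abstract can be checked. The sketch simulates Brownian paths on a uniform grid and approximates $X_T$ with the trapezoidal rule; the grid size and path count are arbitrary assumptions, and the paper's asymptotic formula itself is not reproduced here. With $σ = 0$ the integral is deterministic, $X_T = (e^{aT} - 1)/a$, which gives an exact check.

```python
import math
import random

def laplace_gbm_integral_mc(theta, T, sigma, a, n_paths=2000, n_steps=200, seed=0):
    """Monte Carlo estimate of F(theta, T) = E[exp(-theta * X_T)], where
    X_T = int_0^T exp(sigma * W_s + (a - sigma^2 / 2) * s) ds is approximated
    by the trapezoidal rule on a uniform grid (a brute-force check, not the
    paper's asymptotic formula)."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        w, integral = 0.0, 0.0
        prev = 1.0  # integrand at s = 0
        for k in range(1, n_steps + 1):
            w += math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Brownian increment
            cur = math.exp(sigma * w + (a - 0.5 * sigma ** 2) * k * dt)
            integral += 0.5 * (prev + cur) * dt
            prev = cur
        total += math.exp(-theta * integral)
    return total / n_paths
```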
-
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Authors:
Libin Zhu,
Chaoyue Liu,
Adityanarayanan Radhakrishnan,
Mikhail Belkin
Abstract:
In this paper, we first present an explanation for the common occurrence of spikes in the training loss when neural networks are trained with stochastic gradient descent (SGD). We provide evidence that the spikes in the training loss of SGD are "catapults", an optimization phenomenon originally observed in GD with large learning rates in [Lewkowycz et al. 2020]. We empirically show that these catapults occur in a low-dimensional subspace spanned by the top eigenvectors of the tangent kernel, for both GD and SGD. Second, we posit an explanation for how catapults lead to better generalization by demonstrating that catapults promote feature learning by increasing alignment with the Average Gradient Outer Product (AGOP) of the true predictor. Furthermore, we demonstrate that a smaller batch size in SGD induces a larger number of catapults, thereby improving AGOP alignment and test performance.
Submitted 5 June, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent
Authors:
Lingjiong Zhu,
Mert Gurbuzbalaban,
Anant Raj,
Umut Simsekli
Abstract:
Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied to different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and non-convex losses with additive noise, where we recover results similar to the prior art or extend them to more general cases using a single proof technique. Our approach is flexible and can be generalized to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time-uniform bounds, which might not be achieved for convex or non-convex losses unless additional noise is injected into the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel.
Submitted 28 October, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
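Wasserstein stability bounds are typically proved by coupling two runs of the algorithm on neighbouring datasets; any coupling upper-bounds the Wasserstein distance between the laws of the iterates. The sketch below uses the simplest one, a synchronous coupling that shares the sampled indices, on a hypothetical strongly convex loss $f(θ; x) = (θ - x)^2/2$. Here the gap obeys $δ_{k+1} = (1-η)δ_k + η(x_i - x'_i)$, so for $0 < η \le 1$ it stays below $|x - x'|$ for all $k$, a toy instance of a time-uniform bound. The loss, step size and datasets are assumptions, not the paper's general setting.

```python
import random

def coupled_sgd_gap(data, data_prime, step, n_iters, seed=0):
    """Run SGD on f(theta; x) = (theta - x)^2 / 2 over two datasets that differ
    in one point, sharing the sampled indices (synchronous coupling), and return
    the gap |theta_k - theta'_k| after every iteration."""
    rng = random.Random(seed)
    theta = theta_p = 0.0
    gaps = []
    for _ in range(n_iters):
        i = rng.randrange(len(data))                   # same index for both runs
        theta -= step * (theta - data[i])
        theta_p -= step * (theta_p - data_prime[i])
        gaps.append(abs(theta - theta_p))
    return gaps
```

The gap does not grow with the number of iterations because the contraction factor $1 - η$ keeps forgetting past differences, which mirrors the role of ergodicity highlighted in the abstract.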
-
LogSpecT: Feasible Graph Learning Model from Stationary Signals with Recovery Guarantees
Authors:
Shangyuan Liu,
Linglingzhi Zhu,
Anthony Man-Cho So
Abstract:
Graph learning from signals is a core task in Graph Signal Processing (GSP). One of the most commonly used models to learn graphs from stationary signals is SpecT. However, its practical formulation rSpecT is known to be sensitive to hyperparameter selection and, even worse, to suffer from infeasibility. In this paper, we give the first condition that guarantees the infeasibility of rSpecT and design a novel model (LogSpecT) and its practical formulation (rLogSpecT) to overcome this issue. Contrary to rSpecT, the novel practical model rLogSpecT is always feasible. Furthermore, we provide recovery guarantees of rLogSpecT, which are derived from modern optimization tools related to epi-convergence. These tools could be of independent interest and significant for various learning problems. To demonstrate the advantages of rLogSpecT in practice, a highly efficient algorithm based on the linearized alternating direction method of multipliers (L-ADMM) is proposed. The subproblems of L-ADMM admit closed-form solutions and the convergence is guaranteed. Extensive numerical results on both synthetic and real networks corroborate the stability and superiority of our proposed methods, underscoring their potential for various graph learning applications.
Submitted 2 May, 2023;
originally announced May 2023.
-
Ore Extension of Group-cograded Hopf Coquasigroups
Authors:
Lingli Zhu,
Bingbing **,
Huili Liu,
Tao Yang
Abstract:
The aim of this paper is to study the Ore extension of group-cograded Hopf coquasigroups. We first give a categorical interpretation and some examples of group-cograded Hopf coquasigroups, and then give necessary and sufficient conditions for the Ore extensions of group-cograded Hopf coquasigroups to be group-cograded Hopf coquasigroups. Finally, a certain isomorphism between Ore extensions is considered.
Submitted 11 July, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Finite element and integral equation methods to conical diffraction by imperfectly conducting gratings
Authors:
Guanghui Hu,
Jiayi Zhang,
Linlin Zhu
Abstract:
In this paper we study variational and integral equation methods for a conical diffraction problem for imperfectly conducting gratings, modeled by the impedance boundary value problem of the Helmholtz equation in periodic structures. We justify the strong ellipticity of the sesquilinear form corresponding to the variational formulation and prove the uniqueness of solutions at any frequency. Convergence of the finite element method using the transparent boundary condition (Dirichlet-to-Neumann mapping) is verified. The boundary integral equation method is also discussed.
Submitted 10 April, 2023;
originally announced April 2023.
-
Well-posedness of grating diffraction problems for plane wave incidence: explicit dependence on wavenumbers and incident angles
Authors:
Linlin Zhu,
Guanghui Hu
Abstract:
Suppose that a plane wave is incident onto an impenetrable grating profile of Dirichlet or impedance type, or onto a penetrable grating. The grating interface is assumed to be given by a Lipschitz function in two dimensions. We derive stability estimates for the grating diffraction problem via a variational method, with an explicit dependence of the solutions on the incident wavenumber and incident angle.
Submitted 9 April, 2023;
originally announced April 2023.
-
Elastostatics with multi-layer metamaterial structures and an algebraic framework for polariton resonances
Authors:
Youjun Deng,
Lingzheng Kong,
Hongyu Liu,
Liyan Zhu
Abstract:
Multi-layer structures are ubiquitous in constructing metamaterial devices to realise various frontier applications, including super-resolution imaging and invisibility cloaking. In this paper, we develop a general mathematical framework for studying elastostatics within multi-layer material structures in $\mathbb{R}^d$, $d=2,3$. The multi-layer structure is formed by concentric balls, and each layer is filled with either a regular elastic material or an elastic metamaterial. The number of layers can be arbitrary, and the material parameters in each layer may differ from one another. In practice, the multi-layer structure can serve as the building block for various material devices. Considering the impingement of an incident field on the multi-layer structure, we first derive the exact perturbed field in terms of an elastic momentum matrix, whose dimension equals the number of layers. Through highly intricate and delicate analysis, we carry out a comprehensive study of the spectral properties of the elastic momentum matrix. This enables us to establish a handy algebraic framework for studying polariton resonances associated with multi-layer metamaterial structures, which forms the fundamental basis for many metamaterial applications.
Submitted 27 January, 2023;
originally announced February 2023.
-
Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize
Authors:
Mert Gürbüzbalaban,
Yuanhan Hu,
Umut Şimşekli,
Lingjiong Zhu
Abstract:
Cyclic and randomized stepsizes are widely used in deep learning practice and can often outperform standard stepsize choices, such as a constant stepsize, in SGD. Despite their empirical success, little is currently known about when and why they can theoretically improve generalization performance. We consider a general class of Markovian stepsizes for learning, which contains i.i.d. random stepsizes, cyclic stepsizes, and the constant stepsize as special cases. Motivated by the literature showing that the heaviness of the tails (measured by the so-called "tail-index") of the SGD iterates is correlated with generalization, we study the tail-index and provide a number of theoretical results that demonstrate how it depends on the stepsize schedule. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to a constant stepsize in terms of tail behavior. We illustrate our theory on linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve even heavier tails and be a viable alternative to cyclic and i.i.d. randomized stepsize rules.
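To make concrete the statement that Markovian stepsizes include the constant, cyclic and i.i.d. schedules as special cases, here is a minimal Python sketch (an illustration under our own assumptions, not the authors' code; the function `markov_stepsizes` and its arguments are hypothetical). Stepsizes are read off a finite-state Markov chain with transition matrix `P`: the identity matrix yields a constant stepsize, a cyclic permutation matrix yields a cyclic schedule, and a matrix with identical rows yields i.i.d. draws.

```python
import random

def markov_stepsizes(etas, P, n_steps, start=0, seed=0):
    """Draw a stepsize sequence from a finite-state Markov chain (hypothetical helper).

    etas: candidate stepsizes, one per chain state.
    P:    row-stochastic transition matrix, P[i][j] = Prob(next state j | state i).
    """
    rng = random.Random(seed)
    state = start
    out = []
    for _ in range(n_steps):
        out.append(etas[state])
        # Move to the next state according to row `state` of P.
        state = rng.choices(range(len(etas)), weights=P[state])[0]
    return out

# Special cases of the Markovian family:
constant = markov_stepsizes([0.1, 0.2], [[1, 0], [0, 1]], 5)             # identity: constant
cyclic = markov_stepsizes([0.1, 0.2, 0.3],
                          [[0, 1, 0], [0, 0, 1], [1, 0, 0]], 6)           # permutation: cyclic
iid = markov_stepsizes([0.1, 0.2], [[0.5, 0.5], [0.5, 0.5]], 10, seed=3)  # equal rows: i.i.d.
```

Any of these sequences can then be fed to an SGD loop as the per-iteration stepsize.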
Submitted 29 August, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Large deviations for the mean-field limit of Hawkes processes
Authors:
Fuqing Gao,
Lingjiong Zhu
Abstract:
Hawkes processes are a class of simple point processes whose intensity depends on the past history, and is in general non-Markovian. Limit theorems for Hawkes processes in various asymptotic regimes have been studied in the literature. In this paper, we study a multidimensional nonlinear Hawkes process in the asymptotic regime when the dimension goes to infinity, whose mean-field limit is a time-inhomogeneous Poisson process, and our main result is a large deviation principle for the mean-field limit.
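As an illustration of an intensity that depends on the past history, the sketch below simulates a univariate linear Hawkes process with exponential kernel, $λ(t)=μ+\sum_{t_i<t} α e^{-β(t-t_i)}$, using Ogata's thinning algorithm. This is a much simpler model than the multidimensional nonlinear process studied in the paper; the kernel, the parameter names and the requirement $α/β<1$ (for a stationary regime) are assumptions made for illustration.

```python
import math
import random

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Ogata thinning for a linear Hawkes process with exponential kernel (a sketch)."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of sum alpha * exp(-beta * (t - t_i))
    while True:
        # The intensity only decays between events, so its current value bounds it from above.
        lam_bar = mu + excitation
        w = rng.expovariate(lam_bar)
        if t + w >= horizon:
            break
        t += w
        excitation *= math.exp(-beta * w)  # decay the excitation to the candidate time
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each accepted event raises the intensity by alpha
    return events
```

With $μ=0.5$, $α=0.5$, $β=1$ the long-run event rate of this model is $μ/(1-α/β)=1$, which a long simulation should roughly reproduce.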
Submitted 18 January, 2023;
originally announced January 2023.
-
Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees
Authors:
Landi Zhu,
Mert Gürbüzbalaban,
Andrzej Ruszczyński
Abstract:
We consider a distributionally robust stochastic optimization problem and formulate it as a stochastic two-level composition optimization problem using the mean--semideviation risk measure. In this setting, we consider a single time-scale algorithm involving two versions of inner function value tracking: linearized tracking of a continuously differentiable loss function, and SPIDER tracking of a weakly convex loss function. We adopt the norm of the gradient of the Moreau envelope as our measure of stationarity and show that a sample complexity of $\mathcal{O}(\varepsilon^{-3})$ is achievable in both cases, with only a larger constant in the second case. Finally, we demonstrate the performance of our algorithm on a robust learning example and a weakly convex, non-smooth regression example.
Submitted 9 June, 2023; v1 submitted 16 January, 2023;
originally announced January 2023.
-
A delayed dual risk model
Authors:
Lingjiong Zhu
Abstract:
In this paper, we study a dual risk model with delays in the spirit of Dassios-Zhao. When a new innovation occurs, there is a delay before the innovation turns into a profit. We obtain large initial surplus asymptotics for the ruin probability and ruin time distributions. For some special cases, we obtain closed-form formulas. Numerical illustrations are also provided.
Submitted 16 January, 2023;
originally announced January 2023.
-
Combinatorial Properties for a Class of Simplicial Complexes Extended from Pseudo-fractal Scale-free Web
Authors:
Zixuan Xie,
Yucheng Wang,
Wanyue Xu,
Liwang Zhu,
Wei Li,
Zhongzhi Zhang
Abstract:
Simplicial complexes are a popular tool used to model higher-order interactions between elements of complex social and biological systems. In this paper, we study some combinatorial aspects of a class of simplicial complexes created by a graph product, which is an extension of the pseudo-fractal scale-free web. We determine explicitly the independence number, the domination number, and the chromatic number. Moreover, we derive closed-form expressions for the number of acyclic orientations, the number of root-connected acyclic orientations, the number of spanning trees, as well as the number of perfect matchings for some particular cases.
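For orientation, the sketch below builds the underlying pseudo-fractal scale-free web (a triangle at generation 0; at each generation, every existing edge receives a new node joined to both of its endpoints) and counts its spanning trees with Kirchhoff's matrix-tree theorem. It does not construct the simplicial-complex extension studied in the paper, and because it uses a floating-point determinant it is only meant for small generations; the function names are hypothetical.

```python
import numpy as np

def pseudofractal_web(g):
    """Return (number of nodes, edge list) of the pseudo-fractal scale-free web at generation g."""
    edges = [(0, 1), (1, 2), (0, 2)]  # generation 0: a triangle
    n = 3
    for _ in range(g):
        new_edges = []
        for u, v in edges:
            # Attach a new node n to both endpoints of the existing edge (u, v).
            new_edges += [(u, n), (v, n)]
            n += 1
        edges += new_edges
    return n, edges

def spanning_tree_count(n, edges):
    """Matrix-tree theorem: determinant of the graph Laplacian with one row and column removed."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return int(round(float(np.linalg.det(L[1:, 1:]))))
```

Generation 0 (a triangle) has 3 spanning trees; generation 1 has 6 nodes, 9 edges and 54 spanning trees.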
Submitted 9 January, 2023;
originally announced January 2023.
-
Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization
Authors:
Taoli Zheng,
Linglingzhi Zhu,
Anthony Man-Cho So,
Jose Blanchet,
Jia** Li
Abstract:
Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Most existing algorithms rely on one-sided information, such as the convexity (resp. concavity) of the primal (resp. dual) functions, or on other specific structures, such as the Polyak-Łojasiewicz (PŁ) and Kurdyka-Łojasiewicz (KŁ) conditions. However, verifying these regularity conditions is challenging in practice. To meet this challenge, we propose a novel, universally applicable single-loop algorithm, the doubly smoothed gradient descent ascent method (DS-GDA), which naturally balances the primal and dual updates. That is, DS-GDA with the same hyperparameters is able to uniformly solve nonconvex-concave, convex-nonconcave, and nonconvex-nonconcave problems with one-sided KŁ properties, achieving convergence with $\mathcal{O}(ε^{-4})$ complexity. Sharper (even optimal) iteration complexity can be obtained when the KŁ exponent is known. Specifically, under the one-sided KŁ condition with exponent $θ\in(0,1)$, DS-GDA converges with an iteration complexity of $\mathcal{O}(ε^{-2\max\{2θ,1\}})$. These complexities all match the corresponding best results in the literature. Moreover, we show that DS-GDA is practically applicable to general nonconvex-nonconcave problems even without any regularity conditions, such as the PŁ condition, KŁ condition, or weak Minty variational inequalities condition. For various challenging nonconvex-nonconcave examples in the literature, including "Forsaken", "Bilinearly-coupled minimax", "Sixth-order polynomial", and "PolarGame", the proposed DS-GDA avoids limit cycles in every case. To the best of our knowledge, this is the first first-order algorithm to achieve convergence on all of these formidable problems.
Submitted 30 October, 2023; v1 submitted 25 December, 2022;
originally announced December 2022.
-
On classification of singular matrix difference equations of mixed order
Authors:
Li Zhu,
Huaqing Sun,
Bing Xie
Abstract:
This paper is concerned with singular matrix difference equations of mixed order. The existence and uniqueness of solutions to initial value problems for these equations are established, and the equations are then classified by a method similar to the classical Weyl method, by selecting a suitable quasi-difference. An equivalent characterization of this classification is given in terms of the number of linearly independent square summable solutions of the equation. The influence of off-diagonal coefficients on the classification is illustrated by two examples. In particular, two limit point criteria are established in terms of the coefficients of the equation.
Submitted 24 December, 2022;
originally announced December 2022.
-
Braided crossed category over crossed group-cograded weak Hopf quasigroups
Authors:
Huili Liu,
Lingli Zhu,
Tao Yang
Abstract:
In this paper, we generalize the main result in Liu [10] to the case of weak Hopf coquasigroups. We first define and study group-cograded weak Hopf quasigroups, which generalize both group-cograded Hopf quasigroups and weak Hopf group-coalgebras. Then we introduce the notion of a p-Yetter-Drinfeld weak quasimodule over a group-cograded weak Hopf quasigroup H. If the antipode of H is bijective, we show that the category YDWQ(H) of Yetter-Drinfeld weak quasimodules over H is a crossed category, and that the subcategory YD(H) of Yetter-Drinfeld modules is a braided crossed category.
Submitted 1 December, 2022;
originally announced December 2022.
-
A greedy randomized average block projection method for linear feasibility problems
Authors:
Lin Zhu,
Yuan Lei,
Jiaxin Xie
Abstract:
The randomized projection (RP) method is a simple iterative scheme for solving linear feasibility problems and has recently gained popularity due to its speed and low memory requirements. This paper develops an accelerated variant of the standard RP method by using two ingredients, the greedy probability criterion and the average block approach, and obtains a greedy randomized average block projection (GRABP) method for solving large-scale systems of linear inequalities. We prove that this method converges linearly in expectation under different choices of extrapolated stepsizes. Numerical experiments on both randomly generated and real-world data show the advantage of GRABP over several state-of-the-art solvers, such as the randomized projection (RP) method, the sampling Kaczmarz-Motzkin (SKM) method, the generalized SKM (GSKM) method, and the Nesterov-accelerated SKM method.
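The two ingredients named above, greedy selection of rows and averaging of projections over a block, can be sketched as follows for a system $Ax\le b$. This is a simplified, deterministic illustration written under our own assumptions, not the paper's GRABP: the randomized sampling step and the paper's extrapolated stepsize rules are omitted, and the function name and the parameters `theta` (greedy threshold) and `alpha` (step scaling) are hypothetical.

```python
import numpy as np

def greedy_average_block_projection(A, b, x0, alpha=1.0, theta=0.5, max_iter=20000, tol=1e-12):
    """Greedy average-block projection for A x <= b (simplified, deterministic sketch)."""
    x = np.asarray(x0, dtype=float).copy()
    row_norms_sq = np.sum(A * A, axis=1)
    for _ in range(max_iter):
        # Violation of each inequality a_i^T x <= b_i (zero when satisfied).
        r = np.maximum(A @ x - b, 0.0)
        if r.max() <= tol:
            break
        # Squared distance from x to each violated half-space.
        scaled = r**2 / row_norms_sq
        # Greedy block: rows whose violation is at least theta times the largest one.
        block = np.flatnonzero(scaled >= theta * scaled.max())
        # Average of the projections of x onto the selected half-spaces.
        steps = (r[block] / row_norms_sq[block])[:, None] * A[block]
        x -= alpha * steps.mean(axis=0)
    return x
```

With `alpha = 1` each update moves x to the average of its projections onto the selected half-spaces; values of `alpha` in (1, 2) play the role of an extrapolated stepsize.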
Submitted 18 November, 2022;
originally announced November 2022.
-
Two Iterative algorithms for the matrix sign function based on the adaptive filtering technology
Authors:
Feng Wu,
Keqi Ye,
Li Zhu,
Yueling Zhao,
Jiqiang Hu,
Wanxie Zhong
Abstract:
In this paper, two new efficient algorithms for calculating the sign function of large-scale sparse matrices are proposed by combining a filtering algorithm with the Newton method and the Newton-Schulz method, respectively. Through a theoretical analysis of error diffusion in the iterative process, we design an adaptive filtering threshold that ensures the filtering has little impact on the iterative process and the calculation result. Numerical experiments are consistent with our theoretical analysis and show that the computational efficiency of our methods is much better than that of the Newton method and the Newton-Schulz method, while the computational error is of the same order of magnitude as that of those two methods.
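To show how filtering slots into the Newton-Schulz iteration $X_{k+1}=\tfrac12 X_k(3I-X_k^2)$ for the matrix sign function, here is a minimal sketch. It assumes a symmetric matrix with no zero eigenvalues (so scaling by the 2-norm guarantees convergence) and uses a fixed, hypothetical threshold `tau` rather than the paper's adaptive threshold; the function names are our own.

```python
import numpy as np

def filter_small(M, tau):
    """Return a copy of M with entries of magnitude below tau set to zero."""
    out = M.copy()
    out[np.abs(out) < tau] = 0.0
    return out

def sign_newton_schulz(A, tau=1e-12, tol=1e-12, max_iter=100):
    """Matrix sign via X <- X (3I - X^2) / 2 with entry filtering after each step.

    Assumes A is symmetric with no zero eigenvalues; scaling by the 2-norm puts the
    eigenvalues of X_0 in [-1, 1] \\ {0}, where the iteration converges.
    """
    n = A.shape[0]
    I = np.eye(n)
    X = A / np.linalg.norm(A, 2)
    for _ in range(max_iter):
        # Newton-Schulz update, then drop near-zero entries to preserve sparsity.
        X_new = filter_small(0.5 * X @ (3.0 * I - X @ X), tau)
        if np.linalg.norm(X_new - X, "fro") <= tol * np.linalg.norm(X_new, "fro"):
            return X_new
        X = X_new
    return X
```

The result can be checked against the eigendecomposition $V\,\mathrm{sign}(Λ)\,V^T$ and should satisfy $S^2\approx I$.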
Submitted 7 October, 2022;
originally announced October 2022.
-
Restricted Strong Convexity of Deep Learning Models with Smooth Activations
Authors:
Arindam Banerjee,
Pedro Cisneros-Velarde,
Libin Zhu,
Mikhail Belkin
Abstract:
We consider the problem of optimizing deep learning models with smooth activation functions. While there exist influential results on the problem from the "near initialization" perspective, we shed considerable new light on the problem. In particular, we make two key technical contributions for such models with $L$ layers, $m$ width, and $σ_0^2$ initialization variance. First, for suitable $σ_0^2$, we establish an $O(\frac{\text{poly}(L)}{\sqrt{m}})$ upper bound on the spectral norm of the Hessian of such models, considerably sharpening prior results. Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC), which holds as long as the squared norm of the average gradient of predictors is $Ω(\frac{\text{poly}(L)}{\sqrt{m}})$ for the square loss. We also present results for more general losses. The RSC-based analysis does not need the "near initialization" perspective and guarantees geometric convergence for gradient descent (GD). To the best of our knowledge, ours is the first result establishing geometric convergence of GD based on RSC for deep learning models, thus providing an alternative sufficient condition for convergence that does not depend on the widely used Neural Tangent Kernel (NTK). We share preliminary experimental results supporting our theoretical advances.
Submitted 29 September, 2022;
originally announced September 2022.
-
Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis
Authors:
Jia** Li,
Linglingzhi Zhu,
Anthony Man-Cho So
Abstract:
Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $θ\in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $ε$-game-stationary points and $ε$-optimization-stationary points of the problems of interest in $\mathcal{O}(ε^{-2\max\{2θ,1\}})$ iterations. Furthermore, when $θ\in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(ε^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structured problems possess the KL property with exponent $θ=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.
Submitted 26 July, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
On minimum contrast method for multivariate spatial point processes
Authors:
Lin Zhu,
Junho Yang,
Mikyoung Jun,
Scott Cook
Abstract:
Compared to widely used likelihood-based approaches, the minimum contrast (MC) method offers a computationally efficient method for estimation and inference of spatial point processes. These relative gains in computing time become more pronounced when analyzing complicated multivariate point process models. Despite this, there has been little exploration of the MC method for multivariate spatial point processes. Therefore, this article introduces a new MC method for parametric multivariate spatial point processes. A contrast function is computed based on the trace of the power of the difference between the conjectured $K$-function matrix and its nonparametric unbiased edge-corrected estimator. Under standard assumptions, we derive the asymptotic normality of our MC estimator. The performance of the proposed method is demonstrated through simulation studies of bivariate log-Gaussian Cox processes and five-variate product-shot-noise Cox processes.
Submitted 2 July, 2024; v1 submitted 15 August, 2022;
originally announced August 2022.
-
A filtering technique for the matrix power series being near-sparse
Authors:
Feng Wu,
Li Zhu,
Yuelin Zhao,
Kailing Zhang
Abstract:
This work presents a new algorithm for evaluating near-sparse matrix power series, that is, series in which a large number of the elements are near zero. The proposed algorithm uses a filtering technique to improve the sparsity of the matrices involved in the calculation process of the Paterson-Stockmeyer (PS) scheme. Based on an error analysis that accounts for both the truncation error and the error introduced by filtering, the proposed algorithm obtains accuracy similar to that of the original PS scheme while being more efficient. For near-sparse matrix power series, the proposed method is also more efficient than the MATLAB built-in codes.
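The Paterson-Stockmeyer scheme evaluates $p(A)=\sum_{k=0}^{m} c_k A^k$ by choosing a block size $s\approx\sqrt{m}$, precomputing $A^2,\dots,A^s$, and applying Horner's rule in $A^s$ to blocks $B_j=\sum_{i=0}^{s-1} c_{js+i}A^i$. The sketch below adds filtering after every matrix product; the fixed threshold `tau` is a hypothetical stand-in for the paper's error-analysis-based threshold, and the function names are our own.

```python
import math
import numpy as np

def filtered(M, tau):
    """Return a copy of M with entries of magnitude below tau set to zero."""
    out = M.copy()
    out[np.abs(out) < tau] = 0.0
    return out

def ps_polyval(coeffs, A, tau=0.0):
    """Evaluate sum_k coeffs[k] * A^k with the Paterson-Stockmeyer scheme plus filtering."""
    m = len(coeffs) - 1
    n = A.shape[0]
    s = max(1, math.isqrt(m))  # block size ~ sqrt(degree)
    # Powers I, A, ..., A^s, each product filtered to keep it sparse.
    powers = [np.eye(n), A.copy()]
    for _ in range(2, s + 1):
        powers.append(filtered(powers[-1] @ A, tau))
    As = powers[s]

    def block(j):
        # B_j = sum_{i=0}^{s-1} c_{js+i} A^i, skipping indices beyond the degree m.
        B = np.zeros((n, n))
        for i in range(s):
            k = j * s + i
            if k <= m:
                B += coeffs[k] * powers[i]
        return B

    # Horner's rule in A^s: p(A) = (...((B_r A^s + B_{r-1}) A^s + ...) A^s + B_0.
    r = m // s
    P = block(r)
    for j in range(r - 1, -1, -1):
        P = filtered(P @ As, tau) + block(j)
    return P
```

For example, `ps_polyval([1 / math.factorial(k) for k in range(13)], A)` evaluates a degree-12 Taylor polynomial of the matrix exponential using only about $2\sqrt{m}$ matrix products.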
Submitted 11 August, 2022;
originally announced August 2022.
-
Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection
Authors:
Shang-En Huang,
Seth Pettie,
Leqi Zhu
Abstract:
Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively \emph{corrupt} up to $f<n/3$ parties. Moreover, the problem is insoluble with $f\geq n/3$ corruptions. However, Bracha's 1984 protocol achieved $f<n/3$ resilience at the cost of exponential expected latency $2^{Θ(n)}$, a bound that has never been improved in this model with $f=\lfloor (n-1)/3 \rfloor$ corruptions.
In this paper we prove that Byzantine Agreement in the asynchronous, full information model can be solved with probability 1 against an adaptive adversary that can corrupt $f<n/3$ parties, while incurring only polynomial latency with high probability. Our protocol follows earlier polynomial latency protocols of King and Saia and Huang, Pettie, and Zhu, which had suboptimal resilience, namely $f \approx n/10^9$ and $f<n/4$, respectively.
Resilience $f=(n-1)/3$ is uniquely difficult, as this is the point at which the influence of the Byzantine and honest players is of roughly equal strength. The core technical problem we solve is to design a collective coin-flipping protocol that eventually lets us flip a coin with an unambiguous outcome. In the beginning, the influence of the Byzantine players is too powerful to overcome, and they can essentially fix the coin's behavior at will. We guarantee that after just a polynomial number of executions of the coin-flipping protocol, either (a) the Byzantine players fail to fix the behavior of the coin (thereby ending the game) or (b) we can "blacklist" players such that the blacklisting rate for Byzantine players is at least as large as the blacklisting rate for good players. The blacklisting criterion is based on a simple statistical test of fraud detection.
Submitted 30 June, 2022;
originally announced June 2022.
-
A new stable and avoiding inversion iteration for computing matrix square root
Authors:
Li Zhu,
Keqi Ye,
Yuelin Zhao,
Feng Wu,
Jiqiang Hu,
Wanxie Zhong
Abstract:
The objective of this research was to compute the principal matrix square root with sparse approximation. A new stable iterative scheme avoiding full matrix inversion (SIAI) is provided. An analysis of the sparsity and error of the matrices involved in the iterative process is given. Based on the bandwidth and error analysis, a more efficient algorithm combining the SIAI with the filtering technique is proposed. The high computational efficiency and accuracy of the proposed method are demonstrated by computing the principal square roots of different matrices, revealing its advantages over existing methods.
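The SIAI scheme itself is not specified in this abstract, so the sketch below uses a different, well-known inversion-free method, the coupled Newton-Schulz iteration $T_k=\tfrac12(3I-Z_kY_k)$, $Y_{k+1}=Y_kT_k$, $Z_{k+1}=T_kZ_k$, to illustrate the general idea of an inverse-free square-root iteration combined with entry filtering. It assumes a symmetric positive definite input; the threshold `tau` and the function names are hypothetical.

```python
import numpy as np

def filter_small(M, tau):
    """Return a copy of M with entries of magnitude below tau set to zero."""
    out = M.copy()
    out[np.abs(out) < tau] = 0.0
    return out

def sqrtm_newton_schulz(A, tau=0.0, tol=1e-12, max_iter=100):
    """Principal square root of an SPD matrix by the coupled Newton-Schulz iteration.

    No matrix is inverted: only products are formed, each followed by filtering.
    """
    n = A.shape[0]
    I = np.eye(n)
    c = np.linalg.norm(A, 2)  # scale so the spectrum of A / c lies in (0, 1]
    Y, Z = A / c, I.copy()    # Y -> (A/c)^{1/2}, Z -> (A/c)^{-1/2}
    for _ in range(max_iter):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y_new, Z = filter_small(Y @ T, tau), filter_small(T @ Z, tau)
        done = np.linalg.norm(Y_new - Y, "fro") <= tol * np.linalg.norm(Y_new, "fro")
        Y = Y_new
        if done:
            break
    return np.sqrt(c) * Y  # undo the scaling: A^{1/2} = sqrt(c) * (A/c)^{1/2}
```

For banded SPD matrices, a positive `tau` keeps the iterates close to banded at the cost of a small, controllable error.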
Submitted 21 June, 2022;
originally announced June 2022.
-
Quadratic models for understanding catapult dynamics of neural networks
Authors:
Libin Zhu,
Chaoyue Liu,
Adityanarayanan Radhakrishnan,
Mikhail Belkin
Abstract:
While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.
Submitted 1 May, 2024; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
Authors:
Libin Zhu,
Chaoyue Liu,
Mikhail Belkin
Abstract:
In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity. The width of these general networks is characterized by the minimum in-degree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.
Submitted 7 June, 2023; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Heavy-Tail Phenomenon in Decentralized SGD
Authors:
Mert Gurbuzbalaban,
Yuanhan Hu,
Umut Simsekli,
Kun Yuan,
Lingjiong Zhu
Abstract:
Recent theoretical studies have shown that heavy tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in modern machine learning applications. In this paper, we study the emergence of heavy tails in decentralized stochastic gradient descent (DE-SGD) and investigate the effect of decentralization on the tail behavior. We first show that, when the loss function at each computational node is twice continuously differentiable and strongly convex outside a compact region, the law of the DE-SGD iterates converges to a distribution with polynomially decaying (heavy) tails. To obtain more explicit control over the tail exponent, we then consider the case where the loss at each node is quadratic, and show that the tail-index can be estimated as a function of the step-size, the batch-size, and the topological properties of the network of computational nodes. Then, we provide theoretical and empirical results showing that DE-SGD has heavier tails than centralized SGD. We also compare DE-SGD to disconnected SGD, where nodes distribute the data but do not communicate. Our theory uncovers an interesting interplay between the tails and the network structure: we identify two regimes of parameters (stepsize and network size) in which DE-SGD can have lighter or heavier tails than disconnected SGD, depending on the regime. Finally, to support our theoretical results, we provide numerical experiments conducted on both synthetic data and neural networks.
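For readers unfamiliar with the decentralized setting, here is a minimal sketch of one common form of a DE-SGD step for per-node least-squares losses: each node averages its iterate with its neighbours through a doubly stochastic mixing matrix $W$ and then takes a local stochastic gradient step. The exact update studied in the paper may differ; the function name and the combine-then-adapt ordering are assumptions.

```python
import numpy as np

def de_sgd_step(X, W, A_batches, b_batches, eta):
    """One decentralized SGD step for per-node least squares (a sketch).

    X:           (n_nodes, d) array, row i is node i's current iterate.
    W:           (n_nodes, n_nodes) doubly stochastic mixing matrix of the network.
    A_batches,
    b_batches:   node i's mini-batch for the loss ||A x - b||^2 / (2 * batch_size).
    """
    # Local stochastic gradients; the data-dependent factor A^T A is the multiplicative noise.
    grads = np.stack([
        A.T @ (A @ x - b) / len(b)
        for x, A, b in zip(X, A_batches, b_batches)
    ])
    # Gossip averaging with neighbours followed by a local gradient step.
    return W @ X - eta * grads
```

On a ring of four nodes, for instance, `W = 0.5 * I + 0.25 * (shift + shift.T)` with `shift` a cyclic permutation matrix is a valid doubly stochastic choice.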
Submitted 16 May, 2022; v1 submitted 13 May, 2022;
originally announced May 2022.
-
A Sampling Control Framework and Applications to Robust and Adaptive Control
Authors:
Lijun Zhu,
Zhiyong Chen
Abstract:
In this paper, we propose a novel sampling control framework based on the emulation technique, where the sampling error is regarded as an auxiliary input to the emulated system. Utilizing the supremum norm of the sampling error, the design of periodic sampling and event-triggered control laws renders the error dynamics bounded-input-bounded-state (BIBS) and, when coupled with the system dynamics, achieves global or semi-global stabilization. The proposed framework is then extended to tackle event-triggered and periodic sampling stabilization for a system where only partial state is available for feedback and the system is subject to parameter uncertainties. The proposed framework is further extended to solve two classes of event-triggered adaptive control problems where the emulated closed-loop system does not admit an input-to-state stability (ISS) Lyapunov function. For the first class of systems with linearly parameterized uncertainties, event-triggered global adaptive stabilization is achieved without the global Lipschitz condition on nonlinearities that is often required in the literature. For the second class of systems with uncertainties whose bound is unknown, an event-triggered adaptive (dynamic) gain controller is designed for the first time. Finally, the theoretical results are verified by two numerical examples.
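The emulation idea of treating the sampling error as an input can be illustrated on a scalar toy system $\dot x = ax + bu$ with held feedback $u=-k\hat x$, where $\hat x$ is the last transmitted state and $e=\hat x - x$ is the sampling error. The sketch below uses a simple relative triggering rule $|e|>σ|x|$ and forward-Euler integration; the system, gains and rule are illustrative assumptions, not the paper's design. With $a=b=1$, $k=3$ the closed loop is $\dot x=-2x-3e$, so $σ=0.3$ keeps $|3e|\le 0.9|x|$ and the state still decays.

```python
def event_triggered_sim(a=1.0, b=1.0, k=3.0, sigma=0.3, x0=1.0, dt=1e-3, t_end=10.0):
    """Forward-Euler simulation of x' = a x + b u with event-triggered feedback u = -k x_hat.

    x_hat is the last transmitted state; the sampling error e = x_hat - x acts as an
    input to the closed loop, and a new sample is sent when |e| > sigma * |x|.
    """
    x, x_hat = x0, x0
    n_steps = int(round(t_end / dt))
    n_events = 1  # the initial transmission
    traj = [x]
    for _ in range(n_steps):
        if abs(x_hat - x) > sigma * abs(x):
            x_hat = x  # event: transmit the current state
            n_events += 1
        u = -k * x_hat  # control held constant between events
        x += dt * (a * x + b * u)
        traj.append(x)
    return traj, n_events, n_steps
```

Counting `n_events` against `n_steps` shows how much communication event triggering saves compared with updating the control at every integration step.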
Submitted 28 April, 2022;
originally announced April 2022.
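To make the emulation idea concrete, here is a minimal Python simulation of event-triggered sampling for a scalar plant $\dot x = a x + b u$ under the emulated law $u = -k x$, where the held state $x(t_k)$ is refreshed only when the sampling error $e = x(t_k) - x(t)$ satisfies $|e| \ge \sigma |x|$. This relative-threshold trigger, the forward-Euler discretization, and all parameter values are assumptions chosen for illustration; they are not the supremum-norm-based design proposed in the paper.

```python
def simulate_event_triggered(a=1.0, b=1.0, k=3.0, sigma=0.3, x0=1.0, dt=1e-3, t_end=5.0):
    """Event-triggered emulation of u = -k*x for the scalar plant dx/dt = a*x + b*u.

    With e = x_held - x, V = x^2 / 2 gives dV/dt <= (a - b*k + b*k*sigma) * x^2,
    so for a = b = 1, k = 3 the loop decays whenever sigma < 2/3 (assumed trigger rule).
    """
    x = x0
    x_held = x0          # last sampled state x(t_k), held by a zero-order hold
    events = 1           # count the initial sample as the first event
    steps = int(round(t_end / dt))
    for _ in range(steps):
        if abs(x_held - x) >= sigma * abs(x):   # sampling error too large: resample
            x_held = x
            events += 1
        u = -k * x_held
        x += dt * (a * x + b * u)               # forward-Euler step of the plant
    return x, events, steps


x_final, n_events, n_steps = simulate_event_triggered()
print(f"x(T) = {x_final:.2e} after {n_events} samples instead of {n_steps}")
```

The state decays while the control is refreshed far less often than once per simulation step; the paper's framework additionally handles partial-state feedback and parameter uncertainty, which this sketch ignores.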
-
Weighted residual empirical processes, martingale transformations and model checking for regressions
Authors:
Falong Tan,
Xu Guo,
Lixing Zhu
Abstract:
In this paper, we propose a new methodology for testing the parametric forms of the mean and variance functions in regression models, based on weighted residual empirical processes and their martingale transformations. The dimensions of the parameter vectors may diverge as the sample size goes to infinity. We then study the convergence of the weighted residual empirical processes and their martingale transformations under the null and alternative hypotheses in this diverging-dimension setting. The proposed tests based on weighted residual empirical processes can detect local alternatives distinct from the null at the fastest possible rate, of order $n^{-1/2}$, but are not asymptotically distribution-free. The tests based on martingale-transformed weighted residual empirical processes can be asymptotically distribution-free, yet, unexpectedly, they can only detect local alternatives converging to the null at the much slower rate of order $n^{-1/4}$, which is somewhat different from existing asymptotically distribution-free tests based on martingale transformations. Because the tests based on the residual empirical process are not distribution-free, we propose a smooth residual bootstrap and verify the validity of its approximation in diverging-dimension settings. Simulation studies and a real-data example illustrate the effectiveness of our tests.
Submitted 29 January, 2022;
originally announced January 2022.
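The smooth residual bootstrap mentioned above can be sketched as follows for a simple linear mean function with a scalar covariate. As a stand-in test statistic, the sketch uses the classical cumulative-residual (marked empirical) process $\sup_t |n^{-1/2}\sum_i \hat e_i \mathbf{1}\{x_i \le t\}|$; this is an assumption for illustration, not the weighted residual empirical process of the paper, and the sketch does not cover diverging dimensions. Bootstrap errors are resampled centered residuals plus a small Gaussian perturbation with a Silverman-type bandwidth, which is also an assumed choice.

```python
import math
import random


def ols_fit(xs, ys):
    """Least-squares fit of y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def residual_process_stat(xs, ys):
    """sup_t |n^{-1/2} * sum_i e_i * 1{x_i <= t}| for the fitted linear model."""
    b0, b1 = ols_fit(xs, ys)
    partial, best = 0.0, 0.0
    for x, y in sorted(zip(xs, ys)):          # walk through t = x_(1) <= x_(2) <= ...
        partial += y - (b0 + b1 * x)
        best = max(best, abs(partial))
    return best / math.sqrt(len(xs))


def smooth_residual_bootstrap(xs, ys, n_boot=199, seed=0):
    """Return (observed statistic, bootstrap p-value) under the linear null model."""
    rng = random.Random(seed)
    n = len(xs)
    b0, b1 = ols_fit(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    mean_r = sum(resid) / n
    centred = [r - mean_r for r in resid]
    sd = math.sqrt(sum(r * r for r in centred) / n)
    h = 1.06 * sd * n ** -0.2                  # smoothing bandwidth (assumed rule of thumb)
    t_obs = residual_process_stat(xs, ys)
    exceed = 0
    for _ in range(n_boot):
        # Smooth bootstrap errors: resampled centred residual + h * N(0, 1).
        y_star = [f + rng.choice(centred) + h * rng.gauss(0.0, 1.0) for f in fitted]
        if residual_process_stat(xs, y_star) >= t_obs:
            exceed += 1
    return t_obs, (exceed + 1) / (n_boot + 1)


rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys_quad = [3.0 * x * x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]   # misspecified for a line
ys_lin = [1.0 + 2.0 * x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]  # correctly specified
print(smooth_residual_bootstrap(xs, ys_quad))
print(smooth_residual_bootstrap(xs, ys_lin))
```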
-
Yetter-Drinfeld modules for group-cograded Hopf quasigroups
Authors:
Huili Liu,
Tao Yang,
Lingli Zhu
Abstract:
Let $H$ be a crossed group-cograded Hopf quasigroup. We first introduce the notion of $p$-Yetter-Drinfeld quasimodule over $H$. If the antipode of $H$ is bijective, we show that the category $\mathscr Y\mathscr D\mathscr Q(H)$ of Yetter-Drinfeld quasimodules over $H$ is a crossed category, and the subcategory $\mathscr Y\mathscr D(H)$ of Yetter-Drinfeld modules is a braided crossed category.
Submitted 28 December, 2021; v1 submitted 15 December, 2021;
originally announced December 2021.
-
Orthogonal Group Synchronization with Incomplete Measurements: Error Bounds and Linear Convergence of the Generalized Power Method
Authors:
Linglingzhi Zhu,
Jinxin Wang,
Anthony Man-Cho So
Abstract:
Group synchronization refers to estimating a collection of group elements from noisy pairwise measurements. This nonconvex problem has received much attention in numerous scientific fields, including computer vision, robotics, and cryo-electron microscopy. In this paper, we focus on the orthogonal group synchronization problem with general additive noise models under incomplete measurements, which is much more general than the commonly considered setting of complete measurements. We characterize the orthogonal group synchronization problem from the perspectives of optimality conditions and of fixed points of the projected gradient ascent method, also known as the generalized power method (GPM). Notably, these results hold even without generative models. We also derive a local error bound property for the orthogonal group synchronization problem, which is useful for the convergence rate analysis of different algorithms and may be of independent interest. Finally, based on the established local error bound property, we prove that the GPM converges linearly to a global maximizer under a general additive noise model. Our theoretical convergence result holds under several deterministic conditions that cover certain cases with adversarial noise; as an example, we specialize it to the setting of an Erdős-Rényi measurement graph with Gaussian noise.
Submitted 13 December, 2021;
originally announced December 2021.
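For intuition about the generalized power method, the sketch below specializes to $d = 1$, where $O(1) = \{+1, -1\}$ and projecting onto the group reduces to taking entrywise signs. It generates incomplete noisy measurements on an Erdős-Rényi graph, initializes by power iteration on the measurement matrix (an assumed spectral initialization), and then iterates $x \leftarrow \mathrm{sign}(C x)$ until a fixed point is reached. The problem sizes, noise level, and function names are hypothetical choices for illustration; the paper's analysis covers general $d$ and deterministic noise conditions that this toy example does not check.

```python
import math
import random


def synthetic_z2_problem(n, p, sigma, seed=0):
    """Ground truth z in {+1,-1}^n; C_ij = z_i z_j + noise on an Erdos-Renyi edge set, else 0."""
    rng = random.Random(seed)
    z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:                       # pair (i, j) is observed
                C[i][j] = C[j][i] = z[i] * z[j] + sigma * rng.gauss(0.0, 1.0)
    return z, C


def matvec(C, x):
    return [sum(c * xj for c, xj in zip(row, x)) for row in C]


def project_o1(v):
    """Entrywise projection onto O(1) = {+1, -1}, i.e. the sign (ties sent to +1)."""
    return [1.0 if vi >= 0.0 else -1.0 for vi in v]


def generalized_power_method(C, max_iter=50, seed=1):
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(C))]
    for _ in range(100):                   # spectral initialisation via power iteration
        v = matvec(C, v)
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
    x = project_o1(v)
    for _ in range(max_iter):              # GPM step: x <- Proj(C x)
        x_new = project_o1(matvec(C, x))
        if x_new == x:                     # reached a fixed point
            break
        x = x_new
    return x


z, C = synthetic_z2_problem(n=60, p=0.5, sigma=0.5)
x_hat = generalized_power_method(C)
# Overlap 1.0 means exact recovery up to the unavoidable global sign.
overlap = abs(sum(a * b for a, b in zip(z, x_hat))) / len(z)
print("overlap with ground truth:", overlap)
```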
-
Lattice structure design optimization under localized linear buckling constraints
Authors:
Xingtong Yang,
Xinzhuo Hu,
Liangchao Zhu,
Ming Li
Abstract:
This paper proposes an optimization method for designing multi-lattice structures that satisfy local buckling constraints. First, free material optimization is introduced to find an optimal distribution of the elastic tensor among all feasible elastic continua. By approximating the elastic tensor under the buckling constraint, a matching lattice structure is embedded in each macro element. The stresses in the local cells are specifically introduced to obtain a better structure. Finally, the method yields a lattice structure with excellent overall stiffness and local buckling resistance, which enhances its mechanical properties.
Submitted 9 November, 2021;
originally announced November 2021.
-
Optimal prediction for kernel-based semi-functional linear regression
Authors:
Keli Guo,
Jun Fan,
Lixing Zhu
Abstract:
In this paper, we establish minimax optimal rates of convergence for prediction in a semi-functional linear model that consists of a functional component and a less smooth nonparametric component. Our results reveal that the smoother functional component can be learned with the minimax rate as if the nonparametric component were known. More specifically, a double-penalized least squares method is adopted to estimate both the functional and nonparametric components within the framework of reproducing kernel Hilbert spaces. By virtue of the representer theorem, an efficient algorithm that requires no iterations is proposed to solve the corresponding optimization problem, where the regularization parameters are selected by the generalized cross validation criterion. Numerical studies are provided to demonstrate the effectiveness of the method and to verify the theoretical analysis.
Submitted 29 October, 2021;
originally announced October 2021.
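The representer-theorem step and generalized cross-validation (GCV) can be illustrated on a deliberately simplified problem: ordinary kernel ridge regression with a single RKHS penalty and a Gaussian kernel, with no functional component. By the representer theorem the estimate is $\hat f(x) = \sum_i \alpha_i k(x, x_i)$ with $\alpha = (K + n\lambda I)^{-1} y$, a closed form that needs no iterations, and $\lambda$ is chosen by minimizing $\mathrm{GCV}(\lambda) = n\,\|(I - A_\lambda) y\|^2 / [\mathrm{tr}(I - A_\lambda)]^2$ with $A_\lambda = K (K + n\lambda I)^{-1}$. This is not the paper's double-penalized semi-functional estimator; the kernel, its width, the $\lambda$ grid, and the helper names are assumptions.

```python
import math
import random


def gaussian_kernel(a, b, width=0.2):
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (dense, small matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def kernel_ridge_gcv(xs, ys, lambdas, width=0.2):
    """Return (gcv, lambda, alpha) minimising GCV over the lambda grid."""
    n = len(xs)
    K = [[gaussian_kernel(a, b, width) for b in xs] for a in xs]
    best = None
    for lam in lambdas:
        Minv = inverse([[K[i][j] + (n * lam if i == j else 0.0) for j in range(n)]
                        for i in range(n)])
        alpha = [sum(Minv[i][j] * ys[j] for j in range(n)) for i in range(n)]  # representer coefficients
        fitted = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # A_lambda y
        rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
        trace_hat = sum(K[i][m] * Minv[m][i] for i in range(n) for m in range(n))
        gcv = n * rss / (n - trace_hat) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, alpha)
    return best


def predict(x, xs, alpha, width=0.2):
    return sum(a * gaussian_kernel(x, xi, width) for a, xi in zip(alpha, xs))


rng = random.Random(0)
xs = [i / 29 for i in range(30)]
ys = [math.sin(2.0 * math.pi * x) + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
gcv, lam, alpha = kernel_ridge_gcv(xs, ys, [1e-4, 1e-3, 1e-2, 1e-1])
print("lambda chosen by GCV:", lam, " f_hat(0.25) =", predict(0.25, xs, alpha))
```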
-
High-performance computation of the exponential of a large sparse matrix
Authors:
Feng Wu,
Kailing Zhang,
Li Zhu,
Jiayao Hu
Abstract:
Computing the exponential of a large sparse matrix is an important problem in many fields, such as network and finite-element analysis. The existing scaling and squaring algorithm (SSA) is not well suited to large sparse matrices because it requires more memory and computation than is actually needed. By introducing two novel concepts, real bandwidth and bandwidth, to measure the sparsity of a matrix, the sparsity of the matrix exponential is analyzed. It is found that, for every matrix computed in the squaring phase of the SSA, a corresponding sparse approximate matrix exists. To obtain this sparse approximate matrix, a new filtering technique based on forward error analysis is proposed. Combining the filtering technique with the idea of keeping track of the incremental part, a competitive algorithm is developed for the large sparse matrix exponential. Owing to the filtering technique, the proposed method can largely alleviate the over-scaling problem. Three sets of numerical experiments are conducted, including one matrix with dimension larger than $2\times10^{6}$. The experiments show that, compared with the expm function in MATLAB, the proposed algorithm provides higher accuracy at lower computational cost and with less memory.
Submitted 9 October, 2021;
originally announced October 2021.
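The scaling-and-squaring structure, together with the idea of discarding negligible entries during squaring, can be sketched with a row-dictionary sparse format in pure Python. The sketch scales $A$ by $2^{-s}$ so that $\|2^{-s}A\|_\infty \le 1/2$, sums a degree-12 Taylor series, and then squares $s$ times, dropping entries whose magnitude falls below a fixed absolute tolerance after each product. That fixed threshold is a naive stand-in assumed here; the paper derives its filter from forward error analysis and also tracks the incremental part, which this sketch does not attempt.

```python
import math


def identity(n):
    return [{i: 1.0} for i in range(n)]


def sp_matmul(A, B, drop_tol):
    """Sparse product A @ B in row-dict format, discarding entries with |value| <= drop_tol."""
    C = []
    for row_a in A:
        acc = {}
        for k, a_ik in row_a.items():
            for j, b_kj in B[k].items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        C.append({j: v for j, v in acc.items() if abs(v) > drop_tol})   # filtering step
    return C


def sp_add(A, B):
    C = [dict(row) for row in A]
    for i, row_b in enumerate(B):
        for j, v in row_b.items():
            C[i][j] = C[i].get(j, 0.0) + v
    return C


def inf_norm(A):
    return max((sum(abs(v) for v in row.values()) for row in A), default=0.0)


def sparse_expm(A, degree=12, drop_tol=1e-14):
    """exp(A) by scaling, a truncated Taylor series, and filtered squaring."""
    n = len(A)
    norm = inf_norm(A)
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0.0 else 0
    X = [{j: v / 2.0 ** s for j, v in row.items()} for row in A]     # ||X||_inf <= 1/2
    result, term = identity(n), identity(n)
    for k in range(1, degree + 1):                                    # sum_{k<=12} X^k / k!
        term = sp_matmul(term, X, drop_tol)
        term = [{j: v / k for j, v in row.items()} for row in term]
        result = sp_add(result, term)
    for _ in range(s):                                                # exp(A) = exp(X)^(2^s)
        result = sp_matmul(result, result, drop_tol)
    return result


t = 3.0
E = sparse_expm([{1: t}, {0: -t}])        # exp of [[0, t], [-t, 0]] is a rotation matrix
print(E[0][0], math.cos(t))

n = 100
laplacian = [{j: (-2.0 if j == i else 1.0) for j in (i - 1, i, i + 1) if 0 <= j < n}
             for i in range(n)]
E_lap = sparse_expm(laplacian, drop_tol=1e-12)
print("nonzeros kept:", sum(len(row) for row in E_lap), "of", n * n)
```

For banded inputs such as the 1-D discrete Laplacian above, the entries of the exponential decay quickly away from the diagonal, so the filtered result keeps far fewer than $n^2$ nonzeros.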
-
Global-in-time $L^p$-$L^q$ estimates for solutions of the Kramers-Fokker-Planck equation
Authors:
Xue Ping Wang,
Lu Zhu
Abstract:
In this work, we prove an optimal global-in-time $L^p$-$L^q$ estimate for solutions to the Kramers-Fokker-Planck equation with a short-range potential in dimension three. Our result shows that the decay rate as $t \to +\infty$ is the same as that of the heat equation in the $x$-variables, and that the divergence rate as $t \to 0^+$ is related to the sub-ellipticity, with loss of $1/3$ derivatives, of the Kramers-Fokker-Planck operator.
Submitted 7 July, 2021;
originally announced July 2021.
-
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Authors:
Hongjian Wang,
Mert Gürbüzbalaban,
Lingjiong Zhu,
Umut Şimşekli,
Murat A. Erdogdu
Abstract:
Recent studies have provided both empirical and theoretical evidence that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails can result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of second-order moments. In this paper, we provide convergence guarantees for SGD under state-dependent, heavy-tailed noise with potentially infinite variance, for a class of strongly convex objectives. In the case where the $p$-th moment of the noise exists for some $p\in [1,2)$, we first identify a condition on the Hessian, coined '$p$-positive (semi-)definiteness', that leads to an interesting interpolation between positive semi-definite matrices ($p=2$) and diagonally dominant matrices with non-negative diagonal entries ($p=1$). Under this condition, we then provide a convergence rate for the distance to the global optimum in $L^p$. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert average converges weakly to a multivariate $\alpha$-stable random vector. Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without requiring any modification to either the loss function or the algorithm itself, as is typically required in robust statistics. We demonstrate the implications of our results for applications such as linear regression and generalized linear models subject to heavy-tailed data.
Submitted 20 February, 2021;
originally announced February 2021.
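A minimal Python experiment in the spirit of this abstract: SGD on the strongly convex scalar objective $f(x) = x^2/2$ with symmetric Pareto-type gradient noise of tail index $1.5$, so the noise has a finite mean but infinite variance, together with a running Polyak-Ruppert average. The additive (rather than state-dependent) noise, the step-size schedule $\eta_k = 0.5/(k+1)^{0.6}$, and the function names are assumptions for illustration only.

```python
import math
import random


def heavy_tailed_noise(rng, alpha):
    """Symmetric Pareto-type noise: finite mean, infinite variance when 1 < alpha < 2."""
    return rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha)


def sgd_polyak_ruppert(x0=5.0, iters=20000, alpha=1.5, noise_scale=1.0, seed=0):
    """SGD on f(x) = x^2 / 2 (optimum 0); returns (last iterate, running average)."""
    rng = random.Random(seed)
    x, running_sum = x0, 0.0
    for k in range(iters):
        step = 0.5 / (k + 1) ** 0.6                                 # assumed decaying schedule
        grad = x + noise_scale * heavy_tailed_noise(rng, alpha)     # noisy gradient of x^2 / 2
        x -= step * grad
        running_sum += x
    return x, running_sum / iters


last, avg = sgd_polyak_ruppert()
_, avg_clean = sgd_polyak_ruppert(noise_scale=0.0)   # noise-free baseline
print("last iterate:", last, " averaged:", avg, " noise-free average:", avg_clean)
```

Comparing the raw last iterate with the running average over several seeds is a simple way to look at the effect of averaging under infinite-variance noise; no particular numbers are claimed here.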
-
A multi-objective optimization framework for on-line ridesharing systems
Authors:
Hamed Javidi,
Dan Simon,
Ling Zhu,
Yan Wang
Abstract:
The ultimate goal of ridesharing systems is to match travelers who do not have a vehicle with travelers who want to share their vehicle. Good matches can be found among those who have similar itineraries and time schedules. In this way, each rider can be served without delay, and each driver can earn as much as possible without deviating too much from their original route. We propose an algorithm that leverages biogeography-based optimization (BBO) to solve a multi-objective optimization problem for online ridesharing. The ridesharing problem must be solved as a multi-objective problem because several important objectives have to be considered simultaneously. We test our algorithm by evaluating its performance on the Beijing ridesharing dataset. The simulation results indicate that BBO provides competitive performance relative to state-of-the-art ridesharing optimization algorithms.
Submitted 7 December, 2020;
originally announced December 2020.
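Biogeography-based optimization (BBO) is a population-based metaheuristic: each candidate solution (habitat) receives an immigration rate and an emigration rate from its fitness rank, solution features migrate from good habitats into poor ones, and random mutation plus elitism complete each generation. The sketch below applies a basic BBO loop to a hypothetical scalarized two-objective toy cost $w f_1 + (1-w) f_2$. The cost functions, the weighted-sum scalarization, the linear rank-based rates, and all parameters are assumptions; none of it models the paper's ridesharing formulation or its multi-objective handling.

```python
import random


def toy_cost(x, w=0.5):
    """Hypothetical scalarised bi-objective w*f1 + (1 - w)*f2 (illustrative only)."""
    f1 = sum(v * v for v in x)              # stand-in for, e.g., a rider-delay objective
    f2 = sum((v - 1.0) ** 2 for v in x)     # stand-in for, e.g., a driver-detour objective
    return w * f1 + (1.0 - w) * f2


def bbo(cost, dim=3, pop=20, gens=100, p_mut=0.05, n_elite=2, seed=0):
    rng = random.Random(seed)
    habitats = [[rng.uniform(-2.0, 2.0) for _ in range(dim)] for _ in range(pop)]
    history = []
    for _ in range(gens):
        habitats.sort(key=cost)                          # best (lowest cost) first
        history.append(cost(habitats[0]))
        elites = [h[:] for h in habitats[:n_elite]]
        # Rank-based rates: good habitats emigrate often and immigrate rarely.
        mu = [(pop - i) / (pop + 1) for i in range(pop)]
        lam = [1.0 - m for m in mu]
        new = []
        for i, h in enumerate(habitats):
            child = h[:]
            for d in range(dim):
                if rng.random() < lam[i]:
                    # Roulette-wheel choice of the emigrating habitat, weighted by mu.
                    src = rng.choices(range(pop), weights=mu)[0]
                    child[d] = habitats[src][d]
                if rng.random() < p_mut:
                    child[d] = rng.uniform(-2.0, 2.0)    # random mutation
            new.append(child)
        # Elitism: the worst new habitats are replaced by the previous elites.
        new.sort(key=cost)
        habitats = new[:pop - n_elite] + elites
    habitats.sort(key=cost)
    history.append(cost(habitats[0]))
    return habitats[0], history


best, history = bbo(toy_cost)
print("best habitat:", best, " cost:", history[-1])
```

Because the elites are carried over, the best cost never increases from one generation to the next. Sweeping `w` over $[0, 1]$ and re-running is one simple way to trace trade-offs between the two toy objectives.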