DeltaPhi: Learning Physical Trajectory Residual for PDE Solving

Authors: Xihang Yue, Linchao Zhu, Yi Yang

Abstract: Although neural operator networks theoretically approximate any operator mapping, the limited generalization capability prevents them from learning correct physical dynamics when potential data biases exist, particularly in the practical PDE solving scenario where the available data amount is restricted or the resolution is extremely low. To address this issue, we propose and formulate the Physical Trajectory Residual Learning (DeltaPhi), which learns to predict the physical residuals between the pending solved trajectory and a known similar auxiliary trajectory. First, we transform the direct operator mapping between input-output function fields in original training data to residual operator mapping between input function pairs and output function residuals. Next, we learn the surrogate model for the residual operator mapping based on existing neural operator networks. Additionally, we design helpful customized auxiliary inputs for efficient optimization. Through extensive experiments, we conclude that, compared to direct learning, physical residual learning is preferred for PDE solving.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08954 [pdf, other]

S-SOS: Stochastic Sum-Of-Squares for Parametric Polynomial Optimization

Authors: Richard L. Zhu, Mathias Oster, Yuehaw Khoo

Abstract: Global polynomial optimization is an important tool across applied mathematics, with many applications in operations research, engineering, and physical sciences. In various settings, the polynomials depend on external parameters that may be random. We discuss a stochastic sum-of-squares (S-SOS) algorithm based on the sum-of-squares hierarchy that constructs a series of semidefinite programs to jointly find strict lower bounds on the global minimum and extract candidates for parameterized global minimizers. We prove quantitative convergence of the hierarchy as the degree increases and use it to solve unconstrained and constrained polynomial optimization problems parameterized by random variables. By employing $n$-body priors from condensed matter physics to induce sparsity, we can use S-SOS to produce solutions and uncertainty intervals for sensor network localization problems containing up to 40 variables and semidefinite matrix sizes surpassing $800 \times 800$.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.13266 [pdf, other]

Nonparametric estimation of FBSDEs with random terminal time

Authors: Shaolin Ji, Chenyao Yu, Linlin Zhu

Abstract: This paper investigates the nonparametric estimation of the functional coefficients of the FBSDEs with random terminal time, including the local constant and local linear estimators. We provide complete two-dimensional asymptotics in both the time span and the sampling interval, allowing for the precise characterization of their distribution. Moreover, the empirical likelihood (EL) method to construct the data-driven confidence intervals for these estimators is provided. Some numerical simulations investigate the finite-sample properties of the estimators and compare the performance of the EL method and the conventional method in constructing confidence intervals based on asymptotic normality.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.17290 [pdf, ps, other]

Efficient Orthogonal Decomposition with Automatic Basis Extraction for Low-Rank Matrix Approximation

Authors: Weijie Shen, Weiwei Xu, Lei Zhu

Abstract: Low-rank matrix approximation plays a ubiquitous role in various applications such as image processing, signal processing, and data analysis. Recently, random algorithms of low-rank matrix approximation have gained widespread adoption due to their speed, accuracy, and robustness, particularly in their improved implementation on modern computer architectures. Existing low-rank approximation algorithms often require prior knowledge of the rank of the matrix, which is typically unknown. To address this bottleneck, we propose a low-rank approximation algorithm termed efficient orthogonal decomposition with automatic basis extraction (EOD-ABE) tailored for the scenario where the rank of the matrix is unknown. Notably, we introduce a randomized algorithm to automatically extract the basis that reveals the rank. The efficacy of the proposed algorithms is theoretically and numerically validated, demonstrating superior speed, accuracy, and robustness compared to existing methods. Furthermore, we apply the algorithms to image reconstruction, achieving remarkable results.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2403.12697 [pdf, other]

Optimal estimate of electromagnetic field concentration between two nearly-touching inclusions in the quasi-static regime

Authors: Youjun Deng, Hongyu Liu, Liyan Zhu

Abstract: We investigate the electromagnetic field concentration between two nearly-touching inclusions that possess high-contrast electric permittivities in the quasi-static regime. By using layer potential techniques and asymptotic analysis in the low-frequency regime, we derive low-frequency expansions that provide integral representations for the solutions of the Maxwell equations. For the leading-order term $\mathbf{E}_0$ of the asymptotic expansion of the electric field, we prove that it has the blow up order of $ε^{-1} |\ln ε|^{-1}$ within the radial geometry, where $ε$ signifies the asymptotic distance between the inclusions. By delicate analysis of the integral operators involved, we further prove the boundedness of the first-order term $\mathbf{E}_1$. We also conduct extensive numerical experiments which not only corroborate the theoretical findings but also provide more discoveries on the field concentration in the general geometric setup. Our study provides the first treatment in the literature on field concentration between nearly-touching material inclusions for the full Maxwell system.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.02051 [pdf, other]

Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations

Authors: Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yıldırım, Lingjiong Zhu

Abstract: Injecting heavy-tailed noise to the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed mainly from learning theory and optimization perspectives, their privacy preservation properties have not yet been established. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD, when the injected noise follows an $α$-stable distribution, which includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the Gaussian distribution. Considering the $(ε, δ)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$-DP for a broad class of loss functions which can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, contrary to prior work that necessitates bounded sensitivity for the gradients or clipping the iterates, our theory reveals that under mild assumptions, such a projection step is not actually necessary. We illustrate that the heavy-tailed noising mechanism achieves similar DP guarantees compared to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.12502 [pdf, ps, other]

Euler-Maruyama schemes for stochastic differential equations driven by stable Lévy processes with i.i.d. stable components

Authors: Thanh Dang, Lingjiong Zhu

Abstract: We study Euler-Maruyama numerical schemes of stochastic differential equations driven by stable Lévy processes with i.i.d. stable components. We obtain a uniform-in-time approximation error in Wasserstein distance. Our approximation error has a linear dependence on the stepsize, which is expected to be tight, as can be seen from an explicit calculation for the case of an Ornstein-Uhlenbeck process. We also obtain a uniform-in-time approximation error when Pareto noises are used in the discretization scheme.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 73 pages

arXiv:2401.17958 [pdf, ps, other]

Convergence Analysis for General Probability Flow ODEs of Diffusion Models in Wasserstein Distances

Authors: Xuefeng Gao, Lingjiong Zhu

Abstract: Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and employed in practice, the theoretical understandings about convergence properties of the probability flow ODE are still quite limited. In this paper, we provide the first non-asymptotic convergence analysis for a general class of probability flow ODE samplers in 2-Wasserstein distance, assuming accurate score estimates. We then consider various examples and establish results on the iteration complexity of the corresponding ODE-based samplers.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 47 pages, 3 tables. arXiv admin note: text overlap with arXiv:2311.11003

arXiv:2312.02421 [pdf, ps, other]

Inverse conductivity problem with one measurement: Uniqueness of multi-layer structures

Authors: Lingzheng Kong, Youjun Deng, Liyan Zhu

Abstract: In this paper, we study the recovery of multi-layer structures in the inverse conductivity problem by using one measurement. First, we define the concept of Generalized Polarization Tensors (GPTs) for a multi-layered medium and show some important properties of the proposed GPTs. With the help of GPTs, we present the perturbation formula for a general multi-layered medium. Then we derive the perturbed electric potential for the multi-layer concentric disks structure in terms of the so-called generalized polarization matrix, whose dimension is the same as the number of layers. By delicate analysis, we derive an algebraic identity involving the geometric and material configurations of multi-layer concentric disks. This enables us to reconstruct the multi-layer structures by using only one partial-order measurement.

Submitted 4 December, 2023; originally announced December 2023.

MSC Class: 31A25; 35J05; 86A20

arXiv:2311.11003 [pdf, other]

Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models

Authors: Xuefeng Gao, Hoang M. Nguyen, Lingjiong Zhu

Abstract: Score-based generative models (SGMs) are a recent class of deep generative models with state-of-the-art performance in many applications. In this paper, we establish convergence guarantees for a general class of SGMs in 2-Wasserstein distance, assuming accurate score estimates and smooth log-concave data distribution. We specialize our result to several concrete SGMs with specific choices of forward processes modelled by stochastic differential equations, and obtain an upper bound on the iteration complexity for each model, which demonstrates the impacts of different choices of the forward processes. We also provide a lower bound when the data distribution is Gaussian. Numerically, we experiment with SGMs with different forward processes, some of which are newly proposed in this paper, for unconditional image generation on CIFAR-10. We find that the experimental results are in good agreement with our theoretical predictions on the iteration complexity, and the models with our newly proposed forward processes can outperform existing models.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2307.15903 [pdf, ps, other]

Fluctuations and moderate deviations for the mean fields of Hawkes processes

Authors: Fuqing Gao, Yunshi Gao, Lingjiong Zhu

Abstract: The Hawkes process is a counting process that has self- and mutually-exciting features with many applications in various fields. In recent years, there has been much interest in the mean-field results of the Hawkes process and its extensions. It is known that the mean-field limit of a multivariate nonlinear Hawkes process is a time-inhomogeneous Poisson process. In this paper, we study the fluctuations for the mean fields and the large deviations associated with the fluctuations, i.e., the moderate deviations.

Submitted 29 July, 2023; originally announced July 2023.

Comments: 38 pages

arXiv:2307.07767 [pdf, ps, other]

Byzantine-robust distributed one-step estimation

Authors: Chuhan Wang, Xuehu Zhu, Lixing Zhu

Abstract: This paper proposes a Robust One-Step Estimator (ROSE) to solve the Byzantine failure problem in distributed M-estimation when a moderate fraction of node machines experience Byzantine failures. To define ROSE, the algorithms use the robust Variance Reduced Median Of the Local (VRMOL) estimator to determine the initial parameter value for iteration, and communicate between the node machines and the central processor in the Newton-Raphson iteration procedure to derive the robust VRMOL estimator of the gradient, and the Hessian matrix so as to obtain the final estimator. ROSE has higher asymptotic relative efficiency than general median estimators without increasing the order of computational complexity. Moreover, this estimator can also cope with the problems involving anomalous or missing samples on the central processor. We prove the asymptotic normality when the parameter dimension p diverges as the sample size goes to infinity, and under weaker assumptions, derive the convergence rate. Numerical simulations and a real data application are conducted to evidence the effectiveness and robustness of ROSE.

Submitted 15 July, 2023; originally announced July 2023.

arXiv:2306.12730 [pdf, other]

Rotation Group Synchronization via Quotient Manifold

Authors: Linglingzhi Zhu, Chong Li, Anthony Man-Cho So

Abstract: Rotation group $\mathcal{SO}(d)$ synchronization is an important inverse problem and has attracted intense attention from numerous application fields such as graph realization, computer vision, and robotics. In this paper, we focus on the least-squares estimator of rotation group synchronization with general additive noise models, which is a nonconvex optimization problem with manifold constraints. Unlike the phase/orthogonal group synchronization, there are limited provable approaches for solving rotation group synchronization. First, we derive improved estimation results of the least-squares/spectral estimator, illustrating the tightness and validating the existing relaxation methods of solving rotation group synchronization through the optimum of relaxed orthogonal group version under near-optimal noise level for exact recovery. Moreover, departing from the standard approach of utilizing the geometry of the ambient Euclidean space, we adopt an intrinsic Riemannian approach to study orthogonal/rotation group synchronization. Benefiting from a quotient geometric view, we prove the positive definite condition of quotient Riemannian Hessian around the optimum of orthogonal group synchronization problem, and consequently the Riemannian local error bound property is established to analyze the convergence rate properties of various Riemannian algorithms. As a simple and feasible method, the sequential convergence guarantee of the (quotient) Riemannian gradient method for solving orthogonal/rotation group synchronization problem is studied, and we derive its global linear convergence rate to the optimum with the spectral initialization. All results are deterministic without any probabilistic model.

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.09084 [pdf, other]

Asymptotics for the Laplace transform of the time integral of the geometric Brownian motion

Authors: Dan Pirjol, Lingjiong Zhu

Abstract: We present an asymptotic result for the Laplace transform of the time integral of the geometric Brownian motion $F(θ,T) = \mathbb{E}[e^{-θX_T}]$ with $X_T = \int_0^T e^{σW_s + ( a - \frac12 σ^2)s} ds$, which is exact in the limit $σ^2 T \to 0$ at fixed $σ^2 θT^2$ and $aT$. This asymptotic result is applied to pricing zero coupon bonds in the Dothan model of stochastic interest rates. The asymptotic result provides an approximation for bond prices which is in good agreement with numerical evaluations in a wide range of model parameters. As a side result we obtain the asymptotics for Asian option prices in the Black-Scholes model, taking into account interest rates and dividend yield contributions in the $σ^{2}T\to 0$ limit.

Submitted 15 June, 2023; originally announced June 2023.

Comments: 17 pages, 2 figures, 2 tables

Journal ref: Operations Research Letters 2023, Volume 51, 346-352

arXiv:2306.04815 [pdf, other]

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

Authors: Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

Abstract: In this paper, we first present an explanation regarding the common occurrence of spikes in the training loss when neural networks are trained with stochastic gradient descent (SGD). We provide evidence that the spikes in the training loss of SGD are "catapults", an optimization phenomenon originally observed in GD with large learning rates in [Lewkowycz et al. 2020]. We empirically show that these catapults occur in a low-dimensional subspace spanned by the top eigenvectors of the tangent kernel, for both GD and SGD. Second, we posit an explanation for how catapults lead to better generalization by demonstrating that catapults promote feature learning by increasing alignment with the Average Gradient Outer Product (AGOP) of the true predictor. Furthermore, we demonstrate that a smaller batch size in SGD induces a larger number of catapults, thereby improving AGOP alignment and test performance.

Submitted 5 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: ICML 2024

arXiv:2305.12056 [pdf, ps, other]

Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

Authors: Lingjiong Zhu, Mert Gurbuzbalaban, Anant Raj, Umut Simsekli

Abstract: Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and we obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and non-convex losses with additive noise, where we recover similar results to the prior art or extend them to more general cases by using a single proof technique. Our approach is flexible and can be generalized to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time-uniform bounds, which might not be achieved for convex or non-convex losses unless additional noise is injected to the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel.

Submitted 28 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: 49 pages, NeurIPS 2023

arXiv:2305.01379 [pdf, ps, other]

LogSpecT: Feasible Graph Learning Model from Stationary Signals with Recovery Guarantees

Authors: Shangyuan Liu, Linglingzhi Zhu, Anthony Man-Cho So

Abstract: Graph learning from signals is a core task in Graph Signal Processing (GSP). One of the most commonly used models to learn graphs from stationary signals is SpecT. However, its practical formulation rSpecT is known to be sensitive to hyperparameter selection and, even worse, to suffer from infeasibility. In this paper, we give the first condition that guarantees the infeasibility of rSpecT and design a novel model (LogSpecT) and its practical formulation (rLogSpecT) to overcome this issue. Contrary to rSpecT, the novel practical model rLogSpecT is always feasible. Furthermore, we provide recovery guarantees of rLogSpecT, which are derived from modern optimization tools related to epi-convergence. These tools could be of independent interest and significant for various learning problems. To demonstrate the advantages of rLogSpecT in practice, a highly efficient algorithm based on the linearized alternating direction method of multipliers (L-ADMM) is proposed. The subproblems of L-ADMM admit closed-form solutions and the convergence is guaranteed. Extensive numerical results on both synthetic and real networks corroborate the stability and superiority of our proposed methods, underscoring their potential for various graph learning applications.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2304.05602 [pdf, ps, other]

Ore Extension of Group-cograded Hopf Coquasigroups

Authors: Lingli Zhu, Bingbing **, Huili Liu, Tao Yang

Abstract: The aim of this paper is to study the Ore extension of group-cograded Hopf coquasigroups. This paper first shows a categorical interpretation and some examples of group-cograded Hopf coquasigroups, and then gives necessary and sufficient conditions for the Ore extensions of group-cograded Hopf coquasigroups to be group-cograded Hopf coquasigroups. Finally, a certain isomorphism between Ore extensions is considered.

Submitted 11 July, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: 15 pages

MSC Class: 16T05; 16S36

arXiv:2304.04434 [pdf, ps, other]

Finite element and integral equation methods to conical diffraction by imperfectly conducting gratings

Authors: Guanghui Hu, Jiayi Zhang, Linlin Zhu

Abstract: In this paper we study the variational method and integral equation methods for a conical diffraction problem for imperfectly conducting gratings modeled by the impedance boundary value problem of the Helmholtz equation in periodic structures. We justify the strong ellipticity of the sesquilinear form corresponding to the variational formulation and prove the uniqueness of solutions at any frequency. Convergence of the finite element method using the transparent boundary condition (Dirichlet-to-Neumann mapping) is verified. The boundary integral equation method is also discussed.

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2304.04204 [pdf, ps, other]

Well-posedness of grating diffraction problems for plane wave incidence: explicit dependence on wavenumbers and incident angles

Authors: Linlin Zhu, Guanghui Hu

Abstract: Suppose that a plane wave is incident onto an impenetrable grating profile of Dirichlet or Impedance type or a penetrable grating. The grating interface is assumed to be given by a Lipschitz function in two dimensions. We derive a stability estimate of the grating diffraction problem via the variational method with an explicit dependence of solutions on the incident wavenumber and incident angle.

Submitted 9 April, 2023; originally announced April 2023.

arXiv:2302.13983 [pdf, other]

Elastostatics with multi-layer metamaterial structures and an algebraic framework for polariton resonances

Authors: Youjun Deng, Lingzheng Kong, Hongyu Liu, Liyan Zhu

Abstract: Multi-layer structures are ubiquitous in constructing metamaterial devices to realise various frontier applications including super-resolution imaging and invisibility cloaking. In this paper, we develop a general mathematical framework for studying elastostatics within multi-layer material structures in $\mathbb{R}^d$, $d=2,3$. The multi-layer structure is formed by concentric balls and each layer is filled by either a regular elastic material or an elastic metamaterial. The number of layers can be arbitrary and the material parameters in each layer may be different from one another. In practice, the multi-layer structure can serve as the building block for various material devices. Considering the impingement of an incident field on the multi-layer structure, we first derive the exact perturbed field in terms of an elastic momentum matrix, whose dimension is the same as the number of layers. By highly intricate and delicate analysis, we carry out a comprehensive study of the spectral properties of the elastic momentum matrix. This enables us to establish a handy algebraic framework for studying polariton resonances associated with multi-layer metamaterial structures, which forms the fundamental basis for many metamaterial applications.

Submitted 27 January, 2023; originally announced February 2023.

arXiv:2302.05516 [pdf, other]

Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

Authors: Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu

Abstract: Cyclic and randomized stepsizes are widely used in deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contains i.i.d. random stepsize, cyclic stepsize, and constant stepsize as special cases. Motivated by the literature showing that the heaviness of the tails (measured by the so-called "tail-index") in the SGD iterates is correlated with generalization, we study the tail-index and provide a number of theoretical results that demonstrate how the tail-index varies with the stepsize schedule. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to constant stepsize in terms of the tail behavior. We illustrate our theory on linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve even a heavier tail and be a viable alternative to cyclic and i.i.d. randomized stepsize rules.

Submitted 29 August, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: To Appear

Journal ref: Transactions of Machine Learning Research, 2023
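
To make the stepsize schedules in the abstract above concrete, here is a minimal sketch of SGD on a one-dimensional linear regression problem with a cyclic stepsize. It is an illustration only, not the authors' experimental setup: the function name `sgd_linear_regression`, the true slope 2.0, the noise level 0.1, and the stepsize values are all assumptions. A constant stepsize is the special case of a cycle of length one.

```python
import random


def sgd_linear_regression(stepsizes, n_iters=2000, seed=0):
    """Run 1-D SGD on y = a*x + noise, cycling through `stepsizes`.

    All constants here (true slope, noise level) are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_a = 2.0
    theta = 0.0
    for k in range(n_iters):
        eta = stepsizes[k % len(stepsizes)]  # cyclic schedule
        x = rng.gauss(0.0, 1.0)              # Gaussian input
        y = true_a * x + rng.gauss(0.0, 0.1)
        grad = (theta * x - y) * x           # gradient of 0.5 * (theta*x - y)**2
        theta -= eta * grad
    return theta


if __name__ == "__main__":
    print("constant:", sgd_linear_regression([0.05]))
    print("cyclic:  ", sgd_linear_regression([0.01, 0.05, 0.1]))
```

The update multiplies the current error by the random factor `1 - eta * x**2`, which is the multiplicative noise that tail-index analyses of SGD focus on. Estimating the tail-index itself would require many independent runs and is not attempted here.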

arXiv:2301.07585 [pdf, ps, other]

Large deviations for the mean-field limit of Hawkes processes

Authors: Fuqing Gao, Lingjiong Zhu

Abstract: Hawkes processes are a class of simple point processes whose intensity depends on the past history, and is in general non-Markovian. Limit theorems for Hawkes processes in various asymptotic regimes have been studied in the literature. In this paper, we study a multidimensional nonlinear Hawkes process in the asymptotic regime when the dimension goes to infinity, whose mean-field limit is a time-inhomogeneous Poisson process, and our main result is a large deviation principle for the mean-field limit.

Submitted 18 January, 2023; originally announced January 2023.

Comments: 34 pages

arXiv:2301.06619 [pdf, other]

Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees

Authors: Landi Zhu, Mert Gürbüzbalaban, Andrzej Ruszczyński

Abstract: We consider a distributionally robust stochastic optimization problem and formulate it as a stochastic two-level composition optimization problem with the use of the mean-semideviation risk measure. In this setting, we consider a single time-scale algorithm, involving two versions of the inner function value tracking: linearized tracking of a continuously differentiable loss function, and SPIDER tracking of a weakly convex loss function. We adopt the norm of the gradient of the Moreau envelope as our measure of stationarity and show that the sample complexity of $\mathcal{O}(\varepsilon^{-3})$ is possible in both cases, with only the constant larger in the second case. Finally, we demonstrate the performance of our algorithm with a robust learning example and a weakly convex, non-smooth regression example.

Submitted 9 June, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

arXiv:2301.06450 [pdf, other]

A delayed dual risk model

Authors: Lingjiong Zhu

Abstract: In this paper, we study a dual risk model with delays in the spirit of Dassios-Zhao. When a new innovation occurs, there is a delay before the innovation turns into a profit. We obtain large initial surplus asymptotics for the ruin probability and ruin time distributions. For some special cases, we get closed-form formulas. Numerical illustrations will also be provided.

Submitted 16 January, 2023; originally announced January 2023.

Comments: 17 pages, 2 figures, 2 tables

Journal ref: Stochastic Models 33(1), 149-170, 2017

arXiv:2301.03230 [pdf, ps, other]

doi 10.1142/S0218348X23500226

Combinatorial Properties for a Class of Simplicial Complexes Extended from Pseudo-fractal Scale-free Web

Authors: Zixuan Xie, Yucheng Wang, Wanyue Xu, Liwang Zhu, Wei Li, Zhongzhi Zhang

Abstract: Simplicial complexes are a popular tool used to model higher-order interactions between elements of complex social and biological systems. In this paper, we study some combinatorial aspects of a class of simplicial complexes created by a graph product, which is an extension of the pseudo-fractal scale-free web. We determine explicitly the independence number, the domination number, and the chromatic number. Moreover, we derive closed-form expressions for the number of acyclic orientations, the number of root-connected acyclic orientations, the number of spanning trees, as well as the number of perfect matchings for some particular cases.

Submitted 9 January, 2023; originally announced January 2023.

Comments: accepted by Fractals

arXiv:2212.12978 [pdf, other]

Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization

Authors: Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, Jose Blanchet, Jiajin Li

Abstract: Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Most existing algorithms rely on one-sided information, such as the convexity (resp. concavity) of the primal (resp. dual) functions, or other specific structures, such as the Polyak-Łojasiewicz (PŁ) and Kurdyka-Łojasiewicz (KŁ) conditions. However, verifying these regularity conditions is challenging in practice. To meet this challenge, we propose a novel universally applicable single-loop algorithm, the doubly smoothed gradient descent ascent method (DS-GDA), which naturally balances the primal and dual updates. That is, DS-GDA with the same hyperparameters is able to uniformly solve nonconvex-concave, convex-nonconcave, and nonconvex-nonconcave problems with one-sided KŁ properties, achieving convergence with $\mathcal{O}(ε^{-4})$ complexity. Sharper (even optimal) iteration complexity can be obtained when the KŁ exponent is known. Specifically, under the one-sided KŁ condition with exponent $θ\in(0,1)$, DS-GDA converges with an iteration complexity of $\mathcal{O}(ε^{-2\max\{2θ,1\}})$. These rates all match the corresponding best results in the literature. Moreover, we show that DS-GDA is practically applicable to general nonconvex-nonconcave problems even without any regularity conditions, such as the PŁ condition, KŁ condition, or weak Minty variational inequalities condition.
For various challenging nonconvex-nonconcave examples in the literature, including "Forsaken", "Bilinearly-coupled minimax", "Sixth-order polynomial", and "PolarGame", the proposed DS-GDA gets rid of limit cycles in every case. To the best of our knowledge, this is the first first-order algorithm to achieve convergence on all of these formidable problems.

Submitted 30 October, 2023; v1 submitted 25 December, 2022; originally announced December 2022.
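
The cycling behaviour that motivates this line of work can be seen on a toy problem. The sketch below runs plain simultaneous gradient descent ascent (not DS-GDA, whose update rule is not given in the abstract) on the bilinear function f(x, y) = x * y. The function name `simultaneous_gda`, the stepsize, and the choice of toy objective are assumptions made for illustration; this objective is not claimed to be one of the paper's test problems.

```python
import math


def simultaneous_gda(x, y, eta=0.1, n_steps=100):
    """Simultaneous gradient descent (in x) and ascent (in y) on f(x, y) = x * y.

    Returns the final iterate and the distance to the saddle point (0, 0)
    after every step.
    """
    radii = [math.hypot(x, y)]
    for _ in range(n_steps):
        gx, gy = y, x                      # df/dx = y, df/dy = x
        x, y = x - eta * gx, y + eta * gy  # both players update at once
        radii.append(math.hypot(x, y))
    return x, y, radii


if __name__ == "__main__":
    _, _, radii = simultaneous_gda(1.0, 0.0)
    print(radii[0], radii[-1])  # the distance to the saddle point grows
```

Each step multiplies the squared distance to the saddle point by exactly 1 + eta**2, so the iterates rotate around (0, 0) and spiral outward instead of converging.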

arXiv:2212.12708 [pdf, ps, other]

On classification of singular matrix difference equations of mixed order

Authors: Li Zhu, Huaqing Sun, Bing Xie

Abstract: This paper is concerned with singular matrix difference equations of mixed order. The existence and uniqueness of solutions of initial value problems for these equations are derived, and then their classification is obtained by a method similar to the classical Weyl method, by selecting a suitable quasi-difference. An equivalent characterization of this classification is given in terms of the number of linearly independent square summable solutions of the equation. The influence of off-diagonal coefficients on the classification is illustrated by two examples. In particular, two limit point criteria are established in terms of coefficients of the equation.

Submitted 24 December, 2022; originally announced December 2022.

Comments: 27 pages

MSC Class: 34B20; 39A27

arXiv:2212.00363 [pdf, ps, other]

Braided crossed category over crossed group-cograded weak Hopf quasigroups

Authors: Huili Liu, Lingli Zhu, Tao Yang

Abstract: In this paper, we generalize the main result in Liu[10] to the weak Hopf coquasigroups case. We first define and study group-cograded weak Hopf quasigroups, which generalize both group-cograded Hopf quasigroups and weak Hopf group-coalgebras. Then we introduce the notion of p-Yetter-Drinfeld weak quasimodule over group-cograded weak Hopf quasigroups H. If the antipode of H is bijective, we show that the category YDWQ(H) of Yetter-Drinfeld weak quasimodules over H is a crossed category, and the subcategory YD(H) of Yetter-Drinfeld modules is a braided crossed category.

Submitted 1 December, 2022; originally announced December 2022.

Comments: 19 pages

MSC Class: 16T05; 17A01; 18M15

arXiv:2211.10331 [pdf, ps, other]

A greedy randomized average block projection method for linear feasibility problems

Authors: Lin Zhu, Yuan Lei, Jiaxin Xie

Abstract: The randomized projection (RP) method is a simple iterative scheme for solving linear feasibility problems and has recently gained popularity due to its speed and low memory requirement. This paper develops an accelerated variant of the standard RP method by using two ingredients: the greedy probability criterion and the average block approach, and obtains a greedy randomized average block projection (GRABP) method for solving large-scale systems of linear inequalities. We prove that this method converges linearly in expectation under different choices of extrapolated stepsizes. Numerical experiments on both randomly generated and real-world data show the advantage of GRABP over several state-of-the-art solvers, such as the randomized projection (RP) method, the sampling Kaczmarz Motzkin (SKM) method, the generalized SKM (GSKM) method, and the Nesterov acceleration of SKM method.

Submitted 18 November, 2022; originally announced November 2022.

Comments: 21 pages
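
For orientation, the baseline RP method named in the abstract above can be sketched in a few lines. This is the standard randomized projection iteration, not the GRABP variant: the greedy selection rule, block averaging, and extrapolated stepsizes are not reproduced. The sketch assumes the system is written as A x <= b with nonzero rows and that rows are sampled uniformly; the function name `randomized_projection` and the example system are hypothetical.

```python
import random


def randomized_projection(A, b, x0, n_iters=1000, seed=0):
    """Baseline randomized projection for the linear feasibility problem A x <= b.

    At each step one row is drawn uniformly at random; if its inequality is
    violated, x is projected onto the corresponding half-space.
    Assumes every row of A is nonzero.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_iters):
        i = rng.randrange(len(A))
        row = A[i]
        violation = sum(a * xi for a, xi in zip(row, x)) - b[i]
        if violation > 0:
            scale = violation / sum(a * a for a in row)
            x = [xi - scale * a for xi, a in zip(x, row)]
    return x


if __name__ == "__main__":
    # Feasible region: x <= 1, y <= 1, x + y >= 1.
    A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    b = [1.0, 1.0, -1.0]
    print(randomized_projection(A, b, [5.0, 5.0]))
```

Only one row of A is touched per iteration, which is the source of the low memory requirement the abstract mentions.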

arXiv:2210.03356 [pdf]

Two Iterative algorithms for the matrix sign function based on the adaptive filtering technology

Authors: Feng Wu, Keqi Ye, Li Zhu, Yueling Zhao, Jiqiang Hu, Wanxie Zhong

Abstract: In this paper, two new efficient algorithms for calculating the sign function of a large-scale sparse matrix are proposed by combining a filtering algorithm with the Newton method and the Newton-Schulz method, respectively. Through theoretical analysis of the error diffusion in the iterative process, we design an adaptive filtering threshold, which ensures that the filtering has little impact on the iterative process and the calculation result. Numerical experiments are consistent with our theoretical analysis and show that the computational efficiency of our method is much better than that of the Newton method and the Newton-Schulz method, while the computational error is of the same order of magnitude as that of the two methods.

Submitted 7 October, 2022; originally announced October 2022.

Comments: 18 pages, 12 figures

MSC Class: 65F30; 15A15
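
The idea of combining an iteration for the matrix sign function with filtering can be sketched as follows. The code uses the textbook Newton-Schulz iteration X <- X (3I - X^2) / 2 and zeroes entries below a fixed threshold after each step. The fixed threshold `tol` is a stand-in assumption: the adaptive threshold described in the abstract above is not specified there and is not reproduced. The helper names are hypothetical, and dense Python lists are used only to keep the example self-contained.

```python
def matmul(A, B):
    """Dense matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def filter_small(A, tol):
    """Set entries with magnitude below `tol` to zero."""
    return [[0.0 if abs(v) < tol else v for v in row] for row in A]


def sign_newton_schulz(A, tol=1e-12, n_iters=50):
    """Newton-Schulz iteration for sign(A) with fixed-threshold filtering.

    Assumes the starting matrix satisfies ||I - A^2|| < 1 (rescale A otherwise).
    """
    n = len(A)
    X = [row[:] for row in A]
    for _ in range(n_iters):
        X2 = matmul(X, X)
        M = [[(3.0 if i == j else 0.0) - X2[i][j] for j in range(n)]
             for i in range(n)]
        X = [[0.5 * v for v in row] for row in matmul(X, M)]
        X = filter_small(X, tol)  # keep the iterate sparse
    return X


if __name__ == "__main__":
    S = sign_newton_schulz([[0.9, 0.1], [0.1, -0.8]])
    print(matmul(S, S))  # should be close to the identity
```

A quick sanity check for any computed sign matrix S is that S squared is close to the identity, since the eigenvalues of sign(A) are all +1 or -1.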

arXiv:2209.15106 [pdf, other]

Restricted Strong Convexity of Deep Learning Models with Smooth Activations

Authors: Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Mikhail Belkin

Abstract: We consider the problem of optimization of deep learning models with smooth activation functions. While there exist influential results on the problem from the "near initialization" perspective, we shed considerable new light on the problem. In particular, we make two key technical contributions for such models with $L$ layers, $m$ width, and $σ_0^2$ initialization variance. First, for suitable $σ_0^2$, we establish an $O(\frac{\text{poly}(L)}{\sqrt{m}})$ upper bound on the spectral norm of the Hessian of such models, considerably sharpening prior results. Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC) which holds as long as the squared norm of the average gradient of predictors is $Ω(\frac{\text{poly}(L)}{\sqrt{m}})$ for the square loss. We also present results for more general losses. The RSC based analysis does not need the "near initialization" perspective and guarantees geometric convergence for gradient descent (GD). To the best of our knowledge, ours is the first result on establishing geometric convergence of GD based on RSC for deep learning models, thus becoming an alternative sufficient condition for convergence that does not depend on the widely-used Neural Tangent Kernel (NTK). We share preliminary experimental results supporting our theoretical advances.

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2209.10825 [pdf, other]

Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis

Authors: Jiajin Li, Linglingzhi Zhu, Anthony Man-Cho So

Abstract: Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $θ\in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $ε$-game-stationary points and $ε$-optimization-stationary points of the problems of interest in $\mathcal{O}(ε^{-2\max\{2θ,1\}})$ iterations. Furthermore, when $θ\in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(ε^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structured problems possess the KL property with exponent $θ=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.

Submitted 26 July, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

arXiv:2208.07044 [pdf, other]

On minimum contrast method for multivariate spatial point processes

Authors: Lin Zhu, Junho Yang, Mikyoung Jun, Scott Cook

Abstract: Compared to widely used likelihood-based approaches, the minimum contrast (MC) method offers a computationally efficient method for estimation and inference of spatial point processes. These relative gains in computing time become more pronounced when analyzing complicated multivariate point process models. Despite this, there has been little exploration of the MC method for multivariate spatial point processes. Therefore, this article introduces a new MC method for parametric multivariate spatial point processes. A contrast function is computed based on the trace of the power of the difference between the conjectured $K$-function matrix and its nonparametric unbiased edge-corrected estimator. Under standard assumptions, we derive the asymptotic normality of our MC estimator. The performance of the proposed method is demonstrated through simulation studies of bivariate log-Gaussian Cox processes and five-variate product-shot-noise Cox processes.

Submitted 2 July, 2024; v1 submitted 15 August, 2022; originally announced August 2022.

arXiv:2208.05683 [pdf]

A filtering technique for the matrix power series being near-sparse

Authors: Feng Wu, Li Zhu, Yuelin Zhao, Kailing Zhang

Abstract: This work presents a new algorithm for matrix power series that are near-sparse, that is, that contain a large number of near-zero elements. The proposed algorithm uses a filtering technique to improve the sparsity of the matrices involved in the calculation process of the Paterson-Stockmeyer (PS) scheme. Based on an error analysis that considers the truncation error and the error introduced by filtering, the proposed algorithm obtains accuracy similar to that of the original PS scheme but is more efficient. For near-sparse matrix power series, the proposed method is also more efficient than the MATLAB built-in codes.

Submitted 11 August, 2022; originally announced August 2022.
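
As background for the abstract above, the Paterson-Stockmeyer scheme evaluates a matrix polynomial p(A) = c_0 I + c_1 A + ... + c_m A^m by precomputing the powers I, A, ..., A^s and then applying Horner's rule in A^s to blocks of s coefficients. The sketch below is a generic version of that scheme with an optional fixed filtering threshold `tol` applied to the stored powers. The threshold rule, the block size s = floor(sqrt(m + 1)), and all function names are assumptions for illustration, not the authors' algorithm.

```python
import math


def matmul(A, B):
    """Dense matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def paterson_stockmeyer(coeffs, A, tol=0.0):
    """Evaluate sum_k coeffs[k] * A**k with the Paterson-Stockmeyer scheme.

    Entries of the stored powers with magnitude below `tol` are set to zero;
    tol=0.0 disables filtering.
    """
    n = len(A)
    s = max(1, math.isqrt(len(coeffs)))
    powers = [[[1 if i == j else 0 for j in range(n)] for i in range(n)]]
    for _ in range(s):  # powers[i] holds A**i for i = 0..s
        nxt = matmul(powers[-1], A)
        powers.append([[0 if abs(v) < tol else v for v in row] for row in nxt])

    def block(j):
        """Return sum_i coeffs[j*s + i] * A**i over the j-th coefficient block."""
        total = [[0] * n for _ in range(n)]
        for i, c in enumerate(coeffs[j * s:(j + 1) * s]):
            total = mat_add(total, [[c * v for v in row] for row in powers[i]])
        return total

    n_blocks = -(-len(coeffs) // s)  # ceiling division
    result = block(n_blocks - 1)
    for j in range(n_blocks - 2, -1, -1):  # Horner's rule in A**s
        result = mat_add(matmul(result, powers[s]), block(j))
    return result


if __name__ == "__main__":
    print(paterson_stockmeyer([1, 2, 3, 4, 5, 6, 7], [[1, 2], [0, 1]]))
```

With s close to the square root of the number of coefficients, the scheme needs roughly 2 * sqrt(m) matrix products instead of the m required by naive term-by-term evaluation.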

arXiv:2206.15335 [pdf, ps, other]

Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection

Authors: Shang-En Huang, Seth Pettie, Leqi Zhu

Abstract: Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively corrupt up to $f<n/3$ parties. Moreover, the problem is insoluble with $f\geq n/3$ corruptions. However, Bracha's 1984 protocol achieved $f<n/3$ resilience at the cost of exponential expected latency $2^{Θ(n)}$, a bound that has never been improved in this model with $f=\lfloor (n-1)/3 \rfloor$ corruptions. In this paper we prove that Byzantine Agreement in the asynchronous, full information model can be solved with probability 1 against an adaptive adversary that can corrupt $f<n/3$ parties, while incurring only polynomial latency with high probability. Our protocol follows earlier polynomial latency protocols of King and Saia and Huang, Pettie, and Zhu, which had suboptimal resilience, namely $f \approx n/10^9$ and $f<n/4$, respectively. Resilience $f=(n-1)/3$ is uniquely difficult as this is the point at which the influence of the Byzantine players and that of the honest players are of roughly equal strength. The core technical problem we solve is to design a collective coin-flipping protocol that eventually lets us flip a coin with an unambiguous outcome. In the beginning the influence of the Byzantine players is too powerful to overcome and they can essentially fix the coin's behavior at will.
We guarantee that after just a polynomial number of executions of the coin-flipping protocol, either (a) the Byzantine players fail to fix the behavior of the coin (thereby ending the game) or (b) we can "blacklist" players such that the blacklisting rate for Byzantine players is at least as large as the blacklisting rate for good players. The blacklisting criterion is based on a simple statistical test of fraud detection.

Submitted 30 June, 2022; originally announced June 2022.

arXiv:2206.10346 [pdf, other]

A new stable and avoiding inversion iteration for computing matrix square root

Authors: Li Zhu, Keqi Ye, Yuelin Zhao, Feng Wu, Jiqiang Hu, Wanxie Zhong

Abstract: The objective of this research was to compute the principal matrix square root with sparse approximation. A new stable iterative scheme avoiding full matrix inversion (SIAI) is provided. An analysis of the sparsity and error of the matrices involved during the iterative process is given. Based on the bandwidth and error analysis, a more efficient algorithm combining the SIAI with the filtering technique is proposed. The high computational efficiency and accuracy of the proposed method are demonstrated by computing the principal square roots of different matrices to reveal its applicability over the existing methods.

Submitted 21 June, 2022; originally announced June 2022.

Comments: 19 pages, 3 figures

arXiv:2205.11787 [pdf, other]

Quadratic models for understanding catapult dynamics of neural networks

Authors: Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

Abstract: While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.

Submitted 1 May, 2024; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: accepted in ICLR 2024; changed the title

arXiv:2205.11786 [pdf, other]

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

Authors: Libin Zhu, Chaoyue Liu, Mikhail Belkin

Abstract: In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity. The width of these general networks is characterized by the minimum in-degree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.

Submitted 7 June, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022

arXiv:2205.06689 [pdf, other]

Heavy-Tail Phenomenon in Decentralized SGD

Authors: Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu

Abstract: Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to "multiplicative noise", even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in modern machine learning applications. In this paper, we study the emergence of heavy-tails in decentralized stochastic gradient descent (DE-SGD), and investigate the effect of decentralization on the tail behavior. We first show that, when the loss function at each computational node is twice continuously differentiable and strongly convex outside a compact region, the law of the DE-SGD iterates converges to a distribution with polynomially decaying (heavy) tails. To have a more explicit control on the tail exponent, we then consider the case where the loss at each node is a quadratic, and show that the tail-index can be estimated as a function of the step-size, batch-size, and the topological properties of the network of the computational nodes. Then, we provide theoretical and empirical results showing that DE-SGD has heavier tails than centralized SGD. We also compare DE-SGD to disconnected SGD where nodes distribute the data but do not communicate.
Our theory uncovers an interesting interplay between the tails and the network structure: we identify two regimes of parameters (stepsize and network size), where DE-SGD can have lighter or heavier tails than disconnected SGD depending on the regime. Finally, to support our theoretical results, we provide numerical experiments conducted on both synthetic data and neural networks.

Submitted 16 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

arXiv:2204.13855 [pdf, ps, other]

A Sampling Control Framework and Applications to Robust and Adaptive Control

Authors: Lijun Zhu, Zhiyong Chen

Abstract: In this paper, we propose a novel sampling control framework based on the emulation technique where the sampling error is regarded as an auxiliary input to the emulated system. Utilizing the supremum norm of sampling error, the design of periodic sampling and event-triggered control law renders the error dynamics bounded-input-bounded-state (BIBS), and when coupled with system dynamics, achieves global or semi-global stabilization. The proposed framework is then extended to tackle the event-triggered and periodic sampling stabilization for a system where only partial state is available for feedback and the system is subject to parameter uncertainties. The proposed framework is further extended to solve two classes of event-triggered adaptive control problems where the emulated closed-loop system does not admit an input-to-state stability (ISS) Lyapunov function. For the first class of systems with linear parameterized uncertainties, event-triggered global adaptive stabilization is achieved without the global Lipschitz condition on nonlinearities as often required in the literature. For the second class of systems with uncertainties whose bound is unknown, the event-triggered adaptive (dynamic) gain controller is designed for the first time. Finally, theoretical results are verified by two numerical examples.

Submitted 28 April, 2022; originally announced April 2022.
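The emulation idea in the abstract above, where the sampling error is treated as an extra input to the continuous-time closed loop, can be illustrated on a toy scalar system. The following is only a minimal sketch under stated assumptions, not the paper's design: the plant `x' = a*x + b*u`, the gain `k`, the relative trigger threshold `sigma`, and the forward-Euler integration are all hypothetical choices made for illustration.

```python
def simulate_event_triggered(a=1.0, b=1.0, k=3.0, sigma=0.5,
                             x0=1.0, dt=1e-3, t_end=10.0):
    """Simulate x' = a*x + b*u with u = -k * x_sampled (hypothetical toy plant).

    The controller only sees the last sampled state. A new sample is taken
    when the sampling error e = x_sampled - x satisfies |e| >= sigma * |x|.
    Returns the final state, the number of sampling events, and the step count.
    """
    x = x0
    x_sampled = x0          # state value held by the controller
    events = 0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        error = x_sampled - x            # sampling error, the "auxiliary input"
        if abs(error) >= sigma * abs(x):
            x_sampled = x                # event: refresh the held sample
            events += 1
        u = -k * x_sampled               # emulated feedback law on sampled data
        x += dt * (a * x + b * u)        # forward-Euler step of the plant
    return x, events, steps


if __name__ == "__main__":
    x_final, events, steps = simulate_event_triggered()
    print(f"final state {x_final:.3e}, {events} events over {steps} steps")
```

With these assumed values the closed loop reads `x' = -2x - 3e`, so keeping `|e|` below `0.5|x|` leaves a net decay of at least `0.5|x|`. The state therefore shrinks even though the controller is updated at only a small fraction of the integration steps.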

arXiv:2201.12537 [pdf, ps, other]

Weighted residual empirical processes, martingale transformations and model checking for regressions

Authors: Falong Tan, Xu Guo, Lixing Zhu

Abstract: In this paper we propose a new methodology for testing the parametric forms of the mean and variance functions based on weighted residual empirical processes and their martingale transformations in regression models. The dimensions of the parameter vectors can be divergent as the sample size goes to infinity. We then study the convergence of weighted residual empirical processes and their martingale transformation under the null and alternative hypotheses in the diverging dimension setting. The proposed tests based on weighted residual empirical processes can detect local alternatives distinct from the null at the fastest possible rate of order $n^{-1/2}$ but are not asymptotically distribution-free. By contrast, the tests based on martingale transformed weighted residual empirical processes can be asymptotically distribution-free but, unexpectedly, can only detect the local alternatives converging to the null at a much slower rate of order $n^{-1/4}$, which is somewhat different from existing asymptotically distribution-free tests based on martingale transformations. As the tests based on the residual empirical process are not distribution-free, we propose a smooth residual bootstrap and verify the validity of its approximation in diverging dimension settings. Simulation studies and a real data example are conducted to illustrate the effectiveness of our tests.

Submitted 29 January, 2022; originally announced January 2022.

arXiv:2112.08046 [pdf, ps, other]

Yetter-Drinfeld modules for group-cograded Hopf quasigroups

Authors: Huili Liu, Tao Yang, Lingli Zhu

Abstract: Let $H$ be a crossed group-cograded Hopf quasigroup. We first introduce the notion of $p$-Yetter-Drinfeld quasimodule over $H$. If the antipode of $H$ is bijective, we show that the category $\mathscr Y\mathscr D\mathscr Q(H)$ of Yetter-Drinfeld quasimodules over $H$ is a crossed category, and the subcategory $\mathscr Y\mathscr D(H)$ of Yetter-Drinfeld modules is a braided crossed category.

Submitted 28 December, 2021; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: 19 pages. Added a mirror structure and corrected some typos. Comments are welcome

MSC Class: 16T05; 17A01; 18M15

arXiv:2112.06556 [pdf, ps, other]

Orthogonal Group Synchronization with Incomplete Measurements: Error Bounds and Linear Convergence of the Generalized Power Method

Authors: Linglingzhi Zhu, **xin Wang, Anthony Man-Cho So

Abstract: Group synchronization refers to estimating a collection of group elements from noisy pairwise measurements. Such a nonconvex problem has received much attention from numerous scientific fields including computer vision, robotics, and cryo-electron microscopy. In this paper, we focus on the orthogonal group synchronization problem with general additive noise models under incomplete measurements, which is much more general than the commonly considered setting of complete measurements. Characterizations of the orthogonal group synchronization problem are given from perspectives of optimality conditions as well as fixed points of the projected gradient ascent method, which is also known as the generalized power method (GPM). It is well worth noting that these results still hold even without generative models. In the meantime, we derive the local error bound property for the orthogonal group synchronization problem, which is useful for the convergence rate analysis of different algorithms and can be of independent interest. Finally, we prove the linear convergence result of the GPM to a global maximizer under a general additive noise model based on the established local error bound property. Our theoretical convergence result holds under several deterministic conditions which can cover certain cases with adversarial noise, and as an example we specialize it to the setting of the Erdős-Rényi measurement graph and Gaussian noise.

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2111.05294 [pdf, other]

Lattice structure design optimization under localized linear buckling constraints

Authors: Xingtong Yang, Xinzhuo Hu, Liangchao Zhu, Ming Li

Abstract: An optimization method for the design of multi-lattice structures satisfying local buckling constraints is proposed in this paper. First, the concept of free material optimization is introduced to find an optimal elastic tensor distribution among all feasible elastic continua. By approximating the elastic tensor under the buckling-containing constraint, a matching lattice structure is embedded in each macro element. The stresses in local cells are especially introduced to obtain a better structure. Finally, the present method obtains a lattice structure with excellent overall stiffness and local buckling resistance, which enhances the structural mechanical properties.

Submitted 9 November, 2021; originally announced November 2021.

Comments: 12 pages, submitted to Computer-Aided Design

arXiv:2110.15536 [pdf, other]

Optimal prediction for kernel-based semi-functional linear regression

Authors: Keli Guo, Jun Fan, Lixing Zhu

Abstract: In this paper, we establish minimax optimal rates of convergence for prediction in a semi-functional linear model that consists of a functional component and a less smooth nonparametric component. Our results reveal that the smoother functional component can be learned with the minimax rate as if the nonparametric component were known. More specifically, a double-penalized least squares method is adopted to estimate both the functional and nonparametric components within the framework of reproducing kernel Hilbert spaces. By virtue of the representer theorem, an efficient algorithm that requires no iterations is proposed to solve the corresponding optimization problem, where the regularization parameters are selected by the generalized cross validation criterion. Numerical studies are provided to demonstrate the effectiveness of the method and to verify the theoretical analysis.

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2110.04493 [pdf]

High-performance computation of the exponential of a large sparse matrix

Authors: Feng Wu, Kailing Zhang, Li Zhu, Jiayao Hu

Abstract: Computation of the large sparse matrix exponential has been an important topic in many fields, such as network and finite-element analysis. The existing scaling and squaring algorithm (SSA) is not suitable for the computation of the large sparse matrix exponential as it requires more memory and computational cost than is actually needed. By introducing two novel concepts, i.e., real bandwidth and bandwidth, to measure the sparsity of the matrix, the sparsity of the matrix exponential is analyzed. It is found that for every matrix computed in the squaring phase of the SSA, a corresponding sparse approximate matrix exists. To obtain the sparse approximate matrix, a new filtering technique in terms of forward error analysis is proposed. Combining the filtering technique with the idea of keeping track of the incremental part, a competitive algorithm is developed for the large sparse matrix exponential. The proposed method can primarily alleviate the over-scaling problem due to the filtering technique. Three sets of numerical experiments, including one large matrix with a dimension larger than 2e6, are conducted. The numerical experiments show that, compared with the expm function in MATLAB, the proposed algorithm can provide higher accuracy at lower computational cost and with less memory.

Submitted 9 October, 2021; originally announced October 2021.
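For readers unfamiliar with the scaling and squaring algorithm (SSA) that the abstract above takes as its baseline, the following is a minimal dense sketch in pure Python. It is the textbook idea only, $e^{A} = (e^{A/2^{s}})^{2^{s}}$ with a truncated Taylor series for the scaled matrix, and does not implement the paper's filtering technique or sparse storage. The function names, the choice of `s`, and the number of Taylor terms are assumptions made for illustration.

```python
import math


def mat_mul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def identity(n):
    """Return the n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm_scaling_squaring(a, taylor_terms=18):
    """Approximate exp(a) for a square matrix via scaling and squaring."""
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a)  # infinity norm
    # Scaling: pick s so that the scaled matrix has norm at most 0.5.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[v / 2 ** s for v in row] for row in a]
    # Truncated Taylor series of exp(scaled): sum over k of scaled^k / k!.
    result = identity(n)
    term = identity(n)
    for k in range(1, taylor_terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Squaring: undo the scaling by squaring the result s times.
    for _ in range(s):
        result = mat_mul(result, result)
    return result


if __name__ == "__main__":
    print(expm_scaling_squaring([[0.0, 1.0], [0.0, 0.0]]))
```

Every squaring step multiplies two dense matrices, so intermediate results fill in even when the input is sparse. This is the memory and cost issue the abstract attributes to the SSA on large sparse inputs.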

arXiv:2107.03246 [pdf, ps, other]

Global-in-time $L^p$-$L^q$ estimates for solutions of the Kramers-Fokker-Planck equation

Authors: Xue ** Wang, Lu Zhu

Abstract: In this work, we prove an optimal global-in-time $L^p$-$L^q$ estimate for solutions to the Kramers-Fokker-Planck equation with short range potential in dimension three. Our result shows that the decay rate as $t \rightarrow +\infty$ is the same as the heat equation in $x$-variables and the divergence rate as $t \rightarrow 0^+$ is related to the sub-ellipticity with loss of 1/3 derivatives of the Kramers-Fokker-Planck operator.

Submitted 7 July, 2021; originally announced July 2021.

arXiv:2102.10346 [pdf, other]

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Authors: Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu

Abstract: Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of the second-order moments. In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives. In the case where the $p$-th moment of the noise exists for some $p\in [1,2)$, we first identify a condition on the Hessian, coined '$p$-positive (semi-)definiteness', that leads to an interesting interpolation between positive semi-definite matrices ($p=2$) and diagonally dominant matrices with non-negative diagonal entries ($p=1$). Under this condition, we then provide a convergence rate for the distance to the global optimum in $L^p$. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert averaging converges weakly to a multivariate $\alpha$-stable random vector. Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without necessitating any modification either to the loss function or to the algorithm itself, as typically required in robust statistics. We demonstrate the implications of our results to applications such as linear regression and generalized linear models subject to heavy-tailed data.

Submitted 20 February, 2021; originally announced February 2021.

arXiv:2012.05046 [pdf, other]

A multi-objective optimization framework for on-line ridesharing systems

Authors: Hamed Javidi, Dan Simon, Ling Zhu, Yan Wang

Abstract: The ultimate goal of ridesharing systems is to match travelers who do not have a vehicle with those travelers who want to share their vehicle. A good match can be found among those who have similar itineraries and time schedules. In this way each rider can be served without any delay and also each driver can earn as much as possible without having too much deviation from their original route. We propose an algorithm that leverages biogeography-based optimization (BBO) to solve a multi-objective optimization problem for online ridesharing. It is necessary to solve the ridesharing problem as a multi-objective problem since there are some important objectives that must be considered simultaneously. We test our algorithm by evaluating performance on the Beijing ridesharing dataset. The simulation results indicate that BBO provides competitive performance relative to state-of-the-art ridesharing optimization algorithms.

Submitted 7 December, 2020; originally announced December 2020.

Showing 1–50 of 155 results for author: Zhu, L