Search | arXiv e-print repository

Graphon Particle Systems, Part II: Dynamics of Distributed Stochastic Continuum Optimization

Abstract: We study the distributed optimization problem over a graphon with a continuum of nodes, which is regarded as the limit of the distributed networked optimization as the number of nodes goes to infinity. Each node has a private local cost function. The global cost function, which all nodes cooperatively minimize, is the integral of the local cost functions on the node set. We propose stochastic gradient descent and gradient tracking algorithms over the graphon. We establish a general lemma for the upper bound estimation related to a class of time-varying differential inequalities with negative linear terms, based upon which, we prove that for both kinds of algorithms, the second moments of the nodes' states are uniformly bounded. Especially, for the stochastic gradient tracking algorithm, we transform the convergence analysis into the asymptotic property of coupled nonlinear differential inequalities with time-varying coefficients and develop a decoupling method. For both kinds of algorithms, we show that by choosing the time-varying algorithm gains properly, all nodes' states achieve $\mathcal{L}^{\infty}$-consensus for a connected graphon. Furthermore, if the local cost functions are strongly convex, then all nodes' states converge to the minimizer of the global cost function and the auxiliary states in the stochastic gradient tracking algorithm converge to the gradient value of the global cost function at the minimizer uniformly in mean square.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.15977 [pdf, other]

A Bayesian framework for spectral reprojection

Authors: Tongtong Li, Anne Gelb

Abstract: Fourier partial sum approximations yield exponential accuracy for smooth and periodic functions, but produce the infamous Gibbs phenomenon for non-periodic ones. Spectral reprojection resolves the Gibbs phenomenon by projecting the Fourier partial sum onto a Gibbs complementary basis, often prescribed as the Gegenbauer polynomials. Noise in the Fourier data and the Runge phenomenon both degrade the quality of the Gegenbauer reconstruction solution, however. Motivated by its theoretical convergence properties, this paper proposes a new Bayesian framework for spectral reprojection, which allows a greater understanding of the impact of noise on the reprojection method from a statistical point of view. We are also able to improve the robustness with respect to the Gegenbauer polynomials parameters. Finally, the framework provides a mechanism to quantify the uncertainty of the solution estimate.

Submitted 22 June, 2024; originally announced June 2024.
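The Gibbs phenomenon named in the abstract above can be seen in a few lines. The sketch below is only an illustration, not the paper's reprojection method: it assumes the test function f(x) = x on (-π, π), whose Fourier series is 2 Σ (-1)^{k+1} sin(kx)/k, and measures how far the partial sum overshoots π near the jump at x = π. The function names and the grid size are hypothetical choices.

```python
import math

def sawtooth_partial_sum(x, n_terms):
    """Fourier partial sum of f(x) = x on (-pi, pi) using n_terms sine modes."""
    return 2.0 * sum((-1) ** (k + 1) * math.sin(k * x) / k
                     for k in range(1, n_terms + 1))

def gibbs_overshoot(n_terms, n_grid=4000):
    """Largest amount by which the partial sum exceeds sup f = pi on [0, pi]."""
    xs = (math.pi * i / n_grid for i in range(n_grid + 1))
    return max(sawtooth_partial_sum(x, n_terms) for x in xs) - math.pi

if __name__ == "__main__":
    # The overshoot does not shrink as more modes are added; it only
    # moves closer to the discontinuity.
    for n in (16, 64, 256):
        print(n, round(gibbs_overshoot(n), 3))
```

Away from the jump the partial sums do converge; it is the persistent overshoot next to the discontinuity that reprojection onto a Gibbs complementary basis is meant to remove.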

arXiv:2406.13036 [pdf, other]

Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

Authors: Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

Abstract: Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $π$ as a perturbation of a given reference measure $μ$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.

Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.05758 [pdf, other]

Planar Turán number for balanced double stars

Authors: Xin Xu, Qiang Zhou, Tong Li, Guiying Yan

Abstract: The planar Turán number, denoted by $ex_{\mathcal{P}}(n,H)$, is the maximum number of edges in an $n$-vertex planar graph which does not contain $H$ as a subgraph. Ghosh, Győri, Paulos and Xiao initiated the topic of the planar Turán number for double stars. Among balanced double stars, $S_{3,3}$ is the only remaining graph that needs to be considered. In this paper, we give the exact value of $ex_{\mathcal{P}}(n,S_{3,3})$, so that the planar Turán number is completely determined for all balanced double stars.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 26 pages, 16 figures

arXiv:2406.03006 [pdf, ps, other]

Quantum Algorithms and Lower Bounds for Finite-Sum Optimization

Authors: Yexin Zhang, Chenyi Zhang, Cong Fang, Liwei Wang, Tongyang Li

Abstract: Finite-sum optimization has wide applications in machine learning, covering important problems such as support vector machines, regression, etc. In this paper, we initiate the study of solving finite-sum optimization problems by quantum computing. Specifically, let $f_1,\ldots,f_n\colon\mathbb{R}^d\to\mathbb{R}$ be $\ell$-smooth convex functions and $ψ\colon\mathbb{R}^d\to\mathbb{R}$ be a $μ$-strongly convex proximal function. The goal is to find an $ε$-optimal point for $F(\mathbf{x})=\frac{1}{n}\sum_{i=1}^n f_i(\mathbf{x})+ψ(\mathbf{x})$. We give a quantum algorithm with complexity $\tilde{O}\big(n+\sqrt{d}+\sqrt{\ell/μ}\big(n^{1/3}d^{1/3}+n^{-2/3}d^{5/6}\big)\big)$, improving the classical tight bound $\tilde{Θ}\big(n+\sqrt{n\ell/μ}\big)$. We also prove a quantum lower bound $\tilde{Ω}(n+n^{3/4}(\ell/μ)^{1/4})$ when $d$ is large enough. Both our quantum upper and lower bounds can extend to the cases where $ψ$ is not necessarily strongly convex, or each $f_i$ is Lipschitz but not necessarily smooth. In addition, when $F$ is nonconvex, our quantum algorithm can find an $ε$-critical point using $\tilde{O}(n+\ell(d^{1/3}n^{1/3}+\sqrt{d})/ε^2)$ queries.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 27 pages. To appear in the Forty-first International Conference on Machine Learning (ICML 2024)

arXiv:2405.19245 [pdf, ps, other]

Efficient Optimal Control of Open Quantum Systems

Authors: Wenhao He, Tongyang Li, Xiantao Li, Zecheng Li, Chunhao Wang, Ke Wang

Abstract: The optimal control problem for open quantum systems can be formulated as a time-dependent Lindbladian that is parameterized by a number of time-dependent control variables. Given an observable and an initial state, the goal is to tune the control variables so that the expected value of some observable with respect to the final state is maximized. In this paper, we present algorithms for solving this optimal control problem efficiently, i.e., having a poly-logarithmic dependency on the system dimension, which is exponentially faster than best-known classical algorithms. Our algorithms are hybrid, consisting of both quantum and classical components. The quantum procedure simulates time-dependent Lindblad evolution that drives the initial state to the final state, and it also provides access to the gradients of the objective function via quantum gradient estimation. The classical procedure uses the gradient information to update the control variables. At the technical level, we provide the first (to the best of our knowledge) simulation algorithm for time-dependent Lindbladians with an $\ell_1$-norm dependence. As an alternative, we also present a simulation algorithm in the interaction picture to improve the algorithm for the cases where the time-independent component of a Lindbladian dominates the time-dependent part. On the classical side, we heavily adapt the state-of-the-art classical optimization analysis to interface with the quantum part of our algorithms. Both the quantum simulation techniques and the classical optimization analyses might be of independent interest.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 52 pages. To appear in the proceedings of TQC 2024

arXiv:2405.16760 [pdf, ps, other]

Graphon Particle Systems, Part I: Spatio-Temporal Approximation and Law of Large Numbers

Authors: Yan Chen, Tao Li

Abstract: We study a class of graphon particle systems with time-varying random coefficients. In a graphon particle system, the interactions among particles are characterized by the coupled mean field terms through an underlying graphon and the randomness of the coefficients comes from the stochastic processes associated with the particle labels. By constructing two-level approximated sequences converging in 2-Wasserstein distance, we prove the existence and uniqueness of the solution to the system. Besides, by constructing two-level approximated functions converging to the graphon mean field terms, we establish the law of large numbers, which reveals that if the number of particles tends to infinity and the discretization step tends to zero, then the discrete-time interacting particle system over a large-scale network converges to the graphon particle system. As a byproduct, we discover that the graphon particle system can describe the limiting dynamics of the distributed stochastic gradient descent algorithm over the large-scale network and prove that if the gradients of the local cost functions are Lipschitz continuous, then the graphon particle system can be regarded as the spatio-temporal approximation of the discrete-time distributed stochastic gradient descent algorithm as the number of network nodes tends to infinity and the algorithm step size tends to zero.

Submitted 2 July, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.11454 [pdf, ps, other]

Comparisons Are All You Need for Optimizing Smooth Functions

Authors: Chenyi Zhang, Tongyang Li

Abstract: When optimizing machine learning models, there are various scenarios where gradient computations are challenging or even infeasible. Furthermore, in reinforcement learning (RL), preference-based RL that only compares between options has wide applications, including reinforcement learning with human feedback in large language models. In this paper, we systematically study optimization of a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$ only assuming an oracle that compares function values at two points and tells which is larger. When $f$ is convex, we give two algorithms using $\tilde{O}(n/ε)$ and $\tilde{O}(n^{2})$ comparison queries to find an $ε$-optimal solution, respectively. When $f$ is nonconvex, our algorithm uses $\tilde{O}(n/ε^2)$ comparison queries to find an $ε$-approximate stationary point. All these results match the best-known zeroth-order algorithms with function evaluation queries in $n$ dependence, thus suggesting that \emph{comparisons are all you need for optimizing smooth functions using derivative-free methods}. In addition, we also give an algorithm for escaping saddle points and reaching an $ε$-second order stationary point of a nonconvex $f$, using $\tilde{O}(n^{1.5}/ε^{2.5})$ comparison queries.

Submitted 19 May, 2024; originally announced May 2024.
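To make the comparison-oracle setting of the abstract above concrete, here is a minimal sketch in which the optimizer never sees a function value, only the answer to "is f(x) smaller than f(y)?". It uses a plain compass (coordinate pattern) search, which is a standard derivative-free heuristic and not the algorithm of the paper; it carries none of the stated query bounds. The names `make_comparison_oracle`, `compass_search` and `objective`, and the quadratic test function, are hypothetical.

```python
def make_comparison_oracle(f):
    """Return an oracle that only reveals whether f(x) < f(y)."""
    def is_better(x, y):
        return f(x) < f(y)
    return is_better

def compass_search(is_better, x0, step=1.0, tol=1e-6, max_iters=100000):
    """Derivative-free coordinate search driven purely by comparisons."""
    x = list(x0)
    for _ in range(max_iters):
        if step < tol:
            break
        moved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                if is_better(trial, x):
                    x, moved = trial, True
                    break
        if not moved:
            step *= 0.5  # no neighbor wins a comparison: refine the mesh
    return x

def objective(x):
    # Hypothetical smooth, strongly convex test function with minimizer (1, -0.5).
    u, v = x[0] - 1.0, x[1] + 0.5
    return u * u + 2.0 * v * v + 0.5 * u * v

if __name__ == "__main__":
    oracle = make_comparison_oracle(objective)
    print(compass_search(oracle, [5.0, 5.0]))
```

The search only ever calls `is_better`, so swapping in a noisy or human-preference oracle would require no change to the optimizer's interface, though convergence would then need separate justification.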

arXiv:2405.11207 [pdf, ps, other]

Anti-Ramsey Numbers of Expansions of Doubly Edge-critical Graphs in Uniform Hypergraphs

Authors: Tong Li, Yucong Tang, Guiying Yan

Abstract: For an $r$-graph $H$, the anti-Ramsey number ${\rm ar}(n,r,H)$ is the minimum number $c$ of colors such that for any edge-coloring of the complete $r$-graph on $n$ vertices with at least $c$ colors, there is a copy of $H$ whose edges have distinct colors. A 2-graph $F$ is doubly edge-$p$-critical if the chromatic number $χ(F - e)\geq p$ for every edge $e$ in $F$ and there exist two edges $e_1,e_2$ in $F$ such that $χ(F -e_1- e_2)=p-1$. The anti-Ramsey numbers of doubly edge-$p$-critical 2-graphs were determined by Jiang and Pikhurko (2009), which generalized the anti-Ramsey numbers of cliques determined by Erdős, Simonovits and Sós (1975). In general, few exact values of anti-Ramsey numbers of $r$-graphs are known for $r\geq 3$. Given a 2-graph $F$, the expansion $F^{(r)}$ of $F$ is an $r$-graph on $|V(F)|+(r-2)|F|$ vertices obtained from $F$ by adding $r-2$ new vertices to each edge of $F$. In this paper, we determine the exact value of ${\rm ar}(n,r,F^{(r)})$ for any doubly edge-$p$-critical 2-graph $F$ with $p>r\geq 3$ and sufficiently large $n$.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.07558 [pdf, other]

Synchronization of High-Dimensional Linear Networks over Finite Fields

Authors: Siyu Zou, Ting Li, Jiandong Zhu

Abstract: This paper investigates the synchronization problems for general high-dimensional linear networks over finite fields. By using the technique of linear transformations and invariant subspaces for linear spaces over finite fields, several necessary and sufficient conditions for the synchronization of high-dimensional linear networks over finite fields are proposed. This paper not only generalizes the existing results from 1-dimensional to high-dimensional linear networks but also adopts a new approach. Finally, some numerical examples are given to illustrate the effectiveness of our theoretical results.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07107 [pdf, other]

A Pair of Bayesian Network Structures has Undecidable Conditional Independencies

Authors: Cheuk Ting Li

Abstract: Given a Bayesian network structure (directed acyclic graph), the celebrated d-separation algorithm efficiently determines whether the network structure implies a given conditional independence relation. We show that this changes drastically when we consider two Bayesian network structures instead. It is undecidable to determine whether two given network structures imply a given conditional independency, that is, whether every collection of random variables satisfying both network structures must also satisfy the conditional independency. Although the approximate combination of two Bayesian networks is a well-studied topic, our result shows that it is fundamentally impossible to accurately combine the knowledge of two Bayesian network structures, in the sense that no algorithm can tell what conditional independencies are implied by the two network structures. We can also explicitly construct two Bayesian network structures, such that whether they imply a certain conditional independency is unprovable in the ZFC set theory, assuming ZFC is consistent.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 13 pages, 2 figures

arXiv:2405.04349 [pdf, ps, other]

Anti-Ramsey numbers of loose paths and cycles in uniform hypergraphs

Authors: Tong Li, Yucong Tang, Guanghui Wang, Guiying Yan

Abstract: For a fixed family of $r$-uniform hypergraphs $\mathcal{F}$, the anti-Ramsey number of $\mathcal{F}$, denoted by $ ar(n,r,\mathcal{F})$, is the minimum number $c$ of colors such that for any edge-coloring of the complete $r$-uniform hypergraph on $n$ vertices with at least $c$ colors, there is a rainbow copy of some hypergraph in $\mathcal{F}$. Here, a rainbow hypergraph is an edge-colored hypergraph with all edges colored differently. Let $\mathcal{P}_k$ and $\mathcal{C}_k$ be the families of loose paths and loose cycles with $k$ edges in an $r$-uniform hypergraph, respectively. In this paper, we determine the exact values of $ ar(n,r,\mathcal{P}_k)$ and $ ar(n,r,\mathcal{C}_k)$ for all $k\geq 4$ and $r\geq 3$.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03586 [pdf, other]

Dissipative gradient nonlinearities prevent $δ$-formations in local and nonlocal attraction-repulsion chemotaxis models

Authors: Tongxing Li, Daniel Acosta Soba, Alessandro Columbu, Giuseppe Viglialoro

Abstract: We study some attraction-repulsion chemotaxis models, characterized by nonlinear laws for the diffusion of the cell density, and for the chemosensitivities and the production rates of the chemoattractant and the chemorepellent. Additionally, a source also involving some expression of the gradient of the species is incorporated.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2403.16039 [pdf, ps, other]

Robust estimations from distribution structures: III. Invariant Moments

Authors: Tuobang Li

Abstract: Descriptive statistics for parametric models are currently highly sensitive to departures, gross errors, and/or random errors. Here, leveraging the structures of parametric distributions and their central moment kernel distributions, a class of estimators, consistent simultaneously for both a semiparametric distribution and a distinct parametric distribution, is proposed. These efficient estimators are robust to both gross errors and departures from parametric assumptions, making them ideal for estimating the mean and central moments of common unimodal distributions. This article opens up the possibility of utilizing the common nature of probability models to construct near-optimal estimators that are suitable for various scenarios.

Submitted 13 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.14570 [pdf, ps, other]

Robust estimations from distribution structures: II. Central Moments

Authors: Tuobang Li

Abstract: In descriptive statistics, $U$-statistics arise naturally in producing minimum-variance unbiased estimators. In 1984, Serfling considered the distribution formed by evaluating the kernel of the $U$-statistics and proposed generalized $L$-statistics, which include the Hodges-Lehmann estimator and the Bickel-Lehmann spread as special cases. However, the structures of the kernel distributions remain unclear. In 1954, Hodges and Lehmann demonstrated that if $X$ and $Y$ are independently sampled from the same unimodal distribution, $X-Y$ will exhibit symmetrical unimodality with its peak centered at zero. Building upon this foundational work, the current study delves into the structure of the kernel distribution. It is shown that the $\mathbf{k}$th central moment kernel distributions ($\mathbf{k}>2$) derived from a unimodal distribution exhibit location invariance and are also nearly unimodal with the mode and median close to zero. This article provides an approach to study the general structure of kernel distributions.

Submitted 13 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12110 [pdf, other]

Robust estimations from distribution structures: I. Mean

Authors: Tuobang Li

Abstract: As the most fundamental problem in statistics, robust location estimation has many prominent solutions, such as the trimmed mean, Winsorized mean, Hodges Lehmann estimator, Huber M estimator, and median of means. Recent studies suggest that their maximum biases concerning the mean can be quite different, but the underlying mechanisms largely remain unclear. This study exploited a semiparametric method to classify distributions by the asymptotic orderliness of quantile combinations with varying breakdown points, showing their interrelations and connections to parametric distributions. Further deductions explain why the Winsorized mean typically has smaller biases compared to the trimmed mean; two sequences of semiparametric robust mean estimators emerge, particularly highlighting the superiority of the median Hodges Lehmann mean. This article sheds light on the understanding of the common nature of probability distributions.

Submitted 13 June, 2024; v1 submitted 17 March, 2024; originally announced March 2024.
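For readers unfamiliar with the Hodges-Lehmann estimator listed in the abstract above, the following sketch computes it in its classical form: the median of the Walsh averages (x_i + x_j)/2. It assumes the convention that pairs with i = j are included (some texts use i < j only), and it is not the paper's "median Hodges Lehmann mean". The function name and the sample data are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean, median

def hodges_lehmann(sample):
    """Median of the Walsh averages (x_i + x_j) / 2 over all pairs with i <= j."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(sample, 2)]
    return median(walsh)

if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]  # one gross error contaminates the sample
    print("mean:", mean(data))
    print("Hodges-Lehmann:", hodges_lehmann(data))
```

On this sample the single outlier drags the arithmetic mean far from the bulk of the data, while the Hodges-Lehmann estimate stays with the central observations, which is the kind of robustness the abstract compares across estimators.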

arXiv:2403.09356 [pdf, ps, other]

Very weak solutions of the Dirichlet problem for 2-Hessian equation

Authors: Tongtong Li, Guohuan Qiu

Abstract: For any $α$ small, we construct infinitely many $C^{1,α}$ very weak solutions to the 2-Hessian equation with prescribed boundary value. The proof relies on the convex integration method and cut-off technique.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2402.12745 [pdf, ps, other]

Near-Optimal Quantum Algorithm for Minimizing the Maximal Loss

Authors: Hao Wang, Chenyi Zhang, Tongyang Li

Abstract: The problem of minimizing the maximum of $N$ convex, Lipschitz functions plays significant roles in optimization and machine learning. It has a series of results, with the most recent one requiring $O(Nε^{-2/3} + ε^{-8/3})$ queries to a first-order oracle to compute an $ε$-suboptimal point. On the other hand, quantum algorithms for optimization are rapidly advancing with speedups shown on many important optimization problems. In this paper, we conduct a systematic study for quantum algorithms and lower bounds for minimizing the maximum of $N$ convex, Lipschitz functions. On one hand, we develop quantum algorithms with an improved complexity bound of $\tilde{O}(\sqrt{N}ε^{-5/3} + ε^{-8/3})$. On the other hand, we prove that quantum algorithms must take $\tilde{Ω}(\sqrt{N}ε^{-2/3})$ queries to a first order quantum oracle, showing that our dependence on $N$ is optimal up to poly-logarithmic factors.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 22 pages, 1 figure, To appear in The Twelfth International Conference on Learning Representations (ICLR 2024)

arXiv:2402.10152 [pdf, other]

A new type of simplified inverse Lax-Wendroff boundary treatment I: hyperbolic conservation laws

Authors: Shihao Liu, Tingting Li, Ziqiang Cheng, Yan Jiang, Chi-Wang Shu, Mengping Zhang

Abstract: In this paper, we design a new kind of high order inverse Lax-Wendroff (ILW) boundary treatment for solving hyperbolic conservation laws with finite difference method on a Cartesian mesh. This new ILW method decomposes the construction of ghost point values near inflow boundary into two steps: interpolation and extrapolation. At first, we impose values of some artificial auxiliary points through a polynomial interpolating the interior points near the boundary. Then, we will construct a Hermite extrapolation based on those auxiliary point values and the spatial derivatives at boundary obtained via the ILW procedure. This polynomial will give us the approximation to the ghost point value. By an appropriate selection of those artificial auxiliary points, high-order accuracy and stable results can be achieved. Moreover, theoretical analysis indicates that, compared with the original ILW method, especially for higher order accuracy, the newly proposed one would require fewer terms using the relatively complicated ILW procedure and thus improve computational efficiency on the premise of maintaining accuracy and stability. We perform numerical experiments on several benchmarks, including one- and two-dimensional scalar equations and systems. The robustness and efficiency of the proposed scheme are numerically verified.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.03624 [pdf, ps, other]

QQMR: A Structure-Preserving Quaternion Quasi-Minimal Residual Method for Non-Hermitian Quaternion Linear Systems

Authors: Tao Li, Qing-Wen Wang, Xin-Fang Zhang

Abstract: The quaternion biconjugate gradient (QBiCG) method, as a novel variant of quaternion Lanczos-type methods for solving the non-Hermitian quaternion linear systems, does not yield a minimization property. This means that the method possesses a rather irregular convergence behavior, which leads to numerical instability. In this paper, we propose a new structure-preserving quaternion quasi-minimal residual method, based on the quaternion biconjugate orthonormalization procedure with coupled two-term recurrences, which overcomes the drawback of QBiCG. The computational cost and storage required by the proposed method are much less than the traditional QMR iterations for the real representation of quaternion linear systems. Some convergence properties of the proposed method are also established. Finally, we report the numerical results to show the robustness and effectiveness of the proposed method compared with QBiCG.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 25 pages

MSC Class: 15B33; 65F08; 65F10; 94A08

arXiv:2402.00380 [pdf, ps, other]

$n$-Dimensional Volumetric Stretch Energy Minimization for Volume-/Mass-Preserving Parameterizations

Authors: Zhong-Heng Tan, Tiexiang Li, Wen-Wei Lin, Shing-Tung Yau

Abstract: In this paper, we develop an $n$-dimensional volumetric stretch energy ($n$-VSE) functional for the volume-/mass-preserving parameterization of the $n$-manifolds topologically equivalent to the $n$-ball. The $n$-VSE has a lower bound and equals it if and only if the map is volume-/mass-preserving. This motivates us to minimize the $n$-VSE to achieve the ideal volume-/mass-preserving parameterization. In the discrete case, we also guarantee the relation between the lower bound and the volume-/mass-preservation, and propose the spherical and ball volume-/mass-preserving parameterization algorithms. The numerical experiments indicate the accuracy and robustness of the proposed algorithms. The modified algorithms are applied to the manifold registration and deformation, showing the versatility of $n$-VSE.

Submitted 1 February, 2024; originally announced February 2024.

MSC Class: 49Q10; 52C26; 65D18; 65F05; 68U05

arXiv:2401.09118 [pdf, other]

Learning based numerical methods for Helmholtz equation with high frequency

Authors: Yu Chen, ** Cheng, Tingyue Li, Yun Miao

Abstract: High-frequency issues have been remarkably challenging in numerical methods for partial differential equations. In this paper, a learning based numerical method (LbNM) is proposed for the Helmholtz equation with high frequency. The main novelty is using the Tikhonov regularization method to stably learn the solution operator by utilizing relevant information, especially the fundamental solutions. Then applying the solution operator to a new boundary input could quickly update the solution. Based on the method of fundamental solutions and the quantitative Runge approximation, we give the error estimate. This indicates interpretability and generalizability of the present method. Numerical results validate the error analysis and demonstrate the high-precision and high-efficiency features.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.04989 [pdf, ps, other]

On integral images of Curtis homomorphisms for $\mathrm{GL}_n$ and $\mathrm{U}_n$

Authors: Tzu-Jan Li

Abstract: For $G = \mathrm{GL}_n$ or $\mathrm{U}_n$ defined over a finite field of characteristic $p$, we refine a result of Bonnafé and Kessar on the saturatedness of the Curtis homomorphism $\mathrm{Cur}^G$ by describing the image of $\mathrm{Cur}^G$ over $\overline{\mathbb{Z}}[1/p]$ via a system of linear conditions.

Submitted 20 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: 7 pages, minor revision

arXiv:2312.13527 [pdf, other]

MindOpt Adapter for CPLEX Benchmarking Performance Analysis

Authors: Mou Sun, Tao Li, Wotao Yin

Abstract: This report provides a comprehensive analysis of the performance of MindOpt Adapter for CPLEX 12.9 in benchmark testing. CPLEX, recognized as a robust Mixed Integer Programming (MIP) solver, has faced some scrutiny regarding its performance on MIPLIB 2017 when configured to default settings. MindOpt Adapter aims to enhance CPLEX's performance by automatically applying improved configurations for solving optimization problems. Our testing demonstrates that MindOpt Adapter for CPLEX successfully solved 232 of the 240 problems in the MIPLIB 2017 benchmark set. This performance surpasses all the other solvers in terms of the number of problems solved and the geometric mean of running times. The report provides a comparison of the benchmark results against the outcomes achieved by CPLEX under its default configuration.

Submitted 31 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.07907 [pdf, ps, other]

doi 10.1007/s11587-024-00847-8

On recognition of the direct squares of the simple groups with abelian Sylow 2-subgroups

Authors: Tao Li, A. R. Moghaddamfar, Andrey V. Vasil'ev, Zhigang Wang

Abstract: The spectrum of a group is the set of orders of its elements. Finite groups with the same spectra as the direct squares of the finite simple groups with abelian Sylow 2-subgroups are considered. It is proved that the direct square $J_1\times J_1$ of the sporadic Janko group $J_1$ and the direct squares ${^2}G_2(q)\times{^2}G_2(q)$ of the simple small Ree groups ${^2}G_2(q)$ are uniquely characterized by their spectra in the class of finite groups, while for the direct square $PSL_2(q)\times PSL_2(q)$ of a 2-dimensional simple linear group $PSL_2(q)$, there are always infinitely many groups (even solvable groups) with the same spectra.

Submitted 13 December, 2023; originally announced December 2023.

MSC Class: 20D60; 20D06

arXiv:2311.18474 [pdf, ps, other]

A Robust Hessian-based Trust Region Algorithm for Spherical Conformal Parameterizations

Authors: Zhong-Heng Tan, Tiexiang Li, Wen-Wei Lin, Shing-Tung Yau

Abstract: Surface parameterizations are widely applied in computer graphics, medical imaging and transformation optics. In this paper, we rigorously derive the gradient vector and Hessian matrix of the discrete conformal energy for spherical conformal parameterizations of simply connected closed surfaces of genus-$0$. In addition, we give the sparsity structure of the Hessian matrix, which leads to a robust Hessian-based trust region algorithm for the computation of spherical conformal maps. Numerical experiments demonstrate the local quadratic convergence of the proposed algorithm with low conformal distortions. We subsequently propose an application of our method to surface registrations that still maintains local quadratic convergence.

Submitted 30 November, 2023; originally announced November 2023.

MSC Class: 49Q10; 52C26; 65D18; 65F05; 68U05

arXiv:2311.15598 [pdf, other]

Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks

Authors: Zhongyuan Lyu, Ting Li, Dong Xia

Abstract: In this paper, we first study the fundamental limit of clustering networks when a multi-layer network is present. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate takes an exponential form and is characterized by the Renyi divergence between the edge probability distributions of the component networks. We propose a novel two-stage network clustering method including a tensor-based initialization algorithm involving both node and sample splitting and a refinement procedure by likelihood-based Lloyd algorithm. Network clustering must be accompanied by node community detection. Our proposed algorithm achieves the minimax optimal network clustering error rate and allows extreme network sparsity under MMSBM. Numerical simulations and real data experiments both validate that our method outperforms existing methods. Oftentimes, the edges of networks carry count-type weights. We then extend our methodology and analysis framework to study the minimax optimal clustering error rate for mixture of discrete distributions including Binomial, Poisson, and multi-layer Poisson networks. The minimax optimal clustering error rates in these discrete mixtures all take the same exponential form characterized by the Renyi divergences. These optimal clustering error rates in discrete mixtures can also be achieved by our proposed two-stage clustering algorithm.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.15587 [pdf, other]

Quantum Langevin Dynamics for Optimization

Authors: Zherui Chen, Yuchen Lu, Hao Wang, Yizhou Liu, Tongyang Li

Abstract: We initiate the study of utilizing Quantum Langevin Dynamics (QLD) to solve optimization problems, particularly those non-convex objective functions that present substantial obstacles for traditional gradient descent algorithms. Specifically, we examine the dynamics of a system coupled with an infinite heat bath. This interaction induces both random quantum noise and a deterministic damping effect to the system, which nudge the system towards a steady state that hovers near the global minimum of objective functions. We theoretically prove the convergence of QLD in convex landscapes, demonstrating that the average energy of the system can approach zero in the low temperature limit with an exponential decay rate correlated with the evolution time. Numerically, we first show the energy dissipation capability of QLD by retracing its origins to spontaneous emission. Furthermore, we conduct detailed discussion of the impact of each parameter. Finally, based on the observations when comparing QLD with classical Fokker-Planck-Smoluchowski equation, we propose a time-dependent QLD by making temperature and $\hbar$ time-dependent parameters, which can be theoretically proven to converge better than the time-independent case and also outperforms a series of state-of-the-art quantum and classical optimization algorithms in many non-convex landscapes.

Submitted 22 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 52 pages, 1 table, 25 figures

arXiv:2311.05248 [pdf, other]

A General Space of Belief Updates for Model Misspecification in Bayesian Networks

Authors: Tian** Li

Abstract: In an ideal setting for Bayesian agents, a perfect description of the rules of the environment (i.e., the objective observation model) is available, allowing them to reason through the Bayesian posterior to update their beliefs in an optimal way. But such an ideal setting hardly ever exists in the natural world, so agents have to make do with reasoning about how they should update their beliefs simultaneously. This introduces a number of related challenges for a number of research areas: (1) For Bayesian statistics, this deviation of the subjective model from the true data-generating mechanism is termed model misspecification in the literature. (2) For neuroscience, it introduces the necessity to model how the agents update their beliefs (how they use evidence to update their belief) and how their belief changes over time. The current paper addresses these two challenges by (a) providing a general class of posteriors/belief updates called cut-posteriors of Bayesian networks that have a much greater expressivity, and (b) parameterizing the space of possible posteriors to make meta-learning (i.e., choosing the belief update from this space in a principled manner) possible. For (a), it is noteworthy that any cut-posterior has local computation only, making computation tractable for human or artificial agents. For (b), a Markov Chain Monte Carlo algorithm to perform such meta-learning will be sketched here, though it is only an illustration and by no means the only meta-learning procedure possible for the space of cut-posteriors. Operationally, this work gives a general algorithm to take in an arbitrary Bayesian network and output all possible cut-posteriors in the space.

Submitted 9 November, 2023; originally announced November 2023.

Comments: 14 pages, 4 figures

arXiv:2310.10082 [pdf, other]

A simple uniformly optimal method without line search for convex optimization

Authors: Tianjiao Li, Guanghui Lan

Abstract: Line search (or backtracking) procedures have been widely employed in first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with Hölder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization.

Submitted 26 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2309.02585 [pdf, other]

A Structurally Informed Data Assimilation Approach for Nonlinear Partial Differential Equations

Authors: Tongtong Li, Anne Gelb, Yoonsang Lee

Abstract: Ensemble transform Kalman filtering (ETKF) data assimilation is often used to combine available observations with numerical simulations to obtain statistically accurate and reliable state representations in dynamical systems. However, it is well known that the commonly used Gaussian distribution assumption introduces biases for state variables that admit discontinuous profiles, which are prevalent in nonlinear partial differential equations. This investigation designs a new structurally informed non-Gaussian prior that exploits statistical information from the simulated state variables. In particular, we construct a new weighting matrix based on the second moment of the gradient information of the state variable to replace the prior covariance matrix used for model/data compromise in the ETKF data assimilation framework. We further adapt our weighting matrix to include information in discontinuity regions via a clustering technique. Our numerical experiments demonstrate that this new approach yields more accurate estimates than those obtained using ETKF on shallow water equations, even when ETKF is enhanced with inflation and localization techniques.

Submitted 5 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.13214 [pdf, ps, other]

Gl-QFOM and Gl-QGMRES: two efficient algorithms for quaternion linear systems with multiple right-hand sides

Authors: Tao Li, Qing-Wen Wang, Xin-Fang Zhang

Abstract: In this paper, we propose the global quaternion full orthogonalization (Gl-QFOM) and global quaternion generalized minimum residual (Gl-QGMRES) methods, which are built upon global orthogonal and oblique projections onto a quaternion matrix Krylov subspace, for solving quaternion linear systems with multiple right-hand sides. We first develop the global quaternion Arnoldi procedure to preserve the quaternion Hessenberg form during the iterations. We then establish the convergence analysis of the proposed methods, and show how to apply them to solve the Sylvester quaternion matrix equation. Numerical examples are provided to illustrate the effectiveness of our methods compared with the traditional Gl-FOM and Gl-GMRES iterations for the real representations of the original linear systems.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 24 pages, 4 figures

arXiv:2308.05742 [pdf, other]

A Characterization of Entropy as a Universal Monoidal Natural Transformation

Authors: Cheuk Ting Li

Abstract: We show that the essential properties of entropy (monotonicity, additivity and subadditivity) are consequences of entropy being a monoidal natural transformation from the under category functor $-/\mathsf{LProb}_ρ$ (where $\mathsf{LProb}_ρ$ is the category of $ρ$-th-power-summable probability distributions, $0<ρ<1$) to $Δ_{\mathbb{R}}$. Moreover, the Shannon entropy can be characterized as the universal monoidal natural transformation from $-/\mathsf{LProb}_ρ$ to the category of integrally closed partially ordered abelian groups (a reflective subcategory of the lax-slice 2-category over $\mathsf{MonCat}_{\ell}$ in the 2-category of monoidal categories), providing a succinct characterization of Shannon entropy as a reflection arrow. We can likewise define entropy for every monoidal category with a monoidal structure on its under categories (e.g. the category of finite abelian groups, the category of finite inhabited sets, the category of finite dimensional vector spaces, and the augmented simplex category) via the reflection arrow. This implies that all these entropies over different categories are components of a single natural transformation (the unit of the idempotent monad), allowing us to connect these entropies in a natural manner. We also provide a universal characterization of the conditional Shannon entropy based on the chain rule which, unlike the characterization of information loss by Baez, Fritz and Leinster, does not require any continuity assumption.

Submitted 14 April, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 55 pages, 2 figures

arXiv:2307.06627

Fast and Practical Quantum-Inspired Classical Algorithms for Solving Linear Systems

Authors: Qian Zuo, Tongyang Li

Abstract: We propose fast and practical quantum-inspired classical algorithms for solving linear systems. Specifically, given sampling and query access to a matrix $A\in\mathbb{R}^{m\times n}$ and a vector $b\in\mathbb{R}^m$, we propose classical algorithms that produce a data structure for the solution $x\in\mathbb{R}^{n}$ of the linear system $Ax=b$ with the ability to sample and query its entries. The resulting $x$ satisfies $\|x-A^{+}b\|\leq ε\|A^{+}b\|$, where $\|\cdot\|$ is the spectral norm and $A^+$ is the Moore-Penrose inverse of $A$. Our algorithm has time complexity $\widetilde{O}(κ_F^4/κε^2)$ in the general case, where $κ_{F} =\|A\|_F\|A^+\|$ and $κ=\|A\|\|A^+\|$ are condition numbers. Compared to the prior state-of-the-art result [Shao and Montanaro, arXiv:2103.10309v2], our algorithm achieves a polynomial speedup in condition numbers. When $A$ is $s$-sparse, our algorithm has complexity $\widetilde{O}(s κ\log(1/ε))$, matching the quantum lower bound for solving linear systems in $κ$ and $1/ε$ up to poly-logarithmic factors [Harrow and Kothari]. When $A$ is $s$-sparse and symmetric positive-definite, our algorithm has complexity $\widetilde{O}(s\sqrt{κ}\log(1/ε))$. Technically, our main contribution is the application of the heavy ball momentum method to quantum-inspired classical algorithms for solving linear systems, where we propose two new methods with speedups: quantum-inspired Kaczmarz method with momentum and quantum-inspired coordinate descent method with momentum. Their analysis exploits careful decomposition of the momentum transition matrix and the application of novel spectral norm concentration bounds for independent random matrices. Finally, we also conduct numerical experiments for our algorithms on both synthetic and real-world datasets, and the experimental results support our theoretical claims.

Submitted 30 November, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: Theorem 3 and Theorem 5 are incorrect, and more efforts are needed to fix existing issues

arXiv:2307.01497 [pdf, other]

Accelerated stochastic approximation with state-dependent noise

Authors: Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li

Abstract: We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.

Submitted 13 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

arXiv:2307.00783 [pdf, other]

Monte Carlo Policy Gradient Method for Binary Optimization

Authors: Cheng Chen, Ruitao Chen, Tianyou Li, Ruichen Ao, Zaiwen Wen

Abstract: Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model to sample the binary solution according to a parameterized policy distribution. Specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distributions of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly similar to reinforcement learning. For coherent exploration in discrete spaces, parallel Markov Chain Monte Carlo (MCMC) methods are employed to sample from the policy distribution with diversity and approximate the gradient efficiently. We further develop a filter scheme to replace the original objective function by the one with the local search technique to broaden the horizon of the function landscape. Convergence to stationary points in expectation of the policy gradient method is established based on the concentration inequality for MCMC. Numerical results show that this framework is very promising to provide near-optimal solutions for quite a few binary optimization problems.

Submitted 3 July, 2023; originally announced July 2023.

MSC Class: 90C09; 90C27; 90C59; 60J45; 60J20

arXiv:2306.13562 [pdf, ps, other]

Nonlinear asymptotic stability and transition threshold for 2D Taylor-Couette flows in Sobolev spaces

Authors: Xinliang An, Taoran He, Te Li

Abstract: In this paper, we investigate the stability of the 2-dimensional (2D) Taylor-Couette (TC) flow for the incompressible Navier-Stokes equations. The explicit form of velocity for 2D TC flow is given by $u=(Ar+\frac{B}{r})(-\sin θ, \cos θ)^T$ with $(r, θ)\in [1, R]\times \mathbb{S}^1$ being an annulus and $A, B$ being constants. Here, $A, B$ encode the rotational effect and $R$ is the ratio of the outer and inner radii of the annular region. Our focus is the long-term behavior of solutions around the steady 2D TC flow. While the laminar solution is known to be a global attractor for 2D channel flows and plane flows, it is unclear whether this is still true for rotating flows with curved geometries. In this article, we prove that the 2D Taylor-Couette flow is asymptotically stable, even at high Reynolds number ($Re\sim ν^{-1}$), with a sharp exponential decay rate of $\exp(-ν^{\frac13}|B|^{\frac23}R^{-2}t)$ as long as the initial perturbation is less than or equal to $ν^\frac12 |B|^{\frac12}R^{-2}$ in Sobolev space. The powers of $ν$ and $B$ in this decay estimate are optimal. It is derived using the method of resolvent estimates and is commonly recognized as the enhanced dissipative effect. Compared to the Couette flow, the enhanced dissipation of the rotating Taylor-Couette flow not only depends on the Reynolds number but also reflects the rotational aspect via the rotational coefficient $B$. The larger the $|B|$, the faster the long-time dissipation takes effect. We also conduct space-time estimates describing inviscid-damping mechanism in our proof. To obtain these inviscid-damping estimates, we find and construct a new set of explicit orthonormal basis of the weighted eigenfunctions for the Laplace operators corresponding to the circular flows. These provide new insights into the mathematical understanding of the 2D Taylor-Couette flows.

Submitted 23 June, 2023; originally announced June 2023.

Comments: 50 pages

arXiv:2306.10343

Complete self-shrinkers with bounded the second fundamental form in $\mathbb{R}^{n+1}$

Authors: Yayun Chen, Tongzhu Li

Abstract: Let $X:M^n\to \mathbb{R}^{n+1}$ be a complete properly immersed self-shrinker. In this paper, we prove that if the squared norm of the second fundamental form $S$ satisfies $1\leq S< C$ for some constant $C$, then $S=1$. Further we classify the $n$-dimensional complete proper self-shrinkers with constant squared norm of the second fundamental form in $\mathbb{R}^{n+1}$, which solve the conjecture proposed by Q.M. Cheng and G. Wei when the self-shrinker is proper.

Submitted 4 July, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

Comments: There is an error in the paper

MSC Class: 53C40; 53C24

arXiv:2306.06581 [pdf, other]

Importance Sparsification for Sinkhorn Algorithm

Authors: Mengyu Li, Jun Yu, Tao Li, Cheng Meng

Abstract: The Sinkhorn algorithm has been used pervasively to approximate the solution to optimal transport (OT) and unbalanced optimal transport (UOT) problems. However, its practical application is limited due to the high computational complexity. To alleviate the computational burden, we propose a novel importance sparsification method, called Spar-Sink, to efficiently approximate entropy-regularized OT and UOT solutions. Specifically, our method employs natural upper bounds for unknown optimal transport plans to establish effective sampling probabilities, and constructs a sparse kernel matrix to accelerate Sinkhorn iterations, reducing the computational cost of each iteration from $O(n^2)$ to $\widetilde{O}(n)$ for a sample of size $n$. Theoretically, we show the proposed estimators for the regularized OT and UOT problems are consistent under mild regularity conditions. Experiments on various synthetic data demonstrate Spar-Sink outperforms mainstream competitors in terms of both estimation error and speed. A real-world echocardiogram data analysis shows Spar-Sink can effectively estimate and visualize cardiac cycles, from which one can identify heart failure and arrhythmia. To evaluate the numerical accuracy of cardiac cycle prediction, we consider the task of predicting the end-systole time point using the end-diastole one. Results show Spar-Sink performs as well as the classical Sinkhorn algorithm, requiring significantly less computational time.

Submitted 11 June, 2023; originally announced June 2023.

Comments: Accepted by Journal of Machine Learning Research

arXiv:2306.05707 [pdf, ps, other]

On the Mathematics of RNA Velocity II: Algorithmic Aspects

Authors: Tiejun Li, Yizhuo Wang, Guoguo Yang, Peijie Zhou

Abstract: In a previous paper [CSIAM Trans. Appl. Math. 2 (2021), 1-55], the authors proposed a theoretical framework for the analysis of RNA velocity, which is a promising concept in scRNA-seq data analysis to reveal the cell state-transition dynamical processes underlying snapshot data. The current paper is devoted to the algorithmic study of some key components in RNA velocity workflow. Four important points are addressed in this paper: (1) We construct a rational time-scale fixation method which can determine the global gene-shared latent time for cells. (2) We present an uncertainty quantification strategy for the inferred parameters obtained through the EM algorithm. (3) We establish the optimal criterion for the choice of velocity kernel bandwidth with respect to the sample size in the downstream analysis and discuss its implications. (4) We propose a temporal distance estimation approach between two cell clusters along the cellular development path. Some illustrative numerical tests are also carried out to verify our analysis. These results are intended to provide tools and insights in further development of RNA velocity type methods in the future.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 32 pages, 5 figures

arXiv:2305.13526 [pdf, ps, other]

A finite theorem for Ahlfors' covering surface theory

Authors: Tian-Run Li, Yun-Ling Chen, Guang-Yuan Zhang

Abstract: Ahlfors' theory of covering surfaces is one of the major mathematical achievements of the last century. The most important part of his theory is the Second Fundamental Theorem (SFT). We are interested in the relation of errors of Ahlfors' SFT with the same boundary curve. In this paper we will prove a result which is used to establish the best bound of the constant in Ahlfors' SFT in arXiv.2307.04623. Precisely speaking, we will prove that for any surface $Σ\in\mathcal{F}_r(L,m)$, a new surface $Σ_1$ can be constructed based on it, such that $R(Σ_1)\ge R(Σ)$ and $L(\partialΣ_1)\le L(\partialΣ)$, where $R(Σ)$ is Ahlfors' error term and $L(\partialΣ)$ is the boundary length of the surface $Σ$, and the covering degree of $Σ_1$ has an upper bound independent of surfaces. Meanwhile, this conclusion suggests that the supremum of $H(Σ)=R(Σ)/L(\partialΣ)$ can be achieved by surfaces in the space $\mathcal{F}_r'(L,m)$.

Submitted 12 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 21 pages, 2 figures

MSC Class: 30D35; 30D45; 52B60

arXiv:2305.12615 [pdf, other]

doi 10.1007/s00220-023-04916-1

Global Finite-Energy Solutions of the Compressible Euler-Poisson Equations for General Pressure Laws with Spherical Symmetry

Authors: Gui-Qiang G. Chen, Feimin Huang, Tianhong Li, Weiqiang Wang, Yong Wang

Abstract: We are concerned with global finite-energy solutions of the three-dimensional compressible Euler-Poisson equations with gravitational potential and general pressure law, especially including the constitutive equation of white dwarf stars. We construct global finite-energy solutions of the Cauchy problem for the Euler-Poisson equations with large initial data of spherical symmetry as the inviscid limit of the solutions of the corresponding Cauchy problem for the Navier-Stokes-Poisson equations. The strong convergence of the vanishing viscosity solutions is achieved through entropy analysis, uniform estimates in $L^p$, and a more general compensated compactness framework via several new ingredients. A key estimate is first established for the integrability of the density over unbounded domains independent of the viscosity coefficient. Then a special entropy pair is carefully designed by solving a Goursat problem for the entropy equation such that a higher integrability of the velocity is established, which is a crucial step. Moreover, the weak entropy kernel for the general pressure law and its fractional derivatives of the required order near vacuum ($ρ=0$) and far-field ($ρ=\infty$) are carefully analyzed. Owing to the generality of the pressure law, only the $W^{-1,p}_{\rm loc}$-compactness of weak entropy dissipation measures with $p\in [1,2)$ can be obtained; this is rescued by the equi-integrability of weak entropy pairs which can be established by the estimates obtained above so that the div-curl lemma still applies. Finally, based on the above analysis of weak entropy pairs, the $L^p$ compensated compactness framework for the compressible Euler equations with general pressure law is established. This new compensated compactness framework and the techniques developed in this paper should be useful for solving further nonlinear problems with similar features.

Submitted 12 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 98 pages, 1 figure; To appear in "Communications in Mathematical Physics", 2024

MSC Class: 35Q85; 85A30; 35L65; 35D30; 35Q31; 76N10

arXiv:2305.06172 [pdf, other]

Principal Feature Detection via $Φ$-Sobolev Inequalities

Authors: Matthew T. C. Li, Youssef Marzouk, Olivier Zahm

Abstract: We investigate the approximation of high-dimensional target measures as low-dimensional updates of a dominating reference measure. This approximation class replaces the associated density with the composition of: (i) a feature map that identifies the leading principal components or features of the target measure, relative to the reference, and (ii) a low-dimensional profile function. When the reference measure satisfies a subspace $φ$-Sobolev inequality, we construct a computationally tractable approximation that yields certifiable error guarantees with respect to the Amari $α$-divergences. Our construction proceeds in two stages. First, for any feature map and any $α$-divergence, we obtain an analytical expression for the optimal profile function. Second, for linear feature maps, the principal features are obtained from eigenvectors of a matrix involving gradients of the log-density. Neither step requires explicit access to normalizing constants. Notably, by leveraging the $φ$-Sobolev inequalities, we demonstrate that these features universally certify approximation errors across the range of $α$-divergences $α\in (0,1]$. We then propose an application to Bayesian inverse problems and provide an analogous construction with approximation guarantees that hold in expectation over the data. We conclude with an extension of the proposed dimension reduction strategy to nonlinear feature maps.

Submitted 16 January, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: To appear in Bernoulli, but this version contains both the main file and the supplementary material

arXiv:2305.05868 [pdf, other]

Hadwiger's Conjecture for some graphs with independence number two

Authors: Tong Li, Qiang Zhou

Abstract: Let $h(G)$ denote the largest $t$ such that $G$ contains $K_t$ as a minor, and let $χ(G)$ be the chromatic number of $G$. In 1943, Hadwiger conjectured that $h(G) \geq χ(G)$ for any graph $G$. In this paper, we prove that Hadwiger's conjecture holds for $H$-free graphs with independence number two, where $H$ is one of some specified graphs.

Submitted 30 March, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 11 pages, 3 figures

arXiv:2304.14197 [pdf, other]

Logarithmic-Regret Quantum Learning Algorithms for Zero-Sum Games

Authors: Minbo Gao, Zhengfeng Ji, Tongyang Li, Qisheng Wang

Abstract: We propose the first online quantum algorithm for zero-sum games with $\tilde O(1)$ regret under the game setting. Moreover, our quantum algorithm computes an $\varepsilon$-approximate Nash equilibrium of an $m \times n$ matrix zero-sum game in quantum time $\tilde O(\sqrt{m+n}/\varepsilon^{2.5})$, yielding a quadratic improvement over classical algorithms in terms of $m, n$. Our algorithm uses standard quantum inputs and generates classical outputs with succinct descriptions, facilitating end-to-end applications. As an application, we obtain a fast quantum linear programming solver. Technically, our online quantum algorithm "quantizes" classical algorithms based on the optimistic multiplicative weight update method. At the heart of our algorithm is a fast quantum multi-sampling procedure for the Gibbs sampling problem, which may be of independent interest.

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2304.04955 [pdf, other]

On Beckner's Inequality for Axially Symmetric Functions on $\mathbb{S}^6$

Authors: Changfeng Gui, Tuoxin Li, Juncheng Wei, Zikai Ye

Abstract: We prove that axially symmetric solutions to the $Q$-curvature type problem $$ αP_6 u + 120(1-\frac{e^{6u}}{\int_{\mathbb{S}^6} e^{6u}})=0 \ \ \ \ \ \mbox{on} \ \mathbb{S}^6 $$ must be constants, provided that $ \frac{1}{2}\leq α<1$. In view of the existence of non-constant solutions obtained by Gui-Hu-Xie \cite{GHW2022} for $\frac{1}{7}<α<\frac{1}{2}$, this result is sharp. This result closes the gap of the related results in \cite{GHW2022}, which proved a similar uniqueness result for $α\geq 0.6168$. The improvement is based on two types of new estimates: one is a better estimate of the semi-norm $\lfloor G\rfloor^2$, the other is a family of refined estimates on Gegenbauer coefficients, such as pointwise decay and cancellation properties.

Submitted 10 April, 2023; originally announced April 2023.

Comments: 31 pages; any comment is welcome

arXiv:2303.13117 [pdf, other]

RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

Authors: Ching Pui Wan, Tung Li, Jason Min Wang

Abstract: Reinforcement learning has been applied in operation research and has shown promise in solving large combinatorial optimization problems. However, existing works focus on developing neural network architectures for certain problems. These works lack the flexibility to incorporate recent advances in reinforcement learning, as well as the flexibility of customizing model architectures for operation research problems. In this work, we analyze the end-to-end autoregressive models for vehicle routing problems and show that these models can benefit from the recent advances in reinforcement learning with a careful re-implementation of the model architecture. In particular, we re-implemented the Attention Model and trained it with Proximal Policy Optimization (PPO) in CleanRL, showing at least 8 times speed up in training time. We hereby introduce RLOR, a flexible framework for Deep Reinforcement Learning for Operation Research. We believe that a flexible framework is key to developing deep reinforcement learning models for operation research problems. The code of our work is publicly available at https://github.com/cpwan/RLOR.

Submitted 23 March, 2023; originally announced March 2023.

Comments: 21 pages

arXiv:2303.12607 [pdf, other]

Algebraic capacity as tropical polynomial over $c_1$-nef symplectic cone

Authors: Tian-Jun Li, Shengzhen Ning

Abstract: In a series of works [Wor22], [Wor21] and [CW20], algebraic capacity was introduced in an algebraic manner for polarized surfaces and applied to the symplectic embedding problems. In this note, we give a reformulation of algebraic capacity in terms of almost complex geometry. For rational surfaces with $c_1\cdotω> 0$, we further introduce a sequence of tropical polynomials which will describe those capacities viewed as functions over the space of such symplectic forms. As an application, we give a direct proof of the correspondence between algebraic capacity and ECH capacity for smooth toric surfaces without terminology from algebraic geometry.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.11789 [pdf, ps, other]

Random Inverse Problems Over Graphs: Decentralized Online Learning

Authors: Tao Li, Xiwei Zhang

Abstract: We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2-asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.

Submitted 29 May, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

arXiv:2303.10599 [pdf, ps, other]

Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators

Authors: Tianyou Li, Fan Chen, Huajie Chen, Zaiwen Wen

Abstract: Understanding stochastic gradient descent (SGD) and its variants is essential for machine learning. However, most of the preceding analyses are conducted under amenable conditions such as unbiased gradient estimator and bounded objective functions, which do not encompass many sophisticated applications, such as variational Monte Carlo, entropy-regularized reinforcement learning and variational inference. In this paper, we consider the SGD algorithm that employs the Markov Chain Monte Carlo (MCMC) estimator to compute the gradient, called MCMC-SGD. Since MCMC reduces the sampling complexity significantly, it is an asymptotically convergent biased estimator in practice. Moreover, by incorporating a general class of unbounded functions, it is much more difficult to analyze the MCMC sampling error. Therefore, we assume that the function is sub-exponential and use the Bernstein inequality for non-stationary Markov chains to derive error bounds of the MCMC estimator. Consequently, MCMC-SGD is proven to have a first order convergence rate $O(\log K/\sqrt{n K})$ with $K$ iterations and a sample size $n$. It partially explains how MCMC influences the behavior of SGD. Furthermore, we verify the correlated negative curvature condition under reasonable assumptions. It is shown that MCMC-SGD escapes from saddle points and reaches $(ε,ε^{1/4})$ approximate second order stationary points or $ε^{1/2}$-variance points at least $O(ε^{-11/2}\log^{2}(1/ε) )$ steps with high probability. Our analysis unveils the convergence pattern of MCMC-SGD across a broad class of stochastic optimization problems, and interprets the convergence phenomena observed in practical applications.

Submitted 23 March, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

Showing 1–50 of 306 results for author: Li, T