-
Graphon Particle Systems, Part II: Dynamics of Distributed Stochastic Continuum Optimization
Authors:
Yan Chen,
Tao Li
Abstract:
We study the distributed optimization problem over a graphon with a continuum of nodes, which is regarded as the limit of the distributed networked optimization as the number of nodes goes to infinity. Each node has a private local cost function. The global cost function, which all nodes cooperatively minimize, is the integral of the local cost functions on the node set. We propose stochastic gradient descent and gradient tracking algorithms over the graphon. We establish a general lemma for the upper bound estimation related to a class of time-varying differential inequalities with negative linear terms, based upon which, we prove that for both kinds of algorithms, the second moments of the nodes' states are uniformly bounded. Especially, for the stochastic gradient tracking algorithm, we transform the convergence analysis into the asymptotic property of coupled nonlinear differential inequalities with time-varying coefficients and develop a decoupling method. For both kinds of algorithms, we show that by choosing the time-varying algorithm gains properly, all nodes' states achieve $\mathcal{L}^{\infty}$-consensus for a connected graphon. Furthermore, if the local cost functions are strongly convex, then all nodes' states converge to the minimizer of the global cost function and the auxiliary states in the stochastic gradient tracking algorithm converge to the gradient value of the global cost function at the minimizer uniformly in mean square.
Submitted 2 July, 2024;
originally announced July 2024.
-
A Bayesian framework for spectral reprojection
Authors:
Tongtong Li,
Anne Gelb
Abstract:
Fourier partial sum approximations yield exponential accuracy for smooth and periodic functions, but produce the infamous Gibbs phenomenon for non-periodic ones. Spectral reprojection resolves the Gibbs phenomenon by projecting the Fourier partial sum onto a Gibbs complementary basis, often prescribed as the Gegenbauer polynomials. Noise in the Fourier data and the Runge phenomenon both degrade the quality of the Gegenbauer reconstruction solution, however. Motivated by its theoretical convergence properties, this paper proposes a new Bayesian framework for spectral reprojection, which allows a greater understanding of the impact of noise on the reprojection method from a statistical point of view. We are also able to improve the robustness with respect to the Gegenbauer polynomials parameters. Finally, the framework provides a mechanism to quantify the uncertainty of the solution estimate.
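The Gibbs phenomenon mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the test function f(x) = x on (-π, π), the function names, and the search window near the jump are all assumptions chosen for illustration. The peak of the Fourier partial sum just left of the jump at x = π stays above the true supremum π however many terms are used, with an overshoot of roughly 9% of the jump size in the limit.

```python
import math


def sawtooth_partial_sum(x, n_terms):
    """Fourier partial sum S_N(x) of f(x) = x on (-pi, pi).

    Uses the classical series x = 2 * sum_{k>=1} (-1)^(k+1) * sin(k x) / k.
    """
    return 2.0 * sum(
        (-1) ** (k + 1) * math.sin(k * x) / k for k in range(1, n_terms + 1)
    )


def overshoot_ratio(n_terms, samples=200):
    """Peak of S_N just left of the jump at x = pi, divided by pi.

    The window [pi - 3*pi/N, pi] is an illustrative choice that contains
    the first (largest) Gibbs lobe.
    """
    width = 3.0 * math.pi / n_terms
    peak = max(
        sawtooth_partial_sum(math.pi - width * j / samples, n_terms)
        for j in range(samples + 1)
    )
    return peak / math.pi


if __name__ == "__main__":
    for n in (16, 64, 256):
        # The ratio does not approach 1 as n grows: the overshoot persists.
        print(n, overshoot_ratio(n))
```

Away from the jump the partial sums do converge (for example at x = 1), which is why a reprojection onto a different basis targets the region near the discontinuity.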
Submitted 22 June, 2024;
originally announced June 2024.
-
Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities
Authors:
Matthew T. C. Li,
Tiangang Cui,
Fengyi Li,
Youssef Marzouk,
Olivier Zahm
Abstract:
Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $π$ as a perturbation of a given reference measure $μ$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.
Submitted 21 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Planar Turán number for balanced double stars
Authors:
Xin Xu,
Qiang Zhou,
Tong Li,
Guiying Yan
Abstract:
The planar Turán number, denoted by $ex_{\mathcal{P}}(n,H)$, is the maximum number of edges in an $n$-vertex planar graph which does not contain $H$ as a subgraph. Ghosh, Győri, Paulos and Xiao initiated the topic of the planar Turán number for double stars. Among balanced double stars, $S_{3,3}$ is the only remaining graph that needs to be considered. In this paper, we give the exact value of $ex_{\mathcal{P}}(n,S_{3,3})$, so that the planar Turán numbers of all balanced double stars are completely determined.
Submitted 9 June, 2024;
originally announced June 2024.
-
Quantum Algorithms and Lower Bounds for Finite-Sum Optimization
Authors:
Yexin Zhang,
Chenyi Zhang,
Cong Fang,
Liwei Wang,
Tongyang Li
Abstract:
Finite-sum optimization has wide applications in machine learning, covering important problems such as support vector machines, regression, etc. In this paper, we initiate the study of solving finite-sum optimization problems by quantum computing. Specifically, let $f_1,\ldots,f_n\colon\mathbb{R}^d\to\mathbb{R}$ be $\ell$-smooth convex functions and $ψ\colon\mathbb{R}^d\to\mathbb{R}$ be a $μ$-strongly convex proximal function. The goal is to find an $ε$-optimal point for $F(\mathbf{x})=\frac{1}{n}\sum_{i=1}^n f_i(\mathbf{x})+ψ(\mathbf{x})$. We give a quantum algorithm with complexity $\tilde{O}\big(n+\sqrt{d}+\sqrt{\ell/μ}\big(n^{1/3}d^{1/3}+n^{-2/3}d^{5/6}\big)\big)$, improving the classical tight bound $\tilde{Θ}\big(n+\sqrt{n\ell/μ}\big)$. We also prove a quantum lower bound $\tilde{Ω}(n+n^{3/4}(\ell/μ)^{1/4})$ when $d$ is large enough. Both our quantum upper and lower bounds can extend to the cases where $ψ$ is not necessarily strongly convex, or each $f_i$ is Lipschitz but not necessarily smooth. In addition, when $F$ is nonconvex, our quantum algorithm can find an $ε$-critical point using $\tilde{O}(n+\ell(d^{1/3}n^{1/3}+\sqrt{d})/ε^2)$ queries.
Submitted 5 June, 2024;
originally announced June 2024.
-
Efficient Optimal Control of Open Quantum Systems
Authors:
Wenhao He,
Tongyang Li,
Xiantao Li,
Zecheng Li,
Chunhao Wang,
Ke Wang
Abstract:
The optimal control problem for open quantum systems can be formulated as a time-dependent Lindbladian that is parameterized by a number of time-dependent control variables. Given an observable and an initial state, the goal is to tune the control variables so that the expected value of some observable with respect to the final state is maximized. In this paper, we present algorithms for solving this optimal control problem efficiently, i.e., having a poly-logarithmic dependency on the system dimension, which is exponentially faster than best-known classical algorithms. Our algorithms are hybrid, consisting of both quantum and classical components. The quantum procedure simulates time-dependent Lindblad evolution that drives the initial state to the final state, and it also provides access to the gradients of the objective function via quantum gradient estimation. The classical procedure uses the gradient information to update the control variables.
At the technical level, we provide the first (to the best of our knowledge) simulation algorithm for time-dependent Lindbladians with an $\ell_1$-norm dependence. As an alternative, we also present a simulation algorithm in the interaction picture to improve the algorithm for the cases where the time-independent component of a Lindbladian dominates the time-dependent part. On the classical side, we heavily adapt the state-of-the-art classical optimization analysis to interface with the quantum part of our algorithms. Both the quantum simulation techniques and the classical optimization analyses might be of independent interest.
Submitted 29 May, 2024;
originally announced May 2024.
-
Graphon Particle Systems, Part I: Spatio-Temporal Approximation and Law of Large Numbers
Authors:
Yan Chen,
Tao Li
Abstract:
We study a class of graphon particle systems with time-varying random coefficients. In a graphon particle system, the interactions among particles are characterized by the coupled mean field terms through an underlying graphon and the randomness of the coefficients comes from the stochastic processes associated with the particle labels. By constructing two-level approximated sequences converging in 2-Wasserstein distance, we prove the existence and uniqueness of the solution to the system. Besides, by constructing two-level approximated functions converging to the graphon mean field terms, we establish the law of large numbers, which reveals that if the number of particles tends to infinity and the discretization step tends to zero, then the discrete-time interacting particle system over a large-scale network converges to the graphon particle system. As a byproduct, we discover that the graphon particle system can describe the limiting dynamics of the distributed stochastic gradient descent algorithm over the large-scale network and prove that if the gradients of the local cost functions are Lipschitz continuous, then the graphon particle system can be regarded as the spatio-temporal approximation of the discrete-time distributed stochastic gradient descent algorithm as the number of network nodes tends to infinity and the algorithm step size tends to zero.
Submitted 2 July, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Comparisons Are All You Need for Optimizing Smooth Functions
Authors:
Chenyi Zhang,
Tongyang Li
Abstract:
When optimizing machine learning models, there are various scenarios where gradient computations are challenging or even infeasible. Furthermore, in reinforcement learning (RL), preference-based RL that only compares between options has wide applications, including reinforcement learning with human feedback in large language models. In this paper, we systematically study optimization of a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$ only assuming an oracle that compares function values at two points and tells which is larger. When $f$ is convex, we give two algorithms using $\tilde{O}(n/ε)$ and $\tilde{O}(n^{2})$ comparison queries to find an $ε$-optimal solution, respectively. When $f$ is nonconvex, our algorithm uses $\tilde{O}(n/ε^2)$ comparison queries to find an $ε$-approximate stationary point. All these results match the best-known zeroth-order algorithms with function evaluation queries in $n$ dependence, thus suggesting that \emph{comparisons are all you need for optimizing smooth functions using derivative-free methods}. In addition, we also give an algorithm for escaping saddle points and reaching an $ε$-second order stationary point of a nonconvex $f$, using $\tilde{O}(n^{1.5}/ε^{2.5})$ comparison queries.
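To make the comparison-oracle setting concrete, here is a minimal sketch of an optimizer that never sees function values, only the answer to "is f smaller at this point than at that one?". It is a plain coordinate direct search, not the algorithms of the paper, and it carries none of their query-complexity guarantees; the names `make_comparison_oracle` and `comparison_coordinate_search` and all parameter defaults are hypothetical.

```python
def make_comparison_oracle(f):
    """Wrap f so that callers only learn whether f(x) < f(y)."""
    def oracle(x, y):
        return f(x) < f(y)
    return oracle


def comparison_coordinate_search(oracle, x0, step=1.0, tol=1e-6,
                                 max_queries=100_000):
    """Coordinate direct search driven purely by comparison queries.

    Tries +step and -step along each coordinate, keeps a candidate only if
    the oracle reports a strict improvement, and halves the step once no
    coordinate move helps. Returns the final point and the query count.
    """
    x = list(x0)
    queries = 0
    while step > tol and queries < max_queries:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                candidate = x[:]
                candidate[i] += delta
                queries += 1
                if oracle(candidate, x):
                    x = candidate
                    improved = True
                    break
        if not improved:
            step *= 0.5
    return x, queries


if __name__ == "__main__":
    # Illustrative smooth convex objective with minimizer (1, -0.5).
    oracle = make_comparison_oracle(
        lambda v: (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 0.5) ** 2
    )
    point, used = comparison_coordinate_search(oracle, [3.0, 2.0])
    print(point, used)
```

The search works for this separable quadratic because a failed move of size `step` in both directions bounds the distance to the minimizer along that coordinate; general smooth functions need more careful direction estimation.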
Submitted 19 May, 2024;
originally announced May 2024.
-
Anti-Ramsey Numbers of Expansions of Doubly Edge-critical Graphs in Uniform Hypergraphs
Authors:
Tong Li,
Yucong Tang,
Guiying Yan
Abstract:
For an $r$-graph $H$, the anti-Ramsey number ${\rm ar}(n,r,H)$ is the minimum number $c$ of colors such that for any edge-coloring of the complete $r$-graph on $n$ vertices with at least $c$ colors, there is a copy of $H$ whose edges have distinct colors. A 2-graph $F$ is doubly edge-$p$-critical if the chromatic number $χ(F - e)\geq p$ for every edge $e$ in $F$ and there exist two edges $e_1,e_2$ in $F$ such that $χ(F -e_1- e_2)=p-1$. The anti-Ramsey numbers of doubly edge-$p$-critical 2-graphs were determined by Jiang and Pikhurko \cite{Jiang&Pikhurko2009}, which generalized the anti-Ramsey numbers of cliques determined by Erdős, Simonovits and Sós \cite{Erdos&Simonovits&Sos1975}. In general, few exact values of anti-Ramsey numbers of $r$-graphs are known for $r\geq 3$. Given a 2-graph $F$, the expansion $F^{(r)}$ of $F$ is an $r$-graph on $|V(F)|+(r-2)|F|$ vertices obtained from $F$ by adding $r-2$ new vertices to each edge of $F$. In this paper, we determine the exact value of ${\rm ar}(n,r,F^{(r)})$ for any doubly edge-$p$-critical 2-graph $F$ with $p>r\geq 3$ and sufficiently large $n$.
Submitted 18 May, 2024;
originally announced May 2024.
-
Synchronization of High-Dimensional Linear Networks over Finite Fields
Authors:
Siyu Zou,
Ting Li,
Jiandong Zhu
Abstract:
This paper investigates the synchronization problems for general high-dimensional linear networks over finite fields. By using the technique of linear transformations and invariant subspaces for linear spaces over finite fields, several necessary and sufficient conditions for the synchronization of high-dimensional linear networks over finite fields are proposed. This paper not only generalizes the existing results from 1-dimensional to high-dimensional linear networks but also adopts a new approach. Finally, some numerical examples are given to illustrate the effectiveness of our theoretical results.
Submitted 13 May, 2024;
originally announced May 2024.
-
A Pair of Bayesian Network Structures has Undecidable Conditional Independencies
Authors:
Cheuk Ting Li
Abstract:
Given a Bayesian network structure (directed acyclic graph), the celebrated d-separation algorithm efficiently determines whether the network structure implies a given conditional independence relation. We show that this changes drastically when we consider two Bayesian network structures instead. It is undecidable to determine whether two given network structures imply a given conditional independency, that is, whether every collection of random variables satisfying both network structures must also satisfy the conditional independency. Although the approximate combination of two Bayesian networks is a well-studied topic, our result shows that it is fundamentally impossible to accurately combine the knowledge of two Bayesian network structures, in the sense that no algorithm can tell what conditional independencies are implied by the two network structures. We can also explicitly construct two Bayesian network structures, such that whether they imply a certain conditional independency is unprovable in the ZFC set theory, assuming ZFC is consistent.
Submitted 11 May, 2024;
originally announced May 2024.
-
Anti-Ramsey numbers of loose paths and cycles in uniform hypergraphs
Authors:
Tong Li,
Yucong Tang,
Guanghui Wang,
Guiying Yan
Abstract:
For a fixed family of $r$-uniform hypergraphs $\mathcal{F}$, the anti-Ramsey number of $\mathcal{F}$, denoted by $ ar(n,r,\mathcal{F})$, is the minimum number $c$ of colors such that for any edge-coloring of the complete $r$-uniform hypergraph on $n$ vertices with at least $c$ colors, there is a rainbow copy of some hypergraph in $\mathcal{F}$. Here, a rainbow hypergraph is an edge-colored hypergraph with all edges colored differently. Let $\mathcal{P}_k$ and $\mathcal{C}_k$ be the families of loose paths and loose cycles with $k$ edges in an $r$-uniform hypergraph, respectively. In this paper, we determine the exact values of $ ar(n,r,\mathcal{P}_k)$ and $ ar(n,r,\mathcal{C}_k)$ for all $k\geq 4$ and $r\geq 3$.
Submitted 7 May, 2024;
originally announced May 2024.
-
Dissipative gradient nonlinearities prevent $δ$-formations in local and nonlocal attraction-repulsion chemotaxis models
Authors:
Tongxing Li,
Daniel Acosta Soba,
Alessandro Columbu,
Giuseppe Viglialoro
Abstract:
We study some attraction-repulsion chemotaxis models, characterized by nonlinear laws for the diffusion of the cell density, and for the chemosensitivities and the production rates of the chemoattractant and the chemorepellent. Additionally, a source also involving some expression of the gradient of the species is incorporated.
Submitted 6 May, 2024;
originally announced May 2024.
-
Robust estimations from distribution structures: III. Invariant Moments
Authors:
Tuobang Li
Abstract:
Descriptive statistics for parametric models are currently highly sensitive to departures, gross errors, and/or random errors. Here, leveraging the structures of parametric distributions and their central moment kernel distributions, a class of estimators, consistent simultaneously for both a semiparametric distribution and a distinct parametric distribution, is proposed. These efficient estimators are robust to both gross errors and departures from parametric assumptions, making them ideal for estimating the mean and central moments of common unimodal distributions. This article opens up the possibility of utilizing the common nature of probability models to construct near-optimal estimators that are suitable for various scenarios.
Submitted 13 June, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
Robust estimations from distribution structures: II. Central Moments
Authors:
Tuobang Li
Abstract:
In descriptive statistics, $U$-statistics arise naturally in producing minimum-variance unbiased estimators. In 1984, Serfling considered the distribution formed by evaluating the kernel of the $U$-statistics and proposed generalized $L$-statistics, which include the Hodges-Lehmann estimator and Bickel-Lehmann spread as special cases. However, the structures of the kernel distributions remain unclear. In 1954, Hodges and Lehmann demonstrated that if $X$ and $Y$ are independently sampled from the same unimodal distribution, $X-Y$ will exhibit symmetrical unimodality with its peak centered at zero. Building upon this foundational work, the current study delves into the structure of the kernel distribution. It is shown that the $\mathbf{k}$th central moment kernel distributions ($\mathbf{k}>2$) derived from a unimodal distribution exhibit location invariance and are also nearly unimodal with the mode and median close to zero. This article provides an approach to study the general structure of kernel distributions.
Submitted 13 June, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Robust estimations from distribution structures: I. Mean
Authors:
Tuobang Li
Abstract:
As the most fundamental problem in statistics, robust location estimation has many prominent solutions, such as the trimmed mean, Winsorized mean, Hodges Lehmann estimator, Huber M estimator, and median of means. Recent studies suggest that their maximum biases concerning the mean can be quite different, but the underlying mechanisms largely remain unclear. This study exploited a semiparametric method to classify distributions by the asymptotic orderliness of quantile combinations with varying breakdown points, showing their interrelations and connections to parametric distributions. Further deductions explain why the Winsorized mean typically has smaller biases compared to the trimmed mean; two sequences of semiparametric robust mean estimators emerge, particularly highlighting the superiority of the median Hodges Lehmann mean. This article sheds light on the understanding of the common nature of probability distributions.
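Three of the classical estimators named in this abstract can be written down in a few lines. The sketch below uses the textbook definitions (symmetric trimming and Winsorizing by a fixed fraction, and the Hodges-Lehmann estimator as the median of pairwise averages including each point paired with itself); it does not reproduce the semiparametric estimators proposed in the article, and the function names and the sample data are illustrative assumptions.

```python
import statistics


def trimmed_mean(data, fraction):
    """Discard the int(fraction * n) smallest and largest values, then average.

    Assumes 0 <= fraction < 0.5 so that some observations remain.
    """
    xs = sorted(data)
    k = int(fraction * len(xs))
    return statistics.fmean(xs[k:len(xs) - k])


def winsorized_mean(data, fraction):
    """Clamp the int(fraction * n) extreme values on each side, then average."""
    xs = sorted(data)
    n = len(xs)
    k = int(fraction * n)
    low, high = xs[k], xs[n - k - 1]
    return statistics.fmean(min(max(x, low), high) for x in xs)


def hodges_lehmann(data):
    """Median of all pairwise averages (x_i + x_j) / 2 with i <= j."""
    xs = list(data)
    walsh = [
        (xs[i] + xs[j]) / 2
        for i in range(len(xs))
        for j in range(i, len(xs))
    ]
    return statistics.median(walsh)


if __name__ == "__main__":
    # One gross error (1000) contaminates an otherwise regular sample.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
    print("mean      ", statistics.fmean(sample))
    print("trimmed   ", trimmed_mean(sample, 0.1))
    print("winsorized", winsorized_mean(sample, 0.1))
    print("HL        ", hodges_lehmann(sample))
```

On this sample the single outlier drags the arithmetic mean far from the bulk of the data, while the three robust estimators stay near the center. The quadratic number of pairwise averages makes this Hodges-Lehmann implementation suitable only for small samples.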
Submitted 13 June, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Very weak solutions of the Dirichlet problem for 2-Hessian equation
Authors:
Tongtong Li,
Guohuan Qiu
Abstract:
For any $α$ small, we construct infinitely many $C^{1,α}$ very weak solutions to the 2-Hessian equation with prescribed boundary value. The proof relies on the convex integration method and cut-off technique.
Submitted 14 March, 2024;
originally announced March 2024.
-
Near-Optimal Quantum Algorithm for Minimizing the Maximal Loss
Authors:
Hao Wang,
Chenyi Zhang,
Tongyang Li
Abstract:
The problem of minimizing the maximum of $N$ convex, Lipschitz functions plays a significant role in optimization and machine learning. It has a series of results, with the most recent one requiring $O(Nε^{-2/3} + ε^{-8/3})$ queries to a first-order oracle to compute an $ε$-suboptimal point. On the other hand, quantum algorithms for optimization are rapidly advancing with speedups shown on many important optimization problems. In this paper, we conduct a systematic study of quantum algorithms and lower bounds for minimizing the maximum of $N$ convex, Lipschitz functions. On one hand, we develop quantum algorithms with an improved complexity bound of $\tilde{O}(\sqrt{N}ε^{-5/3} + ε^{-8/3})$. On the other hand, we prove that quantum algorithms must take $\tilde{Ω}(\sqrt{N}ε^{-2/3})$ queries to a first-order quantum oracle, showing that our dependence on $N$ is optimal up to poly-logarithmic factors.
Submitted 20 February, 2024;
originally announced February 2024.
-
A new type of simplified inverse Lax-Wendroff boundary treatment I: hyperbolic conservation laws
Authors:
Shihao Liu,
Tingting Li,
Ziqiang Cheng,
Yan Jiang,
Chi-Wang Shu,
Meng** Zhang
Abstract:
In this paper, we design a new kind of high order inverse Lax-Wendroff (ILW) boundary treatment for solving hyperbolic conservation laws with a finite difference method on a Cartesian mesh. This new ILW method decomposes the construction of ghost point values near the inflow boundary into two steps: interpolation and extrapolation. First, we impose values at some artificial auxiliary points through a polynomial interpolating the interior points near the boundary. Then, we construct a Hermite extrapolation based on those auxiliary point values and the spatial derivatives at the boundary obtained via the ILW procedure. This polynomial gives the approximation to the ghost point value. By an appropriate selection of those artificial auxiliary points, high-order accuracy and stable results can be achieved. Moreover, theoretical analysis indicates that, compared with the original ILW method, especially for higher order accuracy, the newly proposed one requires fewer terms using the relatively complicated ILW procedure and thus improves computational efficiency while maintaining accuracy and stability. We perform numerical experiments on several benchmarks, including one- and two-dimensional scalar equations and systems. The robustness and efficiency of the proposed scheme are numerically verified.
Submitted 15 February, 2024;
originally announced February 2024.
-
QQMR: A Structure-Preserving Quaternion Quasi-Minimal Residual Method for Non-Hermitian Quaternion Linear Systems
Authors:
Tao Li,
Qing-Wen Wang,
Xin-Fang Zhang
Abstract:
The quaternion biconjugate gradient (QBiCG) method, as a novel variant of quaternion Lanczos-type methods for solving non-Hermitian quaternion linear systems, does not yield a minimization property. This means that the method possesses a rather irregular convergence behavior, which leads to numerical instability. In this paper, we propose a new structure-preserving quaternion quasi-minimal residual method, based on the quaternion biconjugate orthonormalization procedure with coupled two-term recurrences, which overcomes the drawback of QBiCG. The computational cost and storage required by the proposed method are much less than those of the traditional QMR iterations for the real representation of quaternion linear systems. Some convergence properties of the method are also established. Finally, we report the numerical results to show the robustness and effectiveness of the proposed method compared with QBiCG.
Submitted 5 February, 2024;
originally announced February 2024.
-
$n$-Dimensional Volumetric Stretch Energy Minimization for Volume-/Mass-Preserving Parameterizations
Authors:
Zhong-Heng Tan,
Tiexiang Li,
Wen-Wei Lin,
Shing-Tung Yau
Abstract:
In this paper, we develop an $n$-dimensional volumetric stretch energy ($n$-VSE) functional for the volume-/mass-preserving parameterization of $n$-manifolds topologically equivalent to the $n$-ball. The $n$-VSE has a lower bound and equals this bound if and only if the map is volume-/mass-preserving. This motivates us to minimize the $n$-VSE to achieve the ideal volume-/mass-preserving parameterization. In the discrete case, we also guarantee the relation between the lower bound and the volume-/mass-preservation, and propose the spherical and ball volume-/mass-preserving parameterization algorithms. The numerical experiments indicate the accuracy and robustness of the proposed algorithms. The modified algorithms are applied to the manifold registration and deformation, showing the versatility of $n$-VSE.
Submitted 1 February, 2024;
originally announced February 2024.
-
Learning based numerical methods for Helmholtz equation with high frequency
Authors:
Yu Chen,
** Cheng,
Tingyue Li,
Yun Miao
Abstract:
High-frequency issues have been remarkable challenges in numerical methods for partial differential equations. In this paper, a learning based numerical method (LbNM) is proposed for the Helmholtz equation with high frequency. The main novelty is using the Tikhonov regularization method to stably learn the solution operator by utilizing relevant information, especially the fundamental solutions. Then applying the solution operator to a new boundary input could quickly update the solution. Based on the method of fundamental solutions and the quantitative Runge approximation, we give the error estimate. This indicates interpretability and generalizability of the present method. Numerical results validate the error analysis and demonstrate the high-precision and high-efficiency features.
Submitted 17 January, 2024;
originally announced January 2024.
-
On integral images of Curtis homomorphisms for $\mathrm{GL}_n$ and $\mathrm{U}_n$
Authors:
Tzu-Jan Li
Abstract:
For $G = \mathrm{GL}_n$ or $\mathrm{U}_n$ defined over a finite field of characteristic $p$, we refine a result of Bonnafé and Kessar on the saturatedness of the Curtis homomorphism $\mathrm{Cur}^G$ by describing the image of $\mathrm{Cur}^G$ over $\overline{\mathbb{Z}}[1/p]$ via a system of linear conditions.
Submitted 20 March, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
MindOpt Adapter for CPLEX Benchmarking Performance Analysis
Authors:
Mou Sun,
Tao Li,
Wotao Yin
Abstract:
This report provides a comprehensive analysis of the performance of MindOpt Adapter for CPLEX 12.9 in benchmark testing. CPLEX, recognized as a robust Mixed Integer Programming (MIP) solver, has faced some scrutiny regarding its performance on MIPLIB 2017 when configured with default settings. MindOpt Adapter aims to enhance CPLEX's performance by automatically applying improved configurations for solving optimization problems. Our testing demonstrates that MindOpt Adapter for CPLEX successfully solved 232 of the 240 problems in the MIPLIB 2017 benchmark set. This performance surpasses all the other solvers in terms of the number of problems solved and the geometric mean of running times. The report provides a comparison of the benchmark results against the outcomes achieved by CPLEX under its default configuration.
Submitted 31 January, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
On recognition of the direct squares of the simple groups with abelian Sylow 2-subgroups
Authors:
Tao Li,
A. R. Moghaddamfar,
Andrey V. Vasil'ev,
Zhigang Wang
Abstract:
The spectrum of a group is the set of orders of its elements. Finite groups with the same spectra as the direct squares of the finite simple groups with abelian Sylow 2-subgroups are considered. It is proved that the direct square $J_1\times J_1$ of the sporadic Janko group $J_1$ and the direct squares ${^2}G_2(q)\times{^2}G_2(q)$ of the simple small Ree groups ${^2}G_2(q)$ are uniquely characterized by their spectra in the class of finite groups, while for the direct square $PSL_2(q)\times PSL_2(q)$ of a 2-dimensional simple linear group $PSL_2(q)$, there are always infinitely many groups (even solvable groups) with the same spectra.
Submitted 13 December, 2023;
originally announced December 2023.
-
A Robust Hessian-based Trust Region Algorithm for Spherical Conformal Parameterizations
Authors:
Zhong-Heng Tan,
Tiexiang Li,
Wen-Wei Lin,
Shing-Tung Yau
Abstract:
Surface parameterizations are widely applied in computer graphics, medical imaging and transformation optics. In this paper, we rigorously derive the gradient vector and Hessian matrix of the discrete conformal energy for spherical conformal parameterizations of simply connected closed surfaces of genus-$0$. In addition, we give the sparsity structure of the Hessian matrix, which leads to a robust Hessian-based trust region algorithm for the computation of spherical conformal maps. Numerical experiments demonstrate the local quadratic convergence of the proposed algorithm with low conformal distortions. We subsequently propose an application of our method to surface registrations that still maintains local quadratic convergence.
Submitted 30 November, 2023;
originally announced November 2023.
-
Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks
Authors:
Zhongyuan Lyu,
Ting Li,
Dong Xia
Abstract:
In this paper, we first study the fundamental limit of clustering networks when a multi-layer network is present. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate takes an exponential form and is characterized by the Renyi divergence between the edge probability distributions of the component networks. We propose a novel two-stage network clustering method including a tensor-based initialization algorithm involving both node and sample splitting and a refinement procedure by a likelihood-based Lloyd algorithm. Network clustering must be accompanied by node community detection. Our proposed algorithm achieves the minimax optimal network clustering error rate and allows extreme network sparsity under MMSBM. Numerical simulations and real data experiments both validate that our method outperforms existing methods. Oftentimes, the edges of networks carry count-type weights. We then extend our methodology and analysis framework to study the minimax optimal clustering error rate for mixtures of discrete distributions including Binomial, Poisson, and multi-layer Poisson networks. The minimax optimal clustering error rates in these discrete mixtures all take the same exponential form characterized by the Renyi divergences. These optimal clustering error rates in discrete mixtures can also be achieved by our proposed two-stage clustering algorithm.
Submitted 27 November, 2023;
originally announced November 2023.
-
Quantum Langevin Dynamics for Optimization
Authors:
Zherui Chen,
Yuchen Lu,
Hao Wang,
Yizhou Liu,
Tongyang Li
Abstract:
We initiate the study of utilizing Quantum Langevin Dynamics (QLD) to solve optimization problems, particularly those non-convex objective functions that present substantial obstacles for traditional gradient descent algorithms. Specifically, we examine the dynamics of a system coupled with an infinite heat bath. This interaction induces both random quantum noise and a deterministic damping effect to the system, which nudge the system towards a steady state that hovers near the global minimum of objective functions. We theoretically prove the convergence of QLD in convex landscapes, demonstrating that the average energy of the system can approach zero in the low temperature limit with an exponential decay rate correlated with the evolution time. Numerically, we first show the energy dissipation capability of QLD by retracing its origins to spontaneous emission. Furthermore, we conduct a detailed discussion of the impact of each parameter. Finally, based on the observations when comparing QLD with the classical Fokker-Planck-Smoluchowski equation, we propose a time-dependent QLD by making temperature and $\hbar$ time-dependent parameters, which can be theoretically proven to converge better than the time-independent case and also outperforms a series of state-of-the-art quantum and classical optimization algorithms in many non-convex landscapes.
Submitted 22 March, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
A General Space of Belief Updates for Model Misspecification in Bayesian Networks
Authors:
Tian** Li
Abstract:
In an ideal setting for Bayesian agents, a perfect description of the rules of the environment (i.e., the objective observation model) is available, allowing them to reason through the Bayesian posterior to update their beliefs in an optimal way. But such an ideal setting hardly ever exists in the natural world, so agents have to make do with reasoning about how they should update their beliefs simultaneously. This introduces a number of related challenges for a number of research areas: (1) For Bayesian statistics, this deviation of the subjective model from the true data-generating mechanism is termed model misspecification in the literature. (2) For neuroscience, it introduces the necessity to model how the agents update their beliefs (how they use evidence to update their belief) and how their beliefs change over time. The current paper addresses these two challenges by (a) providing a general class of posteriors/belief updates called cut-posteriors of Bayesian networks that have a much greater expressivity, and (b) parameterizing the space of possible posteriors to make meta-learning (i.e., choosing the belief update from this space in a principled manner) possible. For (a), it is noteworthy that any cut-posterior has local computation only, making computation tractable for human or artificial agents. For (b), a Markov Chain Monte Carlo algorithm to perform such meta-learning will be sketched here, though it is only an illustration and by no means the only meta-learning procedure possible for the space of cut-posteriors. Operationally, this work gives a general algorithm to take in an arbitrary Bayesian network and output all possible cut-posteriors in the space.
Submitted 9 November, 2023;
originally announced November 2023.
-
A simple uniformly optimal method without line search for convex optimization
Authors:
Tianjiao Li,
Guanghui Lan
Abstract:
Line search (or backtracking) procedures have been widely employed in first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with Hölder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization.
Submitted 26 October, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
A Structurally Informed Data Assimilation Approach for Nonlinear Partial Differential Equations
Authors:
Tongtong Li,
Anne Gelb,
Yoonsang Lee
Abstract:
Ensemble transform Kalman filtering (ETKF) data assimilation is often used to combine available observations with numerical simulations to obtain statistically accurate and reliable state representations in dynamical systems. However, it is well known that the commonly used Gaussian distribution assumption introduces biases for state variables that admit discontinuous profiles, which are prevalent in nonlinear partial differential equations. This investigation designs a new structurally informed non-Gaussian prior that exploits statistical information from the simulated state variables. In particular, we construct a new weighting matrix based on the second moment of the gradient information of the state variable to replace the prior covariance matrix used for model/data compromise in the ETKF data assimilation framework. We further adapt our weighting matrix to include information in discontinuity regions via a clustering technique. Our numerical experiments demonstrate that this new approach yields more accurate estimates than those obtained using ETKF on shallow water equations, even when ETKF is enhanced with inflation and localization techniques.
Submitted 5 March, 2024; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Gl-QFOM and Gl-QGMRES: two efficient algorithms for quaternion linear systems with multiple right-hand sides
Authors:
Tao Li,
Qing-Wen Wang,
Xin-Fang Zhang
Abstract:
In this paper, we propose the global quaternion full orthogonalization (Gl-QFOM) and global quaternion generalized minimum residual (Gl-QGMRES) methods, which are built upon global orthogonal and oblique projections onto a quaternion matrix Krylov subspace, for solving quaternion linear systems with multiple right-hand sides. We first develop the global quaternion Arnoldi procedure to preserve the quaternion Hessenberg form during the iterations. We then establish the convergence analysis of the proposed methods, and show how to apply them to solve the Sylvester quaternion matrix equation. Numerical examples are provided to illustrate the effectiveness of our methods compared with the traditional Gl-FOM and Gl-GMRES iterations for the real representations of the original linear systems.
Submitted 25 August, 2023;
originally announced August 2023.
-
A Characterization of Entropy as a Universal Monoidal Natural Transformation
Authors:
Cheuk Ting Li
Abstract:
We show that the essential properties of entropy (monotonicity, additivity and subadditivity) are consequences of entropy being a monoidal natural transformation from the under category functor $-/\mathsf{LProb}_ρ$ (where $\mathsf{LProb}_ρ$ is the category of $ρ$-th-power-summable probability distributions, $0<ρ<1$) to $Δ_{\mathbb{R}}$. Moreover, the Shannon entropy can be characterized as the universal monoidal natural transformation from $-/\mathsf{LProb}_ρ$ to the category of integrally closed partially ordered abelian groups (a reflective subcategory of the lax-slice 2-category over $\mathsf{MonCat}_{\ell}$ in the 2-category of monoidal categories), providing a succinct characterization of Shannon entropy as a reflection arrow. We can likewise define entropy for every monoidal category with a monoidal structure on its under categories (e.g. the category of finite abelian groups, the category of finite inhabited sets, the category of finite dimensional vector spaces, and the augmented simplex category) via the reflection arrow. This implies that all these entropies over different categories are components of a single natural transformation (the unit of the idempotent monad), allowing us to connect these entropies in a natural manner. We also provide a universal characterization of the conditional Shannon entropy based on the chain rule which, unlike the characterization of information loss by Baez, Fritz and Leinster, does not require any continuity assumption.
Submitted 14 April, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
Fast and Practical Quantum-Inspired Classical Algorithms for Solving Linear Systems
Authors:
Qian Zuo,
Tongyang Li
Abstract:
We propose fast and practical quantum-inspired classical algorithms for solving linear systems. Specifically, given sampling and query access to a matrix $A\in\mathbb{R}^{m\times n}$ and a vector $b\in\mathbb{R}^m$, we propose classical algorithms that produce a data structure for the solution $x\in\mathbb{R}^{n}$ of the linear system $Ax=b$ with the ability to sample and query its entries. The resulting $x$ satisfies $\|x-A^{+}b\|\leqε\|A^{+}b\|$, where $\|\cdot\|$ is the spectral norm and $A^+$ is the Moore-Penrose inverse of $A$. Our algorithm has time complexity $\widetilde{O}(κ_F^4/κε^2)$ in the general case, where $κ_{F} =\|A\|_F\|A^+\|$ and $κ=\|A\|\|A^+\|$ are condition numbers. Compared to the prior state-of-the-art result [Shao and Montanaro, arXiv:2103.10309v2], our algorithm achieves a polynomial speedup in condition numbers. When $A$ is $s$-sparse, our algorithm has complexity $\widetilde{O}(s κ\log(1/ε))$, matching the quantum lower bound for solving linear systems in $κ$ and $1/ε$ up to poly-logarithmic factors [Harrow and Kothari]. When $A$ is $s$-sparse and symmetric positive-definite, our algorithm has complexity $\widetilde{O}(s\sqrtκ\log(1/ε))$.
Technically, our main contribution is the application of the heavy ball momentum method to quantum-inspired classical algorithms for solving linear systems, where we propose two new methods with speedups: quantum-inspired Kaczmarz method with momentum and quantum-inspired coordinate descent method with momentum. Their analysis exploits careful decomposition of the momentum transition matrix and the application of novel spectral norm concentration bounds for independent random matrices. Finally, we also conduct numerical experiments for our algorithms on both synthetic and real-world datasets, and the experimental results support our theoretical claims.
Submitted 30 November, 2023; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Accelerated stochastic approximation with state-dependent noise
Authors:
Sasila Ilandarideva,
Anatoli Juditsky,
Guanghui Lan,
Tianjiao Li
Abstract:
We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size.
We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.
Submitted 13 July, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
Monte Carlo Policy Gradient Method for Binary Optimization
Authors:
Cheng Chen,
Ruitao Chen,
Tianyou Li,
Ruichen Ao,
Zaiwen Wen
Abstract:
Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model to sample the binary solution according to a parameterized policy distribution. Specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distributions of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly similar to reinforcement learning. For coherent exploration in discrete spaces, parallel Markov Chain Monte Carlo (MCMC) methods are employed to sample from the policy distribution with diversity and approximate the gradient efficiently. We further develop a filter scheme to replace the original objective function by the one with the local search technique to broaden the horizon of the function landscape. Convergence to stationary points in expectation of the policy gradient method is established based on the concentration inequality for MCMC. Numerical results show that this framework is very promising to provide near-optimal solutions for quite a few binary optimization problems.
Submitted 3 July, 2023;
originally announced July 2023.
-
Nonlinear asymptotic stability and transition threshold for 2D Taylor-Couette flows in Sobolev spaces
Authors:
Xinliang An,
Taoran He,
Te Li
Abstract:
In this paper, we investigate the stability of the 2-dimensional (2D) Taylor-Couette (TC) flow for the incompressible Navier-Stokes equations. The explicit form of velocity for 2D TC flow is given by $u=(Ar+\frac{B}{r})(-\sin θ, \cos θ)^T$ with $(r, θ)\in [1, R]\times \mathbb{S}^1$ being an annulus and $A, B$ being constants. Here, $A, B$ encode the rotational effect and $R$ is the ratio of the outer and inner radii of the annular region. Our focus is the long-term behavior of solutions around the steady 2D TC flow. While the laminar solution is known to be a global attractor for 2D channel flows and plane flows, it is unclear whether this is still true for rotating flows with curved geometries. In this article, we prove that the 2D Taylor-Couette flow is asymptotically stable, even at high Reynolds number ($Re\sim ν^{-1}$), with a sharp exponential decay rate of $\exp(-ν^{\frac13}|B|^{\frac23}R^{-2}t)$ as long as the initial perturbation is less than or equal to $ν^\frac12 |B|^{\frac12}R^{-2}$ in Sobolev space. The powers of $ν$ and $B$ in this decay estimate are optimal. It is derived using the method of resolvent estimates and is commonly recognized as the enhanced dissipative effect. Compared to the Couette flow, the enhanced dissipation of the rotating Taylor-Couette flow not only depends on the Reynolds number but also reflects the rotational aspect via the rotational coefficient $B$. The larger the $|B|$, the faster the long-time dissipation takes effect. We also conduct space-time estimates describing the inviscid-damping mechanism in our proof. To obtain these inviscid-damping estimates, we find and construct a new explicit orthonormal basis of weighted eigenfunctions for the Laplace operators corresponding to the circular flows. These provide new insights into the mathematical understanding of the 2D Taylor-Couette flows.
Submitted 23 June, 2023;
originally announced June 2023.
-
Complete self-shrinkers with bounded the second fundamental form in $\mathbb{R}^{n+1}$
Authors:
Yayun Chen,
Tongzhu Li
Abstract:
Let $X:M^n\to \mathbb{R}^{n+1}$ be a complete properly immersed self-shrinker. In this paper, we prove that if the squared norm of the second fundamental form $S$ satisfies $1\leq S< C$ for some constant $C$, then $S=1$. Further, we classify the $n$-dimensional complete proper self-shrinkers with constant squared norm of the second fundamental form in $\mathbb{R}^{n+1}$, which solves the conjecture proposed by Q.M. Cheng and G. Wei when the self-shrinker is proper.
Submitted 4 July, 2023; v1 submitted 17 June, 2023;
originally announced June 2023.
-
Importance Sparsification for Sinkhorn Algorithm
Authors:
Mengyu Li,
Jun Yu,
Tao Li,
Cheng Meng
Abstract:
Sinkhorn algorithm has been used pervasively to approximate the solution to optimal transport (OT) and unbalanced optimal transport (UOT) problems. However, its practical application is limited due to the high computational complexity. To alleviate the computational burden, we propose a novel importance sparsification method, called Spar-Sink, to efficiently approximate entropy-regularized OT and UOT solutions. Specifically, our method employs natural upper bounds for unknown optimal transport plans to establish effective sampling probabilities, and constructs a sparse kernel matrix to accelerate Sinkhorn iterations, reducing the computational cost of each iteration from $O(n^2)$ to $\widetilde{O}(n)$ for a sample of size $n$. Theoretically, we show the proposed estimators for the regularized OT and UOT problems are consistent under mild regularity conditions. Experiments on various synthetic data demonstrate Spar-Sink outperforms mainstream competitors in terms of both estimation error and speed. A real-world echocardiogram data analysis shows Spar-Sink can effectively estimate and visualize cardiac cycles, from which one can identify heart failure and arrhythmia. To evaluate the numerical accuracy of cardiac cycle prediction, we consider the task of predicting the end-systole time point using the end-diastole one. Results show Spar-Sink performs as well as the classical Sinkhorn algorithm, requiring significantly less computational time.
Submitted 11 June, 2023;
originally announced June 2023.
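The cost reduction described in the abstract above can be illustrated with a small sketch. It is not the authors' Spar-Sink implementation: the function names `sinkhorn` and `sparsify`, the toy data, and in particular the sampling probabilities (taken proportional to $\sqrt{a_i b_j}$ here) are assumptions made for illustration. The point it shows is that once the kernel is stored sparsely, one Sinkhorn iteration costs time proportional to the number of kept entries rather than $n^2$.

```python
import math
import random


def sinkhorn(K, a, b, iters=200):
    """Sinkhorn scaling for a kernel stored sparsely as {(i, j): value}.

    Each iteration costs time proportional to the number of stored entries.
    """
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        Kv = [0.0] * n
        for (i, j), k in K.items():
            Kv[i] += k * v[j]
        # Rows that lost every entry during sparsification get scaling 0.
        u = [a[i] / Kv[i] if Kv[i] > 0 else 0.0 for i in range(n)]
        Ktu = [0.0] * m
        for (i, j), k in K.items():
            Ktu[j] += k * u[i]
        v = [b[j] / Ktu[j] if Ktu[j] > 0 else 0.0 for j in range(m)]
    # Transport plan diag(u) K diag(v), restricted to the stored entries.
    return {(i, j): u[i] * k * v[j] for (i, j), k in K.items()}


def sparsify(K, a, b, s, rng):
    """Keep entry (i, j) with probability p_ij and rescale it by 1 / p_ij.

    ASSUMPTION: p_ij = min(1, s * sqrt(a_i * b_j) / Z) is an illustrative
    choice; the rescaling keeps each kernel entry unbiased in expectation.
    """
    z = sum(math.sqrt(ai * bj) for ai in a for bj in b)
    sparse = {}
    for (i, j), k in K.items():
        p = min(1.0, s * math.sqrt(a[i] * b[j]) / z)
        if rng.random() < p:
            sparse[(i, j)] = k / p
    return sparse


# Toy problem (hypothetical data): uniform histograms on a grid in [0, 1],
# squared-distance cost, entropic regularization eps.
n = 20
eps = 0.5
points = [i / (n - 1) for i in range(n)]
a = [1.0 / n] * n
b = [1.0 / n] * n
K = {(i, j): math.exp(-(points[i] - points[j]) ** 2 / eps)
     for i in range(n) for j in range(n)}

plan = sinkhorn(K, a, b)                      # dense kernel: n * n entries
K_sparse = sparsify(K, a, b, s=8 * n, rng=random.Random(0))
plan_sparse = sinkhorn(K_sparse, a, b)        # far fewer entries per iteration
print(len(K), len(K_sparse))
```

How close `plan_sparse` is to `plan` depends on the expected number of kept entries `s`; this sketch does not attempt to reproduce the consistency guarantees stated in the abstract.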
-
On the Mathematics of RNA Velocity II: Algorithmic Aspects
Authors:
Tiejun Li,
Yizhuo Wang,
Guoguo Yang,
Peijie Zhou
Abstract:
In a previous paper [CSIAM Trans. Appl. Math. 2 (2021), 1-55], the authors proposed a theoretical framework for the analysis of RNA velocity, which is a promising concept in scRNA-seq data analysis to reveal the cell state-transition dynamical processes underlying snapshot data. The current paper is devoted to the algorithmic study of some key components in RNA velocity workflow. Four important points are addressed in this paper: (1) We construct a rational time-scale fixation method which can determine the global gene-shared latent time for cells. (2) We present an uncertainty quantification strategy for the inferred parameters obtained through the EM algorithm. (3) We establish the optimal criterion for the choice of velocity kernel bandwidth with respect to the sample size in the downstream analysis and discuss its implications. (4) We propose a temporal distance estimation approach between two cell clusters along the cellular development path. Some illustrative numerical tests are also carried out to verify our analysis. These results are intended to provide tools and insights in further development of RNA velocity type methods in the future.
Submitted 9 June, 2023;
originally announced June 2023.
-
A finite theorem for Ahlfors' covering surface theory
Authors:
Tian-Run Li,
Yun-Ling Chen,
Guang-Yuan Zhang
Abstract:
Ahlfors' theory of covering surfaces is one of the major mathematical achievements of the last century. The most important part of his theory is the Second Fundamental Theorem (SFT). We are interested in the relation of errors of Ahlfors' SFT with the same boundary curve.
In this paper we will prove a result which is used to establish the best bound of the constant in Ahlfors' SFT in arXiv.2307.04623.
Precisely speaking, we will prove that for any surface $Σ\in\mathcal{F}_r(L,m)$, a new surface $Σ_1$ can be constructed based on it, such that $R(Σ_1)\ge R(Σ)$ and $L(\partialΣ_1)\le L(\partialΣ)$, where $R(Σ)$ is Ahlfors' error term and $L(\partialΣ)$ is the boundary length of the surface $Σ$, and the covering degree of $Σ_1$ has an upper bound independent of surfaces. Meanwhile, this conclusion suggests that the supremum of $H(Σ)=R(Σ)/L(\partialΣ)$ can be achieved by surfaces in the space $\mathcal{F}_r'(L,m)$.
Submitted 12 July, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Global Finite-Energy Solutions of the Compressible Euler-Poisson Equations for General Pressure Laws with Spherical Symmetry
Authors:
Gui-Qiang G. Chen,
Feimin Huang,
Tianhong Li,
Weiqiang Wang,
Yong Wang
Abstract:
We are concerned with global finite-energy solutions of the three-dimensional compressible Euler-Poisson equations with gravitational potential and general pressure law, especially including the constitutive equation of white dwarf stars. We construct global finite-energy solutions of the Cauchy problem for the Euler-Poisson equations with large initial data of spherical symmetry as the inviscid limit of the solutions of the corresponding Cauchy problem for the Navier-Stokes-Poisson equations. The strong convergence of the vanishing viscosity solutions is achieved through entropy analysis, uniform estimates in $L^p$, and a more general compensated compactness framework via several new ingredients. A key estimate is first established for the integrability of the density over unbounded domains independent of the viscosity coefficient. Then a special entropy pair is carefully designed by solving a Goursat problem for the entropy equation such that a higher integrability of the velocity is established, which is a crucial step. Moreover, the weak entropy kernel for the general pressure law and its fractional derivatives of the required order near vacuum ($ρ=0$) and far-field ($ρ=\infty$) are carefully analyzed. Owing to the generality of the pressure law, only the $W^{-1,p}_{\rm loc}$-compactness of weak entropy dissipation measures with $p\in [1,2)$ can be obtained; this is rescued by the equi-integrability of weak entropy pairs which can be established by the estimates obtained above so that the div-curl lemma still applies. Finally, based on the above analysis of weak entropy pairs, the $L^p$ compensated compactness framework for the compressible Euler equations with general pressure law is established. This new compensated compactness framework and the techniques developed in this paper should be useful for solving further nonlinear problems with similar features.
Submitted 12 March, 2024; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Principal Feature Detection via $Φ$-Sobolev Inequalities
Authors:
Matthew T. C. Li,
Youssef Marzouk,
Olivier Zahm
Abstract:
We investigate the approximation of high-dimensional target measures as low-dimensional updates of a dominating reference measure. This approximation class replaces the associated density with the composition of: (i) a feature map that identifies the leading principal components or features of the target measure, relative to the reference, and (ii) a low-dimensional profile function. When the reference measure satisfies a subspace $φ$-Sobolev inequality, we construct a computationally tractable approximation that yields certifiable error guarantees with respect to the Amari $α$-divergences. Our construction proceeds in two stages. First, for any feature map and any $α$-divergence, we obtain an analytical expression for the optimal profile function. Second, for linear feature maps, the principal features are obtained from eigenvectors of a matrix involving gradients of the log-density. Neither step requires explicit access to normalizing constants. Notably, by leveraging the $φ$-Sobolev inequalities, we demonstrate that these features universally certify approximation errors across the range of $α$-divergences $α\in (0,1]$. We then propose an application to Bayesian inverse problems and provide an analogous construction with approximation guarantees that hold in expectation over the data. We conclude with an extension of the proposed dimension reduction strategy to nonlinear feature maps.
Submitted 16 January, 2024; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Hadwiger's Conjecture for some graphs with independence number two
Authors:
Tong Li,
Qiang Zhou
Abstract:
Let $h(G)$ denote the largest $t$ such that $G$ contains $K_t$ as a minor, and let $χ(G)$ denote the chromatic number of $G$. In 1943, Hadwiger conjectured that $h(G) \geq χ(G)$ for any graph $G$. In this paper, we prove that Hadwiger's conjecture holds for $H$-free graphs with independence number two, where $H$ is one of some specified graphs.
Submitted 30 March, 2024; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Logarithmic-Regret Quantum Learning Algorithms for Zero-Sum Games
Authors:
Minbo Gao,
Zhengfeng Ji,
Tongyang Li,
Qisheng Wang
Abstract:
We propose the first online quantum algorithm for zero-sum games with $\tilde O(1)$ regret under the game setting. Moreover, our quantum algorithm computes an $\varepsilon$-approximate Nash equilibrium of an $m \times n$ matrix zero-sum game in quantum time $\tilde O(\sqrt{m+n}/\varepsilon^{2.5})$, yielding a quadratic improvement over classical algorithms in terms of $m, n$. Our algorithm uses standard quantum inputs and generates classical outputs with succinct descriptions, facilitating end-to-end applications. As an application, we obtain a fast quantum linear programming solver. Technically, our online quantum algorithm "quantizes" classical algorithms based on the optimistic multiplicative weight update method. At the heart of our algorithm is a fast quantum multi-sampling procedure for the Gibbs sampling problem, which may be of independent interest.
Submitted 27 April, 2023;
originally announced April 2023.
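The abstract above says the quantum algorithm builds on the classical optimistic multiplicative weight update method. A minimal classical sketch of that underlying method is given below; it contains no quantum component and is not the authors' algorithm. The function names `omwu` and `duality_gap`, the step size `eta`, the horizon `T`, and the 2x2 payoff matrix are all assumptions chosen for illustration. The row player minimizes $x^\top A y$ and the column player maximizes it; each update uses the latest payoff vector twice and subtracts the previous one, which is the "optimistic" correction.

```python
import math


def normalize(w):
    """Rescale nonnegative weights to a probability vector."""
    total = sum(w)
    return [wi / total for wi in w]


def omwu(A, eta=0.1, T=2000):
    """Optimistic multiplicative weights for the matrix game min_x max_y x^T A y.

    Returns the time-averaged strategies of both players.
    """
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    g_prev, h_prev = [0.0] * n, [0.0] * m
    x_sum, y_sum = [0.0] * n, [0.0] * m
    for _ in range(T):
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
        y_sum = [s + yj for s, yj in zip(y_sum, y)]
        # Losses of the row player and gains of the column player this round.
        g = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        h = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        # Optimistic step: count the newest vector twice, remove the old one.
        x = normalize([xi * math.exp(-eta * (2 * gi - gp))
                       for xi, gi, gp in zip(x, g, g_prev)])
        y = normalize([yj * math.exp(eta * (2 * hj - hp))
                       for yj, hj, hp in zip(y, h, h_prev)])
        g_prev, h_prev = g, h
    return [s / T for s in x_sum], [s / T for s in y_sum]


def duality_gap(A, x, y):
    """Best-response gap; it is zero exactly at a Nash equilibrium."""
    n, m = len(A), len(A[0])
    best_col = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_row = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_col - best_row


# Hypothetical 2x2 game with a unique mixed equilibrium.
A = [[2.0, -1.0], [-1.0, 1.0]]
x_bar, y_bar = omwu(A)
gap = duality_gap(A, x_bar, y_bar)
print(x_bar, y_bar, gap)
```

A pair of averaged strategies with duality gap at most $\varepsilon$ is an $\varepsilon$-approximate Nash equilibrium in the sense used in the abstract.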
-
On Beckner's Inequality for Axially Symmetric Functions on $\mathbb{S}^6$
Authors:
Changfeng Gui,
Tuoxin Li,
Juncheng Wei,
Zikai Ye
Abstract:
We prove that axially symmetric solutions to the $Q$-curvature type problem $$ αP_6 u + 120(1-\frac{e^{6u}}{\int_{\mathbb{S}^6} e^{6u}})=0 \ \ \ \ \ \mbox{on} \ \mathbb{S}^6 $$ must be constants, provided that $ \frac{1}{2}\leq α<1$. In view of the existence of non-constant solutions obtained by Gui-Hu-Xie \cite{GHW2022} for $\frac{1}{7}<α<\frac{1}{2}$, this result is sharp. This result closes the gap left by the related results in \cite{GHW2022}, which proved a similar uniqueness result for $α\geq 0.6168$. The improvement is based on two types of new estimates: one is a better estimate of the semi-norm $\lfloor G\rfloor^2$, and the other is a family of refined estimates on Gegenbauer coefficients, such as pointwise decay and cancellation properties.
Submitted 10 April, 2023;
originally announced April 2023.
-
RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research
Authors:
Ching Pui Wan,
Tung Li,
Jason Min Wang
Abstract:
Reinforcement learning has been applied in operation research and has shown promise in solving large combinatorial optimization problems. However, existing works focus on developing neural network architectures for certain problems. These works lack the flexibility to incorporate recent advances in reinforcement learning, as well as the flexibility of customizing model architectures for operation research problems. In this work, we analyze the end-to-end autoregressive models for vehicle routing problems and show that these models can benefit from the recent advances in reinforcement learning with a careful re-implementation of the model architecture. In particular, we re-implemented the Attention Model and trained it with Proximal Policy Optimization (PPO) in CleanRL, showing at least 8 times speed up in training time. We hereby introduce RLOR, a flexible framework for Deep Reinforcement Learning for Operation Research. We believe that a flexible framework is key to developing deep reinforcement learning models for operation research problems. The code of our work is publicly available at https://github.com/cpwan/RLOR.
Submitted 23 March, 2023;
originally announced March 2023.
-
Algebraic capacity as tropical polynomial over $c_1$-nef symplectic cone
Authors:
Tian-Jun Li,
Shengzhen Ning
Abstract:
In a series of works [Wor22], [Wor21] and [CW20], algebraic capacity was introduced in an algebraic manner for polarized surfaces and applied to the symplectic embedding problems. In this note, we give a reformulation of algebraic capacity in terms of almost complex geometry. For rational surfaces with $c_1\cdotω> 0$, we further introduce a sequence of tropical polynomials which will describe those capacities viewed as functions over the space of such symplectic forms. As an application, we give a direct proof of the correspondence between algebraic capacity and ECH capacity for smooth toric surfaces without terminology from algebraic geometry.
Submitted 22 March, 2023;
originally announced March 2023.
-
Random Inverse Problems Over Graphs: Decentralized Online Learning
Authors:
Tao Li,
Xiwei Zhang
Abstract:
We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2-asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.
Submitted 29 May, 2024; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators
Authors:
Tianyou Li,
Fan Chen,
Huajie Chen,
Zaiwen Wen
Abstract:
Understanding stochastic gradient descent (SGD) and its variants is essential for machine learning. However, most of the preceding analyses are conducted under amenable conditions such as unbiased gradient estimator and bounded objective functions, which do not encompass many sophisticated applications, such as variational Monte Carlo, entropy-regularized reinforcement learning and variational inference. In this paper, we consider the SGD algorithm that employs the Markov Chain Monte Carlo (MCMC) estimator to compute the gradient, called MCMC-SGD. Since MCMC reduces the sampling complexity significantly, it is an asymptotically convergent biased estimator in practice. Moreover, by incorporating a general class of unbounded functions, it is much more difficult to analyze the MCMC sampling error. Therefore, we assume that the function is sub-exponential and use the Bernstein inequality for non-stationary Markov chains to derive error bounds of the MCMC estimator. Consequently, MCMC-SGD is proven to have a first order convergence rate $O(\log K/\sqrt{n K})$ with $K$ iterations and a sample size $n$. It partially explains how MCMC influences the behavior of SGD. Furthermore, we verify the correlated negative curvature condition under reasonable assumptions. It is shown that MCMC-SGD escapes from saddle points and reaches $(ε,ε^{1/4})$ approximate second order stationary points or $ε^{1/2}$-variance points at least $O(ε^{-11/2}\log^{2}(1/ε) )$ steps with high probability. Our analysis unveils the convergence pattern of MCMC-SGD across a broad class of stochastic optimization problems, and interprets the convergence phenomena observed in practical applications.
Submitted 23 March, 2024; v1 submitted 19 March, 2023;
originally announced March 2023.
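The MCMC-SGD idea in the abstract above, where each gradient is estimated from a short and therefore biased Markov chain run, can be sketched on a toy problem. This is a deliberately simplified illustration and not the authors' setting: the sampling distribution is a fixed Gaussian that does not depend on the parameter (unlike variational Monte Carlo), and the names `metropolis_step` and `mcmc_sgd`, the proposal width, the step-size schedule, and all constants are assumptions. The objective is $F(θ) = \mathbb{E}_{x}[(θ - x)^2/2]$ with $x \sim N(μ, 1)$, whose minimizer is $θ = μ$.

```python
import math
import random


def metropolis_step(x, log_density, rng, width=1.0):
    """One random-walk Metropolis step with a symmetric uniform proposal."""
    proposal = x + rng.uniform(-width, width)
    log_ratio = log_density(proposal) - log_density(x)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return x


def mcmc_sgd(mu=3.0, iters=500, n=20, seed=0):
    """SGD on F(theta) = E[(theta - x)^2 / 2], x ~ N(mu, 1), sampled by MCMC.

    The chain is persistent across iterations and has no burn-in, so each
    gradient estimate is biased, as in the setting the abstract describes.
    """
    rng = random.Random(seed)

    def log_density(x):
        return -0.5 * (x - mu) ** 2  # unnormalized log-density of N(mu, 1)

    theta, x = 0.0, 0.0
    for k in range(iters):
        samples = []
        for _ in range(n):
            x = metropolis_step(x, log_density, rng)
            samples.append(x)
        grad = theta - sum(samples) / n   # MCMC estimate of F'(theta)
        theta -= grad / (k + 1)           # decaying step size 1 / (k + 1)
    return theta


theta_hat = mcmc_sgd()
print(theta_hat)
```

Only the unnormalized log-density is needed by the sampler, which is the usual reason for preferring MCMC when the normalizing constant is unavailable.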