-
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Authors:
Rui Pan,
Jipeng Zhang,
Xingyuan Pan,
Renjie Pi,
Xiaoyu Wang,
Tong Zhang
Abstract:
Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently, a paradigm of first-order algorithms emerged, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particularly in the context of large language models (LLMs). This paper introduces the first scalable instantiation of this paradigm called ScaleBiO, focusing on bilevel optimization for large-scale LLM data reweighting. By combining with a recently proposed memory-efficient training technique called LISA, our novel algorithm allows the paradigm to scale to 34-billion-parameter LLMs on eight A40 GPUs, marking the first successful application of bilevel optimization under practical scenarios for large-sized LLMs. Empirically, extensive experiments on data reweighting verify the effectiveness of ScaleBiO for different-scaled models, including GPT-2, LLaMA-3-8B, GPT-NeoX-20B, and Yi-34B, where bilevel optimization succeeds in filtering irrelevant data samples and selecting informative samples. Theoretically, ScaleBiO ensures the optimality of the learned data weights, along with a convergence guarantee matching the conventional first-order bilevel optimization paradigm on smooth and strongly convex objectives.
Submitted 28 June, 2024;
originally announced June 2024.
-
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Authors:
Yuxing Liu,
Rui Pan,
Tong Zhang
Abstract:
Adaptive gradient algorithms have been widely adopted in training large-scale deep neural networks, especially large foundation models. Despite their huge success in practice, their theoretical advantages over stochastic gradient descent (SGD) have not been fully understood, especially in the large batch-size setting commonly used in practice. This is because the only theoretical result that can demonstrate the benefit of Adagrad over SGD was obtained in the original paper of Adagrad for nonsmooth objective functions. However, for nonsmooth objective functions, there can be a linear slowdown of convergence when batch size increases, and thus a convergence analysis based on nonsmooth assumption cannot be used for large batch algorithms. In this work, we resolve this gap between theory and practice by providing a new analysis of Adagrad on both convex and nonconvex smooth objectives suitable for the large batch setting. It is shown that under the anisotropic smoothness and noise conditions, increased batch size does not slow down convergence for Adagrad, and thus it can still achieve a faster convergence guarantee over SGD even in the large batch setting. We present detailed comparisons between SGD and Adagrad to provide a better understanding of the benefits of adaptive gradient methods. Experiments in logistic regression and instruction following fine-tuning tasks provide strong evidence to support our theoretical analysis.
Submitted 21 June, 2024;
originally announced June 2024.
-
On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization
Authors:
Motahareh Sohrabi,
Juan Ramirez,
Tianyue H. Zhang,
Simon Lacoste-Julien,
Jose Gallego-Posada
Abstract:
Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the lack of reliable, general-purpose update schemes for the Lagrange multipliers. This paper proposes the $ν$PI algorithm and contributes an optimization perspective on Lagrange multiplier updates based on PI controllers, extending the work of Stooke, Achiam and Abbeel (2020). We provide theoretical and empirical insights explaining the inability of momentum methods to address the shortcomings of gradient descent-ascent, and contrast this with the empirical success of our proposed $ν$PI controller. Moreover, we prove that $ν$PI generalizes popular momentum methods for single-objective minimization. Our experiments demonstrate that $ν$PI reliably stabilizes the multiplier dynamics and its hyperparameters enjoy robust and predictable behavior.
Submitted 6 June, 2024;
originally announced June 2024.
-
A structure-preserving scheme for computing effective diffusivity and anomalous diffusion phenomena of random flows
Authors:
Tan Zhang,
Zhongjian Wang,
Jack Xin,
Zhiwen Zhang
Abstract:
This paper aims to investigate the diffusion behavior of particles moving in stochastic flows under a structure-preserving scheme. We compute the effective diffusivity for normal diffusive random flows and establish the power law between spatial and temporal variables for cases with anomalous diffusion phenomena. From a Lagrangian approach, we separate the corresponding stochastic differential equations (SDEs) into sub-problems and construct a one-step structure-preserving method to solve them. Then by modified equation systems, the convergence analysis in calculating the effective diffusivity is provided and compared between the structure-preserving scheme and the Euler-Maruyama scheme. Also, we provide the error estimate for the structure-preserving scheme in calculating the power law for a series of super-diffusive random flows. Finally, we calculate the effective diffusivity and anomalous diffusion phenomena for a series of 2D and 3D random fields.
Submitted 29 May, 2024;
originally announced May 2024.
-
On the Sequence Evaluation based on Stochastic Processes
Authors:
Tianhao Zhang,
Zhexiao Lin,
Zhecheng Sheng,
Chen Jiang,
Dongyeop Kang
Abstract:
Modeling and analyzing long sequences of text is an essential task for Natural Language Processing. Success in capturing long text dynamics using neural language models will facilitate many downstream tasks such as coherence evaluation, text generation, machine translation and so on. This paper presents a novel approach to model sequences through a stochastic process. We introduce a likelihood-based training objective for the text encoder and design a more thorough measurement (score) for long text evaluation compared to the previous approach. The proposed training objective effectively preserves the sequence coherence, while the new score comprehensively captures both temporal and spatial dependencies. Theoretical properties of our new score show its advantages in sequence evaluation. Experimental results show superior performance in various sequence evaluation tasks, including global and local discrimination within and between documents of different lengths. We also demonstrate the encoder achieves competitive results on discriminating human and AI written text.
Submitted 15 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Ergodicity for 2D Navier-Stokes equations with a degenerate pure jump noise
Authors:
Xuhui Peng,
Jianliang Zhai,
Tusheng Zhang
Abstract:
In this paper, we establish the ergodicity for stochastic 2D Navier-Stokes equations driven by a highly degenerate pure jump Lévy noise. The noise could appear in as few as four directions. This gives an affirmative answer to a longstanding problem. The case of Gaussian noise was treated in Hairer and Mattingly [Ann. of Math., 164(3):993--1032, 2006]. To obtain the uniqueness of invariant measure, we use Malliavin calculus and anticipating stochastic calculus to establish the equi-continuity of the semigroup, the so-called e-property, and prove some weak irreducibility of the solution process.
Submitted 1 May, 2024;
originally announced May 2024.
-
An improvement and generalization of Rotfel'd type inequalities for sectorial matrices
Authors:
Nan Fanghong,
Teng Zhang
Abstract:
By using equivalence conditions for sectorial matrices obtained by Alakhrass and Sababheh in 2020, we improve a Rotfel'd type inequality for sectorial matrices derived by P. Zhang in 2015 and generalize a result derived by Y. Mao et al. in 2024.
Submitted 14 April, 2024;
originally announced April 2024.
-
The foundation of generalized parallel connections, 2-sums, and segment-cosegment exchanges of matroids
Authors:
Matthew Baker,
Oliver Lorscheid,
Zach Walsh,
Tianyi Zhang
Abstract:
We show that, under suitable hypotheses, the foundation of a generalized parallel connection of matroids is the relative tensor product of the foundations. Using this result, we show that the foundation of a 2-sum of matroids is the absolute tensor product of the foundations, and that the foundation of a matroid is invariant under segment-cosegment exchange.
Submitted 16 April, 2024;
originally announced April 2024.
-
Theoretical Guarantees for the Subspace-Constrained Tyler's Estimator
Authors:
Gilad Lerman,
Feng Yu,
Teng Zhang
Abstract:
This work analyzes the subspace-constrained Tyler's estimator (STE) designed for recovering a low-dimensional subspace within a dataset that may be highly corrupted with outliers. It assumes a weak inlier-outlier model and allows the fraction of inliers to be smaller than a fraction that leads to computational hardness of the robust subspace recovery problem. It shows that in this setting, if the initialization of STE, which is an iterative algorithm, satisfies a certain condition, then STE can effectively recover the underlying subspace. It further shows that under the generalized haystack model, STE initialized by Tyler's M-estimator (TME) can recover the subspace when the fraction of inliers is too small for TME to handle.
Submitted 12 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Authors:
Rui Pan,
Xiang Liu,
Shizhe Diao,
Renjie Pi,
Jipeng Zhang,
Chi Han,
Tong Zhang
Abstract:
The machine learning community has witnessed impressive advancements since large language models (LLMs) first appeared. Yet, their massive memory consumption has become a significant roadblock to large-scale training. For instance, a 7B model typically requires at least 60 GB of GPU memory with full parameter training, which presents challenges for researchers without access to high-resource environments. Parameter Efficient Fine-Tuning techniques such as Low-Rank Adaptation (LoRA) have been proposed to alleviate this problem. However, in most large-scale fine-tuning settings, their performance does not reach the level of full parameter training because they confine the parameter search to a low-rank subspace. Attempting to complement this deficiency, we investigate the layerwise properties of LoRA on fine-tuning tasks and observe an unexpected but consistent skewness of weight norms across different layers. Utilizing this key observation, a surprisingly simple training strategy is discovered, which outperforms both LoRA and full parameter training in a wide range of settings with memory costs as low as LoRA. We name it Layerwise Importance Sampled AdamW (LISA), a promising alternative for LoRA, which applies the idea of importance sampling to different layers in LLMs and randomly freezes most middle layers during optimization. Experimental results show that with similar or less GPU memory consumption, LISA surpasses LoRA or even full parameter tuning in downstream fine-tuning tasks, where LISA consistently outperforms LoRA by over 10%-35% in terms of MT-Bench score while achieving on-par or better performance in MMLU, AGIEval and WinoGrande. On large models, specifically LLaMA-2-70B, LISA surpasses LoRA on MT-Bench, GSM8K, and PubMedQA, demonstrating its effectiveness across different domains.
Submitted 25 May, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Dynamics of a memory-based diffusion model with spatial heterogeneity and nonlinear boundary condition
Authors:
Quanli Ji,
Ranchao Wu,
Tonghua Zhang
Abstract:
In this work, we study the dynamics of a spatially heterogeneous single population model with the memory effect and nonlinear boundary condition. By virtue of the implicit function theorem and Lyapunov-Schmidt reduction, spatially nonconstant positive steady state solutions appear from two trivial solutions, respectively. By using bifurcation analysis, the Hopf bifurcation associated with one spatially nonconstant positive steady state is found to occur. The results complement the existing ones. Specifically, it is found that with the interaction of spatial heterogeneity and nonlinear boundary condition, when the memory term is stronger than the interaction of the interior reaction term and the boundary one, the memory-based diffusive model has a single stability switch from stability to instability, with the increase of the delayed memory value. Therefore, the memory delay will lead to a single stability switch of such memory-based diffusive model and consequently the Hopf bifurcation will happen in the model.
Submitted 22 March, 2024;
originally announced March 2024.
-
An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling
Authors:
Xunpeng Huang,
Hanze Dong,
Difan Zou,
Tong Zhang
Abstract:
Understanding the dimension dependency of computational complexity in high-dimensional sampling problems is a fundamental problem, both from a practical and theoretical perspective. Compared with samplers with unbiased stationary distribution, e.g., Metropolis-adjusted Langevin algorithm (MALA), biased samplers, e.g., Underdamped Langevin Dynamics (ULD), perform better in low-accuracy cases because of a lower dimension dependency in their complexities. Along this line, Freund et al. (2022) suggest that the modified Langevin algorithm with prior diffusion is able to converge dimension independently for strongly log-concave target distributions. Nonetheless, it remains open whether such a property holds for more general cases. In this paper, we investigate the prior diffusion technique for the target distributions satisfying log-Sobolev inequality (LSI), which covers a much broader class of distributions compared to the strongly log-concave ones. In particular, we prove that the modified Langevin algorithm can also obtain the dimension-independent convergence of KL divergence with different step size schedules. The core of our proof technique is a novel construction of an interpolating SDE, which significantly helps to conduct a more accurate characterization of the discrete updates of the overdamped Langevin dynamics. Our theoretical analysis demonstrates the benefits of prior diffusion for a broader class of target distributions and provides new insights into developing faster sampling algorithms.
Submitted 10 March, 2024;
originally announced March 2024.
-
Debiased Projected Two-Sample Comparisons for Single-Cell Expression Data
Authors:
Tianyu Zhang,
Jing Lei,
Kathryn Roeder
Abstract:
We study several variants of the high-dimensional mean inference problem motivated by modern single-cell genomics data. By taking advantage of low-dimensional and localized signal structures commonly seen in such data, our proposed methods not only have the usual frequentist validity but also provide useful information on the potential locations of the signal if the null hypothesis is rejected. Our method adaptively projects the high-dimensional vector onto a low-dimensional space, followed by a debiasing step using the semiparametric double-machine learning framework. Our analysis shows that debiasing is unnecessary under the global null, but necessary under a ``projected null'' that is of scientific interest. We also propose an ``anchored projection'' to maximize the power while avoiding the degeneracy issue under the null. Experiments on synthetic data and a real single-cell sequencing dataset demonstrate the effectiveness and interpretability of our methods.
Submitted 8 March, 2024;
originally announced March 2024.
-
Projected Gradient Descent Algorithm for Low-Rank Matrix Estimation
Authors:
Teng Zhang,
Xing Fan
Abstract:
Most existing methodologies of estimating low-rank matrices rely on Burer-Monteiro factorization, but these approaches can suffer from slow convergence, especially when dealing with solutions characterized by a large condition number, defined by the ratio of the largest to the $r$-th singular values, where $r$ is the search rank. While methods such as Scaled Gradient Descent have been proposed to address this issue, such methods are more complicated and sometimes have weaker theoretical guarantees, for example, in the rank-deficient setting. In contrast, this paper demonstrates the effectiveness of the projected gradient descent algorithm. Firstly, its local convergence rate is independent of the condition number. Secondly, under conditions where the objective function is rank-$2r$ restricted $L$-smooth and $μ$-strongly convex, with $L/μ< 3$, projected gradient descent with appropriate step size converges linearly to the solution. Moreover, a perturbed version of this algorithm effectively navigates away from saddle points, converging to an approximate solution or a second-order local minimizer across a wide range of step sizes. Furthermore, we establish that there are no spurious local minimizers in estimating asymmetric low-rank matrices when the objective function satisfies $L/μ<3.$
Submitted 5 March, 2024;
originally announced March 2024.
-
Wong-Zakai approximations and support theorems for SDEs under Lyapunov conditions
Authors:
Qi Li,
Jianliang Zhai,
Tusheng Zhang
Abstract:
In this paper, we establish the Stroock-Varadhan type support theorems for stochastic differential equations (SDEs) under Lyapunov conditions, which significantly improve the existing results in the literature where the coefficients of the SDEs are required to be globally Lipschitz and of linear growth. Our conditions are mild enough to include many important models, e.g. Threshold Ornstein-Uhlenbeck process, Stochastic SIR model, Stochastic Lotka-Volterra systems, Stochastic Duffing-van der Pol oscillator model, which have polynomial coefficients. To obtain the support theorem, we prove a new Wong-Zakai approximation problem, which is of independent interest.
Submitted 2 March, 2024;
originally announced March 2024.
-
Large Deviation Principle of Stochastic Evolution Equations with reflection
Authors:
Zdzisław Brzeźniak,
Qi Li,
Tusheng Zhang
Abstract:
In this paper, we establish a large deviation principle for stochastic evolution equations with reflection in an infinite dimensional ball. The weak convergence approach plays an important role.
Submitted 2 March, 2024;
originally announced March 2024.
-
Noether inequality for irregular threefolds of general type
Authors:
Yong Hu,
Tong Zhang
Abstract:
Let $X$ be a smooth irregular $3$-fold of general type over $\mathbb{C}$. We prove that the optimal Noether inequality $$ \mathrm{vol}(X) \ge \frac{4}{3}p_g(X) $$ holds if $p_g(X) \ge 16$ or if $X$ has a Gorenstein minimal model. Moreover, when $X$ attains the equality and $p_g(X) \ge 16$, its canonical model can be explicitly described.
Submitted 27 February, 2024;
originally announced February 2024.
-
Toughness and Aα-spectral radius in graphs
Authors:
Sizhong Zhou,
Yuli Zhang,
Tao Zhang,
Hongxia Liu
Abstract:
Let $α\in[0,1)$, and let $G$ be a connected graph of order $n$ with $n\geq f(α)$, where $f(α)=6$ for $α\in[0,\frac{2}{3}]$ and $f(α)=\frac{4}{1-α}$ for $α\in(\frac{2}{3},1)$. A graph $G$ is said to be $t$-tough if $|S|\geq tc(G-S)$ for each subset $S$ of $V(G)$ with $c(G-S)\geq2$, where $c(G-S)$ is the number of connected components in $G-S$. The $A_α$-spectral radius of $G$ is denoted by $ρ_α(G)$. In this paper, it is verified that $G$ is a 1-tough graph unless $G=K_1\vee(K_{n-2}\cup K_1)$ if $ρ_α(G)\geqρ_α(K_1\vee(K_{n-2}\cup K_1))$, where $ρ_α(K_1\vee(K_{n-2}\cup K_1))$ equals the largest root of $x^{3}-((α+1)n+α-3)x^{2}+(αn^{2}+(α^{2}-α-1)n-2α+1)x-α^{2}n^{2}+(3α^{2}-α+1)n-4α^{2}+5α-3=0$. Further, we present an $A_α$-spectral radius condition for a graph to be a $t$-tough graph.
Submitted 27 February, 2024;
originally announced February 2024.
-
Uniform large deviations and metastability of random dynamical systems
Authors:
Jifa Jiang,
Jian Wang,
Jianliang Zhai,
Tusheng Zhang
Abstract:
In this paper, we first provide a criterion on uniform large deviation principles (ULDP) of stochastic differential equations under Lyapunov conditions on the coefficients, which can be applied to stochastic systems with coefficients of polynomial growth and possibly degenerate driving noises. In the second part, using the ULDP criterion we preclude the concentration of limiting measures of invariant measures of stochastic dynamical systems on repellers and acyclic saddle chains and extend Freidlin and Wentzell's asymptotics theorem to stochastic systems with unbounded coefficients. Of particular interest, we determine the limiting measures of the invariant measures of the famous stochastic van der Pol equation and van der Pol Duffing equation whose noises are naturally degenerate. We also construct two examples to match the global phase portraits of Freidlin and Wentzell's unperturbed systems and to explicitly compute their transition difficulty matrices. Other applications include stochastic May-Leonard system and random systems with infinitely many equivalence classes.
Submitted 26 February, 2024;
originally announced February 2024.
-
Signed Mahonian Polynomials on Derangements in Classical Weyl Groups
Authors:
Kathy Q. Ji,
Dax T. X. Zhang
Abstract:
The polynomial of the major index ${\rm maj}_W (σ)$ over the subset $T$ of the Coxeter group $W$ is called the Mahonian polynomial over $T$, where ${\rm maj}_W (σ)$ is a Mahonian statistic of an element $σ\in T$, whereas the polynomial of the major index ${\rm maj}_W (σ)$ with the sign $(-1)^{\ell_W(σ)}$ over the subset $T$ is referred to as the signed Mahonian polynomial over $T$, where ${\ell_W(σ)}$ is the length of $σ\in T$. Gessel, Wachs, and Chow established the formulas for the Mahonian polynomials over the sets of derangements in the symmetric group $S_n$ and the hyperoctahedral group $B_n$. By extending Wachs' approach and employing a refinement of Stanley's shuffle theorem established in our recent paper, we derive the formula for the Mahonian polynomials over the set of derangements in the even-signed permutation group $D_n$. This completes a picture which is now known for all the classical Weyl groups. Gessel-Simion, Adin-Gessel-Roichman, and Biagioli previously established formulas for the signed Mahonian polynomials over the classical Weyl groups. Building upon their formulas, we derive the formulas for the signed Mahonian polynomials over the set of derangements in classical Weyl groups. As applications of the formulas for the (signed) Mahonian polynomials over the sets of derangements in the classical Weyl groups, we obtain enumerative formulas of the number of derangements in classical Weyl groups with even lengths.
Submitted 5 February, 2024;
originally announced February 2024.
-
New results on sparse representations in unions of orthonormal bases
Authors:
Tao Zhang,
Gennian Ge
Abstract:
The problem of sparse representation has significant applications in signal processing. The spark of a dictionary plays a crucial role in the study of sparse representation. Donoho and Elad initially explored the spark, and they provided a general lower bound. When the dictionary is a union of several orthonormal bases, Gribonval and Nielsen presented an improved lower bound for spark. In this paper, we introduce a new construction of dictionary, achieving the spark bound given by Gribonval and Nielsen. Our result extends Shen et al.'s findings [IEEE Trans. Inform. Theory, vol. 68, pp. 4230--4243, 2022].
△ Less
Submitted 28 January, 2024;
originally announced January 2024.
-
Newton polytopes of dual $k$-Schur polynomials
Authors:
Bo Wang,
Candice X. T. Zhang,
Zhong-Xue Zhang
Abstract:
Rado's theorem about permutahedra and dominance order on partitions reveals that each Schur polynomial is M-convex, or equivalently, it has a saturated Newton polytope and this polytope is a generalized permutahedron as well. In this paper we show that the support of each dual $k$-Schur polynomial indexed by a $k$-bounded partition coincides with that of the Schur polynomial indexed by the same partition, and hence the two polynomials share the same saturated Newton polytope. The main result is based on our recursive algorithm to generate a semistandard $k$-tableau for a given shape and $k$-weight. As consequences, we obtain the M-convexity of dual $k$-Schur polynomials, affine Stanley symmetric polynomials and cylindric skew Schur polynomials.
Submitted 25 January, 2024;
originally announced January 2024.
-
A simple stochastic nonlinear AR model with application to bubble
Authors:
Xuanling Yang,
Dong Li,
Ting Zhang
Abstract:
Economic and financial time series can feature locally explosive behavior when a bubble is formed. The economic or financial bubble, especially its dynamics, is an intriguing topic that has been attracting longstanding attention. To illustrate the dynamics of the local explosion itself, the paper presents a novel, simple, yet useful time series model, called the stochastic nonlinear autoregressive model, which is always strictly stationary and geometrically ergodic and can create long swings or persistence observed in many macroeconomic variables. When a nonlinear autoregressive coefficient is outside of a certain range, the model has periodically explosive behaviors and can then be used to portray the bubble dynamics. Further, the quasi-maximum likelihood estimation (QMLE) of our model is considered, and its strong consistency and asymptotic normality are established under minimal assumptions on innovation. A new model diagnostic checking statistic is developed for model fitting adequacy. In addition two methods for bubble tagging are proposed, one from the residual perspective and the other from the null-state perspective. Monte Carlo simulation studies are conducted to assess the performances of the QMLE and the two bubble tagging methods in finite samples. Finally, the usefulness of the model is illustrated by an empirical application to the monthly Hang Seng Index.
Submitted 13 January, 2024;
originally announced January 2024.
-
Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo
Authors:
Xunpeng Huang,
Difan Zou,
Hanze Dong,
Yian Ma,
Tong Zhang
Abstract:
To sample from a general target distribution $p_*\propto e^{-f_*}$ beyond the isoperimetric condition, Huang et al. (2023) proposed to perform sampling through reverse diffusion, giving rise to Diffusion-based Monte Carlo (DMC). Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution to the standard Gaussian, utilizing a non-parametric score estimation. However, the original DMC algorithm encountered high gradient complexity, resulting in an exponential dependency on the error tolerance $ε$ of the obtained samples. In this paper, we demonstrate that the high complexity of DMC originates from its redundant design of score estimation, and propose a more efficient algorithm, called RS-DMC, based on a novel recursive score estimation method. In particular, we first divide the entire diffusion process into multiple segments and then formulate the score estimation step (at any time step) as a series of interconnected mean estimation and sampling subproblems accordingly, which are correlated in a recursive manner. Importantly, we show that with a proper design of the segment decomposition, all sampling subproblems will only need to tackle a strongly log-concave distribution, which can be solved very efficiently using Langevin-based samplers with a provably rapid convergence rate. As a result, we prove that the gradient complexity of RS-DMC only has a quasi-polynomial dependency on $ε$, which significantly improves on the exponential gradient complexity in Huang et al. (2023). Furthermore, under commonly used dissipative conditions, our algorithm is provably much faster than the popular Langevin-based algorithms. Our algorithm design and theoretical framework illuminate a novel direction for addressing sampling problems, which could be of broader applicability in the community.
Submitted 11 January, 2024;
originally announced January 2024.
-
Proof of Audenaert-Kittaneh's Conjecture
Authors:
Teng Zhang
Abstract:
By using Hadamard's 3-lines theorem for a certain analytic function defined in terms of the trace, we prove the Audenaert-Kittaneh conjecture related to the $p$-Schatten class.
Submitted 26 February, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Solving multiscale dynamical systems by deep learning
Authors:
Zhi-Qin John Xu,
Junjie Yao,
Yuxiao Yi,
Liangkai Hang,
Weinan E,
Yaoyu Zhang,
Tianhan Zhang
Abstract:
Multiscale dynamical systems, modeled by high-dimensional stiff ordinary differential equations (ODEs) with wide-ranging characteristic timescales, arise across diverse fields of science and engineering, but their numerical solvers often encounter severe efficiency bottlenecks. This paper introduces a novel DeePODE method, which consists of a global multiscale sampling method and a fitting by deep neural networks to handle multiscale systems. DeePODE's primary contribution is to address the multiscale challenge of efficiently uncovering representative training sets by combining the Monte Carlo method and the ODE system's intrinsic evolution without suffering from the ``curse of dimensionality''. The DeePODE method is validated in multiscale systems from diverse areas, including a predator-prey model, a power system oscillation, a battery electrolyte auto-ignition, and turbulent flames. Our methods exhibit strong generalization capabilities to unseen conditions, highlighting the power of deep learning in modeling intricate multiscale dynamical processes across science and engineering domains.
Submitted 2 January, 2024;
originally announced January 2024.
-
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
Authors:
Rui Pan,
Yuxing Liu,
Xiaoyu Wang,
Tong Zhang
Abstract:
Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical property is still quite limited, especially under the standard anisotropic gradient noise condition for quadratic regression problems. Although it is widely conjectured that heavy-ball momentum method can provide accelerated convergence and should work well in large batch settings, there is no rigorous theoretical analysis. In this paper, we fill this theoretical gap by establishing a non-asymptotic convergence bound for stochastic heavy-ball methods with step decay scheduler on quadratic objectives, under the anisotropic gradient noise condition. As a direct implication, we show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrtκ)$ accelerated convergence of the bias term of SGD while still achieving near-optimal convergence rate with respect to the stochastic variance term. The combined effect implies an overall convergence rate within log factors from the statistical minimax rate. This means SGD with heavy-ball momentum is useful in the large-batch settings such as distributed machine learning or federated learning, where a smaller number of iterations can significantly reduce the number of communication rounds, leading to acceleration in practice.
Submitted 17 March, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Langlands Dualities through Bethe/Gauge Correspondence for 3d Gauge Theories
Authors:
Xiang-Mao Ding,
Ting Zhang
Abstract:
For non-simply-laced Lie algebras, $\text{B}_{N}$ and $\text{C}_{N}$ are Langlands dual to each other in mathematics. In this article, we give another Bethe/Gauge correspondence between 3d (or 2d) classical Lie group supersymmetric gauge theories and closed and open $\text{XXZ}$ (or $\text{XXX}$) spin chains. Here, the representations of the $\text{ADE}$ Lie algebras are self-dual, while for the non-simply-laced Lie algebras $\text{B}_{N}$ and $\text{C}_{N}$, their roles are exchanged in contrast with the results in \cite{DZ23a}. From the Bethe/Gauge correspondence point of view, the two types of effective superpotentials are Langlands dual to each other. For the $\text{B}_{N}$-type Lie algebra, a remarkable feature is that, to fix the spin sites by boundaries through Bethe/Gauge, the spins of the sites will be reversed. This is similar to the so-called electron-hole effect; we call it a boundary-spin effect, a new kind of duality.
Submitted 20 December, 2023;
originally announced December 2023.
-
On the proximal point algorithms for solving the monotone inclusion problem
Authors:
Tao Zhang,
Shiru Li,
Yong Xia
Abstract:
We consider finding a zero point of the maximally monotone operator $T$. First, instead of using the proximal point algorithm (PPA) for this purpose, we employ PPA to solve its Yosida regularization $T_λ$. Then, based on an $O(a_{k+1})$ ($a_{k+1}\geq \varepsilon>0$) resolvent index of $T$, it turns out that we can establish a convergence rate of $O (1/{\sqrt{\sum_{i=0}^{k}a_{i+1}^2}})$ for both the $\|T_λ(\cdot)\|$ and the gap function $\mathtt{Gap}(\cdot)$ in the non-ergodic sense, and $O(1/\sum_{i=0}^{k}a_{i+1})$ for $\mathtt{Gap}(\cdot)$ in the ergodic sense. Second, to enhance the convergence rate of the newly-proposed PPA, we introduce an accelerated variant called the Contracting PPA. By utilizing a resolvent index of $T$ bounded by $O(a_{k+1})$ ($a_{k+1}\geq \varepsilon>0$), we establish a convergence rate of $O(1/\sum_{i=0}^{k}a_{i+1})$ for both $\|T_λ(\cdot)\|$ and $\mathtt {Gap}(\cdot)$, considering the non-ergodic sense. Third, to mitigate the limitation that the Contracting PPA lacks a convergence guarantee, we propose two additional versions of the algorithm. These novel approaches not only ensure guaranteed convergence but also provide sublinear and linear convergence rates for both $\|T_λ(\cdot)\|$ and $\mathtt {Gap}(\cdot)$, respectively, in the non-ergodic sense.
Submitted 22 December, 2023; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Error estimation for the non-convex cosparse optimization problem
Authors:
Zisheng Liu,
Ting Zhang
Abstract:
When the signal does not have a sparse structure but has sparsity under a certain transformation domain, Nam et al. \cite{NS} introduced the cosparse analysis model, which provides a dual perspective on the sparse representation model. This paper mainly discusses the error estimation of non-convex $\ell_p(0<p<1)$ relaxation cosparse optimization model with noise condition. Compared with the existing literature, under the same conditions, the value range of the $Ω$-RIP constant $δ_{7s}$ given in this paper is wider. When $p=0.5$ and $δ_{7s}=0.5$, the error constants $C_0$ and $C_1$ in this paper are better than those corresponding results in the literature \cite{Cand,LiSong1}. Moreover, when $0<p<1$, the error results of the non-convex relaxation method are significantly smaller than those of the convex relaxation method. The experimental results verify the correctness of the theoretical analysis and illustrate that the $\ell_p(0<p<1)$ method can provide robust reconstruction for cosparse optimization problems.
Submitted 22 November, 2023;
originally announced November 2023.
-
Local Convolution Enhanced Global Fourier Neural Operator For Multiscale Dynamic Spaces Prediction
Authors:
Xuanle Zhao,
Yue Sun,
Tielin Zhang,
Bo Xu
Abstract:
Neural operators extend the capabilities of traditional neural networks by allowing them to handle mappings between function spaces for the purpose of solving partial differential equations (PDEs). One of the most notable methods is the Fourier Neural Operator (FNO), which is inspired by Green's function method and approximates the operator kernel directly in the frequency domain. In this work, we focus on predicting multiscale dynamic spaces, which is equivalent to solving multiscale PDEs. Multiscale PDEs are characterized by rapid coefficient changes and solution space oscillations, which are crucial for modeling atmospheric convection and ocean circulation. To solve this problem, models should have the ability to capture rapid changes and process them at various scales. However, the FNO only approximates kernels in the low-frequency domain, which is insufficient when solving multiscale PDEs. To address this challenge, we propose a novel hierarchical neural operator that integrates improved Fourier layers with attention mechanisms, aiming to capture all details and handle them at various scales. These mechanisms complement each other in the frequency domain and encourage the model to solve multiscale problems. We perform experiments on dynamic spaces governed by forward and reverse problems of multiscale elliptic equations, Navier-Stokes equations and some other physical scenarios, and reach superior performance in existing PDE benchmarks, especially equations characterized by rapid coefficient variations.
Submitted 21 November, 2023;
originally announced November 2023.
-
Commutators for certain fractional type operators on weighted spaces and Orlicz-Morrey spaces
Authors:
Huoxiong Wu,
Tong Zhang
Abstract:
In this paper, we focus on a class of fractional type integral operators that can serve as extensions of the Riesz potential with kernels $$K(x,y)=\frac{Ω_1(x-A_1 y)}{|x-A_1 y |^{\frac{n}{q_1}}} \cdots \frac{Ω_m(x-A_m y)}{|x-A_m y |^{\frac{n}{q_m}}},$$ where $α\in [0,n), m\geqslant1, \sum_{i=1}^m\frac{n}{q_i}=n-α$, $\{A_i\}^m_{i=1}$ are invertible matrices, $Ω_i$ is homogeneous of degree 0 on $\R^n$ and $Ω_i\in L^{p_i}(S^{n-1})$ for some $p_i\in [1,\infty)$. Under appropriate assumptions, we obtain the weighted $L^p$ estimates as well as weighted Hardy estimates of the commutator for such operators with $BMO$-type functions. In addition, we obtain the boundedness of these operators and their commutators with a function in Campanato space on Orlicz-Morrey spaces as well as the compactness for such commutators in a special case: $m=1$ and $A=I$.
Submitted 5 November, 2023;
originally announced November 2023.
-
Improved Convergence Rates of Windowed Anderson Acceleration for Symmetric Fixed-Point Iterations
Authors:
Casey Garner,
Gilad Lerman,
Teng Zhang
Abstract:
This paper studies the commonly utilized windowed Anderson acceleration (AA) algorithm for fixed-point methods, $x^{(k+1)}=q(x^{(k)})$. It provides the first proof that when the operator $q$ is linear and symmetric the windowed AA, which uses a sliding window of prior iterates, improves the root-linear convergence factor over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric Jacobian at a fixed point, a slightly modified AA algorithm is proved to have an analogous root-linear convergence factor improvement over fixed-point iterations. Simulations verify our observations. Furthermore, experiments with different data models demonstrate AA is significantly superior to the standard fixed-point methods for Tyler's M-estimation.
Submitted 8 March, 2024; v1 submitted 4 November, 2023;
originally announced November 2023.
-
Foundations of matroids -- Part 2: Further theory, examples, and computational methods
Authors:
Matthew Baker,
Oliver Lorscheid,
Tianyi Zhang
Abstract:
In this sequel to "Foundations of matroids - Part 1", we establish several presentations of the foundation of a matroid in terms of small building blocks. For example, we show that the foundation of a matroid M is the colimit of the foundations of all embedded minors of M isomorphic to one of the matroids $U^2_4$, $U^2_5$, $U^3_5$, $C_5$, $C_5^\ast$, $U^2_4\oplus U^1_2$, $F_7$, $F_7^\ast$, and we show that this list is minimal. We establish similar minimal lists of building blocks for the classes of 2-connected and 3-connected matroids. We also establish a presentation for the foundation of a matroid in terms of its lattice of flats. Each of these presentations provides a useful method to compute the foundation of certain matroids, as we illustrate with a number of concrete examples. Combining these techniques with other results in the literature, we are able to compute the foundations of several interesting classes of matroids, including whirls, rank-2 uniform matroids, and projective geometries. In an appendix, we catalogue various 'small' pastures which occur as foundations of matroids, most of which were found with the assistance of a computer, and we discuss some of their interesting properties.
Submitted 30 October, 2023;
originally announced October 2023.
-
Lie triple 2-algebras
Authors:
Tao Zhang,
Zhang-Ju Liu
Abstract:
We invent a new cohomology theory for Lie triple algebras. Using this cohomology, we introduce the notions of 2-term $L_\infty$-triple algebras and Lie triple 2-algebras. We prove that the category of 2-term $L_\infty$-triple algebras is equivalent to the category of Lie triple 2-algebras. Crossed modules of Lie triple algebras are studied in detail.
Submitted 9 October, 2023;
originally announced October 2023.
-
Convergence analysis on the alternating direction method of multipliers for the cosparse optimization problem
Authors:
Zisheng Liu,
Ting Zhang
Abstract:
From a dual perspective of the sparse representation model, Nam et al. proposed the cosparse analysis model. In this paper, we aim to investigate the convergence of the alternating direction method of multipliers (ADMM) for the cosparse optimization problem. First, we examine the variational inequality representation of the cosparse optimization problem by introducing auxiliary variables. Second, ADMM is used to solve the cosparse optimization problem. Finally, by utilizing a tight frame with a uniform row norm and building upon lemmas and the strict contraction theorem, we establish a worst-case $\mathcal{O}(1/t)$ convergence rate in the ergodic sense.
Submitted 22 November, 2023; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Online Estimation with Rolling Validation: Adaptive Nonparametric Estimation with Streaming Data
Authors:
Tianyu Zhang,
Jing Lei
Abstract:
Online nonparametric estimators are gaining popularity due to their efficient computation and competitive generalization abilities. An important example includes variants of stochastic gradient descent. These algorithms often take one sample point at a time and instantly update the parameter estimate of interest. In this work we consider model selection and hyperparameter tuning for such online algorithms. We propose a weighted rolling-validation procedure, an online variant of leave-one-out cross-validation, that costs minimal extra computation for many typical stochastic gradient descent estimators. Similar to batch cross-validation, it can boost base estimators to achieve a better, adaptive convergence rate. Our theoretical analysis is straightforward, relying mainly on some general statistical stability assumptions. The simulation study underscores the significance of diverging weights in rolling validation in practice and demonstrates its sensitivity even when there is only a slim difference between candidate estimators.
Submitted 4 April, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
When $D$-companion matrix meets incomplete polynomials
Authors:
Teng Zhang
Abstract:
In this paper, we provide a simple proof of a generalization of the Gauss-Lucas theorem. Using the method of $D$-companion matrices, we obtain the majorization relationship between the zeros of convex combinations of incomplete polynomials and the original polynomial. Moreover, we prove that the set of all zeros of all convex combinations of incomplete polynomials coincides with the closed convex hull of zeros of the original polynomial. The location of zeros of convex combinations of incomplete polynomials is determined.
Submitted 7 January, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Deformations and extensions of homotopy associative algebras
Authors:
Tao Zhang
Abstract:
The representation and the cohomology theory of associative 2-algebras are developed. We study the deformations and abelian extensions of associative 2-algebras in detail.
Submitted 28 December, 2023; v1 submitted 28 July, 2023;
originally announced October 2023.
-
Robust globally divergence-free Weak Galerkin finite element method for incompressible Magnetohydrodynamics flow
Authors:
Min Zhang,
Tong Zhang
Abstract:
This paper develops a weak Galerkin (WG) finite element method of arbitrary order for the steady incompressible Magnetohydrodynamics equations. The WG scheme uses piecewise polynomials of degrees $k(k\geq 1),k,k-1$, and $k-1$ respectively for the approximations of the velocity, the magnetic field, the pressure, and the magnetic pseudo-pressure in the interior of elements, and uses piecewise polynomials of degree $k$ for their numerical traces on the interfaces of elements. The method is shown to yield globally divergence-free approximations of the velocity and magnetic fields. We give existence and uniqueness results for the discrete scheme and derive optimal a priori error estimates. We also present a convergent linearized iterative algorithm. Numerical experiments are provided to verify the obtained theoretical results.
Submitted 4 October, 2023;
originally announced October 2023.
-
On (co-)morphisms of $n$-Lie-Rinehart algebras with applications to Nambu-Poisson manifolds
Authors:
Yanhui Bi,
Zhixiong Chen,
Tao Zhang
Abstract:
In this paper, we give a unified description of morphisms and comorphisms of $n$-Lie-Rinehart algebras. We show that these morphisms and comorphisms can be regarded as two subalgebras of the $ψ$-sum of $n$-Lie-Rinehart algebras. We also provide similar descriptions for morphisms and comorphisms of $n$-Lie algebroids. It is proved that the category of vector bundles with Nambu-Poisson structures of rank $n$ and the category of their dual bundles with $n$-Lie algebroid structures of rank $n$ are equivalent to each other.
Submitted 19 September, 2023;
originally announced September 2023.
-
Interlacing property of a family of generating polynomials over Dyck paths
Authors:
Bo Wang,
Candice X. T. Zhang
Abstract:
In the study of a tantalizing symmetry on Catalan objects, Bóna et al. introduced a family of polynomials $\{W_{n,k}(x)\}_{n\geq k\geq 0}$ defined by \begin{align*} W_{n,k}(x)=\sum_{m=0}^{k}w_{n,k,m}x^{m}, \end{align*} where $w_{n,k,m}$ counts the number of Dyck paths of semilength $n$ with $k$ occurrences of $UD$ and $m$ occurrences of $UUD$. They proposed two conjectures on the interlacing property of these polynomials, one of which states that $\{W_{n,k}(x)\}_{n\geq k}$ is a Sturm sequence for any fixed $k\geq 1$, and the other states that $\{W_{n,k}(x)\}_{1\leq k\leq n}$ is a Sturm-unimodal sequence for any fixed $n\geq 1$. In this paper, we obtain certain recurrence relations for $W_{n,k}(x)$, and further confirm their conjectures.
Submitted 11 September, 2023;
originally announced September 2023.
-
Symplectic Structure-Aware Hamiltonian (Graph) Embeddings
Authors:
Jiaxu Liu,
Xinping Yi,
Tianle Zhang,
Xiaowei Huang
Abstract:
In traditional Graph Neural Networks (GNNs), the assumption of a fixed embedding manifold often limits their adaptability to diverse graph geometries. Recently, Hamiltonian system-inspired GNNs have been proposed to address the dynamic nature of such embeddings by incorporating physical laws into node feature updates. We present Symplectic Structure-Aware Hamiltonian GNN (SAH-GNN), a novel approach that generalizes Hamiltonian dynamics for more flexible node feature updates. Unlike existing Hamiltonian approaches, SAH-GNN employs Riemannian optimization on the symplectic Stiefel manifold to adaptively learn the underlying symplectic structure, circumventing the limitations of existing Hamiltonian GNNs that rely on a pre-defined form of standard symplectic structure. This innovation allows SAH-GNN to automatically adapt to various graph datasets without extensive hyperparameter tuning. Moreover, it conserves energy during training, meaning that the implicit Hamiltonian system is physically meaningful. Finally, we empirically validate SAH-GNN's superiority and adaptability in node classification tasks across multiple types of graph datasets.
Submitted 1 December, 2023; v1 submitted 9 September, 2023;
originally announced September 2023.
-
Reflection of Stochastic Evolution Equations in Infinite Dimensional Domains
Authors:
Zdzisław Brzeźniak,
Tusheng Zhang
Abstract:
In this paper, we establish the existence and the uniqueness of solutions of stochastic evolution equations (SEEs) with reflection in an infinite dimensional ball. Our framework is sufficiently general to include e.g. the stochastic Navier-Stokes equations.
Submitted 3 September, 2023;
originally announced September 2023.
-
Drinfeld Modular Curves Subordinate to Conjugacy Classes of Nilpotent Upper-Triangular Matrices
Authors:
Zhuo Chen,
Chuangqiang Hu,
Tao Zhang,
Xiaopeng Zheng
Abstract:
We introduce normalized Drinfeld modular curves that parameterize rank $m$ Drinfeld modules compatible with a $T$-torsion structure arising from a given conjugacy class of nilpotent upper-triangular $n\times n$ matrices with rank $\geqslant n-m$ over a finite field $\mathbb{F}_q$. This creates a deep link connecting the classification of nilpotent upper-triangular matrices and the decomposition of Drinfeld modular curves. The conjugacy classes of nilpotent upper-triangular matrices correspond one-to-one to certain $T$-torsion flags and form a tree structure. As a result, the associated Drinfeld modular curves are organized in the same tree. This generalizes the tower structure introduced by Bassa, Beelen, Garcia, Stichtenoth, and others. Additionally, we prove the geometric irreducibility of $(3,2)$-type normalized Drinfeld modular curves, and characterize their associated function fields.
Submitted 1 September, 2023;
originally announced September 2023.
-
A convergent interacting particle method for computing KPP front speeds in random flows
Authors:
Tan Zhang,
Zhongjian Wang,
Jack Xin,
Zhiwen Zhang
Abstract:
We aim to efficiently compute spreading speeds of reaction-diffusion-advection (RDA) fronts in divergence-free random flows under the Kolmogorov-Petrovsky-Piskunov (KPP) nonlinearity. We study a stochastic interacting particle method (IPM) for the reduced principal eigenvalue (Lyapunov exponent) problem of an associated linear advection-diffusion operator with spatially random coefficients. The Fourier representation of the random advection field and the Feynman-Kac (FK) formula of the principal eigenvalue (Lyapunov exponent) form the foundation of our method, which is implemented as a genetic evolution algorithm. The particles undergo advection-diffusion, and mutation/selection through a fitness function originating in the FK semigroup. We analyze convergence of the algorithm based on operator splitting, and present numerical results on representative flows such as 2D cellular flow and 3D Arnold-Beltrami-Childress (ABC) flow under random perturbations. The 2D examples serve as a consistency check with semi-Lagrangian computation. The 3D results demonstrate that IPM, being mesh-free and self-adaptive, is simple to implement and efficient for computing front spreading speeds in the advection-dominated regime for high-dimensional random flows on unbounded domains where no truncation is needed.
Submitted 28 August, 2023;
originally announced August 2023.
-
A VMiPG method for composite optimization with nonsmooth term having no closed-form proximal mapping
Authors:
Taiwei Zhang,
Shaohua Pan,
Ruyu Liu
Abstract:
This paper concerns the minimization of the sum of a twice continuously differentiable function $f$ and a nonsmooth convex function $g$ without closed-form proximal mapping. For this class of nonconvex and nonsmooth problems, we propose a line-search based variable metric inexact proximal gradient (VMiPG) method with uniformly bounded positive definite variable metric linear operators. This method computes in each step an inexact minimizer of a strongly convex model such that the difference between its objective value and the optimal value is controlled by its squared distance from the current iterate, and then seeks an appropriate step-size along the obtained direction with an Armijo line-search criterion. We prove that the iterate sequence converges to a stationary point when $f$ and $g$ are definable in the same o-minimal structure over the real field $(\mathbb{R},+,\cdot)$, and if, in addition, the objective function $f+g$ is a KL function of exponent $1/2$, the convergence has a local R-linear rate. The proposed VMiPG method with the variable metric linear operator constructed by the Hessian of the function $f$ is applied to the scenario that $f$ and $g$ have common composite structure, and numerical comparison with a state-of-the-art variable metric line-search algorithm indicates that the Hessian-based VMiPG method has a remarkable advantage in terms of the quality of objective values and the running time for those difficult problems such as high-dimensional fused weighted-lasso regressions.
Submitted 6 April, 2024; v1 submitted 26 August, 2023;
originally announced August 2023.
-
On the large time asymptotics of bi-laplacian Schrödinger equation with general data
Authors:
Avy Soffer,
Jiayan Wu,
Xiaoxu Wu,
Ting Zhang
Abstract:
We study the bi-laplacian Schrödinger equation with a general interaction term, which can be either linear or nonlinear, and is time-dependent. We prove that the global solutions for this equation are asymptotically given by a free wave and a weakly localized part. The proof relies on constructing the Free Channel Wave Operator in a new way, based on the method developed from recent studies \cite{SW20221}.
Submitted 13 August, 2023;
originally announced August 2023.
-
Asymptotic-preserving neural networks for multiscale Vlasov-Poisson-Fokker-Planck system in the high-field regime
Authors:
Shi **,
Zheng Ma,
Tian-ai Zhang
Abstract:
The Vlasov-Poisson-Fokker-Planck (VPFP) system is a fundamental model in plasma physics that describes the Brownian motion of a large ensemble of particles within a surrounding bath. Under the high-field scaling, both collision and field are dominant. This paper introduces two Asymptotic-Preserving Neural Network (APNN) methods within a physics-informed neural network (PINN) framework for solving the VPFP system in the high-field regime. These methods aim to overcome the computational challenges posed by high dimensionality and multiple scales of the system. The first APNN method leverages the micro-macro decomposition model of the original VPFP system, while the second is based on the mass conservation law. Both methods ensure that the loss function of the neural networks transitions naturally from the kinetic model to the high-field limit model, thereby preserving the correct asymptotic behavior. Through extensive numerical experiments, these APNN methods demonstrate their effectiveness in solving multiscale and high dimensional uncertain problems, as well as their broader applicability for problems with long time duration and non-equilibrium initial data.
Submitted 10 August, 2023;
originally announced August 2023.
-
Patterson-Sullivan measures for relatively Anosov groups
Authors:
Richard Canary,
Andrew Zimmer,
Tengren Zhang
Abstract:
We establish existence, uniqueness and ergodicity results for Patterson-Sullivan measures for relatively Anosov groups. As applications we obtain an entropy gap theorem and a strict concavity result for entropies associated to linear functionals.
Submitted 7 August, 2023;
originally announced August 2023.