-
High-probability minimax lower bounds
Authors:
Tianyi Ma,
Kabir A. Verchand,
Richard J. Samworth
Abstract:
The minimax risk is often considered as a gold standard against which we can compare specific statistical procedures. Nevertheless, as has been observed recently in robust and heavy-tailed estimation problems, the inherent reduction of the (random) loss to its expectation may entail a significant loss of information regarding its tail behaviour. In an attempt to avoid such a loss, we introduce the notion of a minimax quantile, and seek to articulate its dependence on the quantile level. To this end, we develop high-probability variants of the classical Le Cam and Fano methods, as well as a technique to convert local minimax risk lower bounds to lower bounds on minimax quantiles. To illustrate the power of our framework, we deploy our techniques on several examples, recovering recent results in robust mean estimation and stochastic convex optimisation, as well as obtaining several new results in covariance matrix estimation, sparse linear regression, nonparametric density estimation and isotonic regression. Our overall goal is to argue that minimax quantiles can provide a finer-grained understanding of the difficulty of statistical problems, and that, in wide generality, lower bounds on these quantities can be obtained via user-friendly tools.
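One natural way to write the minimax quantile described above is sketched below. The notation ($\mathcal{P}$ for the model, $\theta(P)$ for the target, $L$ for the loss, $\delta$ for the quantile level) is ours and is an assumption; the paper's precise definition may differ in details such as strict versus non-strict inequalities.

```latex
\operatorname{Quant}(1-\delta;\, Z) = \inf\{\, r \in \mathbb{R} : \mathbb{P}(Z \le r) \ge 1-\delta \,\},
\qquad
\mathcal{M}(\delta) = \inf_{\hat\theta}\, \sup_{P \in \mathcal{P}}\,
  \operatorname{Quant}_P\bigl(1-\delta;\, L(\hat\theta, \theta(P))\bigr).
```

Replacing the quantile by the expectation $\mathbb{E}_P$ gives the minimax risk $R^*$. For a nonnegative loss, Markov's inequality shows that the $(1-\delta)$-quantile is at most $\mathbb{E}_P[L]/\delta$, so $\mathcal{M}(\delta) \le R^*/\delta$; the point of studying $\mathcal{M}(\delta)$ directly is that this bound can be far from tight as $\delta \to 0$.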
Submitted 4 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Strong convergence rates for full-discrete approximations of the stochastic Allen-Cahn equations on 2D torus
Authors:
Ting Ma,
Lifei Wang,
Huanyu Yang
Abstract:
In this paper we construct space-time full discretizations of stochastic Allen-Cahn equations driven by space-time white noise on the 2D torus. The approximations use a tamed exponential Euler discretization in time and a spectral Galerkin method in space. We obtain convergence rates of spatial order $α-δ$ and temporal order $α/6-δ$ in $\mathcal C^{-α}$ for $α\in(0,1/3)$ and arbitrarily small $δ>0$.
Submitted 5 June, 2024;
originally announced June 2024.
-
Representations of non-finitely graded Lie algebras related to Virasoro algebra
Authors:
Chunguang Xia,
Tianyu Ma,
Xiao Dong,
Ming**g Zhang
Abstract:
In this paper, we study representations of the non-finitely graded Lie algebras $\mathcal{W}(ε)$ related to the Virasoro algebra, where $ε= \pm 1$. Specifically, we completely classify the free $\mathcal{U}(\mathfrak h)$-modules of rank one over $\mathcal{W}(ε)$, and find that these module structures are rather different from those of other graded Lie algebras. We also determine the simplicity and isomorphism classes of these modules.
Submitted 3 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
On eigenvalues and eigenfunctions of the operators defining multidimensional scaling on some symmetric spaces
Authors:
Tianyu Ma,
Eugene Stepanov
Abstract:
We study asymptotics of the eigenvalues and eigenfunctions of the operators used for constructing multidimensional scaling (MDS) on compact connected Riemannian manifolds, in particular on closed connected symmetric spaces. They are the limits of eigenvalues and eigenvectors of squared distance matrices of an increasing sequence of finite subsets covering the space densely in the limit. We show that for products of spheres and real projective spaces, the numbers of positive and negative eigenvalues of these operators are both infinite. We also find a class of spaces (namely $\mathbb{RP}^n$ with odd $n>1$) whose MDS defining operators are not trace class, and original distances cannot be reconstructed from the eigenvalues and eigenfunctions of these operators.
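A minimal numerical illustration on the simplest sphere, the circle $S^1$ (which is also $\mathbb{RP}^1$), assuming $n$ equally spaced sample points and the arc-length distance; this is our toy example, not the authors' method. Because the squared distance matrix of such points is circulant, its eigenvalues are cosine sums of its first row, so no eigensolver is needed. Dividing by $n$ approximates the eigenvalues of the limiting integral operator, which for the circle are $\pi^2/3$ for $k=0$ and $2(-1)^k/k^2$ for $k\ge 1$ (each with multiplicity two).

```python
import math


def circle_squared_distance_eigenvalues(n, ks):
    """Eigenvalues (divided by n) of the n x n matrix of squared geodesic
    distances between n equally spaced points on the unit circle.

    The matrix D[i][j] = d(theta_i, theta_j)^2 is symmetric and circulant, so
    its eigenvalues are lambda_k = sum_j c_j cos(2 pi j k / n), where c_j is
    the first row (the eigenvectors are discrete Fourier modes).
    """
    first_row = []
    for j in range(n):
        theta = 2 * math.pi * j / n
        d = min(theta, 2 * math.pi - theta)  # arc-length (geodesic) distance
        first_row.append(d * d)
    eigs = {}
    for k in ks:
        lam = sum(c * math.cos(2 * math.pi * j * k / n) for j, c in enumerate(first_row))
        eigs[k] = lam / n  # normalize to approximate the integral operator
    return eigs


if __name__ == "__main__":
    eigs = circle_squared_distance_eigenvalues(400, range(1, 7))
    for k, lam in eigs.items():
        # Continuum limit 2 * (-1)^k / k^2: the signs alternate with k.
        print(k, round(lam, 4), 2 * (-1) ** k / k ** 2)
```

In this toy computation the signs alternate in $k$, consistent with the abstract's statement that, for spheres, these operators have infinitely many positive and infinitely many negative eigenvalues.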
Submitted 21 January, 2024;
originally announced January 2024.
-
Optimized electrified meeting-point-based feeder bus services with capacitated charging stations and partial recharges
Authors:
Tai-Yu Ma,
Yumeng Fang,
Richard D. Connors,
Francesco Viti,
Haruko Nakao
Abstract:
Meeting-point-based feeder services using electric vehicles (EVs) have good potential to provide an efficient and clean on-demand mobility service. However, customer-to-meeting-point assignment, vehicle routing, and charging scheduling need to be jointly optimized to achieve the best system performance. To this end, we assess the effect of different system parameters and configure them based on our previously developed hybrid metaheuristic algorithm. A set of test instances based on morning peak-hour commuting scenarios between the cities of Arlon and Luxembourg is used to evaluate the impact of these parameters on the optimal solutions. The experimental results suggest that higher meeting-point availability can achieve better system performance. By jointly configuring different system parameters, the overall system performance can be significantly improved (10.8% fewer total kilometers traveled by vehicles compared to the benchmark) while serving all requests. Our experimental results show that, compared with a traditional door-to-door dial-a-ride system, the meeting-point-based system can reduce the fleet size, the in-vehicle travel time and the kilometers traveled by up to 70.2%, 6.4% and 49.4%, respectively.
Submitted 9 January, 2024;
originally announced January 2024.
-
A hybrid metaheuristic to optimize electric first-mile feeder services with charging synchronization constraints and customer rejections
Authors:
Tai-Yu Ma,
Yumeng Fang,
Richard D. Connors,
Francesco Viti,
Haruko Nakao
Abstract:
This paper addresses the on-demand meeting-point-based feeder electric bus routing and charging scheduling problem under charging synchronization constraints. The problem considered exhibits the structure of the location routing problem, which is more difficult to solve than many electric vehicle routing problems with capacitated charging stations. We propose to model the problem using a mixed-integer linear programming approach based on a layered graph structure, and we propose an efficient hybrid metaheuristic solution algorithm. A mixture of random and greedy partial charging scheduling strategies is used to find feasible charging schedules under the synchronization constraints. The algorithm is tested on instances with up to 100 customers and 49 bus stops/meeting points. The results show that the proposed algorithm provides near-optimal solutions in less than one minute on average, compared with the best solutions found by a mixed-integer linear programming solver with a 4-hour computation time limit. A case study on a larger instance with 1000 customers and 111 meeting points shows that the proposed method is applicable to real-world situations.
Submitted 8 February, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Expressivity and Approximation Properties of Deep Neural Networks with ReLU$^k$ Activation
Authors:
Juncai He,
Tong Mao,
**chao Xu
Abstract:
In this paper, we investigate the expressivity and approximation properties of deep neural networks employing the ReLU$^k$ activation function for $k \geq 2$. Although deep ReLU networks can approximate polynomials effectively, deep ReLU$^k$ networks have the capability to represent higher-degree polynomials precisely. Our initial contribution is a comprehensive, constructive proof for polynomial representation using deep ReLU$^k$ networks. This allows us to establish an upper bound on both the size and count of network parameters. Consequently, we are able to demonstrate a suboptimal approximation rate for functions from Sobolev spaces as well as for analytic functions. Additionally, through an exploration of the representation power of deep ReLU$^k$ networks for shallow networks, we reveal that deep ReLU$^k$ networks can approximate functions from a range of variation spaces, extending beyond those generated solely by the ReLU$^k$ activation function. This finding demonstrates the adaptability of deep ReLU$^k$ networks in approximating functions within various variation spaces.
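The exact-representation claim can be illustrated with two standard identities (a sketch, not the authors' construction): $x^2 = \sigma_2(x) + \sigma_2(-x)$ with $\sigma_2(t) = \max(0,t)^2$, and the polarization identity $xy = ((x+y)^2 - (x-y)^2)/4$. Composing these gives higher-degree monomials exactly, whereas a ReLU ($k=1$) network is piecewise linear and can only approximate $x^2$. Function names are ours.

```python
def relu_k(x, k):
    """ReLU^k activation: max(0, x)^k."""
    return max(0.0, x) ** k


def square(x):
    """Exact x^2 from two ReLU^2 neurons: x^2 = relu(x)^2 + relu(-x)^2."""
    return relu_k(x, 2) + relu_k(-x, 2)


def multiply(x, y):
    """Exact x*y via the polarization identity xy = ((x+y)^2 - (x-y)^2) / 4."""
    return (square(x + y) - square(x - y)) / 4


def fourth_power(x):
    """x^4 as a depth-2 composition of the square unit (no approximation error)."""
    return square(square(x))


if __name__ == "__main__":
    print(square(-3.0), multiply(2.0, 3.0), fourth_power(1.5))
```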
Submitted 10 January, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Distance spectral conditions for $ID$-factor-critical and fractional $[a, b]$-factor of graphs
Authors:
Tingyan Ma,
Ligong Wang
Abstract:
Let $G=(V(G), E(G))$ be a graph with vertex set $V(G)$ and edge set $E(G)$. A graph is $ID$-factor-critical if for every independent set $I$ of $G$ whose size has the same parity as $|V(G)|$, $G-I$ has a perfect matching. For two positive integers $a$ and $b$ with $a\leq b$, let $h\colon E(G)\rightarrow [0, 1]$ be a function satisfying $a\leq\sum _{e\in E_{G}(v_{i})}h(e)\leq b$ for every vertex $v_{i}\in V(G)$, where $E_{G}(v_{i})=\{e\in E(G)\mid e \text{ is incident with } v_{i} \text{ in } G\}$. Then the spanning subgraph with edge set $E_{h}=\{e\in E(G)\mid h(e)>0\}$, denoted by $G[E_{h}]$, is called a fractional $[a, b]$-factor of $G$ with indicator function $h$. A graph is called a fractional $[a, b]$-deleted graph if for any $e\in E(G)$, $G-e$ contains a fractional $[a, b]$-factor. For any integer $k\geq 1$, a graph has a $k$-factor if it contains a $k$-regular spanning subgraph. In this paper, we first give a distance spectral radius condition on $G$ that guarantees that $G$ is $ID$-factor-critical. Furthermore, we provide sufficient conditions in terms of the distance spectral radius and the distance signless Laplacian spectral radius for a graph to contain a fractional $[a, b]$-factor, to be a fractional $[a, b]$-deleted graph, or to contain a $k$-factor.
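The definition of a fractional $[a,b]$-factor can be checked directly. The minimal sketch below (function names are ours) verifies the degree condition for a given indicator function $h$ and returns the support $E_h$; it does not test the spectral conditions studied in the paper. Edges are stored as vertex pairs.

```python
from fractions import Fraction


def is_fractional_ab_factor(vertices, edges, h, a, b):
    """Check the definition: h maps each edge into [0, 1], and for every
    vertex v the sum of h(e) over the edges e incident with v lies in [a, b]."""
    if any(not (0 <= h[e] <= 1) for e in edges):
        return False
    for v in vertices:
        total = sum(h[e] for e in edges if v in e)
        if not (a <= total <= b):
            return False
    return True


def support(edges, h):
    """Edge set E_h = {e : h(e) > 0} of the spanning subgraph G[E_h]."""
    return [e for e in edges if h[e] > 0]


if __name__ == "__main__":
    # The 4-cycle with h = 1/2 on every edge: each vertex sum equals 1.
    c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    half = {e: Fraction(1, 2) for e in c4}
    print(is_fractional_ab_factor(range(4), c4, half, 1, 2))
```

Exact rational weights (`Fraction`) avoid spurious failures of the boundary comparisons $a \le \sum h(e) \le b$ that floating-point sums could cause.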
Submitted 30 October, 2023;
originally announced October 2023.
-
When Leibniz algebras are Nijenhuis?
Authors:
Haiying Li,
Tianshui Ma,
Shuanhong Wang
Abstract:
Leibniz algebras can be seen as a ``non-commutative'' analogue of Lie algebras. Nijenhuis operators on Leibniz algebras, introduced by Cariñena, Grabowski and Marmo in [J. Phys. A: Math. Gen. 37(2004)], are (1, 1)-tensors with vanishing Nijenhuis torsion. Recently, triangular Leibniz bialgebras were introduced by Tang and Sheng in [J. Noncommut. Geom. 16(2022)] via the twisting theory of twilled Leibniz algebras. In this paper we find that Leibniz algebras are very closely related to Nijenhuis operators, and prove that a triangular symplectic Leibniz bialgebra together with a dual triangular structure must possess Nijenhuis operators, which makes it possible to study applications of Nijenhuis operators from the perspective of Leibniz algebras. At the same time, we recover the classical Leibniz Yang-Baxter equation by using the tensor form of classical $r$-matrices. Finally, we give the classification of triangular Leibniz bialgebras of low dimensions.
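To make the torsion condition concrete: for a bracket $[\cdot,\cdot]$ and a linear map $N$, the Nijenhuis torsion is commonly written $T_N(x,y) = [Nx,Ny] - N([Nx,y] + [x,Ny] - N[x,y])$, and $N$ is a Nijenhuis operator when $T_N \equiv 0$. The sketch below evaluates it on $\mathbb{R}^3$ with the cross product, which is a Lie bracket and hence a Leibniz bracket; the example and helper names are ours, not the paper's. Scalar multiples of the identity always have zero torsion, while, for instance, $\mathrm{diag}(1,1,2)$ does not.

```python
def cross(x, y):
    """Cross product on R^3: a Lie bracket, hence in particular a Leibniz bracket."""
    return [x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]]


def matvec(N, x):
    """Apply the 3 x 3 matrix N to the vector x."""
    return [sum(N[i][j] * x[j] for j in range(3)) for i in range(3)]


def add(*vs):
    """Coordinate-wise sum of several vectors."""
    return [sum(c) for c in zip(*vs)]


def scale(c, v):
    return [c * t for t in v]


def nijenhuis_torsion(N, bracket, x, y):
    """T_N(x, y) = [Nx, Ny] - N([Nx, y] + [x, Ny] - N[x, y])."""
    Nx, Ny = matvec(N, x), matvec(N, y)
    inner = add(bracket(Nx, y), bracket(x, Ny), scale(-1, matvec(N, bracket(x, y))))
    return add(bracket(Nx, Ny), scale(-1, matvec(N, inner)))


if __name__ == "__main__":
    D = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
    print(nijenhuis_torsion(D, cross, [1, 0, 0], [0, 1, 0]))
```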
Submitted 17 June, 2024; v1 submitted 22 October, 2023;
originally announced October 2023.
-
Infinitesimal (BiHom-)bialgebras of any weight (II): Representations
Authors:
Tianshui Ma,
Abdenacer Makhlouf
Abstract:
The aim of this paper is to investigate the representation theory of infinitesimal (BiHom-)bialgebras of any weight $λ$ (abbr. $λ$-inf(BH)-bialgebras). First, inspired by Majid-Radford's well-known bosonization theory for Hopf algebras, we present a class of $λ$-inf(BH)-bialgebras, named $λ$-inf(BH)-biproduct bialgebras, consisting of an inf(BH)-product algebra structure and an inf(BH)-coproduct coalgebra structure, which induces a structure of a $λ$-inf(BH)-Hopf bimodule over a $λ$-inf(BH)-bialgebra. Second, we explore relationships among $λ$-inf(BH)-Hopf bimodules, $λ$-Rota-Baxter (BiHom-)bimodules, (BiHom-)dendriform bimodules and (BiHom-)pre-Lie bimodules. Finally, we provide two kinds of general Gelfand-Dorfman theorems related to BiHom-Novikov algebras.
Submitted 15 October, 2023;
originally announced October 2023.
-
Fundamentals of thermoelasticity for curved beams
Authors:
Marcio A. Jorge Silva,
To Fu Ma
Abstract:
The purpose of this paper is twofold. Firstly, we conduct an in-depth analysis of mathematical modeling concerning thermal-mechanical curved beams, by taking into consideration three primary forces widely accepted in the literature: axial load, shear force, and bending moment. Additionally, we examine their appropriate thermal couplings, shedding light on the intricate interplay between stress-strain relationships and temperature variations. This analysis is situated within the well-recognized context of the Bresse governing model for arched beams. Secondly, drawing upon distinguished constitutive laws for heat flux of conduction, we compile a comprehensive list of thermoelastic curved beam systems in various scenarios. We introduce new categories of problems that exhibit specific features from the thermal point of view.
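For orientation, the isothermal Bresse system for arched beams is commonly written as follows. The notation is the conventional one from the literature, not taken from the abstract: $\varphi$, $\psi$ and $w$ denote the vertical displacement, rotation angle and longitudinal displacement, $l$ is the curvature, and the shear force, axial force and bending moment are $Q = k(\varphi_x + \psi + l w)$, $N = k_0(w_x - l\varphi)$ and $M = b\,\psi_x$. The thermal couplings and heat-flux laws studied in the paper are not reproduced here.

```latex
\begin{aligned}
\rho_1 \varphi_{tt} - k(\varphi_x + \psi + l w)_x - k_0 l\,(w_x - l\varphi) &= 0,\\
\rho_2 \psi_{tt} - b\,\psi_{xx} + k(\varphi_x + \psi + l w) &= 0,\\
\rho_1 w_{tt} - k_0 (w_x - l\varphi)_x + k l\,(\varphi_x + \psi + l w) &= 0,
\end{aligned}
```

that is, $\rho_1\varphi_{tt} = Q_x + lN$, $\rho_2\psi_{tt} = M_x - Q$ and $\rho_1 w_{tt} = N_x - lQ$, which makes the role of each of the three forces explicit.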
Submitted 11 October, 2023;
originally announced October 2023.
-
Convolution formulas for multivariate arithmetic Tutte polynomials
Authors:
Tianlong Ma,
Xian'an **,
Weiling Yang
Abstract:
The multivariate arithmetic Tutte polynomial of arithmetic matroids is a generalization of the multivariate Tutte polynomial of matroids. In this note, we give convolution formulas for the multivariate arithmetic Tutte polynomial of the product of two arithmetic matroids. In particular, we obtain convolution formulas for the multivariate arithmetic Tutte polynomial of an arithmetic matroid. Applying our results, we give purely combinatorial proofs of several known convolution formulas, including [5, Theorem 10.9 and Corollary 10.10] and [1, Theorems 1 and 4]. These proofs are significantly shorter than the previous ones. In addition, we obtain a convolution formula for the characteristic polynomial of an arithmetic matroid.
Submitted 6 October, 2023;
originally announced October 2023.
-
On the maximum local mean order of sub-k-trees of a k-tree
Authors:
Zhuo Li,
Tianlong Ma,
Fengming Dong,
Xian'an **
Abstract:
For a k-tree T, a generalization of a tree, the local mean order of sub-k-trees of T is the average order of the sub-k-trees of T containing a given k-clique. The question of whether the largest local mean order of a tree (i.e., a 1-tree) at a vertex is always attained at a leaf was asked by Jamison in 1984 and answered by Wagner and Wang in 2016. In 2018, Stephens and Oellermann asked a similar question: for any k-tree T, does the maximum local mean order of sub-k-trees containing a given k-clique occur at a k-clique that is not a major k-clique of T? In this paper, we answer this question in the affirmative.
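For $k = 1$ (ordinary trees, the setting of Jamison's question) the local mean order can be computed by brute force on small examples. The sketch below (names are ours; exponential in the number of vertices, so only for toy trees) enumerates vertex sets containing $v$ that induce connected subgraphs, which in a tree are exactly the subtrees containing $v$. On the star $K_{1,3}$, a leaf has local mean order $13/5 = 2.6$ while the center has $20/8 = 2.5$.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Depth-first check that the vertex set `nodes` induces a connected subgraph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def local_mean_order(adj, v):
    """Average order of the subtrees of a tree (adjacency dict) that contain v."""
    others = [u for u in adj if u != v]
    orders = []
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            nodes = (v,) + extra
            if is_connected(nodes, adj):
                orders.append(len(nodes))
    return sum(orders) / len(orders)


if __name__ == "__main__":
    star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
    print(local_mean_order(star, "c"), local_mean_order(star, "x"))
```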
Submitted 21 September, 2023;
originally announced September 2023.
-
Infinitesimal (BiHom-)bialgebras of any weight (I): Basic definitions and properties
Authors:
Tianshui Ma,
Abdenacer Makhlouf
Abstract:
The purpose of this paper is to introduce and study $λ$-infinitesimal BiHom-bialgebras (abbr. $λ$-infBH-bialgebras) and some related structures. They can be seen as an extension both of the $λ$-infinitesimal bialgebras considered by Ebrahimi-Fard, which include Joni and Rota's infinitesimal bialgebras as well as Loday and Ronco's infinitesimal bialgebras, and of the infinitesimal BiHom-bialgebras introduced by Liu, Makhlouf, Menini and Panaite. In this paper, we provide various relevant constructions and new concepts. Two ways are provided for a unitary (resp. counitary) algebra (resp. coalgebra) to be a $λ$-infBH-bialgebra, and the notion of a $λ$-infBH-Hopf module is introduced and discussed. It is proved, in connection with the nonhomogeneous (co)associative BiHom-Yang-Baxter equation, that every (left BiHom-)module (resp. comodule) over an (anti-)quasitriangular (resp. (anti-)coquasitriangular) $λ$-infBH-bialgebra carries a structure of a $λ$-infBH-Hopf module. Moreover, two approaches to constructing BiHom-pre-Lie (co)algebras from $λ$-infBH-bialgebras are presented.
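As a small sanity check of the weight-$λ$ compatibility condition, assumed here in the form $\Delta(ab) = a\Delta(b) + \Delta(a)b + λ\, a \otimes b$ (sign conventions differ between authors), the deconcatenation-type coproduct $\Delta(x^n) = \sum_{i=1}^{n-1} x^i \otimes x^{n-i}$ on the non-unital polynomial algebra $x\,K[x]$ over a field $K$ satisfies it with $λ = 1$. The example is ours and covers neither the BiHom twisting maps nor the paper's constructions; tensors are stored as dictionaries mapping exponent pairs $(i, j)$ (for $x^i \otimes x^j$) to coefficients.

```python
def delta(n):
    """Coproduct of the monomial x^n: sum_{i=1}^{n-1} x^i (tensor) x^(n-i)."""
    return {(i, n - i): 1 for i in range(1, n)}


def left_mult(m, t):
    """x^m . (x^i (tensor) x^j) = x^(m+i) (tensor) x^j."""
    return {(m + i, j): c for (i, j), c in t.items()}


def right_mult(t, n):
    """(x^i (tensor) x^j) . x^n = x^i (tensor) x^(j+n)."""
    return {(i, j + n): c for (i, j), c in t.items()}


def add_tensors(*tensors):
    """Sum tensors coefficient-wise and drop zero coefficients."""
    out = {}
    for t in tensors:
        for key, c in t.items():
            out[key] = out.get(key, 0) + c
    return {key: c for key, c in out.items() if c != 0}


def satisfies_weighted_compatibility(m, n, lam):
    """Test Delta(x^m x^n) == x^m Delta(x^n) + Delta(x^m) x^n + lam x^m (tensor) x^n."""
    lhs = add_tensors(delta(m + n))
    rhs = add_tensors(left_mult(m, delta(n)), right_mult(delta(m), n), {(m, n): lam})
    return lhs == rhs


if __name__ == "__main__":
    print(all(satisfies_weighted_compatibility(m, n, 1)
              for m in range(1, 6) for n in range(1, 6)))
```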
Submitted 4 September, 2023;
originally announced September 2023.
-
Tractability of approximation by general shallow networks
Authors:
Hrushikesh Mhaskar,
Tong Mao
Abstract:
In this paper, we present a sharper version of the results in the paper Dimension independent bounds for general shallow networks, Neural Networks, \textbf{123} (2020), 142-152. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces. We consider the approximation of functions of the form $x\mapsto\int_{\mathbb{Y}} G(x, y)\,dτ(y)$, $x\in\mathbb{X}$, by $G$-networks of the form $x\mapsto \sum_{k=1}^n a_kG(x, y_k)$, with $y_1,\cdots, y_n\in\mathbb{Y}$ and $a_1,\cdots, a_n\in\mathbb{R}$. Defining the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ in terms of covering numbers, we obtain dimension-independent bounds on the degree of approximation in terms of $n$, where the constants involved also depend at most polynomially on the dimensions. Applications include approximation by power rectified linear unit networks, zonal function networks and certain radial basis function networks, as well as the important problem of function extension to higher-dimensional spaces.
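To illustrate the objects involved, the sketch below builds a $G$-network from a plain midpoint quadrature rule, assuming $\mathbb{Y} = [0,1]$, $τ$ the Lebesgue measure and the Gaussian kernel $G(x,y) = e^{-(x-y)^2}$ (all our choices, not the paper's; names are hypothetical). In one dimension this gives an $O(1/n^2)$ error, but tensor-product rules of this kind need a number of points growing exponentially with the dimension, which is the behaviour dimension-independent bounds are designed to avoid.

```python
import math


def G(x, y):
    """Illustrative kernel (an assumption): G(x, y) = exp(-(x - y)^2)."""
    return math.exp(-(x - y) ** 2)


def g_network(x, centers, coeffs):
    """Evaluate the G-network x -> sum_k a_k G(x, y_k)."""
    return sum(a * G(x, y) for a, y in zip(coeffs, centers))


def midpoint_network(n):
    """Centers y_k and weights a_k of the midpoint rule for Lebesgue measure on [0, 1]."""
    centers = [(k - 0.5) / n for k in range(1, n + 1)]
    coeffs = [1.0 / n] * n
    return centers, coeffs


def target(x):
    """Exact integral_0^1 exp(-(x - y)^2) dy = (sqrt(pi) / 2) (erf(1 - x) + erf(x))."""
    return 0.5 * math.sqrt(math.pi) * (math.erf(1 - x) + math.erf(x))


if __name__ == "__main__":
    centers, coeffs = midpoint_network(50)
    for x in [0.0, 0.3, 1.0]:
        print(x, abs(g_network(x, centers, coeffs) - target(x)))
```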
Submitted 10 December, 2023; v1 submitted 6 August, 2023;
originally announced August 2023.
-
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Authors:
Kaiyue Wen,
Zhiyuan Li,
Tengyu Ma
Abstract:
Despite extensive studies, the underlying reason why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize; and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures, and that sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for other explanations of the generalization of overparameterized neural networks.
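The abstract does not commit to a single algorithm; sharpness-aware minimization (SAM) is one widely used sharpness minimization method and is assumed here purely for illustration. Each step moves by a radius $ρ$ along the normalized gradient (an approximate worst case in a $ρ$-ball) and then descends using the gradient at that perturbed point. The toy quadratic loss and function names are ours; the paper's two-layer ReLU setting is not reproduced.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: perturb w by rho along the normalized gradient, then
    take a gradient step from w using the gradient at the perturbed point."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # stationary point: no ascent direction to follow
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def quadratic_grad(curvatures):
    """Gradient of L(w) = 0.5 * sum_i c_i w_i^2; a larger c_i is a sharper direction."""
    return lambda w: [c * wi for c, wi in zip(curvatures, w)]


if __name__ == "__main__":
    w = [1.0, 1.0]
    grad = quadratic_grad([1.0, 4.0])
    for _ in range(200):
        w = sam_step(w, grad)
    print(w)
```

With a fixed $ρ$, the iterates on this quadratic typically end up oscillating in a small neighbourhood of the minimizer (of size on the order of lr times $ρ$ times the largest curvature) rather than converging to it exactly.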
Submitted 22 July, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Extreme coefficients of multiplicity Tutte polynomials
Authors:
Xian'an **,
Tianlong Ma,
Weiling Yang
Abstract:
The multiplicity Tutte polynomial, which includes the arithmetic Tutte polynomial, is a generalization of the classical Tutte polynomial of matroids. In this paper, we obtain an expression for the general coefficient and expressions for six extreme coefficients of multiplicity Tutte polynomials. In particular, we deduce an expression for the general coefficient and expressions for the corresponding extreme coefficients of the classical Tutte polynomial of matroids.
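Assuming the standard form of the multiplicity Tutte polynomial, $M(x,y) = \sum_{A\subseteq E} m(A)(x-1)^{r(E)-r(A)}(y-1)^{|A|-r(A)}$, the brute-force sketch below computes its coefficients for a graphic matroid with a user-supplied multiplicity function $m$ (names are ours; the cost is exponential in $|E|$). With $m \equiv 1$ it returns the classical Tutte polynomial, e.g. $x^2 + x + y$ for the triangle, so individual coefficients, including the extreme ones, can be read off the returned dictionary.

```python
from itertools import combinations
from math import comb


def graphic_rank(vertices, edge_subset):
    """Rank of an edge set in the cycle matroid: |V| minus the number of components."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    rank = 0
    for u, v in edge_subset:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so the rank grows by one
            parent[ru] = rv
            rank += 1
    return rank


def multiplicity_tutte(vertices, edges, m=lambda A: 1):
    """Coefficients {(i, j): c} of sum_A m(A) (x-1)^(r(E)-r(A)) (y-1)^(|A|-r(A))."""
    rE = graphic_rank(vertices, edges)
    coeffs = {}
    for size in range(len(edges) + 1):
        for A in combinations(edges, size):
            rA = graphic_rank(vertices, A)
            p, q = rE - rA, size - rA
            mult = m(A)
            # Expand (x-1)^p (y-1)^q with the binomial theorem.
            for i in range(p + 1):
                for j in range(q + 1):
                    c = mult * comb(p, i) * comb(q, j) * (-1) ** (p - i + q - j)
                    coeffs[(i, j)] = coeffs.get((i, j), 0) + c
    return {k: c for k, c in coeffs.items() if c != 0}


if __name__ == "__main__":
    print(multiplicity_tutte([0, 1, 2], [(0, 1), (1, 2), (0, 2)]))
```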
Submitted 5 February, 2024; v1 submitted 10 June, 2023;
originally announced June 2023.
-
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Authors:
Hong Liu,
Zhiyuan Li,
David Hall,
Percy Liang,
Tengyu Ma
Abstract:
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid changes of the Hessian along the trajectory. Sophia estimates the diagonal Hessian only every handful of iterations, which has negligible average per-step time and memory overhead. On language modeling with GPT models of sizes ranging from 125M to 1.5B, Sophia achieves a 2x speed-up compared to Adam in the number of steps, total compute, and wall-clock time, achieving the same perplexity with 50% fewer steps, less total compute, and reduced wall-clock time. Theoretically, we show that Sophia, in a much simplified setting, adapts to the heterogeneous curvatures in different parameter dimensions, and thus has a run-time bound that does not depend on the condition number of the loss.
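A simplified sketch of the update described above, assuming only what the abstract states: a gradient moving average divided by a moving average of a diagonal-Hessian estimate, then clipped element-wise. The function names, the `eps` floor and the default hyperparameters are our own assumptions; how the Hessian estimate is computed, and any extra scaling the paper may use, are not shown.

```python
def sophia_update(theta, grad, m, h, lr=1e-3, beta1=0.9, rho=1.0, eps=1e-12):
    """One element-wise parameter update in the spirit of Sophia.

    m: moving average of gradients (updated and returned).
    h: moving average of a diagonal-Hessian estimate, assumed to be refreshed
       elsewhere every few steps.
    The step is clip(m / max(h, eps), rho): a preconditioned step whose
    per-coordinate magnitude is bounded by rho. Non-positive curvature
    estimates hit the eps floor and therefore fall back to the clipped step.
    """
    new_m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    new_theta = []
    for t, mi, hi in zip(theta, new_m, h):
        step = mi / max(hi, eps)
        step = max(-rho, min(rho, step))  # element-wise clipping
        new_theta.append(t - lr * step)
    return new_theta, new_m


def update_hessian_ema(h, h_hat, beta2=0.99):
    """Moving average of the diagonal Hessian estimate h_hat."""
    return [beta2 * hi + (1 - beta2) * ei for hi, ei in zip(h, h_hat)]


if __name__ == "__main__":
    theta, m = sophia_update([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [10.0, 1e-8], lr=1.0, beta1=0.0)
    print(theta)  # the low-curvature coordinate is clipped to a step of size rho
```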
Submitted 5 March, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Rota-Baxter operators on cocommutative Hopf algebras and Hopf braces
Authors:
Huihui Zheng,
Li Guo,
Tianshui Ma,
Liangyun Zhang
Abstract:
This paper studies the relationship of Rota-Baxter operators on cocommutative Hopf algebras with Hopf braces and the Yang-Baxter equation, with emphasis on the embedding of cocommutative Hopf braces into Rota-Baxter Hopf algebras. Through Hopf braces, we establish a connection between relative Rota-Baxter operators on cocommutative Hopf algebras and bijective 1-cocycles. Finally, we introduce the notion of symmetric Hopf braces, and establish the relationship between symmetric Hopf braces and Rota-Baxter Hopf algebras.
Submitted 25 June, 2024; v1 submitted 15 April, 2023;
originally announced April 2023.
-
The sufficient conditions for $k$-leaf-connected graphs in terms of several topological indices
Authors:
Tingyan Ma,
Ligong Wang,
Yang Hu
Abstract:
Let $G=(V(G), E(G))$ be a graph with vertex set $V(G)$ and edge set $E(G)$. For $k\geq2$, a graph $G$ of order $|V(G)|\geq k+1$ is $k$-leaf-connected if, for any subset $S\subseteq V(G)$ with $|S|=k$, $G$ has a spanning tree $T$ such that $S$ is precisely the set of leaves of $T$. A graph $G$ is called Hamilton-connected if any two vertices of $G$ are connected by a Hamilton path. It follows from these definitions that a graph is $2$-leaf-connected if and only if it is Hamilton-connected. Over the past decades, many sufficient conditions for a graph to be Hamilton-connected have been given in terms of topological indices. In this paper, we present sufficient conditions for a graph $G$ to be $k$-leaf-connected in terms of the Zagreb index, the reciprocal degree distance or the hyper-Zagreb index. Furthermore, we use the first Zagreb index and the hyper-Zagreb index of the complement graph $\overline{G}$ to give sufficient conditions for a graph $G$ to be $k$-leaf-connected.
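Two of the degree-based indices named above have standard definitions: the first Zagreb index $M_1(G) = \sum_{v} d(v)^2$ and the hyper-Zagreb index $HM(G) = \sum_{uv\in E(G)} (d(u)+d(v))^2$. The sketch below (names are ours; vertices are $0,\dots,n-1$) computes them for $G$ and for its complement $\overline{G}$; the reciprocal degree distance and the paper's actual thresholds are not reproduced.

```python
from itertools import combinations


def degrees(n, edges):
    """Degree of each vertex 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def first_zagreb(n, edges):
    """M1(G) = sum over vertices v of d(v)^2."""
    return sum(d * d for d in degrees(n, edges))


def hyper_zagreb(n, edges):
    """HM(G) = sum over edges uv of (d(u) + d(v))^2."""
    deg = degrees(n, edges)
    return sum((deg[u] + deg[v]) ** 2 for u, v in edges)


def complement(n, edges):
    """Edge list of the complement graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2) if frozenset((u, v)) not in present]


if __name__ == "__main__":
    p3 = [(0, 1), (1, 2)]
    print(first_zagreb(3, p3), hyper_zagreb(3, p3), complement(3, p3))
```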
Submitted 7 April, 2024; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Extremal trees, unicyclic and bicyclic graphs with respect to $p$-Sombor spectral radii
Authors:
Ruiling Zheng,
Tianlong Ma,
Xian'an **
Abstract:
For a graph $G=(V,E)$ and $v_{i}\in V$, denote by $d_{v_{i}}$ (or $d_{i}$ for short) the degree of vertex $v_{i}$. The $p$-Sombor matrix $\textbf{S}_{\textbf{p}}(G)$ ($p\neq0$) of a graph $G$ is the square matrix whose $(i,j)$-entry is equal to $(d_{i}^{p}+d_{j}^{p})^{\frac{1}{p}}$ if the vertices $v_{i}$ and $v_{j}$ are adjacent, and 0 otherwise. The $p$-Sombor spectral radius of $G$, denoted by $ρ(\textbf{S}_{\textbf{p}}(G))$, is the largest eigenvalue of the $p$-Sombor matrix $\textbf{S}_{\textbf{p}}(G)$. In this paper, we consider the extremal trees, unicyclic and bicyclic graphs with respect to the $p$-Sombor spectral radii. We completely characterize the extremal graphs with the first three maximum Sombor spectral radii, which partially answers a problem posed by Liu et al. in [MATCH Commun. Math. Comput. Chem. 87 (2022) 59-87].
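Following the definition above, a minimal sketch (names are ours) builds $\textbf{S}_{\textbf{p}}(G)$ and estimates $ρ(\textbf{S}_{\textbf{p}}(G))$ by power iteration, assuming $G$ is connected so that the Perron eigenvector is positive. Iterating with $\textbf{S}_{\textbf{p}}(G) + I$ avoids the sign oscillation that plain power iteration shows on bipartite graphs such as trees. The case $p = 2$ gives the Sombor matrix.

```python
import math


def p_sombor_matrix(n, edges, p):
    """(i, j)-entry (d_i^p + d_j^p)^(1/p) if i and j are adjacent, else 0 (p != 0)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    S = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        w = (deg[u] ** p + deg[v] ** p) ** (1.0 / p)
        S[u][v] = S[v][u] = w
    return S


def spectral_radius(S, iters=2000):
    """Largest eigenvalue of the symmetric nonnegative matrix of a connected graph,
    via power iteration on S + I (the shift avoids oscillation on bipartite graphs)."""
    n = len(S)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    # Rayleigh quotient x^T S x of the unit-norm Perron vector estimate.
    return sum(x[i] * S[i][j] * x[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]
    print(spectral_radius(p_sombor_matrix(4, star, 2)))  # close to sqrt(30)
```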
Submitted 5 April, 2023;
originally announced April 2023.
-
Canonical curves and Kropina metrics in Lagrangian contact geometry
Authors:
T. Ma,
K. J. Flood,
V. S. Matveev,
V. Žádník
Abstract:
We present a Fefferman-type construction from Lagrangian contact to conformal structures and examine several related topics. In particular, we concentrate on describing the canonical curves and their correspondence. We show that chains and null-chains of an integrable Lagrangian contact structure are the projections of null-geodesics of the Fefferman space. Employing the Fermat principle, we realize chains as geodesics of Kropina (pseudo-Finsler) metrics. Using recent rigidity results, we show that ``sufficiently many'' chains determine the Lagrangian contact structure. Separately, we comment on Lagrangian contact structures induced by projective structures and the special case of dimension three.
Submitted 16 June, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
On Generalization and Regularization via Wasserstein Distributionally Robust Optimization
Authors:
Qinyu Wu,
Jonathan Yu-Meng Li,
Tiantian Mao
Abstract:
Wasserstein distributionally robust optimization (DRO) has found success in operations research and machine learning applications as a powerful means to obtain solutions with favourable out-of-sample performances. Two compelling explanations for the success are the generalization bounds derived from Wasserstein DRO and the equivalency between Wasserstein DRO and the regularization scheme commonly applied in machine learning. Existing results on generalization bounds and the equivalency to regularization are largely limited to the setting where the Wasserstein ball is of a certain type and the decision criterion takes certain forms of an expected function. In this paper, we show that by focusing on Wasserstein DRO problems with affine decision rules, it is possible to obtain generalization bounds and the equivalency to regularization in a significantly broader setting where the Wasserstein ball can be of a general type and the decision criterion can be a general measure of risk, i.e., nonlinear in distributions. This allows for accommodating many important classification, regression, and risk minimization applications that have not been addressed to date using Wasserstein DRO. Our results are strong in that the generalization bounds do not suffer from the curse of dimensionality and the equivalency to regularization is exact. As a byproduct, our regularization results broaden considerably the class of Wasserstein DRO models that can be solved efficiently via regularization formulations.
Submitted 12 December, 2022;
originally announced December 2022.
-
How Does Sharpness-Aware Minimization Minimize Sharpness?
Authors:
Kaiyue Wen,
Tengyu Ma,
Zhiyuan Li
Abstract:
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks in various settings. However, the underlying workings of SAM remain elusive because of various intriguing approximations in the theoretical characterizations. SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees. The subtle differences in these notions of sharpness can indeed lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that the two steps of approximations in the original motivation of SAM individually lead to inaccurate local conclusions, but their combination accidentally reveals the correct effect when full-batch gradients are applied. Furthermore, we prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of the Hessian when SAM is applied.
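For orientation, the SAM update itself (Foret et al.) is not restated in the abstract. A minimal Python sketch, assuming its standard full-batch form: ascend to $w+\rho\,\nabla L(w)/\|\nabla L(w)\|$, then take a gradient step from $w$ using the gradient evaluated at that perturbed point. The quadratic loss $L(w)=\tfrac12\sum_i h_i w_i^2$ and all names are hypothetical.

```python
import math


def grad_quadratic(h, w):
    """Gradient of L(w) = 0.5 * sum_i h_i * w_i**2, i.e. Hessian H = diag(h)."""
    return [hi * wi for hi, wi in zip(h, w)]


def sam_step(h, w, lr, rho):
    """One full-batch SAM step: perturb along the normalised gradient, then descend."""
    g = grad_quadratic(h, w)
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    if g_norm == 0.0:
        return list(w)  # stationary point: the ascent direction is undefined
    w_adv = [wi + rho * gi / g_norm for wi, gi in zip(w, g)]
    g_adv = grad_quadratic(h, w_adv)  # gradient at the perturbed point
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    print(sam_step([2.0, 1.0], [1.0, 0.0], lr=0.1, rho=0.1))
```

With $h=(2,1)$, $w=(1,0)$, learning rate $0.1$ and $\rho=0.1$, the step gives $w_1\approx0.78$, slightly smaller than the $0.8$ of a plain gradient step, because the gradient is evaluated at the perturbed point.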
Submitted 5 January, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
An improvement of sufficient condition for $k$-leaf-connected graphs
Authors:
Tingyan Ma,
Guoyan Ao,
Ruifang Liu,
Ligong Wang,
Yang Hu
Abstract:
For an integer $k\geq2,$ a graph $G$ is called $k$-leaf-connected if $|V(G)|\geq k+1$ and, given any subset $S\subseteq V(G)$ with $|S|=k,$ $G$ always has a spanning tree $T$ such that $S$ is precisely the set of leaves of $T.$ Thus a graph is $2$-leaf-connected if and only if it is Hamilton-connected. In this paper, we present a best-possible condition, in terms of the size, guaranteeing that a graph is $k$-leaf-connected. This condition not only improves the results of Gurgel and Wakabayashi [On $k$-leaf-connected graphs, J. Combin. Theory Ser. B 41 (1986) 1-16] and Ao, Liu, Yuan and Li [Improved sufficient conditions for $k$-leaf-connected graphs, Discrete Appl. Math. 314 (2022) 17-30], but also extends the result of Xu, Zhai and Wang [An improvement of spectral conditions for Hamilton-connected graphs, Linear Multilinear Algebra, 2021]. Our key approach is to show that an $(n+k-1)$-closed non-$k$-leaf-connected graph must contain a large clique if its size is large enough. As applications, sufficient conditions for a graph to be $k$-leaf-connected in terms of the (signless Laplacian) spectral radius of $G$ or its complement are also presented.
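The definition can be checked directly on very small graphs. A brute-force Python sketch (exponential in the number of edges, so only for toy examples; all names are hypothetical) enumerates the spanning trees, records their leaf sets and tests every $k$-subset $S$. Since a spanning tree with exactly two leaves is a Hamilton path, $k=2$ recovers the Hamilton-connectedness test.

```python
from itertools import combinations


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def spanning_tree_leaf_sets(n, edges):
    """Leaf sets (as frozensets) of all spanning trees of G = ({0..n-1}, edges)."""
    leaf_sets = set()
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        acyclic = True
        for u, v in subset:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru == rv:  # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if not acyclic:
            continue
        # n-1 edges without a cycle on n vertices form a spanning tree
        deg = [0] * n
        for u, v in subset:
            deg[u] += 1
            deg[v] += 1
        leaf_sets.add(frozenset(i for i in range(n) if deg[i] == 1))
    return leaf_sets


def is_k_leaf_connected(n, edges, k):
    """|V| >= k+1 and every k-subset of V is exactly the leaf set of a spanning tree."""
    if k < 2 or n < k + 1:
        return False
    leaf_sets = spanning_tree_leaf_sets(n, edges)
    return all(frozenset(S) in leaf_sets for S in combinations(range(n), k))


if __name__ == "__main__":
    k4 = list(combinations(range(4), 2))
    c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(is_k_leaf_connected(4, k4, 2), is_k_leaf_connected(4, k4, 3))  # True True
    print(is_k_leaf_connected(4, c4, 2))  # False
```

For example, $K_4$ is both $2$- and $3$-leaf-connected, whereas the 4-cycle $C_4$ is not $2$-leaf-connected: its spanning trees are paths whose two end-vertices are adjacent on the cycle, so no spanning tree has two opposite vertices as its leaves.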
Submitted 9 November, 2022;
originally announced November 2022.
-
Dynamic charging management for electric vehicle demand responsive transport
Authors:
Tai-Yu Ma
Abstract:
Faced with the challenges of climate change, transport network companies have started to electrify their fleets to reduce CO2 emissions. However, such an ecological transition brings new research challenges for dynamic electric fleet charging management under uncertainty. In this study, we address the dynamic charging scheduling management of shared ride-hailing services with public charging stations. A two-stage charging scheduling optimization approach under a rolling horizon framework is proposed to minimize the overall charging operational costs of the fleet, including vehicles' access times, charging times, and waiting times, by anticipating future public charging station availability. The charging station occupancy prediction is based on a hybrid LSTM (long short-term memory) network approach and is integrated into the proposed online vehicle-charger assignment. The proposed methodology is applied to a realistic simulation study in the city of Dundee, UK. The numerical studies show that the proposed approach can reduce the total charging waiting times of the fleet by 48.3% and the total amount of energy charged by the fleet by 35.3% compared to a need-based charging reference policy.
Submitted 22 September, 2022;
originally announced September 2022.
-
Favorite Downcrossing Sites of One-Dimensional Simple Random Walk
Authors:
Chen-Xu Hao,
Ze-Chun Hu,
Ting Ma,
Renming Song
Abstract:
Random walk is a very important Markov process and has important applications in many fields. For a one-dimensional simple symmetric random walk $(S_n)$, a site $x$ is called a favorite downcrossing site at time $n$ if its downcrossing local time at time $n$ achieves the maximum among all sites. In this paper, we study the cardinality of the favorite downcrossing site set, and show that with probability 1 there are only finitely many times at which there are at least four favorite downcrossing sites, and that three favorite downcrossing sites occur infinitely often. Some related open questions are also presented.
Submitted 29 November, 2022; v1 submitted 21 July, 2022;
originally announced July 2022.
-
A General Wasserstein Framework for Data-driven Distributionally Robust Optimization: Tractability and Applications
Authors:
Jonathan Yu-Meng Li,
Tiantian Mao
Abstract:
Data-driven distributionally robust optimization is a recently emerging paradigm aimed at finding a solution that is driven by sample data but is protected against sampling errors. An increasingly popular approach, known as Wasserstein distributionally robust optimization (DRO), achieves this by applying the Wasserstein metric to construct a ball centred at the empirical distribution and finding a solution that performs well against the most adversarial distribution from the ball. In this paper, we present a general framework for studying different choices of a Wasserstein metric and point out the limitation of the existing choices. In particular, while choosing a Wasserstein metric of a higher order is desirable from a data-driven perspective, given its less conservative nature, such a choice comes with a high price from a robustness perspective - it is no longer applicable to many heavy-tailed distributions of practical concern. We show that this seemingly inevitable trade-off can be resolved by our framework, where a new class of Wasserstein metrics, called coherent Wasserstein metrics, is introduced. Like Wasserstein DRO, distributionally robust optimization using the coherent Wasserstein metrics, termed generalized Wasserstein distributionally robust optimization (GW-DRO), has all the desirable performance guarantees: finite-sample guarantee, asymptotic consistency, and computational tractability. The worst-case expectation problem in GW-DRO is in general a nonconvex optimization problem, yet we provide new analysis to prove its tractability without relying on the common duality scheme. Our framework, as shown in this paper, offers a fruitful opportunity to design novel Wasserstein DRO models that can be applied in various contexts such as operations management, finance, and machine learning.
Submitted 19 July, 2022;
originally announced July 2022.
-
Rota-Baxter Lie bialgebras, classical Yang-Baxter equations and special L-dendriform bialgebras
Authors:
Chengming Bai,
Li Guo,
Guilai Liu,
Tianshui Ma
Abstract:
We establish a bialgebra structure on Rota-Baxter Lie algebras following the Manin triple approach to Lie bialgebras. Explicitly, Rota-Baxter Lie bialgebras are characterized by generalizing matched pairs of Lie algebras and Manin triples of Lie algebras to the context of Rota-Baxter Lie algebras. The coboundary case leads to the introduction of the admissible classical Yang-Baxter equation (CYBE) in Rota-Baxter Lie algebras, for which the antisymmetric solutions give rise to Rota-Baxter Lie bialgebras. The notions of $\mathcal{O}$-operators on Rota-Baxter Lie algebras and Rota-Baxter pre-Lie algebras are introduced to produce antisymmetric solutions of the admissible CYBE. Furthermore, extending the well-known property that a Rota-Baxter Lie algebra of weight zero induces a pre-Lie algebra, the Rota-Baxter Lie bialgebra of weight zero induces a bialgebra structure of independent interest, namely the special L-dendriform bialgebra, which is equivalent to a Lie group with a left-invariant flat pseudo-metric in geometry. This induction is also characterized as the inductions between the corresponding Manin triples and matched pairs. Finally, antisymmetric solutions of the admissible CYBE in a Rota-Baxter Lie algebra of weight zero give special L-dendriform bialgebras. In particular, both Rota-Baxter algebras of weight zero and Rota-Baxter pre-Lie algebras of weight zero can be used to construct special L-dendriform algebras.
Submitted 18 July, 2022;
originally announced July 2022.
-
Double crossed biproducts and related structures
Authors:
Tianshui Ma,
Jie Li,
Haiyan Yang,
Shuanhong Wang
Abstract:
Let $H$ be a bialgebra. Let $σ: H\otimes H\to A$ be a linear map, where $A$ is a left $H$-comodule coalgebra, and an algebra with a left $H$-weak action $\triangleright$. Let $τ: H\otimes H\to B$ be a linear map, where $B$ is a right $H$-comodule coalgebra, and an algebra with a right $H$-weak action $\triangleleft$. In this paper, we improve the necessary conditions for the two-sided crossed product algebra $A\#^σ H~{^τ\#} B$ and the two-sided smash coproduct coalgebra $A\times H\times B$ to form a bialgebra (called double crossed biproduct) such that the condition $b_{[1]}\triangleright a_0\otimes b_{[0]}\triangleleft a_{-1}=a\otimes b$ in Majid's double biproduct (or double-bosonization) is one of the necessary conditions. On the other hand, we provide a more general two-sided crossed product algebra structure via Brzeziński's crossed product and give some applications.
Submitted 12 May, 2022;
originally announced May 2022.
-
Tight toughness, isolated toughness and binding number bounds for the $\{K_2,C_n\}$-factors
Authors:
Xiaxia Guan,
Tianlong Ma,
Chao Shi
Abstract:
A $\{K_2,C_n\}$-factor of a graph is a spanning subgraph each of whose components is either $K_2$ or $C_n$. In this paper, a sufficient condition in terms of tight toughness, isolated toughness and binding number bounds that guarantees the existence of a $\{K_2,C_{2i+1}| i\geq 2 \}$-factor in any graph is obtained, which answers a problem due to Gao and Wang (J. Oper. Res. Soc. China (2021), https://doi.org/10.1007/s40305-021-00357-6).
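To make the factor definition concrete, a brute-force Python sketch (hypothetical names, only feasible for graphs with a handful of edges) searches all spanning subgraphs for one whose components are all $K_2$ or odd cycles $C_{2i+1}$ with $i\geq 2$. It checks the definition directly and does not use the toughness or binding-number conditions of the paper.

```python
from itertools import combinations


def _components(n, edges):
    """Connected components of ({0..n-1}, edges), plus the adjacency lists."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for s in range(n):
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps, adj


def is_allowed_component(comp, adj):
    """True for K2, or for a cycle of odd length >= 5 (C_{2i+1}, i >= 2)."""
    size = len(comp)
    if size == 2:
        return True  # a connected simple graph on two vertices is K2
    if size >= 5 and size % 2 == 1:
        # a connected graph in which every vertex has degree 2 is a cycle
        return all(len(adj[v]) == 2 for v in comp)
    return False  # isolated vertices, triangles, paths, even cycles, ...


def has_k2_odd_cycle_factor(n, edges):
    """Does some spanning subgraph consist only of allowed components?"""
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            comps, adj = _components(n, subset)
            if all(is_allowed_component(c, adj) for c in comps):
                return True
    return False
```

For instance, $C_5$ and $C_4$ have such a factor (the cycle itself, and a perfect matching, respectively), while the triangle $C_3$ does not, since triangles are excluded and three vertices admit no perfect matching.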
Submitted 8 April, 2022;
originally announced April 2022.
-
A direct and elementary proof of the well-definedness of the interior and exterior polynomials of hypergraphs
Authors:
Xiaxia Guan,
Xian'an **,
Tianlong Ma
Abstract:
T. Kálmán (A version of Tutte's polynomial for hypergraphs, Adv. Math. 244 (2013) 823-873.) introduced the interior and exterior polynomials, which generalize to hypergraphs the evaluations of the Tutte polynomial $T(x,y)$ at $(1/x,1)$ and $(1,1/y)$. The two polynomials are defined under a fixed ordering of hyperedges, and were proved to be independent of the ordering using techniques of polytopes. In this paper, similarly to Tutte's original proof, we provide a direct and elementary proof of the well-definedness of the interior and exterior polynomials of hypergraphs.
Submitted 28 January, 2022;
originally announced January 2022.
-
Transposed BiHom-Poisson algebras
Authors:
Tianshui Ma,
Bei Li
Abstract:
In this paper, we introduce the concept of transposed BiHom-Poisson (abbr. TBP) algebras, which can be constructed from BiHom-Novikov-Poisson algebras. Several useful identities for TBP algebras are provided. We also prove that the tensor product of two (T)BP algebras is again a (T)BP algebra. The notions of BP 3-Lie algebras and TBP 3-Lie algebras are presented, and TBP algebras can induce TBP 3-Lie algebras in two ways. Finally, we give some examples of TBP algebras of dimension 2.
Submitted 1 January, 2022;
originally announced January 2022.
-
Bialgebras, Frobenius algebras and associative Yang-Baxter equations for Rota-Baxter algebras
Authors:
Chengming Bai,
Li Guo,
Tianshui Ma
Abstract:
Rota-Baxter operators and bialgebras go hand in hand in their applications, such as in the Connes-Kreimer approach to renormalization and the operator approach to the classical Yang-Baxter equation. We establish a bialgebra structure that is compatible with the Rota-Baxter operator, called the Rota-Baxter antisymmetric infinitesimal (ASI) bialgebra. This bialgebra is characterized by generalizations of matched pairs of algebras and double constructions of Frobenius algebras to the context of Rota-Baxter algebras. The study of the coboundary case leads to an enrichment of the associative Yang-Baxter equation (AYBE) to Rota-Baxter algebras. Antisymmetric solutions of the equation are used to construct Rota-Baxter ASI bialgebras. The notions of an $\mathcal{O}$-operator on a Rota-Baxter algebra and a Rota-Baxter dendriform algebra are also introduced to produce solutions of the AYBE in Rota-Baxter algebras and thus to provide Rota-Baxter ASI bialgebras. An unexpected byproduct is that a Rota-Baxter ASI bialgebra of weight zero gives rise to a quadri-bialgebra instead of bialgebra constructions for the dendriform algebra.
Submitted 20 December, 2021;
originally announced December 2021.
-
Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective
Authors:
Margalit Glasgow,
Honglin Yuan,
Tengyu Ma
Abstract:
Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the convergence rate of FedAvg has thus far been undetermined. Even under the simplest assumptions (convex, smooth, homogeneous, and bounded covariance), the best-known upper and lower bounds do not match, and it is not clear whether the existing analysis captures the capacity of the algorithm. In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows the existing FedAvg upper bound analysis is not improvable. Additionally, we establish a lower bound in a heterogeneous setting that nearly matches the existing upper bound. While our lower bounds show the limitations of FedAvg, under an additional assumption of third-order smoothness, we prove more optimistic state-of-the-art convergence results in both convex and non-convex settings. Our analysis stems from a notion we call iterate bias, which is defined by the deviation of the expectation of the SGD trajectory from the noiseless gradient descent trajectory with the same initialization. We prove novel sharp bounds on this quantity, and show intuitively how to analyze this quantity from a Stochastic Differential Equation (SDE) perspective.
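A minimal Python sketch of FedAvg / Local SGD as it is usually described: in each round, each of $M$ clients runs $K$ local SGD steps from the shared iterate and the server averages the results. The objective $f(x)=\tfrac12x^2$, the additive Gaussian gradient noise and the function name are assumptions for illustration (homogeneous setting: all clients share one objective).

```python
import random


def fedavg(x0, num_clients, local_steps, rounds, lr, noise_std, seed=0):
    """FedAvg / Local SGD on f(x) = 0.5 * x**2 (gradient x) with noisy gradients.

    Each stochastic gradient is x + N(0, noise_std**2).
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        local_iterates = []
        for _ in range(num_clients):
            y = x  # every client starts the round from the shared iterate
            for _ in range(local_steps):
                grad = y + rng.gauss(0.0, noise_std)  # stochastic gradient
                y -= lr * grad
            local_iterates.append(y)
        x = sum(local_iterates) / num_clients  # server averaging step
    return x


if __name__ == "__main__":
    print(fedavg(1.0, num_clients=4, local_steps=5, rounds=3, lr=0.1, noise_std=0.0))
    print(fedavg(1.0, num_clients=4, local_steps=5, rounds=3, lr=0.1, noise_std=0.5))
```

With `noise_std=0.0` each round contracts the iterate by $(1-\eta)^K$, recovering noiseless gradient descent. On this quadratic every update is affine in the noise, so in this sketch the expected iterate coincides with the noiseless gradient descent trajectory and the iterate bias described above is zero; a nonzero bias needs a non-quadratic objective.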
Submitted 11 February, 2022; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Three Favorite Edges Occurs Infinitely Often for One-Dimensional Simple Random Walk
Authors:
Chen-Xu Hao,
Ze-Chun Hu,
Ting Ma,
Renming Song
Abstract:
For a one-dimensional simple symmetric random walk $(S_n)$, an edge $x$ (between points $x-1$ and $x$) is called a favorite edge at time $n$ if its local time at $n$ achieves the maximum among all edges. In this paper, we show that with probability 1 three favorite edges occur infinitely often. Our work, inspired by Tóth and Werner [Combin. Probab. Comput. {\bf 6} (1997) 359-369] and Ding and Shen [Ann. Probab. {\bf 46} (2018) 2545-2561], disproves a conjecture mentioned in Remark 1 on page 368 of Tóth and Werner [Combin. Probab. Comput. {\bf 6} (1997) 359-369].
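The favorite-edge definition can be illustrated by simulation. A minimal Python sketch (hypothetical names): edge $x$ joins $x-1$ and $x$, its local time at time $n$ is taken to be the number of the first $n$ steps that traverse it in either direction (an assumed convention), and the sketch tracks how many edges attain the maximum. A finite simulation can only exhibit times with three favorite edges; it cannot show that they occur infinitely often.

```python
import random


def favorite_edge_counts(path):
    """Number of favorite edges at each time n = 1, ..., len(path) - 1.

    Edge x joins x-1 and x; its local time at time n is the number of steps
    S_m -> S_{m+1}, m < n, that traverse it (in either direction).
    """
    local_time = {}
    best, num_best = 0, 0
    counts = []
    for a, b in zip(path, path[1:]):
        if abs(a - b) != 1:
            raise ValueError("nearest-neighbour steps of size 1 expected")
        edge = max(a, b)  # the edge between a and b
        local_time[edge] = local_time.get(edge, 0) + 1
        t = local_time[edge]
        # only this edge's local time changed, so update the favorite set incrementally
        if t > best:
            best, num_best = t, 1
        elif t == best:
            num_best += 1
        counts.append(num_best)
    return counts


def times_with_three_favorites(num_steps, seed=0):
    """Simulate a simple symmetric random walk; return times n with >= 3 favorite edges."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(num_steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    counts = favorite_edge_counts(path)
    return [n for n, c in enumerate(counts, start=1) if c >= 3]
```

For instance, along the path $0,1,0,-1,-2,-1,0,1,2,1$ the sketch reports three favorite edges at time $6$ (edges $-1$, $0$ and $1$, each traversed twice).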
Submitted 12 July, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Nonhomogeneous associative Yang-Baxter equations
Authors:
Tianshui Ma,
Jie Li
Abstract:
We introduce the notion of an associative (BiHom-)Yang-Baxter pair of weight $(λ,γ)$, which can provide solutions to the double curved Rota-Baxter (BiHom-)system. Equivalent characterizations of (quasitriangular) covariant BiHom-bialgebras are given. We also prove that the associative BiHom-Yang-Baxter equation of weight $-1$ can be obtained from unitary quasitriangular covariant BiHom-bialgebras. Finally, we present two approaches to constructing (BiHom-)pre-Lie modules from Rota-Baxter (BiHom-)paired modules.
Submitted 30 October, 2021;
originally announced November 2021.
-
On the tensor product of two oriented quantum algebras
Authors:
Tianshui Ma,
Haiyang Yang,
Tao Yang
Abstract:
In this paper, we give the oriented quantum algebra (abbr. OQA) structures on the tensor product of two different OQAs by using Chen's weak $\mathfrak{R}$-matrix in [J. Algebra 204(1998):504-531]. As a special case, the OQA structures on the tensor product of an OQA with itself are provided, which are different from Radford's results in [J. Knot Theory Ramifications 16(2007):929-957].
Submitted 30 October, 2021;
originally announced November 2021.
-
Schwarz symmetrizations in parabolic equations on complete manifolds
Authors:
Haiqing Cheng,
Tengfei Ma,
Kui Wang
Abstract:
In this article, we prove a sharp estimate for the solutions to parabolic equations on manifolds. Precisely, using symmetrization techniques and isoperimetric inequalities on Riemannian manifolds, we obtain a Bandle-type comparison on complete noncompact manifolds with nonnegative Ricci curvature and on compact manifolds with positive Ricci curvature, respectively. Our results generalize Bandle's result [6] to the Riemannian setting, and extend Talenti's comparison for elliptic equations on manifolds, due to Colladay-Langford-McDonald [12] and Chen-Li [9], to parabolic equations.
Submitted 19 October, 2021;
originally announced October 2021.
-
Comparison results for Poisson equation with mixed boundary condition on manifolds
Authors:
Haiqing Cheng,
Tengfei Ma,
Kui Wang
Abstract:
In this article, we establish an $L^1$ estimate for solutions to the Poisson equation with mixed boundary conditions, on complete noncompact manifolds with nonnegative Ricci curvature and on compact manifolds with positive Ricci curvature, respectively. On Riemann surfaces, we obtain a Talenti-type comparison. Our results generalize the main theorems in [2] to the Riemannian setting, and Chen-Li's result [8] to the case of a variable Robin parameter.
Submitted 9 November, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Label Noise SGD Provably Prefers Flat Global Minimizers
Authors:
Alex Damian,
Tengyu Ma,
Jason D. Lee
Abstract:
In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to. Motivated by empirical studies that demonstrate that training with noisy labels improves generalization, we study the implicit regularization effect of SGD with label noise. We show that SGD with label noise converges to a stationary point of a regularized loss $L(θ) +λR(θ)$, where $L(θ)$ is the training loss, $λ$ is an effective regularization parameter depending on the step size, strength of the label noise, and the batch size, and $R(θ)$ is an explicit regularizer that penalizes sharp minimizers. Our analysis uncovers an additional regularization effect of large learning rates beyond the linear scaling rule that penalizes large eigenvalues of the Hessian more than small ones. We also prove extensions to classification with general loss functions, SGD with momentum, and SGD with general noise covariance, significantly strengthening the prior work of Blanc et al. to global convergence and large learning rates and of HaoChen et al. to general models.
Submitted 4 December, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Optimal queueing-based rebalancing for one-way electric carsharing systems with stochastic demand
Authors:
Tai-Yu Ma,
Theodoros Pantelidis,
Joseph Y. J. Chow
Abstract:
Viability of electric vehicle car sharing operations depends on rebalancing algorithms. Earlier methods in the literature suggest a trend toward Markovian stochastic demand with server relocation under queueing constraints. We propose a new model formulation based on a node-charge graph structure that extends the relocation model to include transshipment relocation flows. Computational tests with up to 1000 nodes (and 4000 node-charges) suggest promising avenues for further study.
Submitted 5 June, 2021;
originally announced June 2021.
-
Rota-Baxter operators on Turaev's Hopf group (co)algebras I: Basic definitions and related algebraic structures
Authors:
Tianshui Ma,
Jie Li,
Liangyun Chen,
Shuanhong Wang
Abstract:
We find a natural compatibility condition between the Rota-Baxter operator and Turaev's (Hopf) group-(co)algebras, which leads to the concept of a Rota-Baxter Turaev's (Hopf) group-(co)algebra. Two characterizations of Rota-Baxter Turaev's group-algebras (abbr. T-algebras) are obtained: one by Atkinson factorization and the other by T-quasi-idempotent elements. The relations among several related Turaev group algebraic structures (such as (tri)dendriform T-algebras, Zinbiel T-algebras, pre-Lie T-algebras and Lie T-algebras) are discussed, and some concrete examples from algebras of dimensions 2, 3 and 4 are given. Finally, we prove that Rota-Baxter Poisson T-algebras give rise to pre-Poisson T-algebras, and that Poisson T-algebras can be obtained from pre-Poisson T-algebras.
Submitted 8 March, 2023; v1 submitted 12 April, 2021;
originally announced April 2021.
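The construction above starts from the classical Rota-Baxter identity. As a hedged illustration in the ordinary (non-graded) setting, and not the T-algebra construction of the paper, the strict partial-sum operator $P(a)_n = \sum_{k<n} a_k$ on sequences with componentwise product is a Rota-Baxter operator of weight $\lambda = 1$, i.e. $P(a)P(b) = P(P(a)b + aP(b) + \lambda ab)$. The helper names are hypothetical, and the check only runs on finite integer sequences.

```python
def partial_sum(a):
    """Strict partial sums: P(a)[n] = a[0] + ... + a[n-1], so P(a)[0] = 0."""
    out, acc = [], 0
    for v in a:
        out.append(acc)
        acc += v
    return out


def mul(a, b):
    """Componentwise product of two equal-length sequences."""
    return [x * y for x, y in zip(a, b)]


def add(*seqs):
    """Componentwise sum of equal-length sequences."""
    return [sum(vals) for vals in zip(*seqs)]


def is_rota_baxter(P, a, b, weight):
    """Check P(a)P(b) == P(P(a)b + aP(b) + weight * ab) on the sequences a, b."""
    lhs = mul(P(a), P(b))
    ab_scaled = [weight * v for v in mul(a, b)]
    rhs = P(add(mul(P(a), b), mul(a, P(b)), ab_scaled))
    return lhs == rhs


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [4, 5, 6, 7]
    print(is_rota_baxter(partial_sum, a, b, weight=1))  # weight-1 identity
    print(is_rota_baxter(partial_sum, a, b, weight=0))  # drops the needed ab term
```

The weight-1 term accounts for the diagonal pairs $i = j$ in the double sum $\sum_{i,j<n} a_i b_j$, which the two mixed terms (with $i < j$ and $j < i$) miss.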
-
Why Do Local Methods Solve Nonconvex Problems?
Authors:
Tengyu Ma
Abstract:
Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though optimizing non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue, as optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the objectives used in practice are approximately global minima. We rigorously formalize this explanation for concrete instances of machine learning problems.
Submitted 24 March, 2021;
originally announced March 2021.
-
Attractors for locally damped Bresse systems and a unique continuation property
Authors:
To Fu Ma,
Rodrigo N. Monteiro,
Paulo N. Seminario-Huertas
Abstract:
This paper is devoted to Bresse systems, a robust model for circular beams given by a set of three coupled wave equations. The main objective is to establish the existence of global attractors for the dynamics of semilinear problems with localized damping. Dealing with localized damping requires a unique continuation property (UCP), so we also provide a suitable UCP for Bresse systems. Our strategy is to set the problem in a Riemannian geometry framework and view the system as a single equation with different Riemannian metrics. We then derive Carleman-type estimates to obtain our result.
Submitted 23 February, 2021;
originally announced February 2021.
-
Geodesic random walks, diffusion processes and Brownian motion on Finsler manifolds
Authors:
Tianyu Ma,
Vladimir S. Matveev,
Ilya Pavlyukevich
Abstract:
We show that geodesic random walks on a complete Finsler manifold of bounded geometry converge to a diffusion process which is, up to a drift, the Brownian motion corresponding to a Riemannian metric.
Submitted 16 February, 2021;
originally announced February 2021.
-
Tutte polynomials of fan-like graphs with applications in benzenoid systems
Authors:
Tianlong Ma,
Xian'an Jin,
Fuji Zhang
Abstract:
We study the computation of the Tutte polynomials of fan-like graphs and obtain expressions of their Tutte polynomials via generating functions. As applications, Tutte polynomials, in particular, the number of spanning trees, of two kinds of benzenoid systems, i.e. pyrene chains and triphenylene chains, are obtained.
Submitted 3 February, 2021;
originally announced February 2021.
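The number of spanning trees of a connected graph equals $T_G(1,1)$, its Tutte polynomial evaluated at $(1,1)$. As a sketch, assuming the simplest fan graph (a path on $n$ vertices plus an apex adjacent to all of them; the paper's fan-like graphs may be more general), the count can be cross-checked with Kirchhoff's matrix-tree theorem rather than the generating-function approach of the paper. For this fan the counts $1, 3, 8, 21, 55, 144, \ldots$ are the even-indexed Fibonacci numbers $\mathrm{Fib}(2n)$. The function names are hypothetical.

```python
from fractions import Fraction


def fan_graph_edges(n):
    """Fan graph: path 1-2-...-n plus an apex 0 joined to every path vertex."""
    path = [(i, i + 1) for i in range(1, n)]
    spokes = [(0, i) for i in range(1, n + 1)]
    return n + 1, path + spokes


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in m]
    size, result = len(m), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:  # a row swap flips the sign of the determinant
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def spanning_tree_count(num_vertices, edges):
    """Matrix-tree theorem: determinant of the Laplacian with one row/column deleted."""
    lap = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    minor = [row[1:] for row in lap[1:]]  # delete the row/column of vertex 0
    return int(det(minor))


if __name__ == "__main__":
    print([spanning_tree_count(*fan_graph_edges(n)) for n in range(1, 7)])
```

Exact rational arithmetic avoids the rounding error a floating-point determinant would introduce for larger graphs.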
-
Two-stage battery recharge scheduling and vehicle-charger assignment policy for dynamic electric dial-a-ride services
Authors:
Tai-Yu Ma
Abstract:
Coordinating the charging scheduling of electric vehicles for dynamic dial-a-ride services is challenging because of charging queuing delays and stochastic customer demand. We propose a new two-stage solution approach to dynamic vehicle charging scheduling that minimizes the costs of the fleet's daily charging operations. The approach comprises two components: daily vehicle charging scheduling and online vehicle-charger assignment. In the first stage, a new battery charge scheduling model obtains the vehicle charging schedules by minimizing the costs of daily vehicle charging operations while satisfying the vehicles' driving needs for serving customers. In the second stage, an online vehicle-charger assignment model is developed to minimize the total vehicle idle time for charging by considering queuing delays at the level of individual chargers. An efficient Lagrangian relaxation algorithm is proposed to solve the large-scale vehicle-charger assignment problem with small optimality gaps. The approach is applied to a realistic dynamic dial-a-ride service case study in Luxembourg and compared with the nearest-charging-station policy and the first-come-first-served minimum-charging-delay policy under different charging infrastructure scenarios. Our computational results show that the approach can achieve significant savings for the operator in terms of charging waiting times (-74.9%), charging times (-38.6%) and charged energy costs (-27.4%). A sensitivity analysis is conducted to evaluate the impact of the different model parameters, showing the scalability and robustness of the approach in a stochastic environment.
Submitted 15 April, 2021; v1 submitted 4 October, 2020;
originally announced October 2020.
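The abstract mentions a Lagrangian relaxation algorithm for the vehicle-charger assignment problem. The sketch below is a textbook version of that idea on a hypothetical toy instance, not the paper's model (which also accounts for queuing delays): each vehicle must be assigned to one charger, charger $j$ accepts at most $\mathrm{cap}_j$ vehicles, and the capacity constraints are relaxed with multipliers $\mu_j \ge 0$. For any $\mu \ge 0$, $L(\mu) = \sum_i \min_j (c_{ij} + \mu_j) - \sum_j \mu_j \mathrm{cap}_j$ is a lower bound on the optimal cost, and the multipliers are updated by projected subgradient steps with an assumed diminishing step size $1/k$.

```python
import itertools


def lagrangian_bound(cost, cap, iters=200, step0=1.0):
    """Projected-subgradient Lagrangian relaxation of charger capacity constraints.

    cost[i][j]: hypothetical cost (e.g. idle time) of sending vehicle i to charger j.
    cap[j]: maximum number of vehicles charger j can take.
    Returns the best lower bound found and the final multipliers.
    """
    n_chargers = len(cap)
    mu = [0.0] * n_chargers
    best = float("-inf")
    for k in range(1, iters + 1):
        load = [0] * n_chargers
        value = -sum(m * c for m, c in zip(mu, cap))
        for row in cost:
            # With capacities relaxed, each vehicle independently picks its
            # cheapest charger under the penalised costs cost + mu.
            j = min(range(n_chargers), key=lambda jj: row[jj] + mu[jj])
            value += row[j] + mu[j]
            load[j] += 1
        best = max(best, value)  # every L(mu) with mu >= 0 is a valid lower bound
        step = step0 / k         # diminishing step size (assumed schedule)
        # The subgradient of L at mu is load - cap; project back onto mu >= 0.
        mu = [max(0.0, m + step * (l - c)) for m, l, c in zip(mu, load, cap)]
    return best, mu


def brute_force(cost, cap):
    """Exact optimum by enumeration; only for tiny instances."""
    n_chargers = len(cap)
    best = float("inf")
    for assign in itertools.product(range(n_chargers), repeat=len(cost)):
        if all(assign.count(j) <= cap[j] for j in range(n_chargers)):
            best = min(best, sum(cost[i][j] for i, j in enumerate(assign)))
    return best


if __name__ == "__main__":
    cost = [[1, 5], [2, 6], [3, 4]]  # toy data: 3 vehicles, 2 chargers
    cap = [1, 2]
    bound, mu = lagrangian_bound(cost, cap)
    print("Lagrangian lower bound:", bound, "exact optimum:", brute_force(cost, cap))
```

The relaxed problem decomposes vehicle by vehicle, which is what makes this kind of relaxation attractive at large scale; the gap between the bound and a feasible assignment measures how far from optimal that assignment can be.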
-
Limiting behaviors for longest consecutive switches in an IID Bernoulli sequence
Authors:
Chen-Xu Hao,
Ting Ma
Abstract:
In this paper we mainly discuss sharp lower and upper bounds for the length of longest consecutive switches in IID Bernoulli sequences.
This work is an extension of results in Erdős and Révész (1975) for longest head-run and Hao et al. (2021) for longest consecutive switches in unbiased coin-tossing, and might be applied to reliability theory, biology, quality control, pattern recognition, finance, etc.
Submitted 18 January, 2022; v1 submitted 18 September, 2020;
originally announced September 2020.
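As a sketch, assuming that a switch at position $i$ means $X_i \neq X_{i+1}$ and that the statistic of interest is the largest number of consecutive switches, the quantity can be computed in one pass. For a fair coin the switch indicators are themselves IID fair Bernoulli variables, so the statistic then behaves like the longest head-run studied by Erdős and Révész (1975), which grows like $\log_2 n$. The function names and the simulation parameters are illustrative assumptions.

```python
import random


def longest_switch_run(bits):
    """Largest number of consecutive indices i with bits[i] != bits[i + 1]."""
    best = cur = 0
    for prev, nxt in zip(bits, bits[1:]):
        cur = cur + 1 if prev != nxt else 0  # extend or reset the current run
        best = max(best, cur)
    return best


def simulate(n, p, seed=0):
    """Longest run of switches in one simulated IID Bernoulli(p) sequence of length n."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < p else 0 for _ in range(n)]
    return longest_switch_run(bits)


if __name__ == "__main__":
    for p in (0.5, 0.3):
        print(p, simulate(2 ** 16, p))
```

For $p \neq 1/2$ consecutive switch indicators are no longer independent, which is why the biased case needs the separate analysis described in the abstract.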
-
A Note on the Gaussian Minimum Conjecture
Authors:
Yang-Fan Zhong,
Ting Ma,
Ze-Chun Hu
Abstract:
Let $n\geq 2$ and $(X_i,1\leq i\leq n)$ be a centered Gaussian random vector. The Gaussian minimum conjecture says that $E\left(\min_{1\leq i\leq n}|X_i|\right)\geq E\left(\min_{1\leq i\leq n}|Y_i|\right)$, where $Y_1,\ldots,Y_n$ are independent centered Gaussian random variables with $E(X_i^2)=E(Y_i^2)$ for every $i=1,\ldots,n$. In this note, we show that this conjecture holds if and only if $n=2$.
Submitted 14 August, 2020;
originally announced August 2020.
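For $n = 2$ the abstract states that the conjecture holds. A Monte Carlo sketch (illustrative only, not a proof; the sample size and seed are arbitrary assumptions) compares $E\min(|X_1|,|X_2|)$ for a correlated pair with unit variances against the independent case, whose exact value $2(\sqrt{2}-1)/\sqrt{\pi} \approx 0.467$ follows from $E\min(|Y_1|,|Y_2|) = \int_0^\infty P(|Y_1|>t)^2\,dt$.

```python
import math
import random


def mc_expected_min_abs(rho, sigma1=1.0, sigma2=1.0, samples=100_000, seed=0):
    """Monte Carlo estimate of E min(|X1|, |X2|) for a centred bivariate Gaussian
    with standard deviations sigma1, sigma2 and correlation rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = sigma1 * z1
        # rho * z1 + sqrt(1 - rho^2) * z2 has unit variance and correlation rho with z1
        x2 = sigma2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        total += min(abs(x1), abs(x2))
    return total / samples


# Exact value for two independent standard normals.
EXACT_INDEPENDENT = 2 * (math.sqrt(2) - 1) / math.sqrt(math.pi)

if __name__ == "__main__":
    print("independent (exact):", EXACT_INDEPENDENT)
    for rho in (0.0, 0.5, -0.8):
        print(rho, mc_expected_min_abs(rho))
```

Reusing the same seed for every correlation gives common random numbers, which makes the comparison across values of $\rho$ less noisy than independent runs would.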