Search | arXiv e-print repository

High-probability minimax lower bounds

Authors: Tianyi Ma, Kabir A. Verchand, Richard J. Samworth

Abstract: The minimax risk is often considered as a gold standard against which we can compare specific statistical procedures. Nevertheless, as has been observed recently in robust and heavy-tailed estimation problems, the inherent reduction of the (random) loss to its expectation may entail a significant loss of information regarding its tail behaviour. In an attempt to avoid such a loss, we introduce the notion of a minimax quantile, and seek to articulate its dependence on the quantile level. To this end, we develop high-probability variants of the classical Le Cam and Fano methods, as well as a technique to convert local minimax risk lower bounds to lower bounds on minimax quantiles. To illustrate the power of our framework, we deploy our techniques on several examples, recovering recent results in robust mean estimation and stochastic convex optimisation, as well as obtaining several new results in covariance matrix estimation, sparse linear regression, nonparametric density estimation and isotonic regression. Our overall goal is to argue that minimax quantiles can provide a finer-grained understanding of the difficulty of statistical problems, and that, in wide generality, lower bounds on these quantities can be obtained via user-friendly tools.

Submitted 4 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: 38 pages, 3 figures

MSC Class: 62C20; 62B10

arXiv:2406.03715 [pdf, other]

Strong convergence rates for full-discrete approximations of the stochastic Allen-Cahn equations on 2D torus

Authors: Ting Ma, Lifei Wang, Huanyu Yang

Abstract: In this paper we construct space-time full discretizations of stochastic Allen-Cahn equations driven by space-time white noise on 2D torus. The approximations are implemented by tamed exponential Euler discretization in time and spectral Galerkin method in space. We finally obtain the convergence rates with the spatial order of $α-δ$ and the temporal order of $α/{6}-δ$ in $\mathcal C^{-α}$ for $α\in(0,1/3)$ and $δ>0$ arbitrarily small.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2403.04213 [pdf, ps, other]

Representations of non-finitely graded Lie algebras related to Virasoro algebra

Authors: Chunguang Xia, Tianyu Ma, Xiao Dong, Ming**g Zhang

Abstract: In this paper, we study representations of non-finitely graded Lie algebras $\mathcal{W}(ε)$ related to Virasoro algebra, where $ε= \pm 1$. Precisely speaking, we completely classify the free $\mathcal{U}(\mathfrak h)$-modules of rank one over $\mathcal{W}(ε)$, and find that these module structures are rather different from those of other graded Lie algebras. We also determine the simplicity and isomorphism classes of these modules.

Submitted 3 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2401.11571 [pdf, other]

On eigenvalues and eigenfunctions of the operators defining multidimensional scaling on some symmetric spaces

Authors: Tianyu Ma, Eugene Stepanov

Abstract: We study asymptotics of the eigenvalues and eigenfunctions of the operators used for constructing multidimensional scaling (MDS) on compact connected Riemannian manifolds, in particular on closed connected symmetric spaces. They are the limits of eigenvalues and eigenvectors of squared distance matrices of an increasing sequence of finite subsets covering the space densely in the limit. We show that for products of spheres and real projective spaces, the numbers of positive and negative eigenvalues of these operators are both infinite. We also find a class of spaces (namely $\mathbb{RP}^n$ with odd $n>1$) whose MDS defining operators are not trace class, and original distances cannot be reconstructed from the eigenvalues and eigenfunctions of these operators.

Submitted 21 January, 2024; originally announced January 2024.

MSC Class: 51Fxx

arXiv:2401.04427 [pdf]

Optimized electrified meeting-point-based feeder bus services with capacitated charging stations and partial recharges

Authors: Tai-Yu Ma, Yumeng Fang, Richard D. Connors, Francesco Viti, Haruko Nakao

Abstract: Meeting-point-based feeder services using EVs have good potential to achieve an efficient and clean on-demand mobility service. However, customer-to-meeting-point, vehicle routing, and charging scheduling need to be jointly optimized to achieve the best system performance. To this aim, we assess the effect of different system parameters and configure them based on our previously developed hybrid metaheuristic algorithm. A set of test instances based on morning peak hour commuting scenarios between the cities of Arlon and Luxembourg are used to evaluate the impact of the set parameters on the optimal solutions. The experimental results suggest that higher meeting point availability can achieve better system performance. By jointly configuring different system parameters, the overall system performance can be significantly improved (-10.8% total kilometers traveled by vehicles compared to the benchmark) to serve all requests. Our experimental results show that, compared with a traditional door-to-door dial-a-ride system, the meeting-point-based system can reduce the fleet size by up to 70.2%, the in-vehicle travel time by up to 6.4% and the kilometers traveled by up to 49.4%.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.03838 [pdf]

A hybrid metaheuristic to optimize electric first-mile feeder services with charging synchronization constraints and customer rejections

Authors: Tai-Yu Ma, Yumeng Fang, Richard D. Connors, Francesco Viti, Haruko Nakao

Abstract: This paper addresses the on-demand meeting-point-based feeder electric bus routing and charging scheduling problem under charging synchronization constraints. The problem considered exhibits the structure of the location routing problem, which is more difficult to solve than many electric vehicle routing problems with capacitated charging stations. We propose to model the problem using a mixed-integer linear programming approach based on a layered graph structure. An efficient hybrid metaheuristic solution algorithm is proposed. A mixture of random and greedy partial charging scheduling strategies is used to find feasible charging schedules under the synchronization constraints. The algorithm is tested on instances with up to 100 customers and 49 bus stops/meeting points. The results show that the proposed algorithm provides near-optimal solutions within less than one minute on average compared with the best solutions found by a mixed-integer linear programming solver set with a 4-hour computation time limit. A case study on a larger sized case with 1000 customers and 111 meeting points shows the proposed method is applicable to real-world situations.

Submitted 8 February, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.16483 [pdf, ps, other]

Expressivity and Approximation Properties of Deep Neural Networks with ReLU$^k$ Activation

Authors: Juncai He, Tong Mao, Jinchao Xu

Abstract: In this paper, we investigate the expressivity and approximation properties of deep neural networks employing the ReLU$^k$ activation function for $k \geq 2$. Although deep ReLU networks can approximate polynomials effectively, deep ReLU$^k$ networks have the capability to represent higher-degree polynomials precisely. Our initial contribution is a comprehensive, constructive proof for polynomial representation using deep ReLU$^k$ networks. This allows us to establish an upper bound on both the size and count of network parameters. Consequently, we are able to demonstrate a suboptimal approximation rate for functions from Sobolev spaces as well as for analytic functions. Additionally, through an exploration of the representation power of deep ReLU$^k$ networks for shallow networks, we reveal that deep ReLU$^k$ networks can approximate functions from a range of variation spaces, extending beyond those generated solely by the ReLU$^k$ activation function. This finding demonstrates the adaptability of deep ReLU$^k$ networks in approximating functions within various variation spaces.

Submitted 10 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.
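
The claim that ReLU$^k$ networks can represent polynomials exactly can be illustrated with a standard identity, $x^k = \mathrm{ReLU}(x)^k + (-1)^k\,\mathrm{ReLU}(-x)^k$, together with the polarization identity $xy = \tfrac14\big((x+y)^2-(x-y)^2\big)$. The Python sketch below is only an illustration of these identities, not the construction used in the paper; the function names are hypothetical.

```python
def relu_k(x, k):
    """ReLU^k activation: max(x, 0) raised to the power k."""
    return max(x, 0.0) ** k


def monomial_via_relu_k(x, k):
    """Represent x**k exactly with two ReLU^k units (inputs x and -x)."""
    return relu_k(x, k) + (-1) ** k * relu_k(-x, k)


def product_via_relu2(x, y):
    """Represent x*y exactly from squares, each built from ReLU^2 units."""
    square_sum = monomial_via_relu_k(x + y, 2)
    square_diff = monomial_via_relu_k(x - y, 2)
    return (square_sum - square_diff) / 4.0
```

Composing such exact products layer by layer is one plausible route to higher-degree polynomials, which is why depth matters in this setting.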

arXiv:2310.19259 [pdf, ps, other]

Distance spectral conditions for $ID$-factor-critical and fractional $[a, b]$-factor of graphs

Authors: Tingyan Ma, Ligong Wang

Abstract: Let $G=(V(G), E(G))$ be a graph with vertex set $V(G)$ and edge set $E(G)$. A graph is $ID$-factor-critical if for every independent set $I$ of $G$ whose size has the same parity as $|V(G)|$, $G-I$ has a perfect matching. For two positive integers $a$ and $b$ with $a\leq b$, let $h$: $E(G)\rightarrow [0, 1]$ be a function on $E(G)$ satisfying $a\leq\sum _{e\in E_{G}(v_{i})}h(e)\leq b$ for any vertex $v_{i}\in V(G)$. Then the spanning subgraph with edge set $E_{h}$, denoted by $G[E_{h}]$, is called a fractional $[a, b]$-factor of $G$ with indicator function $h$, where $E_{h}=\{e\in E(G)\mid h(e)>0\}$ and $E_{G}(v_{i})=\{e\in E(G)\mid e \text{ is incident with } v_{i} \text{ in } G\}$. A graph is defined as a fractional $[a, b]$-deleted graph if for any $e\in E(G)$, $G-e$ contains a fractional $[a, b]$-factor. For any integer $k\geq 1$, a graph has a $k$-factor if it contains a $k$-regular spanning subgraph. In this paper, we first give a distance spectral radius condition of $G$ to guarantee that $G$ is $ID$-factor-critical. Furthermore, we provide sufficient conditions in terms of distance spectral radius and distance signless Laplacian spectral radius for a graph to contain a fractional $[a, b]$-factor, fractional $[a, b]$-deleted-factor and $k$-factor.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 10 pages

arXiv:2310.14267 [pdf, ps, other]

When Leibniz algebras are Nijenhuis?

Authors: Haiying Li, Tianshui Ma, Shuanhong Wang

Abstract: Leibniz algebras can be seen as a ``non-commutative'' analogue of Lie algebras. Nijenhuis operators on Leibniz algebras introduced by Cariñena, Grabowski, and Marmo in [J. Phys. A: Math. Gen. 37(2004)] are (1, 1)-tensors with vanishing Nijenhuis torsion. Recently triangular Leibniz bialgebras were introduced by Tang and Sheng in [J. Noncommut. Geom. 16(2022)] via the twisting theory of twilled Leibniz algebras. In this paper we find that Leibniz algebras are very closely related to Nijenhuis operators, and prove that a triangular symplectic Leibniz bialgebra together with a dual triangular structure must possess Nijenhuis operators, which makes it possible to study the applications of Nijenhuis operators from the perspective of Leibniz algebras. At the same time, we regain the classical Leibniz Yang-Baxter equation by using the tensor form of classical $r$-matrices. At last we give the classification of triangular Leibniz bialgebras of low dimensions.

Submitted 17 June, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.09975 [pdf, ps, other]

Infinitesimal (BiHom-)bialgebras of any weight (II): Representations

Authors: Tianshui Ma, Abdenacer Makhlouf

Abstract: The aim of this paper is to investigate representation theory of infinitesimal (BiHom-)bialgebras of any weight $λ$ (abbr. $λ$-inf(BH)-bialgebras). Firstly, inspired by the well-known Majid-Radford's bosonization theory in Hopf algebra theory, we present a class of $λ$-inf(BH)-bialgebras, named $λ$-inf(BH)-biproduct bialgebras, consisting of an inf(BH)-product algebra structure and an inf(BH)-coproduct coalgebra structure, which induces a structure of a $λ$-inf(BH)-Hopf bimodule over a $λ$-inf(BH)-bialgebra. Secondly, we explore relationships among $λ$-inf(BH)-Hopf bimodules, $λ$-Rota-Baxter (BiHom-)bimodules, (BiHom-)dendriform bimodules and (BiHom-)pre-Lie bimodules. Finally, we provide two kinds of general Gelfand-Dorfman theorems related to BiHom-Novikov algebras.

Submitted 15 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:2309.01758

arXiv:2310.07496 [pdf, ps, other]

Fundamentals of thermoelasticity for curved beams

Authors: Marcio A. Jorge Silva, To Fu Ma

Abstract: The purpose of this paper is twofold. Firstly, we conduct an in-depth analysis of mathematical modeling concerning thermal-mechanical curved beams, by taking into consideration three primary forces widely accepted in the literature: axial load, shear force, and bending moment. Additionally, we examine their appropriate thermal couplings, shedding light on the intricate interplay between stress-strain relationships and temperature variations. This analysis is situated within the well-recognized context of the Bresse governing model for arched beams. Secondly, drawing upon distinguished constitutive laws for heat flux of conduction, we compile a comprehensive list of thermoelastic curved beam systems in various scenarios. We introduce new categories of problems that exhibit specific features from the thermal point of view.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 29 pages, 1 figure

MSC Class: 35Q79; 74A15; 74F05; 74K10; 80A05

arXiv:2310.04659 [pdf, ps, other]

Convolution formulas for multivariate arithmetic Tutte polynomials

Authors: Tianlong Ma, Xian'an Jin, Weiling Yang

Abstract: The multivariate arithmetic Tutte polynomial of arithmetic matroids is a generalization of the multivariate Tutte polynomial of matroids. In this note, we give the convolution formulas for the multivariate arithmetic Tutte polynomial of the product of two arithmetic matroids. In particular, the convolution formulas for the multivariate arithmetic Tutte polynomial of an arithmetic matroid are obtained. Applying our results, several known convolution formulas including [5, Theorem 10.9 and Corollary 10.10] and [1, Theorems 1 and 4] are proved by a purely combinatorial proof. The proofs presented here are significantly shorter than the previous ones. In addition, we obtain a convolution formula for the characteristic polynomial of an arithmetic matroid.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.11885 [pdf, other]

On the maximum local mean order of sub-k-trees of a k-tree

Authors: Zhuo Li, Tianlong Ma, Fengming Dong, Xian'an Jin

Abstract: For a k-tree T, a generalization of a tree, the local mean order of sub-k-trees of T is the average order of sub-k-trees of T containing a given k-clique. The problem whether the largest local mean order of a tree (i.e., a 1-tree) at a vertex is always attained at a leaf was asked by Jamison in 1984 and was answered by Wagner and Wang in 2016. In 2018, Stephens and Oellermann asked a similar problem: for any k-tree T, does the maximum local mean order of sub-k-trees containing a given k-clique occur at a k-clique that is not a major k-clique of T? In this paper, we give it an affirmative answer.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.01758 [pdf, ps, other]

Infinitesimal (BiHom-)bialgebras of any weight (I): Basic definitions and properties

Authors: Tianshui Ma, Abdenacer Makhlouf

Abstract: The purpose of this paper is to introduce and study $λ$-infinitesimal BiHom-bialgebras (abbr. $λ$-infBH-bialgebra) and some related structures. They can be seen as an extension of $λ$-infinitesimal bialgebras considered by Ebrahimi-Fard, including Joni and Rota's infinitesimal bialgebras as well as Loday and Ronco's infinitesimal bialgebras, and including also infinitesimal BiHom-bialgebras introduced by Liu, Makhlouf, Menini, Panaite. In this paper, we provide various relevant constructions and new concepts. Two ways are provided for a unitary (resp. counitary) algebra (coalgebra) to be a $λ$-infBH-bialgebra and the notion of $λ$-infBH-Hopf module is introduced and discussed. It is proved, in connexion with nonhomogeneous (co)associative BiHom-Yang-Baxter equation, that every (left BiHom-)module (resp. comodule) over a (anti-)quasitriangular (resp. (anti-)coquasitriangular) $λ$-infBH-bialgebra carries a structure of $λ$-infBH-Hopf module. Moreover, two approaches to construct BiHom-pre-Lie (co)algebras from $λ$-infBH-bialgebras are presented.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2308.03230 [pdf, ps, other]

Tractability of approximation by general shallow networks

Authors: Hrushikesh Mhaskar, Tong Mao

Abstract: In this paper, we present a sharper version of the results in the paper Dimension independent bounds for general shallow networks; Neural Networks, \textbf{123} (2020), 142-152. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces. We consider approximation of functions of the form $ x\mapsto\int_{\mathbb{Y}} G( x, y)dτ( y)$, $ x\in\mathbb{X}$, by $G$-networks of the form $ x\mapsto \sum_{k=1}^n a_kG( x, y_k)$, $ y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$. Defining the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ in terms of covering numbers, we obtain dimension independent bounds on the degree of approximation in terms of $n$, where also the constants involved are all dependent at most polynomially on the dimensions. Applications include approximation by power rectified linear unit networks, zonal function networks, certain radial basis function networks as well as the important problem of function extension to higher dimensional spaces.

Submitted 10 December, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

arXiv:2307.11007 [pdf, other]

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Authors: Kaiyue Wen, Zhiyuan Li, Tengyu Ma

Abstract: Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.

Submitted 22 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: 34 pages, 11 figures

arXiv:2306.06568 [pdf, ps, other]

Extreme coefficients of multiplicity Tutte polynomials

Authors: Xian'an Jin, Tianlong Ma, Weiling Yang

Abstract: The multiplicity Tutte polynomial, which includes the arithmetic Tutte polynomial, is a generalization of the classical Tutte polynomial of matroids. In this paper, we obtain an expression of the general coefficient and the expressions of six extreme coefficients of multiplicity Tutte polynomials. In particular, an expression of the general coefficient and the expressions of corresponding extreme coefficients of classical Tutte polynomial of matroids are deduced.

Submitted 5 February, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

arXiv:2305.14342 [pdf, other]

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Authors: Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

Abstract: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid change of Hessian along the trajectory. Sophia only estimates the diagonal Hessian every handful of iterations, which has negligible average per-step time and memory overhead. On language modeling with GPT models of sizes ranging from 125M to 1.5B, Sophia achieves a 2x speed-up compared to Adam in the number of steps, total compute, and wall-clock time, achieving the same perplexity with 50% fewer steps, less total compute, and reduced wall-clock time. Theoretically, we show that Sophia, in a much simplified setting, adapts to the heterogeneous curvatures in different parameter dimensions, and thus has a run-time bound that does not depend on the condition number of the loss.

Submitted 5 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.
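
The update rule described in this abstract (moving average of gradients, divided by a moving average of a diagonal Hessian estimate, then clipped element-wise) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the hyperparameters `lr`, `beta1`, `beta2`, `clip` and `eps`, and the way the Hessian estimate is passed in are all hypothetical, and the abstract does not specify the estimator or any scaling constants.

```python
def sophia_like_step(theta, m, h, grad, hess_diag_est=None,
                     lr=0.1, beta1=0.9, beta2=0.99, clip=1.0, eps=1e-12):
    """One clipped, diagonally preconditioned step on lists of floats.

    hess_diag_est is an externally supplied estimate of the diagonal
    Hessian; pass None on steps where it is not refreshed (assumption:
    the caller decides how often to re-estimate it).
    """
    # Exponential moving average of the gradients.
    new_m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, grad)]

    # Exponential moving average of the Hessian estimate, only when refreshed.
    if hess_diag_est is not None:
        new_h = [beta2 * hi + (1.0 - beta2) * ei
                 for hi, ei in zip(h, hess_diag_est)]
    else:
        new_h = list(h)

    new_theta = []
    for t, mi, hi in zip(theta, new_m, new_h):
        # Precondition by the curvature estimate; eps guards against
        # zero or negative estimates (non-convex directions).
        ratio = mi / max(hi, eps)
        # Element-wise clipping bounds the worst-case update size.
        step = max(-clip, min(clip, ratio))
        new_theta.append(t - lr * step)
    return new_theta, new_m, new_h
```

With the clipping in place, no coordinate moves by more than `lr * clip` per step, which is the worst-case control the abstract refers to.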

arXiv:2304.07684 [pdf, other]

Rota-Baxter operators on cocommutative Hopf algebras and Hopf braces

Authors: Huihui Zheng, Li Guo, Tianshui Ma, Liangyun Zhang

Abstract: This paper studies the relationship of Rota-Baxter operators on cocommutative Hopf algebras with Hopf braces and the Yang-Baxter equation, with emphasis on the embedding of cocommutative Hopf braces into Rota-Baxter Hopf algebras. Through Hopf braces, we establish a connection between relative Rota-Baxter operators on cocommutative Hopf algebras and bijective 1-cocycles. Finally, we introduce the notion of symmetric Hopf braces, and establish the relationship between symmetric Hopf braces and Rota-Baxter Hopf algebras.

Submitted 25 June, 2024; v1 submitted 15 April, 2023; originally announced April 2023.

Comments: 22 pages

arXiv:2304.07093 [pdf, ps, other]

The sufficient conditions for $k$-leaf-connected graphs in terms of several topological indices

Authors: Tingyan Ma, Ligong Wang, Yang Hu

Abstract: Let $G=(V(G), E(G))$ be a graph with vertex set $V(G)$ and edge set $E(G)$. For $k\geq2$ and given any subset $S\subseteq V(G)$ with $|S|=k$, if a graph $G$ of order $|V(G)|\geq k+1$ always has a spanning tree $T$ such that $S$ is precisely the set of leaves of $T$, then the graph $G$ is a $k$-leaf-connected graph. A graph $G$ is called Hamilton-connected if any two vertices of $G$ are connected by a Hamilton path. Based on the definitions of $k$-leaf-connected and Hamilton-connected, we know that a graph is $2$-leaf-connected if and only if it is Hamilton-connected. During the past decades, there have been many results of sufficient conditions for Hamilton-connected with respect to topological indices. In this paper, we present sufficient conditions for a graph $G$ to be $k$-leaf-connected in terms of the Zagreb index, the reciprocal degree distance or the hyper-Zagreb index. Furthermore, we use the first Zagreb index and hyper-Zagreb index of the complement graph $\overline{G}$ to give sufficient conditions for a graph $G$ to be $k$-leaf-connected.

Submitted 7 April, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 19 pages

MSC Class: 05C09; 05C35 ACM Class: G.2.2

arXiv:2304.02256 [pdf, other]

Extremal trees, unicyclic and bicyclic graphs with respect to $p$-Sombor spectral radii

Authors: Ruiling Zheng, Tianlong Ma, Xian'an Jin

Abstract: For a graph $G=(V,E)$ and $v_{i}\in V$, denote by $d_{v_{i}}$ (or $d_{i}$ for short) the degree of vertex $v_{i}$. The $p$-Sombor matrix $\textbf{S}_{\textbf{p}}(G)$ ($p\neq0$) of a graph $G$ is a square matrix, where the $(i,j)$-entry is equal to $\displaystyle (d_{i}^{p}+d_{j}^{p})^{\frac{1}{p}}$ if the vertices $v_{i}$ and $v_{j}$ are adjacent, and 0 otherwise. The $p$-Sombor spectral radius of $G$, denoted by $\displaystyle ρ(\textbf{S}_{\textbf{p}}(G))$, is the largest eigenvalue of the $p$-Sombor matrix $\textbf{S}_{\textbf{p}}(G)$. In this paper, we consider the extremal trees, unicyclic and bicyclic graphs with respect to the $p$-Sombor spectral radii. We characterize completely the extremal graphs with the first three maximum Sombor spectral radii, which answers partially a problem posed by Liu et al. in [MATCH Commun. Math. Comput. Chem. 87 (2022) 59-87].

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2301.09907 [pdf, ps, other]

doi 10.1088/1361-6544/ad0c2b

Canonical curves and Kropina metrics in Lagrangian contact geometry

Authors: T. Ma, K. J. Flood, V. S. Matveev, V. Žádník

Abstract: We present a Fefferman-type construction from Lagrangian contact to conformal structures and examine several related topics. In particular, we concentrate on describing the canonical curves and their correspondence. We show that chains and null-chains of an integrable Lagrangian contact structure are the projections of null-geodesics of the Fefferman space. Employing the Fermat principle, we realize chains as geodesics of Kropina (pseudo-Finsler) metrics. Using recent rigidity results, we show that "sufficiently many" chains determine the Lagrangian contact structure. Separately, we comment on Lagrangian contact structures induced by projective structures and the special case of dimension three.

Submitted 16 June, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: 25 pages, no figures

MSC Class: Primary: 53A40; 32V30; 53B40; 53A20; 58E10. Secondary: 32V05; 32V20; 53C22

Journal ref: Nonlinearity 37 (2024), 015007

arXiv:2212.05716 [pdf, ps, other]

On Generalization and Regularization via Wasserstein Distributionally Robust Optimization

Authors: Qinyu Wu, Jonathan Yu-Meng Li, Tiantian Mao

Abstract: Wasserstein distributionally robust optimization (DRO) has found success in operations research and machine learning applications as a powerful means to obtain solutions with favourable out-of-sample performances. Two compelling explanations for the success are the generalization bounds derived from Wasserstein DRO and the equivalency between Wasserstein DRO and the regularization scheme commonly applied in machine learning. Existing results on generalization bounds and the equivalency to regularization are largely limited to the setting where the Wasserstein ball is of a certain type and the decision criterion takes certain forms of an expected function. In this paper, we show that by focusing on Wasserstein DRO problems with affine decision rules, it is possible to obtain generalization bounds and the equivalency to regularization in a significantly broader setting where the Wasserstein ball can be of a general type and the decision criterion can be a general measure of risk, i.e., nonlinear in distributions. This allows for accommodating many important classification, regression, and risk minimization applications that have not been addressed to date using Wasserstein DRO. Our results are strong in that the generalization bounds do not suffer from the curse of dimensionality and the equivalency to regularization is exact. As a byproduct, our regularization results broaden considerably the class of Wasserstein DRO models that can be solved efficiently via regularization formulations.

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2211.05729 [pdf, other]

How Does Sharpness-Aware Minimization Minimize Sharpness?

Authors: Kaiyue Wen, Tengyu Ma, Zhiyuan Li

Abstract: Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks for various settings. However, the underlying working of SAM remains elusive because of various intriguing approximations in the theoretical characterizations. SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees. The subtle differences in these notions of sharpness can indeed lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that the two steps of approximations in the original motivation of SAM individually lead to inaccurate local conclusions, but their combination accidentally reveals the correct effect, when full-batch gradients are applied. Furthermore, we also prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely to be the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of Hessian when SAM is applied.

Submitted 5 January, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: 94 pages, 1 figure

arXiv:2211.04778 [pdf, other]

An improvement of sufficient condition for $k$-leaf-connected graphs

Authors: Tingyan Ma, Guoyan Ao, Ruifang Liu, Ligong Wang, Yang Hu

Abstract: For integer $k\geq2,$ a graph $G$ is called $k$-leaf-connected if $|V(G)|\geq k+1$ and given any subset $S\subseteq V(G)$ with $|S|=k,$ $G$ always has a spanning tree $T$ such that $S$ is precisely the set of leaves of $T.$ Thus a graph is $2$-leaf-connected if and only if it is Hamilton-connected. In this paper, we present a best possible condition based upon the size to guarantee a graph to be $k$-leaf-connected, which not only improves the results of Gurgel and Wakabayashi [On $k$-leaf-connected graphs, J. Combin. Theory Ser. B 41 (1986) 1-16] and Ao, Liu, Yuan and Li [Improved sufficient conditions for $k$-leaf-connected graphs, Discrete Appl. Math. 314 (2022) 17-30], but also extends the result of Xu, Zhai and Wang [An improvement of spectral conditions for Hamilton-connected graphs, Linear Multilinear Algebra, 2021]. Our key approach is showing that an $(n+k-1)$-closed non-$k$-leaf-connected graph must contain a large clique if its size is large enough. As applications, sufficient conditions for a graph to be $k$-leaf-connected in terms of the (signless Laplacian) spectral radius of $G$ or its complement are also presented.

Submitted 9 November, 2022; originally announced November 2022.

Comments: 15 pages, 2 figures

MSC Class: 05C50; 05C35

arXiv:2209.10830 [pdf]

Dynamic charging management for electric vehicle demand responsive transport

Authors: Tai-Yu Ma

Abstract: With the climate change challenges, transport network companies started to electrify their fleet to reduce CO2 emissions. However, such an ecological transition brings new research challenges for dynamic electric fleet charging management under uncertainty. In this study, we address the dynamic charging scheduling management of shared ride-hailing services with public charging stations. A two-stage charging scheduling optimization approach under a rolling horizon framework is proposed to minimize the overall charging operational costs of the fleet, including vehicles' access times, charging times, and waiting times, by anticipating future public charging station availability. The charging station occupancy prediction is based on a hybrid LSTM (long short-term memory) network approach and integrated into the proposed online vehicle-charger assignment. The proposed methodology is applied to a realistic simulation study in the city of Dundee, UK. The numerical studies show that the proposed approach can reduce the total charging waiting times of the fleet by 48.3% and the total amount of energy charged by the fleet by 35.3% compared to a need-based charging reference policy.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2207.10846 [pdf, ps, other]

Favorite Downcrossing Sites of One-Dimensional Simple Random Walk

Authors: Chen-Xu Hao, Ze-Chun Hu, Ting Ma, Renming Song

Abstract: Random walk is a very important Markov process and has important applications in many fields. For a one-dimensional simple symmetric random walk $(S_n)$, a site $x$ is called a favorite downcrossing site at time $n$ if its downcrossing local time at time $n$ achieves the maximum among all sites. In this paper, we study the cardinality of the favorite downcrossing site set, and show that with probability 1 there are only finitely many times at which there are at least four favorite downcrossing sites, while three favorite downcrossing sites occur infinitely often. Some related open questions are also introduced.

Submitted 29 November, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 19 pages

arXiv:2207.09403 [pdf, other]

A General Wasserstein Framework for Data-driven Distributionally Robust Optimization: Tractability and Applications

Authors: Jonathan Yu-Meng Li, Tiantian Mao

Abstract: Data-driven distributionally robust optimization is a recently emerging paradigm aimed at finding a solution that is driven by sample data but is protected against sampling errors. An increasingly popular approach, known as Wasserstein distributionally robust optimization (DRO), achieves this by applying the Wasserstein metric to construct a ball centred at the empirical distribution and finding a solution that performs well against the most adversarial distribution from the ball. In this paper, we present a general framework for studying different choices of a Wasserstein metric and point out the limitation of the existing choices. In particular, while choosing a Wasserstein metric of a higher order is desirable from a data-driven perspective, given its less conservative nature, such a choice comes with a high price from a robustness perspective - it is no longer applicable to many heavy-tailed distributions of practical concern. We show that this seemingly inevitable trade-off can be resolved by our framework, where a new class of Wasserstein metrics, called coherent Wasserstein metrics, is introduced. Like Wasserstein DRO, distributionally robust optimization using the coherent Wasserstein metrics, termed generalized Wasserstein distributionally robust optimization (GW-DRO), has all the desirable performance guarantees: finite-sample guarantee, asymptotic consistency, and computational tractability. The worst-case expectation problem in GW-DRO is in general a nonconvex optimization problem, yet we provide new analysis to prove its tractability without relying on the common duality scheme. Our framework, as shown in this paper, offers a fruitful opportunity to design novel Wasserstein DRO models that can be applied in various contexts such as operations management, finance, and machine learning.

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2207.08703 [pdf, ps, other]

Rota-Baxter Lie bialgebras, classical Yang-Baxter equations and special L-dendriform bialgebras

Authors: Chengming Bai, Li Guo, Guilai Liu, Tianshui Ma

Abstract: We establish a bialgebra structure on Rota-Baxter Lie algebras following the Manin triple approach to Lie bialgebras. Explicitly, Rota-Baxter Lie bialgebras are characterized by generalizing matched pairs of Lie algebras and Manin triples of Lie algebras to the context of Rota-Baxter Lie algebras. The coboundary case leads to the introduction of the admissible classical Yang-Baxter equation (CYBE) in Rota-Baxter Lie algebras, for which the antisymmetric solutions give rise to Rota-Baxter Lie bialgebras. The notions of $\mathcal{O}$-operators on Rota-Baxter Lie algebras and Rota-Baxter pre-Lie algebras are introduced to produce antisymmetric solutions of the admissible CYBE. Furthermore, extending the well-known property that a Rota-Baxter Lie algebra of weight zero induces a pre-Lie algebra, the Rota-Baxter Lie bialgebra of weight zero induces a bialgebra structure of independent interest, namely the special L-dendriform bialgebra, which is equivalent to a Lie group with a left-invariant flat pseudo-metric in geometry. This induction is also characterized as the inductions between the corresponding Manin triples and matched pairs. Finally, antisymmetric solutions of the admissible CYBE in a Rota-Baxter Lie algebra of weight zero give special L-dendriform bialgebras. In particular, both Rota-Baxter algebras of weight zero and Rota-Baxter pre-Lie algebras of weight zero can be used to construct special L-dendriform algebras.

Submitted 18 July, 2022; originally announced July 2022.

Comments: 29 pages

MSC Class: 17B38; 17B62; 17B10; 16T25; 17A30; 17A36; 17D25

arXiv:2205.06433 [pdf, ps, other]

doi 10.1080/00927872.2022.2065492

Double crossed biproducts and related structures

Authors: Tianshui Ma, Jie Li, Haiyan Yang, Shuanhong Wang

Abstract: Let $H$ be a bialgebra. Let $σ: H\otimes H\to A$ be a linear map, where $A$ is a left $H$-comodule coalgebra, and an algebra with a left $H$-weak action $\triangleright$. Let $τ: H\otimes H\to B$ be a linear map, where $B$ is a right $H$-comodule coalgebra, and an algebra with a right $H$-weak action $\triangleleft$. In this paper, we improve the necessary conditions for the two-sided crossed product algebra $A\#^σ H~{^τ\#} B$ and the two-sided smash coproduct coalgebra $A\times H\times B$ to form a bialgebra (called double crossed biproduct) such that the condition $b_{[1]}\triangleright a_0\otimes b_{[0]}\triangleleft a_{-1}=a\otimes b$ in Majid's double biproduct (or double-bosonization) is one of the necessary conditions. On the other hand, we provide a more general two-sided crossed product algebra structure via Brzeziński's crossed product and give some applications.

Submitted 12 May, 2022; originally announced May 2022.

Comments: Communications in Algebra, 2022

Journal ref: Comm. Algebra 50(10)(2022), 4517-4535

arXiv:2204.04373 [pdf, other]

Tight toughness, isolated toughness and binding number bounds for the $\{K_2,C_n\}$-factors

Authors: Xiaxia Guan, Tianlong Ma, Chao Shi

Abstract: The $\{K_2,C_n\}$-factor of a graph is a spanning subgraph each of whose components is either $K_2$ or $C_n$. In this paper, a sufficient condition with regard to tight toughness, isolated toughness and binding number bounds to guarantee the existence of the $\{K_2,C_{2i+1}| i\geq 2 \}$-factor for any graph is obtained, which answers a problem due to Gao and Wang (J. Oper. Res. Soc. China (2021), https://doi.org/10.1007/s40305-021-00357-6).

Submitted 8 April, 2022; originally announced April 2022.

arXiv:2201.12496 [pdf, ps, other]

A direct and elementary proof of the well-definedness of the interior and exterior polynomials of hypergraphs

Authors: Xiaxia Guan, Xian'an Jin, Tianlong Ma

Abstract: T. Kálmán (A version of Tutte's polynomial for hypergraphs, Adv. Math. 244 (2013) 823-873) introduced the interior and exterior polynomials, which are generalizations of the Tutte polynomial $T(x,y)$ on plane points $(1/x,1)$ and $(1,1/y)$ to hypergraphs. The two polynomials are defined under a fixed ordering of hyperedges, and are proved to be independent of the ordering using techniques of polytopes. In this paper, similar to Tutte's original proof, we provide a direct and elementary proof of the well-definedness of the interior and exterior polynomials of hypergraphs.

Submitted 28 January, 2022; originally announced January 2022.

Comments: 14 pages

arXiv:2201.00271 [pdf, ps, other]

Transposed BiHom-Poisson algebras

Authors: Tianshui Ma, Bei Li

Abstract: In this paper, we introduce the concept of transposed BiHom-Poisson (abbr. TBP) algebras, which can be constructed from BiHom-Novikov-Poisson algebras. Several useful identities for TBP algebras are provided. We also prove that (T)BP algebras are closed under tensor products. The notions of BP 3-Lie algebras and TBP 3-Lie algebras are presented, and TBP algebras can induce TBP 3-Lie algebras by two approaches. Finally, we give some examples of TBP algebras of dimension 2.

Submitted 1 January, 2022; originally announced January 2022.

MSC Class: 17B61; 17D30; 17A30; 17B63

Journal ref: Comm. Algebra 51(2)(2023), 528-551

arXiv:2112.10928 [pdf, ps, other]

Bialgebras, Frobenius algebras and associative Yang-Baxter equations for Rota-Baxter algebras

Authors: Chengming Bai, Li Guo, Tianshui Ma

Abstract: Rota-Baxter operators and bialgebras go hand in hand in their applications, such as in the Connes-Kreimer approach to renormalization and the operator approach to the classical Yang-Baxter equation. We establish a bialgebra structure that is compatible with the Rota-Baxter operator, called the Rota-Baxter antisymmetric infinitesimal (ASI) bialgebra. This bialgebra is characterized by generalizations of matched pairs of algebras and double constructions of Frobenius algebras to the context of Rota-Baxter algebras. The study of the coboundary case leads to an enrichment of the associative Yang-Baxter equation (AYBE) to Rota-Baxter algebras. Antisymmetric solutions of the equation are used to construct Rota-Baxter ASI bialgebras. The notions of an $\mathcal{O}$-operator on a Rota-Baxter algebra and a Rota-Baxter dendriform algebra are also introduced to produce solutions of the AYBE in Rota-Baxter algebras and thus to provide Rota-Baxter ASI bialgebras. An unexpected byproduct is that a Rota-Baxter ASI bialgebra of weight zero gives rise to a quadri-bialgebra instead of bialgebra constructions for the dendriform algebra.

Submitted 20 December, 2021; originally announced December 2021.

Comments: 27 pages

MSC Class: 16T10; 16T25; 17B38; 16W99; 17A30; 16T05; 17B62; 57R56; 81R60

arXiv:2111.03741 [pdf, other]

Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective

Authors: Margalit Glasgow, Honglin Yuan, Tengyu Ma

Abstract: Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the convergence rate of FedAvg has thus far been undetermined. Even under the simplest assumptions (convex, smooth, homogeneous, and bounded covariance), the best-known upper and lower bounds do not match, and it is not clear whether the existing analysis captures the capacity of the algorithm. In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows the existing FedAvg upper bound analysis is not improvable. Additionally, we establish a lower bound in a heterogeneous setting that nearly matches the existing upper bound. While our lower bounds show the limitations of FedAvg, under an additional assumption of third-order smoothness, we prove more optimistic state-of-the-art convergence results in both convex and non-convex settings. Our analysis stems from a notion we call iterate bias, which is defined by the deviation of the expectation of the SGD trajectory from the noiseless gradient descent trajectory with the same initialization. We prove novel sharp bounds on this quantity, and show intuitively how to analyze this quantity from a Stochastic Differential Equation (SDE) perspective.

Submitted 11 February, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: Accepted to AISTATS 2022. The first two authors contributed equally

arXiv:2111.00688 [pdf, ps, other]

Three Favorite Edges Occurs Infinitely Often for One-Dimensional Simple Random Walk

Authors: Chen-Xu Hao, Ze-Chun Hu, Ting Ma, Renming Song

Abstract: For a one-dimensional simple symmetric random walk $(S_n)$, an edge $x$ (between points $x-1$ and $x$) is called a favorite edge at time $n$ if its local time at $n$ achieves the maximum among all edges. In this paper, we show that with probability 1 three favorite edges occur infinitely often. Our work is inspired by Tóth and Werner [Combin. Probab. Comput. 6 (1997) 359-369] and Ding and Shen [Ann. Probab. 46 (2018) 2545-2561], and disproves a conjecture mentioned in Remark 1 on page 368 of Tóth and Werner [Combin. Probab. Comput. 6 (1997) 359-369].

Submitted 12 July, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: 23 pages

arXiv:2111.00371 [pdf, ps, other]

Nonhomogeneous associative Yang-Baxter equations

Authors: Tianshui Ma, Jie Li

Abstract: We introduce the notion of an associative (BiHom-)Yang-Baxter pair of weight $(λ,γ)$, which can provide a solution to the double curved Rota-Baxter (BiHom-)system. Equivalent characterizations of (quasitriangular) covariant BiHom-bialgebras are given. We also prove that the associative BiHom-Yang-Baxter equation of weight $-1$ can be obtained from a unitary quasitriangular covariant BiHom-bialgebra. Finally, we present two approaches to construct (BiHom-)pre-Lie modules from Rota-Baxter (BiHom-)paired modules.

Submitted 30 October, 2021; originally announced November 2021.

Journal ref: Bull. Math. Soc. Sci. Math. Roumanie (N.S.) 65(113)(1)(2022), 97-118

arXiv:2111.00370 [pdf, ps, other]

On the tensor product of two oriented quantum algebras

Authors: Tianshui Ma, Haiyang Yang, Tao Yang

Abstract: In this paper, we give the oriented quantum algebra (abbr. OQA) structures on the tensor product of two different OQAs by using Chen's weak $\mathfrak{R}$-matrix in [J. Algebra 204(1998):504-531]. As a special case, the OQA structures on the tensor product of an OQA with itself are provided, which are different from Radford's results in [J. Knot Theory Ramifications 16(2007):929-957].

Submitted 30 October, 2021; originally announced November 2021.

Journal ref: Journal of Algebra and its Applications 19(6)(2020), 2050104

arXiv:2110.09736 [pdf, ps, other]

Schwarz symmetrizations in parabolic equations on complete manifolds

Authors: Haiqing Cheng, Tengfei Ma, Kui Wang

Abstract: In this article, we prove a sharp estimate for the solutions to parabolic equations on manifolds. Precisely, using symmetrization techniques and isoperimetric inequalities on Riemannian manifolds, we obtain a Bandle-type comparison on complete noncompact manifolds with nonnegative Ricci curvature and on compact manifolds with positive Ricci curvature, respectively. Our results generalize Bandle's result [6] to the Riemannian setting, and Talenti's comparison for elliptic equations on manifolds by Colladay-Langford-McDonald [12] and Chen-Li [9] to parabolic equations.

Submitted 19 October, 2021; originally announced October 2021.

Comments: All comments are welcome. arXiv admin note: text overlap with arXiv:2110.06814

arXiv:2110.06814 [pdf, ps, other]

Comparison results for Poisson equation with mixed boundary condition on manifolds

Authors: Haiqing Cheng, Tengfei Ma, Kui Wang

Abstract: In this article, we establish an $L^1$ estimate for solutions to the Poisson equation with mixed boundary condition, on complete noncompact manifolds with nonnegative Ricci curvature and on compact manifolds with positive Ricci curvature, respectively. On Riemann surfaces we obtain a Talenti-type comparison. Our results generalize the main theorems in [2] to the Riemannian setting, and Chen-Li's result [8] to the case of a variable Robin parameter.

Submitted 9 November, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: All comments are welcome! In this version, we corrected some typos

arXiv:2106.06530 [pdf, other]

Label Noise SGD Provably Prefers Flat Global Minimizers

Authors: Alex Damian, Tengyu Ma, Jason D. Lee

Abstract: In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to. Motivated by empirical studies that demonstrate that training with noisy labels improves generalization, we study the implicit regularization effect of SGD with label noise. We show that SGD with label noise converges to a stationary point of a regularized loss $L(θ) +λR(θ)$, where $L(θ)$ is the training loss, $λ$ is an effective regularization parameter depending on the step size, strength of the label noise, and the batch size, and $R(θ)$ is an explicit regularizer that penalizes sharp minimizers. Our analysis uncovers an additional regularization effect of large learning rates beyond the linear scaling rule that penalizes large eigenvalues of the Hessian more than small ones. We also prove extensions to classification with general loss functions, SGD with momentum, and SGD with general noise covariance, significantly strengthening the prior work of Blanc et al. to global convergence and large learning rates and of HaoChen et al. to general models.

Submitted 4 December, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

Comments: 57 pages, 5 figures, NeurIPS 2021

arXiv:2106.02815 [pdf]

Optimal queueing-based rebalancing for one-way electric carsharing systems with stochastic demand

Authors: Tai-Yu Ma, Theodoros Pantelidis, Joseph Y. J. Chow

Abstract: Viability of electric vehicle car sharing operations depends on rebalancing algorithms. Earlier methods in the literature suggest a trend toward Markovian stochastic demand with server relocation with queueing constraints. We propose a new model formulation based on a node-charge graph structure that extends the relocation model to include transshipment relocation flows. Computational tests with up to 1000 node (and 4000 node-charges) suggest promising avenues for further study.

Submitted 5 June, 2021; originally announced June 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2001.07282

arXiv:2104.05529 [pdf, ps, other]

doi 10.1016/j.geomphys.2022.104469

Rota-Baxter operators on Turaev's Hopf group (co)algebras I: Basic definitions and related algebraic structures

Authors: Tianshui Ma, Jie Li, Liangyun Chen, Shuanhong Wang

Abstract: We find a natural compatible condition between the Rota-Baxter operator and Turaev's (Hopf) group-(co)algebras, which leads to the concept of Rota-Baxter Turaev's (Hopf) group-(co)algebra. Two characterizations of Rota-Baxter Turaev's group-algebras (abbr. T-algebras) are obtained: one by Atkinson factorization and the other by T-quasi-idempotent elements. The relations among some related Turaev's group algebraic structures (such as (tri)dendriform T-algebras, Zinbiel T-algebras, pre-Lie T-algebras, Lie T-algebras) are discussed, and some concrete examples from the algebras of dimensions 2,3 and 4 are given. At last we prove that Rota-Baxter Poisson T-algebras can produce pre-Poisson T-algebras and Poisson T-algebras can be obtained from pre-Poisson T-algebras.

Submitted 8 March, 2023; v1 submitted 12 April, 2021; originally announced April 2021.

Journal ref: Journal of Geometry and Physics 175(2022),104469

arXiv:2103.13462 [pdf, other]

Why Do Local Methods Solve Nonconvex Problems?

Authors: Tengyu Ma

Abstract: Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.

Submitted 24 March, 2021; originally announced March 2021.

Comments: This is the Chapter 21 of the book "Beyond the Worst-Case Analysis of Algorithms"

arXiv:2102.12025 [pdf, ps, other]

Attractors for locally damped Bresse systems and a unique continuation property

Authors: To Fu Ma, Rodrigo N. Monteiro, Paulo N. Seminario-Huertas

Abstract: This paper is devoted to Bresse systems, a robust model for circular beams, given by a set of three coupled wave equations. The main objective is to establish the existence of global attractors for dynamics of semilinear problems with localized damping. In order to deal with localized damping a unique continuation property (UCP) is needed. Therefore we also provide a suitable UCP for Bresse systems. Our strategy is to set the problem in a Riemannian geometry framework and see the system as a single equation with different Riemann metrics. Then we perform Carleman-type estimates to get our result.

Submitted 23 February, 2021; originally announced February 2021.

arXiv:2102.08296 [pdf, other]

doi 10.1007/s12220-021-00723-z

Geodesic random walks, diffusion processes and Brownian motion on Finsler manifolds

Authors: Tianyu Ma, Vladimir S. Matveev, Ilya Pavlyukevich

Abstract: We show that geodesic random walks on a complete Finsler manifold of bounded geometry converge to a diffusion process which is, up to a drift, the Brownian motion corresponding to a Riemannian metric.

Submitted 16 February, 2021; originally announced February 2021.

Comments: 32 pages, 3 figures. Comments from readers are welcome

Journal ref: The Journal of Geometric Analysis volume 31, pages 12446--12484 (2021) (open access)

arXiv:2102.02089 [pdf, ps, other]

Tutte polynomials of fan-like graphs with applications in benzenoid systems

Authors: Tianlong Ma, Xian'an Jin, Fuji Zhang

Abstract: We study the computation of the Tutte polynomials of fan-like graphs and obtain expressions of their Tutte polynomials via generating functions. As applications, Tutte polynomials, in particular, the number of spanning trees, of two kinds of benzenoid systems, i.e. pyrene chains and triphenylene chains, are obtained.

Submitted 3 February, 2021; originally announced February 2021.

arXiv:2010.01541 [pdf]

doi 10.1371/journal.pone.0251582

Two-stage battery recharge scheduling and vehicle-charger assignment policy for dynamic electric dial-a-ride services

Authors: Tai-Yu Ma

Abstract: Coordinating the charging scheduling of electric vehicles for dynamic dial-a-ride services is challenging considering charging queuing delays and stochastic customer demand. We propose a new two-stage solution approach to handle dynamic vehicle charging scheduling to minimize the costs of daily charging operations of the fleet. The approach comprises two components: daily vehicle charging scheduling and online vehicle-charger assignment. A new battery charge scheduling model is proposed to obtain the vehicle charging schedules by minimizing the costs of vehicle daily charging operations while satisfying vehicle driving needs to serve customers. In the second stage, an online vehicle-charger assignment model is developed to minimize the total vehicle idle time for charges by considering queuing delays at the level of chargers. An efficient Lagrangian relaxation algorithm is proposed to solve the large-scale vehicle-charger assignment problem with small optimality gaps. The approach is applied to a realistic dynamic dial-a-ride service case study in Luxembourg and compared with the nearest charging station charging policy and first-come-first-served minimum charging delay policy under different charging infrastructure scenarios. Our computational results show that the approach can achieve significant savings for the operator in terms of charging waiting times (-74.9%), charging times (-38.6%), and charged energy costs (-27.4%). A sensitivity analysis is conducted to evaluate the impact of the different model parameters, showing the scalability and robustness of the approach in a stochastic environment.

Submitted 15 April, 2021; v1 submitted 4 October, 2020; originally announced October 2020.

arXiv:2009.08678 [pdf, ps, other]

Limiting behaviors for longest consecutive switches in an IID Bernoulli sequence

Authors: Chen-Xu Hao, Ting Ma

Abstract: In this paper we mainly discuss sharp lower and upper bounds for the length of longest consecutive switches in IID Bernoulli sequences. This work is an extension of results in Erdős and Révész (1975) for longest head-run and Hao et al. (2021) for longest consecutive switches in unbiased coin-tossing, and might be applied to reliability theory, biology, quality control, pattern recognition, finance, etc.

Submitted 18 January, 2022; v1 submitted 18 September, 2020; originally announced September 2020.

MSC Class: 60F15

arXiv:2008.06211 [pdf, ps, other]

A Note on the Gaussian Minimum Conjecture

Authors: Yang-Fan Zhong, Ting Ma, Ze-Chun Hu

Abstract: Let $n\geq 2$ and $(X_i,1\leq i\leq n)$ be a centered Gaussian random vector. The Gaussian minimum conjecture says that $E\left(\min_{1\leq i\leq n}|X_i|\right)\geq E\left(\min_{1\leq i\leq n}|Y_i|\right)$, where $Y_1,\ldots,Y_n$ are independent centered Gaussian random variables with $E(X_i^2)=E(Y_i^2)$ for any $i=1,\ldots,n$. In this note, we will show that this conjecture holds if and only if $n=2$.

Submitted 14 August, 2020; originally announced August 2020.

Comments: 10 pages

Showing 1–50 of 110 results for author: Ma, T