Search | arXiv e-print repository

On a perturbation analysis of Higham squared maximum Gaussian elimination growth matrices

Authors: Alan Edelman, John Urschel, Bowen Zhu

Abstract: Gaussian elimination is the most popular technique for solving a dense linear system. Large errors in this procedure can occur in floating point arithmetic when the matrix's growth factor is large. We study this potential issue and how perturbations can improve the robustness of the Gaussian elimination algorithm. In their 1989 paper, Higham and Higham characterized the complete set of real n by n matrices that achieves the maximum growth factor under partial pivoting. This set of matrices serves as the critical focus of this work. Through theoretical insights and empirical results, we illustrate the high sensitivity of the growth factor of these matrices to perturbations and show how subtle changes can be strategically applied to matrix entries to significantly reduce the growth, thus enhancing computational stability and accuracy.

Submitted 2 June, 2024; originally announced June 2024.

MSC Class: 65F05; 15A23
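
As a rough illustration of the growth factor discussed in the abstract above, the sketch below runs Gaussian elimination with partial pivoting and reports the largest entry seen during elimination divided by the largest entry of the input. The test matrix is the classic worst-case example usually attributed to Wilkinson, whose growth factor is $2^{n-1}$; it is not taken from the paper, and the function names `growth_factor` and `wilkinson_example` are hypothetical.

```python
def growth_factor(a):
    """Growth factor of Gaussian elimination with partial pivoting.

    Returns the largest |entry| seen over all elimination stages
    divided by the largest |entry| of the input matrix.
    """
    n = len(a)
    u = [list(map(float, row)) for row in a]  # working copy
    max_initial = max(abs(x) for row in u for x in row)
    max_seen = max_initial
    for k in range(n - 1):
        # Partial pivoting: first row at or below k with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0.0:
            continue  # nothing to eliminate in this column
        u[k], u[p] = u[p], u[k]
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
        max_seen = max(max_seen, max(abs(x) for row in u for x in row))
    return max_seen / max_initial


def wilkinson_example(n):
    """Worst-case matrix: 1 on the diagonal, -1 below it, 1 in the last column."""
    return [[1.0 if (i == j or j == n - 1) else (-1.0 if j < i else 0.0)
             for j in range(n)] for i in range(n)]


print(growth_factor(wilkinson_example(5)))  # 16.0, i.e. 2**(5 - 1)
```

Adding a small perturbation to selected entries of `wilkinson_example(n)` and calling `growth_factor` again is one way to experiment with the sensitivity the abstract describes.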

arXiv:2402.12633 [pdf, ps, other]

Scalar curvature rigidity of the four-dimensional sphere

Authors: Simone Cecchini, **min Wang, Zhizhang Xie, Bo Zhu

Abstract: Let $(M,g)$ be a closed connected oriented (possibly non-spin) smooth four-dimensional manifold with scalar curvature bounded below by $n(n-1)$. In this paper, we prove that if $f$ is a smooth map of non-zero degree from $(M, g)$ to the unit four-sphere, then $f$ is an isometry. Following ideas of Gromov, we use $μ$-bubbles and a version with coefficients of the rigidity of the three-sphere to rule out the case of strict inequality. Our proof of rigidity is based on the harmonic map heat flow coupled with the Ricci flow.

Submitted 20 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Improved exposition. Comments are welcome!

arXiv:2401.13513 [pdf, ps, other]

Silting interval reduction and 0-Auslander extriangulated categories

Authors: Jixing Pan, Bin Zhu

Abstract: We give a reduction technique for silting intervals in extriangulated categories, which we call "silting interval reduction". It provides a reduction technique for tilting subcategories when the extriangulated categories are exact categories. In 0-Auslander extriangulated categories (a generalization of the well-known two-term category $K^{[-1,0]}(\mathsf{proj}Λ)$ for an Artin algebra $Λ$), we provide a reduction theory for silting objects as an application of silting interval reduction. It unifies two-term silting reduction and Iyama-Yoshino's 2-Calabi-Yau reduction. The mutation theory recently developed by Gorsky, Nakaoka and Palu can be deduced from it. Since there are bijections between the silting objects and the support $τ$-tilting modules over certain finite dimensional algebras, we show it is compatible with $τ$-tilting reduction. This compatibility theorem also unifies the two compatibility theorems obtained by Jasso in his work on $τ$-tilting reduction. We give a new construction for 0-Auslander extriangulated categories using silting mutation; together with silting interval reduction, we obtain some results on silting quivers. Finally, we prove that $d$-Auslander extriangulated categories are related to a certain sequence of silting mutations.

Submitted 7 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: 31 pages

MSC Class: 16G10; 18G80; 18E40; 16S90

arXiv:2401.06368 [pdf, ps, other]

Arithmetic Siegel-Weil formula on $\mathcal{X}_0(N)$: singular terms

Authors: Baiqing Zhu

Abstract: For arbitrary level $N$, we relate the generating series of codimension 2 special cycles on $\mathcal{X}_{0}(N)$ to the derivatives of a genus 2 Eisenstein series, especially the singular terms of both sides. On the analytic side, we use difference formulas of local densities to relate the singular Fourier coefficients of the genus 2 Eisenstein series to the nonsingular Fourier coefficients of a genus 1 Eisenstein series. On the geometric side, we study the reduction of cusps to compute the divisor class of the Hodge bundle and the heights of special divisors. When $N$ is square-free, this gives a different proof of the main results in the works of Du, Yang and Sankaran, Shi, and Yang.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 46 pages. arXiv admin note: text overlap with arXiv:2106.15038 by other authors

arXiv:2311.15347 [pdf, ps, other]

Filling Radius, Quantitative $K$-theory and Positive Scalar Curvature

Authors: **min Wang, Zhizhang Xie, Guoliang Yu, Bo Zhu

Abstract: We prove a quantitative upper bound on the filling radius of complete, spin manifolds with uniformly positive scalar curvature using the quantitative operator $K$-theory and index theory.

Submitted 28 February, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: minor revision

MSC Class: 53C23; 19D55; 58B34; 46L80

arXiv:2310.07838 [pdf, other]

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains

Authors: Qingyue Zhao, Banghua Zhu

Abstract: We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{{|{\mathcal S}||{\mathcal A}|}/{n}}$. The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to ${{|{\mathcal S}||{\mathcal A}|}/{n}}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on ${\mathcal A}$ given every sampled input, thereby provably enabling the student to enjoy a rate ${|{\mathcal S}|}/{n}$ free of $|{\mathcal A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.

Submitted 14 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 41 pages, 2 figures; Appendix polished

arXiv:2308.05167 [pdf, ps, other]

Total positivity from a kind of lattice paths

Authors: Yu-Jie Cui, Bao-Xuan Zhu

Abstract: Total positivity of matrices is deeply studied and plays an important role in various branches of mathematics. The main purpose of this paper is to study total positivity of a matrix $M=[M_{n,k}]_{n,k}$ generated by the weighted lattice paths in $\mathbb{N}^2$ from the origin $(0,0)$ to the point $(k,n)$ consisting of types of steps: $(0,1)$ and $(1,t+i)$ for $0\leq i\leq \ell$, where each step $(0,1)$ from height~$n-1$ gets the weight~$b_n(\textbf{y})$ and each step $(1,t+i)$ from height~$n-t-i$ gets the weight $a_n^{(i)}(\textbf{x})$. Using an algebraic method, we prove that the $\textbf{x}$-total positivity of the weight matrix $[a_i^{(i-j)}(\textbf{x})]_{i,j}$ implies that of $M$. Furthermore, using the Lindström-Gessel-Viennot lemma, we obtain that both $M$ and the Toeplitz matrix of each row sequence of $M$ with $t\geq1$ are $\textbf{x}$-totally positive under the following three cases respectively: (1) $\ell=1$, (2) $\ell=2$ and restrictions for $a_n^{(i)}$, (3) general $\ell$ and both $a^{(i)}_n$ and $b_n$ are independent of $n$. In addition, for the case (3), we show that the matrix $M$ is a Riordan array, present its explicit formula and prove total positivity of the Toeplitz matrix of each column of $M$. In particular, from the results for Toeplitz-total positivity, we also obtain the Pólya frequency and log-concavity of the corresponding sequence. Finally, as applications, we establish in a unified manner total positivity and Toeplitz-total positivity for many well-known combinatorial triangles, including the Pascal triangle, the Pascal square, the Delannoy triangle, the Delannoy square, the signless Stirling triangle of the first kind, the Legendre-Stirling triangle of the first kind, the Jacobi-Stirling triangle of the first kind, Brenti's recursive matrix, and so on.

Submitted 9 August, 2023; originally announced August 2023.

MSC Class: 05A20; 05A15; 15B05

arXiv:2307.05239 [pdf, ps, other]

The regularity of difference divisors

Authors: Baiqing Zhu

Abstract: We prove the regularity of difference divisors on unitary and GSpin Rapoport-Zink spaces.

Submitted 24 August, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

Comments: 15 pages

arXiv:2306.11951 [pdf, ps, other]

On the Optimal Bounds for Noisy Computing

Authors: Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang

Abstract: We revisit the problem of computing with noisy information considered in Feige et al. 1994, which includes computing the OR function from noisy queries, and computing the MAX, SEARCH and SORT functions from noisy pairwise comparisons. For $K$ given elements, the goal is to correctly recover the desired function with probability at least $1-δ$ when the outcome of each query is flipped with probability $p$. We consider both the adaptive sampling setting where each query can be adaptively designed based on past outcomes, and the non-adaptive sampling setting where the query cannot depend on past outcomes. The prior work provides tight bounds on the worst-case query complexity in terms of the dependence on $K$. However, the upper and lower bounds do not match in terms of the dependence on $δ$ and $p$. We improve the lower bounds for all four functions under both adaptive and non-adaptive query models. Most of our lower bounds match the upper bounds up to constant factors when either $p$ or $δ$ is bounded away from $0$, while the ratio between the best prior upper and lower bounds goes to infinity when $p\rightarrow 0$ or $p\rightarrow 1/2$. On the other hand, we also provide matching upper and lower bounds for the number of queries in expectation, improving both the upper and lower bounds for the variable-length query model.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2304.10696 [pdf, ps, other]

Arithmetic Siegel-Weil formula on $\mathcal{X}_{0}(N)$

Authors: Baiqing Zhu

Abstract: We establish the arithmetic Siegel-Weil formula on the modular curve $\mathcal{X}_{0}(N)$ for arbitrary level $N$, i.e., we relate the arithmetic degrees of special cycles on $\mathcal{X}_{0}(N)$ to the derivatives of Fourier coefficients of a genus 2 Eisenstein series. We prove this formula by a precise identity between the local arithmetic intersection numbers on the Rapoport-Zink space associated to $\mathcal{X}_{0}(N)$ and the derivatives of local representation densities of quadratic forms. When $N$ is odd and square-free, this gives a different proof of the main results in [SSY22]. This local identity is proved by relating it to an identity in one dimension higher, but at hyperspecial level.

Submitted 6 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: 57 pages. arXiv admin note: text overlap with arXiv:2106.15038 by other authors

arXiv:2303.17824 [pdf, other]

Implementation and (Inverse Modified) Error Analysis for implicitly-templated ODE-nets

Authors: Aiqing Zhu, Tom Bertalan, Beibei Zhu, Yifa Tang, Ioannis G. Kevrekidis

Abstract: We focus on learning unknown dynamics from data using ODE-nets templated on implicit numerical initial value problem solvers. First, we perform Inverse Modified error analysis of the ODE-nets using unrolled implicit schemes for ease of interpretation. It is shown that training an ODE-net using an unrolled implicit scheme returns a close approximation of an Inverse Modified Differential Equation (IMDE). In addition, we establish a theoretical basis for hyper-parameter selection when training such ODE-nets, whereas current strategies usually treat numerical integration of ODE-nets as a black box. We thus formulate an adaptive algorithm which monitors the level of error and adapts the number of (unrolled) implicit solution iterations during the training process, so that the error of the unrolled approximation is less than the current learning loss. This helps accelerate training, while maintaining accuracy. Several numerical experiments are performed to demonstrate the advantages of the proposed algorithm compared to nonadaptive unrollings, and validate the theoretical analysis. We also note that this approach naturally allows for incorporating partially known physical terms in the equations, giving rise to what is termed "gray box" identification.

Submitted 9 April, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

arXiv:2302.12965 [pdf, other]

A Weaker Regularity Condition for the Multidimensional $ν$-Moment Problem

Authors: Bin Zhu, Mattia Zorzi

Abstract: We consider the problem of finding a $d$-dimensional spectral density through a moment problem which is characterized by an integer parameter $ν$. Previous results showed that there exists an approximate solution under the regularity condition $ν\geq d/2+1$. To realize the process corresponding to such a spectral density, one would take $ν$ as small as possible. In this letter we show that this condition can be weakened to $ν\geq d/2$.

Submitted 24 February, 2023; originally announced February 2023.

Comments: 6 pages, 3 figures. Submitted to IEEE Control Systems Letters with the CDC option

MSC Class: 30E05

arXiv:2302.09415 [pdf, ps, other]

Optimal diameter estimates of three-dimensional Ricci limit spaces

Authors: Bo Zhu, Xingyu Zhu

Abstract: In this note, we prove that positive scalar curvature can pass to three dimensional Ricci limit spaces of non-negative Ricci curvature when it splits off a line. As a corollary, we obtain an optimal Bonnet-Myers type upper bound. Moreover, we obtain a similar statement in all dimensions for Alexandrov spaces of non-negative curvature.

Submitted 26 March, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: 6 pages, a similar result for non-negative sectional curvature is added

MSC Class: Primary 53C21

arXiv:2301.11270 [pdf, other]

Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons

Authors: Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Abstract: We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF). Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model. However, we show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions. Additionally, we demonstrate that under the PL model, the true MLE and an alternative MLE that splits the $K$-wise comparison into pairwise comparisons both converge. Moreover, the true MLE is asymptotically more efficient. Our results validate the empirical success of existing RLHF algorithms in InstructGPT and provide new insights for algorithm design. Furthermore, our results unify the problem of RLHF and max-entropy Inverse Reinforcement Learning (IRL), and provide the first sample complexity bound for max-entropy IRL.

Submitted 7 February, 2024; v1 submitted 26 January, 2023; originally announced January 2023.
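
To make the pairwise setting in the abstract above concrete, here is a minimal sketch of maximum likelihood estimation under the Bradley-Terry-Luce model with a linear reward, where the probability that item $a$ is preferred to item $b$ is $σ(θ^\top(φ_a - φ_b))$. The function name `btl_mle`, the plain gradient-ascent solver, and the step-size settings are assumptions made for illustration; this is not the paper's algorithm, and it does not include the pessimistic variant.

```python
import math


def btl_mle(diffs, prefs, steps=500, lr=0.5):
    """Gradient ascent on the BTL log-likelihood with a linear reward.

    diffs[i] is the feature difference phi_a - phi_b for comparison i,
    prefs[i] is 1 if a was preferred and 0 otherwise.
    """
    d = len(diffs[0])
    theta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(diffs, prefs):
            z = sum(t * xi for t, xi in zip(theta, x))
            p = 1.0 / (1.0 + math.exp(-z))  # model probability that a beats b
            for j in range(d):
                grad[j] += (y - p) * x[j]
        # Average gradient of the log-likelihood, then one ascent step.
        theta = [t + lr * g / len(diffs) for t, g in zip(theta, grad)]
    return theta


# Four comparisons with the same feature difference; a wins three of them,
# so the MLE satisfies sigmoid(theta) = 3/4, i.e. theta = log(3).
theta_hat = btl_mle([[1.0]] * 4, [1, 1, 1, 0])
print(theta_hat[0], math.log(3))
```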

arXiv:2301.10437 [pdf, ps, other]

doi 10.1016/j.jalgebra.2023.08.031

Support $τ$-tilting subcategories in exact categories

Authors: Jixing Pan, Yaohua Zhang, Bin Zhu

Abstract: Let $\mathcal{E}=(\mathcal{A},\mathcal{S})$ be an exact category with enough projectives $\mathcal{P}$. We introduce the notion of support $τ$-tilting subcategories of $\mathcal{E}$. It is compatible with existing definitions of support $τ$-tilting modules (subcategories) in various contexts. It is also a generalization of tilting subcategories of exact categories. We show that there is a bijection between support $τ$-tilting subcategories and certain $τ$-cotorsion pairs. Given a support $τ$-tilting subcategory $\mathcal{T}$, we find a subcategory $\mathcal{E}_{\mathcal{T}}$ of $\mathcal{E}$ which is an exact category and in which $\mathcal{T}$ is a tilting subcategory. If $\mathcal{E}$ is Krull-Schmidt, we prove the cardinal $|\mathcal{T}|$ is equal to the number of isomorphism classes of indecomposable projectives $Q$ such that ${\rm Hom}_{\mathcal{E}}(Q,\mathcal{T})\neq 0$. We also show a functorial version of Brenner-Butler's theorem.

Submitted 26 January, 2024; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: 20 pages. There are some modifications in the published version on J. Alg

MSC Class: 16G10; 18E40; 16S90; 18E99

Journal ref: J. Alg. 636(15): 455-482, 2023

arXiv:2301.06784 [pdf, other]

On the Statistical Consistency of a Generalized Cepstral Estimator

Authors: Bin Zhu, Mattia Zorzi

Abstract: We consider the problem of estimating the generalized cepstral coefficients of a stationary stochastic process or stationary multidimensional random field. It turns out that a naive version of the periodogram-based estimator for the generalized cepstral coefficients is not consistent. We propose a consistent estimator for those coefficients. Moreover, we show that the latter can be used to build a consistent estimator for a particular class of cascade linear stochastic systems.

Submitted 17 January, 2023; originally announced January 2023.

Comments: 11 pages in IEEE Transactions template, 4 figures. Submitted to IEEE Transactions on Automatic Control

arXiv:2212.10880 [pdf, ps, other]

Mutation graph of support $τ$-tilting modules over a skew-gentle algebra

Authors: ** He, Yu Zhou, Bin Zhu

Abstract: Let $\mathcal{D}$ be a Hom-finite, Krull-Schmidt, 2-Calabi-Yau triangulated category with a rigid object $R$. Let $Λ=\operatorname{End}_{\mathcal{D}}R$ be the endomorphism algebra of $R$. We introduce the notion of mutation of maximal rigid objects in the two-term subcategory $R\ast R[1]$ via exchange triangles, which is shown to be compatible with mutation of support $τ$-tilting $Λ$-modules. In the case that $\mathcal{D}$ is the cluster category arising from a punctured marked surface, it is shown that the graph of mutations of support $τ$-tilting $Λ$-modules is isomorphic to the graph of flips of certain collections of tagged arcs on the surface, which is moreover proved to be connected. As a direct consequence, the mutation graph of support $τ$-tilting modules over a skew-gentle algebra is connected.

Submitted 21 December, 2022; originally announced December 2022.

Comments: 45 pages, 22 figures

arXiv:2212.10416 [pdf, ps, other]

Positive Scalar Curvature Meets Ricci Limit Spaces

Authors: **min Wang, Zhizhang Xie, Bo Zhu, Xingyu Zhu

Abstract: We investigate how the positive scalar curvature controls the size of a Ricci limit space when it comes from a sequence of $n$-manifolds with non-negative Ricci curvature and strictly positive scalar curvature lower bound. We prove such a limit space can split off $\mathbb{R}^{n-2}$ at most, and when the maximal splitting happens, the other non-splitting factor has an explicit uniform diameter upper bound. Besides, we study some other consequences of having positive scalar curvature for manifolds using Ricci limit spaces techniques, for instance volume gap estimates and volume growth order estimates.

Submitted 26 March, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 22 pages. Some conditions added to the theorem 1.1 due to a gap in the original proof, and the proof is slightly changed accordingly. A corollary about the first Betti number is added

MSC Class: 53C21; 53C23

arXiv:2208.14372 [pdf, ps, other]

Dead-beat model predictive control for discrete-time linear systems

Authors: Bing Zhu

Abstract: In this paper, model predictive control (MPC) strategies are proposed for dead-beat control of linear systems with and without state and control constraints. In unconstrained MPC, deadbeat performance can be guaranteed by setting the control horizon to the system dimension, and adding a terminal equality constraint. It is proved that the unconstrained deadbeat MPC is equivalent to linear deadbeat control. The proposed constrained deadbeat MPC is designed by setting the control horizon equal to the system dimension and penalizing only the terminal cost. The recursive feasibility and deadbeat performance are proved theoretically.

Submitted 30 August, 2022; originally announced August 2022.
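
For readers unfamiliar with the linear deadbeat control mentioned in the abstract above, the following sketch shows the idea on a discrete-time double integrator: a state-feedback gain that places all closed-loop eigenvalues at zero drives any initial state to the origin in at most as many steps as the system dimension (here, two). The system matrices, the gain `K`, and the function names are illustrative assumptions and are not taken from the paper; the MPC formulation itself is not reproduced.

```python
def step(x, u):
    """One step of x+ = A x + B u with A = [[1, 1], [0, 1]] and B = [0, 1]^T."""
    return [x[0] + x[1], x[1] + u]


# Illustrative deadbeat gain for this system: with u = -K x the closed-loop
# matrix A - B K = [[1, 1], [-1, -1]] squares to the zero matrix.
K = [1.0, 2.0]


def deadbeat_rollout(x0, steps=2):
    """Simulate the closed loop and return the list of visited states."""
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        u = -(K[0] * x[0] + K[1] * x[1])
        x = step(x, u)
        trajectory.append(x)
    return trajectory


print(deadbeat_rollout([3.0, -2.0]))  # [[3.0, -2.0], [1.0, -1.0], [0.0, 0.0]]
```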

arXiv:2208.03875 [pdf, other]

doi 10.1088/1674-1056/aca9c8

Explicit K-symplectic methods for nonseparable non-canonical Hamiltonian systems

Authors: Beibei Zhu, Lun Ji, Aiqing Zhu, Yifa Tang

Abstract: We propose efficient numerical methods for nonseparable non-canonical Hamiltonian systems which are explicit, K-symplectic in the extended phase space with long time energy conservation properties. They are based on extending the original phase space to several copies of the phase space and imposing a mechanical restraint on the copies of the phase space. Explicit K-symplectic methods are constructed for three non-canonical Hamiltonian systems. Numerical results show that they outperform the higher order Runge-Kutta methods in preserving the phase orbit and the energy of the system over long times.

Submitted 7 August, 2022; originally announced August 2022.

arXiv:2206.09400 [pdf, ps, other]

doi 10.1016/j.jalgebra.2023.12.032

Ideal mutations in triangulated categories and generalized Auslander-Reiten theory

Authors: Yaohua Zhang, Bin Zhu

Abstract: We introduce the notion of ideal mutations in a triangulated category, which generalizes the version of Iyama and Yoshino \cite{iyama2008mutation} by replacing approximations by objects of a subcategory with approximations by morphisms of an ideal. As applications, for a Hom-finite Krull-Schmidt triangulated category $\mathcal{T}$ over an algebraically closed field $K$: (1) we generalize a theorem of Jorgensen \cite[Theorem 3.3]{jorgensen2010quotients} to a more general setting; (2) we provide a method to detect whether $\mathcal{T}$ has Auslander-Reiten triangles or not by checking the necessary and sufficient conditions on its Jacobson radical $\mathcal{J}$: (i) $\mathcal{J}$ is functorially finite, (ii) Gh$_{\mathcal{J}}= {\rm CoGh}_{\mathcal{J}}$, and (iii) Gh$_{\mathcal{J}}$-source maps coincide with Gh$_{\mathcal{J}}$-sink maps; (3) we generalize the classical Auslander-Reiten theory by using ideal mutations.

Submitted 27 January, 2024; v1 submitted 19 June, 2022; originally announced June 2022.

Comments: 23 pages

MSC Class: 16G70; 18G80; 16N20

Journal ref: Journal of Algebra, 2024, 644: 191-221

arXiv:2206.07335 [pdf, other]

On Numerical Integration in Neural Ordinary Differential Equations

Authors: Aiqing Zhu, Pengzhan **, Beibei Zhu, Yifa Tang

Abstract: The combination of ordinary differential equations and neural networks, i.e., neural ordinary differential equations (Neural ODE), has been widely studied from various angles. However, deciphering the numerical integration in Neural ODE is still an open challenge, as many studies have demonstrated that numerical integration significantly affects the performance of the model. In this paper, we propose the inverse modified differential equations (IMDE) to clarify the influence of numerical integration on training Neural ODE models. IMDE is determined by the learning task and the employed ODE solver. It is shown that training a Neural ODE model actually returns a close approximation of the IMDE, rather than the true ODE. With the help of IMDE, we deduce that (i) the discrepancy between the learned model and the true ODE is bounded by the sum of discretization error and learning loss; (ii) Neural ODEs using non-symplectic numerical integration fail to learn conservation laws theoretically. Several experiments are performed to numerically verify our theoretical analysis.

Submitted 15 June, 2022; originally announced June 2022.

arXiv:2205.05281 [pdf, ps, other]

doi 10.4208/cicp.OA-2022-0144

Poisson Integrators based on splitting method for Poisson systems

Authors: Beibei Zhu, Lun Ji, Aiqing Zhu, Yifa Tang

Abstract: We propose Poisson integrators for the numerical integration of separable Poisson systems. We analyze three situations in which the Poisson systems are separated in three ways and the Poisson integrators can be constructed by using the splitting method. Numerical results show that the Poisson integrators outperform the higher order non-Poisson integrators in phase orbit tracking, long-term energy conservation and efficiency.

Submitted 11 May, 2022; originally announced May 2022.

arXiv:2204.13858 [pdf, other]

One-Way Matching of Datasets with Low Rank Signals

Authors: Shuxiao Chen, Sizun Jiang, Zongming Ma, Garry P. Nolan, Bokai Zhu

Abstract: We study one-way matching of a pair of datasets with low rank signals. Under a stylized model, we first derive information-theoretic limits of matching under a mismatch proportion loss. We then show that linear assignment with projected data achieves fast rates of convergence and sometimes even minimax rate optimality for this task. The theoretical error bounds are corroborated by simulated examples. Furthermore, we illustrate practical use of the matching procedure on two single-cell data examples.

Submitted 3 October, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

arXiv:2204.00980 [pdf, ps, other]

Global existence and optimal decay rate of the classical solution to 3-D Radiative Hydrodynamics with and without Heat Conductivity

Authors: Guiqiong Gong, Boran Zhu, Jiawei Zhou

Abstract: The classical solution of the 3-D radiative hydrodynamics model is studied in $H^k$-norm under two different conditions, with and without heat conductivity. We have proved the following results in both cases. First, when the $H^k$ norm of the initial perturbation around a constant state is sufficiently small and the integer $k\geq2$, a unique classical solution to the Cauchy problem is shown to exist. Second, if we further assume that the $L^1$ norm of the initial perturbation is also small, the $i$-th order ($0\leq i\leq k-2$) derivatives of the solution have the decay rate $(1+t)^{-\frac 34-\frac i2}$ in $H^2$ norm. Third, from the results above we can see that for radiative hydrodynamics, the radiation can do the same job as the heat conduction, which means that if the thermal conductivity coefficient tends to $0$, the solvability of the system and the decay rate of the solution stay the same because of the effect of radiation.

Submitted 27 April, 2022; v1 submitted 2 April, 2022; originally announced April 2022.

arXiv:2204.00860 [pdf, other]

doi 10.1093/imrn/rnab360

On the $L_p$ Brunn-Minkowski theory and the $L_p$ Minkowski problem for $C$-coconvex sets

Authors: Jin Yang, Deping Ye, Baocheng Zhu

Abstract: Let $C$ be a pointed closed convex cone in $\mathbb{R}^n$ with vertex at the origin $o$ and having nonempty interior. The set $A\subset C$ is $C$-coconvex if the volume of $A$ is finite and $A^{\bullet}=C\setminus A$ is a closed convex set. For $0<p<1$, the $p$-co-sum of $C$-coconvex sets is introduced, and the corresponding $L_p$ Brunn-Minkowski inequality for $C$-coconvex sets is established. We also define the $L_p$ surface area measures, for $0\neq p\in \mathbb{R}$, of certain $C$-coconvex sets, which are critical in deriving a variational formula of the volume of the Wulff shape associated with a family of functions obtained from the $p$-co-sum. This motivates the $L_p$ Minkowski problem aiming to characterize the $L_p$ surface area measures of $C$-coconvex sets. The existence of solutions to the $L_p$ Minkowski problem for all $0\neq p\in \mathbb{R}$ is established. The $L_p$ Minkowski inequality for $0<p<1$ is proved and is used to obtain the uniqueness of the solutions to the $L_p$ Minkowski problem for $0<p<1$. For $p=0$, we introduce $(1-τ)\diamond A_1\oplus_0τ\diamond A_2$, the log-co-sum of two $C$-coconvex sets $A_{1}$ and $A_{2}$ with respect to $τ\in(0, 1)$, and prove the log-Brunn-Minkowski inequality of $C$-coconvex sets. The log-Minkowski inequality is also obtained and is applied to prove the uniqueness of the solutions to the log-Minkowski problem that characterizes the cone-volume measures of $C$-coconvex sets. Our result solves an open problem raised by Schneider in [Schneider, Adv. Math., 332 (2018), pp. 199-219].

Submitted 2 April, 2022; originally announced April 2022.

Comments: Int. Math. Res. Not., in press

MSC Class: 53A15; 52B45; 52A39

arXiv:2202.13022 [pdf, other]

Arbitrary Order Energy and Enstrophy Conserving Finite Element Methods for 2D Incompressible Fluid Dynamics and Drift-Reduced Magnetohydrodynamics

Authors: Milan Holec, Ben Zhu, Ilon Joseph, Christopher J. Vogl, Ben S. Southworth, Alejandro Campos, Andris M. Dimits, Will E. Pazner

Abstract: Maintaining conservation laws in the fully discrete setting is critical for accurate long-time behavior of numerical simulations and requires accounting for discrete conservation properties in both space and time. This paper derives arbitrary order finite element exterior calculus spatial discretizations for the two-dimensional (2D) Navier-Stokes and drift-reduced magnetohydrodynamic equations that conserve both energy and enstrophy to machine precision when coupled with generally symplectic time-integration methods. Both continuous and discontinuous-Galerkin (DG) weak formulations can ensure conservation, but only generally symplectic time integration methods, such as the implicit midpoint method, permit exact conservation in time. Moreover, the symplectic implicit midpoint method yields an order of magnitude speedup over explicit schemes. The methods are implemented using the MFEM library and the solutions are verified for an extensive suite of 2D neutral fluid turbulence test problems. Numerical solutions are verified via comparison to a semi-analytic linear eigensolver as well as to the finite difference Global Drift Ballooning (GDB) code. However, it is found that turbulent simulations that conserve both energy and enstrophy tend to have too much power at high wavenumber and that this part of the spectrum should be controlled by reintroducing artificial dissipation. The DG formulation allows upwinding of the advection operator which dissipates enstrophy while still maintaining conservation of energy. Coupling upwinded DG with implicit symplectic integration appears to offer the best compromise of allowing mid-range wavenumbers to reach the appropriate amplitude while still controlling the high-wavenumber part of the spectrum.

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.03793 [pdf, ps, other]

Coefficientwise Hankel-total positivity of the row-generating polynomials for the output matrices of certain production matrices

Authors: Bao-Xuan Zhu

Abstract: Total positivity of matrices is deeply studied and plays an important role in various branches of mathematics. The aim of this paper is to study the criteria for coefficientwise Hankel-total positivity of the row-generating polynomials of generalized $m$-Jacobi-Rogers triangles and their applications. Using the theory of production matrices, we present the criteria for coefficientwise Hankel-total positivity of the row-generating polynomials of the output matrices of certain production matrices. In particular, we gain a criterion for coefficientwise Hankel-total positivity of the row-generating polynomial sequence of the generalized $m$-Jacobi-Rogers triangle. This immediately implies that the corresponding generalized $m$-Jacobi-Rogers triangular convolution preserves the Stieltjes moment property of sequences and its zeroth column sequence is coefficientwise Hankel-totally positive and log-convex of higher order in all the indeterminates. In consequence, for $m=1$, we immediately obtain some results on Hankel-total positivity for the Catalan-Stieltjes matrices. In particular, we apply our results in a unified manner to some combinatorial triangles or polynomials including the generalized Jacobi Stirling triangle, a generalized elliptic polynomial, a refined Stirling cycle polynomial and a refined Eulerian polynomial. For general $m$, combining our criterion and a function satisfying an autonomous differential equation, we present different criteria for coefficientwise Hankel-total positivity of the row-generating polynomial sequence of exponential Riordan arrays. In addition, we also derive some results for coefficientwise Hankel-total positivity in terms of compositional functions and $m$-branched Stieltjes-type continued fractions. We apply our results to many combinatorial polynomials and solve some conjectures proposed by Sokal.

Submitted 22 April, 2024; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: This paper has been accepted by Advances in Mathematics

arXiv:2202.01269 [pdf, ps, other]

Robust Estimation for Nonparametric Families via Generative Adversarial Networks

Authors: Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Abstract: We provide a general framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems, which aim at estimating unknown parameters of the true distribution given adversarially corrupted samples. Prior work focuses on the problem of robust mean and covariance estimation when the true distribution lies in the family of Gaussian distributions or elliptical distributions, and analyzes depth or scoring rule based GAN losses for the problem. Our work extends these to robust mean estimation, second moment estimation, and robust linear regression when the true distribution only has bounded Orlicz norms, which includes the broad family of sub-Gaussian, sub-Exponential and bounded moment distributions. We also provide a different set of sufficient conditions for the GAN loss to work: we only require its induced distance function to be a cumulative density function of some light-tailed distribution, which is easily satisfied by neural networks with sigmoid activation. In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance, which overcomes the computational intractability of the original Kolmogorov-Smirnov distance used in the prior work.

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2201.12668 [pdf, ps, other]

Geometry of positive scalar curvature on complete manifold

Authors: Bo Zhu

Abstract: In this paper, we study the interplay of geometry and positive scalar curvature on a complete, non-compact manifold with non-negative Ricci curvature. In three-dimensional manifold, we prove a minimal volume growth, an estimate of integral of scalar curvature and width. In higher dimensional manifold, we obtain a volume growth with a stronger condition.

Submitted 29 January, 2022; originally announced January 2022.

MSC Class: 53C21

arXiv:2110.06425 [pdf, other]

A Well-Posed Multidimensional Rational Covariance and Generalized Cepstral Extension Problem

Authors: Bin Zhu, Mattia Zorzi

Abstract: In the present paper we consider the problem of estimating the multidimensional power spectral density which describes a second-order stationary random field from a finite number of covariance and generalized cepstral coefficients. The latter can be framed as an optimization problem subject to multidimensional moment constraints, i.e., to search a spectral density maximizing an entropic index and matching the moments. In connection with systems and control, such a problem can also be posed as finding a multidimensional shaping filter (i.e., a linear time-invariant system) which can output a random field that has identical moments with the given data when fed with a white noise, a fundamental problem in system identification. In particular, we consider the case where the dimension of the random field is greater than two for which a satisfying theory is still missing. We propose a multidimensional moment problem which takes into account a generalized definition of the cepstral moments, together with a consistent definition of the entropy. We show that it is always possible to find a rational power spectral density matching exactly the covariances and approximately the generalized cepstral coefficients, from which a shaping filter can be constructed via spectral factorization. In plain words, our theory allows one to construct a well-posed spectral estimator for any finite dimension.

Submitted 6 January, 2023; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: 25 pages using the SIAM template, 1 figure; accepted for publication in SIAM Journal on Control and Optimization (SICON)

MSC Class: 42A70; 30E05; 47A57; 60G12

arXiv:2109.14926 [pdf, other]

A Fast Robust Numerical Continuation Solver to a Two-Dimensional Spectral Estimation Problem

Authors: Bin Zhu, Jiahao Liu

Abstract: This paper presents a fast algorithm to solve a spectral estimation problem for two-dimensional random fields. The latter is formulated as a convex optimization problem with the Itakura-Saito pseudodistance as the objective function subject to the constraints of moment equations. We exploit the structure of the Hessian of the dual objective function in order to make possible a fast Newton solver. Then we incorporate the Newton solver into a predictor-corrector numerical continuation method which is able to produce a parametrized family of solutions to the moment equations. We have performed two sets of numerical simulations to test our algorithm and spectral estimator. The simulations on the frequency estimation problem show that our spectral estimator outperforms the classical windowed periodograms in the case of two hidden frequencies and has a higher resolution. The other set of simulations on system identification indicates that the numerical continuation method is more robust than Newton's method alone in ill-conditioned instances.

Submitted 30 September, 2021; originally announced September 2021.

Comments: 13 pages, 8 figures

arXiv:2109.12715 [pdf, ps, other]

Uryson width of three dimensional mean convex domain with non-negative Ricci curvature

Authors: Zhichao Wang, Bo Zhu

Abstract: We prove that for every three dimensional manifold with nonnegative Ricci curvature and strictly mean convex boundary, there exists a Morse function so that each connected component of its level sets has a uniform diameter bound, which depends only on the lower bound of mean curvature. This gives an upper bound of Uryson 1-width for those three manifolds with boundary.

Submitted 26 September, 2021; originally announced September 2021.

Comments: 18 pages; comments are welcome!

arXiv:2109.06255 [pdf, other]

Implicit Regularization Effects of the Sobolev Norms in Image Processing

Authors: Bowen Zhu, Jingwei Hu, Yifei Lou, Yunan Yang

Abstract: In this paper, we propose to use the general $L^2$-based Sobolev norms, i.e., $H^s$ norms where $s\in \mathbb{R}$, to measure the data discrepancy due to noise in image processing tasks that are formulated as optimization problems. As opposed to a popular trend of developing regularization methods, we emphasize that an implicit regularization effect can be achieved through the class of Sobolev norms as the data-fitting term. Specifically, we analyze that the implicit regularization comes from the weights that the $H^s$ norm imposes on different frequency contents of an underlying image. We further analyze the underlying noise assumption of using the Sobolev norm as the data-fitting term from a Bayesian perspective, build the connections with the Sobolev gradient-based methods and discuss the preconditioning effects on the convergence rate of the gradient descent algorithm, leading to a better understanding of functional spaces/metrics and the optimization process involved in image processing. Numerical results in full waveform inversion, image denoising and deblurring demonstrate the implicit regularization effects.

Submitted 28 February, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: 21 pages, 8 figures

MSC Class: 65K10; 46E36; 68U10; 49N45; 92C55; 49Q22

arXiv:2108.07964 [pdf, ps, other]

Silting reduction in extriangulated categories

Authors: Yu Liu, Panyue Zhou, Yu Zhou, Bin Zhu

Abstract: Presilting and silting subcategories in extriangulated categories were introduced by Adachi and Tsukamoto recently. In this paper, we prove that the Gabriel-Zisman localization $\mathcal B/({\rm thick}\mathcal W)$ of an extriangulated category $\mathcal B$ with respect to a presilting subcategory $\mathcal W$ satisfying a certain condition can be realized as a subfactor category of $\mathcal B$. This generalizes the result by Iyama-Yang for silting reduction on triangulated categories. Then we discuss the relation between silting subcategories and tilting subcategories in extriangulated categories, which gives an important class of examples of our results. In particular, for a finite dimensional Gorenstein algebra, we get the relative version of the description of the singularity category due to Happel and Chen-Zhang by this reduction.

Submitted 8 October, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

Comments: 22 pages

arXiv:2108.05477 [pdf, ps, other]

On a dichotomy of the curvature decay of steady Ricci soliton

Authors: Pak-Yeung Chan, Bo Zhu

Abstract: We establish a dichotomy on the curvature decay for four dimensional complete noncompact non Ricci flat steady gradient Ricci soliton with linear curvature decay and proper potential function. A similar dichotomy is also shown in higher dimensions under the additional assumption that the Ricci curvature is nonnegative outside a compact subset.

Submitted 11 August, 2021; originally announced August 2021.

Comments: 30 pages

MSC Class: 53C21

arXiv:2106.12176 [pdf, ps, other]

Stability of combinatorial polynomials and its applications

Authors: Ming-Jian Ding, Bao-Xuan Zhu

Abstract: The aim of this paper is to make a systematic study of the stability of polynomials in combinatorics. Applying the characterizations of Borcea and Brändén concerning linear operators preserving stability, we present criteria for real stability and Hurwitz stability. We also give a criterion for Hurwitz stability of the Turán expressions. As applications, we derive some stability results occurring in the literature in a unified manner. In addition, we obtain the Hurwitz stability of Turán expressions for alternating runs polynomials of types $A$ and $B$ and solve a conjecture concerning Hurwitz stability of alternating runs polynomials defined on a dual set of Stirling permutations. Furthermore, we prove that the Hurwitz stability of any symmetric polynomial implies its semi-$γ$-positivity. We study a class of symmetric polynomials and derive many nice properties including Hurwitz stability, semi-$γ$-positivity, non $γ$-positivity, unimodality, strong $q$-log-convexity, the Jacobi continued fraction expansion and the relation with derivative polynomials. In particular, these properties of the alternating descents polynomials of types $A$ and $B$ can be obtained in a unified approach. Finally, we use real stability to prove a criterion for zeros interlacing between a polynomial and its reciprocal polynomial, which implies the alternatingly increasing property. This criterion extends a result of Brändén and Solus and unifies such properties for many combinatorial polynomials, including ascent polynomials for $k$-ary words, descent polynomials on signed Stirling permutations and $q$-analog of descent polynomials on colored permutations, and so on. We prove the alternatingly increasing property and zeros interlacing for two kinds of peak polynomials on the dual set of Stirling permutations.

Submitted 24 June, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

Comments: We delete original Proposition 4.16 and adjust the order of some References. We also correct some typos

MSC Class: 05A15; 26C10; 05A20; 30B70

arXiv:2105.03952 [pdf, ps, other]

On the Musielak-Orlicz-Gauss image problem

Authors: Qingzhong Huang, Sudan Xing, Deping Ye, Baocheng Zhu

Abstract: In the present paper we initiate the study of the Musielak-Orlicz-Brunn-Minkowski theory for convex bodies. In particular, we develop the Musielak-Orlicz-Gauss image problem aiming to characterize the Musielak-Orlicz-Gauss image measure of convex bodies. For a convex body $K$, its Musielak-Orlicz-Gauss image measure, denoted by $\widetilde{C}_Θ(K, \cdot)$, involves a triple $Θ=(G, Ψ, λ)$ where $G$ and $Ψ$ are two Musielak-Orlicz functions defined on $S^{n-1}\times (0, \infty)$ and $λ$ is a nonzero finite Lebesgue measure on the unit sphere $S^{n-1}$. Such a measure can be produced by a variational formula of $\widetilde{V}_{G, λ}(K)$ (the general dual volume of $K$ with respect to $λ$) under the perturbations of $K$ by the Musielak-Orlicz addition defined via the function $Ψ$. The Musielak-Orlicz-Gauss image problem contains many intensively studied Minkowski type problems and the recent Gauss image problem as its special cases. Under the condition that $G$ is decreasing on its second variable, the existence of solutions to this problem is established.

Submitted 9 May, 2021; originally announced May 2021.

MSC Class: 52A20; 52A30; 52A39; 52A40

arXiv:2103.12021 [pdf, other]

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Authors: Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

Abstract: Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection. Based on the composition of the offline dataset, two main categories of methods are used: imitation learning which is suitable for expert datasets and vanilla offline RL which often requires uniform coverage datasets. From a practical standpoint, datasets often deviate from these two extremes and the exact data composition is usually unknown a priori. To bridge this gap, we present a new offline RL framework that smoothly interpolates between the two extremes of data composition, hence unifying imitation learning and vanilla offline RL. The new framework is centered around a weak version of the concentrability coefficient that measures the deviation from the behavior policy to the expert policy alone. Under this new framework, we further investigate the question on algorithm design: can one develop an algorithm that achieves a minimax optimal rate and also adapts to unknown data composition? To address this question, we consider a lower confidence bound (LCB) algorithm developed based on pessimism in the face of uncertainty in offline RL. We study finite-sample properties of LCB as well as information-theoretic limits in multi-armed bandits, contextual bandits, and Markov decision processes (MDPs). Our analysis reveals surprising facts about optimality rates. In particular, in all three settings, LCB achieves a faster rate of $1/N$ for nearly-expert datasets compared to the usual rate of $1/\sqrt{N}$ in offline RL, where $N$ is the number of samples in the batch dataset. In the case of contextual bandits with at least two contexts, we prove that LCB is adaptively optimal for the entire data composition range, achieving a smooth transition from imitation learning to offline RL. We further show that LCB is almost adaptively optimal in MDPs.

Submitted 3 July, 2023; v1 submitted 22 March, 2021; originally announced March 2021.

Journal ref: Published at NeurIPS 2021 and IEEE Transactions on Information Theory

arXiv:2101.07781 [pdf, other]

Minimax Off-Policy Evaluation for Multi-Armed Bandits

Authors: Cong Ma, Banghua Zhu, Jiantao Jiao, Martin J. Wainwright

Abstract: We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards, and develop minimax rate-optimal procedures under three settings. First, when the behavior policy is known, we show that the Switch estimator, a method that alternates between the plug-in and importance sampling estimators, is minimax rate-optimal for all sample sizes. Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies. When the behavior policy is unknown, any estimator must have mean-squared error larger -- relative to the oracle estimator equipped with the knowledge of the behavior policy -- by a multiplicative factor proportional to the support size of the target policy. Moreover, we demonstrate that the plug-in approach achieves this worst-case competitive ratio up to a logarithmic factor. Third, we initiate the study of the partial knowledge setting in which it is assumed that the minimum probability taken by the behavior policy is known. We show that the plug-in estimator is optimal for relatively large values of the minimum probability, but is sub-optimal when the minimum probability is low. In order to remedy this gap, we propose a new estimator based on approximation by Chebyshev polynomials that provably achieves the optimal estimation error. Numerical experiments on both simulated and real data corroborate our theoretical findings.

Submitted 19 January, 2021; originally announced January 2021.

arXiv:2009.05758 [pdf, ps, other]

Approximation of stationary processes and Toeplitz Spectra

Authors: Giorgio Picci, Bin Zhu

Abstract: We study the approximation of stationary processes by a simple class of purely deterministic signals. This has an analytic counterpart in the approximation of symmetric positive definite Toeplitz matrices by submatrices of finite rank. We propose a notion of distance between them and prove a weak sense approximation result.

Submitted 12 September, 2020; originally announced September 2020.

arXiv:2009.01058 [pdf, other]

Inverse modified differential equations for discovery of dynamics

Authors: Aiqing Zhu, Pengzhan Jin, Beibei Zhu, Yifa Tang

Abstract: The combination of numerical integration and deep learning, i.e., ODE-net, has been successfully employed in a variety of applications. In this work, we introduce inverse modified differential equations (IMDE) to contribute to the behaviour and error analysis of discovery of dynamics using ODE-net. It is shown that the difference between the learned ODE and the truncated IMDE is bounded by the sum of learning loss and a discrepancy which can be made sub-exponentially small. In addition, we deduce that the total error of ODE-net is bounded by the sum of discrete error and learning loss. Furthermore, with the help of IMDE, theoretical results on learning Hamiltonian systems are derived. Several experiments are performed to numerically verify our theoretical results.

Submitted 13 August, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

arXiv:2008.04120 [pdf, ps, other]

On a Stirling-Whitney-Riordan triangle

Authors: Bao-Xuan Zhu

Abstract: Based on the Stirling triangle of the second kind, the Whitney triangle of the second kind and one triangle of Riordan, we study a Stirling-Whitney-Riordan triangle $[T_{n,k}]_{n,k}$ satisfying the recurrence relation: \begin{eqnarray*} T_{n,k}&=&(b_1k+b_2)T_{n-1,k-1}+[(2λb_1+a_1)k+a_2+λ( b_1+b_2)] T_{n-1,k}+\\ &&λ(a_1+λb_1)(k+1)T_{n-1,k+1}, \end{eqnarray*} where initial conditions $T_{n,k}=0$ unless $0\le k\le n$ and $T_{0,0}=1$. We prove that the Stirling-Whitney-Riordan triangle $[T_{n,k}]_{n,k}$ is $\textbf{x}$-totally positive with $\textbf{x}=(a_1,a_2,b_1,b_2,λ)$. We show that the row-generating function $T_n(q)$ has only real zeros and the Turán-type polynomial $T_{n+1}(q)T_{n-1}(q)-T^2_n(q)$ is stable. We also present explicit formulae for $T_{n,k}$ and the exponential generating function of $T_n(q)$ and give a Jacobi continued fraction expansion for the ordinary generating function of $T_n(q)$. Furthermore, we get the $\textbf{x}$-Stieltjes moment property and $3$-$\textbf{x}$-log-convexity of $T_n(q)$ and show that the triangular convolution $z_n=\sum_{i=0}^nT_{n,i}x_iy_{n-i}$ preserves Stieltjes moment property of sequences. Finally, for the first column $(T_{n,0})_{n\geq0}$, we derive some properties similar to those of $(T_n(q))_{n\geq0}.$

Submitted 24 March, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: To appear in Journal of Algebraic Combinatorics

MSC Class: 05A20; 05A15; 11A55; 15B48; 30B70; 44A60

arXiv:2007.14924 [pdf, ps, other]

Stieltjes moment properties and continued fractions from combinatorial triangles

Authors: Bao-Xuan Zhu

Abstract: Many combinatorial numbers can be placed in the following generalized triangular array $[T_{n,k}]_{n,k\ge 0}$ satisfying the recurrence relation: \begin{equation*} T_{n,k}=λ(a_0n+a_1k+a_2)T_{n-1,k}+(b_0n+b_1k+b_2)T_{n-1,k-1}+\frac{d(da_1-b_1)}λ(n-k+1)T_{n-1,k-2} \end{equation*} with $T_{0,0}=1$ and $T_{n,k}=0$ unless $0\le k\le n$ for suitable $a_0,a_1,a_2,b_0,b_1,b_2,d$ and $λ$. For $n\geq0$, denote by $T_n(q)$ the generating function of the $n$-th row. In this paper, we develop various criteria for $\textbf{x}$-Stieltjes moment property and $3$-$\textbf{x}$-log-convexity of $T_n(q)$ based on the Jacobi continued fraction expression of $\sum_{n\geq0}T_n(q)t^n$, where $\textbf{x}$ is a set of indeterminates consisting of $q$ and those parameters occurring in the recurrence relation. With the help of a criterion of Wang and Zhu [Adv. in Appl. Math. (2016)], we show that the corresponding linear transformation of $T_{n,k}$ preserves Stieltjes moment properties of sequences. Finally, we present some related examples including factorial numbers, Whitney numbers, Stirling permutations, minimax trees and peak statistics.

Submitted 1 June, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

Comments: Advances in Applied Mathematics, 130 (2021) 102232, 33pp

MSC Class: 05A20; 05A15; 11A55; 11B83; 15B48; 30B70

arXiv:2007.12602 [pdf, ps, other]

A unified approach to combinatorial triangles: a generalized Eulerian polynomial

Authors: Bao-Xuan Zhu

Abstract: Motivated by the classical Eulerian number, descent and excedance numbers in the hyperoctahedral groups, a triangular array from staircase tableaux and so on, we study a triangular array $[\mathcal {T}_{n,k}]_{n,k\ge 0}$ satisfying the recurrence relation: \begin{equation*} \mathcal {T}_{n,k}=λ(a_0n+a_1k+a_2)\mathcal {T}_{n-1,k}+(b_0n+b_1k+b_2)\mathcal {T}_{n-1,k-1}+\frac{cd}λ(n-k+1)\mathcal {T}_{n-1,k-2} \end{equation*} with $\mathcal {T}_{0,0}=1$ and $\mathcal {T}_{n,k}=0$ unless $0\le k\le n$. We derive a functional transformation for its row-generating function $\mathcal{T}_n(x)$ from the row-generating function $A_n(x)$ of another array $[A_{n,k}]_{n,k}$ satisfying a two-term recurrence relation. Based on this transformation, we can get properties of $\mathcal {T}_{n,k}$ and $\mathcal{T}_n(x)$ including nonnegativity, log-concavity, real rootedness, explicit formula and so on. Then we extend the famous Frobenius formula, the $γ$-positivity decomposition and the David-Barton formula for the classical Eulerian polynomial to those of a generalized Eulerian polynomial. We also get an identity for the generalized Eulerian polynomial with the general derivative polynomial. Finally, we apply our results to an array from the Lambert function, a triangular array from staircase tableaux and the alternating-runs triangle of type $B$ in a unified approach.

Submitted 24 July, 2020; originally announced July 2020.

MSC Class: 05A15; 05A19; 05A20; 26C10

arXiv:2007.05321 [pdf, ps, other]

doi 10.1017/S0013091521000717

Tilting pairs in extriangulated categories

Authors: Tiwei Zhao, Bin Zhu, Xiao Zhuang

Abstract: Extriangulated categories were introduced by Nakaoka and Palu to give a unification of properties in exact categories and extension-closed subcategories of triangulated categories. A notion of tilting pairs in an extriangulated category is introduced in this paper. We give a Bazzoni characterization of tilting pairs in this setting. We also obtain Auslander-Reiten correspondence of tilting pairs which classifies finite $\mathcal{C}$-tilting subcategories for a certain self-orthogonal subcategory $\mathcal{C}$ with some assumptions. This generalizes the known results given by Wei and Xi for the categories of finitely generated modules over Artin algebras, thereby providing new insights in exact and triangulated categories.

Submitted 5 November, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

Comments: 25 pages

Journal ref: Proceedings of the Edinburgh Mathematical Society 64 (2021) 947-981

arXiv:2006.14485 [pdf, ps, other]

Total positivity from the exponential Riordan arrays

Authors: Bao-Xuan Zhu

Abstract: Log-concavity and almost log-convexity of the cycle index polynomials were proved by Bender and Canfield [J. Combin. Theory Ser. A 74 (1996)]. Schirmacher [J. Combin. Theory Ser. A 85 (1999)] extended them to $q$-log-concavity and almost $q$-log-convexity. Motivated by these, we consider the stronger properties of total positivity from the Toeplitz matrix and Hankel matrix. By using exponential Riordan array methods, we give some criteria for total positivity of the triangular matrix of coefficients of the generalized cycle index polynomials, the Toeplitz matrix and Hankel matrix of the polynomial sequence in terms of the exponential formula, the logarithmic formula and the fractional formula, respectively. Finally, we apply our criteria to some triangular arrays satisfying some recurrence relations, including Bessel triangles of two kinds and their generalizations, the Lah triangle and its generalization, the idempotent triangle and some triangles related to binomial coefficients, rook polynomials and Laguerre polynomials. We not only get total positivity of these lower-triangles, and $q$-Stieltjes moment properties and $3$-$q$-log-convexity of their row-generating functions, but also prove that their triangular convolutions preserve Stieltjes moment property. In particular, we solve a conjecture of Sokal on $q$-Stieltjes moment property of rook polynomials.

Submitted 10 October, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: It will appear in SIAM J. Discrete Math

MSC Class: 05A20; 05A15; 11B83; 15B36; 44A60

arXiv:2006.07866 [pdf, ps, other]

Support $τ_n$-tilting pairs

Authors: Panyue Zhou, Bin Zhu

Abstract: We introduce the higher version of the notion of Adachi-Iyama-Reiten's support $τ$-tilting pairs, which is a generalization of maximal $τ_n$-rigid pairs in the sense of Jacobsen-Jørgensen. Let $\mathcal C$ be an $(n+2)$-angulated category with an $n$-suspension functor $Σ^n$ and an Oppermann-Thomas cluster tilting object. We show that relative $n$-rigid objects in $\mathcal C$ are in bijection with $τ_n$-rigid pairs in the $n$-abelian category $\mathcal C/{\rm add}Σ^n T$, and relative maximal $n$-rigid objects in $\mathcal C$ are in bijection with support $τ_n$-tilting pairs. We also show that relative $n$-self-perpendicular objects are in bijection with maximal $τ_n$-rigid pairs. These results generalise the work for $\mathcal C$ being $2n$-Calabi-Yau by Jacobsen-Jørgensen and the work for $n=1$ by Yang-Zhu.

Submitted 14 June, 2020; originally announced June 2020.

Comments: 17 pages, comments are welcome

MSC Class: 18E30; 16G10

Journal ref: J. Algebra 616 (2023), 193-211

arXiv:2005.14073 [pdf, other]

Robust estimation via generalized quasi-gradients

Authors: Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

Abstract: We explore why many recently proposed robust estimation problems are efficiently solvable, even though the underlying optimization problems are non-convex. We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients". Whenever these quasi-gradients exist, a large family of low-regret algorithms are guaranteed to approximate the global minimum; this includes the commonly-used filtering algorithm. For robust mean estimation of distributions under bounded covariance, we show that any first-order stationary point of the associated optimization problem is an approximate global minimum if and only if the corruption level $ε< 1/3$. Consequently, any optimization algorithm that approaches a stationary point yields an efficient robust estimator with breakdown point $1/3$. With careful initialization and step size, we improve this to $1/2$, which is optimal. For other tasks, including linear regression and joint mean and covariance estimation, the loss landscape is more rugged: there are stationary points arbitrarily far from the global minimum. Nevertheless, we show that generalized quasi-gradients exist and construct efficient algorithms. These algorithms are simpler than previous ones in the literature, and for linear regression we improve the estimation error from $O(\sqrtε)$ to the optimal rate of $O(ε)$ for small $ε$ assuming certified hypercontractivity. For mean estimation with near-identity covariance, we show that a simple gradient descent algorithm achieves breakdown point $1/3$ and iteration complexity $\tilde{O}(d/ε^2)$.

Submitted 28 May, 2020; originally announced May 2020.

arXiv:2005.04986 [pdf, ps, other]

doi 10.1016/j.jcp.2021.110325

Symplectic Neural Networks in Taylor Series Form for Hamiltonian Systems

Authors: Yunjin Tong, Shiying Xiong, Xingzhe He, Guanghan Pan, Bo Zhu

Abstract: We propose an effective and lightweight learning algorithm, Symplectic Taylor Neural Networks (Taylor-nets), to conduct continuous, long-term predictions of a complex Hamiltonian dynamic system based on sparse, short-term observations. At the heart of our algorithm is a novel neural network architecture consisting of two sub-networks. Both are embedded with terms in the form of Taylor series expansion designed with symmetric structure. The key mechanism underpinning our infrastructure is the strong expressiveness and special symmetric property of the Taylor series expansion, which naturally accommodate the numerical fitting process of the gradients of the Hamiltonian with respect to the generalized coordinates as well as preserve its symplectic structure. We further incorporate a fourth-order symplectic integrator in conjunction with neural ODEs' framework into our Taylor-net architecture to learn the continuous-time evolution of the target systems while simultaneously preserving their symplectic structures. We demonstrated the efficacy of our Taylor-net in predicting a broad spectrum of Hamiltonian dynamic systems, including the pendulum, the Lotka--Volterra, the Kepler, and the Hénon--Heiles systems. Our model exhibits unique computational merits by outperforming previous methods to a great extent regarding the prediction accuracy, the convergence rate, and the robustness despite using extremely small training data with a short training period (6000 times shorter than the predicting period), small sample sizes, and no intermediate data to train the networks.

Submitted 19 February, 2022; v1 submitted 11 May, 2020; originally announced May 2020.

Journal ref: Journal of Computational Physics, p.110325 (2021)

Showing 1–50 of 126 results for author: Zhu, B