-
Improving Generalization and Convergence by Enhancing Implicit Regularization
Authors:
Mingze Wang,
Haotian He,
**bo Wang,
Zilin Wang,
Guanhua Huang,
Feiyu Xiong,
Zhiyu Li,
Weinan E,
Lei Wu
Abstract:
In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overhead. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).
Submitted 31 May, 2024;
originally announced May 2024.
-
A simple, randomized algorithm for diagonalizing normal matrices
Authors:
Haoze He,
Daniel Kressner
Abstract:
We present and analyze a simple numerical method that diagonalizes a complex normal matrix A by diagonalizing the Hermitian matrix obtained from a random linear combination of the Hermitian and skew-Hermitian parts of A.
Submitted 28 May, 2024;
originally announced May 2024.
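The idea in this abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `random_normal_matrix` and `diagonalize_normal`, the Gaussian choice of the two random coefficients, and the test matrix are all hypothetical choices made for the example.

```python
import numpy as np

def random_normal_matrix(n, rng):
    """Build a test matrix A = Q diag(d) Q^* with unitary Q, so A is normal."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return Q @ np.diag(d) @ Q.conj().T

def diagonalize_normal(A, rng):
    """Return a unitary U from one Hermitian eigenvalue problem."""
    H = (A + A.conj().T) / 2        # Hermitian part of A
    S = (A - A.conj().T) / 2j       # Hermitian; 1j * S is the skew-Hermitian part
    mu = rng.standard_normal(2)     # assumed: Gaussian random coefficients
    B = mu[0] * H + mu[1] * S       # random Hermitian combination
    B = (B + B.conj().T) / 2        # remove rounding-level asymmetry
    _, U = np.linalg.eigh(B)
    return U

rng = np.random.default_rng(0)
A = random_normal_matrix(6, rng)
U = diagonalize_normal(A, rng)
D = U.conj().T @ A @ U
off = np.linalg.norm(D - np.diag(np.diag(D)))   # off-diagonal residual
print("off-diagonal norm:", off)
```

For a normal matrix, `H` and `S` commute, so an eigenbasis of a generic combination of the two also diagonalizes `A`; the residual `off` should therefore sit near rounding error.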
-
Anderson Acceleration with Truncated Gram-Schmidt
Authors:
Ziyuan Tang,
Tianshi Xu,
Huan He,
Yousef Saad,
Yuanzhe Xi
Abstract:
Anderson Acceleration (AA) is a popular algorithm designed to enhance the convergence of fixed-point iterations. In this paper, we introduce a variant of AA based on a Truncated Gram-Schmidt process (AATGS) which has a few advantages over the classical AA. In particular, an attractive feature of AATGS is that its iterates obey a three-term recurrence in the situation when it is applied to solving symmetric linear problems and this can lead to a considerable reduction of memory and computational costs. We analyze the convergence of AATGS in both full-depth and limited-depth scenarios and establish its equivalence to the classical AA in the linear case. We also report on the effectiveness of AATGS through a set of numerical experiments, ranging from solving nonlinear partial differential equations to tackling nonlinear optimization problems. In particular, the performance of the method is compared with that of the classical AA algorithms.
Submitted 22 March, 2024;
originally announced March 2024.
-
A randomized algorithm for simultaneously diagonalizing symmetric matrices by congruence
Authors:
Haoze He,
Daniel Kressner
Abstract:
A family of symmetric matrices $A_1,\ldots, A_d$ is SDC (simultaneous diagonalization by congruence) if there is an invertible matrix $X$ such that every $X^T A_k X$ is diagonal. In this work, a novel randomized SDC (RSDC) algorithm is proposed that reduces SDC to a generalized eigenvalue problem by considering two (random) linear combinations of the family. We establish exact recovery: RSDC achieves diagonalization with probability $1$ if the family is exactly SDC. Under a mild regularity assumption, robust recovery is also established: given a family that is $ε$-close to SDC, RSDC diagonalizes the family, with high probability, up to an error of norm $\mathcal{O}(ε)$. Under a positive definiteness assumption, which often holds in applications, stronger results are established, including a bound on the condition number of the transformation matrix. For practical use, we suggest combining RSDC with an optimization algorithm. The performance of the resulting method is verified for synthetic data, image separation and EEG analysis tasks. It turns out that our newly developed method outperforms existing optimization-based methods in terms of efficiency while achieving a comparable level of accuracy.
Submitted 26 February, 2024;
originally announced February 2024.
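The reduction described in this abstract, from SDC to a generalized eigenvalue problem built from two random linear combinations, can be illustrated with a short NumPy sketch. Everything below is an assumption made for illustration rather than the authors' code: the function names `rsdc` and `offdiag_ratio`, the Gaussian coefficients, the synthetic exactly-SDC family, and the choice to solve the pencil $A(μ)x = λA(θ)x$ by explicitly forming $A(θ)^{-1}A(μ)$ instead of calling a dedicated generalized eigensolver.

```python
import numpy as np

def rsdc(mats, rng):
    """Sketch: diagonalize a symmetric family by congruence via one eigenproblem."""
    mats = np.asarray(mats)                       # shape (d, n, n)
    mu = rng.standard_normal(len(mats))           # assumed: Gaussian coefficients
    theta = rng.standard_normal(len(mats))
    A_mu = np.tensordot(mu, mats, axes=1)         # first random combination
    A_theta = np.tensordot(theta, mats, axes=1)   # second random combination
    # Generalized problem A_mu x = lambda A_theta x, solved here (for brevity)
    # as a standard eigenproblem of A_theta^{-1} A_mu.
    _, X = np.linalg.eig(np.linalg.solve(A_theta, A_mu))
    return np.real(X)

def offdiag_ratio(M):
    """Relative size of the off-diagonal part of M."""
    return np.linalg.norm(M - np.diag(np.diag(M))) / np.linalg.norm(M)

rng = np.random.default_rng(1)
n, d = 4, 3
# Synthetic exactly-SDC family A_k = X^{-T} D_k X^{-1}; X_true is kept close
# to the identity so the toy problem stays well conditioned.
X_true = np.eye(n) + 0.1 * rng.standard_normal((n, n))
X_inv = np.linalg.inv(X_true)
family = [X_inv.T @ np.diag(rng.standard_normal(n)) @ X_inv for _ in range(d)]

X_hat = rsdc(family, rng)
worst = max(offdiag_ratio(X_hat.T @ A @ X_hat) for A in family)
print("worst relative off-diagonal norm:", worst)
```

Since $A(θ)^{-1}A(μ) = X D_θ^{-1} D_μ X^{-1}$ for an exactly SDC family, the eigenvectors recover the columns of $X$ up to scaling whenever the eigenvalue ratios are distinct, which holds with probability one for random coefficients.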
-
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Authors:
Yihao Zhang,
Hangzhou He,
**gyu Zhu,
Huanran Chen,
Yifei Wang,
Zeming Wei
Abstract:
Adversarial Training (AT), which adversarially perturbs the input samples during training, has been acknowledged as one of the most effective defenses against adversarial attacks, yet suffers from inevitably decreased clean accuracy. Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization. However, as SAM is designed for better clean accuracy, its effectiveness in enhancing adversarial robustness remains unexplored. In this work, considering the duality between SAM and AT, we investigate the adversarial robustness derived from SAM. Intriguingly, we find that using SAM alone can improve adversarial robustness. To understand this unexpected property of SAM, we first provide empirical and theoretical insights into how SAM can implicitly learn more robust features, and conduct comprehensive experiments to show that SAM can improve adversarial robustness notably without sacrificing any clean accuracy, shedding light on the potential of SAM to be a substitute for AT when accuracy is the higher priority. Code is available at https://github.com/weizeming/SAM_AT.
Submitted 5 June, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
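The weight perturbation that this abstract attributes to SAM can be made concrete with a toy example. The sketch below applies the standard SAM update (ascend by a normalized gradient step of radius `rho`, then descend using the gradient at the perturbed weights) to a two-dimensional quadratic. The loss, the matrix `H`, the step size `lr`, and the radius `rho` are assumptions chosen for illustration and are not taken from the paper or its repository.

```python
import numpy as np

# Toy quadratic loss with one sharp direction and one flat direction (assumed).
H = np.diag([10.0, 0.1])

def loss(w):
    return 0.5 * w @ H @ w

def grad(w):
    return H @ w

def sam_step(w, lr=0.05, rho=0.05):
    """One SAM update: perturb the weights, then descend with the perturbed gradient."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # worst-case weight perturbation
    return w - lr * grad(w + eps)                 # gradient evaluated at w + eps

w = np.array([1.0, 1.0])
initial_loss = loss(w)
for _ in range(100):
    w = sam_step(w)
final_loss = loss(w)
print("loss before:", initial_loss, "loss after:", final_loss)
```

Adversarial training would apply the analogous perturbation to the input samples rather than to `w`, which is the duality the abstract refers to.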
-
Quantum Annealing and Graph Neural Networks for Solving TSP with QUBO
Authors:
Haoqi He
Abstract:
This paper explores the application of Quadratic Unconstrained Binary Optimization (QUBO) models in solving the Travelling Salesman Problem (TSP) through Quantum Annealing algorithms and Graph Neural Networks. Quantum Annealing (QA), a quantum-inspired optimization method that exploits quantum tunneling to escape local minima, is used to solve QUBO formulations of TSP instances on Coherent Ising Machines (CIMs). The paper also presents a novel approach where QUBO is employed as a loss function within a GNN architecture tailored for solving TSP efficiently. By leveraging GNN's capability to learn graph representations, this method finds approximate solutions to TSP with improved computational time compared to traditional exact solvers. The paper details how to construct a QUBO model for TSP by encoding city visits into binary variables and formulating constraints that guarantee valid tours. It further discusses the implementation of QUBO-based Quantum Annealing algorithm for TSP (QQA-TSP) and its feasibility demonstration using quantum simulation platforms. In addition, it introduces a Graph Neural Network solution for TSP (QGNN-TSP), which learns the underlying structure of the problem and produces competitive solutions via gradient descent over a QUBO-based loss function. The experimental results compare the performance of QQA-TSP against state-of-the-art classical solvers such as dynamic programming, Concorde, and Gurobi, while also presenting empirical outcomes from training and evaluating QGNN-TSP on various TSP datasets. The study highlights the promise of combining deep learning techniques with quantum-inspired optimization methods for solving NP-hard problems like TSP, suggesting future directions for enhancing GNN architectures and applying QUBO frameworks to more complex combinatorial optimization tasks.
Submitted 21 February, 2024;
originally announced February 2024.
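The QUBO construction mentioned in this abstract (binary variables for city visits plus constraints that enforce valid tours) can be sketched as follows. This uses the common textbook encoding with $x_{i,p} = 1$ when city $i$ occupies tour position $p$, and it is an assumption that the paper uses the same encoding. The function names, the four-city instance, and the penalty weight are all hypothetical.

```python
import itertools
import numpy as np

def tsp_qubo(dist, penalty):
    """Return (Q, offset) such that x @ Q @ x + offset is the QUBO energy."""
    n = dist.shape[0]
    var = lambda city, pos: city * n + pos        # flat index of x[city, pos]
    Q = np.zeros((n * n, n * n))
    # Tour length: city i at position pos followed by city j at pos + 1.
    for pos in range(n):
        nxt = (pos + 1) % n
        for i in range(n):
            for j in range(n):
                if i != j:
                    Q[var(i, pos), var(j, nxt)] += dist[i, j]
    # Constraints penalty * (1 - sum of group)^2, expanded using x^2 = x.
    groups = [[var(i, p) for p in range(n)] for i in range(n)]    # each city once
    groups += [[var(i, p) for i in range(n)] for p in range(n)]   # each position once
    for group in groups:
        for a, b in itertools.combinations(group, 2):
            Q[a, b] += 2.0 * penalty
        for a in group:
            Q[a, a] -= penalty
    offset = penalty * len(groups)
    return Q, offset

def tour_to_bits(tour):
    """Encode a permutation of cities as the binary vector x."""
    n = len(tour)
    x = np.zeros(n * n)
    for pos, city in enumerate(tour):
        x[city * n + pos] = 1.0
    return x

def energy(Q, offset, x):
    return x @ Q @ x + offset

def tour_length(dist, tour):
    n = len(tour)
    return sum(dist[tour[p], tour[(p + 1) % n]] for p in range(n))

# Four cities on the corners of a unit square (assumed toy instance).
pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
penalty = dist.sum() + 1.0    # exceeds any tour length, so violations never pay off
Q, offset = tsp_qubo(dist, penalty)
best = min(itertools.permutations(range(4)),
           key=lambda t: energy(Q, offset, tour_to_bits(t)))
print("best tour:", best, "length:", tour_length(dist, best))
```

For any valid tour both constraint groups vanish, so the energy equals the tour length. A quantum annealer or a GNN trained on this energy as a loss would search over `x` instead of enumerating permutations as done here.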
-
Projection of Elliptic Orbits and Branching Laws
Authors:
Hongyu He
Abstract:
Let $G$ be a Lie group, and $H\subset G$ a closed subgroup. Let $π$ be an irreducible unitary representation of $G$. In this paper, we briefly discuss the orbit method and its application to the branching problem $π|_{H}$. We use the Gan-Gross-Prasad branching law for $(G, H)= ( U(p,q), U(p, q-1) )$ as an example to illustrate the relation between $\pro_{\f u(p, q-1)}^{\f u(p,q)} \mc O(λ)$ and the branching law of the discrete series $D_λ|_{U(p,q-1)}$ for $λ$ a regular elliptic element. We also discuss some results regarding branching laws and wave front sets. The presentation of this paper does not follow the historical timeline of development.
Submitted 26 January, 2024;
originally announced January 2024.
-
Reducing operator complexity in Algebraic Multigrid with Machine Learning Approaches
Authors:
Ru Huang,
Kai Chang,
Huan He,
Ruipeng Li,
Yuanzhe Xi
Abstract:
We propose a data-driven and machine-learning-based approach to compute non-Galerkin coarse-grid operators in algebraic multigrid (AMG) methods, addressing the well-known issue of increasing operator complexity. Guided by the AMG theory on spectrally equivalent coarse-grid operators, we have developed novel ML algorithms that utilize neural networks (NNs) combined with smooth test vectors from multigrid eigenvalue problems. The proposed method demonstrates promise in reducing the complexity of coarse-grid operators while maintaining overall AMG convergence for solving parametric partial differential equation (PDE) problems. Numerical experiments on anisotropic rotated Laplacian and linear elasticity problems are provided to showcase the performance and compare with existing methods for computing non-Galerkin coarse-grid operators.
Submitted 14 July, 2023;
originally announced July 2023.
-
NLTGCR: A class of Nonlinear Acceleration Procedures based on Conjugate Residuals
Authors:
Huan He,
Ziyuan Tang,
Shifan Zhao,
Yousef Saad,
Yuanzhe Xi
Abstract:
This paper develops a new class of nonlinear acceleration algorithms based on extending conjugate residual-type procedures from linear to nonlinear equations. The main algorithm has strong similarities with Anderson acceleration as well as with inexact Newton methods - depending on which variant is implemented. We prove theoretically and verify experimentally, on a variety of problems from simulation experiments to deep learning applications, that our method is a powerful accelerated iterative algorithm.
Submitted 30 March, 2024; v1 submitted 31 May, 2023;
originally announced June 2023.
-
Ramanujan-inspired series for $1/π$ involving harmonic numbers
Authors:
Qinghu Hou,
Haihong He,
Xiaoxia Wang
Abstract:
By applying the derivative operator to the known identities from hypergeometric series or WZ pairs, we obtain seven series associated with harmonic numbers. Specifically, six of them are Ramanujan-like formulas for $1/π$ and the remaining one contains harmonic numbers of order $2$. As a consequence, Sun's five conjectural series are proved.
Submitted 8 July, 2023; v1 submitted 30 April, 2023;
originally announced May 2023.
-
A descent method for nonsmooth multiobjective optimization problems on Riemannian manifolds
Authors:
Chunming Tang,
Hao He,
**bao Jian,
Miantao Chao
Abstract:
In this paper, a descent method for nonsmooth multiobjective optimization problems on complete Riemannian manifolds is proposed. The objective functions are only assumed to be locally Lipschitz continuous, rather than convex as required by existing methods. A necessary condition for Pareto optimality in Euclidean space is generalized to the Riemannian setting. At every iteration, an acceptable descent direction is obtained by constructing a convex hull of some Riemannian $\varepsilon$-subgradients. Then a Riemannian Armijo-type line search is executed to produce the next iterate. The convergence result is established in the sense that a point satisfying the necessary condition for Pareto optimality can be generated by the algorithm in a finite number of iterations. Finally, some preliminary numerical results are reported, which show that the proposed method is efficient.
Submitted 24 April, 2023;
originally announced April 2023.
-
Two curious q-supercongruences and their extensions
Authors:
Haihong He,
Xiaoxia Wang
Abstract:
We prove two single-parameter q-supercongruences which were recently conjectured by Guo, and establish their further extensions with one more parameter. Crucial ingredients in the proof are the terminating form of q-binomial theorem and a Karlsson-Minton type summation formula due to Gasper. Incidentally, an assertion of Wang, Li and Tang is also verified by establishing its q-analogue.
Submitted 3 April, 2023;
originally announced April 2023.
-
Centers and invariant straight lines of planar real polynomial vector fields and its configurations
Authors:
Hong** He,
Changjian Liu,
Dongmei Xiao
Abstract:
In this paper, we first give a formula for the least upper bound on the number of centers of planar real polynomial Hamiltonian vector fields.
This formula reveals that the greater the number of invariant straight lines of the vector field, the smaller the number of its centers. Then we obtain some rules on the configurations of centers of planar real polynomial Hamiltonian Kolmogorov vector fields when the number of centers is exactly the least upper bound. As an application of these results, we give an affirmative answer to a conjecture on the topological classification of configurations for the cubic Hamiltonian Kolmogorov vector fields with four centers. Moreover, we discuss the relationship between the number of centers of planar real polynomial vector fields and the existence of limit cycles, and prove that cubic real polynomial Kolmogorov vector fields have no limit cycles if the number of their centers reaches the maximum. More precisely, it is shown that a cubic real polynomial Kolmogorov vector field must have an elementary first integral in $\mathbb{R}^2\setminus\{xy=0\}$ if it has four centers, and the number of configurations of its centers is one more than that of the cubic polynomial Hamiltonian Kolmogorov vector fields.
Submitted 25 March, 2023;
originally announced March 2023.
-
An efficient symmetric primal-dual algorithmic framework for saddle point problems
Authors:
Hong** He,
Kai Wang,
**tao Yu
Abstract:
In this paper, we propose a new primal-dual algorithmic framework for a class of convex-concave saddle point problems frequently arising from image processing and machine learning. Our algorithmic framework updates the primal variable between two updates of the dual variable, thereby yielding a symmetric iterative scheme, which is accordingly called the symmetric primal-dual algorithm (SPIDA). It is noteworthy that the subproblems of our SPIDA are equipped with Bregman proximal regularization terms, which make SPIDA versatile in the sense that its framework covers some existing algorithms such as the classical augmented Lagrangian method (ALM), linearized ALM, and Jacobian splitting algorithms for linearly constrained optimization problems. Besides, our algorithmic framework allows us to derive some customized versions so that SPIDA works as efficiently as possible for structured optimization problems. Theoretically, under some mild conditions, we prove the global convergence of SPIDA and estimate the linear convergence rate under a generalized error bound condition defined by Bregman distance. Finally, a series of numerical experiments on the matrix game, basis pursuit, robust principal component analysis, and image restoration demonstrate that our SPIDA works well on synthetic and real-world datasets.
Submitted 25 July, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Randomized Joint Diagonalization of Symmetric Matrices
Authors:
Haoze He,
Daniel Kressner
Abstract:
Given a family of nearly commuting symmetric matrices, we consider the task of computing an orthogonal matrix that nearly diagonalizes every matrix in the family. In this paper, we propose and analyze randomized joint diagonalization (RJD) for performing this task. RJD applies a standard eigenvalue solver to random linear combinations of the matrices. Unlike existing optimization-based methods, RJD is simple to implement and leverages existing high-quality linear algebra software packages. Our main novel contribution is to prove robust recovery: Given a family that is $ε$-near to a commuting family, RJD jointly diagonalizes this family, with high probability, up to an error of norm O($ε$). We also discuss how the algorithm can be further improved by deflation techniques and demonstrate its state-of-the-art performance by numerical experiments with synthetic and real-world data.
Submitted 23 October, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Pruning, cut trees, and the reconstruction problem
Authors:
Nicolas Broutin,
Hui He,
Minmin Wang
Abstract:
We consider a pruning of the inhomogeneous continuum random trees, as well as the cut trees that encode the genealogies of the fragmentations that come with the pruning. We propose a new approach to the reconstruction problem, which has been treated for the Brownian CRT in [Electron. J. Probab. vol. 22, 2017] and for the stable trees in [Ann. IHP B, vol 55, 2019]. Our approach does not rely upon self-similarity and can potentially apply to general Lévy trees as well.
Submitted 2 February, 2023; v1 submitted 25 October, 2022;
originally announced October 2022.
-
An Efficient Nonlinear Acceleration method that Exploits Symmetry of the Hessian
Authors:
Huan He,
Shifan Zhao,
Ziyuan Tang,
Joyce C Ho,
Yousef Saad,
Yuanzhe Xi
Abstract:
Nonlinear acceleration methods are powerful techniques to speed up fixed-point iterations. However, many acceleration methods require storing a large number of previous iterates and this can become impractical if computational resources are limited. In this paper, we propose a nonlinear Truncated Generalized Conjugate Residual method (nlTGCR) whose goal is to exploit the symmetry of the Hessian to reduce memory usage. The proposed method can be interpreted as either an inexact Newton or a quasi-Newton method. We show that, with the help of global strategies like residual check techniques, nlTGCR can converge globally for general nonlinear problems and that under mild conditions, nlTGCR is able to achieve superlinear convergence. We further analyze the convergence of nlTGCR in a stochastic setting. Numerical results demonstrate the superiority of nlTGCR when compared with several other competitive baseline approaches on a few problems. Our code will be available in the future.
Submitted 22 October, 2022;
originally announced October 2022.
-
A Unified Bregman Alternating Minimization Algorithm for Generalized DC Programming with Application to Imaging Data
Authors:
Hong** He,
Zhiyuan Zhang
Abstract:
In this paper, we consider a class of nonconvex (not necessarily differentiable) optimization problems called generalized DC (Difference-of-Convex functions) programming, which minimizes the sum of two separable DC parts and one two-block-variable coupling function. To circumvent the nonconvexity and nonseparability of the problem under consideration, we accordingly introduce a Unified Bregman Alternating Minimization Algorithm (UBAMA) by maximally exploiting the favorable DC structure of the objective. Specifically, we first follow the spirit of alternating minimization to update each block variable in a sequential order, which can efficiently tackle the nonseparability caused by the coupling function. Then, we employ the Fenchel-Young inequality to approximate the second DC components (i.e., concave parts) so that each subproblem reduces to a convex optimization problem, thereby alleviating the computational burden of the nonconvex DC parts. Moreover, each subproblem absorbs a Bregman proximal regularization term, which is usually beneficial for inducing closed-form solutions of subproblems in many cases via choosing appropriate Bregman kernel functions. It is remarkable that our algorithm not only provides an algorithmic framework to understand the iterative schemes of some novel existing algorithms, but also enjoys implementable schemes with easier subproblems than some state-of-the-art first-order algorithms developed for generic nonconvex and nonsmooth optimization problems. Theoretically, we prove that the sequence generated by our algorithm globally converges to a critical point under the Kurdyka-Łojasiewicz (KŁ) condition. Besides, we estimate the local convergence rates of our algorithm when the KŁ exponent is known a priori.
Submitted 4 August, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
AUTM Flow: Atomic Unrestricted Time Machine for Monotonic Normalizing Flows
Authors:
Difeng Cai,
Yuliang Ji,
Huan He,
Qiang Ye,
Yuanzhe Xi
Abstract:
Nonlinear monotone transformations are used extensively in normalizing flows to construct invertible triangular mappings from simple distributions to complex ones. In existing literature, monotonicity is usually enforced by restricting function classes or model parameters and the inverse transformation is often approximated by root-finding algorithms as a closed-form inverse is unavailable. In this paper, we introduce a new integral-based approach termed "Atomic Unrestricted Time Machine (AUTM)", equipped with unrestricted integrands and an easy-to-compute explicit inverse. AUTM offers a versatile and efficient way to design normalizing flows with explicit inverse and unrestricted function classes or parameters. Theoretically, we present a constructive proof that AUTM is universal: all monotonic normalizing flows can be viewed as limits of AUTM flows. We provide a concrete example to show how to approximate any given monotonic normalizing flow using AUTM flows with guaranteed convergence. The result implies that AUTM can be used to transform an existing flow into a new one equipped with explicit inverse and unrestricted parameters. The performance of the new approach is evaluated on high dimensional density estimation, variational inference and image generation. Experiments demonstrate superior speed and memory efficiency of AUTM.
Submitted 5 June, 2022;
originally announced June 2022.
-
von Neumann type trace inequality for dual quaternion matrices
Authors:
Chen Ling,
Hong** He,
Liqun Qi,
Tingting Feng
Abstract:
Dual quaternion matrices have important applications in multi-agent formation control. In this paper, we first address the concept of spectral norm of dual quaternion matrices. Then, we introduce a von Neumann type trace inequality and a Hoffman-Wielandt type inequality for general dual quaternion matrices, where the latter characterizes a simultaneous perturbation bound on all singular values of a dual quaternion matrix. In particular, we also present two variants of the above two inequalities expressed by eigenvalues of dual quaternion Hermitian matrices. Our results are helpful for the further study of dual quaternion matrix theory, algorithmic design, and applications.
Submitted 12 April, 2023; v1 submitted 20 April, 2022;
originally announced April 2022.
-
Brownian continuum random tree conditioned to be large
Authors:
Romain Abraham,
Jean-François Delmas,
Hui He
Abstract:
We consider a Feller diffusion $(Z_s, s \ge 0)$ (with diffusion coefficient $\sqrt{2β}$ and drift $θ \in \mathbb{R}$) that we condition on $\{Z_t = a_t\}$, where $a_t$ is a deterministic function, and we study the limit in distribution of the conditioned process and of its genealogical tree as $t \rightarrow +\infty$. When $a_t$ does not increase too rapidly, we recover the standard size-biased process (and the associated genealogical tree given by Kesten's tree). When $a_t$ behaves as $α β^2 t^2$ when $θ = 0$ or as $α e^{2β|θ|t}$ when $θ \neq 0$, we obtain a new process whose distribution is described by a Girsanov transformation and equivalently by an SDE with a Poissonian immigration. Its associated genealogical tree is described by an infinite discrete skeleton (which does not satisfy the branching property) decorated with Brownian continuum random trees given by a Poisson point measure. As a by-product of this study, we introduce several sets of trees endowed with a Gromov-type distance which are of independent interest and which allow us to define in a formal and measurable way the decoration of a backbone with a family of continuum random trees.
Submitted 21 February, 2022;
originally announced February 2022.
-
Adaptive Zeroing-Type Neural Dynamics for Solving Quadratic Minimization and Applied to Target Tracking
Authors:
Huiting He,
Chengze Jiang,
Yudong Zhang,
Xiuchun Xiao,
Zhiyuan Song
Abstract:
The time-varying quadratic minimization (TVQM) problem, currently a research hotspot, urgently demands a more reliable and faster-solving model. To this end, a novel adaptive coefficient construction framework is presented and realized to improve the performance of the solution model, leading to the adaptive zeroing-type neural dynamics (AZTND) model. Then the AZTND model is applied to solve the TVQM problem. The adaptive coefficients can adjust the step size of the model online so that the solution model converges faster. At the same time, an integration term is developed to enhance the robustness of the model in a perturbed environment. Experiments demonstrate that the proposed model converges faster and is more robust than existing approaches. Finally, the AZTND model is applied in a target tracking scheme, demonstrating the practicality of the proposed model.
Submitted 29 November, 2022; v1 submitted 3 December, 2021;
originally announced December 2021.
-
A Thin Fundamental Set for SL(2, Z)
Authors:
Hongyu He
Abstract:
Let $Γ=SL(2, \mathbb Z)$ and $G=SL(2, \mathbb R)$. Let $g=kan$ be the Iwasawa decomposition. Let $ε$ be a small positive number. In this paper, we construct a fundamental set $\mathcal F_ε$ such that the $k$-component of $ g \in \mathcal F_ε$ is within the $ε$-distance from the identity. We further prove an inequality for the $L^2$-norm of functions on $G/Γ$.
Submitted 16 November, 2021;
originally announced November 2021.
-
"Sparse + Low-Rank'' Tensor Completion Approach for Recovering Images and Videos
Authors:
Chenjian Pan,
Chen Ling,
Hongjin He,
Liqun Qi,
Yanwei Xu
Abstract:
Recovering color images and videos from highly undersampled data is a fundamental and challenging task in face recognition and computer vision. By the multi-dimensional nature of color images and videos, in this paper, we propose a novel tensor completion approach, which is able to efficiently explore the sparsity of tensor data under the discrete cosine transform (DCT). Specifically, we introduce two ``sparse + low-rank'' tensor completion models as well as two implementable algorithms for finding their solutions. The first one is a DCT-based sparse plus weighted nuclear norm induced low-rank minimization model. The second one is a DCT-based sparse plus $p$-shrinking mapping induced low-rank optimization model. Moreover, we accordingly propose two implementable augmented Lagrangian-based algorithms for solving the underlying optimization models. A series of numerical experiments including color image inpainting and video data recovery demonstrate that our proposed approach performs better than many existing state-of-the-art tensor completion methods, especially for the case when the ratio of missing data is high.
Submitted 19 August, 2022; v1 submitted 18 October, 2021;
originally announced October 2021.
-
GDA-AM: On the effectiveness of solving minimax optimization via Anderson Acceleration
Authors:
Huan He,
Shifan Zhao,
Yuanzhe Xi,
Joyce C Ho,
Yousef Saad
Abstract:
Many modern machine learning algorithms such as generative adversarial networks (GANs) and adversarial training can be formulated as minimax optimization. Gradient descent ascent (GDA) is the most commonly used algorithm due to its simplicity. However, GDA can converge to non-optimal minimax points. We propose a new minimax optimization framework, GDA-AM, that views the GDA dynamics as a fixed-point iteration and solves it using Anderson Mixing to converge to the local minimax. It addresses the diverging issue of simultaneous GDA and accelerates the convergence of alternating GDA. We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions. We also empirically show that GDA-AM solves a variety of minimax problems and improves GAN training on several datasets.
Submitted 29 June, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
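The divergence of simultaneous GDA mentioned in the abstract above can be seen on the simplest bilinear problem, f(x, y) = x * y. The sketch below is illustrative only and is not the GDA-AM method; the function name, step size, and iteration count are assumptions chosen for the example.

```python
import math

def simultaneous_gda(x, y, eta=0.1, steps=100):
    """Simultaneous GDA on f(x, y) = x * y (minimize over x, maximize over y)."""
    for _ in range(steps):
        gx, gy = y, x  # df/dx = y, df/dy = x
        # Both updates use the old iterate, which is what "simultaneous" means here.
        x, y = x - eta * gx, y + eta * gy
    return x, y

x0, y0 = 1.0, 1.0
xT, yT = simultaneous_gda(x0, y0)
start_norm = math.hypot(x0, y0)
end_norm = math.hypot(xT, yT)
print(start_norm, end_norm)
```

Each step multiplies the squared distance to the saddle point (0, 0) by 1 + eta^2, so the iterates spiral outward for any positive step size.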
-
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training
Authors:
Cong Fang,
Hangfeng He,
Qi Long,
Weijie J. Su
Abstract:
In this paper, we introduce the \textit{Layer-Peeled Model}, a nonconvex yet analytically tractable optimization program, in a quest to better understand deep neural networks that are trained for a sufficiently long time. As the name suggests, this new model is derived by isolating the topmost layer from the remainder of the neural network, followed by imposing certain constraints separately on the two parts of the network. We demonstrate that the Layer-Peeled Model, albeit simple, inherits many characteristics of well-trained neural networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep learning training. First, when working on class-balanced datasets, we prove that any solution to this model forms a simplex equiangular tight frame, which in part explains the recently discovered phenomenon of neural collapse \cite{papyan2020prevalence}. More importantly, when moving to the imbalanced case, our analysis of the Layer-Peeled Model reveals a hitherto unknown phenomenon that we term \textit{Minority Collapse}, which fundamentally limits the performance of deep learning models on the minority classes. In addition, we use the Layer-Peeled Model to gain insights into how to mitigate Minority Collapse. Interestingly, this phenomenon is first predicted by the Layer-Peeled Model before being confirmed by our computational experiments.
Submitted 8 September, 2021; v1 submitted 29 January, 2021;
originally announced January 2021.
-
Critical metrics of the volume functional on three-dimensional manifolds
Authors:
Huiya He
Abstract:
In this paper, we prove the three-dimensional $CPE$ conjecture with non-negative Ricci curvature. Moreover, we establish a classification result on three-dimensional vacuum static space with non-negative Ricci curvature. Finally, we show that a three-dimensional compact, oriented, connected Miao-Tam critical metric with smooth boundary, non-negative Ricci curvature and non-negative potential function is isometric to a geodesic ball in a simply connected space form $\mathbb{R}^3$ or $\mathbb{S}^3$.
Submitted 6 March, 2021; v1 submitted 14 January, 2021;
originally announced January 2021.
-
Low-Rank and Sparse Enhanced Tucker Decomposition for Tensor Completion
Authors:
Chenjian Pan,
Chen Ling,
Hongjin He,
Liqun Qi,
Yanwei Xu
Abstract:
Tensor completion refers to the task of estimating the missing data from an incomplete measurement or observation, which is a core problem frequently arising in big data analysis, computer vision, and network engineering. Due to the multidimensional nature of high-order tensors, matrix approaches, e.g., matrix factorization and direct matricization of tensors, are often not ideal for tensor completion and recovery. In this paper, we introduce a unified low-rank and sparse enhanced Tucker decomposition model for tensor completion. Our model possesses a sparse regularization term to promote a sparse core tensor of the Tucker decomposition, which is beneficial for tensor data compression. Moreover, we enforce low-rank regularization terms on factor matrices of the Tucker decomposition for inducing the low-rankness of the tensor with a cheap computational cost. Numerically, we propose a customized ADMM with sufficiently easy subproblems to solve the underlying model. It is remarkable that our model is able to deal with different types of real-world data sets, since it exploits the potential periodicity and inherent correlation properties appearing in tensors. A series of computational experiments on real-world data sets, including internet traffic data sets, color images, and face recognition, demonstrate that our model performs better than many existing state-of-the-art matricization and tensorization approaches in terms of achieving higher recovery accuracy.
Submitted 19 May, 2021; v1 submitted 1 October, 2020;
originally announced October 2020.
-
On the growth of Rankin-Selberg L-functions for $SL(2)$
Authors:
Hongyu He
Abstract:
In this paper, we establish bounds of the Rankin-Selberg $L$-function for $SL(2)$ using the supnorm of the Eisenstein series and a purely representation theoretic index over the real group. Consequently, we obtain a subconvexity bound $L(\frac{1}{2}+ it, f_1 \times f_2) \leq C (1+ |t|)^{\frac{5}{6}+ε}$ for two Maass cusp forms of $SL(2, \mathbb Z)$.
Submitted 26 August, 2020;
originally announced August 2020.
-
Certain L2-norms on automorphic representations of SL(2)
Authors:
Hongyu He
Abstract:
Let $Γ$ be a non-uniform lattice in $SL(2, \mathbb R)$. In this paper, we study various $L^2$-norms of automorphic representations of $SL(2, \mathbb R)$. We bound these norms with intrinsic norms defined on the representation. Comparison of these norms will help us understand the growth of $L$-functions in a systematic way.
Submitted 26 January, 2024; v1 submitted 20 August, 2020;
originally announced August 2020.
-
Local well-posedness and blow-up for a family of $U(1)$-invariant peakon equations
Authors:
Stephen C. Anco,
Huijun He,
Zhijun Qiao
Abstract:
The Cauchy problem for a unified family of integrable $U(1)$-invariant peakon equations from the NLS hierarchy is studied. As main results, local well-posedness is proved in Besov spaces, and blow-up is established through use of an $L^1$ conservation law.
Submitted 25 December, 2020; v1 submitted 7 August, 2020;
originally announced August 2020.
-
Certain L^2-norm and Asymptotic bounds of Whittaker Function for GL(n)
Authors:
Hongyu He
Abstract:
Whittaker functions of $GL(n, \mathbb R)$ are best known for their role in the Fourier-Whittaker expansion of cusp forms. Their behavior in the Siegel set is, by and large, well understood. In this paper, we insert into the literature some potentially useful properties of Whittaker functions over the group $GL(n, \mathbb R)$ and the mirabolic group $P_n$. We prove the square integrability of the Whittaker functions with respect to certain measures, extending a theorem of Jacquet and Shalika. For principal series representations, we give various asymptotic bounds of smooth Whittaker functions over the whole group $GL(n, \mathbb R)$. Due to the lack of good terminology, we use Whittaker functions to refer to $K$-finite or smooth vectors in the Whittaker model.
Submitted 9 July, 2020;
originally announced July 2020.
-
Branching Brownian motion conditioned on small maximum
Authors:
Xinxin Chen,
Hui He,
Bastien Mallein
Abstract:
We consider a standard binary branching Brownian motion on the real line. It is known that the maximal position $M_t$ among all particles alive at time $t$, shifted by $m_t = \sqrt{2} t - \frac{3}{2\sqrt{2}} \log t$ converges in law to a randomly shifted Gumbel variable. Derrida and Shi (2017) conjectured the precise asymptotic behaviour of the corresponding lower deviation probability $\mathbb{P}(M_t \leq \sqrt{2}αt)$ for $α< 1$. We verify their conjecture, and describe the law of the branching Brownian motion conditioned on having a small maximum.
Submitted 1 July, 2020;
originally announced July 2020.
-
Representation of ax+b group and Dirichlet Series
Authors:
Hongyu He
Abstract:
Let $G$ be the $ax+b$ group. There are essentially two irreducible infinite dimensional unitary representations of $G$, $(μ, L^2(\mathbb R^+))$ and $(μ^*, L^2(\mathbb R^+))$. In this paper, we give various characterizations of smooth vectors of $μ$ and their Mellin transforms. Let $\f d$ be a linear sum of delta distributions supported on the positive integers $\mathbb Z^+$. We study the Mellin transform of the matrix coefficients $μ_{ \f d, f}(a)$ with $f$ smooth. We express these Mellin transforms in terms of the Dirichlet series $L(s, \f d)$. We determine a sufficient condition such that the generalized matrix coefficient $μ_{\f d, f}$ is a locally integrable function and estimate the $L^2$-norms of $μ_{\f d, f}$ over the Siegel set. We further derive an inequality which may potentially be used to study the Dirichlet series $L(s, \f d)$.
Submitted 22 February, 2022; v1 submitted 20 May, 2020;
originally announced May 2020.
-
Some properties of stationary continuous state branching processes
Authors:
Romain Abraham,
Jean-François Delmas,
Hui He
Abstract:
We consider the genealogical tree of a stationary continuous state branching process with immigration. For a sub-critical stable branching mechanism, we consider the genealogical tree of the extant population at some fixed time and prove that, up to a deterministic time-change, it is distributed as a continuous-time Galton-Watson process with immigration. We obtain similar results for a critical stable branching mechanism when only looking at immigrants arriving in some fixed time-interval. For a general sub-critical branching mechanism, we consider the number of individuals that give descendants in the extant population. The associated processes (forward or backward in time) are pure-death or pure-birth Markov processes, for which we compute the transition rates.
Submitted 20 May, 2020;
originally announced May 2020.
-
Improved bounds for anti-Ramsey numbers of matchings in outerplanar graphs
Authors:
Yifan Pei,
Yongxin Lan,
Hua He
Abstract:
Let $\mathcal{O}_n$ be the set of all maximal outerplanar graphs of order $n$. Let $ar(\mathcal{O}_n,F)$ denote the maximum positive integer $k$ such that $T\in \mathcal{O}_n$ has no rainbow subgraph $F$ under a $k$-edge-coloring of $T$. Denote by $M_k$ a matching of size $k$. In this paper, we prove that $ar(\mathcal{O}_n,M_k)\le n+4k-9$ for $n\ge3k-3$, which substantially improves the existing upper bound for $ar(\mathcal{O}_n,M_k)$. We also prove that $ar(\mathcal{O}_n,M_5)=n+4$ for all $n\ge 15$.
Submitted 25 September, 2021; v1 submitted 17 May, 2020;
originally announced May 2020.
-
Optimal Change-Point Detection with Training Sequences in the Large and Moderate Deviations Regimes
Authors:
Haiyun He,
Qiaosheng Zhang,
Vincent Y. F. Tan
Abstract:
This paper investigates a novel offline change-point detection problem from an information-theoretic perspective. In contrast to most related works, we assume that the underlying pre- and post-change distributions are not known and can only be learned from the available training sequences. We further require the probability of the \emph{estimation error} to decay either exponentially or sub-exponentially fast (corresponding respectively to the large and moderate deviations regimes in information theory parlance). Based on the training sequences as well as the test sequence consisting of a single change-point, we design a change-point estimator and further show that this estimator is optimal by establishing matching (strong) converses. This leads to a full characterization of the optimal confidence width (i.e., half the width of the confidence interval within which the true change-point is located with high probability) as a function of the undetected error, under both the large and moderate deviations regimes.
Submitted 3 October, 2021; v1 submitted 13 March, 2020;
originally announced March 2020.
-
A criterion for discrete branching laws for Klein four symmetric pairs and its application to $E_{6(-14)}$
Authors:
Haian He
Abstract:
Let $G$ be a noncompact connected simple Lie group, and $(G,G^Γ)$ a Klein four symmetric pair. In this paper, the author shows a necessary condition for the discrete decomposability of unitarizable simple $(\mathfrak{g},K)$-modules for Klein four symmetric pairs. Precisely, if certain conditions hold for $(G,G^Γ)$, there does not exist any unitarizable simple $(\mathfrak{g},K)$-module that is discretely decomposable as a $(\mathfrak{g}^Γ,K^Γ)$-module. As an application, for $G=\mathrm{E}_{6(-14)}$, the author obtains a complete classification of Klein four symmetric pairs $(G,G^Γ)$ with $G^Γ$ noncompact, such that there exists at least one nontrivial unitarizable simple $(\mathfrak{g},K)$-module that is discretely decomposable as a $(\mathfrak{g}^Γ,K^Γ)$-module and is also discretely decomposable as a $(\mathfrak{g}^σ,K^σ)$-module for some nonidentity element $σ\inΓ$.
Submitted 30 October, 2019;
originally announced October 2019.
-
Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic
Authors:
Matteo Sordello,
Niccolò Dalmasso,
Hangfeng He,
Weijie Su
Abstract:
This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is detected, that is, when the iterates are likely to bounce around in the vicinity of a local minimum. The detection is performed by splitting the single thread into two and using the inner product of the gradients from the two threads as a measure of stationarity. Owing to this simple yet provably valid stationarity detection, SplitSGD is easy to implement and incurs essentially no additional computational cost over standard SGD. Through a series of extensive experiments, we show that this method is appropriate for both convex problems and training (non-convex) neural networks, with performance that compares favorably to other stochastic optimization methods. Importantly, this method is observed to be very robust with a set of default parameters for a wide range of problems and, moreover, can yield better generalization performance than other adaptive gradient methods such as Adam.
Submitted 16 February, 2024; v1 submitted 18 October, 2019;
originally announced October 2019.
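The splitting idea described in the abstract above (two threads, inner product of their gradients) can be illustrated on a toy problem. This is a minimal sketch and not the authors' SplitSGD procedure: the quadratic objective, the Gaussian noise model, the thread length, and all function names are assumptions made for the example.

```python
import random

def noisy_grad(theta, rng, noise=1.0):
    # Gradient of the toy objective 0.5 * ||theta||^2 plus Gaussian noise (assumed model).
    return [t + rng.gauss(0.0, noise) for t in theta]

def run_thread(theta, rng, lr, steps):
    """Run SGD for `steps` iterations and return the average stochastic gradient seen."""
    theta = list(theta)
    avg = [0.0] * len(theta)
    for _ in range(steps):
        g = noisy_grad(theta, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
        avg = [a + gi / steps for a, gi in zip(avg, g)]
    return avg

def split_diagnostic(theta, seed, lr=0.01, steps=20):
    """Inner product of the averaged gradients from two threads started at the same point."""
    rng = random.Random(seed)
    g1 = run_thread(theta, rng, lr, steps)
    g2 = run_thread(theta, rng, lr, steps)
    return sum(a * b for a, b in zip(g1, g2))

# Far from the minimum both threads see roughly the same descent direction.
far = split_diagnostic([10.0] * 5, seed=0)
# At the minimum the gradients are mostly noise, so the sign is close to a coin flip.
near_positive = sum(split_diagnostic([0.0] * 5, seed=s) > 0 for s in range(200)) / 200
print(far, near_positive)
```

A clearly positive inner product suggests the iterates are still making progress, while a sign that looks random suggests a stationary phase in which a smaller learning rate may help.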
-
Learning Compositional Koopman Operators for Model-Based Control
Authors:
Yunzhu Li,
Hao He,
Jiajun Wu,
Dina Katabi,
Antonio Torralba
Abstract:
Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis. The Koopman operator theory lays the foundation for identifying the nonlinear-to-linear coordinate transformations with data-driven methods. Recently, researchers have proposed to use deep neural networks as a more expressive class of basis functions for calculating the Koopman operators. These approaches, however, assume a fixed dimensional state space; they are therefore not applicable to scenarios with a variable number of objects. In this paper, we propose to learn compositional Koopman operators, using graph neural networks to encode the state into object-centric embeddings and using a block-wise linear transition matrix to regularize the shared structure across objects. The learned dynamics can quickly adapt to new environments of unknown physical parameters and produce control signals to achieve a specified goal. Our experiments on manipulating ropes and controlling soft robots show that the proposed method has better efficiency and generalization ability than existing baselines.
Submitted 27 April, 2020; v1 submitted 18 October, 2019;
originally announced October 2019.
-
Kobayashi's conjecture on associated varieties for $(\mathrm{E}_{6(-14)},\mathrm{Spin}(8,1))$
Authors:
Haian He
Abstract:
The author confirms a conjecture on associated varieties by Toshiyuki KOBAYASHI for the Klein four symmetric pair $(\mathrm{E}_{6(-14)},\mathrm{Spin}(8,1))$, which provides an alternative way to confirm the conjecture for the symmetric pair $(\mathrm{Spin}(8,2),\mathrm{Spin}(8,1))$. Also, for Klein four symmetric pairs $(G,G^Γ)$ with the exceptional simple Lie groups $G$ of Hermitian type, there exists a discrete series representation of $G$ which is $G^Γ$-admissible if and only if $(G,G^Γ)$ is of holomorphic type.
Submitted 25 November, 2019; v1 submitted 13 August, 2019;
originally announced August 2019.
-
Properties of the solution set of generalized polynomial complementarity problems
Authors:
Liyun Ling,
Chen Ling,
Hongjin He
Abstract:
In this paper, we consider the {\it generalized polynomial complementarity problem} (GPCP), which covers the recently introduced {\it polynomial complementarity problem} (PCP) and the well studied {\it tensor complementarity problem} (TCP) as special cases. By exploiting the structure of tensors, we first show that the solution set of GPCPs is nonempty and compact when a pair of leading tensors is cone {\bf ER}. Then, we study some topological properties of the solution set of GPCPs under the condition that the leading tensor pair is cone ${\bf R}_0$. Finally, we study a notable global Lipschitzian error bound of the solution set of GPCPs, which is better than the results obtained in the current PCPs and TCPs literature. Moreover, such an error bound is potentially helpful for finding and analyzing numerical solutions to the problem under consideration.
Submitted 2 May, 2019;
originally announced May 2019.
-
Dirac series for $E_{6(-14)}$
Authors:
Lin-Gen Ding,
Chao-Ping Dong,
Haian He
Abstract:
Up to equivalence, this paper classifies all the irreducible unitary representations with non-zero Dirac cohomology for the simple Lie group $E_{6(-14)}$, which is of Hermitian symmetric type. Each FS-scattered Dirac series of $E_{6(-14)}$ is realized as a composition factor of certain $A_{\mathfrak{q}}(λ)$ module. Along the way, we have also obtained all the fully supported irreducible unitary representations of $E_{6(-14)}$ with integral infinitesimal characters.
Submitted 15 May, 2020; v1 submitted 15 March, 2019;
originally announced March 2019.
-
A nonnegativity preserving algorithm for multilinear systems with nonsingular M-tensors
Authors:
Xueli Bai,
Hongjin He,
Chen Ling,
Guanglu Zhou
Abstract:
This paper addresses multilinear systems of equations which arise in various applications such as data mining and numerical partial differential equations. When the multilinear system under consideration involves a nonsingular $\mathcal{M}$-tensor and a nonnegative right-hand side vector, it may have multiple nonnegative solutions. In this paper, we propose an algorithm which can always preserve the nonnegativity of solutions. Theoretically, we show that the sequence generated by the proposed algorithm is a nonnegative decreasing sequence and converges to a nonnegative solution of the system. Numerical results further support the novelty of the proposed method. Particularly, when some elements of the right-hand side vector are zeros, the proposed algorithm works well while existing state-of-the-art solvers may not produce a nonnegative solution.
Submitted 13 May, 2019; v1 submitted 24 November, 2018;
originally announced November 2018.
-
Rigidity theorem for compact Bach-flat manifolds with positive constant $σ_2$
Authors:
Huiya He,
Hai** Fu
Abstract:
We prove that an $n$-dimensional ($n \geq 4$) compact Bach-flat manifold with positive constant $σ_2$ is an Einstein manifold, provided that its Weyl curvature satisfies a suitable pinching condition.
Submitted 15 October, 2018;
originally announced October 2018.
-
On compact Riemannian manifolds with harmonic Weyl curvature
Authors:
Hai** Fu,
Huiya He
Abstract:
We give some rigidity theorems for an $n$-dimensional ($n \geq 4$) compact Riemannian manifold with harmonic Weyl curvature, positive scalar curvature and positive constant $σ_2$. Moreover, when $n=4$, we prove that a 4-dimensional compact locally conformally flat Riemannian manifold with positive scalar curvature and positive constant $σ_2$ is isometric to a quotient of the round $\mathbb{S}^4$.
Submitted 15 October, 2018;
originally announced October 2018.
-
Further study on tensor absolute value equations
Authors:
Chen Ling,
Weijie Yan,
Hong** He,
Liqun Qi
Abstract:
In this paper, we consider the {\it tensor absolute value equations} (TAVEs), a newly introduced problem in the context of multilinear systems. Although the system of TAVEs is an interesting generalization of matrix {\it absolute value equations} (AVEs), the well-developed theory and algorithms for AVEs are not directly applicable to TAVEs due to the nonlinearity (or multilinearity) of the problem under consideration. Therefore, we first study the existence of solutions for some classes of TAVEs with the help of degree theory, in addition to showing, by fixed point theory, that the system of TAVEs has at least one solution under some checkable conditions. Then, we give a bound on the solutions of TAVEs for some special cases. To find a solution to TAVEs, we employ the generalized Newton method and report some preliminary results.
Submitted 13 October, 2018;
originally announced October 2018.
-
Generalized tensor equations with leading structured tensors
Authors:
Weijie Yan,
Chen Ling,
Liyun Ling,
Hong** He
Abstract:
The system of tensor equations (TEs) has received considerable attention in the recent literature. In this paper, we consider a class of generalized tensor equations (GTEs). An important difference between GTEs and TEs is that GTEs can be regarded as a system of non-homogeneous polynomial equations, whereas TEs form a homogeneous one. Because of this difference, the theoretical and algorithmic results tailored for TEs are not necessarily applicable to GTEs. To study properties of the solution set of GTEs, we first introduce a new class of so-named ${\rm Z}^+$-tensor, which includes the set of all P-tensors as its proper subset. With the help of degree theory, we prove that the system of GTEs with a leading coefficient ${\rm Z}^+$-tensor has at least one solution for any right-hand side vector. Moreover, we study the local error bounds under some appropriate conditions. Finally, we employ a Levenberg-Marquardt algorithm to find a solution to GTEs and report some preliminary numerical results.
Submitted 13 October, 2018;
originally announced October 2018.
-
Discretely decomposable restrictions of $(\mathfrak{g},K)$-modules for Klein four symmetric pairs of exceptional Lie groups of Hermitian type
Authors:
Haian He
Abstract:
Let $(G,G^Γ)$ be a Klein four symmetric pair. The author wants to classify all the Klein four symmetric pairs $(G,G^Γ)$ such that there exists at least one nontrivial unitarizable simple $(\mathfrak{g},K)$-module $π_K$ that is discretely decomposable as a $(\mathfrak{g}^Γ,K^Γ)$-module. In this article, three assumptions will be made. Firstly, $G$ is an exceptional Lie group of Hermitian type, i.e., $G=\mathrm{E}_{6(-14)}$ or $\mathrm{E}_{7(-25)}$. Secondly, $G^Γ$ is noncompact. Thirdly, there exists an element $σ\inΓ$ corresponding to a symmetric pair of anti-holomorphic type such that $π_K$ is discretely decomposable as a $(\mathfrak{g}^σ,K^σ)$-module.
Submitted 22 March, 2019; v1 submitted 30 August, 2018;
originally announced August 2018.
-
Lower deviation and moderate deviation probabilities for maximum of a branching random walk
Authors:
Xinxin Chen,
Hui He
Abstract:
Given a super-critical branching random walk on $\mathbb R$ started from the origin, let $M_n$ be the maximal position of individuals at the $n$-th generation. Under some mild conditions, it is known from \cite{A13} that as $n\rightarrow\infty$, $M_n-x^*n+\frac{3}{2θ^*}\log n$ converges in law for some suitable constants $x^*$ and $θ^*$. In this work, we investigate its moderate deviation, in other words, the convergence rates of $$\mathbb{P}\left(M_n\leq x^*n-\frac{3}{2θ^*}\log n-\ell_n\right),$$ for any positive sequence $(\ell_n)$ such that $\ell_n=O(n)$ and $\ell_n\uparrow\infty$. As a by-product, we also obtain lower deviation of $M_n$; i.e., the convergence rate of
\[
\mathbb{P}(M_n\leq xn),
\] for $x<x^*$ in the Böttcher case, where the offspring number is at least two. Finally, we apply our techniques to study the small ball probability of the limit of the derivative martingale.
Submitted 22 July, 2018;
originally announced July 2018.