Search | arXiv e-print repository

arXiv:2406.16499 [pdf, other]

Mixed precision iterative refinement for least squares with linear equality constraints and generalized least squares problems

Authors: Bowen Gao, Yuxin Ma, Meiyue Shao

Abstract: Recent developments in mixed precision techniques have greatly enhanced the performance of various linear algebra solvers, including solvers for the least squares problem $\min_{x}\lVert b-Ax\rVert_{2}$. By transforming the least squares problem into an augmented linear system, mixed precision techniques are capable of refining the lower precision solution to the working precision. In this paper, we propose mixed precision iterative refinement algorithms for two variants of the least squares problem -- the least squares problem with linear equality constraints (LSE) and the generalized least squares problem (GLS). Both classical and GMRES-based iterative refinement can be applied to augmented systems of these two problems to improve the accuracy of the solution. For reasonably well-conditioned problems our algorithms reduce the execution time by 40% on average compared to the fixed precision ones from LAPACK on the x86-64 architecture.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 32 pages, 7 figures

MSC Class: 65F05; 65F08; 65F10
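The abstract of arXiv:2406.16499 above builds on refining a low-precision solution of an augmented system. The sketch below illustrates that idea for the plain least squares problem only (not the LSE or GLS variants, and not the authors' algorithms). It assumes float32 as the low precision and float64 as the working precision; the function name `refine_least_squares` is hypothetical. For brevity it calls `np.linalg.solve` in every step, whereas a practical code would factor the low-precision matrix once and reuse the factors.

```python
import numpy as np

def refine_least_squares(A, b, iters=5):
    """Classical iterative refinement on the augmented system
    [[I, A], [A^T, 0]] [r; x] = [b; 0], whose x-block solves min ||b - A x||_2."""
    m, n = A.shape
    K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
    rhs = np.concatenate([b, np.zeros(n)])
    K32 = K.astype(np.float32)  # low-precision copy used for every solve

    # Initial solve entirely in float32, then promoted to float64.
    z = np.linalg.solve(K32, rhs.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        resid = rhs - K @ z                                   # residual in float64
        dz = np.linalg.solve(K32, resid.astype(np.float32))   # correction in float32
        z = z + dz.astype(np.float64)                         # update in float64
    return z[m:], z[:m]  # (x, r)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
x0, _ = refine_least_squares(A, b, iters=0)   # unrefined float32 solution
x5, _ = refine_least_squares(A, b, iters=5)   # refined solution
print(np.linalg.norm(x0 - x_ref), np.linalg.norm(x5 - x_ref))
```

On a well-conditioned random matrix like this one, the refined error should be far smaller than the unrefined float32 error; ill-conditioned problems may need the GMRES-based variant the abstract mentions.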

arXiv:2406.01908 [pdf, other]

PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

Authors: Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun

Abstract: Solving large-scale linear programming (LP) problems is an important task in various areas such as communication networks, power systems, finance and logistics. Recently, two distinct approaches have emerged to expedite LP solving: (i) First-order methods (FOMs); (ii) Learning to optimize (L2O). In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L2O method to solve large-scale LP problems. The new architecture PDHG-Net is designed by unrolling the recently emerged PDHG method into a neural network, combined with channel-expansion techniques borrowed from graph neural networks. We prove that the proposed PDHG-Net can recover the PDHG algorithm and thus can approximate optimal solutions of LP instances with a polynomial number of neurons. We propose a two-stage inference approach: first use PDHG-Net to generate an approximate solution, and then apply the PDHG algorithm to further improve the solution. Experiments show that our approach can significantly accelerate LP solving, achieving up to a 3$\times$ speedup compared to FOMs for large-scale LP problems.

Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024
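For readers unfamiliar with the PDHG iteration that the abstract of arXiv:2406.01908 above unrolls, here is a minimal sketch of the textbook primal-dual hybrid gradient method applied to a standard-form LP, $\min c^{T}x$ subject to $Ax=b$, $x\ge 0$. This is not PDHG-Net and not the authors' code; the function name `pdhg_lp`, the equal step sizes, and the toy problem are assumptions made for illustration.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=2000):
    """Plain PDHG for  min c^T x  s.t.  A x = b, x >= 0."""
    m, n = A.shape
    # Equal primal and dual steps with tau * sigma * ||A||_2^2 = 0.81 < 1.
    step = 0.9 / np.linalg.norm(A, 2)
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Primal step: gradient step on the Lagrangian, projected onto x >= 0.
        x_new = np.maximum(0.0, x - step * (c - A.T @ y))
        # Dual step: uses the extrapolated point 2 * x_new - x.
        y = y + step * (b - A @ (2.0 * x_new - x))
        x = x_new
    return x, y

# Toy LP: minimize x1 + 2*x2 subject to x1 + x2 = 1, x >= 0 (optimum at x = (1, 0)).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, y = pdhg_lp(c, A, b)
print(x, c @ x)
```

Each iteration costs only two matrix-vector products, which is what makes the method attractive for large-scale LPs and natural to unroll into network layers.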

arXiv:2406.01200 [pdf, ps, other]

Probabilistic Lah numbers and Lah-Bell polynomials

Authors: Yuankui Ma, Taekyun Kim, Dae San Kim

Abstract: Let Y be a random variable whose moment generating function exists in some neighborhood of the origin. The aim of this paper is to study the probabilistic Lah numbers associated with Y and the probabilistic Lah-Bell polynomials associated with Y, as probabilistic versions of the Lah numbers and the Lah-Bell polynomials, respectively. We derive some properties, explicit expressions, recurrence relations and certain identities for those numbers and polynomials. In addition, we treat the special cases that Y is the Poisson random variable with parameter α > 0 and the Bernoulli random variable with probability of success p.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages

MSC Class: 11B73; 11B83
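As background for arXiv:2406.01200 above, the classical (non-probabilistic) objects it generalizes can be computed directly. The sketch below uses the closed form $L(n,k)=\binom{n-1}{k-1}\frac{n!}{k!}$ for the unsigned Lah numbers and the sum $\sum_{k=0}^{n}L(n,k)x^{k}$ for the Lah-Bell polynomials. It does not implement the probabilistic versions studied in the paper, and the function names are hypothetical.

```python
from math import comb, factorial

def lah(n, k):
    """Unsigned Lah number L(n, k) = C(n-1, k-1) * n! / k!, with L(0, 0) = 1."""
    if n == 0 and k == 0:
        return 1
    if k < 1 or k > n:
        return 0
    return comb(n - 1, k - 1) * factorial(n) // factorial(k)

def lah_bell(n, x):
    """Classical Lah-Bell polynomial: sum over k of L(n, k) * x**k."""
    return sum(lah(n, k) * x**k for k in range(n + 1))

print([lah(4, k) for k in range(1, 5)])       # [24, 36, 12, 1]
print([lah_bell(n, 1) for n in range(6)])     # [1, 1, 3, 13, 73, 501]
```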

arXiv:2406.00920 [pdf, ps, other]

Demystifying SGD with Doubly Stochastic Gradients

Authors: Kyurae Kim, Joohwan Ko, Yi-An Ma, Jacob R. Gardner

Abstract: Optimization objectives in the form of a sum of intractable expectations are rising in importance (e.g., diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ SGD with doubly stochastic gradients (doubly SGD): the expectations are estimated using the gradient estimator of each component, while the sum is estimated by subsampling over these estimators. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong assumptions such as bounded variance. In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompasses dependent component gradient estimators. In particular, for dependent estimators, our analysis allows fine-grained analysis of the effect of correlations. As a result, under a per-iteration computational budget of $b \times m$, where $b$ is the minibatch size and $m$ is the number of Monte Carlo samples, our analysis suggests where one should invest most of the budget in general. Furthermore, we prove that random reshuffling (RR) improves the complexity dependence on the subsampling noise.

Submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted to ICML'24

arXiv:2405.17527 [pdf, other]

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

Authors: Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, Mingsheng Long

Abstract: Deep models have recently emerged as a promising tool to solve partial differential equations (PDEs), known as neural PDE solvers. While neural solvers trained from either simulation data or physics-informed loss can solve the PDEs reasonably well, they are mainly restricted to a specific set of PDEs, e.g. a certain equation or a finite set of coefficients. This bottleneck limits the generalizability of neural solvers, which is widely recognized as its major advantage over numerical solvers. In this paper, we present the Universal PDE solver (Unisolver) capable of solving a wide scope of PDEs by leveraging a Transformer pre-trained on diverse data and conditioned on diverse PDEs. Instead of simply scaling up data and parameters, Unisolver stems from the theoretical analysis of the PDE-solving process. Our key finding is that a PDE solution is fundamentally under the control of a series of PDE components, e.g. equation symbols, coefficients, and initial and boundary conditions. Inspired by the mathematical structure of PDEs, we define a complete set of PDE components and correspondingly embed them as domain-wise (e.g. equation symbols) and point-wise (e.g. boundaries) conditions for Transformer PDE solvers. Integrating physical insights with recent Transformer advances, Unisolver achieves consistent state-of-the-art results on three challenging large-scale benchmarks, showing impressive gains and endowing favorable generalizability and scalability.

Submitted 1 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.06889 [pdf, other]

Tuning parameter selection for the adaptive nuclear norm regularized trace regression

Authors: Pan Shang, Lingchen Kong, Yiting Ma

Abstract: Regularized models have been applied in many areas, with high-dimensional data sets being popular. Because the tuning parameter determines the theoretical performance and computational efficiency of regularized models, tuning parameter selection is a basic and important issue. We consider tuning parameter selection for adaptive nuclear norm regularized trace regression, which is achieved by the Bayesian information criterion (BIC). The proposed BIC is established with the help of an unbiased estimator of degrees of freedom. Under some regularized conditions, this BIC is proved to achieve the rank consistency of the tuning parameter selection. That is, the model solution under the selected tuning parameter converges to the true solution and has the same rank as the true solution in probability. Some numerical results are presented to evaluate the performance of the proposed BIC on tuning parameter selection.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.04730 [pdf, ps, other]

A rigidity property for a type of wave-Klein-Gordon system

Authors: Yan-Tao Li, Yue Ma

Abstract: In this paper we investigate the rigidity property of a wave component coupled in a wave-Klein-Gordon system. We prove that when the radiation field of the wave component vanishes at the null infinity, the initial data of this component also vanish; therefore there is no wave in the whole spacetime.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 29 pages

arXiv:2405.03931 [pdf, ps, other]

Incorporating changeable attitudes toward vaccination into an SIR infectious disease model

Authors: Yi Jiang, Kristin M. Kurianski, Jane H. Lee, Yan** Ma, Daniel Cicala, Glenn Ledder

Abstract: We develop a mechanistic model that classifies individuals both in terms of epidemiological status (SIR) and vaccination attitude (willing or unwilling), with the goal of discovering how disease spread is influenced by changing opinions about vaccination. Analysis of the model identifies existence and stability criteria for both disease-free and endemic disease equilibria. The analytical results, supported by numerical simulations, show that attitude changes induced by disease prevalence can destabilize endemic disease equilibria, resulting in limit cycles.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 30 pages, 3 tables, 10 figures

MSC Class: 37N25 (Primary) 92D30 (Secondary)

arXiv:2405.01298 [pdf, other]

Reorthogonalized Pythagorean variants of block classical Gram-Schmidt

Authors: Erin Carson, Kathryn Lund, Yuxin Ma, Eda Oktay

Abstract: Block classical Gram-Schmidt (BCGS) is commonly used for orthogonalizing a set of vectors $X$ in distributed computing environments due to its favorable communication properties relative to other orthogonalization approaches, such as modified Gram-Schmidt or Householder. However, it is known that BCGS (as well as recently developed low-synchronization variants of BCGS) can suffer from a significant loss of orthogonality in finite-precision arithmetic, which can contribute to instability and inaccurate solutions in downstream applications such as $s$-step Krylov subspace methods. A common solution to improve the orthogonality among the vectors is reorthogonalization. Focusing on the "Pythagorean" variant of BCGS, introduced in [E. Carson, K. Lund, & M. Rozložník. SIAM J. Matrix Anal. Appl. 42(3), pp. 1365--1380, 2021], which guarantees an $O(\varepsilon)κ^2(X)$ bound on the loss of orthogonality as long as $O(\varepsilon)κ^2(X)<1$, where $\varepsilon$ denotes the unit roundoff, we introduce and analyze two reorthogonalized Pythagorean BCGS variants. These variants feature favorable communication properties, with asymptotically two synchronization points per block column, as well as an improved $O(\varepsilon)$ bound on the loss of orthogonality. Our bounds are derived in a general fashion to additionally allow for the analysis of mixed-precision variants. We verify our theoretical results with a panel of test matrices and experiments from a new version of the \texttt{BlockStab} toolbox.

Submitted 2 May, 2024; originally announced May 2024.

MSC Class: 65-04; 65F25; 65G50; 65Y20

arXiv:2405.00703 [pdf, ps, other]

Local and Global Log-Gradient estimates of solutions to $Δ_pv+bv^q+cv^r =0$ on manifolds and applications

Authors: Jie He, Yuanqing Ma, Youde Wang

Abstract: In this paper, we employ the Nash-Moser iteration technique to study local and global properties of positive solutions to the equation $$Δ_pv+bv^q+cv^r =0$$ on complete Riemannian manifolds with Ricci curvature bounded from below, where $b, c\in\mathbb R$, $p>1$, and $q\leq r$ are some real constants. Assuming certain conditions on $b,\, c,\, p,\, q$ and $r$, we derive succinct Cheng-Yau type gradient estimates for positive solutions, which are of sharp form. These gradient estimates allow us to obtain some Liouville-type theorems and Harnack inequalities. Our Liouville-type results are novel even in Euclidean spaces. Based on the local gradient estimates and a trick of Sung and Wang, we also obtain the global gradient estimates for such solutions. As applications we show the uniqueness of positive solutions to some generalized Allen-Cahn equation and Fisher-KPP equation.

Submitted 22 April, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2311.02568; text overlap with arXiv:2311.13179

arXiv:2404.08913 [pdf, ps, other]

On the best approximation by finite Gaussian mixtures

Authors: Yun Ma, Yihong Wu, Pengkun Yang

Abstract: We consider the problem of approximating a general Gaussian location mixture by finite mixtures. The minimum order of finite mixtures that achieve a prescribed accuracy (measured by various $f$-divergences) is determined within constant factors for the family of mixing distributions with compact support or appropriate assumptions on the tail probability including subgaussian and subexponential. While the upper bound is achieved using the technique of local moment matching, the lower bound is established by relating the best approximation error to the low-rank approximation of certain trigonometric moment matrices, followed by a refined spectral analysis of their minimum eigenvalue. In the case of Gaussian mixing distributions, this result corrects a previous lower bound in [Allerton Conference 48 (2010) 620-628].

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.08830 [pdf, ps, other]

Strongly Gauduchon Hyperbolicity and Two Other Types of Hyperbolicity

Authors: Yi Ma

Abstract: This paper proposes sG-hyperbolicity as a new tool for studying hyperbolicity on complex manifolds. It demonstrates that this notion leads to a wider class of divisorially hyperbolic manifolds compared to balanced hyperbolicity. We also introduce weakly p-Kähler hyperbolic structures and pluriclosed star split hyperbolic metrics as possible new avenues for exploration.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 12 pages

arXiv:2404.04507 [pdf, other]

Irrational-window-filter projection method and application to quasiperiodic Schrödinger eigenproblems

Authors: Kai Jiang, Xueyang Li, Yao Ma, Juan Zhang, **wen Zhang, Qi Zhou

Abstract: In this paper, we propose a new algorithm, the irrational-window-filter projection method (IWFPM), for solving arbitrary dimensional global quasiperiodic systems. Based on the projection method (PM), IWFPM further utilizes the concentrated distribution of Fourier coefficients to filter out relevant spectral points using an irrational window. Moreover, a corresponding index-shift transform is designed to make the Fast Fourier Transform available. The corresponding error analysis on the function approximation level is also given. We apply IWFPM to 1D, 2D, and 3D quasiperiodic Schrödinger eigenproblems to demonstrate its accuracy and efficiency. IWFPM exhibits a significant computational advantage over PM for both extended and localized quantum states. Furthermore, the widespread existence of such spectral point distribution feature can endow IWFPM with significant potential for broader applications in quasiperiodic systems.

Submitted 30 June, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

MSC Class: 35P05; 35J10; 65D15; 65T50; 81-08

arXiv:2403.19871 [pdf, other]

Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences

Authors: Dimitris Bertsimas, Vassilis Digalakis Jr, Yu Ma, Phevos Paschalidis

Abstract: We consider the task of retraining machine learning (ML) models when new batches of data become available. Existing methods focus largely on greedy approaches to find the best-performing model for each batch, without considering the stability of the model's structure across retraining iterations. In this study, we propose a methodology for finding sequences of ML models that are stable across retraining iterations. We develop a mixed-integer optimization formulation that is guaranteed to recover Pareto optimal models (in terms of the predictive power-stability trade-off) and an efficient polynomial-time algorithm that performs well in practice. We focus on retaining consistent analytical insights - which is important to model interpretability, ease of implementation, and fostering trust with users - by using custom-defined distance metrics that can be directly incorporated into the optimization problem. Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power, as evidenced through a real-world case study in a major hospital system in Connecticut.

Submitted 22 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19122 [pdf, other]

Safety-Critical Planning and Control for Dynamic Obstacle Avoidance Using Control Barrier Functions

Authors: Shuo Liu, Yihui Mao

Abstract: Dynamic obstacle avoidance is a challenging topic for optimal control and optimization-based trajectory planning problems, especially when in a tight environment. Many existing works use control barrier functions (CBFs) to enforce safety constraints within control systems. Inside these works, CBFs are usually formulated under model predictive control (MPC) framework to anticipate future states and make informed decisions, or integrated with path planning algorithms as a safety enhancement tool. However, these approaches usually require knowledge of the obstacle boundary equations or have very slow computational efficiency. In this paper, we propose a novel framework to the iterative MPC with discrete-time CBFs (DCBFs) to generate a collision-free trajectory. The DCBFs are obtained from convex polyhedra generated in sequential grid maps, without the need to know the boundary equations of obstacles. Additionally, a path planning algorithm is incorporated into this framework to ensure the global optimality of the generated trajectory. We demonstrate through numerical examples that our framework enables a unicycle robot to safely and efficiently navigate through tight and dynamically changing environments, tackling both convex and nonconvex obstacles with remarkable computing efficiency and reliability in control and trajectory generation.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures. arXiv admin note: text overlap with arXiv:2210.04361

arXiv:2403.11217 [pdf]

Research on Personal Credit Risk Assessment Methods Based on Causal Inference

Authors: Jiaxin Wang, YiLong Ma

Abstract: The discussion on causality in human history dates back to ancient Greece, yet to this day, there is still no consensus. Fundamentally, this stems from the nature of human cognition, as understanding causality requires abstract tools to transcend the limitations of human cognition. In recent decades, the rapid development of mathematical and computational tools has provided new theoretical and technical means for exploring causality, creating more avenues for investigation. Based on this, this paper introduces a new definition of causality using category theory, proposed by Samuel Eilenberg and Saunders Mac Lane in 1945 to avoid the self-referential contradictions in set theory, notably the Russell paradox. Within this framework, the feasibility of indicator synthesis in causal inference is demonstrated. Due to the limitations in the development of category theory-related technical tools, this paper adopts the widely-used probabilistic causal graph tool proposed by Judea Pearl in 1995 to study the application of causal inference in personal credit risk management. The specific work includes: research on the construction method of causal inference index system, definition of causality and feasibility proof of indicator synthesis causal inference within this framework, application methods of causal graph model and intervention alternative criteria in personal credit risk management, and so on.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11163 [pdf, ps, other]

doi 10.1080/24754269.2024.2343151

A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques

Authors: Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, **g Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang

Abstract: This paper presents a selective review of statistical computation methods for massive data analysis. A large number of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first class of literature is about distributed computing and focuses on the situation where the dataset is too large to be comfortably handled by a single computer. In this case, a distributed computation system with multiple computers has to be utilized. The second class of literature is about subsampling methods and concerns the situation where the dataset is small enough to be placed on a single computer but too large to be easily processed by its memory as a whole. The last class of literature studies minibatch gradient related optimization techniques, which have been extensively used for optimizing various deep learning models.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.01131 [pdf, other]

LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation

Authors: Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Guojun Peng, Zhiguang Cao, Yining Ma, Yue-Jiao Gong

Abstract: Recent research explores optimization using large language models (LLMs) by either iteratively seeking next-step solutions from LLMs or directly prompting LLMs for an optimizer. However, these approaches exhibit inherent limitations, including low operational efficiency, high sensitivity to prompt design, and a lack of domain-specific knowledge. We introduce LLaMoCo, the first instruction-tuning framework designed to adapt LLMs for solving optimization problems in a code-to-code manner. Specifically, we establish a comprehensive instruction set containing well-described problem prompts and effective optimization codes. We then develop a novel two-phase learning strategy that incorporates a contrastive learning-based warm-up procedure before the instruction-tuning phase to enhance the convergence behavior during model fine-tuning. The experiment results demonstrate that a CodeGen (350M) model fine-tuned by our LLaMoCo achieves superior optimization performance compared to GPT-4 Turbo and the other competitors across both synthetic and realistic problem sets. The fine-tuned model and the usage instructions are available at https://anonymous.4open.science/r/LLaMoCo-722A.

Submitted 5 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.00468 [pdf, ps, other]

Probabilistic central Bell polynomials

Authors: R. Xu, Y. Ma, T. Kim, D. S. Kim, S. Boulaars

Abstract: Let Y be a random variable whose moment generating function exists in a neighborhood of the origin. In this paper, we study the probabilistic central Bell polynomials associated with random variable Y, as probabilistic extension of the central Bell polynomials. In addition, we investigate the probabilistic central factorial numbers of the second kind associated with Y and the probabilistic central Fubini polynomials associated with Y. The aim of this paper is to derive some properties, explicit expressions, certain identities and recurrence relations for those polynomials and numbers.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 12 pages

MSC Class: 11B73; 11B83

arXiv:2402.10396 [pdf]

Improved SQP and SLSQP Algorithms for Feasible Path-based Process Optimisation

Authors: Yingjie Ma, Xi Gao, Chao Liu, Jie Li

Abstract: The feasible path algorithm has been widely used for process optimisation due to its good convergence. The sequential quadratic programming (SQP) algorithm is usually used to drive the feasible path algorithm towards optimality. However, existing SQP algorithms may suffer from inconsistent quadratic programming (QP) subproblems and numerical noise, especially for ill-conditioned optimisation problems, leading to a suboptimal or infeasible solution. In this work, we propose an improved SQP algorithm (I-SQP) and an improved sequential least squares programming algorithm (I-SLSQP) that solves a least squares (LSQ) subproblem at each major iteration. A hybrid method through the combination of two relaxation formulations is proposed to solve inconsistent subproblems for better convergence and higher efficiency. We analyse a certain part of the dual LSQ solution algorithm and find it suffers from serious cancellation errors, resulting in an inaccurate search direction or no viable search direction generated. Therefore, the QP solver is used to solve LSQ subproblems in such a situation. Several challenging process optimisation problems are solved to demonstrate the advantages of the proposed algorithms over existing solvers including the SLSQP solver in SciPy, fmincon in Matlab and IPOPT.

Submitted 26 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 38 pages, 17 figures, 2 tables

arXiv:2401.08942 [pdf, ps, other]

Ramsey and Gallai-Ramsey numbers for linear forests and kipas

Authors: ** Li, Ya** Mao, Ingo Schiermeyer, Yifan Yao

Abstract: For two graphs $G,H$, the \emph{Ramsey number} $r(G,H)$ is the minimum integer $n$ such that any red/blue edge-coloring of $K_n$ contains either a red copy of $G$ or a blue copy of $H$. For two graphs $G,H$, the \emph{Gallai-Ramsey number} $\operatorname{gr}_k(G:H)$ is defined as the minimum integer $n$ such that any $k$-edge-coloring of $K_n$ must contain either a rainbow copy of $G$ or a monochromatic copy of $H$. In this paper, the classical Ramsey numbers of linear forest versus kipas are obtained. We obtain the exact values of $\operatorname{gr}_k(G:H)$, where $H$ is either a path or a kipas and $G\in\{K_{1,3},P_4^+,P_5\}$ and $P_4^+$ is the graph consisting of $P_4$ with one extra edge incident with inner vertex.

Submitted 25 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.07228 [pdf, other]

A Lawson-time-splitting extended Fourier pseudospectral method for the Gross-Pitaevskii equation with time-dependent low regularity potential

Authors: Bo Lin, Ying Ma, Chushan Wang

Abstract: We propose a Lawson-time-splitting extended Fourier pseudospectral (LTSeFP) method for the numerical integration of the Gross-Pitaevskii equation with time-dependent potential that is of low regularity in space. For the spatial discretization of low regularity potential, we use an extended Fourier pseudospectral (eFP) method, i.e., we compute the discrete Fourier transform of the low regularity potential in an extended window. For the temporal discretization, to efficiently implement the eFP method for time-dependent low regularity potential, we combine the standard time-splitting method with a Lawson-type exponential integrator to integrate potential and nonlinearity differently. The LTSeFP method is both accurate and efficient: it achieves first-order convergence in time and optimal-order convergence in space in $L^2$-norm under low regularity potential, while the computational cost is comparable to the standard time-splitting Fourier pseudospectral method. Theoretically, we also prove such convergence orders for a large class of spatially low regularity time-dependent potential. Extensive numerical results are reported to confirm the error estimates and to demonstrate the superiority of our method.

Submitted 14 January, 2024; originally announced January 2024.

Comments: 19 pages, 10 figures

MSC Class: 35Q55; 65M15; 65M70; 81Q05

arXiv:2401.06325 [pdf, other]

Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo

Authors: Xunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, Tong Zhang

Abstract: To sample from a general target distribution $p_*\propto e^{-f_*}$ beyond the isoperimetric condition, Huang et al. (2023) proposed to perform sampling through reverse diffusion, giving rise to Diffusion-based Monte Carlo (DMC). Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution to the standard Gaussian, utilizing a non-parametric score estimation. However, the original DMC algorithm encountered high gradient complexity, resulting in an exponential dependency on the error tolerance $ε$ of the obtained samples. In this paper, we demonstrate that the high complexity of DMC originates from its redundant design of score estimation, and propose a more efficient algorithm, called RS-DMC, based on a novel recursive score estimation method. In particular, we first divide the entire diffusion process into multiple segments and then formulate the score estimation step (at any time step) as a series of interconnected mean estimation and sampling subproblems accordingly, which are correlated in a recursive manner. Importantly, we show that with a proper design of the segment decomposition, all sampling subproblems will only need to tackle a strongly log-concave distribution, which can be very efficient to solve using the Langevin-based samplers with a provably rapid convergence rate. As a result, we prove that the gradient complexity of RS-DMC only has a quasi-polynomial dependency on $ε$, which significantly improves exponential gradient complexity in Huang et al. (2023). Furthermore, under commonly used dissipative conditions, our algorithm is provably much faster than the popular Langevin-based algorithms. Our algorithm design and theoretical framework illuminate a novel direction for addressing sampling problems, which could be of broader applicability in the community.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 54 pages

arXiv:2401.02638 [pdf, ps, other]

Probabilistic degenerate Fubini polynomials associated with random variables

Authors: Rongrong Xu, Taekyun Kim, Dae San Kim, Yuankui Ma

Abstract: Let Y be a random variable such that the moment generating function of Y exists in a neighborhood of the origin. The aim of this paper is to study probabilistic versions of the degenerate Fubini polynomials and the degenerate Fubini polynomials of order $r$, namely the probabilistic degenerate Fubini polynomials associated with Y and the probabilistic degenerate Fubini polynomials of order $r$ associated with Y. We derive some properties, explicit expressions, certain identities and recurrence relations for those polynomials.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 15

MSC Class: 11B73; 11B83

arXiv:2312.17712 [pdf, ps, other]

The Euclidean-hyperboloidal foliation method. Application to f(R) modified gravity

Authors: Philippe G. LeFloch, Yue Ma

Abstract: This paper is a part of a series devoted to the Euclidean-hyperboloidal foliation method introduced by the authors for investigating the global existence problem associated with nonlinear systems of coupled wave-Klein-Gordon equations with small data. This method was developed especially for investigating the initial value problem for the Einstein-massive field system in wave gauge. Here, we study the (fourth-order) field equations of f(R) modified gravity and investigate the global dynamical behavior of the gravitational field in the near-Minkowski regime. We establish the existence of a globally hyperbolic Cauchy development approaching Minkowski spacetime (in spacelike, null, and timelike directions), when the initial data set is sufficiently close to an asymptotically Euclidean and spacelike hypersurface in Minkowski spacetime. We cast the (fourth-order) f(R)-field equations in the form of a second-order wave-Klein-Gordon system, which has an analogous structure to the Einstein-massive field system but, in addition, involves a (possibly small) effective mass parameter. We establish the nonlinear stability of the Minkowski spacetime in the context of f(R) gravity, when the integrand f(R) in the action functional can be taken to be arbitrarily close to the integrand R of the standard Hilbert-Einstein action.

Submitted 7 May, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: 46 pages

arXiv:2312.14361 [pdf, ps, other]

A Gradient-Based Optimization Method Using the Koopman Operator

Authors: Mengqi Hu, Bian Li, Yi-An Ma, Yifei Lou, Xiu Yang

Abstract: In this paper, we propose a novel approach to solving optimization problems by reformulating the optimization problem into a dynamical system, followed by the adaptive spectral Koopman (ASK) method. The Koopman operator, employed in our approach, approximates the evolution of an ordinary differential equation (ODE) using a finite number of eigenfunctions and eigenvalues. We begin by providing a brief overview of the Koopman operator and the ASK method. Subsequently, we adapt the ASK method for solving a general optimization problem. Moreover, we provide an error bound to aid in understanding the performance of the proposed approach, marking the initial step in a more comprehensive numerical analysis. Experimentally, we demonstrate the applicability and accuracy of our method across a diverse range of optimization problems, including min-max problems. Our approach consistently yields smaller gradient norms and higher success rates in finding critical points compared to state-of-the-art gradient-based methods. We also observe the proposed method works particularly well when the dynamical properties of the system can be effectively modeled by the system's behaviors in a neighborhood of critical points.

Submitted 21 December, 2023; originally announced December 2023.

MSC Class: 37N30; 37N40; 37Mxx; 46N10; 47N10

arXiv:2312.01046 [pdf, other]

Bagged Regularized $k$-Distances for Anomaly Detection

Authors: Yuchao Cai, Yuheng Ma, Hanfang Yang, Hanyuan Hang

Abstract: We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled examples. Though distance-based methods are top-performing for unsupervised anomaly detection, they suffer heavily from the sensitivity to the choice of the number of the nearest neighbors. In this paper, we propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD) converting the unsupervised anomaly detection problem into a convex optimization problem. Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE). This approach enables us to successfully address the sensitivity challenge of the hyperparameter choice in distance-based algorithms. Moreover, when dealing with large-scale datasets, the efficiency issues can be addressed by the incorporated bagging technique in our BRDAD algorithm. On the theoretical side, we establish fast convergence rates of the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity. On the practical side, we conduct numerical experiments on anomaly detection benchmarks to illustrate the insensitivity of parameter selection of our algorithm compared with other state-of-the-art distance-based methods. Moreover, promising improvements are brought by applying the bagging technique in our algorithm on real-world datasets.

Submitted 13 February, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.16899 [pdf, other]

On the saturation spectrum of the unions of disjoint cycles

Authors: Yue Ma

Abstract: Let $G$ be a graph and $\mathcal{H}$ be a family of graphs. We say $G$ is $\mathcal{H}$-saturated if $G$ does not contain a copy of $H$ with $H\in\mathcal{H}$, but the addition of any edge $e\notin E(G)$ creates at least one copy of some $H\in\mathcal{H}$ within $G+e$. The saturation number of $\mathcal{H}$ is the minimum size of an $\mathcal{H}$-saturated graph on $n$ vertices, and the saturation spectrum of $\mathcal{H}$ is the set of all possible sizes of an $\mathcal{H}$-saturated graph on $n$ vertices. Let $k\mathcal{C}_{\ge 3}$ be the family of the unions of $k$ vertex-disjoint cycles. In this note, we completely determine the saturation number and the saturation spectrum of $k\mathcal{C}_{\ge 3}$ for $k=2$ and give some results for $k\ge 3$.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 24 pages, 4 figures

arXiv:2311.06960 [pdf, other]

Robust Regression over Averaged Uncertainty

Authors: Dimitris Bertsimas, Yu Ma

Abstract: We propose a new formulation of robust regression by integrating all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution for the ordinary least-squared regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We first prove the equivalence for four uncertainty sets: ellipsoidal, box, diamond, and budget, and provide closed-form formulations of the penalty term as a function of the sample size, feature size, as well as perturbation protection strength. We then show in synthetic datasets with different levels of perturbations, a consistent improvement of the averaged formulation over the existing worst-case formulation in out-of-sample performance. Importantly, as the perturbation level increases, the improvement increases, confirming our method's advantage in high-noise environments. We report similar improvements in the out-of-sample datasets in real-world regression problems obtained from UCI datasets.

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.01381 [pdf, ps, other]

Li-Yau Inequality and Liouville Property to a Semilinear Heat Equation on Riemannian Manifolds

Authors: Huan-Jie Chen, Shi-Zhong Du, Yue-Xiao Ma

Abstract: This work deals with the entire solutions of a nonlinear equation. The first part of this paper is devoted to the investigation of the Liouville property on compact manifolds, which extends a result by Castorina-Mantegazza [4] for positive f. Secondly, we turn to non-compact manifolds and prove a Liouville theorem under the assumptions of boundedness of the Ricci curvature from below, diffeomorphism of M with R^N and sub-criticality of p defined below. Finally, we also present simplified proofs of Yau's theorem for harmonic functions and Gidas-Spruck's theorem for elliptic semilinear equations. Our proofs are based on Li-Yau type estimates for nonlinear equations.

Submitted 2 November, 2023; originally announced November 2023.

MSC Class: 35K58; 53B21; 35K05

arXiv:2310.20177 [pdf, other]

An extended Fourier pseudospectral method for the Gross-Pitaevskii equation with low regularity potential

Authors: Weizhu Bao, Bo Lin, Ying Ma, Chushan Wang

Abstract: We propose and analyze an extended Fourier pseudospectral (eFP) method for the spatial discretization of the Gross-Pitaevskii equation (GPE) with low regularity potential by treating the potential in an extended window for its discrete Fourier transform. The proposed eFP method maintains optimal convergence rates with respect to the regularity of the exact solution even if the potential is of low regularity and enjoys similar computational cost as the standard Fourier pseudospectral method, and thus it is both efficient and accurate. Furthermore, similar to the Fourier spectral/pseudospectral methods, the eFP method can be easily coupled with different popular temporal integrators including finite difference methods, time-splitting methods and exponential-type integrators. Numerical results are presented to validate our optimal error estimates and to demonstrate that they are sharp as well as to show its efficiency in practical computations.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 20 pages, 7 figures

MSC Class: 35Q55; 65M15; 65M70; 81Q05

arXiv:2310.16463 [pdf, ps, other]

Constructing disjoint Steiner trees in Sierpiński graphs

Authors: Chenxu Yang, ** Li, Ya** Mao, Eddie Cheng, Ralf Klasing

Abstract: Let $G$ be a graph and $S\subseteq V(G)$ with $|S|\geq 2$. Then the trees $T_1, T_2, \cdots, T_\ell$ in $G$ are \emph{internally disjoint Steiner trees} connecting $S$ (or $S$-Steiner trees) if $E(T_i) \cap E(T_j )=\emptyset$ and $V(T_i)\cap V(T_j)=S$ for every pair of distinct integers $i,j$, $1 \leq i, j \leq \ell$. Similarly, if we only have the condition $E(T_i) \cap E(T_j )=\emptyset$ but without the condition $V(T_i)\cap V(T_j)=S$, then they are \emph{edge-disjoint Steiner trees}. The \emph{generalized $k$-connectivity}, denoted by $κ_k(G)$, of a graph $G$, is defined as $κ_k(G)=\min\{κ_G(S)|S \subseteq V(G) \ \textrm{and} \ |S|=k \}$, where $κ_G(S)$ is the maximum number of internally disjoint $S$-Steiner trees. The \emph{generalized local edge-connectivity} $λ_{G}(S)$ is the maximum number of edge-disjoint Steiner trees connecting $S$ in $G$. The \emph{generalized $k$-edge-connectivity} $λ_k(G)$ of $G$ is defined as $λ_k(G)=\min\{λ_{G}(S)\,|\,S\subseteq V(G) \ \textrm{and} \ |S|=k\}$. These measures are generalizations of the concepts of connectivity and edge-connectivity, and they can be used as measures of vulnerability of networks. It is, in general, difficult to compute these generalized connectivities. However, there are precise results for some special classes of graphs. In this paper, we obtain the exact value of $λ_{k}(S(n,\ell))$ for $3\leq k\leq \ell^n$, and the exact value of $κ_{k}(S(n,\ell))$ for $3\leq k\leq \ell$, where $S(n, \ell)$ is the Sierpiński graph with order $\ell^n$. As a direct consequence, these graphs provide additional interesting examples when $λ_{k}(S(n,\ell))=κ_{k}(S(n,\ell))$. We also study some network properties of Sierpiński graphs.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.08791 [pdf, other]

Refinements on vertical Sato-Tate

Authors: Zhao Yu Ma

Abstract: Vertical Sato-Tate states that the Frobenius trace of a randomly chosen elliptic curve over $\mathbb F_p$ tends to a semicircular distribution as $p\rightarrow \infty$. We go beyond this statement by considering the number of elliptic curves $N_{t,p}'$ with a given trace $t$ over $\mathbb F_p$ and characterizing the 2-dimensional distribution of $(t,N_{t,p}')$. In particular, this gives the distribution of the size of isogeny classes of elliptic curves over $\mathbb F_p$. Furthermore, we show a notion of stronger convergence for vertical Sato-Tate which states that the number of elliptic curves with Frobenius trace in an interval of length $p^ε$ converges to the expected amount. The key step in the proof is to truncate Gekeler's infinite product formula, which relies crucially on an effective Chebotarev's density theorem that was recently developed by Pierce, Turnage-Butterbaugh and Wood.

Submitted 29 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: 27 pages, 4 figures. Minor edits for clarity

MSC Class: 11G07 (Primary) 14H52; 11G20 (Secondary)

arXiv:2309.02660 [pdf, ps, other]

A Bi-level Globalization Strategy for Non-convex Consensus ADMM and ALADIN

Authors: Xu Du, **gzhe Wang, Xiaohua Zhou, Yijie Mao

Abstract: In this paper, we formally analyze global convergence in the realm of distributed consensus optimization. Current solutions have explored such analysis, particularly focusing on consensus alternating direction method of multipliers (C-ADMM), including convex and non-convex cases. While such efforts on non-convexity offer elegant theory guaranteeing global convergence, they entail strong assumptions and complicated proof techniques that increasingly pose challenges when adopted to real-world applications. To resolve such tension, we propose a novel bi-level globalization strategy that not only guarantees global convergence but also provides succinct proofs, all while requiring mild assumptions. We begin by adopting such a strategy to perform global convergence analysis for the non-convex cases in C-ADMM. Then, we employ our proposed strategy in consensus augmented Lagrangian based alternating direction inexact Newton method (C-ALADIN), a more recent generalization of C-ADMM. Surprisingly, our analysis shows that C-ALADIN globally converges to a local optimizer, complementary to the prior work on C-ALADIN, which had primarily focused on analyzing local convergence for non-convex cases.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.15461 [pdf, other]

Canonical Factors for Hybrid Neural Fields

Authors: Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma

Abstract: Factored feature volumes offer a simple way to build more compact, efficient, and interpretable neural fields, but also introduce biases that are not necessarily beneficial for real-world data. In this work, we (1) characterize the undesirable biases that these architectures have for axis-aligned signals -- they can lead to radiance field reconstruction differences of as high as 2 PSNR -- and (2) explore how learning a set of canonicalizing transformations can improve representations by removing these biases. We prove in a two-dimensional model problem that simultaneously learning these transformations together with scene appearance succeeds with drastically improved efficiency. We validate the resulting architectures, which we call TILTED, using image, signed distance, and radiance field reconstruction tasks, where we observe improvements across quality, robustness, compactness, and runtime. Results demonstrate that TILTED can enable capabilities comparable to baselines that are 2x larger, while highlighting weaknesses of neural field evaluation procedures.

Submitted 29 August, 2023; originally announced August 2023.

Comments: ICCV 2023. Project webpage: https://brentyi.github.io/tilted/

arXiv:2308.15089 [pdf, other]

doi 10.1142/S0218202524500155

Optimal error bounds on time-splitting methods for the nonlinear Schrödinger equation with low regularity potential and nonlinearity

Authors: Weizhu Bao, Ying Ma, Chushan Wang

Abstract: We establish optimal error bounds on time-splitting methods for the nonlinear Schrödinger equation with low regularity potential and typical power-type nonlinearity $ f(ρ) = ρ^σ$, where $ ρ:=|ψ|^2 $ is the density with $ ψ$ the wave function and $ σ> 0 $ the exponent of the nonlinearity. For the first-order Lie-Trotter time-splitting method, optimal $ L^2 $-norm error bound is proved for $L^\infty$-potential and $ σ> 0 $, and optimal $H^1$-norm error bound is obtained for $ W^{1, 4} $-potential and $ σ\geq 1/2 $. For the second-order Strang time-splitting method, optimal $ L^2 $-norm error bound is established for $H^2$-potential and $ σ\geq 1 $, and optimal $H^1$-norm error bound is proved for $H^3$-potential and $ σ\geq 3/2 $ (or $σ= 1$). Compared to those error estimates of time-splitting methods in the literature, our optimal error bounds either improve the convergence rates under the same regularity assumptions or significantly relax the regularity requirements on potential and nonlinearity for optimal convergence orders. A key ingredient in our proof is to adopt a new technique called \textit{regularity compensation oscillation} (RCO), where low frequency modes are analyzed by phase cancellation, and high frequency modes are estimated by regularity of the solution. Extensive numerical results are reported to confirm our error estimates and to demonstrate that they are sharp.

Submitted 7 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: 34 pages, 8 figures

MSC Class: 35Q55; 65M15; 65M70; 81Q05

Journal ref: Math. Models Methods Appl. Sci., Vol. 34 (2024), pp. 803-844

arXiv:2308.14743 [pdf, ps, other]

doi 10.1088/1751-8121/ad1622

Domain Walls and Vector Solitons in the Coupled Nonlinear Schrodinger Equation

Authors: David D. J. M. Snee, Yi-** Ma

Abstract: We outline a program to classify domain walls (DWs) and vector solitons in the 1D two-component coupled nonlinear Schrodinger (CNLS) equation with general coefficients. The CNLS equation is reduced first to a complex ordinary differential equation (ODE), and then to a real ODE after imposing a restriction. In the real ODE, we identify four possible equilibria including ZZ, ZN, NZ, and NN, with Z (N) denoting a zero (nonzero) value in a component, and analyze their spatial stability. We identify two types of DWs including asymmetric DWs between ZZ and NN and symmetric DWs between ZN and NZ. We identify three codimension-1 mechanisms for generating vector solitons in the real ODE including heteroclinic cycles, local bifurcations, and exact solutions. Heteroclinic cycles are formed by assembling two DWs back-to-back and generate extended bright-bright (BB), dark-dark (DD), and dark-bright (DB) solitons. Local bifurcations include the Turing (Hamiltonian-Hopf) bifurcation that generates Turing solitons with oscillatory tails and the pitchfork bifurcation that generates DB, bright-antidark, DD, and dark-antidark solitons with monotonic tails. Exact solutions include scalar bright and dark solitons with vector amplitudes. Any codimension-1 real vector soliton can be numerically continued into a codimension-0 family. Complex vector solitons have two more parameters: a dark or antidark component can be numerically continued in the wavenumber, while a bright component can be multiplied by a constant phase factor (polarization). We introduce a numerical continuation method to find real and complex vector solitons and show that DWs and DB solitons in the immiscible regime can be related by varying bifurcation parameters. We show that collisions between two polarized DB solitons typically feature a mass exchange that changes the parameters of the two bright components and the two soliton velocities.

Submitted 8 January, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 29 pages, 5 figures, 1 table; accepted version in Journal of Physics A after addressing referee comments

Journal ref: J. Phys. A: Math. Theor. 57 035702 (2024)

arXiv:2308.13031 [pdf, ps, other]

Initial data gluing in the asymptotically flat regime via solution operators with prescribed support properties

Authors: Yuchen Mao, Sung-Jin Oh, Zhongkai Tao

Abstract: We give new proofs of general relativistic initial data gluing results on unit-scale annuli based on explicit solution operators for the linearized constraint equation around the flat case with prescribed support properties. These results retrieve and optimize - in terms of positivity, regularity, size and/or spatial decay requirements - a number of known theorems concerning asymptotically flat initial data, including Kerr exterior gluing by Corvino-Schoen and Chruściel-Delay, interior gluing (or "fill-in") by Bieri-Chruściel, and obstruction-free gluing by Czimek-Rodnianski. In particular, our proof of the strengthened obstruction-free gluing theorem relies on purely spacelike techniques, rather than null gluing as in the original approach.

Submitted 24 August, 2023; originally announced August 2023.

Comments: 30 pages, 1 figure. Comments are welcome!

arXiv:2308.10059 [pdf, other]

The degree threshold for covering with all the connected $3$-graphs with $3$ edges

Authors: Yue Ma, Xinmin Hou, Zhi Yin

Abstract: Given two $r$-uniform hypergraphs $F$ and $H$, we say that $H$ has an $F$-covering if every vertex in $H$ is contained in a copy of $F$. Let $c_{i}(n,F)$ be the least integer such that every $n$-vertex $r$-graph $H$ with $δ_{i}(H)>c_i(n,F)$ has an $F$-covering. Falgas-Ravry, Markström and Zhao (Combin. Probab. Comput., 2021) asymptotically determined $c_1(n,K_{4}^{(3)-})$, where $K_{4}^{(3)-}$ is obtained by deleting an edge from the complete $3$-graph on $4$ vertices. Later, Tang, Ma and Hou (arXiv, 2022) asymptotically determined $c_1(n,C_{6}^{(3)})$, where $C_{6}^{(3)}$ is the linear triangle, i.e. $C_{6}^{(3)}=([6],\{123,345,561\})$. In this paper, we determine $c_1(n,F_5)$ asymptotically, where $F_5$ is the generalized triangle, i.e. $F_5=([5],\{123,124,345\})$. We also determine the exact values of $c_1(n,F)$, where $F$ is any connected $3$-graph with $3$ edges and $F\notin\{K_4^{(3)-}, C_{6}^{(3)}, F_5\}$.

Submitted 19 August, 2023; originally announced August 2023.

Comments: 17 pages, 10 figures
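
The definition of an $F$-covering quoted in the abstract above can be checked directly on very small hypergraphs. The following is a brute-force sketch only, not a method from the paper: the function name `has_covering` and the edge-list representation are assumptions made for illustration, and the search is exponential in the number of vertices.

```python
from itertools import combinations, permutations

def has_covering(h_vertices, h_edges, f_vertices, f_edges):
    """Return True if every vertex of H lies in some (not necessarily induced) copy of F."""
    h_edge_set = {frozenset(e) for e in h_edges}

    def covered(v):
        # Try every injective map from V(F) into V(H) whose image contains v.
        for image in permutations(h_vertices, len(f_vertices)):
            if v not in image:
                continue
            phi = dict(zip(f_vertices, image))
            # The map gives a copy of F if every edge of F lands on an edge of H.
            if all(frozenset(phi[u] for u in e) in h_edge_set for e in f_edges):
                return True
        return False

    return all(covered(v) for v in h_vertices)

# F_5 = ([5], {123, 124, 345}), the generalized triangle from the abstract.
F5_VERTICES = [1, 2, 3, 4, 5]
F5_EDGES = [(1, 2, 3), (1, 2, 4), (3, 4, 5)]

# Two small hosts: the complete 3-graph on 5 vertices, and a single edge.
COMPLETE_5 = list(combinations(range(1, 6), 3))
SINGLE_EDGE = [(1, 2, 3)]
```

Calling `has_covering([1, 2, 3, 4, 5], COMPLETE_5, F5_VERTICES, F5_EDGES)` tests the complete host, while replacing `COMPLETE_5` with `SINGLE_EDGE` tests a host with too few edges to contain any copy of $F_5$.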

arXiv:2308.02145 [pdf, other]

Optimization on Pareto sets: On a theory of multi-objective optimization

Authors: Abhishek Roy, Geelon So, Yi-An Ma

Abstract: In multi-objective optimization, a single decision vector must balance the trade-offs between many objectives. Solutions achieving an optimal trade-off are said to be Pareto optimal: these are decision vectors for which improving any one objective must come at a cost to another. But as the set of Pareto optimal vectors can be very large, we further consider a more practically significant Pareto-constrained optimization problem, where the goal is to optimize a preference function constrained to the Pareto set. We investigate local methods for solving this constrained optimization problem, which poses significant challenges because the constraint set is (i) implicitly defined, and (ii) generally non-convex and non-smooth, even when the objectives are. We define notions of optimality and stationarity, and provide an algorithm with a last-iterate convergence rate of $O(K^{-1/2})$ to stationarity when the objectives are strongly convex and Lipschitz smooth.

Submitted 4 August, 2023; originally announced August 2023.
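
To make the notions of Pareto optimality and Pareto-constrained optimization in the abstract above concrete, here is a minimal sketch on a finite list of candidate objective vectors. It is a toy, not the paper's algorithm, which addresses continuous decision vectors with local methods. The function names, the minimization convention, and the sample data are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_set(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def best_on_pareto_set(points: List[Point], preference: Callable[[Point], float]) -> Point:
    """Minimize a preference function over the Pareto set only."""
    return min(pareto_set(points), key=preference)

# Hypothetical objective vectors (objective 1, objective 2).
candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
front = pareto_set(candidates)
choice = best_on_pareto_set(candidates, lambda p: p[0] + 3.0 * p[1])
```

In this sample, `(3.0, 3.0)` and `(5.0, 5.0)` are both dominated by `(2.0, 2.0)`, so the preference function is evaluated only on the remaining three points.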

arXiv:2307.15321 [pdf, ps, other]

Nonstandard limit theorems and large deviation for $β$-Jacobi ensembles with a different scaling

Authors: Yutao Ma, Yong-Hua Mao, Siyu Wang

Abstract: We consider $β$-Jacobi ensembles with parameters $p_1, p_2\geq n.$ We prove that the empirical measure of the rescaled Jacobi ensembles converges weakly to a modified Wachter law via the spectral measure method, which revisits the weak limits obtained in \cite{MaLDPJ} while replacing the condition $βn \gg \log n$ by $βn \gg 1.$ We also provide the central limit theorem and the large deviation for the corresponding rescaled spectral measure.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 23 pages

arXiv:2307.14670 [pdf, other]

Long-time asymptotics and the radiation condition for linear evolution equations on the half-line with time-periodic boundary conditions

Authors: Yifeng Mao, Dionyssios Mantzavinos, Mark A. Hoefer

Abstract: The large time $t$ asymptotics for scalar, constant coefficient, linear, third order, dispersive equations are obtained for asymptotically time-periodic Dirichlet boundary data and zero initial data on the half-line modeling a wavemaker acting upon an initially quiescent medium. The asymptotic Dirichlet-to-Neumann (D-N) map is constructed by expanding upon the recently developed $Q$-equation method. The D-N map is proven to be unique if and only if the radiation condition that selects the unique wavenumber branch of the dispersion relation for a sinusoidal, time-dependent boundary condition holds: (i) for frequencies in a finite interval, the wavenumber is real and corresponds to positive group velocity, (ii) for frequencies outside the interval, the wavenumber is complex with positive imaginary part. For fixed spatial location $x$, the corresponding asymptotic solution is (i) a traveling wave or (ii) a spatially decaying, time-periodic wave. Uniform-in-$x$ asymptotic solutions for the physical cases of the linearized Korteweg-de Vries and Benjamin-Bona-Mahony (BBM) equations are obtained via integral asymptotics. The linearized BBM asymptotics are found to quantitatively agree with viscous core-annular fluid experiments.

Submitted 27 July, 2023; originally announced July 2023.

MSC Class: 35G16; 76B15; 35Q53; 35C20

arXiv:2307.13381 [pdf, other]

Scaff-PD: Communication Efficient Fair and Robust Federated Learning

Authors: Yaodong Yu, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan

Abstract: We present Scaff-PD, a fast and communication-efficient algorithm for distributionally robust federated learning. Our approach improves fairness by optimizing a family of distributionally robust objectives tailored to heterogeneous clients. We leverage the special structure of these objectives, and design an accelerated primal dual (APD) algorithm which uses bias corrected local steps (as in Scaffold) to achieve significant gains in communication efficiency and convergence speed. We evaluate Scaff-PD on several benchmark datasets and demonstrate its effectiveness in improving fairness and robustness while maintaining competitive accuracy. Our results suggest that Scaff-PD is a promising approach for federated learning in resource-constrained and heterogeneous settings.

Submitted 25 July, 2023; originally announced July 2023.

MSC Class: 68W40; 68W15; 90C25; 90C06 ACM Class: G.1.6; F.2.1; E.4

arXiv:2307.02037 [pdf, other]

Reverse Diffusion Monte Carlo

Authors: Xunpeng Huang, Hanze Dong, Yifan Hao, Yi-An Ma, Tong Zhang

Abstract: We propose a Monte Carlo sampler from the reverse diffusion process. Unlike the practice of diffusion models, where the intermediary updates -- the score functions -- are learned with a neural network, we transform the score matching problem into a mean estimation one. By estimating the means of the regularized posterior distributions, we derive a novel Monte Carlo sampling algorithm called reverse diffusion Monte Carlo (rdMC), which is distinct from the Markov chain Monte Carlo (MCMC) methods. We determine the sample size from the error tolerance and the properties of the posterior distribution to yield an algorithm that can approximately sample the target distribution with any desired accuracy. Additionally, we demonstrate and prove under suitable conditions that sampling with rdMC can be significantly faster than that with MCMC. For multi-modal target distributions such as those in Gaussian mixture models, rdMC greatly improves over the Langevin-style MCMC sampling methods both theoretically and in practice. The proposed rdMC method offers a new perspective and solution beyond classical MCMC algorithms for the challenging complex distributions.

Submitted 13 March, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: 44 pages, 16 figures, ICLR 2024

arXiv:2306.17607 [pdf, ps, other]

Complete bipartite graphs without small rainbow stars

Authors: Weizhen Chen, Meng Ji, Yaping Mao, Meiqin Wei

Abstract: The $k$-edge-colored bipartite Gallai-Ramsey number $\operatorname{bgr}_k(G:H)$ is defined as the minimum integer $n$ such that $n^2\geq k$ and for every $N\geq n$, every edge-coloring (using all $k$ colors) of the complete bipartite graph $K_{N,N}$ contains a rainbow copy of $G$ or a monochromatic copy of $H$. In this paper, we first study the structural theorem on the complete bipartite graph $K_{n,n}$ with no rainbow copy of $K_{1,3}$. Next, we utilize the results to prove the exact values of $\operatorname{bgr}_{k}(P_4: H)$, $\operatorname{bgr}_{k}(P_5: H)$, $\operatorname{bgr}_{k}(K_{1,3}: H)$, where $H$ is one of various unions of cycles, paths, and stars.

Submitted 13 December, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

Comments: 13 pages

arXiv:2306.14853 [pdf, ps, other]

Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

Authors: Lesi Chen, Yaohua Ma, Jingzhao Zhang

Abstract: Bilevel optimization has wide applications such as hyperparameter tuning, neural architecture search, and meta-learning. Designing efficient algorithms for bilevel optimization is challenging because the lower-level problem defines a feasibility set implicitly via another optimization problem. In this work, we consider one tractable case when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product oracle, one can provably find an $ε$-first-order stationary point within $\tilde{\mathcal{O}}(ε^{-2})$ oracle calls. However, Hessian-vector products may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(ε^{-3})$. In this work, we provide a tighter analysis demonstrating that this method can converge at the same near-optimal $\tilde{\mathcal{O}}(ε^{-2})$ rate as second-order methods. Our analysis further leads to simple first-order algorithms that achieve similar convergence rates for finding second-order stationary points and for distributed bilevel problems.

Submitted 28 August, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: slightly change the title

arXiv:2306.12222 [pdf, ps, other]

Graphs without rainbow cliques of orders four and five

Authors: Yue Ma, Xinmin Hou

Abstract: Let $\mathcal{G}_n^k=\{G_1,G_2,\ldots,G_k\}$ be a multiset of graphs on vertex set $[n]$ and let $F$ be a fixed graph with edge set $F=\{e_1, e_2,\ldots, e_m\}$ and $k\ge m$. We say ${\mathcal{G}_n^k}$ is rainbow $F$-free if there is no $\{i_1, i_2,\ldots, i_{m}\}\subseteq[k]$ satisfying $e_j\in G_{i_j}$ for every $j\in[m]$. Let $\operatorname{ex}_k(n,F)$ be the maximum $\sum_{i=1}^{k}|G_i|$ among all the rainbow $F$-free multisets ${\mathcal{G}_n^k}$. Keevash, Saks, Sudakov, and Verstraëte (2004) determined the exact value of $\operatorname{ex}_k(n, K_r)$ when $n$ is sufficiently large and proposed the conjecture that the results remain true when $n\ge Cr^2$ for some constant $C$. Recently, Frankl (2022) confirmed the conjecture for $r=3$ and all possible values of $n$. In this paper, we determine the exact value of $\operatorname{ex}_k(n, K_r)$ for $n\ge r-1$ when $r=4$ and $5$, i.e. the conjecture of Keevash, Saks, Sudakov, and Verstraëte is true for $r\in\{4,5\}$.

Submitted 21 June, 2023; originally announced June 2023.

MSC Class: 05C35

arXiv:2306.10960 [pdf, ps, other]

Performance and Reliability Analysis for Practical Byzantine Fault Tolerance with Repairable Voting Nodes

Authors: Yan-Xia Chang, Qing Wang, Quan-Lin Li, Yaqian Ma

Abstract: The practical Byzantine fault tolerant (PBFT) consensus protocol is one of the basic consensus protocols in the development of blockchain technology. At the same time, the PBFT consensus protocol forms a basis for some other important BFT consensus protocols, such as Tendermint, Streamlet, HotStuff, and LibraBFT. In general, the voting nodes may always fail so that they can leave the PBFT-based blockchain system in a random time interval, making the number of timely available voting nodes uncertain. This uncertainty makes the analysis of PBFT-based blockchain systems with repairable voting nodes more challenging. In this paper, we develop a novel PBFT consensus protocol with repairable voting nodes and study such a new blockchain system using a multi-dimensional Markov process and the first passage time method. Based on this, we provide performance and reliability analysis, including throughput, availability, and reliability, for the new PBFT-based blockchain system with repairable voting nodes. Furthermore, we provide an approximate algorithm for computing the throughput of the new PBFT-based blockchain system. We employ numerical examples to demonstrate the validity of our theoretical results and illustrate how the key system parameters influence performance measures of the PBFT-based blockchain system with repairable voting nodes. We hope the methodology and results developed in this paper will stimulate future research endeavors and open up new research trajectories in this field.

Submitted 19 June, 2023; originally announced June 2023.

Comments: 55 pages, 17 figures

MSC Class: 90B22; 60J28 ACM Class: H.2.4; H.3.5; E.2; E.3; D.4.6; D.4.8

arXiv:2306.07546 [pdf, ps, other]

Quasi-stationary distributions for time-changed symmetric $α$-stable processes killed upon hitting zero

Authors: Zhe-Kang Fang, Yong-Hua Mao, Tao Wang

Abstract: For a time-changed symmetric $α$-stable process killed upon hitting zero, under the condition of entrance from infinity, we prove the existence and uniqueness of the quasi-stationary distribution (QSD). The exponential convergence to the QSD from any initial distribution is proved under conditions on transition densities.

Submitted 13 June, 2023; originally announced June 2023.

Comments: 19 pages

arXiv:2306.06305 [pdf, other]

A Central Limit Theorem for Algorithmic Estimator of Saddle Point

Authors: Abhishek Roy, Yi-An Ma

Abstract: In this work, we study the asymptotic randomness of an algorithmic estimator of the saddle point of a globally convex-concave and locally strongly-convex strongly-concave objective. Specifically, we show that the averaged iterates of a Stochastic Extra-Gradient (SEG) method for a Saddle Point Problem (SPP) converge almost surely to the saddle point and follow a Central Limit Theorem (CLT) with optimal covariance under martingale-difference noise and the state(decision)-dependent Markov noise. To ensure the stability of the algorithm dynamics under the state-dependent Markov noise, we propose a variant of SEG with truncated varying sets. Interestingly, we show that a state-dependent Markovian data sequence can cause Stochastic Gradient Descent Ascent (SGDA) to diverge even if the target objective is strongly-convex strongly-concave. The main novelty of this work is establishing a CLT for SEG for a stochastic SPP, especially under state-dependent Markov noise. This is the first step towards online inference of SPP with numerous potential applications including games, robust strategic classification, and reinforcement learning. We illustrate our results through numerical experiments.

Submitted 4 November, 2023; v1 submitted 9 June, 2023; originally announced June 2023.
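
As a rough illustration of the averaged stochastic extra-gradient iteration named in the abstract above, the sketch below runs it on a toy objective. Everything specific here is an assumption made for illustration and is not taken from the paper: the objective $f(x, y) = \tfrac12 x^2 + xy - \tfrac12 y^2$ (saddle point at the origin), independent Gaussian gradient noise, the step-size schedule, and the function name `averaged_seg`. It does not implement the truncated variant or the Markov-noise setting.

```python
import random

def averaged_seg(num_steps=20000, noise_std=1.0, seed=0):
    """Averaged stochastic extra-gradient on f(x, y) = 0.5*x**2 + x*y - 0.5*y**2.

    x is the minimizing variable and y the maximizing one.
    Returns the running average of the iterates.
    """
    rng = random.Random(seed)

    def noisy_grads(x, y):
        # Exact partial derivatives of f plus independent Gaussian noise.
        gx = x + y + rng.gauss(0.0, noise_std)  # df/dx
        gy = x - y + rng.gauss(0.0, noise_std)  # df/dy
        return gx, gy

    x, y = 1.0, 1.0
    x_sum, y_sum = 0.0, 0.0
    for k in range(num_steps):
        eta = 0.5 / (k + 1) ** 0.75  # decaying step size (assumed schedule)
        # Extrapolation step: look ahead from the current iterate.
        gx, gy = noisy_grads(x, y)
        x_half, y_half = x - eta * gx, y + eta * gy
        # Update step: move the current iterate using gradients at the look-ahead point.
        gx, gy = noisy_grads(x_half, y_half)
        x, y = x - eta * gx, y + eta * gy
        x_sum += x
        y_sum += y
    return x_sum / num_steps, y_sum / num_steps

x_bar, y_bar = averaged_seg()
```

The averaged pair `(x_bar, y_bar)` is the estimator whose fluctuations around the saddle point a central limit theorem would describe. Repeating the run over many seeds and plotting the scaled averages is one way to inspect that behaviour empirically.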

Showing 1–50 of 361 results for author: Ma, Y