Search | arXiv e-print repository

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

Authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao

Abstract: Federated learning is an emerging distributed machine learning method that enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients, but they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and a Natural Language Processing (NLP) task show the efficacy of our proposed FedLALR method and coincide with our theoretical findings.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 40 pages

arXiv:2303.00565 [pdf, other]

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks

Authors: Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao

Abstract: The sharpness-aware minimization (SAM) optimizer has been extensively explored as it can generalize better for training deep neural networks by introducing extra perturbation steps to flatten the landscape of deep learning models. Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically to train large-scale deep neural networks, but without theoretical guarantee, due to the triple difficulties in analyzing the coupled perturbation step, adaptive learning rate and momentum step. In this paper, we try to analyze the convergence rate of AdaSAM in the stochastic non-convex setting. We theoretically show that AdaSAM admits a $\mathcal{O}(1/\sqrt{bT})$ convergence rate, which achieves the linear speedup property with respect to mini-batch size $b$. Specifically, to decouple the stochastic gradient steps from the adaptive learning rate and perturbed gradient, we introduce a delayed second-order momentum term that makes them independent when taking an expectation during the analysis. Then we bound them by showing the adaptive learning rate has a limited range, which makes our analysis feasible. To the best of our knowledge, we are the first to provide a non-trivial convergence rate of SAM with an adaptive learning rate and momentum acceleration. Finally, we conduct experiments on several NLP tasks, which show that AdaSAM could achieve superior performance compared with SGD, AMSGrad, and SAM optimizers.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 18 pages

arXiv:2208.00956 [pdf, other]

doi 10.5194/npg-30-263-2023

An adjoint-free algorithm for conditional nonlinear optimal perturbations (CNOPs) via sampling

Authors: Bin Shi, Guodong Sun

Abstract: In this paper, we propose a sampling algorithm based on state-of-the-art statistical machine learning techniques to obtain conditional nonlinear optimal perturbations (CNOPs), which is different from traditional (deterministic) optimization methods. Specifically, the traditional approach is unavailable in practice because it requires numerically computing the gradient (first-order information), which is computationally expensive since the numerical model must be run a large number of times. However, the sampling approach directly reduces the gradient to the objective function value (zeroth-order information), which also avoids using the adjoint technique that is unusable for many atmosphere and ocean models and requires large amounts of storage. We show an intuitive analysis for the sampling algorithm from the law of large numbers and further present a Chernoff-type concentration inequality to rigorously characterize the degree to which the sample average probabilistically approximates the exact gradient. The experiments are implemented to obtain the CNOPs for two numerical models, the Burgers equation with small viscosity and the Lorenz-96 model. We demonstrate the CNOPs obtained with their spatial patterns, objective values, computation times, and nonlinear error growth. Compared with the performance of the three approaches, all the characters for quantifying the CNOPs are nearly consistent, while the computation time using the sampling approach with fewer samples is much shorter. In other words, the new sampling algorithm shortens the computation time to the utmost at the cost of losing little accuracy.

Submitted 24 March, 2024; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: 20 pages, 6 figures, 4 tables

Journal ref: Nonlin. Processes Geophys.,30,263-276,2023

arXiv:2107.09595 [pdf, other]

Optimal control and comprehensive cost-effectiveness analysis for COVID-19

Authors: Joshua Kiddy Kwasi Asamoah, Eric Okyere, Afeez Abidemi, Stephen E. Moore, Gui-Quan Sun, Zhen Jin, Edward Acheampong, Joseph Frank Gordon

Abstract: Cost-effectiveness analysis is a mode of determining both the cost and economic health outcomes of one or more control interventions. In this work, we have formulated a non-autonomous nonlinear deterministic model to study the control of COVID-19 to unravel the cost and economic health outcomes for the autonomous nonlinear model proposed for the Kingdom of Saudi Arabia. The optimal control model captures four time-dependent control functions: $u_1$, practising physical or social distancing protocols; $u_2$, practising personal hygiene by cleaning contaminated surfaces with alcohol-based detergents; $u_3$, practising proper and safety measures by exposed, asymptomatic and symptomatic infected individuals; $u_4$, fumigating schools in all levels of education, sports facilities, commercial areas and religious worship centres. We proved the existence of the proposed optimal control model. The optimality system associated with the non-autonomous epidemic model is derived using Pontryagin's maximum principle. We have performed numerical simulations to investigate extensive cost-effectiveness analysis for fourteen optimal control strategies. Comparing the control strategies, we noticed that Strategy 1 (practising physical or social distancing protocols) is the most cost-saving and most effective control intervention in Saudi Arabia in the absence of vaccination. However, in terms of the infection averted, we saw that Strategy 6, Strategy 11, Strategy 12, and Strategy 14 are just as good in controlling COVID-19.

Submitted 20 July, 2021; originally announced July 2021.

arXiv:2005.05965 [pdf, other]

Continuation Method with the Trusty Time-stepping Scheme for Linearly Constrained Optimization with Noisy Data

Authors: Xin-long Luo, Jia-hui Lv, Geng Sun

Abstract: The nonlinear optimization problem with linear constraints has many applications in engineering fields such as the visual-inertial navigation and localization of an unmanned aerial vehicle maintaining the horizontal flight. In order to solve this practical problem efficiently, this paper constructs a continuation method with the trusty time-stepping scheme for the linearly equality-constrained optimization problem at every sampling time. At every iteration, the new method only solves a system of linear equations, unlike traditional optimization methods such as the sequential quadratic programming (SQP) method, which needs to solve a quadratic programming subproblem. Consequently, the new method can save much more computational time than SQP. Numerical results show that the new method works well for this problem and its consumed time is about one fifth of that of SQP (the built-in subroutine fmincon.m of the MATLAB2018a environment) or that of the traditional dynamical method (the built-in subroutine ode15s.m of the MATLAB2018a environment). Furthermore, we also give the global convergence analysis of the new method.

Submitted 31 October, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

arXiv:2002.04791 [pdf, other]

A Visual-inertial Navigation Method for High-Speed Unmanned Aerial Vehicles

Authors: Xin-long Luo, Jia-hui Lv, Geng Sun

Abstract: This paper investigates the localization problem of a high-speed high-altitude unmanned aerial vehicle (UAV) with a monocular camera and inertial navigation system. It proposes a navigation method utilizing the complementarity of vision and inertial devices to overcome the singularity which arises from the horizontal flight of the UAV. Furthermore, it modifies the mathematical model of the localization problem by separating linear parts from nonlinear parts and replaces a nonlinear least-squares problem with a linearly equality-constrained optimization problem. In order to avoid the ill-conditioning near the optimal point of sequential unconstrained minimization techniques (penalty methods), it constructs a semi-implicit continuous method with a trust-region technique based on a differential-algebraic dynamical system to solve the linearly equality-constrained optimization problem. It also analyzes the global convergence property of the semi-implicit continuous method on an infinite integration interval, rather than the traditional convergence analysis of numerical methods for ordinary differential equations on a finite integration interval. Finally, promising numerical results are also presented.

Submitted 11 February, 2020; originally announced February 2020.

MSC Class: 65H17; 65J15; 65K05; 65L05

arXiv:2002.04315 [pdf, other]

Symplectic Geometric Methods for Matrix Differential Equations Arising from Inertial Navigation Problems

Authors: Xin-Long Luo, Geng Sun

Abstract: This article explores some geometric and algebraic properties of the dynamical system which is represented by matrix differential equations arising from inertial navigation problems, such as the symplecticity and the orthogonality. Furthermore, it extends the applicable fields of symplectic geometric algorithms from the even dimensional Hamiltonian system to the odd dimensional dynamical system. Finally, some numerical experiments are presented to illustrate the theoretical results of this paper.

Submitted 11 February, 2020; originally announced February 2020.

arXiv:1808.05548 [pdf, ps, other]

Symmetric-adjoint and symplectic-adjoint methods and their applications

Authors: Geng Sun, Siqing Gan, Hongyu Liu, Zaijiu Shang

Abstract: Symmetric methods and symplectic methods are classical notions in the theory of Runge-Kutta methods. They can generate numerical flows that respectively preserve the symmetry and symplecticity of the continuous flows in the phase space. The adjoint method is an important way of constructing a new Runge-Kutta method via the symmetrisation of another Runge-Kutta method. In this paper, we introduce a new notion, called the symplectic-adjoint Runge-Kutta method. We prove some interesting properties of the symmetric-adjoint and symplectic-adjoint methods. These properties reveal some intrinsic connections among several classical classes of Runge-Kutta methods. In particular, the newly introduced notion and the corresponding properties enable us to develop a novel and practical approach to constructing high-order explicit Runge-Kutta methods, which is a challenging and long-overlooked topic in the theory of Runge-Kutta methods.

Submitted 16 August, 2018; originally announced August 2018.

Comments: 20 pages, comments are welcome

arXiv:0802.2121 [pdf, ps, other]

Preservation of stability properties near fixed points of linear hamiltonian systems by symplectic integrators

Authors: Xiaohua Ding, Hongyu Liu, Zaijiu Shang, Geng Sun, Lingshu Wang

Abstract: Based on reasonable testing model problems, we study the preservation by symplectic Runge-Kutta method (SRK) and symplectic partitioned Runge-Kutta method (SPRK) of structures for fixed points of linear Hamiltonian systems. The structure-preservation region provides a practical criterion for choosing step-size in symplectic computation. Examples are given to justify the investigation.

Submitted 14 February, 2008; originally announced February 2008.

MSC Class: 37M15; 65P10

Showing 1–9 of 9 results for author: Sun, G