Search | arXiv e-print repository

Vertex Exchange Method for a Class of Quadratic Programming Problems

Authors: Ling Liang, Kim-Chuan Toh, Haizhao Yang

Abstract: A vertex exchange method is proposed for solving the strongly convex quadratic program subject to the generalized simplex constraint. We conduct rigorous convergence analysis for the proposed algorithm and demonstrate its essential roles in solving some important classes of constrained convex optimization. To get a feasible initial point to execute the algorithm, we also present and analyze a highly efficient semismooth Newton method for computing the projection onto the generalized simplex. The excellent practical performance of the proposed algorithms is demonstrated by a set of extensive numerical experiments. Our theoretical and numerical results further motivate the potential applications of the considered model and the proposed algorithms.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 32 pages, 5 tables

MSC Class: 90C06; 90C22; 90C25
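
For orientation, the abstract above mentions computing a projection onto the generalized simplex. The sketch below is not the semismooth Newton method of the paper; it is the classical sort-based projection onto the standard simplex {w : w >= 0, sum(w) = radius}, shown only to illustrate what such a projection computes. The function name and the `radius` parameter are assumptions made for this example.

```python
def project_onto_simplex(v, radius=1.0):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = radius}.

    Classical sort-based scheme for the standard simplex (an illustrative
    stand-in, not the generalized-simplex method from the abstract).
    """
    # Sort a copy in descending order; the input list is left untouched.
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - radius) / j
        # Keep the shift from the last index whose entry stays positive.
        if u_j - candidate > 0:
            theta = candidate
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]
```

A point already on the simplex is returned unchanged, and a point outside it is shifted and clipped so that the result is nonnegative and sums to `radius`.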

arXiv:2407.03272 [pdf, other]

Nesterov's Accelerated Jacobi-Type Methods for Large-scale Symmetric Positive Semidefinite Linear Systems

Authors: Ling Liang, Qiyuan Pang, Kim-Chuan Toh, Haizhao Yang

Abstract: Solving symmetric positive semidefinite linear systems is an essential task in many scientific computing problems. While Jacobi-type methods, including the classical Jacobi method and the weighted Jacobi method, exhibit simplicity in their forms and friendliness to parallelization, they are not attractive either because of the potential convergence failure or their slow convergence rate. This paper aims to showcase the possibility of improving classical Jacobi-type methods by employing Nesterov's acceleration technique that results in an accelerated Jacobi-type method with improved convergence properties. Simultaneously, it preserves the appealing features for parallel implementation. In particular, we show that the proposed method has an $O\left(\frac{1}{t^2}\right)$ convergence rate in terms of objective function values of the associated convex quadratic optimization problem, where $t\geq 1$ denotes the iteration counter. To further improve the practical performance of the proposed method, we also develop and analyze a restarted variant of the method, which is shown to have an $O\left(\frac{(\log_2(t))^2}{t^2}\right)$ convergence rate when the coefficient matrix is positive definite. Furthermore, we conduct appropriate numerical experiments to evaluate the efficiency of the proposed method. Our numerical results demonstrate that the proposed method outperforms the classical Jacobi-type methods and the conjugate gradient method and shows a comparable performance as the preconditioned conjugate gradient method with a diagonal preconditioner. Finally, we develop a parallel implementation and conduct speed-up tests on some large-scale systems. Our results indicate that the proposed framework is highly scalable.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 20 pages

MSC Class: 90C06; 90C22; 90C25
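
As background for the abstract above, here is a minimal sketch of the classical weighted Jacobi iteration x <- x + omega * D^{-1} (b - A x), which is the baseline that the paper sets out to accelerate. It is not the accelerated or restarted method itself. The function name, the dense list-of-lists matrix format, and the default parameters are assumptions made for this example.

```python
def weighted_jacobi(A, b, omega=1.0, iterations=200):
    """Classical weighted Jacobi sweep for A x = b with nonzero diagonal.

    omega = 1.0 gives the plain Jacobi method. Convergence is not
    guaranteed in general; it depends on A and omega.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Residual r = b - A x, built from the previous iterate only.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Each component update is independent of the others, which is
        # why Jacobi-type sweeps are easy to parallelize.
        x = [x[i] + omega * r[i] / A[i][i] for i in range(n)]
    return x
```

For the small diagonally dominant system A = [[4, 1], [1, 3]], b = [1, 2], the iterates approach the exact solution (1/11, 7/11).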

arXiv:2406.18287 [pdf, other]

Learning-rate-free Momentum SGD with Reshuffling Converges in Nonsmooth Nonconvex Optimization

Authors: Xiaoyin Hu, Nachuan Xiao, Xin Liu, Kim-Chuan Toh

Abstract: In this paper, we propose a generalized framework for developing learning-rate-free momentum stochastic gradient descent (SGD) methods in the minimization of nonsmooth nonconvex functions, especially in training nonsmooth neural networks. Our framework adaptively generates learning rates based on the historical data of stochastic subgradients and iterates. Under mild conditions, we prove that our proposed framework enjoys global convergence to the stationary points of the objective function in the sense of the conservative field, hence providing convergence guarantees for training nonsmooth neural networks. Based on our proposed framework, we propose a novel learning-rate-free momentum SGD method (LFM). Preliminary numerical experiments reveal that LFM performs comparably to the state-of-the-art learning-rate-free methods (which have not been shown theoretically to be convergent) across well-known neural network training benchmarks.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 26 pages

arXiv:2406.12013 [pdf, other]

Convergence rates of S.O.S hierarchies for polynomial semidefinite programs

Authors: Hoang Anh Tran, Kim-Chuan Toh

Abstract: We introduce a S.O.S hierarchy of lower bounds for a polynomial optimization problem whose constraint is expressed as a matrix polynomial semidefinite condition. Our approach involves utilizing a penalty function framework to directly address the matrix-based constraint, making it applicable to both discrete and continuous polynomial optimization problems. We investigate the convergence rates of these bounds in both problem types. The proposed method yields a variation of Putinar's theorem tailored for positive polynomials within a compact semidefinite set, defined by a matrix polynomial semidefinite constraint. More specifically, we derive novel insights into the convergence rates and degree of additional terms in the representation within this modified version of Putinar's theorem, based on Jackson's theorem and a version of the Łojasiewicz inequality.

Submitted 17 June, 2024; originally announced June 2024.

MSC Class: 90C22; 90C26; 41A10; 41A50

arXiv:2406.04646 [pdf, other]

An Inexact Bregman Proximal Difference-of-Convex Algorithm with Two Types of Relative Stopping Criteria

Authors: Lei Yang, Jingjing Hu, Kim-Chuan Toh

Abstract: In this paper, we consider a class of difference-of-convex (DC) optimization problems, where the global Lipschitz gradient continuity assumption on the smooth part of the objective function is not required. Such problems are prevalent in many contemporary applications such as compressed sensing, statistical regression, and machine learning, and can be solved by a general Bregman proximal DC algorithm (BPDCA). However, the existing BPDCA is developed based on the stringent requirement that the involved subproblems must be solved exactly, which is often impractical and limits the applicability of the BPDCA. To facilitate the practical implementations and wider applications of the BPDCA, we develop an inexact Bregman proximal difference-of-convex algorithm (iBPDCA) by incorporating two types of relative-type stopping criteria for solving the subproblems. The proposed inexact framework has considerable flexibility to encompass many existing exact and inexact methods, and can accommodate different types of errors that may occur when solving the subproblem. This enables the potential application of our inexact framework across different DC decompositions to facilitate the design of a more efficient DCA scheme in practice. The global subsequential convergence and the global sequential convergence of our iBPDCA are established under suitable conditions including the Kurdyka-Łojasiewicz property. Some numerical experiments on the $\ell_{1-2}$ regularized least squares problem and the constrained $\ell_{1-2}$ sparse optimization problem are conducted to show the superior performance of our iBPDCA in comparison to existing algorithms. These results also empirically verify the necessity of developing different types of stopping criteria to facilitate the efficient computation of the subproblem in each iteration of our iBPDCA.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2404.17386 [pdf, other]

Stochastic Bregman Subgradient Methods for Nonsmooth Nonconvex Optimization Problems

Authors: Kuangyu Ding, Kim-Chuan Toh

Abstract: This paper focuses on the problem of minimizing a locally Lipschitz continuous function. Motivated by the effectiveness of Bregman gradient methods in training nonsmooth deep neural networks and the recent progress in stochastic subgradient methods for nonsmooth nonconvex optimization problems \cite{bolte2021conservative,bolte2022subgradient,xiao2023adam}, we investigate the long-term behavior of stochastic Bregman subgradient methods in such context, especially when the objective function lacks Clarke regularity. We begin by exploring a general framework for Bregman-type methods, establishing their convergence by a differential inclusion approach. For practical applications, we develop a stochastic Bregman subgradient method that allows the subproblems to be solved inexactly. Furthermore, we demonstrate how a single timescale momentum can be integrated into the Bregman subgradient method with slight modifications to the momentum update. Additionally, we introduce a Bregman proximal subgradient method for solving composite optimization problems possibly with constraints, whose convergence can be guaranteed based on the general framework. Numerical experiments on training nonsmooth neural networks are conducted to validate the effectiveness of our proposed methods.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 28 pages, 6 figures

arXiv:2404.09438 [pdf, other]

Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization

Authors: Nachuan Xiao, Kuangyu Ding, Xiaoyin Hu, Kim-Chuan Toh

Abstract: In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, with additional nonsmooth nonconvex constraints $c(x) = 0$. We develop a unified framework for developing Lagrangian-based methods, which takes a single-step update to the primal variables by some subgradient methods in each iteration. These subgradient methods are ``embedded'' into our framework, in the sense that they are incorporated as black-box updates to the primal variables. We prove that our proposed framework inherits the global convergence guarantees from these embedded subgradient methods under mild conditions. In addition, we show that our framework can be extended to solve constrained optimization problems with expectation constraints. Based on the proposed framework, we show that a wide range of existing stochastic subgradient methods, including the proximal SGD, proximal momentum SGD, and proximal ADAM, can be embedded into Lagrangian-based methods. Preliminary numerical experiments on deep learning tasks illustrate that our proposed framework yields efficient variants of Lagrangian-based methods with convergence guarantees for nonconvex nonsmooth constrained optimization problems.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 30 pages, 4 figures

arXiv:2402.15619 [pdf, other]

Towards Improved Uncertainty Quantification of Stochastic Epidemic Models Using Sequential Monte Carlo

Authors: Arindam Fadikar, Abby Stevens, Nicholson Collier, Kok Ben Toh, Olga Morozova, Anna Hotton, Jared Clark, David Higdon, Jonathan Ozik

Abstract: Sequential Monte Carlo (SMC) algorithms represent a suite of robust computational methodologies utilized for state estimation and parameter inference within dynamical systems, particularly in real-time or online environments where data arrives sequentially over time. In this research endeavor, we propose an integrated framework that combines a stochastic epidemic simulator with a sequential importance sampling (SIS) scheme to dynamically infer model parameters, which evolve due to social as well as biological processes throughout the progression of an epidemic outbreak and are also influenced by evolving data measurement bias. Through iterative updates of a set of weighted simulated trajectories based on observed data, this framework enables the estimation of posterior distributions for these parameters, thereby capturing their temporal variability and associated uncertainties. Through simulation studies, we showcase the efficacy of SMC in accurately tracking the evolving dynamics of epidemics while appropriately accounting for uncertainties. Moreover, we delve into practical considerations and challenges inherent in implementing SMC for parameter estimation within dynamic epidemiological settings, areas where the substantial computational capabilities of high-performance computing resources can be usefully brought to bear.

Submitted 6 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: 10 pages, 5 figures

arXiv:2402.06033 [pdf, ps, other]

An Inexact Halpern Iteration with Application to Distributionally Robust Optimization

Authors: Ling Liang, Kim-Chuan Toh, Jia-Jie Zhu

Abstract: The Halpern iteration for solving monotone inclusion problems has gained increasing interest in recent years due to its simple form and appealing convergence properties. In this paper, we investigate the inexact variants of the scheme in both deterministic and stochastic settings. We conduct extensive convergence analysis and show that by choosing the inexactness tolerances appropriately, the inexact schemes admit an $O(k^{-1})$ convergence rate in terms of the (expected) residue norm. Our results relax the state-of-the-art inexactness conditions employed in the literature while sharing the same competitive convergence properties. We then demonstrate how the proposed methods can be applied for solving two classes of data-driven Wasserstein distributionally robust optimization problems that admit convex-concave min-max optimization reformulations. We highlight its capability of performing inexact computations for distributionally robust learning with stochastic first-order methods.

Submitted 12 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Correct a typo in the title and update authors' information
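
To make the scheme named in the abstract above concrete, here is a minimal sketch of the exact Halpern iteration x_{k+1} = lam_k * x_0 + (1 - lam_k) * T(x_k) with the common choice lam_k = 1/(k+2), applied to a nonexpansive map T. It does not implement the inexact or stochastic variants studied in the paper. The helper names and the quarter-turn rotation used as a test operator are assumptions chosen for illustration.

```python
def halpern_iteration(T, x0, iterations=1000):
    """Exact Halpern iteration anchored at x0, with weights 1/(k+2)."""
    x = list(x0)
    for k in range(iterations):
        lam = 1.0 / (k + 2)
        Tx = T(x)
        # Convex combination of the fixed anchor x0 and the image T(x_k).
        x = [lam * a + (1.0 - lam) * t for a, t in zip(x0, Tx)]
    return x


def rotate_quarter_turn(x):
    """Rotation by 90 degrees in the plane: nonexpansive, fixed point 0."""
    return [-x[1], x[0]]


def residual_norm(T, x):
    """Euclidean norm of the fixed-point residual x - T(x)."""
    return sum((a - t) ** 2 for a, t in zip(x, T(x))) ** 0.5
```

The rotation is a useful test case because the plain fixed-point iteration x <- T(x) only cycles around the origin, whereas the anchored Halpern iterates drive the residual norm toward zero.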

arXiv:2402.03942 [pdf, other]

Wasserstein distributionally robust optimization and its tractable regularization formulations

Authors: Hong T. M. Chu, Meixia Lin, Kim-Chuan Toh

Abstract: We study a variety of Wasserstein distributionally robust optimization (WDRO) problems where the distributions in the ambiguity set are chosen by constraining their Wasserstein discrepancies to the empirical distribution. Using the notion of weak Lipschitz property, we derive lower and upper bounds of the corresponding worst-case loss quantity and propose sufficient conditions under which this quantity coincides with its regularization scheme counterpart. Our constructive methodology and elementary analysis also directly characterize the closed-form of the approximate worst-case distribution. Extensive applications show that our theoretical results are applicable to various problems, including regression, classification and risk measure problems.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2312.13970 [pdf, other]

On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods

Authors: Anh Duc Nguyen, Tuan Dung Nguyen, Quang Minh Nguyen, Hoang H. Nguyen, Lam M. Nguyen, Kim-Chuan Toh

Abstract: This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports and its applications in various AI tasks such as color transfer or domain adaptation. There is hence the need for fast approximations of POT with increasingly large problem sizes in arising applications. We first theoretically and experimentally investigate the infeasibility of the state-of-the-art Sinkhorn algorithm for POT due to its incompatible rounding procedure, which consequently degrades its qualitative performance in real world applications like point-cloud registration. To this end, we propose a novel rounding algorithm for POT, and then provide a feasible Sinkhorn procedure with a revised computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon^4)$. Our rounding algorithm also permits the development of two first-order methods to approximate the POT problem. The first algorithm, Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD), finds an $\varepsilon$-approximate solution to the POT problem in $\mathcal{\widetilde O}(n^{2.5}/\varepsilon)$, which is better in $\varepsilon$ than revised Sinkhorn. The second method, Dual Extrapolation, achieves the computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon)$, thereby being the best in the literature. We further demonstrate the flexibility of POT compared to standard OT as well as the practicality of our algorithms on real applications where two marginal distributions are unbalanced.

Submitted 22 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.07908 [pdf, ps, other]

A feasible method for general convex low-rank SDP problems

Authors: Tianyun Tang, Kim-Chuan Toh

Abstract: In this work, we consider the low rank decomposition (SDPR) of general convex semidefinite programming problems (SDP) that contain both a positive semidefinite matrix and a nonnegative vector as variables. We develop a rank-support-adaptive feasible method to solve (SDPR) based on Riemannian optimization. The method is able to escape from a saddle point to ensure its convergence to a global optimal solution for generic constraint vectors. We prove its global convergence and local linear convergence without assuming that the objective function is twice differentiable. Due to the special structure of the low-rank SDP problem, our algorithm can achieve better iteration complexity than existing results for more general smooth nonconvex problems. In order to overcome the degeneracy issues of SDP problems, we develop two strategies based on random perturbation and dual refinement. These techniques enable us to solve some primal degenerate SDP problems efficiently, for example, Lovász theta SDPs. Our work is a step forward in extending the application range of Riemannian optimization approaches for solving SDP problems. Numerical experiments are conducted to verify the efficiency and robustness of our method.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 36 pages, 1 figure

MSC Class: 90C06; 90C22; 90C30

arXiv:2312.05801 [pdf, other]

Stability and Character of Zero Field Skyrmionic States in Hybrid Magnetic Multilayer Nanodots

Authors: Alexander Kang-Jun Toh, McCoy W. Lim, T. S. Suraj, Xiaoye Chen, Hang Khume Tan, Royston Lim, Xuan Min Cheng, Nelson Lim, Sherry Yap, Durgesh Kumar, S. N. Piramanayagam, Pin Ho, Anjan Soumyanarayanan

Abstract: Ambient magnetic skyrmions stabilized in multilayer nanostructures are of immense interest due to their relevance to magnetic tunnel junction (MTJ) devices for memory and unconventional computing applications. However, existing skyrmionic nanostructures built using conventional metallic or oxide multilayer nanodots are unable to concurrently fulfill the requirements of nanoscale skyrmion stability and feasibility of all-electrical readout and manipulation. Here, we develop a few-repeat hybrid multilayer platform consisting of metallic [Pt/CoB/Ir]3 and oxide [Pt/CoB/MgO] components that are coupled to evolve together as a single, composite stack. Zero-field (ZF) skyrmions with sizes as small as 50 nm are stabilized in the hybrid multilayer nanodots, which are smoothly modulated by up to 2.5x by varying CoB thickness and dot sizes. Meanwhile, skyrmion multiplets are also stabilized by small bias fields. Crucially, we observe higher order 'target' skyrmions with varying magnetization rotations in moderately-sized, low anisotropy nanodots. These results provide a viable route to realize long-sought skyrmionic MTJ devices and new possibilities for multi-state skyrmionic device concepts.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2311.06448 [pdf, other]

A Sparse Smoothing Newton Method for Solving Discrete Optimal Transport Problems

Authors: Di Hou, Ling Liang, Kim-Chuan Toh

Abstract: The discrete optimal transport (OT) problem, which offers an effective computational tool for comparing two discrete probability distributions, has recently attracted much attention and played essential roles in many modern applications. This paper proposes to solve the discrete OT problem by applying a squared smoothing Newton method via the Huber smoothing function for solving the corresponding KKT system directly. The proposed algorithm admits appealing convergence properties and is able to take advantage of the solution sparsity to greatly reduce computational costs. Moreover, the algorithm can be extended to solve problems with similar structures including the Wasserstein barycenter (WB) problem with fixed supports. To verify the practical performance of the proposed method, we conduct extensive numerical experiments to solve a large set of discrete OT and WB benchmark problems. Our numerical results show that the proposed method is efficient compared to state-of-the-art linear programming (LP) solvers. Moreover, the proposed method consumes less memory than existing LP solvers, which demonstrates the potential usage of our algorithm for solving large-scale OT and WB problems.

Submitted 16 May, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: 29 pages, 17 figures

MSC Class: 90C05; 90C06; 90C25

arXiv:2311.01976 [pdf, other]

A Corrected Inexact Proximal Augmented Lagrangian Method with a Relative Error Criterion for a Class of Group-quadratic Regularized Optimal Transport Problems

Authors: Lei Yang, Ling Liang, Hong T. M. Chu, Kim-Chuan Toh

Abstract: The optimal transport (OT) problem and its related problems have attracted significant attention and have been extensively studied in various applications. In this paper, we focus on a class of group-quadratic regularized OT problems which aim to find solutions with specialized structures that are advantageous in practical scenarios. To solve this class of problems, we propose a corrected inexact proximal augmented Lagrangian method (ciPALM), with the subproblems being solved by the semi-smooth Newton ({\sc Ssn}) method. We establish that the proposed method exhibits appealing convergence properties under mild conditions. Moreover, our ciPALM distinguishes itself from the recently developed semismooth Newton-based inexact proximal augmented Lagrangian ({\sc Snipal}) method for linear programming. Specifically, {\sc Snipal} uses an absolute error criterion for the approximate minimization of the subproblem for which a summable sequence of tolerance parameters needs to be pre-specified for practical implementations. In contrast, our ciPALM adopts a relative error criterion with a \textit{single} tolerance parameter, which would be more friendly to tune from computational and implementation perspectives. These favorable properties position our ciPALM as a promising candidate for tackling large-scale problems. Various numerical studies validate the effectiveness of employing a relative error criterion for the inexact proximal augmented Lagrangian method, and also demonstrate that our ciPALM is competitive for solving large-scale group-quadratic regularized OT problems.

Submitted 2 April, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 37 pages, 6 figures

MSC Class: 90C05; 90C06; 90C25

arXiv:2310.08858 [pdf, other]

Adam-family Methods with Decoupled Weight Decay in Deep Learning

Authors: Kuangyu Ding, Nachuan Xiao, Kim-Chuan Toh

Abstract: In this paper, we investigate the convergence properties of a wide class of Adam-family methods for minimizing quadratically regularized nonsmooth nonconvex optimization problems, especially in the context of training nonsmooth neural networks with weight decay. Motivated by the AdamW method, we propose a novel framework for Adam-family methods with decoupled weight decay. Within our framework, the estimators for the first-order and second-order moments of stochastic subgradients are updated independently of the weight decay term. Under mild assumptions and with non-diminishing stepsizes for updating the primary optimization variables, we establish the convergence properties of our proposed framework. In addition, we show that our proposed framework encompasses a wide variety of well-known Adam-family methods, hence offering convergence guarantees for these methods in the training of nonsmooth neural networks. More importantly, we show that our proposed framework asymptotically approximates the SGD method, thereby providing an explanation for the empirical observation that decoupled weight decay enhances generalization performance for Adam-family methods. As a practical application of our proposed framework, we propose a novel Adam-family method named Adam with Decoupled Weight Decay (AdamD), and establish its convergence properties under mild conditions. Numerical experiments demonstrate that AdamD outperforms Adam and is comparable to AdamW, in the aspects of both generalization performance and efficiency.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 26 pages
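
The decoupling described in the abstract above (moment estimates updated from the subgradient alone, with weight decay applied separately) can be illustrated with a single AdamW-style step. This is a sketch of the well-known AdamW update that motivates the paper, not the AdamD method it proposes. The function name, the list-based parameter format, and the default hyperparameters are assumptions made for this example.

```python
import math


def adamw_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW-style update; t is the 1-based step counter."""
    new_x, new_m, new_v = [], [], []
    for x_i, g_i, m_i, v_i in zip(x, grad, m, v):
        # Moment estimates see only the raw (sub)gradient; the decay
        # term never enters them. This is the "decoupling".
        m_i = beta1 * m_i + (1.0 - beta1) * g_i
        v_i = beta2 * v_i + (1.0 - beta2) * g_i * g_i
        # Bias correction for the zero-initialized moments.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Adaptive step plus a direct shrinkage of the parameter.
        x_i = x_i - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * x_i)
        new_x.append(x_i)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_x, new_m, new_v
```

With a zero gradient the step reduces to multiplying each parameter by (1 - lr * weight_decay), which makes the separate role of the decay term easy to see.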

arXiv:2310.00376 [pdf, other]

Self-adaptive ADMM for semi-strongly convex problems

Authors: Tianyun Tang, Kim-Chuan Toh

Abstract: In this paper, we develop a self-adaptive ADMM that updates the penalty parameter adaptively. When one part of the objective function is strongly convex, i.e., the problem is semi-strongly convex, our algorithm can update the penalty parameter adaptively with guaranteed convergence. We establish various types of convergence results including accelerated convergence rate of O(1/k^2), linear convergence and convergence of iteration points. This enhances various previous results because we allow the penalty parameter to change adaptively. We also develop a partial proximal point method with the subproblem solved by our adaptive ADMM. This enables us to solve problems without semi-strongly convex property. Numerical experiments are conducted to demonstrate the high efficiency and robustness of our method.

Submitted 30 September, 2023; originally announced October 2023.

Comments: 36 pages, 2 figures

MSC Class: 90C06; 90C25; 90C90

arXiv:2308.16690 [pdf, other]

On solving a rank regularized minimization problem via equivalent factorized column-sparse regularized models

Authors: Wenjing Li, Wei Bian, Kim-Chuan Toh

Abstract: The rank regularized minimization problem is an ideal model for the low-rank matrix completion/recovery problem. The matrix factorization approach can transform the high-dimensional rank regularized problem to a low-dimensional factorized column-sparse regularized problem. The latter can greatly facilitate fast computations in applicable algorithms, but needs to overcome the simultaneous non-convexity of the loss and regularization functions. In this paper, we consider the factorized column-sparse regularized model. Firstly, we optimize this model with bound constraints, and establish a certain equivalence between the optimized factorization problem and the rank regularized problem. Further, we strengthen the optimality condition for stationary points of the factorization problem and define the notion of strong stationary point. Moreover, we establish the equivalence between the factorization problem and its nonconvex relaxation in the sense of global minimizers and strong stationary points. To solve the factorization problem, we design two types of algorithms and give an adaptive method to reduce their computation. The first algorithm is from the relaxation point of view, and its iterates possess some properties of global minimizers of the factorization problem after finitely many iterations. We give some analysis on the convergence of its iterates to the strong stationary point. The second algorithm is designed for directly solving the factorization problem. We improve the PALM algorithm introduced by Bolte et al. (Math Program Ser A 146:459-494, 2014) for the factorization problem and give its improved convergence results. Finally, we conduct numerical experiments to show the promising performance of the proposed model and algorithms for low-rank matrix completion.

Submitted 20 May, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: 46 pages

MSC Class: 90C46; 90C26; 65K05

arXiv:2307.10855 [pdf, ps, other]

Quantifying low rank approximations of third order symmetric tensors

Authors: Shenglong Hu, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we present a method to certify the approximation quality of a low rank tensor to a given third order symmetric tensor. Under mild assumptions, the best low rank approximation is attained if a control parameter is zero, or a quantified quasi-optimal low rank approximation is obtained if the control parameter is positive. This is based on a primal-dual method for computing a low rank approximation for a given tensor. The certification is derived from the global optimality of the primal and dual problems, and is characterized by easily checkable relations between the primal and the dual solutions together with another rank condition. The theory is verified theoretically for orthogonally decomposable tensors as well as numerically through examples in the general case.

Submitted 20 July, 2023; originally announced July 2023.

Comments: 46 pages

arXiv:2307.10053 [pdf, other]

SGD-type Methods with Guaranteed Global Stability in Nonsmooth Nonconvex Optimization

Authors: Nachuan Xiao, Xiaoyin Hu, Kim-Chuan Toh

Abstract: In this paper, we focus on providing convergence guarantees for variants of the stochastic subgradient descent (SGD) method in minimizing nonsmooth nonconvex functions. We first develop a general framework to establish global stability for general stochastic subgradient methods, where the corresponding differential inclusion admits a coercive Lyapunov function. We prove that, with sufficiently small stepsizes and controlled noises, the iterates asymptotically stabilize around the stable set of its corresponding differential inclusion. Then we introduce a scheme for developing SGD-type methods with regularized update directions for the primal variables. Based on our developed framework, we prove the global stability of our proposed scheme under mild conditions. We further illustrate that our scheme yields variants of SGD-type methods, which enjoy guaranteed convergence in training nonsmooth neural networks. In particular, by employing the sign map to regularize the update directions, we propose a novel subgradient method named the Sign-map Regularized SGD method (SRSGD). Preliminary numerical experiments exhibit the high efficiency of SRSGD in training deep neural networks.

Submitted 13 May, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: 36 pages

arXiv:2306.17369 [pdf, other]

Adaptive sieving: A dimension reduction technique for sparse optimization problems

Authors: Yancheng Yuan, Meixia Lin, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we propose an adaptive sieving (AS) strategy for solving general sparse machine learning models by effectively exploring the intrinsic sparsity of the solutions, wherein only a sequence of reduced problems with much smaller sizes need to be solved. We further apply the proposed AS strategy to generate solution paths for large-scale sparse optimization problems efficiently. We establish the theoretical guarantees for the proposed AS strategy including its finite termination property. Extensive numerical experiments are presented in this paper to demonstrate the effectiveness and flexibility of the AS strategy to solve large-scale machine learning models.

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.14522 [pdf, other]

Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning

Authors: Kuangyu Ding, Jingyang Li, Kim-Chuan Toh

Abstract: The widely used stochastic gradient methods for minimizing nonconvex composite objective functions require the Lipschitz smoothness of the differentiable part. But the requirement does not hold true for problem classes including quadratic inverse problems and training neural networks. To address this issue, we investigate a family of stochastic Bregman proximal gradient (SBPG) methods, which only require smooth adaptivity of the differentiable part. SBPG replaces the upper quadratic approximation used in SGD with the Bregman proximity measure, resulting in a better approximation model that captures the non-Lipschitz gradients of the nonconvex objective. We formulate the vanilla SBPG and establish its convergence properties in the nonconvex setting without finite-sum structure. Experimental results on quadratic inverse problems attest to the robustness of SBPG. Moreover, we propose a momentum-based version of SBPG (MSBPG) and prove it has improved convergence properties. We apply MSBPG to the training of deep neural networks with a polynomial kernel function, which ensures the smooth adaptivity of the loss function. Experimental results on representative benchmarks demonstrate the effectiveness and robustness of MSBPG in training neural networks. Since the additional computation cost of MSBPG compared with SGD is negligible in large-scale optimization, MSBPG can potentially be employed as a universal open-source optimizer in the future.

Submitted 29 June, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 37 pages

arXiv:2306.14196 [pdf, other]

A Highly Efficient Algorithm for Solving Exclusive Lasso Problems

Authors: Meixia Lin, Yancheng Yuan, Defeng Sun, Kim-Chuan Toh

Abstract: The exclusive lasso (also known as elitist lasso) regularizer has become popular recently due to its superior performance on intra-group feature selection. Its complex nature poses difficulties for the computation of high-dimensional machine learning models involving such a regularizer. In this paper, we propose a highly efficient dual Newton method based proximal point algorithm (PPDNA) for solving large-scale exclusive lasso models. As important ingredients, we systematically study the proximal mapping of the weighted exclusive lasso regularizer and the corresponding generalized Jacobian. These results also make popular first-order algorithms for solving exclusive lasso models more practical. Extensive numerical results are presented to demonstrate the superior performance of the PPDNA against other popular numerical algorithms for solving the exclusive lasso problems.

Submitted 25 June, 2023; originally announced June 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2009.08719

arXiv:2306.11003 [pdf, other]

Agent-based modeling of the COVID-19 pandemic in Florida

Authors: Alexander N. Pillai, Kok Ben Toh, Dianela Perdomo, Sanjana Bhargava, Arlin Stoltzfus, Ira M. Longini Jr., Carl A. B. Pearson, Thomas J. Hladish

Abstract: The onset of the COVID-19 pandemic drove a widespread, often uncoordinated effort by research groups to develop mathematical models of SARS-CoV-2 to study its spread and inform control efforts. The urgent demand for insight at the outset of the pandemic meant early models were typically either simple or repurposed from existing research agendas. Our group predominantly uses agent-based models (ABMs) to study fine-scale intervention scenarios. These high-resolution models are large, complex, require extensive empirical data, and are often more detailed than strictly necessary for answering qualitative questions like "Should we lockdown?" During the early stages of an extraordinary infectious disease crisis, particularly before clear empirical evidence is available, simpler models are more appropriate. As more detailed empirical evidence becomes available, however, and policy decisions become more nuanced and complex, fine-scale approaches like ours become more useful. In this manuscript, we discuss how our group navigated this transition as we modeled the pandemic. The role of modelers often included nearly real-time analysis, and the massive undertaking of adapting our tools quickly. We were often playing catch up with a firehose of evidence, while simultaneously struggling to do both academic research and real-time decision support, under conditions conducive to neither. By reflecting on our experiences of responding to the pandemic and what we learned from these challenges, we can better prepare for future demands.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2305.03938 [pdf, other]

Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees

Authors: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh

Abstract: In this paper, we present a comprehensive study on the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel two-timescale framework that adopts a two-timescale updating scheme, and prove its convergence properties under mild assumptions. Our proposed framework encompasses various popular Adam-family methods, providing convergence guarantees for these methods in training nonsmooth neural networks. Furthermore, we develop stochastic subgradient methods that incorporate gradient clipping techniques for training nonsmooth neural networks with heavy-tailed noise. Through our framework, we show that our proposed methods converge even when the evaluation noises are only assumed to be integrable. Extensive numerical experiments demonstrate the high efficiency and robustness of our proposed methods.

Submitted 19 February, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

Comments: 53 pages

arXiv:2305.03926 [pdf, other]

Trajectory-oriented optimization of stochastic epidemiological models

Authors: Arindam Fadikar, Mickael Binois, Nicholson Collier, Abby Stevens, Kok Ben Toh, Jonathan Ozik

Abstract: Epidemiological models must be calibrated to ground truth for downstream tasks such as producing forward projections or running what-if scenarios. The meaning of calibration changes in the case of a stochastic model, since output from such a model is generally described via an ensemble or a distribution. Each member of the ensemble is usually mapped to a random number seed (explicitly or implicitly). With the goal of finding not only the input parameter settings but also the random seeds that are consistent with the ground truth, we propose a class of Gaussian process (GP) surrogates along with an optimization strategy based on Thompson sampling. This Trajectory Oriented Optimization (TOO) approach produces actual trajectories close to the empirical observations instead of a set of parameter settings where only the mean simulation behavior matches the ground truth.

Submitted 13 September, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2304.10092 [pdf, ps, other]

A Riemannian Dimension-reduced Second Order Method with Application in Sensor Network Localization

Authors: Tianyun Tang, Kim-Chuan Toh, Nachuan Xiao, Yinyu Ye

Abstract: In this paper, we propose a cubic-regularized Riemannian optimization method (RDRSOM), which partially exploits the second order information and achieves the iteration complexity of $\mathcal{O}(1/\epsilon^{3/2})$. In order to reduce the per-iteration computational cost, we further propose a practical version of (RDRSOM), which is an extension of the well known Barzilai-Borwein method and achieves the iteration complexity of $\mathcal{O}(1/\epsilon^{3/2})$. We apply our method to solve a nonlinear formulation of the wireless sensor network localization problem whose feasible set is a Riemannian manifold that has not been considered in the literature before. Numerical experiments are conducted to verify the high efficiency of our algorithm compared to state-of-the-art Riemannian optimization methods and other nonlinear solvers.

Submitted 24 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: 19 pages

arXiv:2304.01467 [pdf, ps, other]

A Partial Exact Penalty Function Approach for Constrained Optimization

Authors: Nachuan Xiao, Xin Liu, Kim-Chuan Toh

Abstract: In this paper, we focus on a class of constrained nonlinear optimization problems (NLP), where some of its equality constraints define a closed embedded submanifold $\mathcal{M}$ in $\mathbb{R}^n$. Although NLP can be solved directly by various existing approaches for constrained optimization in Euclidean space, these approaches usually fail to recognize the manifold structure of $\mathcal{M}$. To achieve better efficiency by utilizing the manifold structure of $\mathcal{M}$ in directly applying these existing optimization approaches, we propose a partial penalty function approach for NLP. In our proposed penalty function approach, we transform NLP into the corresponding constraint dissolving problem (CDP) in the Euclidean space, where the constraints that define $\mathcal{M}$ are eliminated through exact penalization. We establish the relationships on the constraint qualifications between NLP and CDP, and prove that NLP and CDP have the same stationary points and KKT points in a neighborhood of the feasible region under mild conditions. Therefore, various existing optimization approaches developed for constrained optimization in the Euclidean space can be directly applied to solve NLP through CDP. Preliminary numerical experiments demonstrate that by dissolving the constraints that define $\mathcal{M}$, CDP gains superior computational efficiency when compared to directly applying existing optimization approaches to solve NLP, especially in high dimensional scenarios.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 27 pages

arXiv:2303.06893 [pdf, ps, other]

On proximal augmented Lagrangian based decomposition methods for dual block-angular convex composite programming problems

Authors: Kuang-Yu Ding, Xin-Yee Lam, Kim-Chuan Toh

Abstract: We design inexact proximal augmented Lagrangian based decomposition methods for convex composite programming problems with dual block-angular structures. Our methods are particularly well suited for convex quadratic programming problems arising from stochastic programming models. The algorithmic framework is based on the application of the abstract inexact proximal ADMM framework developed in [Chen, Sun, Toh, Math. Prog. 161:237--270] to the dual of the target problem, as well as the application of the recently developed symmetric Gauss-Seidel decomposition theorem for solving a proximal multi-block convex composite quadratic programming problem. The key issues in our algorithmic design are firstly in designing appropriate proximal terms to decompose the computation of the dual variable blocks of the target problem to make the subproblems in each iteration easier to solve, and secondly to develop novel numerical schemes to solve the decomposed subproblems efficiently. Our inexact augmented Lagrangian based decomposition methods have guaranteed convergence. We present an application of the proposed algorithms to the doubly nonnegative relaxations of uncapacitated facility location problems, as well as to two-stage stochastic optimization problems. We conduct numerous numerical experiments to evaluate the performance of our method against state-of-the-art solvers such as Gurobi and MOSEK. Moreover, our proposed algorithms also compare favourably to the well-known progressive hedging algorithm of Rockafellar and Wets.

Submitted 13 March, 2023; originally announced March 2023.

arXiv:2303.06599 [pdf, ps, other]

A feasible method for solving an SDP relaxation of the quadratic knapsack problem

Authors: Tianyun Tang, Kim-Chuan Toh

Abstract: In this paper, we consider an SDP relaxation of the quadratic knapsack problem (QKP). After using the Burer-Monteiro factorization, we get a non-convex optimization problem, whose feasible region is an algebraic variety. Although there might be non-regular points on the algebraic variety, we prove that the algebraic variety is a smooth manifold except for a trivial point for generic input data. We also analyze the local geometric properties of non-regular points on this algebraic variety. In order to maintain the equivalence between the SDP problem and its non-convex formulation, we derive a new rank condition under which these two problems are equivalent. This new rank condition can be much weaker than the classical rank condition if the coefficient matrix has certain special structures. We also prove that under an appropriate rank condition, any second order stationary point of the non-convex problem is also a global optimal solution without any regularity assumption. This result is distinguished from previous results based on LICQ-like smoothness assumptions. With all these theoretical properties, we design an algorithm that equips a manifold optimization method with a strategy to escape from non-optimal non-regular points. Our algorithm can also be used as a heuristic for solving the quadratic knapsack problem. Numerical experiments are conducted to verify the high efficiency and robustness of our algorithm as compared to other SDP solvers and a heuristic method based on dynamic programming. In particular, our algorithm is able to solve the SDP relaxation of a one-million dimensional QKP with a sparse cost matrix very accurately in about 20 minutes on a modest desktop computer.

Submitted 12 March, 2023; originally announced March 2023.

MSC Class: 90C06; 90C10; 90C22

arXiv:2303.05825 [pdf, ps, other]

A squared smoothing Newton method for semidefinite programming

Authors: Ling Liang, Defeng Sun, Kim-Chuan Toh

Abstract: This paper proposes a squared smoothing Newton method via the Huber smoothing function for solving semidefinite programming problems (SDPs). We first study the fundamental properties of the matrix-valued mapping defined upon the Huber function. Using these results and existing ones in the literature, we then conduct rigorous convergence analysis and establish convergence properties for the proposed algorithm. In particular, we show that the proposed method is well-defined and admits global convergence. Moreover, under suitable regularity conditions, i.e., the primal and dual constraint nondegenerate conditions, the proposed method is shown to have a superlinear convergence rate. To evaluate the practical performance of the algorithm, we conduct extensive numerical experiments for solving various classes of SDPs. Comparison with the state-of-the-art SDP solvers demonstrates that our method is also efficient for computing accurate solutions of SDPs.

Submitted 2 July, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: 49 pages

MSC Class: 90C06; 90C22; 90C25

arXiv:2302.08020 [pdf, other]

doi 10.1038/s41586-024-07131-7

All-Electrical Skyrmionic Bits in a Chiral Magnetic Tunnel Junction

Authors: Shaohai Chen, Pin Ho, James Lourembam, Alexander K. J. Toh, Jifei Huang, Xiaoye Chen, Hang Khume Tan, Sherry K. L. Yap, Royston J. J. Lim, Hui Ru Tan, T. S. Suraj, Yeow Teck Toh, Idayu Lim, Jing Zhou, Hong Jing Chung, Sze Ter Lim, Anjan Soumyanarayanan

Abstract: Topological spin textures such as magnetic skyrmions hold considerable promise as robust, nanometre-scale, mobile bits for sustainable computing. A longstanding roadblock to unleashing their potential is the absence of a device enabling deterministic electrical readout of individual spin textures. Here we present the wafer-scale realization of a nanoscale chiral magnetic tunnel junction (MTJ) hosting a single, ambient skyrmion. Using a suite of electrical and multi-modal imaging techniques, we show that the MTJ nucleates skyrmions of fixed polarity, whose large readout signal - 20-70% relative to uniform states - corresponds directly to skyrmion size. Further, the MTJ exploits complementary mechanisms to stabilize distinctly sized skyrmions at zero field, thereby realizing three nonvolatile electrical states. Crucially, it can write and delete skyrmions using current densities 1,000 times lower than state-of-the-art. These results provide a platform to incorporate readout and manipulation of skyrmionic bits across myriad device architectures, and a springboard to harness chiral spin textures for multi-bit memory and unconventional computing.

Submitted 15 February, 2023; originally announced February 2023.

Comments: 8 pages, 5 figures

Journal ref: Nature (2024) 627, 522

arXiv:2212.02698 [pdf, other]

CDOpt: A Python Package for a Class of Riemannian Optimization

Authors: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh

Abstract: Optimization over the embedded submanifold defined by constraints $c(x) = 0$ has attracted much interest over the past few decades due to its wide applications in various areas. Plenty of related optimization packages have been developed based on Riemannian optimization approaches, which rely on some basic geometrical materials of Riemannian manifolds, including retractions, vector transports, etc. These geometrical materials can be challenging to determine in general. Existing packages only accommodate a few well-known manifolds whose geometrical materials are easily accessible. For other manifolds which are not contained in these packages, the users have to develop the geometric materials by themselves. In addition, it is not always tractable to adopt advanced features from various state-of-the-art unconstrained optimization solvers to Riemannian optimization approaches. We introduce CDOpt (available at https://cdopt.github.io/), a user-friendly Python package for a class of Riemannian optimization problems. Based on constraint dissolving approaches, Riemannian optimization problems are transformed into their equivalent unconstrained counterparts in CDOpt. Therefore, solving Riemannian optimization problems through CDOpt directly benefits from various existing solvers and the rich expertise gained over decades for unconstrained optimization. Moreover, all the computations in CDOpt related to any manifold in question are conducted on its constraints expression, hence users can easily define new manifolds in CDOpt without any background on differential geometry. Furthermore, CDOpt extends the neural layers from PyTorch and Flax, thus allowing users to train manifold constrained neural networks directly by the solvers for unconstrained optimization. Extensive numerical experiments demonstrate that CDOpt is highly efficient and robust in solving various classes of Riemannian optimization problems.

Submitted 28 March, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: 31 pages

arXiv:2209.06175 [pdf, ps, other]

Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant

Authors: Ngoc Hoang Anh Mai, Victor Magron, Jean-Bernard Lasserre, Kim-Chuan Toh

Abstract: We consider polynomial optimization problems (POP) on a semialgebraic set contained in the nonnegative orthant (every POP on a compact set can be put in this format by a simple translation of the origin). Such a POP can be converted to an equivalent POP by squaring each variable. Using even symmetry and the concept of factor width, we propose a hierarchy of semidefinite relaxations based on the extension of Pólya's Positivstellensatz by Dickinson-Povh. As its distinguishing and crucial feature, the maximal matrix size of each resulting semidefinite relaxation can be chosen arbitrarily and in addition, we prove that the sequence of values returned by the new hierarchy converges to the optimal value of the original POP at the rate $O(\varepsilon^{-c})$ if the semialgebraic set has nonempty interior. When applied to (i) robustness certification of multi-layer neural networks and (ii) computation of positive maximal singular values, our method based on Pólya's Positivstellensatz provides better bounds and runs several hundred times faster than the standard Moment-SOS hierarchy.

Submitted 13 September, 2022; originally announced September 2022.

Comments: 39 pages, 15 tables
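
The squaring step mentioned in the abstract above can be written out as follows. This is only a sketch of the substitution itself; the symbols $f$, $g_j$, $m$, and $y$ are notation assumed here for illustration and are not taken from the paper.

```latex
% Substituting x_i = y_i^2 removes the sign constraint x >= 0:
\[
  \min_{x \in \mathbb{R}^n} \bigl\{ f(x) : g_j(x) \ge 0,\ j = 1,\dots,m,\ x \ge 0 \bigr\}
  \;=\;
  \min_{y \in \mathbb{R}^n} \bigl\{ f(y_1^2,\dots,y_n^2) : g_j(y_1^2,\dots,y_n^2) \ge 0,\ j = 1,\dots,m \bigr\}.
\]
% Every polynomial on the right-hand side is even in each variable y_i,
% which is the even symmetry that the hierarchy exploits.
```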

arXiv:2208.07514 [pdf, other]

On Efficient and Scalable Computation of the Nonparametric Maximum Likelihood Estimator in Mixture Models

Authors: Yangjing Zhang, Ying Cui, Bodhisattva Sen, Kim-Chuan Toh

Abstract: In this paper we study the computation of the nonparametric maximum likelihood estimator (NPMLE) in multivariate mixture models. Our first approach discretizes this infinite dimensional convex optimization problem by fixing the support points of the NPMLE and optimizing over the mixture proportions. In this context we propose, leveraging the sparsity of the solution, an efficient and scalable semismooth Newton based augmented Lagrangian method (ALM). Our algorithm beats the state-of-the-art methods~\cite{koenker2017rebayes, kim2020fast} and can handle $n \approx 10^6$ data points with $m \approx 10^4$ support points. Our second procedure, which combines the expectation-maximization (EM) algorithm with the ALM approach above, allows for joint optimization of both the support points and the probability weights. For both our algorithms we provide formal results on their (superlinear) convergence properties. The computed NPMLE can be immediately used for denoising the observations in the framework of empirical Bayes. We propose new denoising estimands in this context along with their consistent estimates. Extensive numerical experiments are conducted to illustrate the effectiveness of our methods. In particular, we employ our procedures to analyze two astronomy data sets: (i) Gaia-TGAS Catalog~\cite{anderson2018improving} containing $n \approx 1.4 \times 10^6$ data points in two dimensions, and (ii) the $d=19$ dimensional data set from the APOGEE survey~\cite{majewski2017apache} with $n \approx 2.7 \times 10^4$.

Submitted 15 August, 2022; originally announced August 2022.

Journal ref: Journal of Machine Learning Research, 25 (2024), pp. 1-46

arXiv:2208.00732 [pdf, ps, other]

An Improved Unconstrained Approach for Bilevel Optimization

Authors: Xiaoyin Hu, Nachuan Xiao, Xin Liu, Kim-Chuan Toh

Abstract: In this paper, we focus on the nonconvex-strongly-convex bilevel optimization problem (BLO). In this BLO, the objective function of the upper-level problem is nonconvex and possibly nonsmooth, and the lower-level problem is smooth and strongly convex with respect to the underlying variable $y$. We show that the feasible region of BLO is a Riemannian manifold. Then we transform BLO to its corresponding unconstrained constraint dissolving problem (CDB), whose objective function is explicitly formulated from the objective functions in BLO. We prove that BLO is equivalent to the unconstrained optimization problem CDB. Therefore, various efficient unconstrained approaches, together with their theoretical results, can be directly applied to BLO through CDB. We propose a unified framework for developing subgradient-based methods for CDB. Remarkably, we show that several existing efficient algorithms can fit the unified framework and be interpreted as descent algorithms for CDB. These examples further demonstrate the great potential of our proposed approach.

Submitted 23 December, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: 27 pages, revised version

MSC Class: 15A18; 65F15; 65K05; 90C06

arXiv:2205.14922 [pdf, other]

ACIL: Analytic Class-Incremental Learning with Absolute Memorization and Privacy Protection

Authors: Huiping Zhuang, Zhenyu Weng, Hongxin Wei, Renchunzi Xie, Kar-Ann Toh, Zhiping Lin

Abstract: Class-incremental learning (CIL) learns a classification model with training data of different classes arising progressively. Existing CIL either suffers from serious accuracy loss due to catastrophic forgetting, or invades data privacy by revisiting used exemplars. Inspired by linear learning formulations, we propose an analytic class-incremental learning (ACIL) with absolute memorization of past knowledge while avoiding breaching of data privacy (i.e., without storing historical data). The absolute memorization is demonstrated in the sense that class-incremental learning using ACIL given present data would give identical results to that from its joint-learning counterpart which consumes both present and historical samples. This equality is theoretically validated. Data privacy is ensured since no historical data are involved during the learning process. Empirical validations demonstrate ACIL's competitive accuracy performance with near-identical results for various incremental task settings (e.g., 5-50 phases). This also allows ACIL to outperform the state-of-the-art methods for large-phase scenarios (e.g., 25 and 50 phases).

Submitted 10 December, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: published in NeurIPS 2022

arXiv:2205.10500 [pdf, other]

A Constraint Dissolving Approach for Nonsmooth Optimization over the Stiefel Manifold

Authors: Xiaoyin Hu, Nachuan Xiao, Xin Liu, Kim-Chuan Toh

Abstract: This paper focuses on the minimization of a possibly nonsmooth objective function over the Stiefel manifold. The existing approaches either lack efficiency or can only tackle prox-friendly objective functions. We propose a constraint dissolving function named NCDF and show that it has the same first-order stationary points and local minimizers as the original problem in a neighborhood of the Stiefel manifold. Furthermore, we show that the Clarke subdifferential of NCDF is easy to obtain from the Clarke subdifferential of the objective function. Therefore, various existing approaches for unconstrained nonsmooth optimization can be directly applied to nonsmooth optimization problems over the Stiefel manifold. We propose a framework for developing subgradient-based methods and establish their convergence properties based on prior works. Furthermore, based on our proposed framework, we can develop efficient approaches for optimization over the Stiefel manifold. Preliminary numerical experiments further highlight that the proposed constraint dissolving approach yields efficient and direct implementations of various unconstrained approaches to nonsmooth optimization problems over the Stiefel manifold.

Submitted 20 January, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

Comments: Revised version, 26 pages

arXiv:2204.14067 [pdf, other]

Accelerating nuclear-norm regularized low-rank matrix optimization through Burer-Monteiro decomposition

Authors: Ching-pei Lee, Ling Liang, Tianyun Tang, Kim-Chuan Toh

Abstract: This work proposes a rapid algorithm, BM-Global, for nuclear-norm-regularized convex and low-rank matrix optimization problems. BM-Global efficiently decreases the objective value via low-cost steps leveraging the nonconvex but smooth Burer-Monteiro (BM) decomposition, while effectively escaping saddle points and spurious local minima ubiquitous in the BM form to obtain guarantees of fast convergence rates to the global optima of the original nuclear-norm-regularized problem through aperiodic inexact proximal gradient steps on it. The proposed approach adaptively adjusts the rank for the BM decomposition and can provably identify an optimal rank for the BM decomposition problem automatically in the course of optimization through tools of manifold identification. BM-Global hence also spends significantly less time on parameter tuning than existing matrix-factorization methods, which require an exhaustive search for finding this optimal rank. Extensive experiments on real-world large-scale problems of recommendation systems, regularized kernel estimation, and molecular conformation confirm that BM-Global can indeed effectively escape spurious local minima at which existing BM approaches are stuck, and is an order of magnitude faster than state-of-the-art algorithms for low-rank matrix optimization problems involving a nuclear-norm regularizer.

Submitted 13 January, 2023; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: 51 pages, including 16 pages of supplementary materials
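
As background for the BM-Global entry above, the link between the nuclear norm and a Burer-Monteiro factorization $X = UV^\top$ rests on the standard identity $\|X\|_* = \min_{X = UV^\top} \tfrac{1}{2}(\|U\|_F^2 + \|V\|_F^2)$. The sketch below only checks this identity numerically with NumPy on a random matrix; it is not the BM-Global algorithm, and the helper name `balanced_factors` is hypothetical.

```python
import numpy as np

def balanced_factors(X):
    """Return (U, V) with X = U @ V.T, built from the SVD of X.

    Splitting each singular value evenly between the two factors
    attains the minimum in the variational form of the nuclear norm.
    """
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s)
    return P * root, Qt.T * root  # scale the columns of P and of Q

# Random low-rank test matrix (rank at most 3); purely illustrative data.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))

U, V = balanced_factors(X)
nuclear_norm = np.linalg.norm(X, ord="nuc")
factored_value = 0.5 * (np.linalg.norm(U, "fro") ** 2 + np.linalg.norm(V, "fro") ** 2)

print(np.allclose(U @ V.T, X), np.isclose(nuclear_norm, factored_value))
```

The factored objective is smooth in $(U, V)$ but nonconvex, which is why methods working in the BM form need a separate mechanism to rule out saddle points and spurious local minima.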

arXiv:2203.10319 [pdf, ps, other]

Dissolving Constraints for Riemannian Optimization

Authors: Nachuan Xiao, Xin Liu, Kim-Chuan Toh

Abstract: In this paper, we consider optimization problems over closed embedded submanifolds of $\mathbb{R}^n$, which are defined by the constraints $c(x) = 0$. We propose a class of constraint dissolving approaches for these Riemannian optimization problems. In these proposed approaches, solving a Riemannian optimization problem is transferred into the unconstrained minimization of a constraint dissolving function named CDF. Different from existing exact penalty functions, the exact gradient and Hessian of CDF are easy to compute. We study the theoretical properties of CDF and prove that the original problem and CDF have the same first-order and second-order stationary points, local minimizers, and Łojasiewicz exponents in a neighborhood of the feasible region. Remarkably, the convergence properties of our proposed constraint dissolving approaches can be directly inherited from the existing rich results in unconstrained optimization. Therefore, the proposed constraint dissolving approaches build up short cuts from unconstrained optimization to Riemannian optimization. Several illustrative examples further demonstrate the potential of our proposed constraint dissolving approaches.

Submitted 14 October, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

Comments: 38 pages

arXiv:2203.09721 [pdf, ps, other]

Deterministic Bridge Regression for Compressive Classification

Authors: Kar-Ann Toh, Giuseppe Molteni, Zhiping Lin

Abstract: Pattern classification with compact representation is an important component in machine intelligence. In this work, an analytic bridge solution is proposed for compressive classification. The proposal has been based upon solving a penalized error formulation utilizing an approximated $\ell_p$-norm. The solution comes in a primal form for over-determined systems and in a dual form for under-determined systems. While the primal form is suitable for problems of low dimension with large data samples, the dual form is suitable for problems of high dimension but with a small number of data samples. The solution has also been extended for problems with multiple classification outputs. Numerical studies based on simulated and real-world data validated the effectiveness of the proposed solution.

Submitted 17 March, 2022; originally announced March 2022.
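
The primal/dual distinction described in the entry above can be illustrated with ordinary ridge regression. This is a stand-in chosen here for simplicity (an $\ell_2$ penalty), not the approximated $\ell_p$-norm formulation of the paper; the function names and the value of `lam` are hypothetical. The primal form solves a $d \times d$ system (cheap when there are few features), while the dual form solves an $n \times n$ system (cheap when there are few samples), and both give the same weights.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Solve (X^T X + lam I_d) w = X^T y; a d-by-d linear system."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_dual(X, y, lam):
    """Compute w = X^T (X X^T + lam I_n)^{-1} y; an n-by-n linear system."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

rng = np.random.default_rng(1)

# Over-determined case: many samples, few features.
X_tall, y_tall = rng.standard_normal((50, 4)), rng.standard_normal(50)
# Under-determined case: few samples, many features.
X_wide, y_wide = rng.standard_normal((4, 50)), rng.standard_normal(4)

same_tall = np.allclose(ridge_primal(X_tall, y_tall, 0.5), ridge_dual(X_tall, y_tall, 0.5))
same_wide = np.allclose(ridge_primal(X_wide, y_wide, 0.5), ridge_dual(X_wide, y_wide, 0.5))
print(same_tall, same_wide)
```

In practice one would call only the form whose linear system is smaller, which matches the abstract's pairing of the primal form with low-dimensional, large-sample problems and the dual form with high-dimensional, small-sample ones.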

arXiv:2202.06504 [pdf, other]

Analytic Learning of Convolutional Neural Network For Pattern Recognition

Authors: Huiping Zhuang, Zhiping Lin, Yimin Yang, Kar-Ann Toh

Abstract: Training convolutional neural networks (CNNs) with back-propagation (BP) is time-consuming and resource-intensive particularly in view of the need to visit the dataset multiple times. In contrast, analytic learning attempts to obtain the weights in one epoch. However, existing attempts to analytic learning considered only the multilayer perceptron (MLP). In this article, we propose an analytic convolutional neural network learning (ACnnL). Theoretically we show that ACnnL builds a closed-form solution similar to its MLP counterpart, but differs in their regularization constraints. Consequently, we are able to answer to a certain extent why CNNs usually generalize better than MLPs from the implicit regularization point of view. The ACnnL is validated by conducting classification tasks on several benchmark datasets. It is encouraging that the ACnnL trains CNNs in a significantly fast manner with reasonably close prediction accuracies to those using BP. Moreover, our experiments disclose a unique advantage of ACnnL under the small-sample scenario when training data are scarce or expensive.

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2112.04256 [pdf, ps, other]

Solving graph equipartition SDPs on an algebraic variety

Authors: Tianyun Tang, Kim-Chuan Toh

Abstract: Semidefinite programs are generally challenging to solve due to their high dimensionality. Burer and Monteiro developed a non-convex approach to solve linear SDP problems by applying its low rank property. Their approach is fast because they used factorization to reduce the problem size. In this paper, we focus on solving the SDP relaxation of a graph equipartition problem, which involves an additional semidefinite upper bound constraint over the traditional linear SDP. By applying the factorization approach, we get a non-convex problem with an additional non-smooth spectral inequality constraint. We discuss when the non-convex problem is equivalent to the original SDP, and when a second order stationary point of the non-convex problem is also a global minimum. Our results generalize previous works on smooth non-convex factorization approaches for linear SDP to the non-smooth case. Moreover, the constraints of the non-convex problem involve an algebraic variety with some conducive properties that allow us to use Riemannian optimization techniques and non-convex augmented Lagrangian method to solve the SDP problem very efficiently with certified global optimality.

Submitted 2 August, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: 44 pages, 0 figure

MSC Class: 90C06; 90C22; 90C27

arXiv:2109.05690 [pdf, other]

Inexact Bregman Proximal Gradient Method and its Inertial Variant with Absolute and Relative Stopping Criteria

Authors: Lei Yang, Kim-Chuan Toh

Abstract: The Bregman proximal gradient method (BPGM), which uses the Bregman distance as a proximity measure in the iterative scheme, has recently been re-developed for minimizing convex composite problems \textit{without} the global Lipschitz gradient continuity assumption. This makes the BPGM appealing for a wide range of applications, and hence it has received growing attention in recent years. However, most existing convergence results are only obtained under the assumption that the involved subproblems are solved \textit{exactly}, which is not realistic in many applications. For the BPGM to be implementable and practical, in this paper, we develop inexact versions of the BPGM (denoted by iBPGM) by employing either an absolute-type stopping criterion or a relative-type stopping criterion for solving the subproblems. The iteration complexity of $\mathcal{O}(1/k)$ and the convergence of the sequence are also established for our iBPGM under some conditions. Moreover, we develop an inertial variant of our iBPGM (denoted by v-iBPGM) and establish the iteration complexity of $\mathcal{O}(1/k^γ)$, where $γ\geq1$ is a restricted relative smoothness exponent. When the smooth part in the objective has a Lipschitz continuous gradient and the kernel function is strongly convex, we have $γ=2$ and thus the v-iBPGM improves the iteration complexity of the iBPGM from $\mathcal{O}(1/k)$ to $\mathcal{O}(1/k^2)$, in accordance with the existing results on the exact accelerated BPGM. Finally, some preliminary numerical experiments for solving the discrete quadratic regularized optimal transport problem are conducted to illustrate the convergence behaviors of our iBPGM and v-iBPGM under different inexactness settings.

Submitted 23 October, 2023; v1 submitted 12 September, 2021; originally announced September 2021.

arXiv:2109.05251 [pdf, other]

DC algorithms for a class of sparse group $\ell_0$ regularized optimization problems

Authors: Wenjing Li, Wei Bian, Kim-Chuan Toh

Abstract: In this paper, we consider a class of sparse group $\ell_0$ regularized optimization problems. Firstly, we give a continuous relaxation model of the considered problem and establish the equivalence of these two problems in the sense of global minimizers. Then, we define a class of stationary points of the relaxation problem, and prove that any defined stationary point is a local minimizer of the considered sparse group $\ell_0$ regularized problem and satisfies a desirable property of its global minimizers. Further, based on the difference-of-convex (DC) structure of the relaxation problem, we design two DC algorithms to solve the relaxation problem. We prove that any accumulation point of the iterates generated by them is a stationary point of the relaxation problem. In particular, all accumulation points have a common support set and a unified lower bound for the nonzero entries, and their zero entries can be attained within finite iterations. Moreover, we prove the convergence of the entire iterates generated by the proposed algorithms. Finally, we give some numerical experiments to show the efficiency of the proposed algorithms.

Submitted 5 May, 2022; v1 submitted 11 September, 2021; originally announced September 2021.

arXiv:2109.03632 [pdf, other]

On Regularized Square-root Regression Problems: Distributionally Robust Interpretation and Fast Computations

Authors: Hong T. M. Chu, Kim-Chuan Toh, Yangjing Zhang

Abstract: Square-root (loss) regularized models have recently become popular in linear regression due to their nice statistical properties. Moreover, some of these models can be interpreted as the distributionally robust optimization counterparts of the traditional least-squares regularized models. In this paper, we give a unified proof to show that any square-root regularized model whose penalty function is the sum of a simple norm and a seminorm can be interpreted as the distributionally robust optimization (DRO) formulation of the corresponding least-squares problem. In particular, the optimal transport cost in the DRO formulation is given by a certain dual form of the penalty. To solve the resulting square-root regularized model whose loss function and penalty function are both nonsmooth, we design a proximal point dual semismooth Newton algorithm and demonstrate its efficiency when the penalty is the sparse group Lasso penalty or the fused Lasso penalty. Extensive experiments demonstrate that our algorithm is highly efficient for solving the square-root sparse group Lasso problems and the square-root fused Lasso problems.

Submitted 5 October, 2023; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: 39 pages, 7 figures

Journal ref: Journal of Machine Learning Research, 23 (2022), pp 1-39

arXiv:2108.07462 [pdf, ps, other]

A Dimension Reduction Technique for Large-scale Structured Sparse Optimization Problems with Application to Convex Clustering

Authors: Yancheng Yuan, Tsung-Hui Chang, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we propose a novel adaptive sieving (AS) technique and an enhanced AS (EAS) technique, which are solver independent and could accelerate optimization algorithms for solving large scale convex optimization problems with intrinsic structured sparsity. We establish the finite convergence property of the AS technique and the EAS technique with inexact solutions of the reduced subproblems. As an important application, we apply the AS technique and the EAS technique on the convex clustering model, which could accelerate the state-of-the-art algorithm SSNAL by more than 7 times and the algorithm ADMM by more than 14 times.

Submitted 17 August, 2021; originally announced August 2021.

MSC Class: 90C06; 90C25; 90C90

arXiv:2105.14033 [pdf, other]

An Inexact Projected Gradient Method with Rounding and Lifting by Nonlinear Programming for Solving Rank-One Semidefinite Relaxation of Polynomial Optimization

Authors: Heng Yang, Ling Liang, Luca Carlone, Kim-Chuan Toh

Abstract: We consider solving high-order semidefinite programming (SDP) relaxations of nonconvex polynomial optimization problems (POPs) that often admit degenerate rank-one optimal solutions. Instead of solving the SDP alone, we propose a new algorithmic framework that blends local search using the nonconvex POP into global descent using the convex SDP. In particular, we first design a globally convergent inexact projected gradient method (iPGM) for solving the SDP that serves as the backbone of our framework. We then accelerate iPGM by taking long, but safeguarded, rank-one steps generated by fast nonlinear programming algorithms. We prove that the new framework is still globally convergent for solving the SDP. To solve the iPGM subproblem of projecting a given point onto the feasible set of the SDP, we design a two-phase algorithm with phase one using a symmetric Gauss-Seidel based accelerated proximal gradient method (sGS-APG) to generate a good initial point, and phase two using a modified limited-memory BFGS (L-BFGS) method to obtain an accurate solution. We analyze the convergence for both phases and establish a novel global convergence result for the modified L-BFGS that does not require the objective function to be twice continuously differentiable. We conduct numerical experiments for solving second-order SDP relaxations arising from a diverse set of POPs. Our framework demonstrates state-of-the-art efficiency, scalability, and robustness in solving degenerate rank-one SDPs to high accuracy, even in the presence of millions of equality constraints.

Submitted 26 October, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: Code available at https://github.com/MIT-SPARK/STRIDE

MSC Class: 90C06; 90C22; 90C23; 90C55

arXiv:2105.10370 [pdf, other]

Bregman Proximal Point Algorithm Revisited: A New Inexact Version and its Inertial Variant

Authors: Lei Yang, Kim-Chuan Toh

Abstract: We study a general convex optimization problem, which covers various classic problems in different areas and particularly includes many optimal transport related problems arising in recent years. To solve this problem, we revisit the classic Bregman proximal point algorithm (BPPA) and introduce a new inexact stopping condition for solving the subproblems, which can circumvent the underlying feasibility difficulty often appearing in existing inexact conditions when the problem has a complex feasible set. Our inexact condition also covers several existing inexact conditions as special cases and hence makes our inexact BPPA (iBPPA) more flexible to fit different scenarios in practice. Moreover, inspired by Nesterov's acceleration technique, we develop an inertial variant of our iBPPA, denoted by V-iBPPA, and establish the iteration complexity of $O(1/k^λ)$, where $λ\geq1$ is a quadrangle scaling exponent of the kernel function. In particular, when the proximal parameter is a constant and the kernel function is strongly convex with Lipschitz continuous gradient (hence $λ=2$), our V-iBPPA achieves a faster rate of $O(1/k^2)$ just as existing accelerated inexact proximal point algorithms. Some preliminary numerical experiments for solving the standard OT problem are conducted to show the convergence behaviors of our iBPPA and V-iBPPA under different inexactness settings. The experiments also empirically verify the potential of our V-iBPPA on improving the convergence speed.

Submitted 16 May, 2022; v1 submitted 21 May, 2021; originally announced May 2021.

arXiv:2103.13108 [pdf, ps, other]

QPPAL: A two-phase proximal augmented Lagrangian method for high dimensional convex quadratic programming problems

Authors: Ling Liang, Xudong Li, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we aim to solve high dimensional convex quadratic programming (QP) problems with a large number of quadratic terms, linear equality and inequality constraints. In order to solve the targeted QP problems to a desired accuracy efficiently, we develop a two-phase Proximal Augmented Lagrangian method (QPPAL), with Phase I to generate a reasonably good initial point to warm start Phase II to obtain an accurate solution efficiently. More specifically, in Phase I, based on the recently developed symmetric Gauss-Seidel (sGS) decomposition technique, we design a novel sGS based semi-proximal augmented Lagrangian method for the purpose of finding a solution of low to medium accuracy. Then, in Phase II, a proximal augmented Lagrangian algorithm is proposed to obtain a more accurate solution efficiently. Extensive numerical results evaluating the performance of QPPAL against existing state-of-the-art solvers Gurobi, OSQP and QPALM are presented to demonstrate the high efficiency and robustness of our proposed algorithm for solving various classes of large-scale convex QP problems. The MATLAB implementation of the software package QPPAL is available at: https://blog.nus.edu.sg/mattohkc/softwares/qppal/.

Submitted 28 January, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: 28 pages, 4 figures

MSC Class: 90C06; 90C22; 90C25

Showing 1–50 of 113 results for author: Toh, K