
QPPAL: A two-phase proximal augmented Lagrangian method for high dimensional convex quadratic programming problems

Authors: Ling Liang, Xudong Li, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we aim to solve high dimensional convex quadratic programming (QP) problems with a large number of quadratic terms and linear equality and inequality constraints. To solve the targeted QP problems to a desired accuracy efficiently, we develop a two-phase Proximal Augmented Lagrangian method (QPPAL), in which Phase I generates a reasonably good initial point to warm-start Phase II, which then obtains an accurate solution efficiently. More specifically, in Phase I, based on the recently developed symmetric Gauss-Seidel (sGS) decomposition technique, we design a novel sGS based semi-proximal augmented Lagrangian method for the purpose of finding a solution of low to medium accuracy. Then, in Phase II, a proximal augmented Lagrangian algorithm is proposed to obtain a more accurate solution efficiently. Extensive numerical results evaluating the performance of QPPAL against the existing state-of-the-art solvers Gurobi, OSQP and QPALM are presented to demonstrate the high efficiency and robustness of our proposed algorithm for solving various classes of large-scale convex QP problems. The MATLAB implementation of the software package QPPAL is available at: https://blog.nus.edu.sg/mattohkc/softwares/qppal/

Submitted 28 January, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: 28 pages, 4 figures

MSC Class: 90C06; 90C22; 90C25
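The symmetric Gauss-Seidel (sGS) ordering used in Phase I can be seen in its simplest form on an unconstrained convex quadratic $\min_x \tfrac12 x^\top Q x - b^\top x$ with $Q$ positive definite, treating each coordinate as a block. One sGS cycle is a backward Gauss-Seidel pass over blocks $n, \dots, 2$ followed by a forward pass over blocks $1, \dots, n$. The pure-Python sketch below only illustrates that ordering: the helper names `sgs_sweep` and `residual` are hypothetical, and an sGS based augmented Lagrangian method applies the same ordering to the variable blocks of each subproblem, which also involve constraints not modeled here.

```python
def sgs_sweep(Q, b, x):
    """One symmetric Gauss-Seidel cycle for min 0.5*x'Qx - b'x (Q symmetric positive definite).

    Each coordinate is its own block and is minimized exactly while the
    other coordinates are held fixed. x is updated in place and returned.
    """
    n = len(b)

    def update(i):
        # Exact minimization in coordinate i: Q[i][i]*x_i = b_i - sum_{j != i} Q[i][j]*x_j
        s = sum(Q[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / Q[i][i]

    for i in range(n - 1, 0, -1):  # backward pass: blocks n, ..., 2
        update(i)
    for i in range(n):             # forward pass: blocks 1, ..., n
        update(i)
    return x


def residual(Q, b, x):
    """Infinity norm of Qx - b, i.e. of the gradient of the quadratic at x."""
    n = len(x)
    return max(abs(sum(Q[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]
for _ in range(100):
    x = sgs_sweep(Q, b, x)
print(residual(Q, b, x))
```

Repeating the cycle drives the gradient $Qx - b$ to zero for any symmetric positive definite $Q$; the backward pass is what makes each cycle a symmetric, rather than one-directional, Gauss-Seidel step.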

arXiv:2102.03705

doi 10.1016/j.automatica.2021.110007

An Analytic Layer-wise Deep Learning Framework with Applications to Robotics

Authors: Huu-Thiet Nguyen, Chien Chern Cheah, Kar-Ann Toh

Abstract: Deep learning (DL) has achieved great success in many applications, but it has been less well analyzed from the theoretical perspective. The unexplainable success of black-box DL models has raised questions among scientists and promoted the emergence of the field of explainable artificial intelligence (XAI). In robotics, it is particularly important to deploy DL algorithms in a predictable and stable manner, as robots are active agents that need to interact safely with the physical world. This paper presents an analytic deep learning framework for fully connected neural networks, which can be applied to both regression problems and classification problems. Examples of regression and classification problems include online robot control and robot vision. We present two layer-wise learning algorithms such that the convergence of the learning systems can be analyzed. Firstly, an inverse layer-wise learning algorithm for multilayer networks with convergence analysis for each layer is presented to understand the problems of layer-wise deep learning. Secondly, a forward progressive learning algorithm, in which the deep networks are built progressively by using single hidden layer networks, is developed to achieve better accuracy. It is shown that the progressive learning method can be used for fine-tuning of weights from a convergence point of view. The effectiveness of the proposed framework is illustrated on classical benchmark recognition tasks using the MNIST and CIFAR-10 datasets, and the results show a good balance between performance and explainability.
The proposed method is subsequently applied to online learning of robot kinematics, and experimental results on kinematic control of a UR5e robot with an unknown model are presented.

Submitted 24 August, 2023; v1 submitted 6 February, 2021; originally announced February 2021.

Comments: The paper has been published in Automatica

Journal ref: Automatica, vol. 135, Jan. 2022

arXiv:2101.09629

doi 10.12752/8130

Solving Challenging Large Scale QAPs

Authors: Koichi Fujii, Naoki Ito, Sunyoung Kim, Masakazu Kojima, Yuji Shinano, Kim-Chuan Toh

Abstract: We report our progress on the project for solving larger scale quadratic assignment problems (QAPs). Our main approach to solving large scale NP-hard combinatorial optimization problems such as QAPs is a parallel branch-and-bound method efficiently implemented on a powerful computer system using the Ubiquity Generator (UG) framework, which can utilize more than 100,000 cores. Lower bounding procedures incorporated in the branch-and-bound method play a crucial role in solving the problems. For a strong lower bounding procedure, we employ the Lagrangian doubly nonnegative (DNN) relaxation and the Newton-bracketing method developed by the authors' group. In this report, we describe some basic tools used in the project, including the lower bounding procedure and branching rules, and present some preliminary numerical results. Our next target is QAPs with dimension at least 50, as we have succeeded in solving tai30a and sko42 from QAPLIB for the first time.

Submitted 23 January, 2021; originally announced January 2021.

Comments: 15 pages

Report number: ZIB-Report (21-02)

MSC Class: 90C20; 90C22

arXiv:2012.04862

An augmented Lagrangian method with constraint generation for shape-constrained convex regression problems

Authors: Meixia Lin, Defeng Sun, Kim-Chuan Toh

Abstract: The shape-constrained convex regression problem deals with fitting a convex function to the observed data, where additional constraints are imposed, such as component-wise monotonicity and uniform Lipschitz continuity. This paper provides a unified framework for computing the least squares estimator of a multivariate shape-constrained convex regression function in $\mathbb{R}^d$. We prove that the least squares estimator is computable via solving an essentially constrained convex quadratic programming (QP) problem with $(d+1)n$ variables, $n(n-1)$ linear inequality constraints and $n$ possibly non-polyhedral inequality constraints, where $n$ is the number of data points. To efficiently solve the generally very large-scale convex QP, we design a proximal augmented Lagrangian method (proxALM) whose subproblems are solved by the semismooth Newton method (SSN). To further accelerate the computation when $n$ is huge, we design a practical implementation of the constraint generation method such that each reduced problem is efficiently solved by our proposed proxALM. Comprehensive numerical experiments, including those in the pricing of basket options and estimation of production functions in economics, demonstrate that our proposed proxALM outperforms the state-of-the-art algorithms, and the proposed acceleration technique further shortens the computation time by a large margin.

Submitted 20 November, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2002.11410
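The $n(n-1)$ linear inequality constraints come from requiring that the fitted values $\theta_i$ and subgradients $\xi_i$ at the data points $X_i$ be consistent with a convex function: $\theta_j \ge \theta_i + \langle \xi_i, X_j - X_i \rangle$ for every ordered pair $i \neq j$. A constraint generation scheme keeps only some of these pairs in each reduced problem and then searches for violated ones. The pure-Python sketch below performs that check and returns the violated pairs from most to least violated; the helper name `convexity_violations` is hypothetical, and the extra shape constraints (monotonicity, Lipschitz bounds) are omitted.

```python
def convexity_violations(X, theta, xi, tol=0.0):
    """Return (gap, i, j) for every violated constraint
    theta[j] >= theta[i] + <xi[i], X[j] - X[i]>, largest gap first.

    X[i] and xi[i] are length-d lists; theta[i] is a float.
    """
    n = len(X)
    violated = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Value at X[j] of the affine minorant supported at X[i]
            support = theta[i] + sum(g * (p - q) for g, p, q in zip(xi[i], X[j], X[i]))
            gap = support - theta[j]
            if gap > tol:
                violated.append((gap, i, j))
    violated.sort(reverse=True)
    return violated


# Values and subgradients of f(x) = x^2 at -1, 0, 1 satisfy all n(n-1) = 6 constraints.
X = [[-1.0], [0.0], [1.0]]
print(convexity_violations(X, [1.0, 0.0, 1.0], [[-2.0], [0.0], [2.0]]))  # []
# A "peak" in the middle is not convex: the pairs (1, 0) and (1, 2) are violated.
print(convexity_violations(X, [0.0, 1.0, 0.0], [[0.0], [0.0], [0.0]]))
```

In a constraint generation loop, the most violated pairs returned here would be added to the next reduced problem.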

arXiv:2012.03747

Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization

Authors: Huiping Zhuang, Zhiping Lin, Kar-Ann Toh

Abstract: Decoupled learning is a branch of model parallelism which parallelizes the training of a network by splitting it depth-wise into multiple modules. Techniques from decoupled learning usually suffer from the stale gradient effect because of their asynchronous implementation, which degrades performance. In this paper, we propose accumulated decoupled learning (ADL), which incorporates the gradient accumulation technique to mitigate the stale gradient effect. We give both theoretical and empirical evidence regarding how the gradient staleness can be reduced. We prove that the proposed method can converge to critical points, i.e., the gradients converge to 0, in spite of its asynchronous nature. Empirical validation is provided by training deep convolutional neural networks to perform classification tasks on the CIFAR-10 and ImageNet datasets. ADL is shown to outperform several state-of-the-art methods in the classification tasks, and is the fastest among the compared methods.

Submitted 3 December, 2020; originally announced December 2020.

arXiv:2011.14312

An efficient implementable inexact entropic proximal point algorithm for a class of linear programming problems

Authors: Hong T. M. Chu, Ling Liang, Kim-Chuan Toh, Lei Yang

Abstract: We introduce a class of specially structured linear programming (LP) problems, which has favorable modeling capability for important application problems in different areas such as optimal transport, discrete tomography and economics. To solve these generally large-scale LP problems efficiently, we design an implementable inexact entropic proximal point algorithm (iEPPA) combined with an easy-to-implement dual block coordinate descent method as a subsolver. Unlike existing entropy-type proximal point algorithms, our iEPPA employs a more practically checkable stopping condition for solving the associated subproblems while achieving provable convergence. Moreover, when solving the capacity constrained multi-marginal optimal transport (CMOT) problem (a special case of our LP problem), our iEPPA is able to bypass the numerical instability issues that often appear in the popular entropic regularization approach, since our algorithm does not require the proximal parameter to be very small in order to obtain an accurate approximate solution. Numerous numerical experiments show that our iEPPA is efficient and robust for solving large-scale CMOT problems. The experiments on the discrete tomography problem also highlight the potential modeling power of our model.

Submitted 23 April, 2022; v1 submitted 29 November, 2020; originally announced November 2020.

Comments: 28 pages, 6 figures
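The basic entropic proximal point iteration can be sketched on the standard two-marginal optimal transport LP $\min\{\langle C, X\rangle : X\mathbf{1} = a,\ X^\top\mathbf{1} = b,\ X \ge 0\}$, a much simpler special case than the CMOT problem treated in the paper. Each outer step solves $X^{k+1} = \arg\min\{\langle C, X\rangle + \eta\,\mathrm{KL}(X \,\|\, X^k)\}$ over the same constraints; its solution has the form $\mathrm{diag}(u)\,(X^k \odot e^{-C/\eta})\,\mathrm{diag}(v)$, and here the scalings $u, v$ are found by Sinkhorn iterations, which are a dual block coordinate method. The function name, the fixed iteration counts, and the choice of Sinkhorn as subsolver are assumptions for illustration; iEPPA's actual stopping condition and subsolver are those described in the paper.

```python
import math


def entropic_ppa_ot(C, a, b, eta=1.0, outer=30, inner=50):
    """Entropic proximal point method for the two-marginal OT linear program.

    Each outer iteration approximately solves
        min <C, X> + eta * KL(X || X_k)  s.t.  X 1 = a, X^T 1 = b,
    whose solution is diag(u) (X_k * exp(-C / eta)) diag(v); u and v are
    computed with a fixed number of Sinkhorn (alternating scaling) steps.
    """
    m, n = len(a), len(b)
    X = [[a[i] * b[j] for j in range(n)] for i in range(m)]  # strictly positive start
    for _ in range(outer):
        K = [[X[i][j] * math.exp(-C[i][j] / eta) for j in range(n)] for i in range(m)]
        u, v = [1.0] * m, [1.0] * n
        for _ in range(inner):
            u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
            v = [b[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
        X = [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]
    return X


C = [[0.0, 1.0], [1.0, 0.0]]
a = b = [0.5, 0.5]
X = entropic_ppa_ot(C, a, b)
cost = sum(C[i][j] * X[i][j] for i in range(2) for j in range(2))
print(cost)  # close to the LP optimum 0 (mass concentrates on the diagonal)
```

Even with the moderate value $\eta = 1$, the off-diagonal mass in this example shrinks by roughly a factor of $e$ per outer step, so the iterates approach the exact LP solution instead of the blurred plan that a single entropic solve with the same $\eta$ would return.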

arXiv:2010.11559

doi 10.1080/10556788.2023.2269594

Learning Graph Laplacian with MCP

Authors: Yangjing Zhang, Kim-Chuan Toh, Defeng Sun

Abstract: We consider the problem of learning a graph under the Laplacian constraint with a non-convex penalty: minimax concave penalty (MCP). For solving the MCP penalized graphical model, we design an inexact proximal difference-of-convex algorithm (DCA) and prove its convergence to critical points. We note that each subproblem of the proximal DCA enjoys the nice property that the objective function in its dual problem is continuously differentiable with a semismooth gradient. Therefore, we apply an efficient semismooth Newton method to subproblems of the proximal DCA. Numerical experiments on various synthetic and real data sets demonstrate the effectiveness of the non-convex penalty MCP in promoting sparsity. Compared with the existing state-of-the-art method, our method is demonstrated to be more efficient and reliable for learning graph Laplacian with MCP.

Submitted 5 October, 2023; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: 32 pages
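For a scalar $t$, the minimax concave penalty with parameters $\lambda > 0$ and $\gamma > 0$ is $p(t) = \lambda|t| - t^2/(2\gamma)$ for $|t| \le \gamma\lambda$ and $p(t) = \gamma\lambda^2/2$ otherwise. It is a difference of convex functions, $p(t) = \lambda|t| - h(t)$ with $h$ convex and differentiable, which is the structure a DCA exploits: at each iteration $h$ is replaced by its linearization at the current iterate, leaving a convex subproblem with an $\ell_1$ term plus a linear term. The sketch below evaluates both pieces; the helper names are hypothetical, and the Laplacian constraint and the proximal term of the paper's proximal DCA are not modeled.

```python
def mcp(t, lam, gamma):
    """Minimax concave penalty p(t) for lam > 0, gamma > 0."""
    if abs(t) <= gamma * lam:
        return lam * abs(t) - t * t / (2.0 * gamma)
    return gamma * lam * lam / 2.0


def mcp_concave_part(t, lam, gamma):
    """Convex, differentiable h with p(t) = lam*|t| - h(t)."""
    if abs(t) <= gamma * lam:
        return t * t / (2.0 * gamma)
    return lam * abs(t) - gamma * lam * lam / 2.0


def mcp_concave_part_grad(t, lam, gamma):
    """Derivative h'(t), used to linearize -h(t) in a DCA step."""
    if abs(t) <= gamma * lam:
        return t / gamma
    return lam * (1.0 if t > 0 else -1.0)


print(mcp(0.5, 1.0, 2.0))  # 0.4375
print(mcp(3.0, 1.0, 2.0))  # 1.0
```

The two branches of $h'$ agree at $|t| = \gamma\lambda$, so $h$ is continuously differentiable and its linearization is well defined everywhere.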

arXiv:2010.08772

An Inexact Augmented Lagrangian Method for Second-order Cone Programming with Applications

Authors: Ling Liang, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we adopt the augmented Lagrangian method (ALM) to solve convex quadratic second-order cone programming problems (SOCPs). Fruitful results on the efficiency of the ALM have been established in the literature. Recently, it has been shown in [Cui, Sun, and Toh, Math. Program., 178 (2019), pp. 381--415] that if the quadratic growth condition holds at an optimal solution for the dual problem, then the KKT residual converges to zero R-superlinearly when the ALM is applied to the primal problem. Moreover, Cui, Ding, and Zhao [SIAM J. Optim., 27 (2017), pp. 2332--2355] provided sufficient conditions for the quadratic growth condition to hold under the metric subregularity and bounded linear regularity conditions for solving composite matrix optimization problems involving spectral functions. Here, we adopt these recent ideas to analyze the convergence properties of the ALM when applied to SOCPs. To the best of our knowledge, no similar work has been done for SOCPs so far. In our paper, we first provide sufficient conditions to ensure the quadratic growth condition for SOCPs. With these theoretical guarantees, we then design an SOCP solver and apply it to solve various classes of SOCPs, such as minimal enclosing ball problems, classical trust-region subproblems, square-root Lasso problems, and DIMACS Challenge problems. Numerical results show that the proposed ALM based solver is efficient and robust compared to existing highly developed solvers, such as Mosek and SDPT3.

Submitted 22 October, 2021; v1 submitted 17 October, 2020; originally announced October 2020.

Comments: 25 pages, 0 figure

MSC Class: 90C06; 90C22; 90C25
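Augmented Lagrangian methods for SOCPs repeatedly evaluate the projection onto the second-order cone $\mathcal{K} = \{(t, x) \in \mathbb{R} \times \mathbb{R}^{n} : \|x\| \le t\}$, which has a well-known closed form. This projection is strongly semismooth, which is what typically allows semismooth Newton methods to be applied to the ALM subproblems. A minimal pure-Python version (the function name `project_soc` is hypothetical):

```python
import math


def project_soc(t, x):
    """Euclidean projection of (t, x) onto the cone {(s, y) : ||y||_2 <= s}."""
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x <= t:             # already in the cone
        return t, list(x)
    if norm_x <= -t:            # in the polar cone -K: projects to the origin
        return 0.0, [0.0] * len(x)
    alpha = (t + norm_x) / 2.0  # otherwise land on the boundary of the cone
    return alpha, [alpha * v / norm_x for v in x]


print(project_soc(0.0, [3.0, 4.0]))  # (2.5, [1.5, 2.0])
```

The case $x = 0$ never reaches the division, because $\|x\| = 0$ always satisfies one of the first two tests.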

arXiv:2009.11272

On Degenerate Doubly Nonnegative Projection Problems

Authors: Ying Cui, Ling Liang, Defeng Sun, Kim-Chuan Toh

Abstract: The doubly nonnegative (DNN) cone, being the set of all positive semidefinite matrices whose elements are nonnegative, is a popular approximation of the computationally intractable completely positive cone. The major difficulty in implementing a Newton-type method to compute the projection of a given large scale matrix onto the DNN cone lies in the possible failure of constraint nondegeneracy, a generalization of the linear independence constraint qualification for nonlinear programming. Such a failure results in the singularity of the Jacobian of the nonsmooth equation representing the Karush-Kuhn-Tucker optimality condition, which prevents the semismooth Newton-CG method from solving it with a desirable convergence rate. In this paper, we overcome this difficulty by solving a sequence of better conditioned nonsmooth equations generated by the augmented Lagrangian method (ALM) instead of the single singular equation mentioned above. By leveraging the metric subregularity of the normal cone associated with the positive semidefinite cone, we derive sufficient conditions to ensure the dual quadratic growth condition of the underlying problem, which further leads to the asymptotically superlinear convergence of the proposed ALM. Numerical results on difficult randomly generated instances and on instances from the semidefinite programming library are presented to demonstrate the efficiency of the algorithm for computing the DNN projection to a very high accuracy.

Submitted 1 September, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: 28 pages, 0 figure

MSC Class: 90C06; 90C22; 90C25

arXiv:2009.08719

Adaptive Sieving with PPDNA: Generating Solution Paths of Exclusive Lasso Models

Authors: Meixia Lin, Yancheng Yuan, Defeng Sun, Kim-Chuan Toh

Abstract: The exclusive lasso (also known as elitist lasso) regularization has become popular recently due to its superior performance on structured sparsity. Its complex nature poses difficulties for the computation of high-dimensional machine learning models involving such a regularizer. In this paper, we propose an adaptive sieving (AS) strategy for generating solution paths of machine learning models with the exclusive lasso regularizer, wherein a sequence of reduced problems of much smaller size needs to be solved. To solve these reduced problems, we propose a highly efficient dual Newton method based proximal point algorithm (PPDNA). As important ingredients, we systematically study the proximal mapping of the weighted exclusive lasso regularizer and the corresponding generalized Jacobian. These results also make popular first-order algorithms for solving exclusive lasso models practical. Various numerical experiments for the exclusive lasso models have demonstrated the effectiveness of the AS strategy for generating solution paths and the superior performance of the PPDNA.

Submitted 18 September, 2020; originally announced September 2020.

MSC Class: 90C06; 90C25; 90C90
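For intuition about the proximal mapping mentioned above, consider the unweighted exclusive lasso $\lambda\sum_{g}\|x_g\|_1^2$. Its proximal mapping separates over groups, and for one group, $\min_x \tfrac12\|x - y\|^2 + \lambda\|x\|_1^2$ is solved by $x_i = \mathrm{sign}(y_i)\max(|y_i| - 2\lambda s, 0)$, where $s = \|x\|_1$ is found by sorting $|y|$ in decreasing order: if the $k$ largest entries are the nonzero ones, then $s = (\sum_{j \le k}|y|_{(j)})/(1 + 2\lambda k)$. The sketch below handles only this unweighted case; the weighted regularizer and the generalized Jacobian studied in the paper are not reproduced, and the helper names are hypothetical.

```python
import math


def prox_exclusive_lasso_group(y, lam):
    """Solve min_x 0.5*||x - y||^2 + lam*(||x||_1)^2 for a single group (lam >= 0)."""
    a = sorted((abs(v) for v in y), reverse=True)
    s, cumsum = 0.0, 0.0
    for k, v in enumerate(a, start=1):
        cumsum += v
        s_k = cumsum / (1.0 + 2.0 * lam * k)  # ||x||_1 if exactly the top-k entries are nonzero
        if v > 2.0 * lam * s_k:
            s = s_k
        else:
            break  # the test then fails for every larger k as well
    thresh = 2.0 * lam * s
    return [math.copysign(max(abs(v) - thresh, 0.0), v) for v in y]


def prox_exclusive_lasso(y, groups, lam):
    """Apply the single-group prox to each group (a list of index lists) of y."""
    x = [0.0] * len(y)
    for g in groups:
        for i, xi in zip(g, prox_exclusive_lasso_group([y[i] for i in g], lam)):
            x[i] = xi
    return x


print(prox_exclusive_lasso([3.0, -1.0, 2.0], [[0, 1], [2]], 0.5))
```

Within a group the threshold $2\lambda s$ grows with the other entries' magnitudes, which is why the exclusive lasso tends to keep at least one, but few, nonzero entries per group.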

arXiv:2004.08115

Estimation of sparse Gaussian graphical models with hidden clustering structure

Authors: Meixia Lin, Defeng Sun, Kim-Chuan Toh, Chengjing Wang

Abstract: Estimation of Gaussian graphical models is important in natural science when modeling the statistical relationships between variables in the form of a graph. Sparsity and clustering structure are enforced on the concentration matrix to reduce model complexity and describe inherent regularities. We propose a model to estimate sparse Gaussian graphical models with hidden clustering structure, which also allows additional linear constraints to be imposed on the concentration matrix. We design an efficient two-phase algorithm for solving the proposed model. We develop a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM) to generate an initial point to warm-start the second-phase algorithm, a proximal augmented Lagrangian method (pALM), which obtains a solution with high accuracy. Numerical experiments on both synthetic data and real data demonstrate the good performance of our model, as well as the efficiency and robustness of our proposed algorithm.

Submitted 17 April, 2020; originally announced April 2020.

arXiv:2002.11410

Efficient algorithms for multivariate shape-constrained convex regression problems

Authors: Meixia Lin, Defeng Sun, Kim-Chuan Toh

Abstract: The shape-constrained convex regression problem deals with fitting a convex function to the observed data, where additional constraints are imposed, such as component-wise monotonicity and uniform Lipschitz continuity. This paper provides a comprehensive mechanism for computing the least squares estimator of a multivariate shape-constrained convex regression function in $\mathbb{R}^d$. We prove that the least squares estimator is computable via solving a constrained convex quadratic programming (QP) problem with $(n+1)d$ variables and at least $n(n-1)$ linear inequality constraints, where $n$ is the number of data points. For solving the generally very large-scale convex QP, we design two efficient algorithms: the symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM), and the proximal augmented Lagrangian method (pALM) with subproblems solved by the semismooth Newton method (SSN). Comprehensive numerical experiments, including those in the pricing of basket options and estimation of production functions in economics, demonstrate that both of our proposed algorithms outperform the state-of-the-art algorithm. The pALM is more efficient than the sGS-ADMM, but the latter has the advantage of being simpler to implement.

Submitted 26 February, 2020; originally announced February 2020.

arXiv:2001.02118

Mesh Independence of a Majorized ABCD Method for Sparse PDE-constrained Optimization Problems

Authors: Xiaoliang Song, Defeng Sun, Kim-Chuan Toh

Abstract: A majorized accelerated block coordinate descent (mABCD) method in Hilbert space is analyzed to solve a sparse PDE-constrained optimization problem via its dual. The finite element approximation method is investigated. The attractive $O(1/k^2)$ iteration complexity of the mABCD method for the dual objective function values can be achieved. Based on the convergence result, we prove the robustness of the mABCD method with respect to the mesh size $h$ by establishing that asymptotically the infinite dimensional ABCD method and finite dimensional discretizations have the same convergence property, and that the number of iterations of the mABCD method remains almost constant as the discretization is refined.

Submitted 3 January, 2020; originally announced January 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1709.00005, arXiv:1708.09094, arXiv:1709.09539

arXiv:1906.04647

doi 10.1137/19M1267830

A Proximal Point Dual Newton Algorithm for Solving Group Graphical Lasso Problems

Authors: Yangjing Zhang, Ning Zhang, Defeng Sun, Kim-Chuan Toh

Abstract: Undirected graphical models have been especially popular for learning the conditional independence structure among a large number of variables where the observations are drawn independently and identically from the same distribution. However, many modern statistical problems involve categorical data or time-varying data, which might follow different but related underlying distributions. In order to learn a collection of related graphical models simultaneously, various joint graphical models inducing sparsity in graphs and similarity across graphs have been proposed. In this paper, we propose an implementable proximal point dual Newton algorithm (PPDNA) for solving the group graphical Lasso model, which encourages a shared pattern of sparsity across graphs. Though the group graphical Lasso regularizer is non-polyhedral, the asymptotic superlinear convergence of our proposed method PPDNA can be obtained by leveraging the local Lipschitz continuity of the Karush-Kuhn-Tucker solution mapping associated with the group graphical Lasso model. A variety of numerical experiments on real data sets illustrate that the PPDNA for solving the group graphical Lasso model can be highly efficient and robust.

Submitted 17 August, 2020; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: 24 pages

MSC Class: 90C22; 90C25; 90C31; 62J10

Journal ref: SIAM Journal on Optimization, 30 (2020), 2197-2220
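The group graphical Lasso regularizer couples the same off-diagonal entry across the $K$ precision matrices, typically in the form $\lambda_1\sum_k\sum_{i\ne j}|\Theta^{(k)}_{ij}| + \lambda_2\sum_{i\ne j}\|(\Theta^{(1)}_{ij},\dots,\Theta^{(K)}_{ij})\|_2$. Its proximal mapping therefore acts entry by entry on the length-$K$ vector of values at position $(i, j)$, and for the sum of an $\ell_1$ and an $\ell_2$ norm it is the known composition of entrywise soft-thresholding followed by group shrinkage. The following pure-Python sketch implements that entrywise operation; the helper name is hypothetical, and the exact penalty weights used in the paper may differ.

```python
import math


def prox_sparse_group(v, lam1, lam2):
    """Prox of lam1*||z||_1 + lam2*||z||_2 at v (v: values of one entry across K graphs).

    Computed as entrywise soft-thresholding followed by shrinkage of the l2 norm.
    """
    soft = [math.copysign(max(abs(t) - lam1, 0.0), t) for t in v]
    norm = math.sqrt(sum(t * t for t in soft))
    if norm <= lam2:
        return [0.0] * len(v)  # the entry is zeroed in all K graphs at once
    scale = 1.0 - lam2 / norm
    return [scale * t for t in soft]


print(prox_sparse_group([3.0, -4.0], 0.0, 2.5))  # [1.5, -2.0]
print(prox_sparse_group([1.0, -1.0], 0.5, 1.0))  # [0.0, 0.0]
```

The second call shows the shared sparsity pattern: the group term removes the edge from every graph simultaneously.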

arXiv:1905.12840

A Newton-bracketing method for a simple conic optimization problem

Authors: Sunyoung Kim, Masakazu Kojima, Kim-Chuan Toh

Abstract: For the Lagrangian-DNN relaxation of quadratic optimization problems (QOPs), we propose a Newton-bracketing method to improve the performance of the bisection-projection method implemented in BBCPOP [to appear in ACM Tran. Softw., 2019]. The relaxation problem is converted into the problem of finding the largest zero $y^*$ of a continuously differentiable (except at $y^*$) convex function $g : \mathbb{R} \rightarrow \mathbb{R}$ such that $g(y) = 0$ if $y \leq y^*$ and $g(y) > 0$ otherwise. In theory, the method generates lower and upper bounds of $y^*$ both converging to $y^*$. Their convergence is quadratic if the right derivative of $g$ at $y^*$ is positive. Accurate computation of $g'(y)$ is necessary for the robustness of the method, but it is difficult to achieve in practice. As an alternative, we present a secant-bracketing method. We demonstrate that the method improves the quality of the lower bounds obtained by BBCPOP and SDPNAL+ for binary QOP instances from BIQMAC. Moreover, new lower bounds for the unknown optimal values of large scale QAP instances from QAPLIB are reported.

Submitted 29 May, 2019; originally announced May 2019.

Comments: 19 pages, 2 figures

MSC Class: 90C20; 90C22; 90C25
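The bracketing idea can be illustrated on a one-dimensional toy function with the properties stated above. Any point with $g(y) = 0$ is a lower bound on $y^*$. A Newton step $y - g(y)/g'(y)$ taken from a point $y > y^*$ never overshoots, because the tangent of a convex function lies below its graph, so Newton steps produce decreasing upper bounds. The sketch below combines such Newton steps with a bisection test that certifies lower bounds. It is only an illustration under stated assumptions: the test function is made up, $g$ and $g'$ are assumed to be evaluated exactly, and the way the paper obtains its bounds from the Lagrangian-DNN relaxation is not reproduced.

```python
def newton_bracketing(g, dg, lo, hi, tol=1e-12, max_iter=100):
    """Bracket the largest zero y* of a convex g with g = 0 on (-inf, y*] and g > 0 beyond.

    Requires g(lo) == 0 (so lo <= y*) and g(hi) > 0 (so hi > y*).
    Returns (lower bound, upper bound).
    """
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        # Newton step from the right: the tangent of a convex g lies below g,
        # so its zero is still >= y* and the upper bound decreases.
        hi = hi - g(hi) / dg(hi)
        if g(hi) == 0.0:  # Newton landed on y* itself
            lo = hi
            break
        # Bisection test: a zero of g certifies a lower bound, a positive value an upper bound.
        mid = 0.5 * (lo + hi)
        if g(mid) == 0.0:
            lo = mid
        else:
            hi = mid
    return lo, hi


def g(y):
    # Made-up test function: y* = 2, right derivative g'(2+) = 1 > 0.
    d = max(0.0, y - 2.0)
    return d + d * d


def dg(y):
    return 1.0 + 2.0 * (y - 2.0) if y > 2.0 else 0.0


print(newton_bracketing(g, dg, 0.0, 5.0))
```

Because the right derivative at $y^*$ is positive here, the distance $d = y - y^*$ of the Newton iterates satisfies $d_{\text{new}} = d^2/(1 + 2d)$, the quadratic convergence mentioned in the abstract.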

arXiv:1903.11460

A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problems

Authors: Peipei Tang, Chengjing Wang, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we consider high-dimensional nonconvex square-root-loss regression problems and introduce a proximal majorization-minimization (PMM) algorithm for these problems. Our key idea for making the proposed PMM efficient is to develop a sparse semismooth Newton method to solve the corresponding subproblems. By using the Kurdyka-Łojasiewicz property exhibited in the underlying problems, we prove that the PMM algorithm converges to a d-stationary point. We also analyze the oracle property of the initial subproblem used in our algorithm. Extensive numerical experiments are presented to demonstrate the high efficiency of the proposed PMM algorithm.

Submitted 27 May, 2020; v1 submitted 27 March, 2019; originally announced March 2019.

Comments: 34 pages, 8 tables

arXiv:1903.09546

An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for Linear Programming

Authors: Xudong Li, Defeng Sun, Kim-Chuan Toh

Abstract: Commercial solvers based on powerful interior-point methods (IPMs), such as Gurobi and Mosek, have been hugely successful in solving large-scale linear programming (LP) problems. The high efficiency of these solvers depends critically on the sparsity of the problem data and on advanced matrix factorization techniques. For a large-scale LP problem whose data matrix $A$ is dense (possibly structured) or whose corresponding normal matrix $AA^T$ has a dense Cholesky factor (even with re-ordering), these solvers may require excessive computational cost and/or extremely heavy memory usage in each interior-point iteration. Unfortunately, the natural remedy, i.e., the use of IPM solvers based on iterative methods, although it avoids the explicit computation of the coefficient matrix and its factorization, is not practically viable due to the inherent extreme ill-conditioning of the large-scale normal equation arising in each interior-point iteration. To provide a better alternative for solving large-scale LPs with dense data or whose normal equation requires an expensive factorization, we propose a semismooth Newton based inexact proximal augmented Lagrangian ({\sc Snipal}) method. Unlike classical IPMs, {\sc Snipal} can efficiently use iterative methods in each iteration to solve simpler yet better-conditioned semismooth Newton linear systems. Moreover, {\sc Snipal} not only enjoys fast asymptotic superlinear convergence but is also proven to enjoy a finite termination property.
Numerical comparisons with Gurobi have demonstrated the encouraging potential of {\sc Snipal} for handling large-scale LP problems where the constraint matrix $A$ has a dense representation or $AA^T$ has a dense factorization even with an appropriate re-ordering.
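The following minimal Python sketch (an illustration, not the Snipal code) suggests why semismooth Newton systems can be better conditioned than IPM normal equations. It assumes a subproblem that involves the projection onto the nonnegative orthant: one element of the generalized Jacobian of that projection is a diagonal matrix D with 0/1 entries, so a Newton matrix of the generic form sigma*A D A^T + tau*I involves only the columns of $A$ selected by D. The helper names and the shift tau are hypothetical, and the exact Snipal linear system is not stated in the abstract. By contrast, the IPM normal matrix A Theta A^T has a diagonal Theta whose entries tend to 0 or to infinity near optimality.

```python
from typing import List

Matrix = List[List[float]]


def project_nonneg(w: List[float]) -> List[float]:
    """Euclidean projection onto the nonnegative orthant, componentwise max(w_i, 0)."""
    return [max(wi, 0.0) for wi in w]


def jacobian_diag(w: List[float]) -> List[float]:
    """Diagonal of one element of the generalized Jacobian of project_nonneg at w.

    Entries are 1 where w_i > 0 and 0 where w_i < 0; at w_i == 0 any value in
    [0, 1] is admissible, and this sketch picks 0.
    """
    return [1.0 if wi > 0.0 else 0.0 for wi in w]


def newton_matrix(A: Matrix, w: List[float], sigma: float, tau: float) -> Matrix:
    """Form sigma * A D A^T + tau * I with D = diag(jacobian_diag(w)).

    Only the columns of A with d_i = 1 contribute. sigma (a penalty parameter)
    and tau (a hypothetical proximal shift) are illustrative inputs.
    """
    m = len(A)
    active = [i for i, di in enumerate(jacobian_diag(w)) if di == 1.0]
    H = [[sigma * sum(A[r][i] * A[c][i] for i in active) for c in range(m)]
         for r in range(m)]
    for r in range(m):
        H[r][r] += tau
    return H
```

In practice such a system would typically be solved by an iterative method (for example, preconditioned conjugate gradients, since H is symmetric positive definite when tau > 0) without forming H explicitly; forming it here only makes the structure visible.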

Submitted 19 March, 2020; v1 submitted 22 March, 2019; originally announced March 2019.

Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file

MSC Class: 90C05; 90C06; 90C25; 65F10

arXiv:1903.07325

Doubly nonnegative relaxations are equivalent to completely positive reformulations of quadratic optimization problems with block-clique graph structures

Authors: Sunyoung Kim, Masakazu Kojima, Kim-Chuan Toh

Abstract: We study the equivalence among a nonconvex QOP, its CPP and DNN relaxations under the assumption that the aggregated and correlative sparsity of the data matrices of the CPP relaxation is represented by a block-clique graph $G$. By exploiting the correlative sparsity, we decompose the CPP relaxation problem into a clique-tree structured family of smaller subproblems. Each subproblem is associated with a node of a clique tree of $G$. The optimal value can be obtained by applying an algorithm that we propose for solving the subproblems recursively from leaf nodes to the root node of the clique-tree. We establish the equivalence between the QOP and its DNN relaxation from the equivalence between the reduced family of subproblems and their DNN relaxations by applying the known results on: (i) CPP and DNN reformulation of a class of QOPs with linear equality, complementarity and binary constraints in 4 nonnegative variables. (ii) DNN reformulation of a class of quadratically constrained convex QOPs with any size. (iii) DNN reformulation of LPs with any size. As a result, we show that a QOP whose subproblems are the QOPs mentioned in (i), (ii) and (iii) is equivalent to its DNN relaxation, if the subproblems form a clique-tree structured family induced from a block-clique graph.

Submitted 18 March, 2019; originally announced March 2019.

Comments: 25 pages, 4 figures

arXiv:1902.06952

doi 10.1137/20M1344160

An Efficient Linearly Convergent Regularized Proximal Point Algorithm for Fused Multiple Graphical Lasso Problems

Authors: Ning Zhang, Yangjing Zhang, Defeng Sun, Kim-Chuan Toh

Abstract: Nowadays, analysing data from different classes or over a temporal grid has attracted a great deal of interest. As a result, various multiple graphical models for learning a collection of graphical models simultaneously have been derived by introducing sparsity in graphs and similarity across multiple graphs. This paper focuses on the fused multiple graphical Lasso model, which encourages not only a shared sparsity pattern but also shared values of edges across different graphs. For solving this model, we develop an efficient regularized proximal point algorithm, where the subproblem in each iteration of the algorithm is solved by a superlinearly convergent semismooth Newton method. To implement the semismooth Newton method, we derive an explicit expression for the generalized Jacobian of the proximal mapping of the fused multiple graphical Lasso regularizer. Unlike the widely used first-order methods, our approach heavily exploits the underlying second-order information through the semismooth Newton method. This not only accelerates the convergence of the algorithm but also improves its robustness. The efficiency and robustness of our proposed algorithm are demonstrated by comparing with some state-of-the-art methods on both synthetic and real data sets. Supplementary materials for this article are available online.

Submitted 19 February, 2019; originally announced February 2019.

Journal ref: SIAM Journal on Mathematics of Data Science, 3(2021), pp. 524-543

arXiv:1902.00151

A dual Newton based preconditioned proximal point algorithm for exclusive lasso models

Authors: Meixia Lin, Defeng Sun, Kim-Chuan Toh, Yancheng Yuan

Abstract: The exclusive lasso (also known as elitist lasso) regularization has become popular recently due to its superior performance on group sparsity. Compared to the group lasso regularization, which enforces competition on variables among different groups, the exclusive lasso regularization also enforces competition within each group. In this paper, we propose a highly efficient dual Newton based preconditioned proximal point algorithm (PPDNA) to solve machine learning models involving the exclusive lasso regularizer. As an important ingredient, we provide a rigorous proof for deriving the closed-form solution to the proximal mapping of the weighted exclusive lasso regularizer. In addition, we derive the corresponding HS-Jacobian of the proximal mapping and analyze its structure, which plays an essential role in the efficient computation of the PPA subproblem via applying a semismooth Newton method on its dual. Various numerical experiments in this paper demonstrate the superior performance of the proposed PPDNA against other state-of-the-art numerical algorithms.
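A minimal sketch of a closed-form proximal step for a single group, assuming the regularizer has the form lam*(sum_i w_i*|x_i|)^2 with all w_i > 0 (the paper's exact scaling and weights may differ). The optimality conditions give a soft-thresholding x_i = sign(a_i)*max(|a_i| - 2*lam*w_i*s, 0) with a shared quantity s = sum_i w_i*|x_i|; sorting the ratios |a_i|/w_i identifies the support, much as in projection onto a simplex. It illustrates the kind of closed form the abstract refers to and is not the PPDNA implementation; the function name is hypothetical.

```python
import math
from typing import List


def prox_exclusive_lasso(a: List[float], w: List[float], lam: float) -> List[float]:
    """Solve min_x 0.5*||x - a||^2 + lam * (sum_i w_i*|x_i|)^2 for one group.

    Assumes w_i > 0 and lam >= 0. On a support I, s = sum_{I} w_i|a_i| / (1 + 2*lam*sum_{I} w_i^2),
    and index i belongs to I exactly when |a_i|/w_i > 2*lam*s.
    """
    n = len(a)
    # Candidate support grows along decreasing ratios |a_i| / w_i.
    order = sorted(range(n), key=lambda i: abs(a[i]) / w[i], reverse=True)
    s = 0.0
    S = 0.0  # running sum of w_i * |a_i| over the accepted indices
    Q = 0.0  # running sum of w_i**2 over the accepted indices
    for i in order:
        ratio = abs(a[i]) / w[i]
        S_next, Q_next = S + w[i] * abs(a[i]), Q + w[i] ** 2
        s_next = S_next / (1.0 + 2.0 * lam * Q_next)
        if ratio <= 2.0 * lam * s_next:
            break  # the test is monotone along the sorted order, so stop here
        S, Q, s = S_next, Q_next, s_next
    # Weighted soft-thresholding with the common scale s.
    return [math.copysign(max(abs(ai) - 2.0 * lam * wi * s, 0.0), ai)
            for ai, wi in zip(a, w)]
```

For example, prox_exclusive_lasso([3.0, 1.0], [1.0, 1.0], 0.5) keeps only the first entry, shrinking it to 1.5, because the second entry falls below the shared threshold.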

Submitted 6 December, 2019; v1 submitted 31 January, 2019; originally announced February 2019.

arXiv:1901.02179

A Geometrical Analysis of a Class of Nonconvex Conic Programs for Convex Conic Reformulations of Quadratic and Polynomial Optimization Problems

Authors: Sunyoung Kim, Masakazu Kojima, Kim-Chuan Toh

Abstract: We present a geometrical analysis of the completely positive programming reformulation of quadratic optimization problems and its extension to polynomial optimization problems with a class of geometrically defined nonconvex conic programs and their convexification. The class of nonconvex conic programs is described with a linear objective function in a linear space $V$, and the constraint set is represented geometrically as the intersection of a nonconvex cone $K \subset V$, a face $J$ of the convex hull of $K$ and a parallel translation $L$ of a supporting hyperplane of the nonconvex cone $K$. We show that under a moderate assumption, the original nonconvex conic program can equivalently be reformulated as a convex conic program by replacing the constraint set with the intersection of $J$ and the hyperplane $L$. The replacement procedure is applied to derive the completely positive programming reformulation of quadratic optimization problems and its extension to polynomial optimization problems.

Submitted 8 January, 2019; originally announced January 2019.

Comments: 27 pages, 2 figures

MSC Class: 90C20; 90C25; 90C26

arXiv:1812.06579

doi 10.4208/jcm.1803-m2018-0278

A Unified Algorithmic Framework of Symmetric Gauss-Seidel Decomposition based Proximal ADMMs for Convex Composite Programming

Authors: Liang Chen, Defeng Sun, Kim-Chuan Toh, Ning Zhang

Abstract: This paper aims to present a fairly accessible generalization of several symmetric Gauss-Seidel decomposition based multi-block proximal alternating direction methods of multipliers (ADMMs) for convex composite optimization problems. The proposed method unifies and refines many constructive techniques that were separately developed for the computational efficiency of multi-block ADMM-type algorithms. Specifically, the majorized augmented Lagrangian functions, the indefinite proximal terms, the inexact symmetric Gauss-Seidel decomposition theorem, the tolerance criteria for approximately solving the subproblems, and the large dual step-lengths are all incorporated in one algorithmic framework, which we name sGS-imiPADMM. Given the popularity of convergent variants of multi-block ADMMs in recent years, especially for high-dimensional multi-block convex composite conic programming problems, the unification presented in this paper, as well as the corresponding convergence results, may have great potential for facilitating the implementation of many multi-block ADMMs in various problem settings.
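The symmetric Gauss-Seidel (sGS) decomposition mentioned above can be checked numerically on a small quadratic. The sketch below uses scalar blocks and hypothetical helper names: one backward sweep followed by one forward sweep for min 0.5*x'Qx - r'x, started at xbar, returns the same point as a single proximal step with the extra term 0.5*||x - xbar||^2_T, where T = U D^{-1} U^T, U is the strictly upper triangular part of Q and D its diagonal. This is the exact, quadratic form of the decomposition; the inexact and majorized variants incorporated in sGS-imiPADMM are not shown.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def sgs_sweep(Q: Matrix, r: Vector, xbar: Vector) -> Vector:
    """One backward-then-forward Gauss-Seidel sweep for min 0.5 x'Qx - r'x.

    Scalar blocks are used for brevity; Q must have a positive diagonal.
    """
    n = len(r)
    xp = list(xbar)
    # Backward sweep over blocks n-1, ..., 1 (block 0 is updated only in the forward sweep).
    for i in range(n - 1, 0, -1):
        lower = sum(Q[i][j] * xbar[j] for j in range(i))
        upper = sum(Q[i][j] * xp[j] for j in range(i + 1, n))
        xp[i] = (r[i] - lower - upper) / Q[i][i]
    # Forward sweep over blocks 0, ..., n-1, using the backward values above the diagonal.
    x = [0.0] * n
    for i in range(n):
        lower = sum(Q[i][j] * x[j] for j in range(i))
        upper = sum(Q[i][j] * xp[j] for j in range(i + 1, n))
        x[i] = (r[i] - lower - upper) / Q[i][i]
    return x


def sgs_operator(Q: Matrix) -> Matrix:
    """T = U D^{-1} U^T, with U the strictly upper part of Q and D its diagonal."""
    n = len(Q)
    return [[sum(Q[i][k] * Q[j][k] / Q[k][k] for k in range(max(i, j) + 1, n))
             for j in range(n)] for i in range(n)]


def solve_dense(A: Matrix, b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for j in range(col, n + 1):
                M[k][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x
```

To check the equivalence, compare sgs_sweep(Q, r, xbar) with solve_dense applied to (Q + T) x = r + T xbar. With genuine blocks, the scalar divisions by Q[i][i] become solves with the diagonal blocks of Q.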

Submitted 4 April, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

MSC Class: 90C25; 90C22; 90C06; 65K05

Journal ref: Journal of Computational Mathematics, 37(2019), 739--757

arXiv:1812.05243

A New Homotopy Proximal Variable-Metric Framework for Composite Convex Minimization

Authors: Quoc Tran-Dinh, Liang Ling, Kim-Chuan Toh

Abstract: This paper proposes two novel ideas for developing new proximal variable-metric methods for solving a class of composite convex optimization problems. The first idea is a new parameterization of the optimality condition, which allows us to develop a class of homotopy proximal variable-metric methods. We show that under appropriate assumptions such as strong convexity-type and smoothness, or self-concordance, our new schemes can achieve finite global iteration-complexity bounds. Our second idea is a primal-dual-primal framework for proximal-Newton methods, which can lead to some useful computational features for a subclass of nonsmooth composite convex optimization problems. Starting from the primal problem, we formulate its dual problem and use our homotopy proximal Newton method to solve this dual problem. Instead of solving the subproblem directly in the dual space, we propose dualizing this subproblem to go back to the primal space. The resulting subproblem shares the structure promoted by the regularizer of the original problem and leads to some computational advantages. As a byproduct, we specialize the proposed algorithm to solve covariance estimation problems. Surprisingly, our new algorithm requires no matrix inversion, Cholesky factorization, or function evaluation, while it works in the primal space with sparsity structures that are promoted by the regularizer. Numerical examples on several applications are given to illustrate our theoretical development and to compare with state-of-the-art methods.

Submitted 12 December, 2018; originally announced December 2018.

Comments: 35 pages, 1 figure, and 6 tables

Report number: UNC-STOR-3.12.2018

MSC Class: 90C25; 90C06; 90-08

arXiv:1812.04941

A semi-proximal augmented Lagrangian based decomposition method for primal block angular convex composite quadratic conic programming problems

Authors: Xin-Yee Lam, Defeng Sun, Kim-Chuan Toh

Abstract: We propose a semi-proximal augmented Lagrangian based decomposition method for convex composite quadratic conic programming problems with primal block angular structures. Using our algorithmic framework, we are able to naturally derive several well known augmented Lagrangian based decomposition methods for stochastic programming such as the diagonal quadratic approximation method of Mulvey and Ruszczyński. Moreover, we are able to derive novel enhancements and generalizations of these well known methods. We also propose a semi-proximal symmetric Gauss-Seidel based alternating direction method of multipliers for solving the corresponding dual problem. Numerical results show that our algorithms can perform well even for very large instances of primal block angular convex QP problems. For example, one instance with more than $300,000$ linear constraints and $12,500,000$ nonnegative variables is solved in less than a minute whereas Gurobi took more than 3 hours, and another instance {\tt qp-gridgen1} with more than $331,000$ linear constraints and $986,000$ nonnegative variables is solved in about 5 minutes whereas Gurobi took more than 35 minutes.

Submitted 12 December, 2018; originally announced December 2018.

Comments: 32 pages

arXiv:1811.08227

Analytic Network Learning

Authors: Kar-Ann Toh

Abstract: Based on the property that solving the system of linear matrix equations via the column space and the row space projections boils down to an approximation in the least squares error sense, a formulation for learning the weight matrices of the multilayer network can be derived. By exploiting the vast number of feasible solutions of these interdependent weight matrices, the learning can be performed analytically layer by layer, without the need for gradient computation after an initialization. Possible initialization schemes include utilizing the data matrix as initial weights and random initialization. The study is followed by an investigation into the representation capability and the output variance of the learning scheme. Extensive experiments on synthetic and real-world data sets validate its numerical feasibility.

Submitted 20 November, 2018; originally announced November 2018.

Comments: Some of the preliminary ideas of this work have been presented in the IEEE/ACIS 17th International Conference on Computer and Information Science: "Learning from the kernel and the range space" (ICIS 2018)

arXiv:1810.13372

Best Nonnegative Rank-One Approximations of Tensors

Authors: Shenglong Hu, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we study the polynomial optimization problem of multi-forms over the intersection of the multi-spheres and the nonnegative orthants. This class of problems is NP-hard in general, and includes the problem of finding the best nonnegative rank-one approximation of a given tensor. A Positivstellensatz is given for this class of polynomial optimization problems, based on which a globally convergent hierarchy of doubly nonnegative (DNN) relaxations is proposed. A (zero-th order) DNN relaxation method is applied to solve these problems, resulting in linear matrix optimization problems under both the positive semidefinite and nonnegative conic constraints. A worst case approximation bound is given for this relaxation method. Then, the recent solver SDPNAL+ is adopted to solve this class of matrix optimization problems. Typically, the DNN relaxations are tight, and hence the best nonnegative rank-one approximation of a tensor can be revealed frequently. Extensive numerical experiments show that this approach is quite promising.

Submitted 31 October, 2018; originally announced October 2018.

Comments: 27 pages

MSC Class: 15A18; 15A42; 15A69; 90C22

arXiv:1810.11581

Gradient-Free Learning Based on the Kernel and the Range Space

Authors: Kar-Ann Toh, Zhiping Lin, Zhengguo Li, Beomseok Oh, Lei Sun

Abstract: In this article, we show that solving the system of linear equations by manipulating the kernel and the range space is equivalent to solving the problem of least squares error approximation. This establishes the ground for a gradient-free learning search when the system can be expressed in the form of a linear matrix equation. When the nonlinear activation function is invertible, the learning problem of a fully-connected multilayer feedforward neural network can be easily adapted for this novel learning framework. By a series of kernel and range space manipulations, it turns out that learning such a network boils down to solving a set of cross-coupling equations. By having the weights randomly initialized, the equations can be decoupled and the network solution shows relatively good learning capability for real-world data sets of small to moderate dimensions. Based on the structural information of the matrix equation, the network representation is found to be dependent on the number of data samples and the output dimension.
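A minimal single-layer illustration of the idea, assuming an invertible activation (tanh is chosen here as an example) and a data matrix X with full row rank: inverting the activation turns Y = tanh(W X) into the linear matrix equation atanh(Y) = W X, whose least-squares solution W = atanh(Y) X^T (X X^T)^{-1} requires no gradient steps. The function names are hypothetical, and the multilayer, cross-coupled scheme and initialization of the paper are not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def inverse(A: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for k in range(n):
            if k != col:
                f = M[k][col]
                M[k] = [vk - f * vc for vk, vc in zip(M[k], M[col])]
    return [row[n:] for row in M]


def fit_layer(X: Matrix, Y: Matrix) -> Matrix:
    """Least-squares W for Y ~ tanh(W X), computed from atanh(Y) = W X.

    X is d-by-N (columns are samples) and is assumed to have full row rank,
    so W = atanh(Y) X^T (X X^T)^{-1}; entries of Y must lie in (-1, 1).
    """
    Z = [[math.atanh(y) for y in row] for row in Y]
    Xt = transpose(X)
    return matmul(matmul(Z, Xt), inverse(matmul(X, Xt)))
```

When the targets are exactly generated by some W (as in a noiseless test), fit_layer recovers that W up to rounding; with noisy targets it returns the least-squares fit in the pre-activation space, not in the output space.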

Submitted 26 October, 2018; originally announced October 2018.

Comments: The idea of kernel and range projection was first introduced in the IEEE/ACIS ICIS conference which was held in Singapore in June 2018. This article presents a full development of the method supported by extensive numerical results

arXiv:1810.09856

Spectral operators of matrices: semismoothness and characterizations of the generalized Jacobian

Authors: Chao Ding, Defeng Sun, Jie Sun, Kim-Chuan Toh

Abstract: Spectral operators of matrices proposed recently in [C. Ding, D.F. Sun, J. Sun, and K.C. Toh, Math. Program. {\bf 168}, 509--531 (2018)] are a class of matrix valued functions, which map matrices to matrices by applying a vector-to-vector function to all eigenvalues/singular values of the underlying matrices. Spectral operators play a crucial role in the study of various applications involving matrices such as matrix optimization problems (MOPs) {that include semidefinite programming as one of the most important example classes}. In this paper, we will study more fundamental first- and second-order properties of spectral operators, including the Lipschitz continuity, $ρ$-order B(ouligand)-differentiability ($0<ρ\le 1$), $ρ$-order G-semismoothness ($0<ρ\le 1$), and characterization of generalized Jacobians.
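To make the definition concrete: for a symmetric matrix X = V diag(lam) V^T, a spectral operator applies a symmetric vector function to the eigenvalues, F(X) = V diag(f(lam)) V^T. The sketch below assumes the symmetric-eigenvalue case only (the paper also covers singular values of nonsymmetric matrices) and applies the same scalar function f to every eigenvalue, which is one simple choice of such a vector function; a cyclic Jacobi eigensolver is used only to keep it dependency-free, and the function names are hypothetical. With f(t) = max(t, 0), F is the projection onto the positive semidefinite cone used in semidefinite programming.

```python
import math
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def jacobi_eigh(A: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigen-decomposition A = V diag(lam) V^T of a symmetric matrix (cyclic Jacobi)."""
    n = len(A)
    a = [row[:] for row in A]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol ** 2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that annihilates a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P and V <- V P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def spectral_operator(X: Matrix, f: Callable[[float], float]) -> Matrix:
    """F(X) = V diag(f(lam_1), ..., f(lam_n)) V^T for symmetric X."""
    lam, V = jacobi_eigh(X)
    n = len(X)
    return [[sum(V[i][k] * f(lam[k]) * V[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For example, spectral_operator([[1.0, 2.0], [2.0, 1.0]], lambda t: max(t, 0.0)) keeps only the eigenvalue 3 and returns a matrix with every entry close to 1.5.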

Submitted 22 October, 2018; originally announced October 2018.

Comments: 25 pages. arXiv admin note: substantial text overlap with arXiv:1401.2269

MSC Class: 90C25; 90C06; 65K05; 49J50; 49J52

arXiv:1810.09071

Learning from the Kernel and the Range Space

Authors: Kar-Ann Toh

Abstract: In this article, a novel approach to learning a complex function which can be written as the system of linear equations is introduced. This learning is grounded upon the observation that solving the system of linear equations by a manipulation in the kernel and the range space boils down to an estimation based on the least squares error approximation. The learning approach is applied to learn a deep feedforward network with full weight connections. The numerical experiments on network learning of synthetic and benchmark data not only show feasibility of the proposed learning approach but also provide insights into the mechanism of data representation.

Submitted 21 October, 2018; originally announced October 2018.

Comments: Camera-ready finalized on 22 April 2018, paper presented on 07 June 2018 in the 17th IEEE/ACIS International Conference on Computer and Information Science (ICIS) 2018

arXiv:1810.02677

Convex Clustering: Model, Theoretical Guarantee and Efficient Algorithm

Authors: Defeng Sun, Kim-Chuan Toh, Yancheng Yuan

Abstract: Clustering is a fundamental problem in unsupervised learning. Popular methods such as K-means may perform poorly because they are prone to getting stuck in local minima. Recently, the sum-of-norms (SON) model (also known as the clustering path) has been proposed in Pelckmans et al. (2005), Lindsten et al. (2011) and Hocking et al. (2011). The perfect recovery properties of the convex clustering model with uniformly weighted all pairwise-differences regularization have been proved by Zhu et al. (2014) and Panahi et al. (2017). However, no theoretical guarantee has been established for the general weighted convex clustering model, where better empirical results have been observed. On the numerical optimization side, although algorithms like the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA) have been proposed to solve the convex clustering model (Chi and Lange, 2015), it remains very challenging to solve large-scale problems. In this paper, we establish sufficient conditions for the perfect recovery guarantee of the general weighted convex clustering model, which include and improve existing theoretical results as special cases. In addition, we develop a semismooth Newton based augmented Lagrangian method for solving large-scale convex clustering problems. Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems.
Moreover, the numerical results also show the superior performance and scalability of our algorithm compared with the existing first-order methods. In particular, our algorithm is able to solve a convex clustering problem with 200,000 points in $\mathbb{R}^3$ in about 6 minutes.
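The weighted SON model minimizes 0.5*sum_i ||x_i - a_i||^2 + gamma*sum_{i<j} w_ij*||x_i - x_j||. The sketch below evaluates that objective and, for the special case of two points, gives its closed-form minimizer, derived here from the optimality conditions rather than taken from the paper: the midpoint is preserved and the difference x_1 - x_2 is shrunk by block soft-thresholding, so the two centroids fuse once gamma*w >= ||a_1 - a_2||/2. It illustrates the model only; it is neither the paper's semismooth Newton ALM nor its recovery conditions, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = List[float]


def son_objective(X: List[Point], A: List[Point], W: List[List[float]], gamma: float) -> float:
    """0.5*sum_i ||x_i - a_i||^2 + gamma * sum_{i<j} w_ij * ||x_i - x_j||."""
    fit = 0.5 * sum(sum((xk - ak) ** 2 for xk, ak in zip(x, a)) for x, a in zip(X, A))
    n = len(X)
    pen = sum(W[i][j] * math.dist(X[i], X[j]) for i in range(n) for j in range(i + 1, n))
    return fit + gamma * pen


def two_point_solution(a1: Point, a2: Point, w: float, gamma: float) -> Tuple[Point, Point]:
    """Closed-form minimizer of the SON objective for two points with weight w.

    The mean m = (a1 + a2)/2 is unchanged and u = x1 - x2 is shrunk:
    u = max(1 - 2*gamma*w/||a1 - a2||, 0) * (a1 - a2).
    """
    d = [p - q for p, q in zip(a1, a2)]
    norm_d = math.hypot(*d)
    scale = max(1.0 - 2.0 * gamma * w / norm_d, 0.0) if norm_d > 0 else 0.0
    m = [(p + q) / 2.0 for p, q in zip(a1, a2)]
    x1 = [mk + scale * dk / 2.0 for mk, dk in zip(m, d)]
    x2 = [mk - scale * dk / 2.0 for mk, dk in zip(m, d)]
    return x1, x2
```

For a_1 = (0, 0), a_2 = (4, 0) and w = 1, gamma = 1 gives centroids (1, 0) and (3, 0) with objective value 3, while gamma = 2 already fuses both centroids at (2, 0). The code uses math.dist and multi-argument math.hypot, which require Python 3.8 or later.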

Submitted 4 October, 2018; originally announced October 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1802.07091

arXiv:1809.04249

A Fast Globally Linearly Convergent Algorithm for the Computation of Wasserstein Barycenters

Authors: Lei Yang, Jia Li, Defeng Sun, Kim-Chuan Toh

Abstract: We consider the problem of computing a Wasserstein barycenter for a set of discrete probability distributions with finite supports, which finds many applications in areas such as statistics, machine learning and image processing. When the support points of the barycenter are pre-specified, this problem can be modeled as a linear programming (LP) problem whose size can be extremely large. To handle this large-scale LP, we analyse the structure of its dual problem, which is conceivably more tractable and can be reformulated as a well-structured convex problem with 3 kinds of block variables and a coupling linear equality constraint. We then adapt a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM) to solve the resulting dual problem and establish its global convergence and global linear convergence rate. As a critical component for efficient computation, we also show how all the subproblems involved can be solved exactly and efficiently. This makes our method suitable for computing a Wasserstein barycenter on a large-scale data set, without introducing an entropy regularization term as is commonly practiced. In addition, our sGS-ADMM can be used as a subroutine in an alternating minimization method to compute a barycenter when its support points are not pre-specified. Numerical results on synthetic data sets and image data sets demonstrate that our method is highly competitive for solving large-scale Wasserstein barycenter problems, in comparison to two existing representative methods and the commercial software Gurobi.

Submitted 26 December, 2020; v1 submitted 12 September, 2018; originally announced September 2018.

arXiv:1808.07181

Efficient sparse semismooth Newton methods for the clustered lasso problem

Authors: Meixia Lin, Yong-Jin Liu, Defeng Sun, Kim-Chuan Toh

Abstract: We focus on solving the clustered lasso problem, which is a least squares problem with the $\ell_1$-type penalties imposed on both the coefficients and their pairwise differences to learn the group structure of the regression parameters. Here we first reformulate the clustered lasso regularizer as a weighted ordered-lasso regularizer, which is essential in reducing the computational cost from $O(n^2)$ to $O(n\log (n))$. We then propose an inexact semismooth Newton augmented Lagrangian ({\sc Ssnal}) algorithm to solve the clustered lasso problem or its dual via this equivalent formulation, depending on whether the sample size is larger than the dimension of the features. An essential component of the {\sc Ssnal} algorithm is the computation of the generalized Jacobian of the proximal mapping of the clustered lasso regularizer. Based on the new formulation, we derive an efficient procedure for its computation. Comprehensive results on the global convergence and local linear convergence of the {\sc Ssnal} algorithm are established. For the purpose of exposition and comparison, we also summarize/design several first-order methods that can be used to solve the problem under consideration, but with the key improvement from the new formulation of the clustered lasso regularizer. As a demonstration of the applicability of our algorithms, numerical experiments on the clustered lasso problem are performed. The experiments show that the {\sc Ssnal} algorithm substantially outperforms the best alternative algorithm for the clustered lasso problem.
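The drop from $O(n^2)$ to $O(n\log (n))$ can be seen on the pairwise-difference term alone. With uniform weights, sum_{i<j} |x_i - x_j| equals sum_k (2k - n - 1) y_k, where y is x sorted in ascending order and k is 1-based, i.e., an ordered-lasso-type penalty with fixed weights whose cost is dominated by the sort. The sketch below checks this identity against the direct double sum; the paper's regularizer also carries an $\ell_1$ term and general weights, which are omitted here, and the function names are hypothetical.

```python
from typing import List


def pairwise_abs_sum_naive(x: List[float]) -> float:
    """O(n^2): sum over i < j of |x_i - x_j|."""
    n = len(x)
    return sum(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))


def pairwise_abs_sum_sorted(x: List[float]) -> float:
    """O(n log n): sum_k (2k - n - 1) * y_k with y = sorted(x) and k 1-based.

    The k-th smallest entry is added in its k - 1 comparisons with smaller
    entries and subtracted in its n - k comparisons with larger ones, so its
    net weight is (k - 1) - (n - k) = 2k - n - 1.
    """
    y = sorted(x)
    n = len(y)
    return sum((2 * k - n - 1) * yk for k, yk in enumerate(y, start=1))
```

For x = [3, 1, 2], both functions return 4.0: the sorted form computes -2*1 + 0*2 + 2*3.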

Submitted 1 May, 2019; v1 submitted 21 August, 2018; originally announced August 2018.

arXiv:1806.03404

Deterministic Stretchy Regression

Authors: Kar-Ann Toh, Lei Sun, Zhiping Lin

Abstract: An extension of the regularized least-squares in which the estimation parameters are stretchable is introduced and studied in this paper. The solution of this ridge regression with stretchable parameters is given in primal and dual spaces and in closed-form. Essentially, the proposed solution stretches the covariance computation by a power term, thereby compressing or amplifying the estimation parameters. To maintain the computation of power root terms within the real space, an input transformation is proposed. The results of an empirical evaluation in both synthetic and real-world data illustrate that the proposed method is effective for compressive learning with high-dimensional data.
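For reference, the unstretched baseline that the paper extends is ordinary ridge regression, whose closed-form solution can be written in the primal space, beta = (X^T X + lam*I)^{-1} X^T y, or in the dual space, beta = X^T (X X^T + lam*I)^{-1} y; the dual form is cheaper when there are fewer samples than features. The sketch below computes both and checks that they agree for lam > 0. The power-term stretching and the input transformation described in the abstract are not reproduced, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(A: Matrix, b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for j in range(col, n + 1):
                M[k][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_primal(X: Matrix, y: Vector, lam: float) -> Vector:
    """beta = (X^T X + lam*I)^{-1} X^T y; X is N-by-d with samples as rows."""
    d = len(X[0])
    G = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(G, rhs)


def ridge_dual(X: Matrix, y: Vector, lam: float) -> Vector:
    """beta = X^T (X X^T + lam*I)^{-1} y; solves an N-by-N system instead of d-by-d."""
    N, d = len(X), len(X[0])
    K = [[sum(X[i][k] * X[j][k] for k in range(d)) + (lam if i == j else 0.0) for j in range(N)]
         for i in range(N)]
    alpha = solve(K, y)
    return [sum(X[i][k] * alpha[i] for i in range(N)) for k in range(d)]
```

Both functions rely on the identity (X^T X + lam*I)^{-1} X^T = X^T (X X^T + lam*I)^{-1}, which holds for any lam > 0.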

Submitted 8 June, 2018; originally announced June 2018.

Comments: Submitted for journal (JMLR) review since 28-Sept-2017

arXiv:1804.00761

BBCPOP: A Sparse Doubly Nonnegative Relaxation of Polynomial Optimization Problems with Binary, Box and Complementarity Constraints

Authors: Naoki Ito, Sunyoung Kim, Masakazu Kojima, Akiko Takeda, Kim-Chuan Toh

Abstract: The software package BBCPOP is a MATLAB implementation of a hierarchy of sparse doubly nonnegative (DNN) relaxations of a class of polynomial optimization (minimization) problems (POPs) with binary, box and complementarity (BBC) constraints. Given a POP in the class and a relaxation order, BBCPOP constructs a simple conic optimization problem (COP), which serves as a DNN relaxation of the POP, and then solves the COP by applying the bisection and projection (BP) method. The COP is expressed with a linear objective function and constraints described as a single hyperplane and two cones, which are the Cartesian product of positive semidefinite cones and a polyhedral cone induced from the BBC constraints. BBCPOP aims to compute a tight lower bound for the optimal value of a large-scale POP in the class that is beyond the comfort zone of existing software packages. The robustness, reliability and efficiency of BBCPOP are demonstrated in comparison to the state-of-the-art SDP software package SDPNAL+ on randomly generated sparse POPs of degree 2 and 3 with up to a few thousand variables, and ones of degree 4, 5, 6 and 8 with up to a few hundred variables. Comparisons on other BBC POPs that arise from combinatorial optimization problems such as quadratic assignment problems are also reported. The software package BBCPOP is available at https://sites.google.com/site/bbcpop1/.

Submitted 2 April, 2018; originally announced April 2018.

Comments: 28 pages, 4 figures

MSC Class: 90C20; 90C22; 90C25; 90C26

arXiv:1803.10803

On the Equivalence of Inexact Proximal ALM and ADMM for a Class of Convex Composite Programming

Authors: Liang Chen, Xudong Li, Defeng Sun, Kim-Chuan Toh

Abstract: In this paper, we show that for a class of linearly constrained convex composite optimization problems, an (inexact) symmetric Gauss-Seidel based majorized multi-block proximal alternating direction method of multipliers (ADMM) is equivalent to an inexact proximal augmented Lagrangian method (ALM). This equivalence not only provides new perspectives for understanding some ADMM-type algorithms but also supplies meaningful guidelines on implementing them to achieve better computational efficiency. Even for the two-block case, a by-product of this equivalence is the convergence of the whole sequence generated by the classic ADMM with a step-length that exceeds the conventional upper bound of $(1+\sqrt{5})/2$, if one part of the objective is linear. This is exactly the problem setting in which the very first convergence analysis of ADMM was conducted by Gabay and Mercier in 1976, but, even under notably stronger assumptions, only the convergence of the primal sequence was known. A collection of illustrative examples is provided to demonstrate the breadth of applications for which our results can be used. Numerical experiments on solving a large number of linear and convex quadratic semidefinite programming problems are conducted to illustrate how the theoretical results established here can lead to improvements in the corresponding practical implementations.

Submitted 28 January, 2019; v1 submitted 28 March, 2018; originally announced March 2018.

MSC Class: 90C25; 65K05; 90C06; 49M27; 90C20

arXiv:1803.10740

Solving the OSCAR and SLOPE Models Using a Semismooth Newton-Based Augmented Lagrangian Method

Authors: Ziyan Luo, Defeng Sun, Kim-Chuan Toh, Naihua Xiu

Abstract: The octagonal shrinkage and clustering algorithm for regression (OSCAR), equipped with the $\ell_1$-norm and a pair-wise $\ell_{\infty}$-norm regularizer, is a useful tool for feature selection and grouping in high-dimensional data analysis. The computational challenge posed by OSCAR, for high dimensional and/or large sample size data, has not yet been well resolved due to the non-smoothness and inseparability of the regularizer involved. In this paper, we successfully resolve this numerical challenge by proposing a sparse semismooth Newton-based augmented Lagrangian method to solve the more general SLOPE (the sorted L-one penalized estimation) model. By appropriately exploiting the inherent sparse and low-rank property of the generalized Jacobian of the semismooth Newton system in the augmented Lagrangian subproblem, we show how the computational complexity can be substantially reduced. Our algorithm presents a notable advantage in high-dimensional statistical regression settings. Numerical experiments are conducted on real data sets, and the results demonstrate that our algorithm is far superior, in both speed and robustness, to the existing state-of-the-art algorithms based on first-order iterative schemes, including the widely used accelerated proximal gradient (APG) method and the alternating direction method of multipliers (ADMM).
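The SLOPE regularizer penalizes sorted magnitudes, $\sum_i \lambda_i |x|_{(i)}$ with $\lambda_1 \geq \lambda_2 \geq \cdots \geq 0$, and OSCAR corresponds to the linearly decreasing choice $\lambda_i = \lambda_1' + \lambda_2'(n-i)$ built from its two regularization parameters. The proximal mapping of this regularizer is the building block whose generalized Jacobian the abstract refers to. The sketch below is not the authors' code: it computes the proximal mapping with the standard sort, pool-adjacent-violators and clip approach. The function name `prox_sorted_l1` is hypothetical, and `lam` is assumed to be nonnegative and nonincreasing.

```python
import numpy as np

def prox_sorted_l1(y, lam):
    """Proximal mapping of J(x) = sum_i lam[i] * |x|_(i), with |x|_(1) >= |x|_(2) >= ...

    Assumes lam is nonnegative and nonincreasing (hypothetical helper, not the paper's code).
    """
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    abs_y = np.abs(y)
    order = np.argsort(-abs_y)       # indices that sort |y| in decreasing order
    v = abs_y[order] - lam           # shift the sorted magnitudes by the weights

    # Pool adjacent violators: nonincreasing sequence closest to v in least squares.
    sums, counts = [], []
    for val in v:
        sums.append(val)
        counts.append(1)
        # Merge the last two blocks while their averages are not strictly decreasing.
        while len(sums) > 1 and sums[-2] / counts[-2] <= sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    fitted = np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])
    fitted = np.maximum(fitted, 0.0)  # clip: magnitudes are nonnegative

    x = np.empty_like(abs_y)
    x[order] = fitted                 # undo the sort
    return np.sign(y) * x             # restore the original signs
```

With all weights equal, the mapping reduces to entrywise soft-thresholding; with unequal weights, adjacent sorted entries can be pooled to a common magnitude, which is the source of OSCAR's grouping effect.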

Submitted 28 March, 2018; originally announced March 2018.

arXiv:1803.06566

Computing the Best Approximation Over the Intersection of a Polyhedral Set and the Doubly Nonnegative Cone

Authors: Ying Cui, Defeng Sun, Kim-Chuan Toh

Abstract: This paper introduces an efficient algorithm for computing the best approximation of a given matrix onto the intersection of linear equalities, inequalities and the doubly nonnegative cone (the cone of all positive semidefinite matrices whose elements are nonnegative). In contrast to directly applying the block coordinate descent type methods, we propose an inexact accelerated (two-)block coordinate descent algorithm to tackle the four-block unconstrained nonsmooth dual program. The proposed algorithm hinges on the efficient semismooth Newton method to solve the subproblems, which have no closed form solutions since the original four blocks are merged into two larger blocks. The $O(1/k^2)$ iteration complexity of the proposed algorithm is established. Extensive numerical results over various large scale semidefinite programming instances from relaxations of combinatorial problems demonstrate the effectiveness of the proposed algorithm.

Submitted 17 March, 2018; originally announced March 2018.

arXiv:1802.07091

An Efficient Semismooth Newton Based Algorithm for Convex Clustering

Authors: Yancheng Yuan, Defeng Sun, Kim-Chuan Toh

Abstract: Clustering may be the most fundamental problem in unsupervised learning, and it remains an active topic in machine learning research because of its importance in many applications. Popular methods like K-means may suffer from instability, as they are prone to getting stuck in local minima. Recently, the sum-of-norms (SON) model (also known as clustering path), which is a convex relaxation of the hierarchical clustering model, has been proposed in [7] and [5]. Although numerical algorithms like ADMM and AMA have been proposed to solve the convex clustering model [2], solving large-scale problems is known to be very challenging. In this paper, we propose a semismooth Newton based augmented Lagrangian method for large-scale convex clustering problems. Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results also show the superior performance and scalability of our algorithm compared to existing first-order methods.
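A commonly used form of the SON model, for data points $a_1,\ldots,a_n$ and centroids $x_1,\ldots,x_n$, minimizes $\frac{1}{2}\sum_i \|x_i - a_i\|^2 + \gamma \sum_{i<j} w_{ij}\|x_i - x_j\|$; points whose centroids coincide at the solution are assigned to the same cluster. The small evaluator below is only an illustration under assumptions: it uses the Euclidean norm, defaults to uniform weights $w_{ij} = 1$, and the name `son_objective` is hypothetical. It does not implement the semismooth Newton based method of the paper.

```python
import numpy as np

def son_objective(X, A, gamma, W=None):
    """Sum-of-norms (convex clustering) objective; hypothetical helper.

    X, A: arrays of shape (n, d); row i holds centroid x_i and data point a_i.
    W: optional symmetric (n, n) array of nonnegative weights; uniform if None.
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if W is None:
        W = np.ones((n, n))
    fidelity = 0.5 * np.sum((X - A) ** 2)
    fusion = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            fusion += W[i, j] * np.linalg.norm(X[i] - X[j])  # Euclidean norm
    return fidelity + gamma * fusion

A = np.array([[0.0, 0.0], [2.0, 0.0]])
print(son_objective(A, A, gamma=1.0))                         # 2.0 (centroids at the data)
print(son_objective([[1.0, 0.0], [1.0, 0.0]], A, gamma=1.0))  # 1.0 (centroids merged)
```

For two points at distance 2 and $\gamma = 1$, merging both centroids at the midpoint is cheaper than keeping them at the data, illustrating how a larger $\gamma$ favors fusing clusters. Because the objective is convex, it has no spurious local minima of the kind that affect K-means.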

Submitted 20 February, 2018; originally announced February 2018.

arXiv:1712.05910

doi 10.1007/s10107-018-1329-6

An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems

Authors: Yangjing Zhang, Ning Zhang, Defeng Sun, Kim-Chuan Toh

Abstract: The sparse group Lasso is a widely used statistical model which encourages sparsity both at the group level and within groups. In this paper, we develop an efficient augmented Lagrangian method for large-scale non-overlapping sparse group Lasso problems, with each subproblem being solved by a superlinearly convergent inexact semismooth Newton method. Theoretically, we prove that, if the penalty parameter is chosen sufficiently large, the augmented Lagrangian method converges globally at an arbitrarily fast linear rate for the primal iterative sequence, the dual infeasibility, and the duality gap of the primal and dual objective functions. Computationally, we derive explicitly the generalized Jacobian of the proximal mapping associated with the sparse group Lasso regularizer and exploit fully the underlying second order sparsity through the semismooth Newton method. The efficiency and robustness of our proposed algorithm are demonstrated by numerical experiments on both synthetic and real data sets.
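For non-overlapping groups, the proximal mapping of the sparse group Lasso regularizer $\lambda_1\|x\|_1 + \lambda_2\sum_g w_g\|x_g\|_2$ has a well-known closed form: soft-threshold each entry by $\lambda_1$, then shrink each group's Euclidean norm by $\lambda_2 w_g$. A minimal sketch follows; the function name, the `groups` format (a list of index lists) and the optional group weights `weights` are assumptions for illustration, and the generalized Jacobian used by the semismooth Newton method is not computed here.

```python
import numpy as np

def prox_sparse_group_lasso(y, groups, lam1, lam2, weights=None):
    """Proximal mapping of lam1*||x||_1 + lam2*sum_g w_g*||x_g||_2 for non-overlapping groups.

    groups: list of index lists partitioning the entries of y (hypothetical format).
    weights: optional positive group weights w_g; all ones if None.
    """
    y = np.asarray(y, dtype=float)
    x = np.zeros_like(y)
    for k, g in enumerate(groups):
        w = 1.0 if weights is None else weights[k]
        # Step 1: entrywise soft-thresholding by lam1.
        u = np.sign(y[g]) * np.maximum(np.abs(y[g]) - lam1, 0.0)
        # Step 2: shrink the group's Euclidean norm by lam2 * w; weak groups vanish.
        norm_u = np.linalg.norm(u)
        if norm_u > lam2 * w:
            x[g] = (1.0 - lam2 * w / norm_u) * u
    return x
```

For example, with `y = [3, 4, 0.5, -0.5]`, groups `[[0, 1], [2, 3]]`, `lam1 = 0` and `lam2 = 1`, the first group (norm 5) is scaled by 0.8 while the second group, whose norm is below 1, is set entirely to zero. This two-level zero pattern is the sparsity "at the group level and within groups" described above.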

Submitted 16 December, 2017; originally announced December 2017.

Journal ref: Mathematical Programming, 179 (2020), pp. 223-263

arXiv:1710.10604

SDPNAL+: A Matlab software for semidefinite programming with bound constraints (version 1.0)

Authors: Defeng Sun, Kim-Chuan Toh, Yancheng Yuan, Xin-Yuan Zhao

Abstract: SDPNAL+ is a MATLAB software package that implements an augmented Lagrangian based method to solve large scale semidefinite programming problems with bound constraints. The implementation was initially based on a majorized semismooth Newton-CG augmented Lagrangian method; here we design it within an inexact symmetric Gauss-Seidel based semi-proximal ADMM/ALM (alternating direction method of multipliers/augmented Lagrangian method) framework for the purpose of deriving simpler stopping conditions and closing the gap between the practical implementation of the algorithm and the theoretical algorithm. The basic code is written in MATLAB, but some subroutines written in C are incorporated via MEX files. We also design a convenient interface for users to input their SDP models into the solver. Numerous problems arising from combinatorial optimization and binary integer quadratic programming have been tested to evaluate the performance of the solver. Extensive numerical experiments conducted in [Yang, Sun, and Toh, Mathematical Programming Computation, 7 (2015), pp. 331–366] show that the proposed method is quite efficient and robust, in that it is able to solve 98.9% of the 745 test instances of SDP problems arising from various applications to the accuracy of $10^{-6}$ in the relative KKT residual.

Submitted 16 May, 2019; v1 submitted 29 October, 2017; originally announced October 2017.

Journal ref: Optimization Methods and Software (2019) [https://doi.org/10.1080/10556788.2019.1576176]

arXiv:1706.08800

On the R-superlinear convergence of the KKT residues generated by the augmented Lagrangian method for convex composite conic programming

Authors: Ying Cui, Defeng Sun, Kim-Chuan Toh

Abstract: Due to the possible lack of primal-dual-type error bounds, the superlinear convergence of the Karush-Kuhn-Tucker (KKT) residues of the sequence generated by the augmented Lagrangian method (ALM) for solving convex composite conic programming (CCCP) has long been an outstanding open question. In this paper, we aim to resolve this issue by first conducting a convergence rate analysis for the ALM with Rockafellar's stopping criteria under only a mild quadratic growth condition on the dual of CCCP. More importantly, by further assuming that the Robinson constraint qualification holds, we establish the R-superlinear convergence of the KKT residues of the iterative sequence under easy-to-implement stopping criteria for the augmented Lagrangian subproblems. Equipped with this discovery, we gain insight into the impressive numerical performance of several recently developed semismooth Newton-CG based ALM solvers for solving linear and convex quadratic semidefinite programming.

Submitted 27 June, 2017; originally announced June 2017.

arXiv:1706.08732

On efficiently solving the subproblems of a level-set method for fused lasso problems

Authors: Xudong Li, Defeng Sun, Kim-Chuan Toh

Abstract: In applying the level-set method developed in [Van den Berg and Friedlander, SIAM J. on Scientific Computing, 31 (2008), pp. 890–912 and SIAM J. on Optimization, 21 (2011), pp. 1201–1229] to solve the fused lasso problems, one needs to solve a sequence of regularized least squares subproblems. In order to make the level-set method practical, we develop a highly efficient inexact semismooth Newton based augmented Lagrangian method for solving these subproblems. The efficiency of our approach is based on several ingredients that constitute the main contributions of this paper. Firstly, an explicit formula for constructing the generalized Jacobian of the proximal mapping of the fused lasso regularizer is derived. Secondly, the special structure of the generalized Jacobian is carefully extracted and analyzed for the efficient implementation of the semismooth Newton method. Finally, numerical results on real data sets, including comparisons between our approach and several state-of-the-art solvers, are presented to demonstrate the high efficiency and robustness of our proposed algorithm in solving challenging large-scale fused lasso problems.

Submitted 27 June, 2017; originally announced June 2017.

MSC Class: 90C06; 90C20; 90C22; 90C25

arXiv:1703.06629

A block symmetric Gauss-Seidel decomposition theorem for convex composite quadratic programming and its applications

Authors: Xudong Li, Defeng Sun, Kim-Chuan Toh

Abstract: For a symmetric positive semidefinite linear system of equations $\mathcal{Q} {\bf x} = {\bf b}$, where ${\bf x} = (x_1,\ldots,x_s)$ is partitioned into $s$ blocks, with $s \geq 2$, we show that each cycle of the classical block symmetric Gauss-Seidel (block sGS) method exactly solves the associated quadratic programming (QP) problem augmented with an extra proximal term of the form $\frac{1}{2} \| {\bf x}-{\bf x}^k \|_{\mathcal T}^2$, where ${\mathcal T}$ is a symmetric positive semidefinite matrix related to the sGS decomposition and ${\bf x}^k$ is the previous iterate. By leveraging such a connection to optimization, we are able to extend the result (which we call the block sGS decomposition theorem) to solving a convex composite QP (CCQP) with an additional, possibly nonsmooth, term in $x_1$, i.e., $\min\{ p(x_1) + \frac{1}{2}\langle {\bf x},\, \mathcal{Q} {\bf x} \rangle -\langle {\bf b},\, {\bf x}\rangle\}$, where $p(\cdot)$ is a proper closed convex function. Based on the block sGS decomposition theorem, we are able to extend the classical block sGS method to solve a CCQP. In addition, our extended block sGS method has the flexibility of allowing for inexact computation in each step of the block sGS cycle. At the same time, we can also accelerate the inexact block sGS method to achieve an iteration complexity of $O(1/k^2)$ after performing $k$ block sGS cycles.
As a fundamental building block, the block sGS decomposition theorem has played a key role in various recently developed algorithms, such as the inexact semiproximal ALM/ADMM for linearly constrained multi-block convex composite conic programming (CCCP) and the accelerated block coordinate descent method for multi-block CCCP.
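The decomposition theorem can be checked numerically on a small example. The sketch below assumes each diagonal block $\mathcal{Q}_{ii}$ is positive definite and $p \equiv 0$. It performs one block sGS cycle (a backward sweep over blocks $s,\ldots,2$ followed by a forward sweep over blocks $1,\ldots,s$) and builds $\mathcal{T} = \mathcal{U}\mathcal{D}^{-1}\mathcal{U}^{T}$, where $\mathcal{Q} = \mathcal{U}^{T} + \mathcal{D} + \mathcal{U}$ with $\mathcal{D}$ block diagonal and $\mathcal{U}$ strictly block upper triangular. This explicit form of $\mathcal{T}$ is taken as an assumption here, and the function names are hypothetical. If the theorem holds, the new iterate satisfies $(\mathcal{Q} + \mathcal{T})\,{\bf x}^{k+1} = {\bf b} + \mathcal{T}{\bf x}^k$, which is the optimality condition of the proximal QP.

```python
import numpy as np

def block_sgs_cycle(Q, b, x, blocks):
    """One block symmetric Gauss-Seidel cycle for Q x = b (hypothetical helper).

    blocks: list of integer index arrays partitioning the variables into s >= 2 blocks.
    Each diagonal block Q[blk, blk] is assumed to be positive definite.
    """
    def solve_block(i, z):
        bi = blocks[i]
        Qii = Q[np.ix_(bi, bi)]
        # b_i minus the coupling term sum_{j != i} Q_ij z_j
        rhs = b[bi] - Q[bi, :] @ z + Qii @ z[bi]
        return np.linalg.solve(Qii, rhs)

    z = np.array(x, dtype=float)
    # Backward sweep over blocks s, ..., 2: blocks j < i still hold x^k.
    for i in range(len(blocks) - 1, 0, -1):
        z[blocks[i]] = solve_block(i, z)
    # Forward sweep over blocks 1, ..., s: blocks j < i are new, j > i come from the backward sweep.
    for i in range(len(blocks)):
        z[blocks[i]] = solve_block(i, z)
    return z

def sgs_operator(Q, blocks):
    """T = U D^{-1} U^T, where Q = U^T + D + U, D block diagonal, U strictly block upper triangular."""
    U = np.zeros_like(Q)
    D_inv = np.zeros_like(Q)
    for i, bi in enumerate(blocks):
        D_inv[np.ix_(bi, bi)] = np.linalg.inv(Q[np.ix_(bi, bi)])
        for bj in blocks[i + 1:]:
            U[np.ix_(bi, bj)] = Q[np.ix_(bi, bj)]
    return U @ D_inv @ U.T

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
Q = M @ M.T + 10.0 * np.eye(6)                 # positive definite test matrix
b = rng.standard_normal(6)
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
x0 = rng.standard_normal(6)
x1 = block_sgs_cycle(Q, b, x0, blocks)
T = sgs_operator(Q, blocks)
print(np.allclose((Q + T) @ x1, b + T @ x0))   # True
```

Repeating the cycle drives the iterates toward `np.linalg.solve(Q, b)`, since each cycle is a proximal step on the same strongly convex QP.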

Submitted 22 May, 2017; v1 submitted 20 March, 2017; originally announced March 2017.

MSC Class: 90C06; 90C20; 90C25; 65F10

arXiv:1702.05934

On the efficient computation of a generalized Jacobian of the projector over the Birkhoff polytope

Authors: Xudong Li, Defeng Sun, Kim-Chuan Toh

Abstract: We derive an explicit formula, as well as an efficient procedure, for constructing a generalized Jacobian for the projector of a given square matrix onto the Birkhoff polytope, i.e., the set of doubly stochastic matrices. To guarantee the high efficiency of our procedure, a semismooth Newton method for solving the dual of the projection problem is proposed and efficiently implemented. Extensive numerical experiments are presented to demonstrate the merits and effectiveness of our method by comparing its performance against other powerful solvers such as the commercial software Gurobi and the academic code PPROJ [Hager and Zhang, SIAM Journal on Optimization, 26 (2016), pp. 1773–1798]. In particular, our algorithm is able to solve the projection problem with over one billion variables and nonnegativity constraints to a very high accuracy in less than 15 minutes on a modest desktop computer. More importantly, based on our efficient computation of the projections and their generalized Jacobians, we can design a highly efficient augmented Lagrangian method (ALM) for solving a class of convex quadratic programming (QP) problems constrained by the Birkhoff polytope. The resulting ALM is demonstrated to be much more efficient than Gurobi in solving a collection of QP problems arising from the relaxation of quadratic assignment problems.
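For small instances, the projection onto the Birkhoff polytope can be approximated with Dykstra's alternating projection algorithm. It alternates between the affine set $\{X : X\mathbf{1} = \mathbf{1},\ X^{T}\mathbf{1} = \mathbf{1}\}$, whose projection has a closed form, and the nonnegative orthant. This is a swapped-in and much slower first-order technique, not the semismooth Newton method of the paper, and it does not produce a generalized Jacobian. The function names and the simple stopping rule on successive iterates are assumptions.

```python
import numpy as np

def project_affine_doubly_stochastic(Y):
    """Orthogonal projection onto {X : X 1 = 1, X^T 1 = 1} (no sign constraint)."""
    n = Y.shape[0]
    r = Y.sum(axis=1) - 1.0      # row-sum residuals
    c = Y.sum(axis=0) - 1.0      # column-sum residuals
    s = r.sum()                  # total residual (equals c.sum())
    ones = np.ones(n)
    return Y - np.outer(r, ones) / n - np.outer(ones, c) / n + s / n**2

def project_birkhoff_dykstra(G, tol=1e-10, max_iter=10_000):
    """Approximate Euclidean projection of a square matrix G onto the Birkhoff polytope."""
    X = np.array(G, dtype=float)
    P = np.zeros_like(X)         # Dykstra correction for the affine set
    R = np.zeros_like(X)         # Dykstra correction for the nonnegative orthant
    for _ in range(max_iter):
        Y = project_affine_doubly_stochastic(X + P)
        P = X + P - Y
        X_next = np.maximum(Y + R, 0.0)
        R = Y + R - X_next
        if np.linalg.norm(X_next - X) < tol:   # crude stopping rule
            return X_next
        X = X_next
    return X
```

Projecting the zero matrix returns the uniform matrix with entries $1/n$, and any permutation matrix is returned unchanged because it already lies in the polytope.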

Submitted 31 August, 2018; v1 submitted 20 February, 2017; originally announced February 2017.

MSC Class: 90C06; 90C20; 90C25; 65F10

arXiv:1611.09065

DrivingStyles: A mobile platform for driving styles and fuel consumption characterization

Authors: Javier E. Meseguer, C. K. Toh, Carlos T. Calafate, Juan Carlos Cano, Pietro Manzoni

Abstract: Intelligent Transportation Systems (ITS) rely on connected vehicle applications to address real-world problems. Research is currently being conducted to support safety, mobility and environmental applications. This paper presents the DrivingStyles architecture, which adopts data mining techniques and neural networks to analyze and generate a classification of driving styles and fuel consumption based on driver characterization. In particular, we have implemented an algorithm that is able to characterize the degree of aggressiveness of each driver. We have also developed a methodology to calculate, in real time, the consumption and environmental impact of spark ignition and diesel vehicles from a set of variables obtained from the vehicle's Electronic Control Unit (ECU). In this paper, we demonstrate the impact of the driving style on fuel consumption, as well as its correlation with the greenhouse gas emissions generated by each vehicle. Overall, our platform is able to assist drivers in correcting their bad driving habits, while offering helpful tips to improve fuel economy and driving safety.

Submitted 28 November, 2016; originally announced November 2016.

Comments: Journal of Communications and Networks

arXiv:1610.00875

On the Asymptotic Superlinear Convergence of the Augmented Lagrangian Method for Semidefinite Programming with Multiple Solutions

Authors: Ying Cui, Defeng Sun, Kim-Chuan Toh

Abstract: Solving large scale convex semidefinite programming (SDP) problems has long been a challenging task numerically. Fortunately, several powerful solvers, including SDPNAL, SDPNAL+ and QSDPNAL, have recently been developed to successfully solve linear and convex quadratic SDP problems to high accuracy. These solvers are based on the augmented Lagrangian method (ALM) applied to the dual problems, with the subproblems being solved by semismooth Newton-CG methods. Notably, thanks to Rockafellar's general theory on proximal point algorithms, the primal iteration sequence generated by the ALM enjoys an asymptotic Q-superlinear convergence rate under a second order sufficient condition for the primal problem. This second order sufficient condition implies that the primal problem has a unique solution, which can be restrictive in many applications. To gain more insight into the high efficiency of these solvers, in this paper we conduct an asymptotic superlinear convergence analysis of the ALM for convex SDP when the primal problem has multiple solutions (the solution set may even be unbounded). Under a fairly mild second order growth condition, we prove that the primal iteration sequence generated by the ALM converges asymptotically Q-superlinearly, while the dual feasibility and the dual objective function value converge asymptotically R-superlinearly. Moreover, by studying the metric subregularity of the Karush-Kuhn-Tucker solution mapping, we also provide sufficient conditions to guarantee the asymptotic R-superlinear convergence of the dual iterate.

Submitted 4 October, 2016; originally announced October 2016.

arXiv:1609.07664

Max-Norm Optimization for Robust Matrix Recovery

Authors: Ethan X. Fang, Han Liu, Kim-Chuan Toh, Wen-Xin Zhou

Abstract: This paper studies the matrix completion problem under arbitrary sampling schemes. We propose a new estimator incorporating both max-norm and nuclear-norm regularization, based on which we can conduct efficient low-rank matrix recovery using a random subset of entries observed with additive noise under general non-uniform and unknown sampling distributions. This method significantly relaxes the uniform sampling assumption imposed for the widely used nuclear-norm penalized approach, and makes low-rank matrix recovery feasible in more practical settings. Theoretically, we prove that the proposed estimator achieves fast rates of convergence under different settings. Computationally, we propose an alternating direction method of multipliers algorithm to efficiently compute the estimator, which bridges a gap between theory and practice of machine learning methods with max-norm regularization. Further, we provide thorough numerical studies to evaluate the proposed method using both simulated and real datasets.

Submitted 24 September, 2016; originally announced September 2016.

Comments: 32 pages, 4 figures

arXiv:1607.05428

A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems

Authors: Xudong Li, Defeng Sun, Kim-Chuan Toh

Abstract: We develop a fast and robust algorithm for solving large scale convex composite optimization models with an emphasis on the $\ell_1$-regularized least squares regression (Lasso) problems. Although a large number of solvers for Lasso problems exist in the literature, we found that no solver can efficiently handle difficult large scale regression problems with real data. By leveraging available error bound results to realize the asymptotic superlinear convergence property of the augmented Lagrangian algorithm, and by exploiting the second order sparsity of the problem through the semismooth Newton method, we are able to propose an algorithm, called SSNAL, to efficiently solve the aforementioned difficult problems. Under very mild conditions, which hold automatically for Lasso problems, both the primal and the dual iteration sequences generated by SSNAL possess a fast linear convergence rate, which can even be superlinear asymptotically. Numerical comparisons between our approach and a number of state-of-the-art solvers, on real data sets, are presented to demonstrate the high efficiency and robustness of our proposed algorithm in solving difficult large scale Lasso problems.
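SSNAL itself runs a semismooth Newton solver inside an augmented Lagrangian loop and is not reproduced here. For orientation, the sketch below shows two simpler ingredients under the assumed objective $\frac{1}{2}\|Ax - b\|^2 + \lambda\|x\|_1$: the soft-thresholding operator, which is the proximal mapping of $\lambda\|\cdot\|_1$, and a KKT residual $\|x - \mathrm{prox}_{\lambda\|\cdot\|_1}(x - A^{T}(Ax - b))\|$ that vanishes exactly at Lasso solutions. It also includes plain proximal gradient (ISTA), a basic first-order method shown only as a point of reference. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_kkt_residual(A, b, lam, x):
    """||x - prox_{lam ||.||_1}(x - A^T (A x - b))||; zero exactly at Lasso solutions."""
    grad = A.T @ (A @ x - b)
    return np.linalg.norm(x - soft_threshold(x - grad, lam))

def ista(A, b, lam, iters=5000):
    """Proximal gradient (ISTA) for min 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x
```

When $A$ is the identity, one ISTA step already returns `soft_threshold(b, lam)`, the exact solution. On harder, ill-conditioned real data, such first-order iterations can need many steps to drive the KKT residual down, which is the regime the abstract targets.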

Submitted 3 May, 2017; v1 submitted 19 July, 2016; originally announced July 2016.

MSC Class: 65F10; 90C06; 90C25; 90C31

arXiv:1607.01151

Sparse-BSOS: a bounded degree SOS hierarchy for large scale polynomial optimization with sparsity

Authors: Tillmann Weisser, Jean-Bernard Lasserre, Kim-Chuan Toh

Abstract: We provide a sparse version of the bounded degree SOS hierarchy BSOS [7] for polynomial optimization problems. It permits the treatment of large scale problems which satisfy a structured sparsity pattern. When the sparsity pattern satisfies the running intersection property, this Sparse-BSOS hierarchy of semidefinite programs (with semidefinite constraints of fixed size) converges to the global optimum of the original problem. Moreover, for the class of SOS-convex problems, finite convergence takes place at the first step of the hierarchy, just as in the dense version.

Submitted 27 May, 2017; v1 submitted 5 July, 2016; originally announced July 2016.

Report number: Rapport LAAS n° 16193

arXiv:1604.05473

Fast algorithms for large scale generalized distance weighted discrimination

Authors: Xin Yee Lam, J. S. Marron, Defeng Sun, Kim-Chuan Toh

Abstract: High dimension low sample size statistical analysis is important in a wide range of applications. In such situations, the highly appealing discrimination method, the support vector machine, can be improved to alleviate data piling at the margin. This leads naturally to the development of distance weighted discrimination (DWD), which can be modeled as a second-order cone programming problem and solved by interior-point methods when the scale (in sample size and feature dimension) of the data is moderate. Here, we design a scalable and robust algorithm for solving large scale generalized DWD problems. Numerical experiments on real data sets from the UCI repository demonstrate that our algorithm is highly efficient in solving large scale problems, and sometimes even more efficient than the highly optimized LIBLINEAR and LIBSVM for solving the corresponding SVM problems.

Submitted 16 August, 2017; v1 submitted 19 April, 2016; originally announced April 2016.

MSC Class: 90C25; 90C06; 90C90
