-
Complexity of linearized quadratic penalty for optimization with nonlinear equality constraints
Authors:
Lahcen El Bourkhissi,
Ion Necoara
Abstract:
In this paper we consider a nonconvex optimization problem with nonlinear equality constraints. We assume that both the objective function and the functional constraints are locally smooth. For solving this problem, we propose a linearized quadratic penalty method, i.e., we linearize the objective function and the functional constraints in the penalty formulation at the current iterate and add a quadratic regularization, thus yielding a subproblem that is easy to solve, and whose solution is the next iterate. Under a dynamic regularization parameter choice, we derive convergence guarantees for the iterates of our method to an $ε$ first-order optimal solution in $\mathcal{O}(1/{ε^3})$ outer iterations. Finally, we show that when the problem data satisfy the Kurdyka-Lojasiewicz property, e.g., are semialgebraic, the whole sequence generated by our algorithm converges and we derive convergence rates. We validate the theory and the performance of the proposed algorithm by numerically comparing it with the existing methods from the literature.
Submitted 23 February, 2024;
originally announced February 2024.
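The linearized quadratic penalty step described in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a single scalar constraint in two variables, a fixed penalty parameter `rho`, and a fixed regularization parameter `beta` (the abstract uses a dynamic choice). The test problem and all function names are hypothetical.

```python
# Toy problem (assumed for illustration): min x0 + x1  s.t.  x0^2 + x1^2 - 2 = 0.
# The constrained minimizer is (-1, -1).

def f_grad(x):
    # Gradient of the objective f(x) = x0 + x1.
    return [1.0, 1.0]

def F(x):
    # Scalar equality constraint F(x) = 0.
    return x[0] ** 2 + x[1] ** 2 - 2.0

def F_grad(x):
    # Gradient of the constraint function.
    return [2.0 * x[0], 2.0 * x[1]]

def lqp_step(x, rho, beta):
    """One step: minimize over d the model
         <grad f(x), d> + (rho/2) * (F(x) + <grad F(x), d>)^2 + (beta/2) * ||d||^2.
    Its optimality condition is (beta*I + rho*a*a^T) d = -(g + rho*F(x)*a)."""
    g, a, c = f_grad(x), F_grad(x), F(x)
    r = [g[i] + rho * c * a[i] for i in range(2)]
    # Rank-one system solved in closed form (Sherman-Morrison formula).
    ar = sum(a[i] * r[i] for i in range(2))
    aa = sum(a[i] * a[i] for i in range(2))
    coef = rho * ar / (beta + rho * aa)
    d = [-(r[i] - coef * a[i]) / beta for i in range(2)]
    return [x[i] + d[i] for i in range(2)]

def solve(x0, rho=100.0, beta=2.0, iters=200):
    # Fixed rho and beta are simplifying assumptions of this sketch.
    x = list(x0)
    for _ in range(iters):
        x = lqp_step(x, rho, beta)
    return x

x_final = solve([-1.5, -0.5])
```

With a finite `rho` the iterates settle at a stationary point of the penalty function, so the constraint is satisfied only approximately; a larger `rho` tightens feasibility at the cost of a worse-conditioned subproblem.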
-
Moving higher-order Taylor approximations method for smooth constrained minimization problems
Authors:
Yassine Nabou,
Ion Necoara
Abstract:
In this paper we develop a higher-order method for solving composite (non)convex minimization problems with smooth (non)convex functional constraints. At each iteration our method approximates the smooth part of the objective function and of the constraints by higher-order Taylor approximations, leading to a moving Taylor approximation method (MTA). We present convergence guarantees for the MTA algorithm for both nonconvex and convex problems. In particular, when the objective and the constraints are nonconvex functions, we prove that the sequence generated by the MTA algorithm converges globally to a KKT point. Moreover, we derive convergence rates in the iterates when the problem data satisfy the Kurdyka-Lojasiewicz (KL) property. Further, when the objective function is (uniformly) convex and the constraints are also convex, we provide (linear/superlinear) sublinear convergence rates for our algorithm. Finally, we present an efficient implementation of the proposed algorithm and compare it with existing methods from the literature.
Submitted 22 February, 2024;
originally announced February 2024.
-
A stochastic moving ball approximation method for smooth convex constrained minimization
Authors:
Nitesh Kumar Singh,
Ion Necoara
Abstract:
In this paper, we consider constrained optimization problems with convex, smooth objective and constraints. We propose a new stochastic gradient algorithm, called the Stochastic Moving Ball Approximation (SMBA) method, to solve this class of problems, where at each iteration we first take a gradient step for the objective function and then perform a projection step onto one ball approximation of a randomly chosen constraint. The computational simplicity of SMBA, which uses first-order information and considers only one constraint at a time, makes it suitable for large-scale problems with many functional constraints. We provide a convergence analysis for the SMBA algorithm under basic assumptions on the problem, which yields new convergence rates in both optimality and feasibility criteria evaluated at some average point. Our convergence proofs are novel since we need to deal properly with infeasible iterates and with quadratic upper approximations of constraints that may yield empty balls. We derive convergence rates of order $\mathcal{O} (k^{-1/2})$ when the objective function is convex, and $\mathcal{O} (k^{-1})$ when the objective function is strongly convex. Preliminary numerical experiments on quadratically constrained quadratic problems demonstrate the viability and performance of our method when compared to some existing state-of-the-art optimization methods and software.
Submitted 22 February, 2024;
originally announced February 2024.
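The two-stage iteration in the SMBA abstract above (a gradient step on the objective, then a projection onto a ball approximation of one random constraint) can be sketched as follows. This is an illustrative sketch only. It assumes each constraint h has an L-Lipschitz gradient, so the quadratic upper model of h at a point v defines a ball with center v - grad h(v)/L and squared radius ||grad h(v)||^2/L^2 - 2h(v)/L. The handling of an empty ball (returning the center) is a hypothetical fallback and is not taken from the paper; all names are assumptions.

```python
import math
import random

def ball_projection_step(v, h, grad_h, L):
    """Project v onto {y : h(v) + <grad_h(v), y - v> + (L/2)*||y - v||^2 <= 0}."""
    hv = h(v)
    g = grad_h(v)
    center = [vi - gi / L for vi, gi in zip(v, g)]
    radius_sq = sum(gi * gi for gi in g) / L ** 2 - 2.0 * hv / L
    if radius_sq < 0.0:
        # Empty ball. Hypothetical fallback (NOT the paper's rule): move to the
        # minimizer of the quadratic upper model.
        return center
    dist = math.sqrt(sum((vi - ci) ** 2 for vi, ci in zip(v, center)))
    radius = math.sqrt(radius_sq)
    if dist <= radius:
        return list(v)  # v already lies in the ball (equivalently h(v) <= 0)
    scale = radius / dist
    return [ci + scale * (vi - ci) for vi, ci in zip(v, center)]

def smba_iteration(x, grad_f, constraints, alpha, L, rng):
    """One iteration: gradient step on f, then ball projection for one random constraint."""
    v = [xi - alpha * gi for xi, gi in zip(x, grad_f(x))]
    h, grad_h = rng.choice(constraints)
    return ball_projection_step(v, h, grad_h, L)

# Example constraint (assumed): the unit disc, h(x) = x0^2 + x1^2 - 1, with L = 2.
def h_disc(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

def grad_h_disc(x):
    return [2.0 * x[0], 2.0 * x[1]]

projected = ball_projection_step([2.0, 0.0], h_disc, grad_h_disc, 2.0)
unchanged = ball_projection_step([0.5, 0.0], h_disc, grad_h_disc, 2.0)
```

For a quadratic constraint with L equal to its exact curvature, as in the disc example, the ball approximation coincides with the constraint set itself, so the step reduces to an ordinary projection.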
-
Proximal gradient methods with inexact oracle of degree q for composite optimization
Authors:
Yassine Nabou,
Francois Glineur,
Ion Necoara
Abstract:
We introduce the concept of inexact first-order oracle of degree q for a possibly nonconvex and nonsmooth function, which naturally appears in the context of approximate gradient, weak level of smoothness and other situations. Our definition is less conservative than those found in the existing literature, and it can be viewed as an interpolation between fully exact and the existing inexact first-order oracle definitions. We analyze the convergence behavior of a (fast) inexact proximal gradient method using such an oracle for solving (non)convex composite minimization problems. We derive complexity estimates and study the dependence between the accuracy of the oracle and the desired accuracy of the gradient or of the objective function. Our results show that better rates can be obtained both theoretically and in numerical simulations when q is large.
Submitted 19 January, 2024;
originally announced January 2024.
-
Mini-batch stochastic subgradient for functional constrained optimization
Authors:
Nitesh Kumar Singh,
Ion Necoara
Abstract:
In this paper we consider finite sum composite convex optimization problems with many functional constraints. The objective function is expressed as a finite sum of two terms, one of which admits easy computation of (sub)gradients while the other is amenable to proximal evaluations. We assume a generalized bounded gradient condition on the objective which allows us to simultaneously tackle both smooth and nonsmooth problems. We consider both the case with and the case without a strong convexity property. Further, we assume that each constraint set is given as the level set of a convex but not necessarily differentiable function. We reformulate the constrained finite sum problem into a stochastic optimization problem for which the stochastic subgradient projection method from [17] specializes to a collection of mini-batch variants, with different mini-batch sizes for the objective function and functional constraints, respectively. More specifically, at each iteration, our algorithm takes a mini-batch stochastic proximal subgradient step aimed at minimizing the objective function and then a subsequent mini-batch subgradient projection step minimizing the feasibility violation. By specializing different mini-batching strategies, we derive exact expressions for the stepsizes as a function of the mini-batch size and in some cases we also derive insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime. We also prove sublinear convergence rates for the mini-batch subgradient projection algorithm which depend explicitly on the mini-batch sizes and on the properties of the objective function. Numerical results also show a better performance of our mini-batch scheme over its single-batch counterpart.
Submitted 19 January, 2024;
originally announced January 2024.
-
Exact representation and efficient approximations of linear model predictive control laws via HardTanh type deep neural networks
Authors:
Daniela Lupu,
Ion Necoara
Abstract:
Deep neural networks have revolutionized many fields, including image processing, inverse problems and text mining, and more recently have given very promising results in systems and control. Neural networks with hidden layers have a strong potential as an approximation framework of predictive control laws as they usually yield better approximation quality and smaller memory requirements than existing explicit (multi-parametric) approaches. In this paper, we first show that neural networks with HardTanh activation functions can exactly represent predictive control laws of linear time-invariant systems. We derive theoretical bounds on the minimum number of hidden layers and neurons that a HardTanh neural network should have to exactly represent a given predictive control law. The choice of HardTanh deep neural networks is particularly suited for linear predictive control laws as they usually require fewer hidden layers and neurons than deep neural networks with ReLU units for representing exactly continuous piecewise affine (or equivalently min-max) maps. In the second part of the paper we bring the physics of the model and standard optimization techniques into the architecture design, in order to eliminate the disadvantages of the black-box HardTanh learning. More specifically, we design trainable unfolded HardTanh deep architectures for learning linear predictive control laws based on two standard iterative optimization algorithms, i.e., projected gradient descent and accelerated projected gradient descent. We also study the performance of the proposed HardTanh type deep neural networks on a linear model predictive control application.
Submitted 10 January, 2024;
originally announced January 2024.
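The link between projected gradient descent and HardTanh layers mentioned in the abstract above can be made concrete on a box-constrained quadratic program. The sketch below assumes the constraint set is the box [-1, 1]^n, whose Euclidean projection is exactly HardTanh, so one projected gradient step on 0.5*x^T Q x + q^T x reads x+ = HardTanh(W x + b) with W = I - alpha*Q and b = -alpha*q. The weights here are fixed from the problem data and shared across layers; the trainable architectures of the paper are not reproduced, and the matrices and names are hypothetical.

```python
def hardtanh(z):
    # Componentwise clipping to [-1, 1]; this is the projection onto the box.
    return [max(-1.0, min(1.0, zi)) for zi in z]

def unfolded_pgd(Q, q, alpha, num_layers, x0):
    """Unfold projected gradient descent into num_layers identical HardTanh layers."""
    n = len(q)
    # Layer weights and bias derived from the QP data (not trained in this sketch).
    W = [[(1.0 if i == j else 0.0) - alpha * Q[i][j] for j in range(n)]
         for i in range(n)]
    b = [-alpha * qi for qi in q]
    x = list(x0)
    for _ in range(num_layers):
        pre_activation = [sum(W[i][j] * x[j] for j in range(n)) + b[i]
                          for i in range(n)]
        x = hardtanh(pre_activation)
    return x

# Assumed example data: minimize 0.5*x^T Q x + q^T x over the box [-1, 1]^2.
Q = [[2.0, 0.5], [0.5, 1.0]]
q = [-3.0, 0.0]
x_star = unfolded_pgd(Q, q, alpha=0.4, num_layers=50, x0=[0.0, 0.0])
```

For this data the first coordinate saturates at the upper bound 1 and the second converges to -0.5, which satisfies the optimality conditions of the box-constrained problem.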
-
Higher-order tensor methods for minimizing difference of convex functions
Authors:
Ion Necoara
Abstract:
Higher-order tensor methods were recently proposed for minimizing smooth convex and nonconvex functions. Higher-order algorithms accelerate the convergence of the classical first-order methods thanks to the higher-order derivatives used in the updates. The purpose of this paper is twofold. Firstly, to show that the higher-order algorithmic framework can be generalized and successfully applied to (nonsmooth) difference of convex functions, namely, those that can be expressed as the difference of two smooth convex functions and a possibly nonsmooth convex one. We also provide examples when the subproblem can be solved efficiently, even globally. Secondly, to derive a complete convergence analysis for our higher-order difference of convex functions (HO-DC) algorithm. In particular, we prove that any limit point of the HO-DC iterative sequence is a critical point of the problem under consideration, the corresponding objective value is monotonically decreasing and the minimum value of the norms of its subgradients converges globally to zero at a sublinear rate. The sublinear or linear convergence rates of the iterations are obtained under the Kurdyka-Lojasiewicz property.
Submitted 10 January, 2024;
originally announced January 2024.
-
Coordinate descent methods beyond smoothness and separability
Authors:
Flavia Chorobura,
Ion Necoara
Abstract:
This paper deals with convex nonsmooth optimization problems. We introduce a general smooth approximation framework for the original function and apply random (accelerated) coordinate descent methods for minimizing the corresponding smooth approximations. Our framework covers the most important classes of smoothing techniques from the literature. Based on this general framework for the smooth approximation and using coordinate descent type methods we derive convergence rates in function values for the original objective. Moreover, if the original function satisfies a growth condition, then we prove that the smooth approximations also inherit this condition and consequently the convergence rates are improved in this case. We also present a relative randomized coordinate descent algorithm for solving nonseparable minimization problems with the objective function relatively smooth along coordinates w.r.t. a (possibly nonseparable) differentiable function. For this algorithm we also derive convergence rates in the convex case and under the growth condition for the objective.
Submitted 9 January, 2024;
originally announced January 2024.
-
Acceleration and restart for the randomized Bregman-Kaczmarz method
Authors:
Lionel Tondji,
Ion Necoara,
Dirk A. Lorenz
Abstract:
Optimizing strongly convex functions subject to linear constraints is a fundamental problem with numerous applications. In this work, we propose a block (accelerated) randomized Bregman-Kaczmarz method that only uses a block of constraints in each iteration to tackle this problem. We consider a dual formulation of this problem in order to deal in an efficient way with the linear constraints. Using convex tools, we show that the corresponding dual function satisfies the Polyak-Lojasiewicz (PL) property, provided that the primal objective function is strongly convex and verifies additionally some other mild assumptions. However, adapting the existing theory on coordinate descent methods to our dual formulation can only give us sublinear convergence results in the dual space. In order to obtain convergence results in some criterion corresponding to the primal (original) problem, we transfer our algorithm to the primal space, which combined with the PL property allows us to get linear convergence rates. More specifically, we provide a theoretical analysis of the convergence of our proposed method under different assumptions on the objective and demonstrate in the numerical experiments its superior efficiency and speed up compared to existing methods for the same problem.
Submitted 3 April, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
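A basic (single-row, non-accelerated) instance of the Bregman-Kaczmarz idea from the abstract above can be sketched for the strongly convex objective f(x) = lam*||x||_1 + 0.5*||x||^2 subject to Ax = b. For this choice the update is commonly written as a Kaczmarz step on a dual variable followed by soft-thresholding, and with lam = 0 it reduces to the classical randomized Kaczmarz method. The block and accelerated variants of the paper are not reproduced; uniform row sampling, the test system, and all names are assumptions of this sketch.

```python
import random

def soft_threshold(z, lam):
    # Componentwise shrinkage: sign(z_i) * max(|z_i| - lam, 0).
    return [max(abs(zi) - lam, 0.0) * (1.0 if zi > 0 else -1.0) for zi in z]

def randomized_sparse_kaczmarz(A, b, lam=0.0, iters=500, seed=0):
    """Single-row randomized Kaczmarz with soft-thresholding (lam = 0: classical Kaczmarz)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n        # primal iterate
    x_dual = [0.0] * n   # dual iterate
    for _ in range(iters):
        i = rng.randrange(m)  # uniform sampling (assumption; other rules exist)
        row = A[i]
        residual = sum(row[j] * x[j] for j in range(n)) - b[i]
        step = residual / sum(rj * rj for rj in row)
        x_dual = [x_dual[j] - step * row[j] for j in range(n)]
        x = soft_threshold(x_dual, lam)
    return x

# Assumed consistent test system with unique solution (1, 2).
A = [[1.0, 2.0], [3.0, 1.0]]
b = [5.0, 5.0]
x_kaczmarz = randomized_sparse_kaczmarz(A, b)
```

Each iteration touches a single row of A, which is what makes this family of methods attractive when the number of constraints is large.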
-
Complexity of linearized augmented Lagrangian for optimization with nonlinear equality constraints
Authors:
Lahcen El Bourkhissi,
Ion Necoara
Abstract:
In this paper, we consider a nonconvex optimization problem with nonlinear equality constraints. We assume that both the objective function and the functional constraints are locally smooth. For solving this problem, we propose a linearized augmented Lagrangian method, i.e., we linearize the functional constraints in the augmented Lagrangian at the current iterate and add a quadratic regularization, yielding a subproblem that is easy to solve, and whose solution is the next iterate. Under a dynamic regularization parameter choice, we prove global asymptotic convergence of the iterates to a critical point of the problem. We also derive convergence guarantees for the iterates of our method to an $ε$ first-order optimal solution in $\mathcal{O}(1/{ε^2})$ outer iterations. Finally, we show that, when the problem data are, e.g., semialgebraic, the sequence generated by our algorithm converges and we derive convergence rates. We validate the theory and the performance of the proposed algorithm by numerically comparing it with the existing methods from the literature.
Submitted 19 January, 2023;
originally announced January 2023.
-
Stochastic subgradient for composite optimization with functional constraints
Authors:
Ion Necoara,
Nitesh Kumar Singh
Abstract:
In this paper we consider convex optimization problems with stochastic composite objective function subject to (possibly) infinite intersection of constraints. The objective function is expressed in terms of expectation operator over a sum of two terms satisfying a stochastic bounded gradient condition, with or without strong convexity type properties. In contrast to the classical approach, where the constraints are usually represented as intersection of simple sets, in this paper we consider that each constraint set is given as the level set of a convex but not necessarily differentiable function. Based on the flexibility offered by our general optimization model we consider a stochastic subgradient method with random feasibility updates. At each iteration, our algorithm takes a stochastic proximal (sub)gradient step aimed at minimizing the objective function and then a subsequent subgradient step minimizing the feasibility violation of the observed random constraint. We analyze the convergence behavior of the proposed algorithm for diminishing stepsizes and for the case when the objective function is convex or strongly convex, unifying the nonsmooth and smooth cases. We prove sublinear convergence rates for this stochastic subgradient algorithm, which are known to be optimal for subgradient methods on this class of problems. When the objective function has a linear least-square form and the constraints are polyhedral, it is shown that the algorithm converges linearly. Numerical evidence supports the effectiveness of our method in real problems.
Submitted 15 January, 2024; v1 submitted 18 April, 2022;
originally announced April 2022.
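The iteration described in the abstract above alternates a (sub)gradient step on the objective with a subgradient step that reduces the violation of one randomly observed constraint h_j(x) <= 0. A minimal sketch follows, assuming the feasibility step is of Polyak type, v - beta * max(h_j(v), 0) / ||s||^2 * s with s a subgradient of h_j at v, and assuming the proximal term of the objective is absent. The function names and the step parameter `beta` are hypothetical.

```python
import random

def feasibility_step(v, h, subgrad_h, beta=1.0):
    """Polyak-type subgradient step reducing the violation max(h(v), 0)."""
    violation = max(h(v), 0.0)
    if violation == 0.0:
        return list(v)  # constraint already satisfied at v
    s = subgrad_h(v)
    norm_sq = sum(si * si for si in s)
    if norm_sq == 0.0:
        return list(v)  # no usable direction; leave v unchanged
    t = beta * violation / norm_sq
    return [vi - t * si for vi, si in zip(v, s)]

def iteration(x, subgrad_f, constraints, alpha, rng, beta=1.0):
    """One iteration: subgradient step on f, then a step for one random constraint."""
    g = subgrad_f(x)
    v = [xi - alpha * gi for xi, gi in zip(x, g)]
    h, subgrad_h = rng.choice(constraints)
    return feasibility_step(v, h, subgrad_h, beta)

# Assumed example: halfspace constraint h(x) = x0 - 1 <= 0.
def h_half(x):
    return x[0] - 1.0

def subgrad_h_half(x):
    return [1.0, 0.0]

corrected = feasibility_step([2.0, 2.0], h_half, subgrad_h_half)
```

For an affine constraint and beta = 1 the feasibility step coincides with the exact projection onto the halfspace, which is why the example lands on the boundary x0 = 1.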
-
Random coordinate descent methods for nonseparable composite optimization
Authors:
Flavia Chorobura,
Ion Necoara
Abstract:
In this paper we consider large-scale composite optimization problems having the objective function formed as a sum of two terms (possibly nonconvex), one of which has a (block) coordinate-wise Lipschitz continuous gradient while the other is differentiable but nonseparable. Under these general settings we derive and analyze two new coordinate descent methods. The first algorithm, referred to as coordinate proximal gradient method, considers the composite form of the objective function, while the other algorithm disregards the composite form of the objective and uses the partial gradient of the full objective, yielding a coordinate gradient descent scheme with novel adaptive stepsize rules. We prove that these new stepsize rules make the coordinate gradient scheme a descent method, provided that additional assumptions hold for the second term in the objective function. We present a complete worst-case complexity analysis for these two new methods in both convex and nonconvex settings, provided that the (block) coordinates are chosen randomly or cyclically. Preliminary numerical results also confirm the efficiency of our two algorithms on practical problems.
Submitted 9 January, 2024; v1 submitted 27 March, 2022;
originally announced March 2022.
-
Efficiency of higher-order algorithms for minimizing composite functions
Authors:
Yassine Nabou,
Ion Necoara
Abstract:
Composite minimization involves a collection of functions which are aggregated in a nonsmooth manner. It covers, as a particular case, smooth approximation of minimax games, minimization of max-type functions, and simple composite minimization problems, where the objective function has a nonsmooth component. We design a higher-order majorization algorithmic framework for fully composite problems (possibly nonconvex). Our framework replaces each component with a higher-order surrogate such that the corresponding error function has a higher-order Lipschitz continuous derivative. We present convergence guarantees for our method for composite optimization problems with (non)convex and (non)smooth objective function. In particular, we prove stationary point convergence guarantees for general nonconvex (possibly nonsmooth) problems and under Kurdyka-Lojasiewicz (KL) property of the objective function we derive improved rates depending on the KL parameter. For convex (possibly nonsmooth) problems we also provide sublinear convergence rates.
Submitted 9 January, 2024; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Efficiency of stochastic coordinate proximal gradient methods on nonseparable composite optimization
Authors:
I. Necoara,
F. Chorobura
Abstract:
This paper deals with composite optimization problems having the objective function formed as the sum of two terms, one has Lipschitz continuous gradient along random subspaces and may be nonconvex and the second term is simple and differentiable, but possibly nonconvex and nonseparable. Under these settings we design a stochastic coordinate proximal gradient method which takes into account the nonseparable composite form of the objective function. This algorithm achieves scalability by constructing at each iteration a local approximation model of the whole nonseparable objective function along a random subspace with user-determined dimension. We outline efficient techniques for selecting the random subspace, yielding an implementation that has low cost per-iteration while also achieving fast convergence rates. We present a probabilistic worst-case complexity analysis for our stochastic coordinate proximal gradient method in convex and nonconvex settings, in particular we prove high-probability bounds on the number of iterations before a given optimality is achieved. Extensive numerical results also confirm the efficiency of our algorithm.
Submitted 9 January, 2024; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Convergence analysis of stochastic higher-order majorization-minimization algorithms
Authors:
Daniela Lupu,
Ion Necoara
Abstract:
Majorization-minimization schemes are a broad class of iterative methods targeting general optimization problems, including nonconvex, nonsmooth and stochastic. These algorithms minimize successively a sequence of upper bounds of the objective function so that along the iterations the objective value decreases. We present a stochastic higher-order algorithmic framework for minimizing the average of a very large number of sufficiently smooth functions. Our stochastic framework is based on the notion of stochastic higher-order upper bound approximations of the finite-sum objective function and minibatching. We derive convergence results for nonconvex and convex optimization problems when the higher-order approximation of the objective function yields an error that is p times differentiable and has a Lipschitz continuous p-th derivative. More precisely, for general nonconvex problems we present asymptotic stationary point guarantees and under Kurdyka-Lojasiewicz property we derive local convergence rates ranging from sublinear to linear. For convex problems with uniformly convex objective function we derive local (super)linear convergence results for our algorithm. Compared to existing stochastic (first-order) methods, our algorithm adapts to the problem's curvature and allows using any batch size. Preliminary numerical tests support the effectiveness of our algorithmic framework.
Submitted 10 January, 2024; v1 submitted 14 March, 2021;
originally announced March 2021.
-
General higher-order majorization-minimization algorithms for (non)convex optimization
Authors:
Ion Necoara,
Daniela Lupu
Abstract:
Majorization-minimization algorithms consist of successively minimizing a sequence of upper bounds of the objective function so that along the iterations the objective function decreases. Such a simple principle allows one to solve a large class of optimization problems, even nonconvex and nonsmooth. We propose a general higher-order majorization-minimization algorithmic framework for minimizing an objective function that admits an approximation (surrogate) such that the corresponding error function has a higher-order Lipschitz continuous derivative. We present convergence guarantees for our new method for general optimization problems with (non)convex and/or (non)smooth objective function. For convex (possibly nonsmooth) problems we provide global sublinear convergence rates, while for problems with uniformly convex objective function we obtain locally faster superlinear convergence rates. We also prove global stationary point guarantees for general nonconvex (possibly nonsmooth) problems and under Kurdyka-Lojasiewicz property of the objective function we derive local convergence rates ranging from sublinear to superlinear for our majorization-minimization algorithm. Moreover, for unconstrained nonconvex problems we derive convergence rates in terms of first- and second-order optimality conditions.
Submitted 28 November, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
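The majorization-minimization principle in the abstract above can be illustrated with one standard higher-order surrogate: the second-order Taylor model plus a cubic regularization term, which upper-bounds the objective whenever the regularization constant M is at least the Lipschitz constant of the second derivative. The one-dimensional sketch below is only an instance of the general idea, not the framework of the paper; the test function f(x) = x^2 + cos(x), for which M = 1 is valid, and all names are assumptions.

```python
import math

def cubic_newton_step(g, h, M):
    """Minimizer d of the surrogate g*d + 0.5*h*d^2 + (M/6)*|d|^3 in one dimension.
    The optimal d points against the sign of g, with length s solving
    -|g| + h*s + (M/2)*s^2 = 0."""
    s = (-h + math.sqrt(h * h + 2.0 * M * abs(g))) / M
    return -math.copysign(s, g)

def f(x):
    return x * x + math.cos(x)

def f_prime(x):
    return 2.0 * x - math.sin(x)

def f_second(x):
    return 2.0 - math.cos(x)

def minimize(x0, M=1.0, iters=50):
    """Repeatedly minimize the cubic upper bound built at the current point."""
    x = x0
    values = [f(x)]
    for _ in range(iters):
        x = x + cubic_newton_step(f_prime(x), f_second(x), M)
        values.append(f(x))
    return x, values

x_min, history = minimize(3.0)
```

Because each surrogate majorizes the objective and agrees with it at the current point, the recorded objective values cannot increase, which is the descent property the abstract refers to.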
-
Model reduction with pole-zero placement and high order moment matching
Authors:
Tudor C. Ionescu,
Orest V. Iftime,
Ion Necoara
Abstract:
In this paper, we compute a low order approximation of a system of large order $n$ that matches $ν$ moments of order $j_i$ of the transfer function, at $ν$ interpolation points, has $\ell$ poles and $k$ zeros fixed and also matches $ν-(\ell +k)$ moments of order $j_i+1$, where $j_i+1$ is the multiplicity of the $i$-th interpolation point. We derive explicit linear systems in the free parameters to simultaneously achieve the required pole-zero placement and match the desired high order moments. We compute the closed form of the free parameters that meet the constraints, as the solution of a $ν$ order linear system. Furthermore, for data-driven model reduction, we generalize the construction of the Loewner matrices to include the data and the imposed pole and higher order moment constraints. The resulting approximations achieve a trade-off between the good norm approximation and the preservation of the dynamics of the original system in a region of interest.
Submitted 24 February, 2021; v1 submitted 12 March, 2020;
originally announced March 2020.
-
General convergence analysis of stochastic first order methods for composite optimization
Authors:
Ion Necoara
Abstract:
In this paper we consider stochastic composite convex optimization problems with the objective function satisfying a stochastic bounded gradient condition, with or without a quadratic functional growth property. These models include the most well-known classes of objective functions analyzed in the literature: non-smooth Lipschitz functions and composition of a (potentially) non-smooth function and a smooth function, with or without strong convexity. Based on the flexibility offered by our optimization model we consider several variants of stochastic first order methods, such as the stochastic proximal gradient and the stochastic proximal point algorithms. Usually, the convergence theory for these methods has been derived for simple stochastic optimization models satisfying restrictive assumptions, the rates are in general sublinear and hold only for specific decreasing stepsizes. Hence, we analyze the convergence rates of stochastic first order methods with constant or variable stepsize under general assumptions covering a large class of objective functions. For constant stepsize we show that these methods can achieve linear convergence rate up to a constant proportional to the stepsize and under some strong stochastic bounded gradient condition even pure linear convergence. Moreover, when a variable stepsize is chosen we derive sublinear convergence rates for these stochastic first order methods. Finally, the stochastic gradient mapping and the Moreau smoothing mapping introduced in the present paper lead to simple and intuitive proofs.
Submitted 8 March, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
Linear convergence of dual coordinate descent on non-polyhedral convex problems
Authors:
Ion Necoara,
Olivier Fercoq
Abstract:
This paper deals with constrained convex problems, where the objective function is smooth strongly convex and the feasible set is given as the intersection of a large number of closed convex (possibly non-polyhedral) sets. In order to deal efficiently with the complicated constraints we consider a dual formulation of this problem. We prove that the corresponding dual function satisfies a quadratic growth property on any sublevel set, provided that the objective function is smooth and strongly convex and the sets satisfy Slater's condition. To the best of our knowledge, this work is the first to derive a quadratic growth condition for the dual under these general assumptions. Existing works derive similar quadratic growth conditions under more conservative assumptions, e.g., the sets need to be either polyhedral or compact. Then, for finding the minimum of the dual problem, due to its special composite structure, we propose random (accelerated) coordinate descent algorithms. However, with the existing theory one can prove that such methods converge only sublinearly. Based on our new quadratic growth property derived for the dual, we now show that such methods have faster convergence, that is, the dual random (accelerated) coordinate descent algorithms converge linearly. Besides providing a general dual framework for the analysis of randomized coordinate descent schemes, our results resolve an open problem in the literature related to the convergence of the Dykstra algorithm on the best feasibility problem for a collection of convex sets. That is, we establish a linear convergence rate for the randomized Dykstra algorithm when the convex sets satisfy Slater's condition, and we also derive a new accelerated variant of the Dykstra algorithm.
Submitted 14 November, 2019;
originally announced November 2019.
-
Minibatch stochastic subgradient-based projection algorithms for solving convex inequalities
Authors:
Ion Necoara,
Angelia Nedich
Abstract:
This paper deals with the convex feasibility problem, where the feasible set is given as the intersection of a (possibly infinite) number of closed convex sets. We assume that each set is specified algebraically as a convex inequality, where the associated convex function is general (possibly non-differentiable). For finding a point satisfying all the convex inequalities we design and analyze random projection algorithms using special subgradient iterations and extrapolated stepsizes. Moreover, the iterate updates are performed based on parallel random observations of several constraint components. For these minibatch stochastic subgradient-based projection methods we prove sublinear convergence results and, under some linear regularity condition for the functional constraints, we prove linear convergence rates. We also derive conditions under which these rates depend explicitly on the minibatch size. To the best of our knowledge, this work is the first to derive conditions that show when minibatch stochastic subgradient-based projection updates have a better complexity than their single-sample variants.
Submitted 26 September, 2019;
originally announced September 2019.
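To make the iteration described in the abstract above concrete, here is a minimal sketch of a minibatch subgradient projection step for convex inequalities $g_i(x) \le 0$. It is an illustration under stated assumptions, not the authors' algorithm: the three test constraints, the function name `minibatch_subgradient_projection`, the plain averaging over the minibatch, and the unit (non-extrapolated) stepsize are all choices made for this example.

```python
import random


def sign(t):
    """Sign of t as -1, 0 or 1 (0 is a valid subgradient choice of |t| at 0)."""
    return (t > 0) - (t < 0)


# Hypothetical test problem: each entry is (g, subgrad) for a convex g(x) <= 0.
CONSTRAINTS = [
    # Non-differentiable l1-ball constraint.
    (lambda x: abs(x[0]) + abs(x[1]) - 1.0,
     lambda x: [float(sign(x[0])), float(sign(x[1]))]),
    # Disc of radius 1 centered at (0.5, 0).
    (lambda x: (x[0] - 0.5) ** 2 + x[1] ** 2 - 1.0,
     lambda x: [2.0 * (x[0] - 0.5), 2.0 * x[1]]),
    # Half-plane x[0] >= 0.
    (lambda x: -x[0],
     lambda x: [-1.0, 0.0]),
]


def minibatch_subgradient_projection(x0, constraints, batch=2, iters=2000, seed=0):
    """Average Polyak-type subgradient projection steps over a random minibatch."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        picked = rng.sample(range(len(constraints)), batch)
        update = [0.0] * len(x)
        for i in picked:
            g, subgrad = constraints[i]
            violation = max(g(x), 0.0)
            if violation == 0.0:
                continue  # constraint i is satisfied, so it contributes no step
            d = subgrad(x)
            sq_norm = sum(d_j * d_j for d_j in d)
            if sq_norm == 0.0:
                continue  # guard against a zero subgradient
            for j in range(len(x)):
                update[j] -= violation / sq_norm * d[j]
        # Parallel (averaged) update over the sampled constraints.
        x = [x_j + u_j / batch for x_j, u_j in zip(x, update)]
    return x


x_feas = minibatch_subgradient_projection([3.0, 4.0], CONSTRAINTS)
max_violation = max(g(x_feas) for g, _ in CONSTRAINTS)
```

Each individual step moves the iterate onto a half-space that contains the feasible set, so the distance to the feasible set never increases; the extrapolated stepsizes analyzed in the paper are not reproduced here.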
-
A Globally Convergent Penalty-Based Gauss-Newton Algorithm with Applications
Authors:
Ilyes Mezghani,
Quoc Tran-Dinh,
Ion Necoara,
Anthony Papavasiliou
Abstract:
We propose a globally convergent Gauss-Newton algorithm for finding a local optimal solution of a non-convex and possibly non-smooth optimization problem. The algorithm that we present is based on a Gauss-Newton-type iteration for the non-smooth penalized formulation of the original problem. We establish a global convergence rate for this scheme from any initial point to a stationary point of the problem while using an exact penalty formulation. Under some more restrictive conditions we also derive local quadratic convergence for this scheme. We apply our proposed algorithm to solve the Alternating Current optimal power flow problem on meshed electricity networks, which is a fundamental application in power systems engineering. We verify the performance of the proposed method by showing comparable behavior with IPOPT, a well-established solver. We perform our validation on several representative instances of the optimal power flow problem, which are sourced from the MATPOWER library.
Submitted 7 December, 2020; v1 submitted 21 May, 2019;
originally announced May 2019.
-
Random minibatch subgradient algorithms for convex problems with functional constraints
Authors:
Angelia Nedich,
Ion Necoara
Abstract:
In this paper we consider non-smooth convex optimization problems with (possibly) infinite intersection of constraints. In contrast to the classical approach, where the constraints are usually represented as intersection of simple sets, which are easy to project onto, in this paper we consider that each constraint set is given as the level set of a convex but not necessarily differentiable function. For these settings we propose subgradient iterative algorithms with random minibatch feasibility updates. At each iteration, our algorithms take a step aimed at only minimizing the objective function and then a subsequent step minimizing the feasibility violation of the observed minibatch of constraints. The feasibility updates are performed based on either parallel or sequential random observations of several constraint components. We analyze the convergence behavior of the proposed algorithms for the case when the objective function is strongly convex and with bounded subgradients, while the functional constraints are endowed with a bounded first-order black-box oracle. For a diminishing stepsize, we prove sublinear convergence rates for the expected distances of the weighted averages of the iterates from the constraint set, as well as for the expected suboptimality of the function values along the weighted averages. Our convergence rates are known to be optimal for subgradient methods on this class of problems. Moreover, the rates depend explicitly on the minibatch size and show when minibatching helps a subgradient scheme with random feasibility updates.
Submitted 10 January, 2024; v1 submitted 5 March, 2019;
originally announced March 2019.
-
Faster randomized block Kaczmarz algorithms
Authors:
Ion Necoara
Abstract:
The Kaczmarz algorithm is a simple iterative scheme for solving consistent linear systems. At each step, the method projects the current iterate onto the solution space of a single constraint. Hence, it requires very low cost per iteration and storage, and it has a linear rate of convergence. Distributed implementations of Kaczmarz have become, in recent years, the de facto architectural choice for large-scale linear systems. Therefore, in this paper we develop a family of randomized block Kaczmarz algorithms that uses at each step a subset of the constraints and extrapolated stepsizes, and can be deployed on distributed computing units. Our approach is based on several new ideas and tools, including stochastic selection rule for the blocks of rows, stochastic conditioning of the linear system, and novel strategies for designing extrapolated stepsizes. We prove that randomized block Kaczmarz algorithm converges linearly in expectation, with a rate depending on the geometric properties of the matrix and its submatrices and on the size of the blocks. Our convergence analysis reveals that the algorithm is most effective when it is given a good sampling of the rows into well-conditioned blocks. Besides providing a general framework for the design and analysis of randomized block Kaczmarz methods, our results resolve an open problem in the literature related to the theoretical understanding of observed practical efficiency of extrapolated block Kaczmarz methods.
Submitted 26 February, 2019;
originally announced February 2019.
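As a point of reference for the abstract above, the basic single-row randomized Kaczmarz iteration can be sketched as follows. This is a minimal illustration, not the block algorithm with extrapolated stepsizes developed in the paper; the function name `randomized_kaczmarz`, the sampling rule (probability proportional to the squared row norm), and the small test system are assumptions made for the example.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Basic randomized Kaczmarz for a consistent linear system A x = b.

    A is a list of rows (lists of floats) and b a list of floats.
    Rows are sampled with probability proportional to their squared norm,
    which is an assumed sampling rule and is not taken from the paper.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    sq_norms = [sum(a_j * a_j for a_j in row) for row in A]
    for _ in range(iters):
        i = rng.choices(range(len(A)), weights=sq_norms, k=1)[0]
        row = A[i]
        residual = b[i] - sum(a_j * x_j for a_j, x_j in zip(row, x))
        # Project the current iterate onto the hyperplane {z : row . z = b[i]}.
        step = residual / sq_norms[i]
        x = [x_j + step * a_j for x_j, a_j in zip(x, row)]
    return x


# Small consistent test system with known solution x_true (hypothetical data).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, 2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

A block variant would replace the single sampled row by a sampled subset of rows and combine the corresponding projections.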
-
H2 model reduction of linear network systems by moment matching and optimization
Authors:
I. Necoara,
T. C. Ionescu
Abstract:
In this paper we study the problem of model reduction of linear network systems. We aim at computing a reduced order stable approximation of the network with the same topology and optimal w.r.t. H2 norm error approximation. Our approach is based on time-domain moment matching framework, where we optimize over families of parameterized reduced order models matching a set of moments at arbitrary interpolation points. The parameterization of the low order models is in terms of the free parameters and of the interpolation points. For this family of parameterized models we formulate an optimization-based model reduction problem with the H2 norm of error approximation as objective function while the preservation of some structural and physical properties yields the constraints. This problem is nonconvex and we write it in terms of the Gramians of a minimal realization of the error system. We propose two solutions for this problem. The first solution assumes that the error system admits a block diagonal observability Gramian, allowing for a simple convex reformulation as semidefinite programming, but at the cost of some performance loss. We also derive sufficient conditions to guarantee block diagonalization of the Gramian. The second solution employs a gradient projection method for a smooth reformulation yielding (locally) optimal interpolation points and free parameters. The potential of the methods is illustrated on a power network.
Submitted 19 May, 2019; v1 submitted 8 February, 2019;
originally announced February 2019.
-
Almost surely constrained convex optimization
Authors:
Olivier Fercoq,
Ahmet Alacaoglu,
Ion Necoara,
Volkan Cevher
Abstract:
We propose a stochastic gradient framework for solving stochastic composite convex optimization problems with a (possibly) infinite number of linear inclusion constraints that need to be satisfied almost surely. We use smoothing and homotopy techniques to handle constraints without the need for matrix-valued projections. We show for our stochastic gradient algorithm an $\mathcal{O}(\log(k)/\sqrt{k})$ convergence rate for general convex objectives and an $\mathcal{O}(\log(k)/k)$ convergence rate for restricted strongly convex objectives. These rates are known to be optimal up to logarithmic factors, even without constraints. We demonstrate the performance of our algorithm with numerical experiments on basis pursuit, hard-margin support vector machines, and portfolio optimization, and show that our algorithm achieves state-of-the-art practical performance.
Submitted 31 January, 2019;
originally announced February 2019.
-
Optimal H2 moment matching-based model reduction for linear systems by (non)convex optimization
Authors:
I. Necoara,
T. C. Ionescu
Abstract:
In this paper we compute families of reduced order models that match a prescribed set of moments of a highly dimensional linear time-invariant system. First, we fully parametrize the models in the interpolation points and in the free parameters, and then we fix the set of interpolation points and parametrize the models only in the free parameters. Based on these two parametrizations and using as objective function the H2-norm of the error approximation we derive non-convex optimization problems, i.e., we search for the optimal free parameters and even the interpolation points to determine the approximation model yielding the minimal H2-norm error. Further, we provide the necessary first-order optimality conditions for these optimization problems given explicitly in terms of the controllability and the observability Gramians of a minimal realization of the error system. Using the optimality conditions, we propose gradient type methods for solving the corresponding optimization problems, with mathematical guarantees on their convergence. We also derive convex SDP relaxations for these problems and analyze when the convex relaxations are exact. We illustrate numerically the efficiency of our results on several test examples.
Submitted 18 November, 2018;
originally announced November 2018.
-
Randomized sketch descent methods for non-separable linearly constrained optimization
Authors:
Ion Necoara,
Martin Takac
Abstract:
In this paper we consider large-scale smooth optimization problems with multiple linear coupled constraints. Due to the non-separability of the constraints, arbitrary random sketching would not be guaranteed to work. Thus, we first investigate necessary and sufficient conditions for the sketch sampling to have well-defined algorithms. Based on these sampling conditions we develop new sketch descent methods for solving general smooth linearly constrained problems, in particular, random sketch descent and accelerated random sketch descent methods. To the best of our knowledge, this is the first convergence analysis of random sketch descent algorithms for optimization problems with multiple non-separable linear constraints. For the general case, when the objective function is smooth and non-convex, we prove for the non-accelerated variant a sublinear rate in expectation for an appropriate optimality measure. In the smooth convex case, we derive for both algorithms, non-accelerated and accelerated random sketch descent, sublinear convergence rates in the expected values of the objective function. Additionally, if the objective function satisfies a strong convexity type condition, both algorithms converge linearly in expectation. In special cases, where complexity bounds are known for some particular sketching algorithms, such as coordinate descent methods for optimization problems with a single linear coupled constraint, our theory recovers the best-known bounds. We also show that, when the random sketch selects coordinate directions, selecting them randomly produces better results than a fixed selection rule. Finally, we present some numerical examples to illustrate the performance of our new algorithms.
Submitted 7 August, 2018;
originally announced August 2018.
-
Composite Convex Optimization with Global and Local Inexact Oracles
Authors:
Tianxiao Sun,
Ion Necoara,
Quoc Tran-Dinh
Abstract:
We introduce new global and local inexact oracle concepts for a wide class of convex functions in composite convex minimization. Such inexact oracles naturally come from primal-dual frameworks, barrier smoothing, inexact computations of gradients and Hessians, and many other situations. We also provide examples showing that the class of convex functions equipped with the new inexact second-order oracles is larger than the standard self-concordant and Lipschitz gradient function classes. Further, we investigate several properties of convex and/or self-concordant functions under the inexact second-order oracles which are useful for algorithm development. Next, we apply our theory to develop inexact proximal Newton-type schemes for solving general composite convex minimization problems equipped with such inexact oracles. Our theoretical results consist of new optimization algorithms, accompanied by global convergence guarantees, to solve a wide class of composite convex optimization problems. When the first objective term is additionally self-concordant, we establish different local convergence results for our method. In particular, we prove that depending on the choice of accuracy levels of the inexact second-order oracles, we obtain different local convergence rates ranging from $R$-linear and $R$-superlinear to $R$-quadratic. In special cases, where convergence bounds are known, our theory recovers the best known rates. We also apply our settings to derive a new primal-dual method for composite convex minimization problems. Finally, we present some representative numerical examples to illustrate the benefit of our new algorithms.
Submitted 22 February, 2020; v1 submitted 6 August, 2018;
originally announced August 2018.
-
Randomized projection methods for convex feasibility problems: conditioning and convergence rates
Authors:
Ion Necoara,
Peter Richtarik,
Andrei Patrascu
Abstract:
Finding a point in the intersection of a collection of closed convex sets, that is, the convex feasibility problem, represents the main modeling strategy for many computational problems. In this paper we analyze new stochastic reformulations of the convex feasibility problem in order to facilitate the development of new algorithmic schemes. We also analyze the conditioning parameters of the problem using certain (linear) regularity assumptions on the individual convex sets. Then, we introduce a general random projection algorithmic framework, which extends to the random setting many existing projection schemes designed for the general convex feasibility problem. Our general random projection algorithm allows projecting simultaneously onto several sets, thus providing great flexibility in matching the implementation of the algorithm to the parallel architecture at hand. Based on the conditioning parameters, besides the asymptotic convergence results, we also derive explicit sublinear and linear convergence rates for this general algorithmic framework.
Submitted 15 January, 2018;
originally announced January 2018.
-
Nonasymptotic convergence of stochastic proximal point algorithms for constrained convex optimization
Authors:
Andrei Patrascu,
Ion Necoara
Abstract:
A very popular approach for solving stochastic optimization problems is the stochastic gradient descent method (SGD). Although the SGD iteration is computationally cheap and the practical performance of this method may be satisfactory under certain circumstances, there is recent evidence of its convergence difficulties and instability for inappropriate parameter choices. To avoid these drawbacks naturally introduced by the SGD scheme, the stochastic proximal point algorithms have been recently considered in the literature. We introduce a new variant of the stochastic proximal point method (SPP) for solving stochastic convex optimization problems subject to an (in)finite intersection of constraints satisfying a linear regularity type condition. For the newly introduced SPP scheme we prove new nonasymptotic convergence results. In particular, for convex and Lipschitz continuous objective functions, we prove nonasymptotic estimates for the rate of convergence in terms of the expected value function gap of order $\mathcal{O}(1/k^{1/2})$, where $k$ is the iteration counter. We also derive better nonasymptotic bounds for the rate of convergence in terms of expected quadratic distance from the iterates to the optimal solution for smooth strongly convex objective functions, which in the best case is of order $\mathcal{O}(1/k)$. Since these convergence rates can be attained by our SPP algorithm only under some natural restrictions on the stepsize, we also introduce a restarting variant of the SPP method that overcomes these difficulties and derive the corresponding nonasymptotic convergence rates. Numerical evidence supports the effectiveness of our methods in real-world problems.
Submitted 20 June, 2017;
originally announced June 2017.
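To illustrate the stochastic proximal point step mentioned in the abstract above, the sketch below applies it to an unconstrained least-squares problem, where the proximal step has a closed form. It is a simplified illustration under stated assumptions, not the constrained SPP variant or the restarting scheme from the paper: the loss $f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2$, the stepsize rule $\mu_k = \mu_0/\sqrt{k}$, the function name `spp_least_squares`, and the test data are all assumptions. For this loss, minimizing $f_i(z) + \tfrac{1}{2\mu}\|z - x\|^2$ over $z$ gives $z = x - \frac{\mu\,(a_i^\top x - b_i)}{1 + \mu\|a_i\|^2}\, a_i$.

```python
import math
import random


def spp_least_squares(A, b, mu0=1.0, iters=5000, seed=0):
    """Stochastic proximal point iteration for f_i(x) = 0.5 * (a_i . x - b_i)^2.

    Each step solves the proximal subproblem for one sampled term exactly,
    using the closed-form solution available for this quadratic loss.
    The stepsize rule mu_k = mu0 / sqrt(k) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for k in range(1, iters + 1):
        mu = mu0 / math.sqrt(k)
        i = rng.randrange(len(A))  # uniform sampling of one term
        a = A[i]
        residual = sum(a_j * x_j for a_j, x_j in zip(a, x)) - b[i]
        # Closed-form prox step: the factor 1 / (1 + mu * ||a||^2) damps the
        # gradient step, so the update stays bounded even for a large mu.
        scale = mu * residual / (1.0 + mu * sum(a_j * a_j for a_j in a))
        x = [x_j - scale * a_j for x_j, a_j in zip(x, a)]
    return x


# Small consistent test problem with known minimizer x_star (hypothetical data).
A_ls = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_star = [1.0, 2.0]
b_ls = [sum(a * t for a, t in zip(row, x_star)) for row in A_ls]
x_spp = spp_least_squares(A_ls, b_ls)
```

By comparison, a plain SGD step would use `scale = mu * residual`, which can overshoot when `mu * ||a||^2` is large; the implicit step never does.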
-
Complexity certifications of first order inexact Lagrangian methods for general convex programming
Authors:
Ion Necoara,
Andrei Patrascu,
Angelia Nedić
Abstract:
In this chapter we derive computational complexity certifications of first order inexact dual methods for solving general smooth constrained convex problems which can arise in real-time applications, such as model predictive control. When it is difficult to project on the primal constraint set described by a collection of general convex functions, we use the Lagrangian relaxation to handle the complicated constraints and then, we apply dual (fast) gradient algorithms based on inexact dual gradient information for solving the corresponding dual problem. The iteration complexity analysis is based on two types of approximate primal solutions: the primal last iterate and an average of primal iterates. We provide sublinear computational complexity estimates on the primal suboptimality and constraint (feasibility) violation of the generated approximate primal solutions. In the final part of the chapter, we present an open-source quadratic optimization solver, referred to as DuQuad, for convex quadratic programs and for evaluation of its behaviour. The solver contains the C-language implementations of the analyzed algorithms.
Submitted 17 June, 2015;
originally announced June 2015.
-
Complexity of first order inexact Lagrangian and penalty methods for conic convex programming
Authors:
Ion Necoara,
Andrei Patrascu,
Francois Glineur
Abstract:
In this paper we present a complete iteration complexity analysis of inexact first order Lagrangian and penalty methods for solving cone constrained convex problems that may or may not have optimal Lagrange multipliers that close the duality gap. We first assume the existence of optimal Lagrange multipliers and study primal-dual first order methods based on inexact information and augmented Lagrangian smoothing or Nesterov type smoothing. For inexact (fast) gradient augmented Lagrangian methods we derive a total computational complexity of $\mathcal{O}\left(\frac{1}{ε}\right)$ projections onto a simple primal set in order to attain an $ε$-optimal solution of the conic convex problem. For the inexact fast gradient method combined with Nesterov type smoothing we derive a computational complexity of $\mathcal{O}\left(\frac{1}{ε^{3/2}}\right)$ projections onto the same set. Then, we assume that optimal Lagrange multipliers for the cone constrained convex problem might not exist, and analyze the fast gradient method for solving penalty reformulations of the problem. For the fast gradient method combined with the penalty framework we also derive a total computational complexity of $\mathcal{O}\left(\frac{1}{ε^{3/2}}\right)$ projections onto a simple primal set to attain an $ε$-optimal solution for the original problem.
Submitted 23 March, 2017; v1 submitted 17 June, 2015;
originally announced June 2015.
-
Adaptive inexact fast augmented Lagrangian methods for constrained convex optimization
Authors:
Andrei Patrascu,
Ion Necoara,
Quoc Tran-Dinh
Abstract:
In this paper we analyze several inexact fast augmented Lagrangian methods for solving linearly constrained convex optimization problems. Mainly, our methods rely on the combination of the excessive-gap-like smoothing technique developed in [15] and the newly introduced inexact oracle framework from [4]. We analyze several algorithmic instances with constant and adaptive smoothing parameters and derive total computational complexity results in terms of projections onto a simple primal set. For the basic inexact fast augmented Lagrangian algorithm we obtain an overall computational complexity of order $\mathcal{O}\left(\frac{1}{ε^{5/4}}\right)$, while for the adaptive variant we get $\mathcal{O}\left(\frac{1}{ε}\right)$ projections onto a primal set in order to obtain an $ε$-optimal solution for our original problem.
Submitted 12 May, 2015;
originally announced May 2015.
-
Random block coordinate descent methods for linearly constrained optimization over networks
Authors:
I. Necoara,
Yu. Nesterov,
F. Glineur
Abstract:
In this paper we develop random block coordinate gradient descent methods for minimizing large scale linearly constrained separable convex problems over networks. Since we have coupled constraints in the problem, we devise an algorithm that updates in parallel $τ\geq 2$ (block) components per iteration. Moreover, for this method the computations can be performed in a distributed fashion according to the structure of the network, and its cost per iteration is usually lower than that of the full gradient method when the number of nodes $N$ in the network is large. We prove that for this method we obtain in expectation an $ε$-accurate solution in at most $\mathcal{O}(\frac{N}{τε})$ iterations and thus the convergence rate depends linearly on the number of (block) components $τ$ to be updated. For strongly convex functions the new method converges linearly. We also focus on how to choose the probabilities to make the randomized algorithm converge as fast as possible, and we arrive at solving a sparse SDP. Finally, we describe several applications that fit in our framework, in particular the convex feasibility problem. Numerically, we show that the parallel coordinate descent method with $τ>2$ is faster than its basic counterpart corresponding to $τ=2$.
Submitted 11 December, 2015; v1 submitted 23 April, 2015;
originally announced April 2015.
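To illustrate the kind of update described in the entry above, here is a minimal sketch of a random pair ($τ=2$) coordinate descent step for a single coupling constraint $\sum_i x_i = \text{total}$. It is not the authors' algorithm: the quadratic objective, the uniform pair sampling, and the names `random_pair_coordinate_descent`, `a`, `c`, and `total` are all assumptions made for this example. Moving along $e_i - e_j$ keeps the sum of the coordinates unchanged, so every iterate stays feasible.

```python
import random


def random_pair_coordinate_descent(a, c, total, iters=5000, seed=0):
    """Minimize sum_i 0.5*a[i]*(x[i]-c[i])**2 subject to sum(x) == total.

    Illustrative toy only: each step picks a random pair (i, j) uniformly
    and moves along e_i - e_j, which preserves the sum constraint.
    """
    rng = random.Random(seed)
    n = len(a)
    x = [total / n] * n  # feasible starting point
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        gi = a[i] * (x[i] - c[i])  # partial derivative w.r.t. x[i]
        gj = a[j] * (x[j] - c[j])  # partial derivative w.r.t. x[j]
        # Coordinate Lipschitz constants are a[i] and a[j]; for a quadratic
        # this step is the exact minimizer along the direction e_i - e_j.
        d = -(gi - gj) / (a[i] + a[j])
        x[i] += d
        x[j] -= d
    return x


a = [1.0, 2.0, 3.0, 4.0]
c = [1.0, -2.0, 0.5, 3.0]
total = 1.0
x = random_pair_coordinate_descent(a, c, total)
print(x, sum(x))
```

At the optimum all partial derivatives are equal, which gives a closed-form solution for this toy problem that the iterates can be checked against.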
-
Linear convergence of first order methods for non-strongly convex optimization
Authors:
I. Necoara,
Yu. Nesterov,
F. Glineur
Abstract:
The standard assumption for proving linear convergence of first order methods for smooth convex optimization is the strong convexity of the objective function, an assumption which does not hold for many practical applications. In this paper, we derive linear convergence rates of several first order methods for solving smooth non-strongly convex constrained optimization problems, i.e. involving an objective function with a Lipschitz continuous gradient that satisfies some relaxed strong convexity condition. In particular, in the case of smooth constrained convex optimization, we provide several relaxations of the strong convexity conditions and prove that they are sufficient for getting linear convergence for several first order methods such as projected gradient, fast gradient and feasible descent methods. We also provide examples of functional classes that satisfy our proposed relaxations of strong convexity conditions. Finally, we show that the proposed relaxed strong convexity conditions cover important applications, including solving linear systems, linear programming, and dual formulations of linearly constrained convex problems.
Submitted 9 August, 2016; v1 submitted 23 April, 2015;
originally announced April 2015.
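The phenomenon described in the entry above can be seen on a small hand-made example, sketched here under stated assumptions. The objective $f(x)=\tfrac12\|Ax-b\|^2$ with the $2\times 3$ matrix below is convex but not strongly convex, because $A^TA$ is singular, yet plain gradient descent with step $1/L$ still decreases $f$ geometrically. The matrix, the right-hand side, and the function name `gradient_descent_least_squares` are assumptions for illustration. The contraction factor $4/9$ is specific to this toy instance (the eigenvalues of $AA^T$ are $1$ and $3$) and is not a constant taken from the paper.

```python
def gradient_descent_least_squares(A, b, steps, L):
    """Gradient descent with step 1/L on f(x) = 0.5*||A x - b||^2.

    Returns the final iterate and the objective value at each iterate.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    history = []
    for _ in range(steps):
        # Residual r = A x - b and objective value 0.5*||r||^2.
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        history.append(0.5 * sum(ri * ri for ri in r))
        # Gradient g = A^T r, followed by the step x <- x - g / L.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [x[j] - g[j] / L for j in range(n)]
    return x, history


# A^T A is singular: A applied to (1, -1, 1) gives zero.
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
b = [1.0, 2.0]
# L = 3 is the largest eigenvalue of A A^T = [[2, 1], [1, 2]].
x, history = gradient_descent_least_squares(A, b, steps=30, L=3.0)
for k in range(1, 5):
    print(k, history[k] / history[k - 1])
```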
-
DuQuad: an inexact (augmented) dual first order algorithm for quadratic programming
Authors:
Ion Necoara,
Andrei Patrascu
Abstract:
In this paper we present the solver DuQuad specialized for solving general convex quadratic problems arising in many engineering applications. When it is difficult to project on the primal feasible set, we use the (augmented) Lagrangian relaxation to handle the complicated constraints and then, we apply dual first order algorithms based on inexact dual gradient information for solving the corresponding dual problem. The iteration complexity analysis is based on two types of approximate primal solutions: the primal last iterate and an average of primal iterates. We provide computational complexity estimates on the primal suboptimality and feasibility violation of the generated approximate primal solutions. Then, these algorithms are implemented in the programming language C in DuQuad, and optimized for low iteration complexity and low memory footprint. DuQuad has a dynamic Matlab interface which makes the process of testing, comparing, and analyzing the algorithms simple. The algorithms are implemented using only basic arithmetic and logical operations and are suitable to run on low cost hardware. It is shown that if an approximate solution is sufficient for a given application, there exist problems where some of the implemented algorithms obtain the solution faster than state-of-the-art commercial solvers.
Submitted 22 April, 2015;
originally announced April 2015.
-
Iteration complexity analysis of dual first order methods for conic convex programming
Authors:
Ion Necoara,
Andrei Patrascu
Abstract:
In this paper we provide a detailed analysis of the iteration complexity of dual first order methods for solving conic convex problems. When it is difficult to project on the primal feasible set described by convex constraints, we use the Lagrangian relaxation to handle the complicated constraints and then, we apply dual first order algorithms for solving the corresponding dual problem. We give a convergence analysis for dual first order algorithms (dual gradient and fast gradient algorithms): we provide sublinear or linear estimates on the primal suboptimality and feasibility violation of the generated approximate primal solutions. Our analysis relies on the Lipschitz property of the gradient of the dual function or an error bound property of the dual. Furthermore, the iteration complexity analysis is based on two types of approximate primal solutions: the last primal iterate or an average primal sequence.
Submitted 13 March, 2015; v1 submitted 4 September, 2014;
originally announced September 2014.
-
On linear convergence of a distributed dual gradient algorithm for linearly constrained separable convex problems
Authors:
Ion Necoara,
Valentin Nedelcu
Abstract:
In this paper we propose a distributed dual gradient algorithm for minimizing linearly constrained separable convex problems and analyze its rate of convergence. In particular, we prove that under the assumption of strong convexity and Lipschitz continuity of the gradient of the primal objective function we have a global error bound type property for the dual problem. Using this error bound property we devise a fully distributed dual gradient scheme, i.e. a gradient scheme based on a weighted step size, for which we derive a global linear rate of convergence for both dual and primal suboptimality and for primal feasibility violation. Many real applications, e.g. distributed model predictive control, network utility maximization or optimal power flow, can be posed as linearly constrained separable convex problems for which dual gradient type methods from the literature have a sublinear convergence rate. In the present paper we prove for the first time that in fact we can achieve a linear convergence rate for such algorithms when they are used for solving these applications. Numerical simulations are also provided to confirm our theory.
Submitted 30 September, 2014; v1 submitted 14 June, 2014;
originally announced June 2014.
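As a rough illustration of the dual gradient idea in the entry above, the sketch below runs plain (centralized) dual gradient ascent on a toy problem: minimize $\tfrac12\|x-c\|^2$ subject to $Ax=b$. This objective is strongly convex with a Lipschitz continuous gradient, and the feasibility violation shrinks geometrically. It is not the paper's weighted-step-size distributed scheme. The data `A`, `b`, `c`, the scalar step `1 / L_dual`, and the function name `dual_gradient` are assumptions made for this example, and the observed factor $2/3$ is particular to this instance.

```python
def dual_gradient(A, b, c, steps, L_dual):
    """Dual gradient ascent for: min 0.5*||x - c||^2  s.t.  A x = b.

    The Lagrangian minimizer is x(lam) = c - A^T lam, and the dual
    gradient at lam is the constraint residual A x(lam) - b.
    """
    m, n = len(A), len(A[0])
    lam = [0.0] * m
    violations = []

    def primal(lam):
        return [c[j] - sum(A[i][j] * lam[i] for i in range(m)) for j in range(n)]

    for _ in range(steps):
        x = primal(lam)
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        violations.append(max(abs(ri) for ri in r))
        # Ascent step on the dual with step size 1 / L_dual.
        lam = [lam[i] + r[i] / L_dual for i in range(m)]
    return primal(lam), lam, violations


A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
b = [1.0, 2.0]
c = [1.0, 0.0, -1.0]
# L_dual = 3 is the largest eigenvalue of A A^T (strong convexity constant 1).
x, lam, violations = dual_gradient(A, b, c, steps=80, L_dual=3.0)
print(x, lam, violations[-1])
```

For this data the exact solution can be computed by hand from $x^* = c - A^T(AA^T)^{-1}(Ac-b)$, which makes the output easy to check.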
-
Iteration complexity analysis of random coordinate descent methods for $\ell_0$ regularized convex problems
Authors:
Andrei Patrascu,
Ion Necoara
Abstract:
In this paper we analyze a family of general random block coordinate descent methods for the minimization of $\ell_0$ regularized optimization problems, i.e. the objective function is composed of a smooth convex function and the $\ell_0$ regularization. Our family of methods covers particular cases such as random block coordinate gradient descent and random proximal coordinate descent methods. We analyze necessary optimality conditions for this nonconvex $\ell_0$ regularized problem and devise a separation of the set of local minima into restricted classes based on approximation versions of the objective function. We provide a unified analysis of the almost sure convergence for this family of block coordinate descent algorithms and prove that, for each approximation version, the limit points are local minima from the corresponding restricted class of local minimizers. Under the strong convexity assumption, we prove linear convergence in probability for our family of methods.
Submitted 18 July, 2014; v1 submitted 26 March, 2014;
originally announced March 2014.
-
Distributed dual gradient methods and error bound conditions
Authors:
Ion Necoara,
Valentin Nedelcu
Abstract:
In this paper we propose distributed dual gradient algorithms for linearly constrained separable convex problems and analyze their rate of convergence under different assumptions. Under the strong convexity assumption on the primal objective function we propose two distributed dual fast gradient schemes for which we prove a sublinear rate of convergence for dual suboptimality and also for primal suboptimality and feasibility violation, either for an average primal sequence or for the last generated primal iterate. Under the additional assumption of Lipschitz continuity of the gradient of the primal objective function we prove a global error bound type property for the dual problem and then we analyze a dual gradient scheme for which we derive a global linear rate of convergence for both dual and primal suboptimality and primal feasibility violation. We also provide numerical simulations on optimal power flow problems.
Submitted 3 February, 2014; v1 submitted 17 January, 2014;
originally announced January 2014.
-
Parallel coordinate descent methods for composite minimization: convergence analysis and error bounds
Authors:
Ion Necoara,
Dragos Clipici
Abstract:
In this paper we propose a distributed version of a randomized block-coordinate descent method for minimizing the sum of a partially separable smooth convex function and a fully separable non-smooth convex function. Under the assumption of block Lipschitz continuity of the gradient of the smooth function, this method is shown to have a sublinear convergence rate. Linear convergence rate of the method is obtained for the newly introduced class of generalized error bound functions. We prove that the new class of generalized error bound functions encompasses both global/local error bound functions and smooth strongly convex functions. We also show that the theoretical estimates on the convergence rate depend on the number of blocks chosen randomly and a natural measure of separability of the objective function.
Submitted 20 November, 2015; v1 submitted 18 December, 2013;
originally announced December 2013.
-
Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization
Authors:
A. Patrascu,
I. Necoara
Abstract:
In this paper we analyze several new methods for solving nonconvex optimization problems with the objective function formed as a sum of two terms: one is nonconvex and smooth, and another is convex but simple and its structure is known. Further, we consider both cases: unconstrained and linearly constrained nonconvex problems. For optimization problems of the above structure, we propose random coordinate descent algorithms and analyze their convergence properties. For the general case, when the objective function is nonconvex and composite we prove asymptotic convergence for the sequences generated by our algorithms to stationary points and sublinear rate of convergence in expectation for some optimality measure. Additionally, if the objective function satisfies an error bound condition we derive a local linear rate of convergence for the expected values of the objective function. We also present extensive numerical experiments for evaluating the performance of our algorithms in comparison with state-of-the-art methods.
Submitted 24 June, 2014; v1 submitted 17 May, 2013;
originally announced May 2013.
-
Distributed model predictive control of leader-follower systems using an interior point method with efficient computations
Authors:
Ion Necoara,
Dragos N. Clipici,
Sorin Olaru
Abstract:
Standard model predictive control strategies imply the online computation of control inputs at each sampling instant, which traditionally limits this type of control scheme to systems with slow dynamics. This paper focuses on distributed model predictive control for large-scale systems comprised of interacting linear subsystems, where the online computations required for the control input can be distributed amongst them. A model predictive controller based on a distributed interior point method is derived, for which every subsystem in the network can compute stabilizing control inputs using distributed computations. We introduce local terminal sets and cost functions, which together satisfy distributed invariance conditions for the whole system and thus guarantee stability of the closed-loop interconnected system. We show that the synthesis of both terminal sets and terminal cost functions can be done in a distributed framework.
Submitted 24 February, 2013;
originally announced February 2013.
-
Computational Complexity of Inexact Gradient Augmented Lagrangian Methods: Application to Constrained MPC
Authors:
Valentin Nedelcu,
Ion Necoara,
Quoc Tran Dinh
Abstract:
We study the computational complexity certification of inexact gradient augmented Lagrangian methods for solving convex optimization problems with complicated constraints. We solve the augmented Lagrangian dual problem that arises from the relaxation of complicating constraints with gradient and fast gradient methods based on inexact first order information. Moreover, since the exact solution of the augmented Lagrangian primal problem is hard to compute in practice, we solve this problem up to some given inner accuracy. We derive relations between the inner and the outer accuracy of the primal and dual problems and we give a full convergence rate analysis for both gradient and fast gradient algorithms. We provide estimates on the primal and dual suboptimality and on primal feasibility violation of the generated approximate primal and dual solutions. Our analysis relies on the Lipschitz property of the dual function and on inexact dual gradients. We also discuss implementation aspects of the proposed algorithms on constrained model predictive control problems for embedded linear systems.
Submitted 18 February, 2013;
originally announced February 2013.
-
Improved Dual Decomposition Based Optimization for DSL Dynamic Spectrum Management
Authors:
Paschalis Tsiaflakis,
Ion Necoara,
Johan A. K. Suykens,
Marc Moonen
Abstract:
Dynamic spectrum management (DSM) has been recognized as a key technology to significantly improve the performance of digital subscriber line (DSL) broadband access networks. The basic concept of DSM is to coordinate transmission over multiple DSL lines so as to mitigate the impact of crosstalk interference amongst them. Many algorithms have been proposed to tackle the nonconvex optimization problems appearing in DSM, almost all of them relying on a standard subgradient based dual decomposition approach. In practice however, this approach is often found to lead to extremely slow convergence or even no convergence at all, one of the reasons being the very difficult tuning of the stepsize parameters. In this paper we propose a novel improved dual decomposition approach inspired by recent advances in mathematical programming. It uses a smoothing technique for the Lagrangian combined with an optimal gradient based scheme for updating the Lagrange multipliers. The stepsize parameters are furthermore selected optimally removing the need for a tuning strategy. With this approach we show how the convergence of current state-of-the-art DSM algorithms based on iterative convex approximations (SCALE, CA-DSB) can be improved by one order of magnitude. Furthermore we apply the improved dual decomposition approach to other DSM algorithms (OSB, ISB, ASB, (MS)-DSB, MIW) and propose further improvements to obtain fast and robust DSM algorithms. Finally, we demonstrate the effectiveness of the improved dual decomposition approach for a number of realistic multi-user DSL scenarios.
Submitted 13 February, 2013;
originally announced February 2013.
-
An Interior-Point Lagrangian Decomposition Method for Separable Convex Optimization
Authors:
I. Necoara,
J. A. K. Suykens
Abstract:
In this paper, we propose a distributed algorithm for solving large-scale separable convex problems using Lagrangian dual decomposition and the interior-point framework. By adding self-concordant barrier terms to the ordinary Lagrangian, we prove under mild assumptions that the corresponding family of augmented dual functions is self-concordant. This makes it possible to efficiently use the Newton method for tracing the central path. We show that the new algorithm is globally convergent and highly parallelizable and thus it is suitable for solving large-scale separable convex problems.
Submitted 13 February, 2013;
originally announced February 2013.
-
Rate analysis of inexact dual first order methods: Application to distributed MPC for network systems
Authors:
Ion Necoara,
Valentin Nedelcu
Abstract:
In this paper we propose and analyze two dual methods based on inexact gradient information and averaging that generate approximate primal solutions for smooth convex optimization problems. The complicating constraints are moved into the cost using the Lagrange multipliers. The dual problem is solved by inexact first order methods based on approximate gradients and we prove sublinear rate of convergence for these methods. In particular, we provide, for the first time, estimates on the primal feasibility violation and primal and dual suboptimality of the generated approximate primal and dual solutions. Moreover, we solve approximately the inner problems with a parallel coordinate descent algorithm and we show that it has linear convergence rate. In our analysis we rely on the Lipschitz property of the dual function and inexact dual gradients. Further, we apply these methods to distributed model predictive control for network systems. By tightening the complicating constraints we are also able to ensure the primal feasibility of the approximate solutions generated by the proposed algorithms. We obtain a distributed control strategy that has the following features: state and input constraints are satisfied, stability of the plant is guaranteed, whilst the number of iterations for the suboptimal solution can be precisely determined.
Submitted 13 February, 2013;
originally announced February 2013.
-
Parallel and distributed optimization methods for estimation and control in networks
Authors:
Ion Necoara,
Valentin Nedelcu,
Ioan Dumitrache
Abstract:
System performance for networks composed of interconnected subsystems can be increased if the traditionally separated subsystems are jointly optimized. Recently, parallel and distributed optimization methods have emerged as a powerful tool for solving estimation and control problems in large-scale networked systems. In this paper we review and analyze the optimization-theoretic concepts of parallel and distributed methods for solving coupled optimization problems and demonstrate how several estimation and control problems related to complex networked systems can be formulated in these settings. The paper presents a systematic framework for exploiting the potential of the decomposition structures as a way to obtain different parallel algorithms, each with a different tradeoff among convergence speed, message passing amount and distributed computation architecture. Several specific applications from estimation and process control are included to demonstrate the power of the approach.
Submitted 13 February, 2013;
originally announced February 2013.
-
Application of a smoothing technique to decomposition in convex optimization
Authors:
Ion Necoara,
Johan A. K. Suykens
Abstract:
Dual decomposition is a powerful technique for deriving decomposition schemes for convex optimization problems with separable structure. Although the Augmented Lagrangian is computationally more stable than the ordinary Lagrangian, the prox-term destroys the separability of the given problem. In this paper we use another approach to obtain a smooth Lagrangian, based on a smoothing technique developed by Nesterov, which preserves separability of the problem. With this approach we derive a new decomposition method, called proximal center algorithm, which from the viewpoint of efficiency estimates improves the bounds on the number of iterations of the classical dual gradient scheme by an order of magnitude. This can be achieved with the new decomposition algorithm since the resulting dual function has good smoothness properties and since we make use of the particular structure of the given problem.
Submitted 13 February, 2013;
originally announced February 2013.
-
Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: application to distributed MPC
Authors:
Ion Necoara,
Dragos Clipici
Abstract:
In this paper we propose a parallel coordinate descent algorithm for solving smooth convex optimization problems with separable constraints that may arise e.g. in distributed model predictive control (MPC) for linear network systems. Our algorithm is based on block coordinate descent updates in parallel and has a very simple iteration. We prove (sub)linear rate of convergence for the new algorithm under standard assumptions for smooth convex optimization. Further, our algorithm uses local information and thus is suitable for distributed implementations. Moreover, it has low iteration complexity, which makes it appropriate for embedded control. An MPC scheme based on this new parallel algorithm is derived, for which every subsystem in the network can compute feasible and stabilizing control inputs using distributed and cheap computations. For ensuring stability of the MPC scheme, we use a terminal cost formulation derived from a distributed synthesis. Preliminary numerical tests show better performance for our optimization algorithm than other existing methods.
Submitted 18 November, 2014; v1 submitted 13 February, 2013;
originally announced February 2013.