Search | arXiv e-print repository

Bolstering Stochastic Gradient Descent with Model Building

Authors: S. Ilker Birbil, Ozgur Martin, Gonenc Onay, Figen Oztoprak

Abstract: Stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are fine-tuned for the application at hand. Although this tuning process can require large computational costs, recent work has shown that these costs can be reduced by line search methods that iteratively adjust the step length. We propose an alternative approach to stochastic line search by using a new algorithm based on forward step model building. This model building step incorporates second-order information that allows adjusting not only the step length but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide convergence rate analysis, and experimentally show that the proposed algorithm achieves faster convergence and better generalization in well-known test problems. More precisely, SMB requires less tuning, and shows comparable performance to other adaptive methods.

Submitted 13 March, 2024; v1 submitted 13 November, 2021; originally announced November 2021.

arXiv:2110.04355

Constrained Optimization in the Presence of Noise

Authors: Figen Oztoprak, Richard Byrd, Jorge Nocedal

Abstract: The problem of interest is the minimization of a nonlinear function subject to nonlinear equality constraints using a sequential quadratic programming (SQP) method. The minimization must be performed while observing only noisy evaluations of the objective and constraint functions. In order to obtain stability, the classical SQP method is modified by relaxing the standard Armijo line search based on the noise level in the functions, which is assumed to be known. Convergence theory is presented giving conditions under which the iterates converge to a neighborhood of the solution characterized by the noise level and the problem conditioning. The analysis assumes that the SQP algorithm does not require regularization or trust regions. Numerical experiments indicate that the relaxed line search improves the practical performance of the method on problems involving uniformly distributed noise. One important application of this work is in the field of derivative-free optimization, when finite differences are employed to estimate gradients.

Submitted 8 October, 2021; originally announced October 2021.
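
The relaxed Armijo test mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's SQP method: it handles only an unconstrained function, and the relaxation term `2 * eps_f` (a common choice for noisy line searches) is an assumption, as are the function name `relaxed_armijo_step` and all parameter names and defaults.

```python
from typing import Callable, List, Sequence


def relaxed_armijo_step(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    grad: Sequence[float],
    direction: Sequence[float],
    eps_f: float,
    c1: float = 1e-4,
    alpha0: float = 1.0,
    shrink: float = 0.5,
    max_backtracks: int = 30,
) -> float:
    """Backtrack until f(x + a*d) <= f(x) + c1*a*(g.d) + 2*eps_f.

    eps_f is the (assumed known) bound on the noise in evaluations of f.
    With eps_f = 0 this reduces to the classical Armijo condition.
    Returns 0.0 if no tried step length satisfies the test.
    """
    # Directional derivative estimate g.d (negative for a descent direction).
    slope = sum(g * d for g, d in zip(grad, direction))
    fx = f(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        trial: List[float] = [xi + alpha * di for xi, di in zip(x, direction)]
        # Hypothetical relaxation: tolerate an increase of up to 2*eps_f.
        if f(trial) <= fx + c1 * alpha * slope + 2.0 * eps_f:
            return alpha
        alpha *= shrink
    return 0.0
```

With `eps_f = 0` the full step along the negative gradient of `x**2` from `x = 1` is rejected and halved once; with a small positive `eps_f` the same full step is accepted, which is the stabilizing effect the relaxation is meant to provide.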

arXiv:2102.09762

On the Numerical Performance of Derivative-Free Optimization Methods Based on Finite-Difference Approximations

Authors: Hao-Jun Michael Shi, Melody Qiming Xuan, Figen Oztoprak, Jorge Nocedal

Abstract: The goal of this paper is to investigate an approach for derivative-free optimization that has not received sufficient attention in the literature and is yet one of the simplest to implement and parallelize. It consists of computing gradients of a smoothed approximation of the objective function (and constraints), and employing them within established codes. These gradient approximations are calculated by finite differences, with a differencing interval determined by the noise level in the functions and a bound on the second or third derivatives. It is assumed that noise level is known or can be estimated by means of difference tables or sampling. The use of finite differences has been largely dismissed in the derivative-free optimization literature as too expensive in terms of function evaluations and/or as impractical when the objective function contains noise. The test results presented in this paper suggest that such views should be re-examined and that the finite-difference approach has much to be recommended. The tests compared NEWUOA, DFO-LS and COBYLA against the finite-difference approach on three classes of problems: general unconstrained problems, nonlinear least squares, and general nonlinear programs with equality constraints.

Submitted 19 February, 2021; originally announced February 2021.

Comments: 82 pages, 38 tables, 29 figures
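
As a rough illustration of a differencing interval "determined by the noise level and a bound on the second derivative", the sketch below uses the standard forward-difference argument: the error is bounded by `L*h/2 + 2*eps_f/h`, which is minimized at `h = 2*sqrt(eps_f/L)`. The function name and parameters are hypothetical, and the paper's own implementation (including how it estimates the noise level) may differ.

```python
import math
from typing import Callable, List, Sequence


def forward_difference_gradient(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    eps_f: float,
    second_deriv_bound: float,
) -> List[float]:
    """Forward-difference gradient with a noise-aware interval.

    eps_f: assumed bound on the noise in each evaluation of f.
    second_deriv_bound: assumed bound L on the second derivative.
    """
    # Interval that balances truncation error (L*h/2) and noise error (2*eps_f/h).
    h = 2.0 * math.sqrt(eps_f / second_deriv_bound)
    fx = f(x)
    grad: List[float] = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h  # perturb one coordinate at a time
        grad.append((f(shifted) - fx) / h)
    return grad
```

Each gradient costs `len(x) + 1` function evaluations, and the per-coordinate evaluations are independent of one another, which is why the approach is easy to parallelize.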

arXiv:1705.05158

An Alternative Globalization Strategy for Unconstrained Optimization

Authors: Figen Öztoprak, Ş. İlker Birbil

Abstract: We propose a new globalization strategy that can be used in unconstrained optimization algorithms to support rapid convergence from remote starting points. Our approach is based on using multiple points at each iteration to build a representative model of the objective function. Using the new information gathered from those multiple points, a local step is gradually improved by updating its direction as well as its length. We give a global convergence result and also provide parallel implementation details accompanied with a numerical study. Our numerical study shows that the proposed algorithm is a promising alternative as a globalization strategy.

Submitted 15 May, 2017; originally announced May 2017.

Comments: Submitted for publication

arXiv:1509.01698

HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

Authors: Kamer Kaya, Figen Öztoprak, Ş. İlker Birbil, A. Taylan Cemgil, Umut Şimşekli, Nurdan Kuru, Hazal Koptagel, M. Kaan Öztürk

Abstract: We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is a provably convergent, second order incremental algorithm for solving large-scale partially separable optimization problems. The algorithm is based on a local quadratic approximation, and hence, allows incorporating curvature information to speed-up the convergence. HAMSI is inherently parallel and it scales nicely with the number of processors. Combined with techniques for effectively utilizing modern parallel computer architectures, we illustrate that the proposed method converges more rapidly than a parallel stochastic gradient descent when both methods are used to solve large-scale matrix factorization problems. This performance gain comes only at the expense of using memory that scales linearly with the total size of the optimization variables. We conclude that HAMSI may be considered as a viable alternative in many large scale problems, where first order methods based on variants of stochastic gradient descent are applicable.

Submitted 4 August, 2017; v1 submitted 5 September, 2015; originally announced September 2015.

Comments: The software is available at https://github.com/spartensor/hamsi-mf

arXiv:1506.01418

Parallel Stochastic Gradient Markov Chain Monte Carlo for Matrix Factorisation Models

Authors: Umut Şimşekli, Hazal Koptagel, Hakan Güldaş, A. Taylan Cemgil, Figen Öztoprak, Ş. İlker Birbil

Abstract: For large matrix factorisation problems, we develop a distributed Markov Chain Monte Carlo (MCMC) method based on stochastic gradient Langevin dynamics (SGLD) that we call Parallel SGLD (PSGLD). PSGLD has very favourable scaling properties with increasing data size and is comparable in terms of computational requirements to optimisation methods based on stochastic gradient descent. PSGLD achieves high performance by exploiting the conditional independence structure of the MF models to sub-sample data in a systematic manner as to allow parallelisation and distributed computation. We provide a convergence proof of the algorithm and verify its superior performance on various architectures such as Graphics Processing Units, shared memory multi-core systems and multi-computer clusters.

Submitted 28 September, 2015; v1 submitted 3 June, 2015; originally announced June 2015.

Comments: 10 pages, 6 figures
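
For readers unfamiliar with SGLD, on which PSGLD builds, a single plain (non-parallel) SGLD update can be sketched as follows. This shows only the generic update rule, a stochastic gradient step on the log posterior plus Gaussian noise with variance equal to the step size; it does not reproduce the paper's block-wise sub-sampling or parallel scheme. The function name `sgld_step` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sgld_step(
    theta: Sequence[float],
    grad_log_prior: Sequence[float],
    minibatch_grads: Sequence[Sequence[float]],
    data_size: int,
    step_size: float,
    rng: random.Random,
) -> List[float]:
    """One SGLD update.

    minibatch_grads[i][j] is the gradient of the log-likelihood of the
    i-th minibatch point with respect to theta[j]; the minibatch sum is
    rescaled by data_size / len(minibatch_grads) to estimate the full sum.
    """
    scale = data_size / len(minibatch_grads)
    new_theta: List[float] = []
    for j, theta_j in enumerate(theta):
        likelihood_term = scale * sum(g[j] for g in minibatch_grads)
        drift = grad_log_prior[j] + likelihood_term
        # Injected noise has variance step_size (standard deviation sqrt(step_size)).
        noise = rng.gauss(0.0, math.sqrt(step_size))
        new_theta.append(theta_j + 0.5 * step_size * drift + noise)
    return new_theta
```

Passing the random generator in explicitly keeps the update reproducible, for example by calling it with `random.Random(0)`.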

arXiv:1505.04315

A Second-Order Method for Convex $\ell_1$-Regularized Optimization with Active Set Prediction

Authors: Nitish Shirish Keskar, Jorge Nocedal, Figen Oztoprak, Andreas Waechter

Abstract: We describe an active-set method for the minimization of an objective function $φ$ that is the sum of a smooth convex function and an $\ell_1$-regularization term. A distinctive feature of the method is the way in which active-set identification and second-order subspace minimization steps are integrated to combine the predictive power of the two approaches. At every iteration, the algorithm selects a candidate set of free and fixed variables, performs an (inexact) subspace phase, and then assesses the quality of the new active set. If it is not judged to be acceptable, then the set of free variables is restricted and a new active-set prediction is made. We establish global convergence for our approach, and compare the new method against the state-of-the-art code LIBLINEAR.

Submitted 16 May, 2015; originally announced May 2015.

arXiv:1309.3529

An Inexact Successive Quadratic Approximation Method for Convex L-1 Regularized Optimization

Authors: Richard H. Byrd, Jorge Nocedal, Figen Oztoprak

Abstract: We study a Newton-like method for the minimization of an objective function that is the sum of a smooth convex function and an l-1 regularization term. This method, which is sometimes referred to in the literature as a proximal Newton method, computes a step by minimizing a piecewise quadratic model of the objective function. In order to make this approach efficient in practice, it is imperative to perform this inner minimization inexactly. In this paper, we give inexactness conditions that guarantee global convergence and that can be used to control the local rate of convergence of the iteration. Our inexactness conditions are based on a semi-smooth function that represents a (continuous) measure of the optimality conditions of the problem, and that embodies the soft-thresholding iteration. We give careful consideration to the algorithm employed for the inner minimization, and report numerical results on two test sets originating in machine learning.

Submitted 13 September, 2013; originally announced September 2013.
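
The soft-thresholding iteration referred to in the abstract above can be sketched in a few lines. This shows only the generic operator and one iterative soft-thresholding (ISTA) step for a smooth term plus an l-1 penalty; it is not the paper's proximal Newton method or its inexactness test. The names `soft_threshold` and `ista_step`, and the parameters `step` and `lam`, are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(v: Sequence[float], tau: float) -> List[float]:
    """Shrink each entry of v toward zero by tau; entries within [-tau, tau] become 0."""
    out: List[float] = []
    for vi in v:
        if vi > tau:
            out.append(vi - tau)
        elif vi < -tau:
            out.append(vi + tau)
        else:
            out.append(0.0)
    return out


def ista_step(
    x: Sequence[float],
    grad: Sequence[float],
    step: float,
    lam: float,
) -> List[float]:
    """One soft-thresholding iteration for f(x) + lam * ||x||_1.

    grad is the gradient of the smooth part f at x.
    """
    # Gradient step on the smooth part, then shrinkage for the l-1 term.
    shifted = [xi - step * gi for xi, gi in zip(x, grad)]
    return soft_threshold(shifted, step * lam)
```

For convex `f`, a point is a minimizer exactly when `ista_step` leaves it unchanged, so the difference between `x` and `ista_step(x, ...)` gives a continuous measure of how far `x` is from optimality.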

Showing 1–8 of 8 results for author: Öztoprak, F