Challenges with Differentiable Quantum Dynamics

Authors: Sri Hari Krishna Narayanan, Michael Perlin, Robert Lewis-Swan, Jeffrey Larson, Matt Menickelly, Jan Hückelheim, Paul Hovland

Abstract: Differentiable quantum dynamics require automatic differentiation of a complex-valued initial value problem, which numerically integrates a system of ordinary differential equations from a specified initial condition, as well as the eigendecomposition of a matrix. We explored several automatic differentiation frameworks for these tasks, finding that no framework natively supports our application requirements. We therefore demonstrate a need for broader support of complex-valued, differentiable numerical integration in scientific computing libraries.

Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2401.10757 [pdf, ps, other]

Estimating Computational Noise on Parametric Curves

Authors: Matt Menickelly

Abstract: We consider ECNoise, a practical tool for estimating the magnitude of noise in evaluations of a black-box function. Recent developments in numerical optimization algorithms have seen increased usage of ECNoise as a subroutine to provide a solver with noise level estimates, so that the solver might somehow proportionally adjust for noise. Particularly motivated by problems in computationally expensive derivative-free optimization, we question a fundamental assumption made in the original development of ECNoise, particularly the assumption that the set of points provided to ECNoise must satisfy fairly restrictive geometric conditions (in particular, that the points be collinear and equally spaced). Driven by prior practical experience, we show that in many situations, noise estimates obtained from providing an arbitrary (that is, not collinear) geometry of points as input to ECNoise are often indistinguishable from noise estimates obtained from using the standard (collinear and equally spaced) geometry. We analyze this via parametric curves that interpolate the arbitrary input points. The analysis provides insight into the circumstances in which one can expect arbitrary point selection to cause significant degradation of ECNoise. Moreover, the analysis suggests a practical means (the solution of a small mixed integer linear program) by which one can gradually adjust an initial arbitrary point selection to yield better noise estimates with higher probability.

Submitted 19 January, 2024; originally announced January 2024.
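
The abstract above refers to the standard ECNoise setting of collinear, equally spaced points. As a rough illustration of that setting only, the sketch below estimates a noise level from k-th order differences of equally spaced function values, using the scaling factor (k!)^2 / (2k)! from the difference-table approach. It is a simplified sketch, not the ECNoise code: the function name `difference_table_noise` is hypothetical, and the heuristics ECNoise uses to choose the difference order k are omitted.

```python
import math
import random


def difference_table_noise(fvals, k):
    """Estimate the noise standard deviation from k-th order differences.

    fvals: function values at collinear, equally spaced points.
    k:     difference order (chosen by the caller in this sketch).
    """
    m = len(fvals) - 1
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= len(fvals) - 1")
    # Build the k-th column of the difference table.
    diffs = list(fvals)
    for _ in range(k):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    # For i.i.d. noise with variance sigma^2, each k-th difference has
    # variance sigma^2 * (2k)! / (k!)^2, so gamma_k rescales it back.
    gamma_k = math.factorial(k) ** 2 / math.factorial(2 * k)
    sigma_sq = gamma_k * sum(d * d for d in diffs) / len(diffs)
    return math.sqrt(sigma_sq)


# Illustrative data (an assumption): a smooth function sampled with spacing
# 1e-3 and perturbed by Gaussian noise of standard deviation 1e-3.
random.seed(0)
noisy = [math.sin(1e-3 * i) + random.gauss(0.0, 1e-3) for i in range(2001)]
estimate = difference_table_noise(noisy, 3)
```

Because the smooth part of the samples contributes almost nothing to high-order differences at this spacing, `estimate` should land close to the injected noise level of 1e-3.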

arXiv:2401.10121 [pdf, other]

A Novel Noise-Aware Classical Optimizer for Variational Quantum Algorithms

Authors: Jeffrey Larson, Matt Menickelly, Jiahao Shi

Abstract: A key component of variational quantum algorithms (VQAs) is the choice of classical optimizer employed to update the parameterization of an ansatz. It is well recognized that quantum algorithms will, for the foreseeable future, necessarily be run on noisy devices with limited fidelities. Thus, the evaluation of an objective function (e.g., the guiding function in the quantum approximate optimization algorithm (QAOA) or the expectation of the electronic Hamiltonian in variational quantum eigensolver (VQE)) required by a classical optimizer is subject not only to stochastic error from estimating an expected value but also to error resulting from intermittent hardware noise. Model-based derivative-free optimization methods have emerged as popular choices of a classical optimizer in the noisy VQA setting, based on empirical studies. However, these optimization methods were not explicitly designed with the consideration of noise. In this work we adapt recent developments from the "noise-aware numerical optimization" literature to these commonly used derivative-free model-based methods. We introduce the key defining characteristics of these novel noise-aware derivative-free model-based methods that separate them from standard model-based methods. We study an implementation of such noise-aware derivative-free model-based methods and compare its performance on demonstrative VQA simulations to classical solvers packaged in scikit-quant.

Submitted 14 June, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.08912 [pdf, other]

Two-Stage Estimation and Variance Modeling for Latency-Constrained Variational Quantum Algorithms

Authors: Yunsoo Ha, Sara Shashaani, Matt Menickelly

Abstract: The Quantum Approximate Optimization Algorithm (QAOA) has enjoyed increasing attention in noisy intermediate-scale quantum computing due to its application to combinatorial optimization problems. Because combinatorial optimization problems are NP-hard, QAOA could serve as a potential demonstration of quantum advantage in the future. As a hybrid quantum-classical algorithm, the classical component of QAOA resembles a simulation optimization problem, in which the simulation outcomes are attainable only through the quantum computer. The simulation that derives from QAOA exhibits two unique features that can have a substantial impact on the optimization process: (i) the variance of the stochastic objective values typically decreases in proportion to the optimality gap, and (ii) querying samples from a quantum computer introduces an additional latency overhead. In this paper, we introduce a novel stochastic trust-region method, derived from a derivative-free adaptive sampling trust-region optimization (ASTRO-DF) method, intended to efficiently solve the classical optimization problem in QAOA, by explicitly taking into account the two mentioned characteristics. The key idea behind the proposed algorithm involves constructing two separate local models in each iteration: a model of the objective function, and a model of the variance of the objective function. Exploiting the variance model allows us to both restrict the number of communications with the quantum computer, and also helps navigate the nonconvex objective landscapes typical in the QAOA optimization problems. We numerically demonstrate the superiority of our proposed algorithm using the SimOpt library and Qiskit, when we consider a metric of computational burden that explicitly accounts for communication costs.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2305.17336 [pdf, ps, other]

Avoiding Geometry Improvement in Derivative-Free Model-Based Methods via Randomization

Authors: Matt Menickelly

Abstract: We present a technique for model-based derivative-free optimization called basis sketching. Basis sketching consists of taking random sketches of the Vandermonde matrix employed in constructing an interpolation model. This randomization enables weakening the general requirement in model-based derivative-free methods that interpolation sets contain a full-dimensional set of affinely independent points in every iteration. Practically, this weakening provides a theoretically justified means of avoiding potentially expensive geometry improvement steps in many model-based derivative-free methods. We demonstrate this practicality by extending the nonlinear least squares solver POUNDers to a variant that employs basis sketching, and we observe encouraging results on higher dimensional problems.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2302.09128 [pdf, other]

A Stochastic Quasi-Newton Method in the Absence of Common Random Numbers

Authors: Matt Menickelly, Stefan M. Wild, Miaolan Xie

Abstract: We present a quasi-Newton method for unconstrained stochastic optimization. Most existing literature on this topic assumes a setting of stochastic optimization in which a finite sum of component functions is a reasonable approximation of an expectation, and hence one can design a quasi-Newton method to exploit common random numbers. In contrast, and motivated by problems in variational quantum algorithms, we assume that function values and gradients are available only through inexact probabilistic zeroth- and first-order oracles and no common random numbers can be exploited. Our algorithmic framework -- derived from prior work on the SASS algorithm -- is general and does not assume common random numbers. We derive a high-probability tail bound on the iteration complexity of the algorithm for nonconvex and strongly convex functions. We present numerical results demonstrating the empirical benefits of augmenting SASS with our quasi-Newton updating scheme, both on synthetic problems and on real problems in quantum chemistry.

Submitted 17 February, 2023; originally announced February 2023.

MSC Class: 90C15; 90C53; 90C30; 90C26

arXiv:2207.08264 [pdf, other]

doi 10.1007/s12532-023-00245-5

Structure-Aware Methods for Expensive Derivative-Free Nonsmooth Composite Optimization

Authors: Jeffrey Larson, Matt Menickelly

Abstract: We present new methods for solving a broad class of bound-constrained nonsmooth composite minimization problems. These methods are specially designed for objectives that are some known mapping of outputs from a computationally expensive function. We provide accompanying implementations of these methods: in particular, a novel manifold sampling algorithm (MS-P) with subproblems that are in a sense primal versions of the dual problems solved by previous manifold sampling methods and a method (GOOMBAH) that employs more difficult optimization subproblems. For these two methods, we provide rigorous convergence analysis and guarantees. We demonstrate extensive testing of these methods. Open-source implementations of the methods developed in this manuscript can be found at github.com/POptUS/IBCDFO/.

Submitted 20 March, 2023; v1 submitted 17 July, 2022; originally announced July 2022.

arXiv:2207.06305 [pdf, ps, other]

Stochastic Average Model Methods

Authors: Matt Menickelly, Stefan M. Wild

Abstract: We consider the solution of finite-sum minimization problems, such as those appearing in nonlinear least-squares or general empirical risk minimization problems. We are motivated by problems in which the summand functions are computationally expensive and evaluating all summands on every iteration of an optimization method may be undesirable. We present the idea of stochastic average model (SAM) methods, inspired by stochastic average gradient methods. SAM methods sample component functions on each iteration of a trust-region method according to a discrete probability distribution on component functions; the distribution is designed to minimize an upper bound on the variance of the resulting stochastic model. We present promising numerical results concerning an implemented variant extending the derivative-free model-based trust-region solver POUNDERS, which we name SAM-POUNDERS.

Submitted 20 March, 2024; v1 submitted 13 July, 2022; originally announced July 2022.

arXiv:2202.08387 [pdf, other]

TROPHY: Trust Region Optimization Using a Precision Hierarchy

Authors: Richard J Clancy, Matt Menickelly, Jan Hückelheim, Paul Hovland, Prani Nalluri, Rebecca Gjini

Abstract: We present an algorithm to perform trust-region-based optimization for nonlinear unconstrained problems. The method selectively uses function and gradient evaluations at different floating-point precisions to reduce the overall energy consumption, storage, and communication costs; these capabilities are increasingly important in the era of exascale computing. In particular, we are motivated by a desire to improve computational efficiency for massive climate models. We employ our method on two examples: the CUTEst test set and a large-scale data assimilation problem to recover wind fields from radar returns. Although this paper is primarily a proof of concept, we show that if implemented on appropriate hardware, the use of mixed-precision can significantly reduce the computational load compared with fixed-precision solvers.

Submitted 16 February, 2022; originally announced February 2022.

Comments: 14 pages, 2 figures, 2 tables

MSC Class: 90-08 ACM Class: G.1.6

arXiv:2201.13438 [pdf, other]

doi 10.22331/q-2023-03-16-949

Latency considerations for stochastic optimizers in variational quantum algorithms

Authors: Matt Menickelly, Yunsoo Ha, Matthew Otten

Abstract: Variational quantum algorithms, which have risen to prominence in the noisy intermediate-scale quantum setting, require the implementation of a stochastic optimizer on classical hardware. To date, most research has employed algorithms based on the stochastic gradient iteration as the stochastic classical optimizer. In this work we propose instead using stochastic optimization algorithms that yield stochastic processes emulating the dynamics of classical deterministic algorithms. This approach results in methods with theoretically superior worst-case iteration complexities, at the expense of greater per-iteration sample (shot) complexities. We investigate this trade-off both theoretically and empirically and conclude that preferences for a choice of stochastic optimizer should explicitly depend on a function of both latency and shot execution times.

Submitted 17 October, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

Journal ref: Quantum 7, 949 (2023)

arXiv:2011.01283 [pdf, ps, other]

doi 10.1137/20M1378089

Manifold Sampling for Optimizing Nonsmooth Nonconvex Compositions

Authors: Jeffrey Larson, Matt Menickelly, Baoyu Zhou

Abstract: We propose a manifold sampling algorithm for minimizing a nonsmooth composition $f= h\circ F$, where we assume $h$ is nonsmooth and may be inexpensively computed in closed form and $F$ is smooth but its Jacobian may not be available. We additionally assume that the composition $h\circ F$ defines a continuous selection. Manifold sampling algorithms can be classified as model-based derivative-free methods, in that models of $F$ are combined with particularly sampled information about $h$ to yield local models for use within a trust-region framework. We demonstrate that cluster points of the sequence of iterates generated by the manifold sampling algorithm are Clarke stationary. We consider the tractability of three particular subproblems generated by the manifold sampling algorithm and the extent to which inexact solutions to these subproblems may be tolerated. Numerical results demonstrate that manifold sampling as a derivative-free algorithm is competitive with state-of-the-art algorithms for nonsmooth optimization that utilize first-order information about $f$.

Submitted 13 January, 2022; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: 29 pages, 7 figures

arXiv:2010.05668 [pdf, other]

doi 10.1088/1361-6471/abd009

Optimization and Supervised Machine Learning Methods for Fitting Numerical Physics Models without Derivatives

Authors: Raghu Bollapragada, Matt Menickelly, Witold Nazarewicz, Jared O'Neal, Paul-Gerhard Reinhard, Stefan M. Wild

Abstract: We address the calibration of a computationally expensive nuclear physics model for which derivative information with respect to the fit parameters is not readily available. Of particular interest is the performance of optimization-based training algorithms when dozens, rather than millions or more, of training data are available and when the expense of the model places limitations on the number of concurrent model evaluations that can be performed. As a case study, we consider the Fayans energy density functional model, which has characteristics similar to many model fitting and calibration problems in nuclear physics. We analyze hyperparameter tuning considerations and variability associated with stochastic optimization algorithms and illustrate considerations for tuning in different computational settings.

Submitted 14 December, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: 25-page article, 9-page supplement, 1-page notice

arXiv:2001.00887 [pdf, other]

Tuning Multigrid Methods with Robust Optimization

Authors: Jed Brown, Yunhui He, Scott MacLachlan, Matt Menickelly, Stefan M. Wild

Abstract: Local Fourier analysis is a useful tool for predicting and analyzing the performance of many efficient algorithms for the solution of discretized PDEs, such as multigrid and domain decomposition methods. The crucial aspect of local Fourier analysis is that it can be used to minimize an estimate of the spectral radius of a stationary iteration, or the condition number of a preconditioned system, in terms of a symbol representation of the algorithm. In practice, this is a "minimax" problem, minimizing with respect to solver parameters the appropriate measure of work, which involves maximizing over the Fourier frequency. Often, several algorithmic parameters may be determined by local Fourier analysis in order to obtain efficient algorithms. Analytical solutions to minimax problems are rarely possible beyond simple problems; the status quo in local Fourier analysis involves grid sampling, which is prohibitively expensive in high dimensions. In this paper, we propose and explore optimization algorithms to solve these problems efficiently. Several examples, with known and unknown analytical solutions, are presented to show the effectiveness of these approaches.

Submitted 27 July, 2020; v1 submitted 3 January, 2020; originally announced January 2020.
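
To make the "minimax" structure and the grid-sampling status quo mentioned in the abstract above concrete, here is a small sketch on a textbook case chosen for illustration, not taken from the paper: weighted Jacobi relaxation for the 1D Poisson stencil [-1, 2, -1], whose symbol is 1 - omega * (1 - cos(theta)). The function names and grid sizes are assumptions. The inner loop maximizes over high frequencies, the outer loop minimizes over the single parameter omega.

```python
import math


def smoothing_factor(omega, n_theta=201):
    """Inner maximization: worst-case |symbol| over theta in [pi/2, pi]."""
    worst = 0.0
    for j in range(n_theta):
        theta = math.pi / 2 + (math.pi / 2) * j / (n_theta - 1)
        symbol = 1.0 - omega * (1.0 - math.cos(theta))
        worst = max(worst, abs(symbol))
    return worst


def grid_minimax(n_omega=301):
    """Outer minimization: sample omega on a uniform grid over [0, 1]."""
    best_omega, best_value = None, float("inf")
    for i in range(n_omega):
        omega = i / (n_omega - 1)
        value = smoothing_factor(omega)
        if value < best_value:
            best_omega, best_value = omega, value
    return best_omega, best_value


omega_star, mu_star = grid_minimax()
```

For this toy problem the analytical solution is known (omega = 2/3 with smoothing factor 1/3), so the grid result can be checked directly. The cost is n_omega * n_theta symbol evaluations for one parameter, and it grows exponentially with the number of parameters, which is the expense the abstract points to.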

arXiv:1904.11585 [pdf, other]

doi 10.1017/S0962492919000060

Derivative-free optimization methods

Authors: Jeffrey Larson, Matt Menickelly, Stefan M. Wild

Abstract: In many optimization problems arising from scientific, engineering and artificial intelligence applications, objective and constraint functions are available only as the output of a black-box or simulation oracle that does not provide derivative information. Such settings necessitate the use of methods for derivative-free, or zeroth-order, optimization. We provide a review and perspectives on developments in these methods, with an emphasis on highlighting recent developments and on unifying treatment of such problems in the non-linear optimization and machine learning literature. We categorize methods based on assumed properties of the black-box functions, as well as features of the methods. We first overview the primary setting of deterministic methods applied to unconstrained, non-convex optimization problems where the objective function is defined by a deterministic black-box oracle. We then discuss developments in randomized methods, methods that assume some additional structure about the objective (including convexity, separability and general non-smooth compositions), methods for problems where the output of the black-box oracle is stochastic, and methods for handling different types of constraints.

Submitted 25 June, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

Journal ref: Acta Numerica 28 (2019) 287-404

arXiv:1807.02736 [pdf, other]

Robust Learning of Trimmed Estimators via Manifold Sampling

Authors: Matt Menickelly, Stefan M. Wild

Abstract: We adapt a manifold sampling algorithm for the nonsmooth, nonconvex formulations of learning that arise when imposing robustness to outliers present in the training data. We demonstrate the approach on objectives based on trimmed loss. Empirical results show that the method has favorable scaling properties. Although savings in time come at the expense of not certifying optimality, the algorithm consistently returns high-quality solutions on the trimmed linear regression and multiclass classification problems tested.

Submitted 7 July, 2018; originally announced July 2018.

Comments: In ICML 2018 Workshop on Modern Trends in Nonconvex Optimization for Machine Learning
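
As a minimal sketch of what a trimmed-loss objective looks like, assuming the common definition in which only the q smallest squared residuals are summed, the code below evaluates a trimmed least-squares loss for a one-variable linear model. The helper names and the data are hypothetical. The sorting step is what makes the objective nonsmooth and nonconvex in the model parameters.

```python
def residuals(w, b, xs, ys):
    """Smooth inner map: residuals of the linear model y = w * x + b."""
    return [w * x + b - y for x, y in zip(xs, ys)]


def trimmed_loss(res, q):
    """Nonsmooth outer map: sum of the q smallest squared residuals."""
    squared = sorted(r * r for r in res)
    return sum(squared[:q])


# Illustrative data (an assumption): points on y = 2x + 1 plus one outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 100.0]

res = residuals(2.0, 1.0, xs, ys)
full = trimmed_loss(res, 5)     # ordinary least squares: outlier dominates
trimmed = trimmed_loss(res, 4)  # trimming discards the largest residual
```

With q = 4 the true line has zero loss despite the outlier, whereas the untrimmed loss at the same parameters is dominated by the single bad point.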

arXiv:1711.06082 [pdf, other]

Accurate, rapid identification of dislocation lines in coherent diffractive imaging via a min-max optimization formulation

Authors: A. Ulvestad, M. Menickelly, S. M. Wild

Abstract: Defects such as dislocations impact materials properties and their response during external stimuli. Defect engineering has emerged as a possible route to improving the performance of materials over a wide range of applications, including batteries, solar cells, and semiconductors. Imaging these defects in their native operating conditions to establish the structure-function relationship and, ultimately, to improve performance has remained a considerable challenge for both electron-based and x-ray-based imaging techniques. However, the advent of Bragg coherent x-ray diffractive imaging (BCDI) has made possible the 3D imaging of multiple dislocations in nanoparticles ranging in size from 100 nm to 1000 nm. While the imaging process succeeds in many cases, nuances in identifying the dislocations have left manual identification as the preferred method. Derivative-based methods are also used, but they can be inaccurate and are computationally inefficient. Here we demonstrate a derivative-free method that is both more accurate and more computationally efficient than either derivative- or human-based methods for identifying 3D dislocation lines in nanocrystal images produced by BCDI. We formulate the problem as a min-max optimization problem and show exceptional accuracy for experimental images. We demonstrate a 260x speedup for a typical experimental dataset with higher accuracy over current methods. We discuss the possibility of using this algorithm as part of a sparsity-based phase retrieval process. We also provide the MATLAB code for use by other researchers.

Submitted 16 November, 2017; originally announced November 2017.

arXiv:1612.03225 [pdf, ps, other]

Optimal Generalized Decision Trees via Integer Programming

Authors: Oktay Gunluk, Jayant Kalagnanam, Minhan Li, Matt Menickelly, Katya Scheinberg

Abstract: Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. Our approach can also handle numerical features via thresholding. We show that very good accuracy can be achieved with small trees using moderately-sized training sets. The optimization problems we solve are tractable with modern solvers.

Submitted 13 August, 2019; v1 submitted 9 December, 2016; originally announced December 2016.

MSC Class: 90C10

arXiv:1609.07428 [pdf, ps, other]

Convergence Rate Analysis of a Stochastic Trust Region Method via Submartingales

Authors: Jose Blanchet, Coralia Cartis, Matt Menickelly, Katya Scheinberg

Abstract: We propose a novel framework for analyzing convergence rates of stochastic optimization algorithms with adaptive step sizes. This framework is based on analyzing properties of an underlying generic stochastic process, in particular by deriving a bound on the expected stopping time of this process. We utilize this framework to analyze the bounds on expected global convergence rates of a stochastic variant of a traditional trust region method, introduced in \cite{ChenMenickellyScheinberg2014}. While traditional trust region methods rely on exact computations of the gradient, Hessian and values of the objective function, this method assumes that these values are available up to some dynamically adjusted accuracy. Moreover, this accuracy is assumed to hold only with some sufficiently large, but fixed, probability, without any additional restrictions on the variance of the errors. This setting applies, for example, to standard stochastic optimization and machine learning formulations. Improving upon the analysis in \cite{ChenMenickellyScheinberg2014}, we show that the stochastic process defined by the algorithm satisfies the assumptions of our proposed general framework, with the stopping time defined as reaching accuracy $\|\nabla f(x)\|\leq \epsilon$. The resulting bound for this stopping time is $O(\epsilon^{-2})$, under the assumption of sufficiently accurate stochastic gradient, and is the first global complexity bound for a stochastic trust-region method. Finally, we apply the same framework to derive second order complexity bound under some additional assumptions.

Submitted 19 October, 2018; v1 submitted 23 September, 2016; originally announced September 2016.

Comments: 22 pages. arXiv admin note: text overlap with arXiv:1504.04231

arXiv:1504.04231 [pdf, other]

Stochastic Optimization Using a Trust-Region Method and Random Models

Authors: Ruobing Chen, Matt Menickelly, Katya Scheinberg

Abstract: In this paper, we propose and analyze a trust-region model-based algorithm for solving unconstrained stochastic optimization problems. Our framework utilizes random models of an objective function $f(x)$, obtained from stochastic observations of the function or its gradient. Our method also utilizes estimates of function values to gauge progress that is being made. The convergence analysis relies on requirements that these models and these estimates are sufficiently accurate with sufficiently high, but fixed, probability. Beyond these conditions, no assumptions are made on how these models and estimates are generated. Under these general conditions we show an almost sure global convergence of the method to a first order stationary point. In the second part of the paper, we present examples of generating sufficiently accurate random models under biased or unbiased noise assumptions. Lastly, we present some computational results showing the benefits of the proposed method compared to existing approaches that are based on sample averaging or stochastic gradients.

Submitted 23 September, 2016; v1 submitted 16 April, 2015; originally announced April 2015.

Comments: Revised version posted September 23, 2016. Originally posted April 17, 2015
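
For readers unfamiliar with the trust-region mechanics that the abstract above builds on, the following is a deliberately simplified, deterministic sketch: a linear model, a step to the trust-region boundary, and a ratio test that accepts the step and expands the radius or rejects it and shrinks the radius. It is not the algorithm analyzed in the paper, which uses random models and function-value estimates; the function name, the parameter values (`eta`, `gamma`), and the test problem are assumptions made for illustration.

```python
import math


def trust_region(f, grad, x0, delta0=1.0, eta=0.1, gamma=2.0,
                 max_iter=500, tol=1e-8):
    """Basic trust-region loop with a linear model and exact f, grad."""
    x, delta = list(x0), delta0
    for _ in range(max_iter):
        g = grad(x)
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        if gnorm < tol:
            break
        # Minimizer of the linear model f(x) + g.s over the ball ||s|| <= delta.
        s = [-delta * gi / gnorm for gi in g]
        trial = [xi + si for xi, si in zip(x, s)]
        predicted = delta * gnorm      # decrease promised by the model
        actual = f(x) - f(trial)       # decrease actually achieved
        rho = actual / predicted
        if rho >= eta:
            x, delta = trial, gamma * delta   # successful: accept, expand
        else:
            delta = delta / gamma             # unsuccessful: reject, shrink
    return x


# Hypothetical test problem: a convex quadratic with minimizer (1, -0.5).
def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]


x_final = trust_region(f, grad, [0.0, 0.0])
```

In the stochastic setting described in the abstract, `grad(x)` and the two calls to `f` would be replaced by model and function-value estimates that are only accurate with some fixed probability, and the same accept/reject logic would be applied to those estimates.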

Showing 1–19 of 19 results for author: Menickelly, M