-
Challenges with Differentiable Quantum Dynamics
Authors:
Sri Hari Krishna Narayanan,
Michael Perlin,
Robert Lewis-Swan,
Jeffrey Larson,
Matt Menickelly,
Jan Hückelheim,
Paul Hovland
Abstract:
Differentiable quantum dynamics require automatic differentiation of a complex-valued initial value problem, which numerically integrates a system of ordinary differential equations from a specified initial condition, as well as the eigendecomposition of a matrix. We explored several automatic differentiation frameworks for these tasks, finding that no framework natively supports our application requirements. We therefore demonstrate a need for broader support of complex-valued, differentiable numerical integration in scientific computing libraries.
Submitted 18 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Estimating Computational Noise on Parametric Curves
Authors:
Matt Menickelly
Abstract:
We consider ECNoise, a practical tool for estimating the magnitude of noise in evaluations of a black-box function. Recent developments in numerical optimization algorithms have seen increased usage of ECNoise as a subroutine to provide a solver with noise level estimates, so that the solver might somehow proportionally adjust for noise. Motivated by problems in computationally expensive derivative-free optimization, we question a fundamental assumption made in the original development of ECNoise, namely, that the set of points provided to ECNoise must satisfy fairly restrictive geometric conditions (specifically, that the points be collinear and equally spaced). Driven by prior practical experience, we show that in many situations, noise estimates obtained from providing an arbitrary (that is, not collinear) geometry of points as input to ECNoise are often indistinguishable from noise estimates obtained from using the standard (collinear and equally spaced) geometry. We analyze this via parametric curves that interpolate the arbitrary input points. The analysis provides insight into the circumstances in which one can expect arbitrary point selection to cause significant degradation of ECNoise. Moreover, the analysis suggests a practical means (the solution of a small mixed integer linear program) by which one can gradually adjust an initial arbitrary point selection to yield better noise estimates with higher probability.
Submitted 19 January, 2024;
originally announced January 2024.
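The standard (collinear, equally spaced) geometry mentioned in the abstract above can be illustrated with a difference-table noise estimate. The following is a minimal sketch, not the ECNoise implementation: the function name `noise_estimate` is hypothetical, the difference order is supplied by the caller rather than selected automatically, and the samples are assumed to lie at equally spaced points along a line.

```python
import math


def noise_estimate(values, order):
    """Estimate the noise level from function values at equally spaced points.

    values: f(t_0), ..., f(t_m) sampled along a line with equal spacing (assumed).
    order:  difference order k, with 1 <= k <= m (chosen by the caller here).
    """
    if not 1 <= order < len(values):
        raise ValueError("order must satisfy 1 <= order < len(values)")
    # Build the k-th order forward differences by differencing k times.
    diffs = list(values)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    # For i.i.d. noise with variance s^2, each k-th difference has variance
    # s^2 * (2k)! / (k!)^2, so gamma rescales the mean square back to s^2.
    gamma = math.factorial(order) ** 2 / math.factorial(2 * order)
    mean_square = sum(d * d for d in diffs) / len(diffs)
    return math.sqrt(gamma * mean_square)
```

Differencing removes the smooth part of the function (a polynomial of degree below `order` is annihilated exactly), so what remains is dominated by noise. This relies on equal spacing, which is the assumption the paper examines.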
-
A Novel Noise-Aware Classical Optimizer for Variational Quantum Algorithms
Authors:
Jeffrey Larson,
Matt Menickelly,
Jiahao Shi
Abstract:
A key component of variational quantum algorithms (VQAs) is the choice of classical optimizer employed to update the parameterization of an ansatz. It is well recognized that quantum algorithms will, for the foreseeable future, necessarily be run on noisy devices with limited fidelities. Thus, the evaluation of an objective function (e.g., the guiding function in the quantum approximate optimization algorithm (QAOA) or the expectation of the electronic Hamiltonian in variational quantum eigensolver (VQE)) required by a classical optimizer is subject not only to stochastic error from estimating an expected value but also to error resulting from intermittent hardware noise. Model-based derivative-free optimization methods have emerged as popular choices of a classical optimizer in the noisy VQA setting, based on empirical studies. However, these optimization methods were not explicitly designed with the consideration of noise. In this work we adapt recent developments from the ``noise-aware numerical optimization'' literature to these commonly used derivative-free model-based methods. We introduce the key defining characteristics of these novel noise-aware derivative-free model-based methods that separate them from standard model-based methods. We study an implementation of such noise-aware derivative-free model-based methods and compare its performance on demonstrative VQA simulations to classical solvers packaged in \texttt{scikit-quant}.
Submitted 14 June, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Two-Stage Estimation and Variance Modeling for Latency-Constrained Variational Quantum Algorithms
Authors:
Yunsoo Ha,
Sara Shashaani,
Matt Menickelly
Abstract:
The Quantum Approximate Optimization Algorithm (QAOA) has enjoyed increasing attention in noisy intermediate-scale quantum computing due to its application to combinatorial optimization problems. Because combinatorial optimization problems are NP-hard, QAOA could serve as a potential demonstration of quantum advantage in the future. As a hybrid quantum-classical algorithm, the classical component of QAOA resembles a simulation optimization problem, in which the simulation outcomes are attainable only through the quantum computer. The simulation that derives from QAOA exhibits two unique features that can have a substantial impact on the optimization process: (i) the variance of the stochastic objective values typically decreases in proportion to the optimality gap, and (ii) querying samples from a quantum computer introduces an additional latency overhead. In this paper, we introduce a novel stochastic trust-region method, derived from a derivative-free adaptive sampling trust-region optimization (ASTRO-DF) method, intended to efficiently solve the classical optimization problem in QAOA by explicitly taking into account the two mentioned characteristics. The key idea behind the proposed algorithm involves constructing two separate local models in each iteration: a model of the objective function, and a model of the variance of the objective function. Exploiting the variance model allows us both to restrict the number of communications with the quantum computer and to navigate the nonconvex objective landscapes typical of QAOA optimization problems. We numerically demonstrate the superiority of our proposed algorithm using the SimOpt library and Qiskit, when we consider a metric of computational burden that explicitly accounts for communication costs.
Submitted 16 January, 2024;
originally announced January 2024.
-
Avoiding Geometry Improvement in Derivative-Free Model-Based Methods via Randomization
Authors:
Matt Menickelly
Abstract:
We present a technique for model-based derivative-free optimization called \emph{basis sketching}. Basis sketching consists of taking random sketches of the Vandermonde matrix employed in constructing an interpolation model. This randomization enables weakening the general requirement in model-based derivative-free methods that interpolation sets contain a full-dimensional set of affinely independent points in every iteration. Practically, this weakening provides a theoretically justified means of avoiding potentially expensive geometry improvement steps in many model-based derivative-free methods. We demonstrate this practicality by extending the nonlinear least squares solver, \texttt{POUNDers}, to a variant that employs basis sketching, and we observe encouraging results on higher-dimensional problems.
Submitted 26 May, 2023;
originally announced May 2023.
-
A Stochastic Quasi-Newton Method in the Absence of Common Random Numbers
Authors:
Matt Menickelly,
Stefan M. Wild,
Miaolan Xie
Abstract:
We present a quasi-Newton method for unconstrained stochastic optimization. Most existing literature on this topic assumes a setting of stochastic optimization in which a finite sum of component functions is a reasonable approximation of an expectation, and hence one can design a quasi-Newton method to exploit common random numbers. In contrast, and motivated by problems in variational quantum algorithms, we assume that function values and gradients are available only through inexact probabilistic zeroth- and first-order oracles and no common random numbers can be exploited. Our algorithmic framework -- derived from prior work on the SASS algorithm -- is general and does not assume common random numbers. We derive a high-probability tail bound on the iteration complexity of the algorithm for nonconvex and strongly convex functions. We present numerical results demonstrating the empirical benefits of augmenting SASS with our quasi-Newton updating scheme, both on synthetic problems and on real problems in quantum chemistry.
Submitted 17 February, 2023;
originally announced February 2023.
-
Structure-Aware Methods for Expensive Derivative-Free Nonsmooth Composite Optimization
Authors:
Jeffrey Larson,
Matt Menickelly
Abstract:
We present new methods for solving a broad class of bound-constrained nonsmooth composite minimization problems. These methods are specially designed for objectives that are some known mapping of outputs from a computationally expensive function. We provide accompanying implementations of these methods: in particular, a novel manifold sampling algorithm (\mspshortref) with subproblems that are in a sense primal versions of the dual problems solved by previous manifold sampling methods and a method (\goombahref) that employs more difficult optimization subproblems. For these two methods, we provide rigorous convergence analysis and guarantees. We demonstrate extensive testing of these methods. Open-source implementations of the methods developed in this manuscript can be found at \url{github.com/POptUS/IBCDFO/}.
Submitted 20 March, 2023; v1 submitted 17 July, 2022;
originally announced July 2022.
-
Stochastic Average Model Methods
Authors:
Matt Menickelly,
Stefan M. Wild
Abstract:
We consider the solution of finite-sum minimization problems, such as those appearing in nonlinear least-squares or general empirical risk minimization problems. We are motivated by problems in which the summand functions are computationally expensive and evaluating all summands on every iteration of an optimization method may be undesirable. We present the idea of stochastic average model (SAM) methods, inspired by stochastic average gradient methods. SAM methods sample component functions on each iteration of a trust-region method according to a discrete probability distribution on component functions; the distribution is designed to minimize an upper bound on the variance of the resulting stochastic model. We present promising numerical results concerning an implemented variant extending the derivative-free model-based trust-region solver POUNDERS, which we name SAM-POUNDERS.
Submitted 20 March, 2024; v1 submitted 13 July, 2022;
originally announced July 2022.
-
TROPHY: Trust Region Optimization Using a Precision Hierarchy
Authors:
Richard J Clancy,
Matt Menickelly,
Jan Hückelheim,
Paul Hovland,
Prani Nalluri,
Rebecca Gjini
Abstract:
We present an algorithm to perform trust-region-based optimization for nonlinear unconstrained problems. The method selectively uses function and gradient evaluations at different floating-point precisions to reduce the overall energy consumption, storage, and communication costs; these capabilities are increasingly important in the era of exascale computing. In particular, we are motivated by a desire to improve computational efficiency for massive climate models. We employ our method on two examples: the CUTEst test set and a large-scale data assimilation problem to recover wind fields from radar returns. Although this paper is primarily a proof of concept, we show that if implemented on appropriate hardware, the use of mixed-precision can significantly reduce the computational load compared with fixed-precision solvers.
Submitted 16 February, 2022;
originally announced February 2022.
-
Latency considerations for stochastic optimizers in variational quantum algorithms
Authors:
Matt Menickelly,
Yunsoo Ha,
Matthew Otten
Abstract:
Variational quantum algorithms, which have risen to prominence in the noisy intermediate-scale quantum setting, require the implementation of a stochastic optimizer on classical hardware. To date, most research has employed algorithms based on the stochastic gradient iteration as the stochastic classical optimizer. In this work we propose instead using stochastic optimization algorithms that yield stochastic processes emulating the dynamics of classical deterministic algorithms. This approach results in methods with theoretically superior worst-case iteration complexities, at the expense of greater per-iteration sample (shot) complexities. We investigate this trade-off both theoretically and empirically and conclude that preferences for a choice of stochastic optimizer should explicitly depend on a function of both latency and shot execution times.
Submitted 17 October, 2022; v1 submitted 31 January, 2022;
originally announced January 2022.
-
Manifold Sampling for Optimizing Nonsmooth Nonconvex Compositions
Authors:
Jeffrey Larson,
Matt Menickelly,
Baoyu Zhou
Abstract:
We propose a manifold sampling algorithm for minimizing a nonsmooth composition $f= h\circ F$, where we assume $h$ is nonsmooth and may be inexpensively computed in closed form and $F$ is smooth but its Jacobian may not be available. We additionally assume that the composition $h\circ F$ defines a continuous selection. Manifold sampling algorithms can be classified as model-based derivative-free methods, in that models of $F$ are combined with particularly sampled information about $h$ to yield local models for use within a trust-region framework. We demonstrate that cluster points of the sequence of iterates generated by the manifold sampling algorithm are Clarke stationary. We consider the tractability of three particular subproblems generated by the manifold sampling algorithm and the extent to which inexact solutions to these subproblems may be tolerated. Numerical results demonstrate that manifold sampling as a derivative-free algorithm is competitive with state-of-the-art algorithms for nonsmooth optimization that utilize first-order information about $f$.
Submitted 13 January, 2022; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Optimization and Supervised Machine Learning Methods for Fitting Numerical Physics Models without Derivatives
Authors:
Raghu Bollapragada,
Matt Menickelly,
Witold Nazarewicz,
Jared O'Neal,
Paul-Gerhard Reinhard,
Stefan M. Wild
Abstract:
We address the calibration of a computationally expensive nuclear physics model for which derivative information with respect to the fit parameters is not readily available. Of particular interest is the performance of optimization-based training algorithms when dozens, rather than millions or more, of training data are available and when the expense of the model places limitations on the number of concurrent model evaluations that can be performed.
As a case study, we consider the Fayans energy density functional model, which has characteristics similar to many model fitting and calibration problems in nuclear physics. We analyze hyperparameter tuning considerations and variability associated with stochastic optimization algorithms and illustrate considerations for tuning in different computational settings.
Submitted 14 December, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Tuning Multigrid Methods with Robust Optimization
Authors:
Jed Brown,
Yunhui He,
Scott MacLachlan,
Matt Menickelly,
Stefan M. Wild
Abstract:
Local Fourier analysis is a useful tool for predicting and analyzing the performance of many efficient algorithms for the solution of discretized PDEs, such as multigrid and domain decomposition methods. The crucial aspect of local Fourier analysis is that it can be used to minimize an estimate of the spectral radius of a stationary iteration, or the condition number of a preconditioned system, in terms of a symbol representation of the algorithm. In practice, this is a "minimax" problem, minimizing with respect to solver parameters the appropriate measure of work, which involves maximizing over the Fourier frequency. Often, several algorithmic parameters may be determined by local Fourier analysis in order to obtain efficient algorithms. Analytical solutions to minimax problems are rarely possible beyond simple problems; the status quo in local Fourier analysis involves grid sampling, which is prohibitively expensive in high dimensions. In this paper, we propose and explore optimization algorithms to solve these problems efficiently. Several examples, with known and unknown analytical solutions, are presented to show the effectiveness of these approaches.
Submitted 27 July, 2020; v1 submitted 3 January, 2020;
originally announced January 2020.
-
Derivative-free optimization methods
Authors:
Jeffrey Larson,
Matt Menickelly,
Stefan M. Wild
Abstract:
In many optimization problems arising from scientific, engineering and artificial intelligence applications, objective and constraint functions are available only as the output of a black-box or simulation oracle that does not provide derivative information. Such settings necessitate the use of methods for derivative-free, or zeroth-order, optimization. We provide a review and perspectives on developments in these methods, with an emphasis on highlighting recent developments and on unifying treatment of such problems in the non-linear optimization and machine learning literature. We categorize methods based on assumed properties of the black-box functions, as well as features of the methods. We first overview the primary setting of deterministic methods applied to unconstrained, non-convex optimization problems where the objective function is defined by a deterministic black-box oracle. We then discuss developments in randomized methods, methods that assume some additional structure about the objective (including convexity, separability and general non-smooth compositions), methods for problems where the output of the black-box oracle is stochastic, and methods for handling different types of constraints.
Submitted 25 June, 2019; v1 submitted 25 April, 2019;
originally announced April 2019.
-
Robust Learning of Trimmed Estimators via Manifold Sampling
Authors:
Matt Menickelly,
Stefan M. Wild
Abstract:
We adapt a manifold sampling algorithm for the nonsmooth, nonconvex formulations of learning that arise when imposing robustness to outliers present in the training data. We demonstrate the approach on objectives based on trimmed loss. Empirical results show that the method has favorable scaling properties. Although savings in time come at the expense of not certifying optimality, the algorithm consistently returns high-quality solutions on the trimmed linear regression and multiclass classification problems tested.
Submitted 7 July, 2018;
originally announced July 2018.
-
Accurate, rapid identification of dislocation lines in coherent diffractive imaging via a min-max optimization formulation
Authors:
A. Ulvestad,
M. Menickelly,
S. M. Wild
Abstract:
Defects such as dislocations impact materials properties and their response during external stimuli. Defect engineering has emerged as a possible route to improving the performance of materials over a wide range of applications, including batteries, solar cells, and semiconductors. Imaging these defects in their native operating conditions to establish the structure-function relationship and, ultimately, to improve performance has remained a considerable challenge for both electron-based and x-ray-based imaging techniques. However, the advent of Bragg coherent x-ray diffractive imaging (BCDI) has made possible the 3D imaging of multiple dislocations in nanoparticles ranging in size from 100 nm to 1000 nm. While the imaging process succeeds in many cases, nuances in identifying the dislocations have left manual identification as the preferred method. Derivative-based methods are also used, but they can be inaccurate and are computationally inefficient. Here we demonstrate a derivative-free method that is both more accurate and more computationally efficient than either derivative- or human-based methods for identifying 3D dislocation lines in nanocrystal images produced by BCDI. We formulate the problem as a min-max optimization problem and show exceptional accuracy for experimental images. We demonstrate a 260x speedup for a typical experimental dataset with higher accuracy over current methods. We discuss the possibility of using this algorithm as part of a sparsity-based phase retrieval process. We also provide the MATLAB code for use by other researchers.
Submitted 16 November, 2017;
originally announced November 2017.
-
Optimal Generalized Decision Trees via Integer Programming
Authors:
Oktay Gunluk,
Jayant Kalagnanam,
Minhan Li,
Matt Menickelly,
Katya Scheinberg
Abstract:
Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. Our approach can also handle numerical features via thresholding. We show that very good accuracy can be achieved with small trees using moderately-sized training sets. The optimization problems we solve are tractable with modern solvers.
Submitted 13 August, 2019; v1 submitted 9 December, 2016;
originally announced December 2016.
-
Convergence Rate Analysis of a Stochastic Trust Region Method via Submartingales
Authors:
Jose Blanchet,
Coralia Cartis,
Matt Menickelly,
Katya Scheinberg
Abstract:
We propose a novel framework for analyzing convergence rates of stochastic optimization algorithms with adaptive step sizes. This framework is based on analyzing properties of an underlying generic stochastic process, in particular by deriving a bound on the expected stopping time of this process. We utilize this framework to analyze the bounds on expected global convergence rates of a stochastic variant of a traditional trust region method, introduced in \cite{ChenMenickellyScheinberg2014}. While traditional trust region methods rely on exact computations of the gradient, Hessian and values of the objective function, this method assumes that these values are available up to some dynamically adjusted accuracy. Moreover, this accuracy is assumed to hold only with some sufficiently large, but fixed, probability, without any additional restrictions on the variance of the errors. This setting applies, for example, to standard stochastic optimization and machine learning formulations. Improving upon the analysis in \cite{ChenMenickellyScheinberg2014}, we show that the stochastic process defined by the algorithm satisfies the assumptions of our proposed general framework, with the stopping time defined as reaching accuracy $\|\nabla f(x)\|\leq \epsilon$. The resulting bound for this stopping time is $O(\epsilon^{-2})$, under the assumption of sufficiently accurate stochastic gradient, and is the first global complexity bound for a stochastic trust-region method. Finally, we apply the same framework to derive a second-order complexity bound under some additional assumptions.
Submitted 19 October, 2018; v1 submitted 23 September, 2016;
originally announced September 2016.
-
Stochastic Optimization Using a Trust-Region Method and Random Models
Authors:
Ruobing Chen,
Matt Menickelly,
Katya Scheinberg
Abstract:
In this paper, we propose and analyze a trust-region model-based algorithm for solving unconstrained stochastic optimization problems. Our framework utilizes random models of an objective function $f(x)$, obtained from stochastic observations of the function or its gradient. Our method also utilizes estimates of function values to gauge progress that is being made. The convergence analysis relies on requirements that these models and these estimates are sufficiently accurate with sufficiently high, but fixed, probability. Beyond these conditions, no assumptions are made on how these models and estimates are generated. Under these general conditions we show an almost sure global convergence of the method to a first order stationary point. In the second part of the paper, we present examples of generating sufficiently accurate random models under biased or unbiased noise assumptions. Lastly, we present some computational results showing the benefits of the proposed method compared to existing approaches that are based on sample averaging or stochastic gradients.
Submitted 23 September, 2016; v1 submitted 16 April, 2015;
originally announced April 2015.
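The abstract above describes a trust-region loop in which a model proposes a step and function-value estimates gauge progress. The following is a minimal sketch of that skeleton under stated assumptions, not the authors' algorithm. The name `trust_region_minimize`, the oracles `f_est` and `g_est`, and the constants `eta` and `gamma` are all hypothetical, and a linear model with a full-radius step stands in for whatever random models the paper constructs. In a stochastic setting the two oracles would return noisy estimates; they are left generic here.

```python
import math


def trust_region_minimize(f_est, g_est, x0, delta0=1.0, eta=0.1, gamma=2.0,
                          max_iter=200):
    """Basic trust-region loop driven by value and gradient estimates.

    f_est(x): estimate of the objective at x (exact or noisy).
    g_est(x): estimate of the gradient at x, defining a linear model.
    """
    x = list(x0)
    delta = delta0
    for _ in range(max_iter):
        g = g_est(x)
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm == 0.0:
            break  # the model offers no descent direction
        # Minimizer of the linear model over the ball of radius delta.
        step = [-delta * gi / g_norm for gi in g]
        trial = [xi + si for xi, si in zip(x, step)]
        predicted = delta * g_norm          # model decrease m(0) - m(step)
        actual = f_est(x) - f_est(trial)    # estimated true decrease
        rho = actual / predicted
        if rho >= eta:
            x = trial        # successful step: accept and expand the radius
            delta *= gamma
        else:
            delta /= gamma   # unsuccessful step: reject and shrink the radius
    return x
```

The ratio `rho` compares the estimated decrease with the decrease the model predicted; the radius grows when the model is trustworthy and shrinks otherwise. That is the mechanism on which the accuracy requirements for the models and estimates act.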