-
Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter
Authors:
M. Aamir,
B. Acar,
G. Adamov,
T. Adams,
C. Adloff,
S. Afanasiev,
C. Agrawal,
C. Agrawal,
A. Ahmad,
H. A. Ahmed,
S. Akbar,
N. Akchurin,
B. Akgul,
B. Akgun,
R. O. Akpinar,
E. Aktas,
A. AlKadhim,
V. Alexakhin,
J. Alimena,
J. Alison,
A. Alpana,
W. Alshehri,
P. Alvarez Dominguez,
M. Alyari,
C. Amendola
, et al. (550 additional authors not shown)
Abstract:
A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and it makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
$^{171}$Yb$^+$ optical clock with $2.2\times 10^{-18}$ systematic uncertainty and absolute frequency measurements
Authors:
Alexandra Tofful,
Charles F. A. Baynham,
E. Anne Curtis,
Adam O. Parsons,
Billy I. Robertson,
Marco Schioppo,
Jacob Tunesi,
Helen S. Margolis,
Richard J. Hendricks,
Josh Whale,
Richard C. Thompson,
Rachel M. Godun
Abstract:
A full evaluation of the uncertainty budget for the ytterbium ion optical clock at the National Physical Laboratory (NPL) was performed on the electric octupole (E3) $^2\mathrm{S}_{1/2}\,\rightarrow\, ^2\mathrm{F}_{7/2}$ transition. The total systematic frequency shift was measured with a fractional standard systematic uncertainty of $2.2\times 10^{-18}$. Furthermore, the absolute frequency of the E3 transition of the $^{171}$Yb$^+$ ion was measured between 2019 and 2023 via a link to International Atomic Time (TAI) and against the local caesium fountain NPL-CsF2. The absolute frequencies were measured with fractional standard uncertainties between $3.7 \times 10^{-16}$ and $1.1 \times 10^{-15}$, and all were in agreement with the 2021 BIPM recommended frequency.
Submitted 21 March, 2024;
originally announced March 2024.
-
Almost-sure convergence of iterates and multipliers in stochastic sequential quadratic optimization
Authors:
Frank E. Curtis,
Xin Jiang,
Qi Wang
Abstract:
Stochastic sequential quadratic optimization (SQP) methods for solving continuous optimization problems with nonlinear equality constraints have attracted attention recently, such as for solving large-scale data-fitting problems subject to nonconvex constraints. However, for a recently proposed subclass of such methods that is built on the popular stochastic-gradient methodology from the unconstrained setting, convergence guarantees have been limited to the asymptotic convergence of the expected value of a stationarity measure to zero. This is in contrast to the unconstrained setting in which almost-sure convergence guarantees (of the gradient of the objective to zero) can be proved for stochastic-gradient-based methods. In this paper, new almost-sure convergence guarantees for the primal iterates, Lagrange multipliers, and stationarity measures generated by a stochastic SQP algorithm in this subclass of methods are proved. It is shown that the error in the Lagrange multipliers can be bounded by the distance of the primal iterate to a primal stationary point plus the error in the latest stochastic gradient estimate. It is further shown that, subject to certain assumptions, this latter error can be made to vanish by employing a running average of the Lagrange multipliers that are computed during the run of the algorithm. The results of numerical experiments are provided to demonstrate the proved theoretical guarantees.
Submitted 7 August, 2023;
originally announced August 2023.
-
A Stochastic-Gradient-based Interior-Point Algorithm for Solving Smooth Bound-Constrained Optimization Problems
Authors:
Frank E. Curtis,
Vyacheslav Kungurtsev,
Daniel P. Robinson,
Qi Wang
Abstract:
A stochastic-gradient-based interior-point algorithm for minimizing a continuously differentiable objective function (that may be nonconvex) subject to bound constraints is presented, analyzed, and demonstrated through experimental results. The algorithm differs from other interior-point methods for solving smooth nonconvex optimization problems in that the search directions are computed using stochastic gradient estimates. It is also unique in its use of inner neighborhoods of the feasible region -- defined by a positive and vanishing neighborhood-parameter sequence -- in which the iterates are forced to remain. It is shown that with a careful balance between the barrier, step-size, and neighborhood sequences, the proposed algorithm satisfies convergence guarantees in both deterministic and stochastic settings. The results of numerical experiments show that in both settings the algorithm can outperform projection-based methods.
Submitted 13 March, 2024; v1 submitted 28 April, 2023;
originally announced April 2023.
-
Sequential Quadratic Optimization for Stochastic Optimization with Deterministic Nonlinear Inequality and Equality Constraints
Authors:
Frank E. Curtis,
Daniel P. Robinson,
Baoyu Zhou
Abstract:
A sequential quadratic optimization algorithm for minimizing an objective function defined by an expectation subject to nonlinear inequality and equality constraints is proposed, analyzed, and tested. The context of interest is when it is tractable to evaluate constraint function and derivative values in each iteration, but it is intractable to evaluate the objective function or its derivatives in any iteration, and instead an algorithm can only make use of stochastic objective gradient estimates. Under loose assumptions, including that the gradient estimates are unbiased, the algorithm is proved to possess convergence guarantees in expectation. The results of numerical experiments are presented to demonstrate that the proposed algorithm can outperform an alternative approach that relies on the ability to compute more accurate gradient estimates.
Submitted 28 February, 2023;
originally announced February 2023.
-
A Variance-Reduced and Stabilized Proximal Stochastic Gradient Method with Support Identification Guarantees for Structured Optimization
Authors:
Yutong Dai,
Guanyi Wang,
Frank E. Curtis,
Daniel P. Robinson
Abstract:
This paper introduces a new proximal stochastic gradient method with variance reduction and stabilization for minimizing the sum of a convex stochastic function and a group sparsity-inducing regularization function. Since the method may be viewed as a stabilized version of the recently proposed algorithm PStorm, we call our algorithm S-PStorm. Our analysis shows that S-PStorm has strong convergence results. In particular, we prove an upper bound on the number of iterations required by S-PStorm before its iterates correctly identify (with high probability) an optimal support (i.e., the zero and nonzero structure of an optimal solution). Most algorithms in the literature with such a support identification property use variance reduction techniques that require either periodically evaluating an exact gradient or storing a history of stochastic gradients. Unlike these methods, S-PStorm achieves variance reduction without requiring either of these, which is advantageous. Moreover, our support-identification result for S-PStorm shows that, with high probability, an optimal support will be identified correctly in all iterations with the index above a threshold. We believe that this type of result is new to the literature since the few existing other results prove that the optimal support is identified with high probability at each iteration with a sufficiently large index (meaning that the optimal support might be identified in some iterations, but not in others). Numerical experiments on regularized logistic loss problems show that S-PStorm outperforms existing methods in various metrics that measure how efficiently and robustly iterates of an algorithm identify an optimal support.
Submitted 13 February, 2023;
originally announced February 2023.
-
Analysis of atomic-clock data to constrain variations of fundamental constants
Authors:
Nathaniel Sherrill,
Adam O. Parsons,
Charles F. A. Baynham,
William Bowden,
E. Anne Curtis,
Richard Hendricks,
Ian R. Hill,
Richard Hobson,
Helen S. Margolis,
Billy I. Robertson,
Marco Schioppo,
Krzysztof Szymaniec,
Alexandra Tofful,
Jacob Tunesi,
Rachel M. Godun,
Xavier Calmet
Abstract:
We present a new framework to study the time variation of fundamental constants in a model-independent way. Model independence implies more free parameters than assumed in previous studies. Using data from atomic clocks based on $^{87}$Sr, $^{171}$Yb$^+$ and $^{133}$Cs, we set bounds on parameters controlling the variation of the fine-structure constant, $α$, and the electron-to-proton mass ratio, $μ$. We consider variations on timescales ranging from a minute to almost a day. In addition, we use our results to derive some of the tightest limits to date on the parameter space of models of ultralight dark matter and axion-like particles.
Submitted 15 December, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Incremental Quasi-Newton Algorithms for Solving Nonconvex, Nonsmooth, Finite-Sum Optimization Problems
Authors:
Gulcin Dinc Yalcin,
Frank E. Curtis
Abstract:
Algorithms for solving nonconvex, nonsmooth, finite-sum optimization problems are proposed and tested. In particular, the algorithms are proposed and tested in the context of an optimization problem formulation arising in semi-supervised machine learning. The common feature of all algorithms is that they employ an incremental quasi-Newton (IQN) strategy, specifically an incremental BFGS (IBFGS) strategy. One applies an IBFGS strategy to the problem directly, whereas the others apply an IBFGS strategy to a difference-of-convex reformulation, smoothed approximation, or (strongly) convex local approximation. Experiments show that all IBFGS approaches fare well in practice, and all outperform a state-of-the-art bundle method.
Submitted 20 July, 2022;
originally announced July 2022.
-
An inexact column-and-constraint generation method to solve two-stage robust optimization problems
Authors:
Man Yiu Tsang,
Karmel S. Shehadeh,
Frank E. Curtis
Abstract:
We propose a new inexact column-and-constraint generation (i-C&CG) method to solve two-stage robust optimization problems. The method allows solutions to the master problems to be inexact, which is desirable when solving large-scale and/or challenging problems. It is equipped with a backtracking routine that controls the trade-off between bound improvement and inexactness. Importantly, this routine allows us to derive theoretical finite convergence guarantees for our i-C&CG method. Numerical experiments demonstrate computational advantages of our i-C&CG method over state-of-the-art column-and-constraint generation methods.
Submitted 5 November, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Recent Developments in Security-Constrained AC Optimal Power Flow: Overview of Challenge 1 in the ARPA-E Grid Optimization Competition
Authors:
Ignacio Aravena,
Daniel K. Molzahn,
Shixuan Zhang,
Cosmin G. Petra,
Frank E. Curtis,
Shenyinying Tu,
Andreas Wächter,
Ermin Wei,
Elizabeth Wong,
Amin Gholami,
Kaizhao Sun,
Xu Andy Sun,
Stephen T. Elbert,
Jesse T. Holzer,
Arun Veeramany
Abstract:
The optimal power flow problem is central to many tasks in the design and operation of electric power grids. This problem seeks the minimum cost operating point for an electric power grid while satisfying both engineering requirements and physical laws describing how power flows through the electric network. By additionally considering the possibility of component failures and using an accurate AC power flow model of the electric network, the security-constrained AC optimal power flow (SC-AC-OPF) problem is of paramount practical relevance. To assess recent progress in solution algorithms for SC-AC-OPF problems and spur new innovations, the U.S. Department of Energy's Advanced Research Projects Agency--Energy (ARPA-E) organized Challenge 1 of the Grid Optimization (GO) competition. This paper describes the SC-AC-OPF problem formulation used in the competition, overviews historical developments and the state of the art in SC-AC-OPF algorithms, discusses the competition, and summarizes the algorithms used by the top three teams in Challenge 1 of the GO Competition (Teams gollnlp, GO-SNIP, and GMI-GO).
Submitted 15 June, 2022;
originally announced June 2022.
-
Stochastic Optimization Approaches for an Operating Room and Anesthesiologist Scheduling Problem
Authors:
Man Yiu Tsang,
Karmel S. Shehadeh,
Frank E. Curtis,
Beth Hochman,
Tricia E. Brentjens
Abstract:
We propose combined allocation, assignment, sequencing, and scheduling problems under uncertainty involving multiple operating rooms (ORs), anesthesiologists, and surgeries, as well as methodologies for solving such problems. Specifically, given sets of ORs, regular anesthesiologists, on-call anesthesiologists, and surgeries, our methodologies solve the following decision-making problems simultaneously: (1) an allocation problem that decides which ORs to open and which on-call anesthesiologists to call in, (2) an assignment problem that assigns an OR and an anesthesiologist to each surgery, and (3) a sequencing and scheduling problem that determines the order of surgeries and their scheduled start times in each OR. To address uncertainty of each surgery's duration, we propose and analyze stochastic programming (SP) and distributionally robust optimization (DRO) models with both risk-neutral and risk-averse objectives. We obtain near-optimal solutions of our SP models using sample average approximation and propose a computationally efficient column-and-constraint generation method to solve our DRO models. In addition, we derive symmetry-breaking constraints that improve the models' solvability. Using real-world, publicly available surgery data and a case study from a health system in New York, we conduct extensive computational experiments comparing the proposed methodologies empirically and theoretically, demonstrating where significant performance improvements can be gained. Additionally, we derive several managerial insights relevant to practice.
Submitted 12 January, 2024; v1 submitted 24 April, 2022;
originally announced April 2022.
-
Worst-Case Complexity of TRACE with Inexact Subproblem Solutions for Nonconvex Smooth Optimization
Authors:
Frank E. Curtis,
Qi Wang
Abstract:
An algorithm for solving nonconvex smooth optimization problems is proposed, analyzed, and tested. The algorithm is an extension of the Trust Region Algorithm with Contractions and Expansions (TRACE) [Math. Prog. 162(1):1-32, 2017]. In particular, the extension allows the algorithm to use inexact solutions of the arising subproblems, which is an important feature for solving large-scale problems. Inexactness is allowed in a manner such that the optimal iteration complexity of ${\cal O}(ε^{-3/2})$ for attaining an $ε$-approximate first-order stationary point is maintained while the worst-case complexity in terms of Hessian-vector products may be significantly improved as compared to the original TRACE. Numerical experiments show the benefits of allowing inexact subproblem solutions and that the algorithm compares favorably to a state-of-the-art technique.
Submitted 24 April, 2022;
originally announced April 2022.
-
Derivative-Free Bound-Constrained Optimization for Solving Structured Problems with Surrogate Models
Authors:
Frank E. Curtis,
Shima Dezfulian,
Andreas Wächter
Abstract:
We propose and analyze a model-based derivative-free optimization (DFO) algorithm for solving bound-constrained optimization problems where the objective function is the composition of a smooth function and a vector of black-box functions. We assume that the black-box functions are smooth and that their evaluation is the computational bottleneck of the algorithm. The distinguishing feature of our algorithm is the use of approximate function values at interpolation points which can be obtained by an application-specific surrogate model that is cheap to evaluate. As an example, we consider the situation in which a sequence of related optimization problems is solved and present a regression-based approximation scheme that uses function values that were evaluated when solving prior problem instances. In addition, we propose and analyze a new algorithm for obtaining interpolation points that handles unrelaxable bound constraints. Our numerical results show that our algorithm outperforms a state-of-the-art DFO algorithm for solving a least-squares problem from a chemical engineering application when a history of black-box function evaluations is available.
Submitted 2 January, 2024; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Worst-Case Complexity of an SQP Method for Nonlinear Equality Constrained Stochastic Optimization
Authors:
Frank E. Curtis,
Michael J. O'Neill,
Daniel P. Robinson
Abstract:
A worst-case complexity bound is proved for a sequential quadratic optimization (commonly known as SQP) algorithm that has been designed for solving optimization problems involving a stochastic objective function and deterministic nonlinear equality constraints. Barring additional terms that arise due to the adaptivity of the monotonically nonincreasing merit parameter sequence, the proved complexity bound is comparable to that known for the stochastic gradient algorithm for unconstrained nonconvex optimization. The overall complexity bound, which accounts for the adaptivity of the merit parameter sequence, shows that a result comparable to the unconstrained setting (with additional logarithmic factors) holds with high probability.
Submitted 6 January, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Measuring the stability of fundamental constants with a network of clocks
Authors:
G. Barontini,
L. Blackburn,
V. Boyer,
F. Butuc-Mayer,
X. Calmet,
J. R. Crespo Lopez-Urrutia,
E. A. Curtis,
B. Darquie,
J. Dunningham,
N. J. Fitch,
E. M. Forgan,
K. Georgiou,
P. Gill,
R. M. Godun,
J. Goldwin,
V. Guarrera,
A. C. Harwood,
I. R. Hill,
R. J. Hendricks,
M. Jeong,
M. Y. H. Johnson,
M. Keller,
L. P. Kozhiparambil Sajith,
F. Kuipers,
H. S. Margolis
, et al. (19 additional authors not shown)
Abstract:
The detection of variations of fundamental constants of the Standard Model would provide us with compelling evidence of new physics, and could lift the veil on the nature of dark matter and dark energy. In this work, we discuss how a network of atomic and molecular clocks can be used to look for such variations with unprecedented sensitivity over a wide range of time scales. This is precisely the goal of the recently launched QSNET project: A network of clocks for measuring the stability of fundamental constants. QSNET will include state-of-the-art atomic clocks, but will also develop next-generation molecular and highly charged ion clocks with enhanced sensitivity to variations of fundamental constants. We describe the technological and scientific aims of QSNET and evaluate its expected performance. We show that in the range of parameters probed by QSNET, either we will discover new physics, or we will impose new constraints on violations of fundamental symmetries and a range of theories beyond the Standard Model, including dark matter and dark energy models.
Submitted 11 May, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
A Decomposition Algorithm for Large-Scale Security-Constrained AC Optimal Power Flow
Authors:
Frank E. Curtis,
Daniel K. Molzahn,
Shenyinying Tu,
Andreas Wächter,
Ermin Wei,
Elizabeth Wong
Abstract:
A decomposition algorithm for solving large-scale security-constrained AC optimal power flow problems is presented. The formulation considered is the one used in the ARPA-E Grid Optimization (GO) Competition, Challenge 1, held from November 2018 through October 2019. The techniques found to be most effective in terms of performance in the challenge are presented, including strategies for contingency selection, fast contingency evaluation, handling complementarity constraints, avoiding issues related to degeneracy, and exploiting parallelism. The results of numerical experiments are provided to demonstrate the effectiveness of the proposed techniques as compared to alternative strategies.
Submitted 4 October, 2021;
originally announced October 2021.
-
Inexact Sequential Quadratic Optimization for Minimizing a Stochastic Objective Function Subject to Deterministic Nonlinear Equality Constraints
Authors:
Frank E. Curtis,
Daniel P. Robinson,
Baoyu Zhou
Abstract:
An algorithm is proposed, analyzed, and tested experimentally for solving stochastic optimization problems in which the decision variables are constrained to satisfy equations defined by deterministic, smooth, and nonlinear functions. It is assumed that constraint function and derivative values can be computed, but that only stochastic approximations are available for the objective function and its derivatives. The algorithm is of the sequential quadratic optimization variety. A distinguishing feature of the algorithm is that it allows inexact subproblem solutions to be employed, which is particularly useful in large-scale settings when the matrices defining the subproblems are too large to form and/or factorize. Conditions are imposed on the inexact subproblem solutions that account for the fact that only stochastic objective gradient estimates are available. Convergence results in expectation are established for the method. Numerical experiments show that it outperforms an alternative algorithm that employs highly accurate subproblem solutions in every iteration.
Submitted 7 July, 2021;
originally announced July 2021.
-
A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear Equality Constrained Optimization with Rank-Deficient Jacobians
Authors:
Albert S. Berahas,
Frank E. Curtis,
Michael J. O'Neill,
Daniel P. Robinson
Abstract:
A sequential quadratic optimization algorithm is proposed for solving smooth nonlinear equality constrained optimization problems in which the objective function is defined by an expectation of a stochastic function. The algorithmic structure of the proposed method is based on a step decomposition strategy that is known in the literature to be widely effective in practice, wherein each search direction is computed as the sum of a normal step (toward linearized feasibility) and a tangential step (toward objective decrease in the null space of the constraint Jacobian). However, the proposed method differs from others in the literature in that it both allows the use of stochastic objective gradient estimates and possesses convergence guarantees even in the setting in which the constraint Jacobians may be rank deficient. The results of numerical experiments demonstrate that the algorithm offers superior performance when compared to popular alternatives.
Submitted 16 March, 2023; v1 submitted 24 June, 2021;
originally announced June 2021.
-
Practical Optimal Control of a Wave-Energy Converter in Regular Wave Environments
Authors:
Mertcan Yetkin,
Sudharsan Kalidoss,
Frank E. Curtis,
Lawrence V. Snyder,
Arindam Banerjee
Abstract:
A generic formulation for the optimal control of a single wave-energy converter (WEC) is proposed. The formulation involves hard and soft constraints on the motion of the WEC to promote reduced damage and fatigue to the device during operation. Most of the WEC control literature ignores the cost of the control and could therefore result in generating less power than expected, or even negative power. Therefore, to ensure actual power gains in practice, we incorporate a penalty term in the objective function to approximate the cost of applying the control force. A discretization of the resulting optimal control problem is a quadratic optimization problem that can be solved efficiently using state-of-the-art solvers. Using hydrodynamic coefficients estimated by simulations made in WEC-Sim, numerical illustrations are provided of the trade-off between careful operation of the device and power generated. Finally, a demonstration of the real-time use of the approach is provided.
Submitted 14 December, 2020;
originally announced December 2020.
-
A Subspace Acceleration Method for Minimization Involving a Group Sparsity-Inducing Regularizer
Authors:
Frank E. Curtis,
Yutong Dai,
Daniel P. Robinson
Abstract:
We consider the problem of minimizing an objective function that is the sum of a convex function and a group sparsity-inducing regularizer. Problems that integrate such regularizers arise in modern machine learning applications, often for the purpose of obtaining models that are easier to interpret and that have higher predictive accuracy. We present a new method for solving such problems that utilizes subspace acceleration, domain decomposition, and support identification. Our analysis shows, under common assumptions, that the iterate sequence generated by our framework is globally convergent, converges to an $ε$-approximate solution in at most $O(ε^{-(1+p)})$ (respectively, $O(ε^{-(2+p)})$) iterations for all $ε$ bounded above and large enough (respectively, all $ε$ bounded above) where $p > 0$ is an algorithm parameter, and exhibits superlinear local convergence. Preliminary numerical results for the task of binary classification based on regularized logistic regression show that our approach is efficient and robust, with the ability to outperform a state-of-the-art method.
Submitted 29 July, 2020;
originally announced July 2020.
-
Sequential Quadratic Optimization for Nonlinear Equality Constrained Stochastic Optimization
Authors:
Albert Berahas,
Frank E. Curtis,
Daniel P. Robinson,
Baoyu Zhou
Abstract:
Sequential quadratic optimization algorithms are proposed for solving smooth nonlinear optimization problems with equality constraints. The main focus is an algorithm proposed for the case when the constraint functions are deterministic, and constraint function and derivative values can be computed explicitly, but the objective function is stochastic. It is assumed in this setting that it is intractable to compute objective function and derivative values explicitly, although one can compute stochastic function and gradient estimates. As a starting point for this stochastic setting, an algorithm is proposed for the deterministic setting that is modeled after a state-of-the-art line-search SQP algorithm, but uses a stepsize selection scheme based on Lipschitz constants (or adaptively estimated Lipschitz constants) in place of the line search. This sets the stage for the proposed algorithm for the stochastic setting, for which it is assumed that line searches would be intractable. Under reasonable assumptions, convergence (resp., convergence in expectation) from remote starting points is proved for the proposed deterministic (resp., stochastic) algorithm. The results of numerical experiments demonstrate the practical performance of our proposed techniques.
Submitted 20 July, 2020;
originally announced July 2020.
-
Gradient Sampling Methods with Inexact Subproblem Solutions and Gradient Aggregation
Authors:
Frank E. Curtis,
Minhan Li
Abstract:
Gradient sampling (GS) has proved to be an effective methodology for the minimization of objective functions that may be nonconvex and/or nonsmooth. The most computationally expensive component of a contemporary GS method is the need to solve a convex quadratic subproblem in each iteration. In this paper, a strategy is proposed that allows the use of inexact solutions of these subproblems, which, as proved in the paper, can be incorporated without the loss of theoretical convergence guarantees. Numerical experiments show that by exploiting inexact subproblem solutions, one can consistently reduce the computational effort required by a GS method. Additionally, a strategy is proposed for aggregating gradient information after a subproblem is solved (potentially inexactly), as has been exploited in bundle methods for nonsmooth optimization. It is proved that the aggregation scheme can be introduced without the loss of theoretical convergence guarantees. Numerical experiments show that incorporating this gradient aggregation approach substantially reduces the computational effort required by a GS method.
Submitted 6 August, 2021; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Adaptive Stochastic Optimization
Authors:
Frank E. Curtis,
Katya Scheinberg
Abstract:
Optimization lies at the heart of machine learning and signal processing. Contemporary approaches based on the stochastic gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application. This article summarizes recent research and motivates future work on adaptive stochastic optimization methods, which have the potential to offer significant computational savings when training large-scale systems.
Submitted 18 January, 2020;
originally announced January 2020.
-
Trust-Region Newton-CG with Strong Second-Order Complexity Guarantees for Nonconvex Optimization
Authors:
Frank E. Curtis,
Daniel P. Robinson,
Clément Royer,
Stephen J. Wright
Abstract:
Worst-case complexity guarantees for nonconvex optimization algorithms have been a topic of growing interest. Multiple frameworks that achieve the best known complexity bounds among a broad class of first- and second-order strategies have been proposed. These methods have often been designed primarily with complexity guarantees in mind and, as a result, represent a departure from the algorithms that have proved to be the most effective in practice. In this paper, we consider trust-region Newton methods, one of the most popular classes of algorithms for solving nonconvex optimization problems. By introducing slight modifications to the original scheme, we obtain two methods -- one based on exact subproblem solves and one exploiting inexact subproblem solves as in the popular "trust-region Newton-Conjugate-Gradient" (trust-region Newton-CG) method -- with iteration and operation complexity bounds that match the best known bounds for the aforementioned class of first- and second-order methods. The resulting trust-region Newton-CG method also retains the attractive practical behavior of classical trust-region Newton-CG, which we demonstrate with numerical comparisons on a standard benchmark test set.
Submitted 19 November, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
A Fully Stochastic Second-Order Trust Region Method
Authors:
Frank E. Curtis,
Rui Shi
Abstract:
A stochastic second-order trust region method is proposed, which can be viewed as a second-order extension of the trust-region-ish (TRish) algorithm proposed by Curtis et al. (INFORMS J. Optim. 1(3) 200-220, 2019). In each iteration, a search direction is computed by (approximately) solving a trust region subproblem defined by stochastic gradient and Hessian estimates. The algorithm has convergence guarantees for stochastic minimization in the fully stochastic regime, meaning that guarantees hold when each stochastic gradient is required merely to be an unbiased estimate of the true gradient with bounded variance and when the stochastic Hessian estimates are bounded uniformly in norm. The algorithm is also equipped with a worst-case complexity guarantee in the nearly deterministic regime, i.e., when the stochastic gradient and Hessian estimates are very close in expectation to the true gradients and Hessians. The results of numerical experiments for training convolutional neural networks for image classification and training a recurrent neural network for time series forecasting are presented. These results show that the algorithm can outperform a stochastic gradient approach and the first-order TRish algorithm in practice.
Submitted 15 November, 2019;
originally announced November 2019.
-
Search for transient variations of the fine structure constant and dark matter using fiber-linked optical atomic clocks
Authors:
B. M. Roberts,
P. Delva,
A. Al-Masoudi,
A. Amy-Klein,
C. Bærentsen,
C. F. A. Baynham,
E. Benkler,
S. Bilicki,
S. Bize,
W. Bowden,
J. Calvert,
V. Cambier,
E. Cantin,
E. A. Curtis,
S. Dörscher,
M. Favier,
F. Frank,
P. Gill,
R. M. Godun,
G. Grosche,
C. Guo,
A. Hees,
I. R. Hill,
R. Hobson,
N. Huntemann
, et al. (29 additional authors not shown)
Abstract:
We search for transient variations of the fine structure constant using data from a European network of fiber-linked optical atomic clocks. By searching for coherent variations in the recorded clock frequency comparisons across the network, we significantly improve the constraints on transient variations of the fine structure constant. For example, we constrain the variation in alpha to <5*10^-17 for transients of duration 10^3 s. This analysis also presents a possibility to search for dark matter, the mysterious substance hypothesised to explain galaxy dynamics and other astrophysical phenomena that is thought to dominate the matter density of the universe. At the current sensitivity level, we find no evidence for dark matter in the form of topological defects (or, more generally, any macroscopic objects), and we thus place constraints on certain potential couplings between the dark matter and standard model particles, substantially improving upon the existing constraints, particularly for large (>~10^4 km) objects.
Submitted 8 July, 2019; v1 submitted 4 July, 2019;
originally announced July 2019.
-
Limited-Memory BFGS with Displacement Aggregation
Authors:
Albert S. Berahas,
Frank E. Curtis,
Baoyu Zhou
Abstract:
A displacement aggregation strategy is proposed for the curvature pairs stored in a limited-memory BFGS (a.k.a. L-BFGS) method such that the resulting (inverse) Hessian approximations are equal to those that would be derived from a full-memory BFGS method. This means that, if a sufficiently large number of pairs are stored, then an optimization algorithm employing the limited-memory method can achieve the same theoretical convergence properties as when full-memory (inverse) Hessian approximations are stored and employed, such as a local superlinear rate of convergence under assumptions that are common for attaining such guarantees. To the best of our knowledge, this is the first work in which a local superlinear convergence rate guarantee is offered by a quasi-Newton scheme that does not either store all curvature pairs throughout the entire run of the optimization algorithm or store an explicit (inverse) Hessian approximation. Numerical results are presented to show that displacement aggregation within an adaptive L-BFGS scheme can lead to better performance than standard L-BFGS.
Submitted 25 August, 2020; v1 submitted 8 March, 2019;
originally announced March 2019.
-
Gradient Sampling Methods for Nonsmooth Optimization
Authors:
James V. Burke,
Frank E. Curtis,
Adrian S. Lewis,
Michael L. Overton,
Lucas E. A. Simões
Abstract:
This paper reviews the gradient sampling methodology for solving nonsmooth, nonconvex optimization problems. An intuitively straightforward gradient sampling algorithm is stated and its convergence properties are summarized. Throughout this discussion, we emphasize the simplicity of gradient sampling as an extension of the steepest descent method for minimizing smooth objectives. We then provide overviews of various enhancements that have been proposed to improve practical performance, as well as of several extensions that have been made in the literature, such as to solve constrained problems. The paper also includes clarification of certain technical aspects of the analysis of gradient sampling algorithms, most notably related to the assumptions one needs to make about the set of points at which the objective is continuously differentiable. Finally, we discuss possible future research directions.
Submitted 29 April, 2018;
originally announced April 2018.
-
Inexact Sequential Quadratic Optimization with Penalty Parameter Updates Within the QP Solve: Extended Version
Authors:
James V. Burke,
Frank E. Curtis,
Hao Wang,
Jiashan Wang
Abstract:
This paper focuses on the design of sequential quadratic optimization (commonly known as SQP) methods for solving large-scale nonlinear optimization problems. The most computationally demanding aspect of such an approach is the computation of the search direction during each iteration, for which we consider the use of matrix-free methods. In particular, we develop a method that requires an inexact solve of a single QP subproblem to establish the convergence of the overall SQP method. It is known that SQP methods can be plagued by poor behavior of the global convergence mechanism. To confront this issue, we propose the use of an exact penalty function with a dynamic penalty parameter updating strategy to be employed within the subproblem solver in such a way that the resulting search direction predicts progress toward both feasibility and optimality. We present our parameter updating strategy and prove that, under reasonable assumptions, the strategy does not modify the penalty parameter unnecessarily. We also discuss a matrix-free subproblem solver in which our updating strategy can be incorporated. We close the paper with a discussion of the results of numerical experiments that illustrate the benefits of our proposed techniques.
Submitted 26 February, 2020; v1 submitted 25 March, 2018;
originally announced March 2018.
-
ADMM for Multiaffine Constrained Optimization
Authors:
Wenbo Gao,
Donald Goldfarb,
Frank E. Curtis
Abstract:
We expand the scope of the alternating direction method of multipliers (ADMM). Specifically, we show that ADMM, when employed to solve problems with multiaffine constraints that satisfy certain verifiable assumptions, converges to the set of constrained stationary points if the penalty parameter in the augmented Lagrangian is sufficiently large. When the Kurdyka-Łojasiewicz (K-Ł) property holds, this is strengthened to convergence to a single constrained stationary point. Our analysis applies under assumptions that we have endeavored to make as weak as possible. It applies to problems that involve nonconvex and/or nonsmooth objective terms, in addition to the multiaffine constraints that can involve multiple (three or more) blocks of variables. To illustrate the applicability of our results, we describe examples including nonnegative matrix factorization, sparse learning, risk parity portfolio selection, nonconvex formulations of convex problems, and neural network training. In each case, our ADMM approach encounters only subproblems that have closed-form solutions.
Submitted 22 October, 2019; v1 submitted 26 February, 2018;
originally announced February 2018.
-
Concise Complexity Analyses for Trust-Region Methods
Authors:
Frank E. Curtis,
Zachary Lubberts,
Daniel P. Robinson
Abstract:
Concise complexity analyses are presented for simple trust region algorithms for solving unconstrained optimization problems. In contrast to a traditional trust region algorithm, the algorithms considered in this paper require certain control over the choice of trust region radius after any successful iteration. The analyses highlight the essential algorithm components required to obtain certain complexity bounds. In addition, a new update strategy for the trust region radius is proposed that offers a second-order complexity bound.
Submitted 21 February, 2018;
originally announced February 2018.
-
Regional Complexity Analysis of Algorithms for Nonconvex Smooth Optimization
Authors:
Frank E. Curtis,
Daniel P. Robinson
Abstract:
A strategy is proposed for characterizing the worst-case performance of algorithms for solving nonconvex smooth optimization problems. Contemporary analyses characterize worst-case performance by providing, under certain assumptions on an objective function, an upper bound on the number of iterations (or function or derivative evaluations) required until a pth-order stationarity condition is approximately satisfied. This arguably leads to conservative characterizations based on anomalous objectives rather than on ones that are typically encountered in practice. By contrast, the strategy proposed in this paper characterizes worst-case performance separately over regions comprising a search space. These regions are defined generically based on properties of derivative values. In this manner, one can analyze the worst-case performance of an algorithm independently from any particular class of objectives. Then, once given a class of objectives, one can obtain an informative, fine-tuned complexity analysis merely by delineating the types of regions that comprise the search spaces for functions in the class. Regions defined by first- and second-order derivatives are discussed in detail and example complexity analyses are provided for a few fundamental first- and second-order algorithms when employed to minimize convex and nonconvex objectives of interest. It is also explained how the strategy can be generalized to regions defined by higher-order derivatives and for analyzing the behavior of higher-order algorithms.
Submitted 24 August, 2018; v1 submitted 3 February, 2018;
originally announced February 2018.
-
Measurement of differential polarizabilities at a mid-infrared wavelength in $^{171}\mathrm{Yb}^+$
Authors:
C F A Baynham,
E A Curtis,
R M Godun,
J M Jones,
P B R Nisbet-Jones,
P E G Baird,
K Bongs,
P Gill,
T Fordell,
T Hieta,
T Lindvall,
M T Spidell,
J H Lehman
Abstract:
An atom exposed to an electric field will experience Stark shifts of its internal energy levels, proportional to their polarizabilities. In optical frequency metrology, the Stark shift due to background black-body radiation (BBR) modifies the frequency of the optical clock transition, and often represents a large contribution to a clock's uncertainty budget. For clocks based on singly-ionized ytterbium, the ion's complex structure makes this shift difficult to calculate theoretically. We present a measurement of the differential polarizabilities of two ultra-narrow optical clock transitions present in $^{171}\mathrm{Yb}^+$, performed by exposing the ion to an oscillating electric field at a wavelength in the region of room temperature BBR spectra. By measuring the frequency shift to the transitions caused by a laser at $λ=7.17 μm$, we obtain values for scalar and tensor differential polarizabilities with uncertainties at the percent level for both the electric quadrupole and octupole transitions at 436nm and 467nm respectively. These values agree with previously reported experimental measurements and, in the case of the electric quadrupole transition, allow a 5-fold improvement in the determination of the room-temperature BBR shift.
However, we note significant concerns over the validity of the uncertainty characterization presented and draw the reader's attention to the Note on applicability section for a discussion.
Submitted 10 September, 2020; v1 submitted 30 January, 2018;
originally announced January 2018.
-
A Stochastic Trust Region Algorithm Based on Careful Step Normalization
Authors:
Frank E. Curtis,
Katya Scheinberg,
Rui Shi
Abstract:
An algorithm is proposed for solving stochastic and finite sum minimization problems. Based on a trust region methodology, the algorithm employs normalized steps, at least as long as the norms of the stochastic gradient estimates are within a specified interval. The complete algorithm---which dynamically chooses whether or not to employ normalized steps---is proved to have convergence guarantees that are similar to those possessed by a traditional stochastic gradient approach under various sets of conditions related to the accuracy of the stochastic gradient estimates and choice of stepsize sequence. The results of numerical experiments are presented when the method is employed to minimize convex and nonconvex machine learning test problems. These results illustrate that the method can outperform a traditional stochastic gradient approach.
Submitted 26 June, 2018; v1 submitted 29 December, 2017;
originally announced December 2017.
-
An Accelerated Communication-Efficient Primal-Dual Optimization Framework for Structured Machine Learning
Authors:
Chenxin Ma,
Martin Jaggi,
Frank E. Curtis,
Nathan Srebro,
Martin Takáč
Abstract:
Distributed optimization algorithms are essential for training machine learning models on very large-scale datasets. However, they often suffer from communication bottlenecks. Confronting this issue, a communication-efficient primal-dual coordinate ascent framework (CoCoA) and its improved variant CoCoA+ have been proposed, achieving a convergence rate of $\mathcal{O}(1/t)$ for solving empirical risk minimization problems with Lipschitz continuous losses. In this paper, an accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate of $\mathcal{O}(1/t^2)$ in terms of reducing suboptimality. The analysis of this rate is also notable in that the convergence rate bounds involve constants that, except in extreme cases, are significantly reduced compared to those previously provided for CoCoA+. The results of numerical experiments are provided to show that acceleration can lead to significant performance gains.
Submitted 14 November, 2017;
originally announced November 2017.
-
A Self-Correcting Variable-Metric Algorithm Framework for Nonsmooth Optimization
Authors:
Frank E. Curtis,
Daniel P. Robinson,
Baoyu Zhou
Abstract:
An algorithm framework is proposed for minimizing nonsmooth functions. The framework is variable-metric in that, in each iteration, a step is computed using a symmetric positive definite matrix whose value is updated as in a quasi-Newton scheme. However, unlike previously proposed variable-metric algorithms for minimizing nonsmooth functions, the framework exploits self-correcting properties made possible through BFGS-type updating. In so doing, the framework does not overly restrict the manner in which the step computation matrices are updated, yet the scheme is controlled well enough that global convergence guarantees can be established. The results of numerical experiments for a few algorithms are presented to demonstrate the self-correcting behaviors that are guaranteed by the framework.
Submitted 2 February, 2019; v1 submitted 8 August, 2017;
originally announced August 2017.
-
An Inexact Regularized Newton Framework with a Worst-Case Iteration Complexity of $\mathcal{O}(ε^{-3/2})$ for Nonconvex Optimization
Authors:
Frank E. Curtis,
Daniel P. Robinson,
Mohammadreza Samadi
Abstract:
An algorithm for solving smooth nonconvex optimization problems is proposed that, in the worst case, takes $\mathcal{O}(ε^{-3/2})$ iterations to drive the norm of the gradient of the objective function below a prescribed positive real number $ε$ and can take $\mathcal{O}(ε^{-3})$ iterations to drive the leftmost eigenvalue of the Hessian of the objective above $-ε$. The proposed algorithm is a general framework that covers a wide range of techniques including quadratically and cubically regularized Newton methods, such as the Adaptive Regularisation using Cubics (ARC) method and the recently proposed Trust-Region Algorithm with Contractions and Expansions (TRACE). The generality of our method is achieved through the introduction of generic conditions that each trial step is required to satisfy, which in particular allow for inexact regularized Newton steps to be used. These conditions center around a new subproblem that can be approximately solved to obtain trial steps that satisfy the conditions. A new instance of the framework, distinct from ARC and TRACE, is described that may be viewed as a hybrid between quadratically and cubically regularized Newton methods. Numerical results demonstrate that our hybrid algorithm outperforms a cubically regularized Newton method.
Submitted 14 March, 2018; v1 submitted 1 August, 2017;
originally announced August 2017.
-
Complexity Analysis of a Trust Funnel Algorithm for Equality Constrained Optimization
Authors:
Frank E. Curtis,
Daniel P. Robinson,
Mohammadreza Samadi
Abstract:
A method is proposed for solving equality constrained nonlinear optimization problems involving twice continuously differentiable functions. The method employs a trust funnel approach consisting of two phases: a first phase to locate an $ε$-feasible point and a second phase to seek optimality while maintaining at least $ε$-feasibility. A two-phase approach of this kind based on a cubic regularization methodology was recently proposed along with a supporting worst-case iteration complexity analysis. Unfortunately, however, in that approach, the objective function is completely ignored in the first phase when $ε$-feasibility is sought. The main contribution of the method proposed in this paper is that the same worst-case iteration complexity is achieved, but with a first phase that also accounts for improvements in the objective function. As such, the method typically requires fewer iterations in the second phase, as the results of numerical experiments demonstrate.
Submitted 2 July, 2017;
originally announced July 2017.
-
Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning
Authors:
Frank E. Curtis,
Katya Scheinberg
Abstract:
The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a supervised learning problem and show how it leads to various optimization problems, depending on the context and underlying assumptions. We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks. The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance reducing stochastic methods, and second-order methods. Finally, we discuss how these approaches can be applied to the training of deep neural networks, emphasizing the difficulties that arise from the complex, nonconvex structure of these models.
Submitted 30 June, 2017;
originally announced June 2017.
-
Exploiting Negative Curvature in Deterministic and Stochastic Optimization
Authors:
Frank E. Curtis,
Daniel P. Robinson
Abstract:
This paper addresses the question of whether it can be beneficial for an optimization algorithm to follow directions of negative curvature. Although prior work has established convergence results for algorithms that integrate both descent and negative curvature steps, there has not yet been extensive numerical evidence showing that such methods offer consistent performance improvements. In this paper, we present new frameworks for combining descent and negative curvature directions: alternating two-step approaches and dynamic step approaches. The aspect that distinguishes our approaches from ones previously proposed is that they make algorithmic decisions based on (estimated) upper-bounding models of the objective function. A consequence of this aspect is that our frameworks can, in theory, employ fixed stepsizes, which makes the methods readily translatable from deterministic to stochastic settings. For deterministic problems, we show that instances of our dynamic framework yield gains in performance compared to related methods that only follow descent steps. We also show that gains can be made in a stochastic setting in cases when a standard stochastic-gradient-type method might make slow progress.
Submitted 3 April, 2018; v1 submitted 1 March, 2017;
originally announced March 2017.
-
R-Linear Convergence of Limited Memory Steepest Descent
Authors:
Frank E. Curtis,
Wei Guo
Abstract:
The limited memory steepest descent method (LMSD) proposed by Fletcher is an extension of the Barzilai-Borwein "two-point step size" strategy for steepest descent methods for solving unconstrained optimization problems. It is known that the Barzilai-Borwein strategy yields a method with an R-linear rate of convergence when it is employed to minimize a strongly convex quadratic. This paper extends this analysis to LMSD, again for strongly convex quadratics. In particular, it is shown that the method is R-linearly convergent for any choice of the history length parameter. The results of numerical experiments are provided to illustrate behaviors of the method that are revealed through the theoretical analysis.
Submitted 12 October, 2016;
originally announced October 2016.
-
Optimization Methods for Large-Scale Machine Learning
Authors:
Léon Bottou,
Frank E. Curtis,
Jorge Nocedal
Abstract:
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
Submitted 8 February, 2018; v1 submitted 15 June, 2016;
originally announced June 2016.
-
A Reduced-Space Algorithm for Minimizing $\ell_1$-Regularized Convex Functions
Authors:
Tianyi Chen,
Frank E. Curtis,
Daniel P. Robinson
Abstract:
We present a new method for minimizing the sum of a differentiable convex function and an $\ell_1$-norm regularizer. The main features of the new method include: $(i)$ an evolving set of indices corresponding to variables that are predicted to be nonzero at a solution (i.e., the support); $(ii)$ a reduced-space subproblem defined in terms of the predicted support; $(iii)$ conditions that determine how accurately each subproblem must be solved, which allow for Newton, Newton-CG, and coordinate-descent techniques to be employed; $(iv)$ a computationally practical condition that determines when the predicted support should be updated; and $(v)$ a reduced proximal gradient step that ensures sufficient decrease in the objective function when it is decided that variables should be added to the predicted support. We prove a convergence guarantee for our method and demonstrate its efficiency on a large set of model prediction problems.
Submitted 22 February, 2016;
originally announced February 2016.
-
A low maintenance Sr optical lattice clock
Authors:
Ian R. Hill,
Richard Hobson,
William Bowden,
Elizabeth M. Bridge,
Sean Donnellan,
E. Anne Curtis,
Patrick Gill
Abstract:
We describe the Sr optical lattice clock apparatus at NPL with particular emphasis on techniques used to increase reliability and minimise the human requirement in its operation. Central to this is a clock-referenced transfer cavity scheme for the stabilisation of cooling and trapping lasers. We highlight several measures to increase the reliability of the clock with a view towards the realisation of an optical time-scale. The clock contributed 502 hours of data over a 25-day period (84% uptime) in a recent measurement campaign with several uninterrupted periods of more than 48 hours. An instability of $2\times10^{-17}$ was reached after $10^5$ s of averaging in an interleaved self-comparison of the clock.
Submitted 18 February, 2016;
originally announced February 2016.
-
Primal-Dual Active-Set Methods for Isotonic Regression and Trend Filtering
Authors:
Zheng Han,
Frank E. Curtis
Abstract:
Isotonic regression (IR) is a non-parametric calibration method used in supervised learning. For performing large-scale IR, we propose a primal-dual active-set (PDAS) algorithm which, in contrast to the state-of-the-art Pool Adjacent Violators (PAV) algorithm, can be parallelized and is easily warm-started, and is thus well suited to online settings. We prove that, like the PAV algorithm, our PDAS algorithm for IR is convergent and has a work complexity of O(n), though our numerical experiments suggest that our PDAS algorithm is often faster than PAV. In addition, we propose PDAS variants (with safeguarding to ensure convergence) for solving related trend filtering (TF) problems, providing the results of experiments to illustrate their effectiveness.
Submitted 3 April, 2016; v1 submitted 10 August, 2015;
originally announced August 2015.
-
Beads on a string: Structure of bound aggregates of globular particles and long polymer chains
Authors:
Anton Souslov,
Jennifer E. Curtis,
Paul M. Goldbart
Abstract:
Macroscopic properties of suspensions, such as those composed of globular particles (e.g., colloidal or macromolecular), can be tuned by controlling the equilibrium aggregation of the particles. We examine how aggregation -- and, hence, macroscopic properties -- can be controlled in a system composed of both globular particles and long, flexible polymer chains that reversibly bind to one another. We base this on a minimal statistical mechanical model of a single aggregate in which the polymer chain is treated either as ideal or self-avoiding, and, in addition, the globular particles are taken to interact with one another via excluded volume repulsion. Furthermore, each of the globular particles is taken to have one single site to which at most one polymer segment may bind. Within the context of this model, we examine the statistics of the equilibrium size of an aggregate and, thence, the structure of dilute and semidilute suspensions of these aggregates. We apply the model to biologically relevant aggregates, specifically those composed of macromolecular proteoglycan globules and long hyaluronan polymer chains. These aggregates are especially relevant to the materials properties of cartilage and the structure-function properties of perineuronal nets in brain tissue, as well as the pericellular coats of mammalian cells.
Submitted 3 October, 2015; v1 submitted 28 May, 2015;
originally announced May 2015.
-
Adaptive Augmented Lagrangian Methods: Algorithms and Practical Numerical Experience
Authors:
Frank E. Curtis,
Nicholas I. M. Gould,
Hao Jiang,
Daniel P. Robinson
Abstract:
In this paper, we consider augmented Lagrangian (AL) algorithms for solving large-scale nonlinear optimization problems that execute adaptive strategies for updating the penalty parameter. Our work is motivated by the recently proposed adaptive AL trust region method by Curtis, Jiang, and Robinson [Math. Prog., DOI: 10.1007/s10107-014-0784-y, 2013]. The first focal point of this paper is a new variant of the approach that employs a line search rather than a trust region strategy, where a critical algorithmic feature for the line search strategy is the use of convexified piecewise quadratic models of the AL function for computing the search directions. We prove global convergence guarantees for our line search algorithm that are on par with those for the previously proposed trust region method. A second focal point of this paper is the practical performance of the line search and trust region algorithm variants in Matlab software, as well as that of an adaptive penalty parameter updating strategy incorporated into the Lancelot software. We test these methods on problems from the CUTEst and COPS collections, as well as on challenging test problems related to optimal power flow. Our numerical experience suggests that the adaptive algorithms outperform traditional AL methods in terms of efficiency and reliability. As with traditional AL algorithms, the adaptive methods are matrix-free and thus represent a viable option for solving extreme-scale problems.
Submitted 19 August, 2014;
originally announced August 2014.
-
Zeeman Slowers for Strontium based on Permanent Magnets
Authors:
Ian R. Hill,
Yuri B. Ovchinnikov,
Elizabeth M. Bridge,
E. Anne Curtis,
Patrick Gill
Abstract:
We present the design, construction, and characterisation of longitudinal- and transverse-field Zeeman slowers, based on arrays of permanent magnets, for slowing thermal beams of atomic Sr. The slowers are optimised for operation with deceleration related to the local laser intensity (by the parameter $ε$), which uses the available laser power more effectively than the usual constant deceleration mode. Slowing efficiencies of up to $\approx 18\%$ are realised and compared to those predicted by modelling. We highlight the transverse-field slower, which is compact, highly tunable, light-weight, and requires no electrical power, as a simple solution to slowing Sr, well-suited to spaceborne application. For $^{88}$Sr we achieve a slow-atom flux of around $6\times 10^9$ atoms$\,$s$^{-1}$ at $30$ ms$^{-1}$, loading approximately $5\times 10^8$ atoms into a magneto-optical trap (MOT), and capture all isotopes in approximate relative natural abundances.
Submitted 21 February, 2014;
originally announced February 2014.
-
Matrix-Free Solvers for Exact Penalty Subproblems
Authors:
James V. Burke,
Frank E. Curtis,
Hao Wang,
Jiashan Wang
Abstract:
We present two matrix-free methods for approximately solving exact penalty subproblems that arise when solving large-scale optimization problems. The first approach is a novel iterative re-weighting algorithm (IRWA), which iteratively minimizes quadratic models of relaxed subproblems while automatically updating a relaxation vector. The second approach is based on alternating direction augmented Lagrangian (ADAL) technology applied to our setting. The main computational costs of each algorithm are the repeated minimizations of convex quadratic functions which can be performed matrix-free. We prove that both algorithms are globally convergent under loose assumptions, and that each requires at most $O(1/\varepsilon^2)$ iterations to reach $\varepsilon$-optimality of the objective function.
Numerical experiments exhibit the ability of both algorithms to efficiently find inexact solutions. Moreover, in certain cases, IRWA is shown to be more reliable than ADAL.
Submitted 9 February, 2014;
originally announced February 2014.
-
Shaking-induced dynamics of cold atoms in magnetic traps
Authors:
I. Llorente García,
B. Darquié,
C. D. J. Sinclair,
E. A. Curtis,
M. Tachikawa,
J. J. Hudson,
E. A. Hinds
Abstract:
We describe an experiment in which cold rubidium atoms, confined in an elongated magnetic trap, are excited by transverse oscillation of the trap centre. The temperature after excitation exhibits resonance as a function of the driving frequency. We measure these resonances at several different trap frequencies. In order to interpret the experiments, we develop a simple model that incorporates both collisions between atoms and the anharmonicity of the real three-dimensional trapping potential. As well as providing a precise connection between the transverse harmonic oscillation frequency and the temperature resonance frequency, this model gives insight into the heating and loss mechanisms, and into the dynamics of driven clouds of cold trapped atoms.
Submitted 26 August, 2013;
originally announced August 2013.