Showing 1–21 of 21 results for author: Milzarek, A

  1. arXiv:2406.05637  [pdf, ps, other]

    math.OC cs.LG math.PR stat.ML

    A Generalized Version of Chung's Lemma and its Applications

    Authors: Li Jiang, Xiao Li, Andre Milzarek, Junwen Qiu

    Abstract: Chung's lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate broad…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 43 pages, 5 figures

    MSC Class: 90C15; 90C30; 90C26

  2. arXiv:2406.02273  [pdf, ps, other]

    math.OC cs.LG

    A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods

    Authors: Junwen Qiu, Bohao Ma, Xiao Li, Andre Milzarek

    Abstract: We propose a novel analysis framework for non-descent-type optimization methodologies in nonconvex scenarios based on the Kurdyka-Lojasiewicz property. Our framework allows covering a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 29 pages

    MSC Class: 90C06; 90C26; 90C30

  3. arXiv:2405.16954  [pdf, ps, other]

    math.OC cs.LG

    Convergence of SGD with momentum in the nonconvex case: A time window-based analysis

    Authors: Junwen Qiu, Bohao Ma, Andre Milzarek

    Abstract: We propose a novel time window-based analysis technique to investigate the convergence properties of the stochastic gradient descent method with momentum (SGDM) in nonconvex settings. Despite its popularity, the convergence behavior of SGDM remains less understood in nonconvex scenarios. This is primarily due to the absence of a sufficient descent property and challenges in simultaneously controll…

    Submitted 23 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 25 pages

  4. arXiv:2404.18452  [pdf, other]

    math.OC

    Random Reshuffling with Momentum for Nonconvex Problems: Iteration Complexity and Last Iterate Convergence

    Authors: Junwen Qiu, Andre Milzarek

    Abstract: Random reshuffling with momentum (RRM) corresponds to the SGD optimizer with momentum option enabled, as found in popular machine learning libraries like PyTorch and TensorFlow. Despite its widespread use in practical applications, the understanding of its convergence properties in nonconvex scenarios remains limited. Under a Lipschitz smoothness assumption, this paper provides one of the first it…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 51 pages, 10 figures

    MSC Class: 90C26; 90C15

  5. arXiv:2312.01047  [pdf, other]

    math.OC cs.LG

    A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization

    Authors: Junwen Qiu, Xiao Li, Andre Milzarek

    Abstract: Random reshuffling techniques are prevalent in large-scale applications, such as training neural networks. While the convergence and acceleration effects of random reshuffling-type methods are fairly well understood in the smooth setting, much less studies seem available in the nonsmooth case. In this work, we design a new normal map-based proximal random reshuffling (norm-PRR) method for nonsmoot…

    Submitted 30 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: 43 pages, 4 figures

    MSC Class: 90C26; 90C15

  6. arXiv:2311.07276  [pdf, other]

    math.OC

    Variational Properties of Decomposable Functions Part II: Strong Second-Order Theory

    Authors: Wenqing Ouyang, Andre Milzarek

    Abstract: Local superlinear convergence of the semismooth Newton method usually requires the uniform invertibility of the generalized Jacobian matrix, e.g. BD-regularity or CD-regularity. For several types of nonlinear programming and composite-type optimization problems -- for which the generalized Jacobian of the stationary equation can be calculated explicitly -- this is characterized by the strong secon…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 28 pages; preliminary draft

  7. arXiv:2311.07267  [pdf, ps, other]

    math.OC

    Variational Properties of Decomposable Functions. Part I: Strict Epi-Calculus and Applications

    Authors: Wenqing Ouyang, Andre Milzarek

    Abstract: We provide systematic studies of the variational properties of decomposable functions which are compositions of an outer support function and an inner smooth mapping under certain constraint qualifications. We put a particular focus on the strict twice epi-differentiability and the associated strict second subderivative of such functions. Calculus rules for the (strict) second subderivative and tw…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 28 pages; preliminary draft

  8. arXiv:2309.17096  [pdf, other]

    math.NA

    Obtaining Pseudo-inverse Solutions With MINRES

    Authors: Yang Liu, Andre Milzarek, Fred Roosta

    Abstract: The celebrated minimum residual method (MINRES), proposed in the seminal paper of Paige and Saunders, has seen great success and wide-spread use in solving linear least-squared problems involving Hermitian matrices, with further extensions to complex symmetric settings. Unless the system is consistent whereby the right-hand side vector lies in the range of the matrix, MINRES is not guaranteed to o…

    Submitted 29 January, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

  9. arXiv:2305.05828  [pdf, other]

    math.OC cs.LG

    Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality

    Authors: Andre Milzarek, Junwen Qiu

    Abstract: In this paper, we present a novel stochastic normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) for nonconvex composite-type optimization problems and discuss its convergence properties. Using a time window-based strategy, we first analyze the global convergence behavior of $\mathsf{norM}\text{-}\mathsf{SGD}$ and it is shown that every accumulation point of the generated sequence of…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: 34 pages, 14 figures

    MSC Class: 90C26; 90C15

  10. arXiv:2206.03907  [pdf, ps, other]

    math.OC cs.LG

    A Unified Convergence Theorem for Stochastic Optimization Methods

    Authors: Xiao Li, Andre Milzarek

    Abstract: In this work, we provide a fundamental unified convergence theorem used for deriving expected and almost sure convergence results for a series of stochastic optimization methods. Our unified theorem only requires to verify several representative conditions and is not tailored to any specific algorithm. As a direct application, we recover expected and almost sure convergence results of the stochast…

    Submitted 19 October, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  11. arXiv:2206.01372  [pdf, other]

    math.OC

    Descent Properties of an Anderson Accelerated Gradient Method With Restarting

    Authors: Wenqing Ouyang, Yang Liu, Andre Milzarek

    Abstract: Anderson Acceleration (AA) is a popular acceleration technique to enhance the convergence of fixed-point iterations. The analysis of AA approaches typically focuses on the convergence behavior of a corresponding fixed-point residual, while the behavior of the underlying objective function values along the accelerated iterates is currently not well understood. In this paper, we investigate local pr…

    Submitted 25 September, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: 30 pages; 4 figures

    MSC Class: 90C26

  12. arXiv:2204.00406  [pdf, other]

    math.OC stat.ML

    A Semismooth Newton Stochastic Proximal Point Algorithm with Variance Reduction

    Authors: Andre Milzarek, Fabian Schaipp, Michael Ulbrich

    Abstract: We develop an implementable stochastic proximal point (SPP) method for a class of weakly convex, composite optimization problems. The proposed stochastic proximal point algorithm incorporates a variance reduction mechanism and the resulting SPP updates are solved using an inexact semismooth Newton framework. We establish detailed convergence results that take the inexactness of the SPP steps into…

    Submitted 26 March, 2024; v1 submitted 1 April, 2022; originally announced April 2022.

    MSC Class: 90C26; 90C06; 65K10

  13. arXiv:2112.15287  [pdf, other]

    math.OC cs.DC cs.LG cs.MA

    Distributed Random Reshuffling over Networks

    Authors: Kun Huang, Xiao Li, Andre Milzarek, Shi Pu, Junwen Qiu

    Abstract: In this paper, we consider distributed optimization problems where $n$ agents, each possessing a local cost function, collaboratively minimize the average of the local cost functions over a connected network. To solve the problem, we propose a distributed random reshuffling (D-RR) algorithm that invokes the random reshuffling (RR) update in each agent. We show that D-RR inherits favorable characte…

    Submitted 23 March, 2023; v1 submitted 30 December, 2021; originally announced December 2021.

    Comments: 20 pages, 13 figures

  14. arXiv:2110.04926  [pdf, ps, other]

    math.OC cs.LG

    Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality

    Authors: Xiao Li, Andre Milzarek, Junwen Qiu

    Abstract: We study the random reshuffling (RR) method for smooth nonconvex optimization problems with a finite-sum structure. Though this method is widely utilized in practice such as the training of neural networks, its convergence behavior is only understood in several limited settings. In this paper, under the well-known Kurdyka-Lojasiewicz (KL) inequality, we establish strong limit-point convergence res…

    Submitted 25 January, 2023; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in SIAM Journal on Optimization

  15. arXiv:2106.09340  [pdf, other]

    math.OC

    A trust region-type normal map-based semismooth Newton method for nonsmooth nonconvex composite optimization

    Authors: Wenqing Ouyang, Andre Milzarek

    Abstract: We propose a novel trust region method for solving a class of nonsmooth, nonconvex composite-type optimization problems. The approach embeds inexact semismooth Newton steps for finding zeros of a normal map-based stationarity measure for the problem in a trust region framework. Based on a new merit function and acceptance mechanism, global convergence and transition to fast local q-superlinear con…

    Submitted 3 October, 2023; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: 55 pages, 5 figures

  16. arXiv:2006.02559  [pdf, other]

    math.OC math.NA

    Nonmonotone Globalization for Anderson Acceleration via Adaptive Regularization

    Authors: Wenqing Ouyang, Jiong Tao, Andre Milzarek, Bailin Deng

    Abstract: Anderson acceleration (AA) is a popular method for accelerating fixed-point iterations, but may suffer from instability and stagnation. We propose a globalization method for AA to improve stability and achieve unified global and local convergence. Unlike existing AA globalization approaches that rely on safeguarding operations and might hinder fast local convergence, we adopt a nonmonotone trust-r…

    Submitted 2 May, 2023; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: Accepted to Journal of Scientific Computing

  17. arXiv:2002.08513  [pdf, ps, other]

    math.OC

    A Trust-Region Method For Nonsmooth Nonconvex Optimization

    Authors: Ziang Chen, Andre Milzarek, Zaiwen Wen

    Abstract: We propose a trust-region type method for a class of nonsmooth nonconvex optimization problems where the objective function is a summation of a (probably nonconvex) smooth function and a (probably nonsmooth) convex function. The model function of our trust-region subproblem is always quadratic and the linear term of the model is generated using abstract descent directions. Therefore, the trust-reg…

    Submitted 23 October, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

  18. arXiv:1910.09373  [pdf, ps, other]

    math.OC stat.ML

    A Stochastic Extra-Step Quasi-Newton Method for Nonsmooth Nonconvex Optimization

    Authors: Minghan Yang, Andre Milzarek, Zaiwen Wen, Tong Zhang

    Abstract: In this paper, a novel stochastic extra-step quasi-Newton method is developed to solve a class of nonsmooth nonconvex composite optimization problems. We assume that the gradient of the smooth part of the objective function can only be approximated by stochastic oracles. The proposed method combines general stochastic higher order steps derived from an underlying proximal type fixed-point equation…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: 41 pages

    MSC Class: 90C06; 90C15; 90C26; 90C53

  19. arXiv:1908.00745  [pdf, ps, other]

    math.OC

    On The Geometric Analysis of A Quartic-quadratic Optimization Problem under A Spherical Constraint

    Authors: Haixiang Zhang, Andre Milzarek, Zaiwen Wen, Wotao Yin

    Abstract: This paper considers the problem of solving a special quartic-quadratic optimization problem with a single sphere constraint, namely, finding a global and local minimizer of $\frac{1}{2}\mathbf{z}^{*}A\mathbf{z}+\frac{\beta}{2}\sum_{k=1}^{n}\lvert z_{k}\rvert^{4}$ such that $\lVert\mathbf{z}\rVert_{2}=1$. This problem spans multiple domains including quantum mechanics and chemistry sciences and we inves…

    Submitted 2 August, 2019; originally announced August 2019.

  20. arXiv:1803.03466  [pdf, ps, other]

    math.OC stat.ML

    A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization

    Authors: Andre Milzarek, Xiantao Xiao, Shicong Cen, Zaiwen Wen, Michael Ulbrich

    Abstract: In this work, we present a globalized stochastic semismooth Newton method for solving stochastic optimization problems involving smooth nonconvex and nonsmooth convex terms in the objective function. We assume that only noisy gradient and Hessian information of the smooth part of the objective function is available via calling stochastic first and second order oracles. The proposed method can be s…

    Submitted 9 March, 2018; originally announced March 2018.

    MSC Class: 49M15; 65C60; 65K05; 90C06

  21. arXiv:1708.02016  [pdf, ps, other]

    math.OC

    Adaptive Regularized Newton Method for Riemannian Optimization

    Authors: Jiang Hu, Andre Milzarek, Zaiwen Wen, Yaxiang Yuan

    Abstract: Optimization on Riemannian manifolds widely arises in eigenvalue computation, density functional theory, Bose-Einstein condensates, low rank nearest correlation, image registration, and signal processing, etc. We propose an adaptive regularized Newton method which approximates the original objective function by the second-order Taylor expansion in Euclidean space but keeps the Riemannian manifold…

    Submitted 7 August, 2017; originally announced August 2017.