Showing 1–50 of 74 results for author: Mokhtari, A

  1. arXiv:2406.04592  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions

    Authors: Devyani Maladkar, Ruichen Jiang, Aryan Mokhtari

    Abstract: Adaptive gradient methods are arguably the most successful optimization algorithms for neural network training. While it is well-known that adaptive gradient methods can achieve better dimensional dependence than stochastic gradient descent (SGD) under favorable geometry for stochastic convex optimization, the theoretical justification for their success in stochastic non-convex optimization remain…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 21 pages

  2. arXiv:2406.02016  [pdf, other]

    math.OC cs.LG stat.ML

    Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

    Authors: Ruichen Jiang, Ali Kavis, Qiujiang Jin, Sujay Sanghavi, Aryan Mokhtari

    Abstract: We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic meth…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  3. arXiv:2406.01478  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Newton Proximal Extragradient Method

    Authors: Ruichen Jiang, Michał Dereziński, Aryan Mokhtari

    Abstract: Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 32 pages, 1 figure

  4. arXiv:2404.16731  [pdf, ps, other]

    math.OC

    Non-asymptotic Global Convergence Analysis of BFGS with the Armijo-Wolfe Line Search

    Authors: Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

    Abstract: In this paper, we establish the first explicit and non-asymptotic global convergence analysis of the BFGS method when deployed with an inexact line search scheme that satisfies the Armijo-Wolfe conditions. We show that BFGS achieves a global convergence rate of $(1-\frac{1}κ)^k$ for $μ$-strongly convex functions with $L$-Lipschitz gradients, where $κ=\frac{L}μ$ denotes the condition number. Furthe…

    Submitted 25 April, 2024; originally announced April 2024.

  5. arXiv:2404.01267  [pdf, other]

    math.OC

    Non-asymptotic Global Convergence Rates of BFGS with Exact Line Search

    Authors: Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

    Abstract: In this paper, we explore the non-asymptotic global convergence rates of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method implemented with exact line search. Notably, due to Dixon's equivalence result, our findings are also applicable to other quasi-Newton methods in the convex Broyden class employing exact line search, such as the Davidon-Fletcher-Powell (DFP) method. Specifically, we focus on…

    Submitted 1 April, 2024; originally announced April 2024.

  6. arXiv:2402.08097  [pdf, ps, other]

    math.OC cs.LG stat.ML

    An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization

    Authors: Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

    Abstract: In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem using a cutting plane approach and employs an accelerated gradient-based upd…

    Submitted 31 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  7. arXiv:2401.03058  [pdf, other]

    math.OC cs.LG stat.ML

    Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

    Authors: Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher

    Abstract: Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods.…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 27 pages, 2 figures

  8. arXiv:2308.07536  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem

    Authors: Jincheng Cao, Ruichen Jiang, Nazanin Abolfazli, Erfan Yazdandoost Hamedani, Aryan Mokhtari

    Abstract: In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic convex optimization problem. We introduce novel stochastic bilevel optimization methods that locally approximate the solution set of the lower-level problem via…

    Submitted 14 August, 2023; originally announced August 2023.

  9. arXiv:2306.15444  [pdf, other]

    math.OC cs.LG stat.ML

    Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

    Authors: Zhan Gao, Aryan Mokhtari, Alec Koppel

    Abstract: Non-asymptotic convergence analysis of quasi-Newton methods has gained attention with a landmark result establishing an explicit local superlinear rate of O$((1/\sqrt{t})^t)$. The methods that obtain this rate, however, exhibit a well-known drawback: they require the storage of the previous Hessian approximation matrix or all past curvature information to form the current Hessian inverse approxima…

    Submitted 18 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  10. arXiv:2306.02429  [pdf, other]

    math.OC

    An Inexact Conditional Gradient Method for Constrained Bilevel Optimization

    Authors: Nazanin Abolfazli, Ruichen Jiang, Aryan Mokhtari, Erfan Yazdandoost Hamedani

    Abstract: Bilevel optimization is an important class of optimization problems where one optimization problem is nested within another. While various methods have emerged to address unconstrained general bilevel optimization problems, there has been a noticeable gap in research when it comes to methods tailored for the constrained scenario. The few methods that do accommodate constrained problems, often exhi…

    Submitted 13 March, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

  11. arXiv:2306.02212  [pdf, other]

    math.OC cs.LG stat.ML

    Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization

    Authors: Ruichen Jiang, Aryan Mokhtari

    Abstract: In this paper, we propose an accelerated quasi-Newton proximal extragradient (A-QPNE) method for solving unconstrained smooth convex optimization problems. With access only to the gradients of the objective, we prove that our method can achieve a convergence rate of ${O}\bigl(\min\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of i…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: 44 pages, 1 figure

  12. arXiv:2302.08580  [pdf, other]

    math.OC cs.LG stat.ML

    Online Learning Guided Curvature Approximation: A Quasi-Newton Method with Global Non-Asymptotic Superlinear Convergence

    Authors: Ruichen Jiang, Qiujiang Jin, Aryan Mokhtari

    Abstract: Quasi-Newton algorithms are among the most popular iterative methods for solving unconstrained minimization problems, largely due to their favorable superlinear convergence property. However, existing results for these algorithms are limited as they provide either (i) a global convergence guarantee with an asymptotic superlinear convergence rate, or (ii) a local non-asymptotic superlinear rate for…

    Submitted 25 July, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: 33 pages, 1 figure, accepted to COLT 2023

  13. arXiv:2301.04430  [pdf, other]

    cs.LG cs.NI math.PR stat.ML

    Network Adaptive Federated Learning: Congestion and Lossy Compression

    Authors: Parikshit Hegde, Gustavo de Veciana, Aryan Mokhtari

    Abstract: In order to achieve the dual goals of privacy and learning across distributed data, Federated Learning (FL) systems rely on frequent exchanges of large files (model updates) between a set of clients and the server. As such FL systems are exposed to, or indeed the cause of, congestion across a wide set of network resources. Lossy compression can be used to reduce the size of exchanged files and ass…

    Submitted 11 January, 2023; originally announced January 2023.

  14. arXiv:2211.14103  [pdf, other]

    math.OC

    Conditional Gradient Methods

    Authors: Gábor Braun, Alejandro Carderera, Cyrille W. Combettes, Hamed Hassani, Amin Karbasi, Aryan Mokhtari, Sebastian Pokutta

    Abstract: The purpose of this survey is to serve both as a gentle introduction and a coherent overview of state-of-the-art Frank--Wolfe algorithms, also called conditional gradient algorithms, for function minimization. These algorithms are especially useful in convex optimization when linear optimization is cheaper than projections. The selection of the material has been guided by the principle of highli…

    Submitted 27 July, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: 240 pages with many figures. The FrankWolfe.jl Julia package (https://github.com/ZIB-IOL/FrankWolfe.jl) provides state-of-the-art implementations of many Frank--Wolfe methods. v2 fixes many typos, adds clarifications and replaces an image for copyright reasons

  15. arXiv:2206.08868  [pdf, other]

    math.OC cs.LG stat.ML

    A Conditional Gradient-based Method for Simple Bilevel Optimization with Convex Lower-level Problem

    Authors: Ruichen Jiang, Nazanin Abolfazli, Aryan Mokhtari, Erfan Yazdandoost Hamedani

    Abstract: In this paper, we study a class of bilevel optimization problems, also known as simple bilevel optimization, where we minimize a smooth objective function over the optimal solution set of another convex constrained optimization problem. Several iterative methods have been developed for tackling this class of problems. Alas, their convergence guarantees are either asymptotic for the upper-level obj…

    Submitted 23 April, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted to AISTATS 2023

  16. arXiv:2206.00207  [pdf, other]

    math.OC

    Statistical and Computational Complexities of BFGS Quasi-Newton Method for Generalized Linear Models

    Authors: Qiujiang Jin, Tongzheng Ren, Nhat Ho, Aryan Mokhtari

    Abstract: The gradient descent (GD) method has been used widely to solve parameter estimation in generalized linear models (GLMs), a generalization of linear models when the link function can be non-linear. In GLMs with a polynomial link function, it has been shown that in the high signal-to-noise ratio (SNR) regime, due to the problem's strong convexity and smoothness, GD converges linearly and reaches the…

    Submitted 14 March, 2024; v1 submitted 31 May, 2022; originally announced June 2022.

  17. arXiv:2202.10538  [pdf, other]

    math.OC

    Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood

    Authors: Qiujiang Jin, Alec Koppel, Ketan Rajawat, Aryan Mokhtari

    Abstract: Non-asymptotic analysis of quasi-Newton methods have gained traction recently. In particular, several works have established a non-asymptotic superlinear rate of $\mathcal{O}((1/\sqrt{t})^t)$ for the (classic) BFGS method by exploiting the fact that its error of Newton direction approximation approaches zero. Moreover, a greedy variant of BFGS was recently proposed which accelerates its convergenc…

    Submitted 15 June, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

  18. arXiv:2202.09674  [pdf, other]

    math.OC cs.LG stat.ML

    Generalized Optimistic Methods for Convex-Concave Saddle Point Problems

    Authors: Ruichen Jiang, Aryan Mokhtari

    Abstract: The optimistic gradient method has seen increasing popularity for solving convex-concave saddle point problems. To analyze its iteration complexity, a recent work [arXiv:1906.01115] proposed an interesting perspective that interprets this method as an approximation to the proximal point method. In this paper, we follow this approach and distill the underlying idea of optimism to propose a generali…

    Submitted 10 January, 2024; v1 submitted 19 February, 2022; originally announced February 2022.

    Comments: 60 pages, 3 figures; simplified and improved the line search scheme. Due to the character limit, the abstract appearing here is slightly shorter than that in the PDF file

    MSC Class: 90C25; 90C33; 90C47

  19. arXiv:2202.05791  [pdf, other]

    stat.ML cs.LG math.OC

    The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

    Authors: Matthew Faw, Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari, Sanjay Shakkottai, Rachel Ward

    Abstract: We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (SGD), where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non adaptive methods in this setting. Specifically, all prior works rely on some subset of the following a…

    Submitted 25 July, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: Accepted to COLT 2022

  20. arXiv:2111.01262  [pdf, other]

    math.OC cs.DS cs.LG eess.SY stat.ML

    Minimax Optimization: The Case of Convex-Submodular

    Authors: Arman Adibi, Aryan Mokhtari, Hamed Hassani

    Abstract: Minimax optimization has been central in addressing various applications in machine learning, game theory, and control theory. Prior literature has thus far mainly focused on studying such problems in the continuous domain, e.g., convex-concave minimax optimization is now understood to a significant extent. Nevertheless, minimax problems extend far beyond the continuous domain to mixed continuous-…

    Submitted 1 November, 2021; originally announced November 2021.

  21. arXiv:2106.05445  [pdf, other]

    math.OC cs.LG

    Exploiting Local Convergence of Quasi-Newton Methods Globally: Adaptive Sample Size Approach

    Authors: Qiujiang Jin, Aryan Mokhtari

    Abstract: In this paper, we study the application of quasi-Newton methods for solving empirical risk minimization (ERM) problems defined over a large dataset. Traditional deterministic and stochastic quasi-Newton methods can be executed to solve such problems; however, it is known that their global convergence rate may not be better than first-order methods, and their local superlinear convergence only appe…

    Submitted 26 October, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

  22. arXiv:2102.07078  [pdf, other]

    cs.LG math.OC

    Exploiting Shared Representations for Personalized Federated Learning

    Authors: Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: Deep neural networks have shown the ability to extract universal feature representations from data such as images and text that have been useful for a variety of learning tasks. However, the fruits of representation learning have yet to be fully-realized in federated settings. Although data in federated settings is often non-i.i.d. across clients, the success of centralized deep learning suggests…

    Submitted 24 March, 2023; v1 submitted 14 February, 2021; originally announced February 2021.

  23. arXiv:2102.03832  [pdf, other]

    cs.LG math.OC stat.ML

    Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

    Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

    Abstract: In this paper, we study the generalization properties of Model-Agnostic Meta-Learning (MAML) algorithms for supervised learning problems. We focus on the setting in which we train the MAML model over $m$ tasks, each with $n$ data points, and characterize its generalization error from two points of view: First, we assume the new task at test time is one of the training tasks, and we show that, for…

    Submitted 16 November, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  24. arXiv:2010.14672  [pdf, other]

    cs.LG math.OC stat.ML

    How Does the Task Landscape Affect MAML Performance?

    Authors: Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: Model-Agnostic Meta-Learning (MAML) has become increasingly popular for training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize compared to standard non-adaptive learning (NAL), and little is understood about how much MAML improves over NAL in terms of the fast adaptability of thei…

    Submitted 9 August, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

  25. arXiv:2007.05852  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Submodular Meta-Learning

    Authors: Arman Adibi, Aryan Mokhtari, Hamed Hassani

    Abstract: In this paper, we introduce a discrete variant of the meta-learning framework. Meta-learning aims at exploiting prior experience and data to improve performance on future tasks. By now, there exist numerous formulations for meta-learning in the continuous domain. Notably, the Model-Agnostic Meta-Learning (MAML) formulation views each task as a continuous optimization problem and based on prior dat…

    Submitted 9 January, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

  26. arXiv:2006.13326  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Safe Learning under Uncertain Objectives and Constraints

    Authors: Mohammad Fereydounian, Zebang Shen, Aryan Mokhtari, Amin Karbasi, Hamed Hassani

    Abstract: In this paper, we consider non-convex optimization problems under \textit{unknown} yet safety-critical constraints. Such problems naturally arise in a variety of domains including robotics, manufacturing, and medical procedures, where it is infeasible to know or identify all the constraints. Therefore, the parameter space should be explored in a conservative way to ensure that none of the constrai…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 42 pages, 2 figures

  27. arXiv:2003.13607  [pdf, other]

    math.OC cs.LG

    Non-asymptotic Superlinear Convergence of Standard Quasi-Newton Methods

    Authors: Qiujiang Jin, Aryan Mokhtari

    Abstract: In this paper, we study and prove the non-asymptotic superlinear convergence rate of the Broyden class of quasi-Newton algorithms which includes the Davidon--Fletcher--Powell (DFP) method and the Broyden--Fletcher--Goldfarb--Shanno (BFGS) method. The asymptotic superlinear convergence rate of these quasi-Newton methods has been extensively studied in the literature, but their explicit finite-time…

    Submitted 30 November, 2021; v1 submitted 30 March, 2020; originally announced March 2020.

  28. arXiv:2002.07948  [pdf, other]

    cs.LG math.OC stat.ML

    Personalized Federated Learning: A Meta-Learning Approach

    Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

    Abstract: In Federated Learning, we aim to train models across multiple computing units (users), while users can only communicate with a common central server, without exchanging their data samples. This mechanism exploits the computational power of all users and allows users to obtain a richer model as their models are trained over a larger set of data points. However, this scheme only develops a common ou…

    Submitted 22 October, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: To appear in 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

  29. arXiv:2002.05135  [pdf, other]

    cs.LG math.OC stat.ML

    On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

    Authors: Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar

    Abstract: We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy using data from several tasks represented by Markov Decision Processes (MDPs) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computati…

    Submitted 16 November, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  30. arXiv:2002.04766  [pdf, other]

    cs.LG math.OC stat.ML

    Task-Robust Model-Agnostic Meta-Learning

    Authors: Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: Meta-learning methods have shown an impressive ability to train models that rapidly learn new tasks. However, these methods only aim to perform well in expectation over tasks coming from some particular distribution that is typically equivalent across meta-training and meta-testing, rather than considering worst-case task performance. In this work we introduce the notion of "task-robustness" by re…

    Submitted 18 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

  31. arXiv:1910.14380  [pdf, other]

    math.OC cs.LG stat.ML

    A Decentralized Proximal Point-type Method for Saddle Point Problems

    Authors: Weijie Liu, Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil, Zebang Shen, Nenggan Zheng

    Abstract: In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point metho…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: 18 pages

  32. arXiv:1910.04322  [pdf, other]

    math.OC cs.LG stat.ML

    One Sample Stochastic Frank-Wolfe

    Authors: Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

    Abstract: One of the beauties of the projected gradient descent method lies in its rather simple mechanism and yet stable behavior with inexact, stochastic gradients, which has led to its wide-spread use in many machine learning applications. However, once we replace the projection operator with a simpler linear program, as is done in the Frank-Wolfe method, both simplicity and stability take a serious hit.…

    Submitted 9 October, 2019; originally announced October 2019.

  33. arXiv:1909.13014  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization

    Authors: Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ali Jadbabaie, Ramtin Pedarsani

    Abstract: Federated learning is a distributed framework according to which a model is trained over a set of devices, while keeping data localized. This framework faces several systems-oriented challenges which include (i) communication bottleneck since a large number of devices upload their local updates to a parameter server, and (ii) scalability as the federated network consists of millions of devices. Du…

    Submitted 7 June, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

  34. arXiv:1908.10400  [pdf, other]

    cs.LG math.OC stat.ML

    On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms

    Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

    Abstract: We study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall complexity as well as their best achievable accuracy in terms of gradient norm for nonconvex loss functions. We start with the MAML method and its first-order approximation (FO-MAML) and highlight the challenges that emerge in their analysis. By overcoming these challeng…

    Submitted 15 May, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

    Comments: To appear in the proceedings of the $23^{rd}$ International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

  35. arXiv:1907.10595  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Robust and Communication-Efficient Collaborative Learning

    Authors: Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

    Abstract: We consider a decentralized learning problem, where a set of computing nodes aim at solving a non-convex optimization problem collaboratively. It is well-known that decentralized optimization schemes face two major system bottlenecks: stragglers' delay and communication overhead. In this paper, we tackle these bottlenecks by proposing a novel decentralized and gradient-based optimization algorithm…

    Submitted 31 October, 2019; v1 submitted 24 July, 2019; originally announced July 2019.

  36. arXiv:1906.01115  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Convergence Rate of $\mathcal{O}(1/k)$ for Optimistic Gradient and Extra-gradient Methods in Smooth Convex-Concave Saddle Point Problems

    Authors: Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil

    Abstract: We study the iteration complexity of the optimistic gradient descent-ascent (OGDA) method and the extra-gradient (EG) method for finding a saddle point of a convex-concave unconstrained min-max problem. To do so, we first show that both OGDA and EG can be interpreted as approximate variants of the proximal point method. This is similar to the approach taken in [Nemirovski, 2004] which analyzes EG…

    Submitted 29 September, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: 19 pages

  37. arXiv:1906.00506  [pdf, ps, other]

    math.OC

    DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate

    Authors: Saeed Soori, Konstantin Mischenko, Aryan Mokhtari, Maryam Mehri Dehnavi, Mert Gurbuzbalaban

    Abstract: In this paper, we consider distributed algorithms for solving the empirical risk minimization problem under the master/worker communication model. We develop a distributed asynchronous quasi-Newton algorithm that can achieve superlinear convergence. To our knowledge, this is the first distributed asynchronous algorithm with superlinear convergence guarantees. Our algorithm is communication-efficie…

    Submitted 10 June, 2019; v1 submitted 2 June, 2019; originally announced June 2019.

  38. arXiv:1902.06992  [pdf, other]

    math.OC cs.LG

    Stochastic Conditional Gradient++

    Authors: Hamed Hassani, Amin Karbasi, Aryan Mokhtari, Zebang Shen

    Abstract: In this paper, we consider the general non-oblivious stochastic optimization where the underlying stochasticity may change during the optimization procedure and depends on the point at which the function is evaluated. We develop Stochastic Frank-Wolfe++ ($\text{SFW}{++} $), an efficient variant of the conditional gradient method for minimizing a smooth non-convex function subject to a convex body…

    Submitted 8 September, 2020; v1 submitted 19 February, 2019; originally announced February 2019.

  39. arXiv:1902.06332  [pdf, other]

    cs.LG cs.DC cs.DS math.OC stat.ML

    Quantized Frank-Wolfe: Faster Optimization, Lower Communication, and Projection Free

    Authors: Mingrui Zhang, Lin Chen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

    Abstract: How can we efficiently mitigate the overhead of gradient communications in distributed optimization? This problem is at the heart of training scalable machine learning models and has been mainly studied in the unconstrained setting. In this paper, we propose Quantized-Frank-Wolfe (QFW), the first projection-free and communication-efficient algorithm for solving constrained optimization problems at…

    Submitted 30 May, 2019; v1 submitted 17 February, 2019; originally announced February 2019.

  40. arXiv:1901.08511  [pdf, ps, other]

    math.OC cs.LG stat.ML

    A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach

    Authors: Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil

    Abstract: In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods. We show that both of these algorithms admit a unified analysis as approximations of the classical proximal point method for solving saddle point problems. This viewpoint enables us to develop a new framework for…

    Submitted 5 September, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

    Comments: 25 pages, 3 figures

  41. arXiv:1811.02521  [pdf, ps, other]

    math.OC

    Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE

    Authors: Jingzhao Zhang, César A. Uribe, Aryan Mokhtari, Ali Jadbabaie

    Abstract: We develop a distributed algorithm for convex Empirical Risk Minimization, the problem of minimizing large but finite sum of convex functions over networks. The proposed algorithm is derived from directly discretizing the second-order heavy-ball differential equation and results in an accelerated convergence rate, i.e., faster than distributed gradient descent-based methods for strongly convex obje…

    Submitted 6 November, 2018; originally announced November 2018.

  42. arXiv:1809.02162  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Escaping Saddle Points in Constrained Optimization

    Authors: Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie

    Abstract: In this paper, we study the problem of escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$. We propose a generic framework that yields convergence to a second-order stationary point of the problem, if the convex set $\mathcal{C}$ is simple for a quadratic objective function. Specifically, our results hold if one can find a $ρ$-approximate sol…

    Submitted 9 October, 2018; v1 submitted 6 September, 2018; originally announced September 2018.

  43. A Primal-Dual Quasi-Newton Method for Exact Consensus Optimization

    Authors: Mark Eisen, Aryan Mokhtari, Alejandro Ribeiro

    Abstract: We introduce the primal-dual quasi-Newton (PD-QN) method as an approximated second order method for solving decentralized optimization problems. The PD-QN method performs quasi-Newton updates on both the primal and dual variables of the consensus optimization problem to find the optimal point of the augmented Lagrangian. By optimizing the augmented Lagrangian, the PD-QN method is able to find the…

    Submitted 10 July, 2019; v1 submitted 4 September, 2018; originally announced September 2018.

  44. arXiv:1806.11536  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    An Exact Quantized Decentralized Gradient Descent Algorithm

    Authors: Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

    Abstract: We consider the problem of decentralized consensus optimization, where the sum of $n$ smooth and strongly convex functions is minimized over $n$ distributed agents that form a connected network. In particular, we consider the case where the communicated local decision variables among nodes are quantized in order to alleviate the communication bottleneck in distributed optimization. We propose the…

    Submitted 1 August, 2019; v1 submitted 29 June, 2018; originally announced June 2018.

  45. arXiv:1805.00521  [pdf, other]

    math.OC cs.LG stat.ML

    Direct Runge-Kutta Discretization Achieves Acceleration

    Authors: Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

    Abstract: We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is smooth enough, we show that acceleration can be achieved by a stable discretization of this ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lip…

    Submitted 27 November, 2018; v1 submitted 1 May, 2018; originally announced May 2018.

    Comments: 24 pages, 4 figures

  46. arXiv:1804.09554  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization

    Authors: Aryan Mokhtari, Hamed Hassani, Amin Karbasi

    Abstract: This paper considers stochastic optimization problems for a large class of objective functions, including convex and continuous submodular. Stochastic proximal gradient methods have been widely used to solve such problems; however, their applicability remains limited when the problem dimension is large and the projection onto a convex set is costly. Instead, stochastic conditional gradient methods…

    Submitted 12 November, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

    Comments: arXiv admin note: text overlap with arXiv:1711.01660

  47. arXiv:1802.03825  [pdf, other]

    math.OC

    Decentralized Submodular Maximization: Bridging Discrete and Continuous Settings

    Authors: Aryan Mokhtari, Hamed Hassani, Amin Karbasi

    Abstract: In this paper, we showcase the interplay between discrete and continuous optimization in network-structured settings. We propose the first fully decentralized optimization method for a wide class of non-convex objective functions that possess a diminishing returns property. More specifically, given an arbitrary connected network and a global continuous submodular function, formed by a sum of local…

    Submitted 11 February, 2018; originally announced February 2018.

  48. arXiv:1711.01660  [pdf, other]

    math.OC cs.LG

    Conditional Gradient Method for Stochastic Submodular Maximization: Closing the Gap

    Authors: Aryan Mokhtari, Hamed Hassani, Amin Karbasi

    Abstract: In this paper, we study the problem of \textit{constrained} and \textit{stochastic} continuous submodular maximization. Even though the objective function is not concave (nor convex) and is defined in terms of an expectation, we develop a variant of the conditional gradient method, called \alg, which achieves a \textit{tight} approximation guarantee. More precisely, for a monotone and continuous D…

    Submitted 5 November, 2017; originally announced November 2017.

  49. arXiv:1709.00599  [pdf, other]

    cs.LG math.OC

    First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization

    Authors: Aryan Mokhtari, Alejandro Ribeiro

    Abstract: This paper studies empirical risk minimization (ERM) problems for large-scale datasets and incorporates the idea of adaptive sample size methods to improve the guaranteed convergence bounds for first-order stochastic and deterministic methods. In contrast to traditional methods that attempt to solve the ERM problem corresponding to the full dataset directly, adaptive sample size schemes start with…

    Submitted 2 September, 2017; originally announced September 2017.

  50. arXiv:1707.08028  [pdf, ps, other]

    math.OC

    A Newton-Based Method for Nonconvex Optimization with Fast Evasion of Saddle Points

    Authors: Santiago Paternain, Aryan Mokhtari, Alejandro Ribeiro

    Abstract: Machine learning problems such as neural network training, tensor decomposition, and matrix factorization require local minimization of a nonconvex function. This local minimization is challenged by the presence of saddle points, of which there can be many and from which descent methods may take an inordinately large number of iterations to escape. This paper presents a second-order method that modi…

    Submitted 20 July, 2018; v1 submitted 25 July, 2017; originally announced July 2017.