Showing 1–50 of 77 results for author: Mokhtari, A

Searching in archive cs.
  1. arXiv:2406.04592

    math.OC cs.LG stat.ML

    Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions

    Authors: Devyani Maladkar, Ruichen Jiang, Aryan Mokhtari

    Abstract: Adaptive gradient methods are arguably the most successful optimization algorithms for neural network training. While it is well known that adaptive gradient methods can achieve better dimensional dependence than stochastic gradient descent (SGD) under favorable geometry for stochastic convex optimization, the theoretical justification for their success in stochastic non-convex optimization remain…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 21 pages

  2. arXiv:2406.02016

    math.OC cs.LG stat.ML

    Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

    Authors: Ruichen Jiang, Ali Kavis, Qiujiang Jin, Sujay Sanghavi, Aryan Mokhtari

    Abstract: We propose adaptive, line-search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic meth…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  3. arXiv:2406.01478

    math.OC cs.LG stat.ML

    Stochastic Newton Proximal Extragradient Method

    Authors: Ruichen Jiang, Michał Dereziński, Aryan Mokhtari

    Abstract: Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 32 pages, 1 figure

  4. arXiv:2402.11639

    cs.LG cs.AI cs.CL

    In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness

    Authors: Liam Collins, Advait Parulekar, Aryan Mokhtari, Sujay Sanghavi, Sanjay Shakkottai

    Abstract: A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is presented with a novel context during inference, implicitly through some data, and is tasked with making a prediction in that context. As such, the learner must adapt to the context without additional training. We explore the role of softmax attention in an I…

    Submitted 28 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.
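    A minimal sketch, under simplifying assumptions, of how a single softmax attention head can act as an in-context predictor: the query attends to the context inputs and returns an attention-weighted average of their labels. The unit-norm 2-D inputs, the scalar `scale` standing in for a learned key-query product, and the helper `on_circle` are hypothetical choices for illustration, not the paper's model.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def softmax_attention_predict(context: Sequence[Tuple[Point, float]],
                              query: Point, scale: float) -> float:
    """Predict a label for `query` from in-context (x_i, y_i) pairs.

    Scores are scale * <query, x_i>; the output is sum_i softmax(scores)_i * y_i.
    For unit-norm inputs, <q, x> = 1 - ||q - x||^2 / 2, so a larger `scale`
    concentrates the weights on the context points nearest to the query.
    """
    scores = [scale * (query[0] * x[0] + query[1] * x[1]) for x, _ in context]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, context)) / total


def on_circle(angle: float) -> Point:
    """Hypothetical helper: a unit-norm 2-D input at the given angle."""
    return (math.cos(angle), math.sin(angle))


# Toy context: labels attached to three points on the unit circle.
context: List[Tuple[Point, float]] = [
    (on_circle(0.0), 1.0),
    (on_circle(math.pi / 2), 3.0),
    (on_circle(math.pi), -2.0),
]
query = on_circle(0.1)

print(softmax_attention_predict(context, query, scale=0.0))    # plain average of the labels
print(softmax_attention_predict(context, query, scale=100.0))  # close to the nearest label, 1.0
```

    In this reading, `scale` sets the width of the effective averaging window: small values average over the whole context, large values behave like nearest-neighbor lookup. The title suggests that it is this kind of window that adapts to the Lipschitzness of the target function.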

  5. arXiv:2402.08097

    math.OC cs.LG stat.ML

    An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization

    Authors: Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

    Abstract: In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem using a cutting plane approach and employs an accelerated gradient-based upd…

    Submitted 31 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  6. arXiv:2401.03058

    math.OC cs.LG stat.ML

    Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

    Authors: Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher

    Abstract: Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods.…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 27 pages, 2 figures

  7. arXiv:2312.03235

    cs.DC cs.PF

    HEET: A Heterogeneity Measure to Quantify the Difference across Distributed Computing Systems

    Authors: Ali Mokhtari, Saeid Ghafouri, Pooyan Jamshidi, Mohsen Amini Salehi

    Abstract: Although system heterogeneity has been extensively studied in the past, there is yet to be a study on measuring the impact of heterogeneity on system performance. For this purpose, we propose a heterogeneity measure that can characterize the impact of the heterogeneity of a system on its performance behavior in terms of throughput or makespan. We develop a mathematical model to characterize a hete…

    Submitted 5 December, 2023; originally announced December 2023.

  8. arXiv:2308.07536

    math.OC cs.LG stat.ML

    Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem

    Authors: Jincheng Cao, Ruichen Jiang, Nazanin Abolfazli, Erfan Yazdandoost Hamedani, Aryan Mokhtari

    Abstract: In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic convex optimization problem. We introduce novel stochastic bilevel optimization methods that locally approximate the solution set of the lower-level problem via…

    Submitted 14 August, 2023; originally announced August 2023.

  9. arXiv:2307.06887

    cs.LG

    Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

    Authors: Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical…

    Submitted 6 June, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

  10. arXiv:2306.15444

    math.OC cs.LG stat.ML

    Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

    Authors: Zhan Gao, Aryan Mokhtari, Alec Koppel

    Abstract: Non-asymptotic convergence analysis of quasi-Newton methods has gained attention with a landmark result establishing an explicit local superlinear rate of $O((1/\sqrt{t})^t)$. The methods that obtain this rate, however, exhibit a well-known drawback: they require the storage of the previous Hessian approximation matrix or all past curvature information to form the current Hessian inverse approxima…

    Submitted 18 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  11. arXiv:2306.02212

    math.OC cs.LG stat.ML

    Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization

    Authors: Ruichen Jiang, Aryan Mokhtari

    Abstract: In this paper, we propose an accelerated quasi-Newton proximal extragradient (A-QNPE) method for solving unconstrained smooth convex optimization problems. With access only to the gradients of the objective, we prove that our method can achieve a convergence rate of $O\bigl(\min\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of i…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: 44 pages, 1 figure

  12. arXiv:2303.11453

    cs.LG stat.ML

    Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing

    Authors: Nived Rajaraman, Devvrit, Aryan Mokhtari, Kannan Ramchandran

    Abstract: Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. In fact, several practical studies have shown that if a pruned model is fine-tuned with some gradient-based updates it generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the…

    Submitted 4 June, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 49 pages, 2 figures

  13. arXiv:2303.10901

    cs.DC cs.AR cs.OS

    E2C: A Visual Simulator to Reinforce Education of Heterogeneous Computing Systems

    Authors: Ali Mokhtari, Drake Rawls, Tony Huynh, Jeremiah Green, Mohsen Amini Salehi

    Abstract: With the increasing popularity of accelerator technologies (e.g., GPUs and TPUs) and the emergence of domain-specific computing via ASICs and FPGAs, the matter of heterogeneity and understanding its ramifications for performance has become more critical than ever before. However, it is challenging to effectively educate students about the potential impacts of heterogeneity on the performance of…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted to EduPar '23, part of the IPDPS '23 conference. arXiv admin note: text overlap with arXiv:2212.11333

  14. arXiv:2302.08580

    math.OC cs.LG stat.ML

    Online Learning Guided Curvature Approximation: A Quasi-Newton Method with Global Non-Asymptotic Superlinear Convergence

    Authors: Ruichen Jiang, Qiujiang Jin, Aryan Mokhtari

    Abstract: Quasi-Newton algorithms are among the most popular iterative methods for solving unconstrained minimization problems, largely due to their favorable superlinear convergence property. However, existing results for these algorithms are limited as they provide either (i) a global convergence guarantee with an asymptotic superlinear convergence rate, or (ii) a local non-asymptotic superlinear rate for…

    Submitted 25 July, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: 33 pages, 1 figure, accepted to COLT 2023

  15. arXiv:2302.07920

    cs.LG

    InfoNCE Loss Provably Learns Cluster-Preserving Representations

    Authors: Advait Parulekar, Liam Collins, Karthikeyan Shanmugam, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: The goal of contrastive learning is to learn a representation that preserves underlying clusters by keeping samples with similar content, e.g. the "dogness" of a dog, close to each other in the space generated by the representation. A common and successful approach for tackling this unsupervised learning problem is minimizing the InfoNCE loss associated with the training samples, where each samp…

    Submitted 15 February, 2023; originally announced February 2023.
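    A minimal sketch of the InfoNCE loss for a single anchor: the loss is the negative log of the softmax score that the positive pair receives among the positive and the negatives. Cosine similarity and the temperature `tau = 0.1` are common choices assumed here for illustration; the paper's exact parameterization may differ.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], tau: float = 0.1) -> float:
    """InfoNCE for one anchor: -log softmax(logits)[0], with logits = sim / tau.

    The first logit belongs to the positive pair, the rest to the negatives.
    """
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)
    # log-sum-exp, shifted by the max logit for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
aligned = info_nce(anchor, [0.9, 0.1], negatives)     # positive close to the anchor
misaligned = info_nce(anchor, [0.0, 1.0], negatives)  # positive orthogonal to the anchor
print(aligned, misaligned)
```

    The loss is always non-negative and is small only when the positive is much more similar to the anchor than every negative, which is the pressure that pulls same-content samples together in the representation space.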

  16. arXiv:2301.04430

    cs.LG cs.NI math.PR stat.ML

    Network Adaptive Federated Learning: Congestion and Lossy Compression

    Authors: Parikshit Hegde, Gustavo de Veciana, Aryan Mokhtari

    Abstract: In order to achieve the dual goals of privacy and learning across distributed data, Federated Learning (FL) systems rely on frequent exchanges of large files (model updates) between a set of clients and the server. As such, FL systems are exposed to, or indeed the cause of, congestion across a wide set of network resources. Lossy compression can be used to reduce the size of exchanged files and ass…

    Submitted 11 January, 2023; originally announced January 2023.

  17. arXiv:2212.11333

    cs.DC cs.OS

    E2C: A Visual Simulator for Heterogeneous Computing Systems

    Authors: Ali Mokhtari, Mohsen Amini Salehi

    Abstract: Heterogeneity has been an indispensable aspect of distributed computing throughout the history of these systems. In particular, with the increasing prevalence of accelerator technologies (e.g., GPUs and TPUs) and the emergence of domain-specific computing via ASICs and FPGAs, the matter of heterogeneity and harnessing it has become a more critical challenge than ever before. Harnessing system heter…

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: https://hpcclab.github.io/E2C-Sim-docs/

    Journal ref: Tutorial at 15th ACM/IEEE Utility Cloud Computing (UCC '22) conference, Vancouver, Washington, USA, Dec. 2022

  18. arXiv:2211.07130

    cs.DC cs.AI cs.LG

    Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge

    Authors: SM Zobaed, Ali Mokhtari, Jaya Prakash Champati, Mathieu Kourouma, Mohsen Amini Salehi

    Abstract: Smart IoT-based systems often require continuous execution of multiple latency-sensitive Deep Learning (DL) applications. The edge servers serve as the cornerstone of such IoT-based systems; however, their resource limitations hamper the continuous execution of multiple (multi-tenant) DL applications. The challenge is that DL applications function based on bulky "neural network (NN) models" that c…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted in Utility Cloud Computing Conference 2022

  19. arXiv:2209.01143

    cs.LG cs.AI

    Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems

    Authors: Mao Ye, Ruichen Jiang, Haoxiang Wang, Dhruv Choudhary, Xiaocong Du, Bhargav Bhushanam, Aryan Mokhtari, Arun Kejariwal, Qiang Liu

    Abstract: One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes a mismatch between the training and testing data distributions and hence domain generalization error. To overcome this, we propose to learn a meta future gradient generator that forecasts the gradient information of the future data distribution for training so that the recommendation model c…

    Submitted 2 September, 2022; originally announced September 2022.

  20. arXiv:2206.08868

    math.OC cs.LG stat.ML

    A Conditional Gradient-based Method for Simple Bilevel Optimization with Convex Lower-level Problem

    Authors: Ruichen Jiang, Nazanin Abolfazli, Aryan Mokhtari, Erfan Yazdandoost Hamedani

    Abstract: In this paper, we study a class of bilevel optimization problems, also known as simple bilevel optimization, where we minimize a smooth objective function over the optimal solution set of another convex constrained optimization problem. Several iterative methods have been developed for tackling this class of problems. Alas, their convergence guarantees are either asymptotic for the upper-level obj…

    Submitted 23 April, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted to AISTATS 2023

  21. arXiv:2206.02078

    cs.LG cs.DC

    Straggler-Resilient Personalized Federated Learning

    Authors: Isidoros Tziotis, Zebang Shen, Ramtin Pedarsani, Hamed Hassani, Aryan Mokhtari

    Abstract: Federated Learning is an emerging learning paradigm that allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions. Despite its success, federated learning faces several challenges related to its decentralized nature. In this work, we develop a novel algorithmic procedure with theoretical speedup guarantees that simult…

    Submitted 4 June, 2022; originally announced June 2022.

  22. arXiv:2206.00065

    cs.DC cs.LG cs.PF

    FELARE: Fair Scheduling of Machine Learning Tasks on Heterogeneous Edge Systems

    Authors: Ali Mokhtari, Md Abir Hossen, Pooyan Jamshidi, Mohsen Amini Salehi

    Abstract: Edge computing enables smart IoT-based systems via concurrent and continuous execution of latency-sensitive machine learning (ML) applications. These edge-based machine learning systems are often battery-powered (i.e., energy-limited). They use heterogeneous resources with diverse computing performance (e.g., CPU, GPU, and/or FPGAs) to fulfill the latency constraints of ML applications. The challe…

    Submitted 20 July, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

  23. arXiv:2205.13692

    cs.LG

    FedAvg with Fine Tuning: Local Updates Lead to Representation Learning

    Authors: Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: The Federated Averaging (FedAvg) algorithm, which consists of alternating between a few local stochastic gradient updates at client nodes, followed by a model averaging update at the server, is perhaps the most commonly used method in Federated Learning. Notwithstanding its simplicity, several empirical studies have illustrated that the output model of FedAvg, after a few fine-tuning steps, leads…

    Submitted 26 May, 2022; originally announced May 2022.
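    The FedAvg loop described in the abstract (a few local gradient steps at each client, then model averaging at the server, optionally followed by fine-tuning) can be sketched as below. This is a minimal illustration under stated assumptions: a hypothetical 1-D least-squares model, full-batch local gradients instead of stochastic ones, and equal-weight averaging.

```python
from typing import List, Tuple

# Each client holds (feature, label) pairs for a hypothetical 1-D model y ~ w * x.
Dataset = List[Tuple[float, float]]


def local_updates(w: float, data: Dataset, lr: float, steps: int) -> float:
    """A few local gradient steps on the client loss mean(0.5 * (w * x - y) ** 2).

    Full-batch gradients keep the sketch deterministic; FedAvg normally uses
    stochastic minibatch gradients at this point.
    """
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg(clients: List[Dataset], w0: float = 0.0, rounds: int = 50,
           local_steps: int = 5, lr: float = 0.1) -> float:
    """Alternate local updates at every client with model averaging at the server."""
    w = w0
    for _ in range(rounds):
        local_models = [local_updates(w, data, lr, local_steps) for data in clients]
        # Equal-weight average; weighting by client dataset size is also common.
        w = sum(local_models) / len(local_models)
    return w


clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # both clients follow y = 2x
w_global = fedavg(clients)
# Fine-tuning: a few more local steps on one client's data, starting from the global model.
w_tuned = local_updates(w_global, clients[0], lr=0.1, steps=3)
print(w_global, w_tuned)
```

    Because both toy clients share the same underlying relation, the averaged model and the fine-tuned model coincide here; with heterogeneous clients the fine-tuning step would move each client toward its own data.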

  24. arXiv:2202.09674

    math.OC cs.LG stat.ML

    Generalized Optimistic Methods for Convex-Concave Saddle Point Problems

    Authors: Ruichen Jiang, Aryan Mokhtari

    Abstract: The optimistic gradient method has seen increasing popularity for solving convex-concave saddle point problems. To analyze its iteration complexity, a recent work [arXiv:1906.01115] proposed an interesting perspective that interprets this method as an approximation to the proximal point method. In this paper, we follow this approach and distill the underlying idea of optimism to propose a generali…

    Submitted 10 January, 2024; v1 submitted 19 February, 2022; originally announced February 2022.

    Comments: 60 pages, 3 figures; simplified and improved the line search scheme. Due to the character limit, the abstract appearing here is slightly shorter than that in the PDF file

    MSC Class: 90C25; 90C33; 90C47
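    The classic optimistic gradient method that the abstract starts from can be sketched on the bilinear saddle problem min_x max_y x*y, where plain simultaneous gradient descent-ascent spirals outward. This is the standard optimistic update, not the paper's generalized method; the test problem, step size `eta`, and the initialization F(z_{-1}) = F(z_0) are assumptions chosen for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def operator(z: Point) -> Point:
    """F(x, y) = (df/dx, -df/dy) for the bilinear saddle function f(x, y) = x * y."""
    x, y = z
    return (y, -x)


def gda(z0: Point, eta: float, iters: int) -> Point:
    """Simultaneous gradient descent-ascent: z_{k+1} = z_k - eta * F(z_k)."""
    z = z0
    for _ in range(iters):
        g = operator(z)
        z = (z[0] - eta * g[0], z[1] - eta * g[1])
    return z


def optimistic_gda(z0: Point, eta: float, iters: int) -> Point:
    """Optimistic gradient: z_{k+1} = z_k - eta * (2 F(z_k) - F(z_{k-1})).

    The extra term F(z_k) - F(z_{k-1}) extrapolates the operator; this is the
    update that can be viewed as an approximation of the proximal point step.
    """
    z = z0
    g_prev = operator(z0)  # convention: F(z_{-1}) = F(z_0)
    for _ in range(iters):
        g = operator(z)
        z = (z[0] - eta * (2 * g[0] - g_prev[0]),
             z[1] - eta * (2 * g[1] - g_prev[1]))
        g_prev = g
    return z


z0 = (1.0, 1.0)  # the unique saddle point is (0, 0)
print(math.hypot(*gda(z0, eta=0.1, iters=3000)))             # distance to (0, 0) grows
print(math.hypot(*optimistic_gda(z0, eta=0.1, iters=3000)))  # distance to (0, 0) shrinks
```

    On this problem each GDA step multiplies the distance to the saddle point by sqrt(1 + eta^2) > 1, whereas the optimistic correction turns the iteration into a contraction for small enough `eta`.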

  25. arXiv:2202.09398

    cs.MA cs.IT

    Provably Private Distributed Averaging Consensus: An Information-Theoretic Approach

    Authors: Mohammad Fereydounian, Aryan Mokhtari, Ramtin Pedarsani, Hamed Hassani

    Abstract: In this work, we focus on solving a decentralized consensus problem in a private manner. Specifically, we consider a setting in which a group of nodes, connected through a network, aim to compute the mean of their local values without revealing those values to each other. The distributed consensus problem is a classic problem that has been extensively studied and its convergence characteristics…

    Submitted 18 February, 2022; originally announced February 2022.

    Comments: 31 pages
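    For reference, the classic (non-private) distributed averaging iteration that the abstract builds on can be sketched as repeated local mixing x <- W x. The Metropolis-Hastings weights and the 4-node ring are illustrative assumptions; this is not the private protocol proposed in the paper.

```python
from typing import Dict, List


def consensus_average(values: List[float], neighbors: Dict[int, List[int]],
                      iters: int = 100) -> List[float]:
    """Classic distributed averaging: x <- W x with Metropolis-Hastings weights.

    W_ij = 1 / (1 + max(deg_i, deg_j)) for neighbors j of i, and W_ii = 1 - sum_j W_ij.
    W is symmetric and doubly stochastic, so the network average is preserved at
    every step and, on a connected graph, every node converges to it.
    """
    n = len(values)
    deg = [len(neighbors[i]) for i in range(n)]
    x = list(values)
    for _ in range(iters):
        new_x = []
        for i in range(n):
            xi = x[i]
            for j in neighbors[i]:
                # Each node only uses values received from its direct neighbors.
                xi += (x[j] - x[i]) / (1 + max(deg[i], deg[j]))
            new_x.append(xi)
        x = new_x
    return x


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # 4 nodes on a cycle
print(consensus_average([1.0, 2.0, 3.0, 10.0], ring))  # every entry approaches the mean, 4.0
```

    Note that in the first round every node sends its raw local value to its neighbors, which is exactly the kind of leakage a private averaging protocol has to avoid.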

  26. arXiv:2202.05791

    stat.ML cs.LG math.OC

    The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

    Authors: Matthew Faw, Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari, Sanjay Shakkottai, Rachel Ward

    Abstract: We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (SGD), where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non-adaptive methods in this setting. Specifically, all prior works rely on some subset of the following a…

    Submitted 25 July, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: Accepted to COLT 2022
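    A minimal sketch of the AdaGrad-Norm update the abstract studies: one scalar step size, shrunk by the running sum of squared gradient norms, so no smoothness constant or noise level is needed to set it. The toy objective, the initial `b0`, and the noise model (whose magnitude grows with the iterate, loosely in the spirit of affine variance) are assumptions for illustration only.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def adagrad_norm(grad: Callable[[Vector], Vector], x0: Vector, eta: float = 1.0,
                 b0: float = 0.1, steps: int = 500) -> Vector:
    """AdaGrad-Norm: b_{t+1}^2 = b_t^2 + ||g_t||^2, x_{t+1} = x_t - (eta / b_{t+1}) g_t.

    g_t is a (possibly stochastic) gradient at x_t; all coordinates share the
    single scalar step size eta / b_{t+1}.
    """
    x = list(x0)
    b_sq = b0 * b0
    for _ in range(steps):
        g = grad(x)
        b_sq += sum(gi * gi for gi in g)  # accumulate squared gradient norms
        step = eta / math.sqrt(b_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def exact_grad(x: Vector) -> Vector:
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2."""
    return list(x)


rng = random.Random(0)


def noisy_grad(x: Vector) -> Vector:
    """Hypothetical noisy oracle whose noise scale grows with |x_i|."""
    return [xi + 0.1 * (1 + abs(xi)) * rng.gauss(0.0, 1.0) for xi in x]


print(adagrad_norm(exact_grad, [3.0, -4.0]))
print(adagrad_norm(noisy_grad, [3.0, -4.0], steps=2000))
```

    With the exact gradient the iterates converge to the minimizer at the origin; with the noisy oracle the accumulated `b_sq` keeps growing, so the step size decays and the iterates settle into a shrinking neighborhood of it.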

  27. arXiv:2202.03483

    cs.LG

    MAML and ANIL Provably Learn Representations

    Authors: Liam Collins, Aryan Mokhtari, Sewoong Oh, Sanjay Shakkottai

    Abstract: Recent empirical evidence has driven conventional wisdom to believe that gradient-based meta-learning (GBML) methods perform well at few-shot learning because they learn an expressive data representation that is shared across tasks. However, the mechanics of GBML have remained largely mysterious from a theoretical perspective. In this paper, we prove that two well-known GBML methods, MAML and ANIL…

    Submitted 4 June, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

  28. arXiv:2111.01262

    math.OC cs.DS cs.LG eess.SY stat.ML

    Minimax Optimization: The Case of Convex-Submodular

    Authors: Arman Adibi, Aryan Mokhtari, Hamed Hassani

    Abstract: Minimax optimization has been central in addressing various applications in machine learning, game theory, and control theory. Prior literature has thus far mainly focused on studying such problems in the continuous domain, e.g., convex-concave minimax optimization is now understood to a significant extent. Nevertheless, minimax problems extend far beyond the continuous domain to mixed continuous-…

    Submitted 1 November, 2021; originally announced November 2021.

  29. arXiv:2106.05445

    math.OC cs.LG

    Exploiting Local Convergence of Quasi-Newton Methods Globally: Adaptive Sample Size Approach

    Authors: Qiujiang Jin, Aryan Mokhtari

    Abstract: In this paper, we study the application of quasi-Newton methods for solving empirical risk minimization (ERM) problems defined over a large dataset. Traditional deterministic and stochastic quasi-Newton methods can be executed to solve such problems; however, it is known that their global convergence rate may not be better than first-order methods, and their local superlinear convergence only appe…

    Submitted 26 October, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

  30. arXiv:2102.07078

    cs.LG math.OC

    Exploiting Shared Representations for Personalized Federated Learning

    Authors: Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: Deep neural networks have shown the ability to extract universal feature representations from data such as images and text that have been useful for a variety of learning tasks. However, the fruits of representation learning have yet to be fully realized in federated settings. Although data in federated settings is often non-i.i.d. across clients, the success of centralized deep learning suggests…

    Submitted 24 March, 2023; v1 submitted 14 February, 2021; originally announced February 2021.

  31. arXiv:2102.03832

    cs.LG math.OC stat.ML

    Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

    Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

    Abstract: In this paper, we study the generalization properties of Model-Agnostic Meta-Learning (MAML) algorithms for supervised learning problems. We focus on the setting in which we train the MAML model over $m$ tasks, each with $n$ data points, and characterize its generalization error from two points of view: First, we assume the new task at test time is one of the training tasks, and we show that, for…

    Submitted 16 November, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  32. arXiv:2012.14453

    cs.LG cs.DC stat.ML

    Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity

    Authors: Amirhossein Reisizadeh, Isidoros Tziotis, Hamed Hassani, Aryan Mokhtari, Ramtin Pedarsani

    Abstract: Federated Learning is a novel paradigm that involves learning from data samples distributed across a large network of clients while the data remains local. It is, however, known that federated learning is prone to multiple system challenges including system heterogeneity where clients have different computation and communication capabilities. Such heterogeneity in clients' computation speeds has a…

    Submitted 28 December, 2020; originally announced December 2020.

  33. arXiv:2010.14672

    cs.LG math.OC stat.ML

    How Does the Task Landscape Affect MAML Performance?

    Authors: Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: Model-Agnostic Meta-Learning (MAML) has become increasingly popular for training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize compared to standard non-adaptive learning (NAL), and little is understood about how much MAML improves over NAL in terms of the fast adaptability of thei…

    Submitted 9 August, 2022; v1 submitted 27 October, 2020; originally announced October 2020.
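    To make the MAML versus NAL comparison concrete, here is a minimal sketch on hypothetical 1-D quadratic tasks f_i(w) = 0.5 * a_i * (w - c_i)^2 with one inner gradient step of size `alpha`. The tasks, step sizes, and plain gradient descent on the outer objective are illustrative assumptions; the point is only that the two objectives have different minimizers.

```python
from typing import Callable, List, Tuple

# Hypothetical 1-D tasks f_i(w) = 0.5 * a_i * (w - c_i) ** 2, given as (a_i, c_i).
Task = Tuple[float, float]


def maml_gradient(w: float, tasks: List[Task], alpha: float) -> float:
    """Exact gradient of F(w) = mean_i f_i(w - alpha * f_i'(w)).

    With w_i' = w - alpha * a_i * (w - c_i), the chain rule gives
    dF/dw = mean_i f_i'(w_i') * (1 - alpha * a_i) = mean_i a_i (1 - alpha a_i)^2 (w - c_i).
    """
    total = 0.0
    for a, c in tasks:
        adapted = w - alpha * a * (w - c)  # one inner gradient step on task i
        total += a * (adapted - c) * (1 - alpha * a)
    return total / len(tasks)


def nal_gradient(w: float, tasks: List[Task]) -> float:
    """Gradient of the non-adaptive objective mean_i f_i(w)."""
    return sum(a * (w - c) for a, c in tasks) / len(tasks)


def minimize(grad: Callable[[float], float], w0: float = 0.0,
             lr: float = 0.2, steps: int = 500) -> float:
    """Plain gradient descent on a scalar objective."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


tasks = [(1.0, 0.0), (4.0, 1.0)]
alpha = 0.1
w_maml = minimize(lambda w: maml_gradient(w, tasks, alpha))
w_nal = minimize(lambda w: nal_gradient(w, tasks))
print(w_maml, w_nal)
```

    NAL weights each task's target by its curvature a_i, while MAML weights it by a_i (1 - alpha a_i)^2, so high-curvature tasks, which adapt quickly after one step, count for less in the MAML solution.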

  34. arXiv:2007.05852

    cs.LG cs.DS math.OC stat.ML

    Submodular Meta-Learning

    Authors: Arman Adibi, Aryan Mokhtari, Hamed Hassani

    Abstract: In this paper, we introduce a discrete variant of the meta-learning framework. Meta-learning aims at exploiting prior experience and data to improve performance on future tasks. By now, there exist numerous formulations for meta-learning in the continuous domain. Notably, the Model-Agnostic Meta-Learning (MAML) formulation views each task as a continuous optimization problem and based on prior dat…

    Submitted 9 January, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

  35. arXiv:2007.01154

    cs.LG cs.DC stat.ML

    Federated Learning with Compression: Unified Analysis and Sharp Guarantees

    Authors: Farzin Haddadpour, Mohammad Mahdi Kamani, Aryan Mokhtari, Mehrdad Mahdavi

    Abstract: In federated learning, communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions. Two notable trends to deal with the communication overhead of federated algorithms are gradient compression and local computation…

    Submitted 20 November, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

    Comments: Version 2: more experiments and comparisons

  36. arXiv:2006.13326

    cs.LG cs.AI math.OC stat.ML

    Safe Learning under Uncertain Objectives and Constraints

    Authors: Mohammad Fereydounian, Zebang Shen, Aryan Mokhtari, Amin Karbasi, Hamed Hassani

    Abstract: In this paper, we consider non-convex optimization problems under unknown yet safety-critical constraints. Such problems naturally arise in a variety of domains including robotics, manufacturing, and medical procedures, where it is infeasible to know or identify all the constraints. Therefore, the parameter space should be explored in a conservative way to ensure that none of the constrai…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 42 pages, 2 figures

  37. arXiv:2006.04101

    cs.LG stat.ML

    Hybrid Model for Anomaly Detection on Call Detail Records by Time Series Forecasting

    Authors: Aryan Mokhtari, Leyla Sadighi, Behnam Bahrak, Mojtaba Eshghie

    Abstract: Mobile network operators store an enormous amount of information like log files that describe various events and users' activities. Analysis of these logs might be used in many critical applications such as detecting cyber-attacks, finding behavioral patterns of users, security incident response, network forensics, etc. In a cellular network, Call Detail Records (CDR) are one such type of log conta…

    Submitted 19 October, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: The authors have changed and I am no longer one of the authors of this manuscript

  38. arXiv:2005.11050

    cs.DC cs.OS cs.PF

    Autonomous Task Dropping Mechanism to Achieve Robustness in Heterogeneous Computing Systems

    Authors: Ali Mokhtari, Chavit Denninnart, Mohsen Amini Salehi

    Abstract: Robustness of a distributed computing system is defined as the ability to maintain its performance in the presence of uncertain parameters. Uncertainty is a key problem in heterogeneous (and even homogeneous) distributed computing systems that perturbs system robustness. Notably, the performance of these systems is perturbed by uncertainty in both task execution time and arrival. Accordingly, our…

    Submitted 22 May, 2020; originally announced May 2020.

    Journal ref: 29th Heterogeneity in Computing Workshop (HCW 2019), in Proceedings of the IPDPS 2019 Workshops & PhD Forum (IPDPSW)

  39. arXiv:2003.13607

    math.OC cs.LG

    Non-asymptotic Superlinear Convergence of Standard Quasi-Newton Methods

    Authors: Qiujiang Jin, Aryan Mokhtari

    Abstract: In this paper, we study and prove the non-asymptotic superlinear convergence rate of the Broyden class of quasi-Newton algorithms which includes the Davidon–Fletcher–Powell (DFP) method and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method. The asymptotic superlinear convergence rate of these quasi-Newton methods has been extensively studied in the literature, but their explicit finite-time…

    Submitted 30 November, 2021; v1 submitted 30 March, 2020; originally announced March 2020.
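    For readers less familiar with the Broyden class, below is a minimal sketch of the standard BFGS update of the inverse-Hessian approximation, run on a small strongly convex quadratic. The identity initialization and the exact line search (available in closed form for a quadratic) are simplifying choices for this sketch and are not taken from the paper's analysis.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bfgs_inverse_update(h: Matrix, s: Vector, y: Vector) -> Matrix:
    """BFGS update of the inverse-Hessian approximation H.

    H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T,  rho = 1 / (y^T s).
    """
    n = len(s)
    rho = 1.0 / dot(y, s)
    a = [[(1.0 if i == j else 0.0) - rho * s[i] * y[j] for j in range(n)] for i in range(n)]
    ah = [[sum(a[i][k] * h[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    aha_t = [[sum(ah[i][k] * a[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[aha_t[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]


def bfgs_quadratic(q: Matrix, b: Vector, x0: Vector, iters: int = 20,
                   tol: float = 1e-12) -> Vector:
    """Minimize 0.5 x^T Q x - b^T x (Q symmetric positive definite) with BFGS."""
    n = len(x0)
    x = list(x0)
    h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # H_0 = I
    g = [qx - bi for qx, bi in zip(matvec(q, x), b)]
    for _ in range(iters):
        if dot(g, g) ** 0.5 < tol:
            break
        d = [-v for v in matvec(h, g)]  # quasi-Newton direction -H g
        # Exact line search, available in closed form because f is quadratic.
        t = -dot(g, d) / dot(d, matvec(q, d))
        s = [t * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [qx - bi for qx, bi in zip(matvec(q, x), b)]
        y = [gn - go for gn, go in zip(g_new, g)]  # gradient difference
        h = bfgs_inverse_update(h, s, y)
        g = g_new
    return x


Q = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
print(bfgs_quadratic(Q, b, [0.0, 0.0]))  # the exact minimizer is Q^{-1} b = [0.2, 0.4]
```

    Only gradient differences `y` and steps `s` enter the update, which is what makes the method quasi-Newton; the DFP method is the other named member of the Broyden class and uses a different rank-two formula.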

  40. arXiv:2002.09964

    cs.DC cs.LG cs.MA eess.SP eess.SY

    Quantized Decentralized Stochastic Learning over Directed Graphs

    Authors: Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

    Abstract: We consider a decentralized stochastic learning problem where data points are distributed among computing nodes communicating over a directed graph. As the model size gets large, decentralized learning faces a major bottleneck that is the heavy communication load due to each node transmitting large messages (model updates) to its neighbors. To tackle this bottleneck, we propose the quantized decen…

    Submitted 28 December, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

  41. arXiv:2002.07948

    cs.LG math.OC stat.ML

    Personalized Federated Learning: A Meta-Learning Approach

    Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

    Abstract: In Federated Learning, we aim to train models across multiple computing units (users), while users can only communicate with a common central server, without exchanging their data samples. This mechanism exploits the computational power of all users and allows users to obtain a richer model as their models are trained over a larger set of data points. However, this scheme only develops a common ou…

    Submitted 22 October, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: To appear in 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

  42. arXiv:2002.05135

    cs.LG math.OC stat.ML

    On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

    Authors: Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar

    Abstract: We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy using data from several tasks represented by Markov Decision Processes (MDPs) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computati…

    Submitted 16 November, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  43. arXiv:2002.04766  [pdf, other]

    cs.LG math.OC stat.ML

    Task-Robust Model-Agnostic Meta-Learning

    Authors: Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: Meta-learning methods have shown an impressive ability to train models that rapidly learn new tasks. However, these methods only aim to perform well in expectation over tasks coming from some particular distribution that is typically equivalent across meta-training and meta-testing, rather than considering worst-case task performance. In this work we introduce the notion of "task-robustness" by re…

    Submitted 18 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

  44. arXiv:1910.14380  [pdf, other]

    math.OC cs.LG stat.ML

    A Decentralized Proximal Point-type Method for Saddle Point Problems

    Authors: Weijie Liu, Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil, Zebang Shen, Nenggan Zheng

    Abstract: In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point metho…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: 18 pages

  45. arXiv:1910.04322  [pdf, other]

    math.OC cs.LG stat.ML

    One Sample Stochastic Frank-Wolfe

    Authors: Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

    Abstract: One of the beauties of the projected gradient descent method lies in its rather simple mechanism and yet stable behavior with inexact, stochastic gradients, which has led to its widespread use in many machine learning applications. However, once we replace the projection operator with a simpler linear program, as is done in the Frank-Wolfe method, both simplicity and stability take a serious hit.…

    Submitted 9 October, 2019; originally announced October 2019.

  46. arXiv:1909.13014  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization

    Authors: Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ali Jadbabaie, Ramtin Pedarsani

    Abstract: Federated learning is a distributed framework according to which a model is trained over a set of devices, while keeping data localized. This framework faces several systems-oriented challenges which include (i) communication bottleneck since a large number of devices upload their local updates to a parameter server, and (ii) scalability as the federated network consists of millions of devices. Du…

    Submitted 7 June, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

  47. arXiv:1908.10400  [pdf, other]

    cs.LG math.OC stat.ML

    On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms

    Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

    Abstract: We study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall complexity as well as their best achievable accuracy in terms of gradient norm for nonconvex loss functions. We start with the MAML method and its first-order approximation (FO-MAML) and highlight the challenges that emerge in their analysis. By overcoming these challeng…

    Submitted 15 May, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

    Comments: To appear in the proceedings of the $23^{rd}$ International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

  48. arXiv:1907.10595  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Robust and Communication-Efficient Collaborative Learning

    Authors: Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

    Abstract: We consider a decentralized learning problem, where a set of computing nodes aim at solving a non-convex optimization problem collaboratively. It is well-known that decentralized optimization schemes face two major system bottlenecks: stragglers' delay and communication overhead. In this paper, we tackle these bottlenecks by proposing a novel decentralized and gradient-based optimization algorithm…

    Submitted 31 October, 2019; v1 submitted 24 July, 2019; originally announced July 2019.

  49. arXiv:1906.01115  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Convergence Rate of $\mathcal{O}(1/k)$ for Optimistic Gradient and Extra-gradient Methods in Smooth Convex-Concave Saddle Point Problems

    Authors: Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil

    Abstract: We study the iteration complexity of the optimistic gradient descent-ascent (OGDA) method and the extra-gradient (EG) method for finding a saddle point of a convex-concave unconstrained min-max problem. To do so, we first show that both OGDA and EG can be interpreted as approximate variants of the proximal point method. This is similar to the approach taken in [Nemirovski, 2004] which analyzes EG…

    Submitted 29 September, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: 19 pages

  50. arXiv:1902.06992  [pdf, other]

    math.OC cs.LG

    Stochastic Conditional Gradient++

    Authors: Hamed Hassani, Amin Karbasi, Aryan Mokhtari, Zebang Shen

    Abstract: In this paper, we consider the general non-oblivious stochastic optimization where the underlying stochasticity may change during the optimization procedure and depends on the point at which the function is evaluated. We develop Stochastic Frank-Wolfe++ ($\text{SFW}{++}$), an efficient variant of the conditional gradient method for minimizing a smooth non-convex function subject to a convex body…

    Submitted 8 September, 2020; v1 submitted 19 February, 2019; originally announced February 2019.