Showing 1–12 of 12 results for author: Even, M

  1. arXiv:2405.14532  [pdf, other]

    stat.ML cs.LG math.ST

    Aligning Embeddings and Geometric Random Graphs: Informational Results and Computational Approaches for the Procrustes-Wasserstein Problem

    Authors: Mathieu Even, Luca Ganassali, Jakob Maier, Laurent Massoulié

    Abstract: The Procrustes-Wasserstein problem consists in matching two high-dimensional point clouds in an unsupervised setting, and has many applications in natural language processing and computer vision. We consider a planted model with two datasets $X,Y$ that consist of $n$ datapoints in $\mathbb{R}^d$, where $Y$ is a noisy version of $X$, up to an orthogonal transformation and a relabeling of the data p…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 28 pages, 1 figure. Comments are most welcome!

  2. arXiv:2311.00465  [pdf, ps, other]

    math.OC cs.DC cs.LG

    Asynchronous SGD on Graphs: a Unified Framework for Asynchronous Decentralized and Federated Optimization

    Authors: Mathieu Even, Anastasia Koloskova, Laurent Massoulié

    Abstract: Decentralized and asynchronous communications are two popular techniques to speed up communication complexity of distributed machine learning, by respectively removing the dependency over a central orchestrator and the need for synchronization. Yet, combining these two techniques together still remains a challenge. In this paper, we take a step in this direction and introduce Asynchronous SGD on Gr…

    Submitted 1 November, 2023; originally announced November 2023.

  3. arXiv:2307.04679  [pdf, ps, other]

    cs.LG math.OC

    Minimax Excess Risk of First-Order Methods for Statistical Learning with Data-Dependent Oracles

    Authors: Kevin Scaman, Mathieu Even, Batiste Le Bars, Laurent Massoulié

    Abstract: In this paper, our aim is to analyse the generalization capabilities of first-order methods for statistical learning in multiple, different yet related, scenarios including supervised learning, transfer learning, robust learning and federated learning. To do so, we provide sharp upper and lower bounds for the minimax excess risk of strongly convex and smooth statistical learning when the gradient…

    Submitted 1 July, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: 22 pages, 0 figures

  4. arXiv:2302.14428  [pdf, other]

    math.OC cs.LG

    Stochastic Gradient Descent under Markovian Sampling Schemes

    Authors: Mathieu Even

    Abstract: We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme. These schemes encompass applications that range from decentralized optimization with a random walker (token algorithms), to RL and online system identification problems. We focus on obtaining rates of convergence under the least restrictive assumptions possible on the und…

    Submitted 23 June, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

  5. arXiv:2302.08982  [pdf, other]

    cs.LG math.OC stat.ML

    (S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

    Authors: Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion

    Abstract: In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and SGD with macroscopic stepsizes in an overparametrised regression setting and characterise their solutions through an implicit regularisation problem. Our crisp ch…

    Submitted 25 October, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

  6. arXiv:2206.07638  [pdf, other]

    math.OC cs.DC cs.LG

    Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

    Authors: Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

    Abstract: The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead just on the number of parallel devices used to implement the…

    Submitted 20 April, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

  7. arXiv:2201.13097  [pdf, other]

    math.OC

    Sample Optimality and All-for-all Strategies in Personalized Federated and Collaborative Learning

    Authors: Mathieu Even, Laurent Massoulié, Kevin Scaman

    Abstract: In personalized Federated Learning, each member of a potentially large set of agents aims to train a model minimizing its loss function averaged over its local data distribution. We study this problem under the lens of stochastic optimization. Specifically, we introduce information-theoretic lower bounds on the number of samples required from all agents to approximately minimize the generalization…

    Submitted 1 February, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

  8. arXiv:2106.07644  [pdf, other]

    math.OC cs.LG cs.MA math.PR stat.ML

    A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

    Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

    Abstract: We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, o…

    Submitted 27 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035

  9. arXiv:2106.03585  [pdf, other]

    math.OC cs.MA math.PR stat.ML

    Asynchronous speedup in decentralized optimization

    Authors: Mathieu Even, Hadrien Hendrikx, Laurent Massoulié

    Abstract: In decentralized optimization, nodes of a communication network each possess a local objective function, and communicate using gossip-based methods in order to minimize the average of these per-node functions. While synchronous algorithms are heavily impacted by a few slow nodes or edges in the graph (the straggler problem), their asynchronous counterparts are notoriously harder to parametr…

    Submitted 1 September, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

  10. arXiv:2104.09813  [pdf, other]

    math.OC

    Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction

    Authors: Radu-Alexandru Dragomir, Mathieu Even, Hadrien Hendrikx

    Abstract: We study the problem of minimizing a relatively-smooth convex function using stochastic Bregman gradient methods. We first prove the convergence of Bregman Stochastic Gradient Descent (BSGD) to a region that depends on the noise (magnitude of the gradients) at the optimum. In particular, BSGD with a constant step-size converges to the exact minimizer when this noise is zero (interpolation s…

    Submitted 20 April, 2021; originally announced April 2021.

  11. arXiv:2102.04259  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    Concentration of Non-Isotropic Random Tensors with Applications to Learning and Empirical Risk Minimization

    Authors: Mathieu Even, Laurent Massoulié

    Abstract: Dimension is an inherent bottleneck to some modern learning tasks, where optimization methods suffer from the size of the data. In this paper, we study non-isotropic distributions of data and develop tools that aim at reducing these dimensional costs by a dependency on an effective dimension rather than the ambient one. Based on non-asymptotic estimates of the metric entropy of ellipsoids -- that…

    Submitted 27 October, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    MSC Class: 60E15; 60B20; 60E15; 60F10

  12. arXiv:2011.02379  [pdf, other]

    cs.DC cs.MA math.OC

    Asynchrony and Acceleration in Gossip Algorithms

    Authors: Mathieu Even, Hadrien Hendrikx, Laurent Massoulié

    Abstract: This paper considers the minimization of a sum of smooth and strongly convex functions dispatched over the nodes of a communication network. Previous works on the subject either focus on synchronous algorithms, which can be heavily slowed down by a few slow nodes (the straggler problem), or consider a model of asynchronous operation (Boyd et al., 2006) in which adjacent nodes communicate at the in…

    Submitted 7 February, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    MSC Class: 68Q87; 60G55; 90-10