Search | arXiv e-print repository

DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization

Abstract: This work introduces DADAO: the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $μ$-strongly convex functions distributed over a given network of size $n$. Our key insight is based on modeling the local gradient updates and gossip communication procedures with separate independent Poisson Point Processes. This allows us to decoupl… ▽ More This work introduces DADAO: the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $μ$-strongly convex functions distributed over a given network of size $n$. Our key insight is based on modeling the local gradient updates and gossip communication procedures with separate independent Poisson Point Processes. This allows us to decouple the computation and communication steps, which can be run in parallel, while making the whole approach completely asynchronous. This leads to communication acceleration compared to synchronous approaches. Our new method employs primal gradients and does not use a multi-consensus inner loop nor other ad-hoc mechanisms such as Error Feedback, Gradient Tracking, or a Proximal operator. By relating the inverse of the smallest positive eigenvalue of the Laplacian matrix $χ_1$ and the maximal resistance $χ_2\leq χ_1$ of the graph to a sufficient minimal communication rate between the nodes of the network, we show that our algorithm requires $\mathcal{O}(n\sqrt{\frac{L}μ}\log(\frac{1}ε))$ local gradients and only $\mathcal{O}(n\sqrt{χ_1χ_2}\sqrt{\frac{L}μ}\log(\frac{1}ε))$ communications to reach a precision $ε$, up to logarithmic terms. Thus, we simultaneously obtain an accelerated rate for both computations and communications, leading to an improvement over state-of-the-art works, our simulations further validating the strength of our relatively unconstrained method. △ Less

Submitted 6 December, 2023; v1 submitted 26 July, 2022; originally announced August 2022.

Comments: International Conference on Machine Learning, Jul 2023, Honolulu, United States

arXiv:1812.07956 [pdf, other]

On Lazy Training in Differentiable Programming

Authors: Lenaic Chizat, Edouard Oyallon, Francis Bach

Abstract: In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying. In this work, we show that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling, often implicit, t… ▽ More In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying. In this work, we show that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels. Through a theoretical analysis, we exhibit various situations where this phenomenon arises in non-convex optimization and we provide bounds on the distance between the lazy and linearized optimization paths. Our numerical experiments bring a critical note, as we observe that the performance of commonly used non-linear deep convolutional neural networks in computer vision degrades when trained in the lazy regime. This makes it unlikely that "lazy training" is behind the many successes of neural networks in difficult high dimensional tasks. △ Less

Submitted 7 January, 2020; v1 submitted 19 December, 2018; originally announced December 2018.

Journal ref: Advances in Neural Information Processing Systems (NeurIPS), Dec 2019, Vancouver, Canada

arXiv:1806.00370 [pdf, ps, other]

Nonlinear Acceleration of CNNs

Authors: Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

Abstract: The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descend, SAGA or SVRG. Until now, its analysis is limited to convex problems, but empirical observations shows that RNA may be extended to wider settings. In this paper, we investigate further the benefits of RNA when applied to… ▽ More The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descend, SAGA or SVRG. Until now, its analysis is limited to convex problems, but empirical observations shows that RNA may be extended to wider settings. In this paper, we investigate further the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of exiting frameworks, RNA improves slightly the optimization process of CNNs, after training. △ Less

Submitted 1 June, 2018; originally announced June 2018.

arXiv:1805.09639 [pdf, ps, other]

Online Regularized Nonlinear Acceleration

Authors: Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

Abstract: Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method. It can be seen as a regularized version of Anderson acceleration, a classical acceleration scheme from numerical analysis. The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is com… ▽ More Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method. It can be seen as a regularized version of Anderson acceleration, a classical acceleration scheme from numerical analysis. The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is comparable to that of quasi-Newton methods. However, RNA cannot accelerate faster multistep algorithms like Nesterov's method and often diverges in this context. Here, we adapt RNA to overcome these issues, so that our scheme can be used on fast algorithms such as gradient methods with momentum. We show optimal complexity bounds for quadratics and asymptotically optimal rates on general convex minimization problems. Moreover, this new scheme works online, i.e., extrapolated solution estimates can be reinjected at each iteration, significantly improving numerical performance over classical accelerated methods. △ Less

Submitted 21 June, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

Showing 1–4 of 4 results for author: Oyallon, E