Search | arXiv e-print repository

arXiv:2405.20623 [pdf, other]

Prune at the Clients, Not the Server: Accelerated Sparse Training in Federated Learning

Authors: Georg Meinhardt, Kai Yi, Laurent Condat, Peter Richtárik

Abstract: In the recent paradigm of Federated Learning (FL), multiple clients train a shared model while keeping their local data private. Resource constraints of clients and communication costs pose major problems for training large models in FL. On the one hand, addressing the resource limitations of the clients, sparse training has proven to be a powerful tool in the centralized setting. On the other hand, communication costs in FL can be addressed by local training, where each client takes multiple gradient steps on its local data. Recent work has shown that local training can provably achieve the optimal accelerated communication complexity [Mishchenko et al., 2022]. Hence, one would like an accelerated sparse training algorithm. In this work we show that naive integration of sparse training and acceleration at the server fails, and how to fix it by letting the clients perform these tasks appropriately. We introduce Sparse-ProxSkip, our method developed for the nonconvex setting, inspired by RandProx [Condat and Richtárik, 2022], which provably combines sparse training and acceleration in the convex setting. We demonstrate the good performance of Sparse-ProxSkip in extensive experiments.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.19951 [pdf, ps, other]

A Simple Linear Convergence Analysis of the Point-SAGA Algorithm

Authors: Laurent Condat, Peter Richtárik

Abstract: Point-SAGA is a randomized algorithm for minimizing a sum of convex functions using their proximity operators (proxs), proposed by Defazio (2016). At every iteration, the prox of only one randomly chosen function is called. We generalize the algorithm to any number of prox calls per iteration, not only one, and propose a simple proof of linear convergence when the functions are smooth and strongly convex.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.14255 [pdf, other]

Stochastic Proximal Point Methods for Monotone Inclusions under Expected Similarity

Authors: Abdurakhmon Sadiev, Laurent Condat, Peter Richtárik

Abstract: Monotone inclusions have a wide range of applications, including minimization, saddle-point, and equilibria problems. We introduce new stochastic algorithms, with or without variance reduction, to estimate a root of the expectation of possibly set-valued monotone operators, using at every iteration one call to the resolvent of a randomly sampled operator. We also introduce a notion of similarity between the operators, which holds even for discontinuous operators. We leverage it to derive linear convergence results in the strongly monotone setting.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.09904 [pdf, other]

FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models

Authors: Kai Yi, Georg Meinhardt, Laurent Condat, Peter Richtárik

Abstract: Federated Learning (FL) has garnered increasing attention due to its unique characteristic of allowing heterogeneous clients to process their private data locally and interact with a central server, while being respectful of privacy. A critical bottleneck in FL is the communication cost. A pivotal strategy to mitigate this burden is Local Training, which involves running multiple local stochastic gradient descent iterations between communication phases. Our work is inspired by the innovative Scaffnew algorithm, which has considerably advanced the reduction of communication complexity in FL. We introduce FedComLoc (Federated Compressed and Local Training), integrating practical and effective compression into Scaffnew to further enhance communication efficiency. Extensive experiments, using the popular TopK compressor and quantization, demonstrate its prowess in substantially reducing communication overheads in heterogeneous settings.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.04348 [pdf, other]

LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression

Authors: Laurent Condat, Artavazd Maranjyan, Peter Richtárik

Abstract: In Distributed optimization and Learning, and even more in the modern framework of federated learning, communication, which is slow and costly, is critical. We introduce LoCoDL, a communication-efficient algorithm that leverages the two popular and effective techniques of Local training, which reduces the communication frequency, and Compression, in which short bitstreams are sent instead of full-dimensional vectors of floats. LoCoDL works with a large class of unbiased compressors that includes widely-used sparsification and quantization methods. LoCoDL provably benefits from local training and compression and enjoys a doubly-accelerated communication complexity, with respect to the condition number of the functions and the model dimension, in the general heterogeneous regime with strongly convex functions. This is confirmed in practice, with LoCoDL outperforming existing algorithms.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2310.07983 [pdf, other]

Revisiting Decentralized ProxSkip: Achieving Linear Speedup

Authors: Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, Jinde Cao

Abstract: The ProxSkip algorithm for decentralized and federated learning is gaining increasing attention due to its proven benefits in accelerating communication complexity while maintaining robustness against data heterogeneity. However, existing analyses of ProxSkip are limited to the strongly convex setting and do not achieve linear speedup, where convergence performance increases linearly with respect to the number of nodes. So far, questions remain open about how ProxSkip behaves in the non-convex setting and whether linear speedup is achievable. In this paper, we revisit decentralized ProxSkip and address both questions. We demonstrate that the leading communication complexity of ProxSkip is $\mathcal{O}\left(\frac{p\sigma^2}{n\epsilon^2}\right)$ for non-convex and convex settings, and $\mathcal{O}\left(\frac{p\sigma^2}{n\epsilon}\right)$ for the strongly convex setting, where $n$ represents the number of nodes, $p$ denotes the probability of communication, $\sigma^2$ signifies the level of stochastic noise, and $\epsilon$ denotes the desired accuracy level. This result illustrates that ProxSkip achieves linear speedup and can asymptotically reduce communication overhead proportional to the probability of communication. Additionally, for the strongly convex setting, we further prove that ProxSkip can achieve linear speedup with network-independent stepsizes.

Submitted 19 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.
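
To make the role of the communication probability $p$ concrete, here is a minimal sketch of local training with probabilistic communication, in the style of ProxSkip/Scaffnew on a centralized (server-averaging) toy problem. The update rule is an assumption recalled from the general scheme, not taken from this listing, and it is not the decentralized variant analyzed in the paper. The function name `scaffnew_sketch`, the quadratic losses f_i(x) = 0.5*(x - a_i)^2, and all parameter values are hypothetical choices for illustration.

```python
import random


def scaffnew_sketch(targets, gamma=0.5, p=0.5, iters=200, seed=0):
    """Toy local training with probabilistic communication (assumed update rule).

    Client i holds f_i(x) = 0.5 * (x - targets[i])**2, so grad f_i(x) = x - targets[i]
    and the minimizer of the average loss is the mean of the targets.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n  # local models, one scalar per client
    h = [0.0] * n  # control variates; their sum stays zero by construction
    for _ in range(iters):
        # Local gradient step, shifted by the control variate.
        x_hat = [x[i] - gamma * ((x[i] - targets[i]) - h[i]) for i in range(n)]
        if rng.random() < p:
            # Communication round: average the local models, then update
            # each control variate with the correction (p / gamma) * (avg - x_hat_i).
            avg = sum(x_hat) / n
            h = [h[i] + (p / gamma) * (avg - x_hat[i]) for i in range(n)]
            x = [avg] * n
        else:
            # No communication: keep the local iterates.
            x = x_hat
    return x


if __name__ == "__main__":
    print(scaffnew_sketch([1.0, 3.0]))
```

With these toy losses every client should end up close to the mean of the targets, even though roughly half of the iterations skip communication.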

arXiv:2307.09836 [pdf, other]

Near-Linear Time Projection onto the $\ell_{1,\infty}$ Ball; Application to Sparse Autoencoders

Authors: Guillaume Perez, Laurent Condat, Michel Barlaud

Abstract: Looking for sparsity is nowadays crucial to speed up the training of large-scale neural networks. Projections onto the $\ell_{1,2}$ and $\ell_{1,\infty}$ balls are among the most efficient techniques to sparsify and reduce the overall cost of neural networks. In this paper, we introduce a new projection algorithm for the $\ell_{1,\infty}$ norm ball. The worst-case time complexity of this algorithm is $\mathcal{O}\big(nm+J\log(nm)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. $J$ is a term that tends to 0 when the sparsity is high, and to $nm$ when the sparsity is low. Its implementation is easy and it is guaranteed to converge to the exact solution in a finite time. Moreover, we propose to incorporate the $\ell_{1,\infty}$ ball projection while training an autoencoder to enforce feature selection and sparsity of the weights. Sparsification appears in the encoder to primarily do feature selection due to our application in biology, where only a very small part ($<2\%$) of the data is relevant. We show that our method is the fastest, both in the biological case and in the general case of sparsity.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 22 pages, 8 figures

arXiv:2305.13170 [pdf, other]

Explicit Personalization and Local Training: Double Communication Acceleration in Federated Learning

Authors: Kai Yi, Laurent Condat, Peter Richtárik

Abstract: Federated Learning is an evolving machine learning paradigm, in which multiple clients perform computations based on their individual private data, interspersed by communication with a remote server. A common strategy to curtail communication costs is Local Training, which consists in performing multiple local stochastic gradient descent steps between successive communication rounds. However, the conventional approach to local training overlooks the practical necessity for client-specific personalization, a technique to tailor local models to individual needs. We introduce Scafflix, a novel algorithm that efficiently integrates explicit personalization with local training. This innovative approach benefits from these two techniques, thereby achieving doubly accelerated communication, as we demonstrate both in theory and practice.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2302.09832 [pdf, other]

TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation

Authors: Laurent Condat, Ivan Agarský, Grigory Malinovsky, Peter Richtárik

Abstract: In distributed optimization and learning, several machines alternate between local computations in parallel and communication with a distant server. Communication is usually slow and costly and forms the main bottleneck. This is particularly true in federated learning, where a large number of users collaborate toward a global training task. In addition, it is desirable for a robust algorithm to allow for partial participation, since it is often the case that some clients are not able to participate in the entire process and are idle at certain times. Two strategies are popular to reduce the communication burden: 1) local training, which consists in communicating less frequently, or equivalently performing more local computations between the communication rounds; and 2) compression, whereby compressed information instead of full-dimensional vectors is communicated. We propose TAMUNA, the first algorithm for distributed optimization that leverages the two strategies of local training and compression jointly and allows for partial participation. In the strongly convex setting, TAMUNA converges linearly to the exact solution and provably benefits from the two mechanisms: it exhibits a doubly-accelerated convergence rate, with respect to the condition number of the functions and the model dimension.

Submitted 27 April, 2024; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: This work is a follow-up of our previous work introducing CompressedScaffnew in paper arXiv:2210.13277

arXiv:2210.13277 [pdf, other]

Provably Doubly Accelerated Federated Learning: The First Theoretically Successful Combination of Local Training and Communication Compression

Authors: Laurent Condat, Ivan Agarský, Peter Richtárik

Abstract: In federated learning, a large number of users are involved in a global learning task, in a collaborative way. They alternate local computations and two-way communication with a distant orchestrating server. Communication, which can be slow and costly, is the main bottleneck in this setting. To reduce the communication load and therefore accelerate distributed gradient descent, two strategies are popular: 1) communicate less frequently; that is, perform several iterations of local computations between the communication rounds; and 2) communicate compressed information instead of full-dimensional vectors. We propose the first algorithm for distributed optimization and federated learning, which harnesses these two strategies jointly and converges linearly to an exact solution in the strongly convex setting, with a doubly accelerated rate: our algorithm benefits from the two acceleration mechanisms provided by local training and compression, namely a better dependency on the condition number of the functions and on the dimension of the model, respectively.

Submitted 2 February, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2209.01455 [pdf, other]

doi 10.1109/TCI.2023.3261503

Joint demosaicing and fusion of multiresolution coded acquisitions: A unified image formation and reconstruction method

Authors: Daniele Picone, Mauro Dalla Mura, Laurent Condat

Abstract: Novel optical imaging devices allow for hybrid acquisition modalities such as compressed acquisitions with locally different spatial and spectral resolutions captured by a single focal plane array. In this work, we propose to model the capturing system of a multiresolution coded acquisition (MRCA) in a unified framework, which natively includes conventional systems such as those based on spectral/color filter arrays, compressed coded apertures, and multiresolution sensing. We also propose a model-based image reconstruction algorithm performing a joint demosaicing and fusion (JoDeFu) of any acquisition modeled in the MRCA framework. The JoDeFu reconstruction algorithm solves an inverse problem with a proximal splitting technique and is able to reconstruct an uncompressed image datacube at the highest available spatial and spectral resolution. An implementation of the code is available at https://github.com/danaroth83/jodefu.

Submitted 10 April, 2023; v1 submitted 3 September, 2022; originally announced September 2022.

Comments: 15 pages, 7 figures; regular paper

Journal ref: IEEE Transactions on Computational Imaging, Vol. 9 (2023), p. 335-349

arXiv:2207.12891 [pdf, ps, other]

RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates

Authors: Laurent Condat, Peter Richtárik

Abstract: Proximal splitting algorithms are well suited to solving large-scale nonsmooth optimization problems, in particular those arising in machine learning. We propose a new primal-dual algorithm, in which the dual update is randomized; equivalently, the proximity operator of one of the functions in the problem is replaced by a stochastic oracle. For instance, some randomly chosen dual variables, instead of all, are updated at each iteration. Or, the proximity operator of a function is called with some small probability only. A nonsmooth variance-reduction technique is implemented so that the algorithm finds an exact minimizer of the general problem involving smooth and nonsmooth functions, possibly composed with linear operators. We derive linear convergence results in presence of strong convexity; these results are new even in the deterministic case, when our algorithm reverts to the recently proposed Primal-Dual Davis-Yin algorithm. Some randomized algorithms of the literature are also recovered as particular cases (e.g., Point-SAGA). But our randomization technique is general and encompasses many unbiased mechanisms beyond sampling and probabilistic updates, including compression. Since the convergence speed depends on the slowest among the primal and dual contraction mechanisms, the iteration complexity might remain the same when randomness is used. On the other hand, the computation complexity can be significantly reduced. Overall, randomness helps getting faster algorithms. This has long been known for stochastic-gradient-type algorithms, and our work shows that this fully applies in the more general primal-dual setting as well.

Submitted 7 March, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: International Conference on Learning Representations (ICLR) 2023

arXiv:2207.12330 [pdf, ps, other]

Tikhonov Regularization of Sphere-Valued Signals

Authors: Laurent Condat

Abstract: It is common to have to process signals, whose values are points on the 3-D sphere. We consider a Tikhonov-type regularization model to smoothen or interpolate sphere-valued signals defined on arbitrary graphs. We propose a convex relaxation of this nonconvex problem as a semidefinite program, which is easy to solve numerically and is efficient in practice.

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2205.04180 [pdf, other]

EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization

Authors: Laurent Condat, Kai Yi, Peter Richtárik

Abstract: In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck and gradient compression is widely used to reduce the number of bits sent within each communication round of iterative methods. There are two classes of compression operators and separate algorithms making use of them. In the case of unbiased random compressors with bounded variance (e.g., rand-k), the DIANA algorithm of Mishchenko et al. (2019), which implements a variance reduction technique for handling the variance introduced by compression, is the current state of the art. In the case of biased and contractive compressors (e.g., top-k), the EF21 algorithm of Richtárik et al. (2021), which instead implements an error-feedback mechanism, is the current state of the art. These two classes of compression schemes and algorithms are distinct, with different analyses and proof techniques. In this paper, we unify them into a single framework and propose a new algorithm, recovering DIANA and EF21 as particular cases. Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases. This allows us to inherit the best of the two worlds: like EF21 and unlike DIANA, biased compressors, like top-k, whose good performance in practice is recognized, can be used. And like DIANA and unlike EF21, independent randomness at the compressors makes it possible to mitigate the effects of compression, with the convergence rate improving when the number of parallel workers is large. This is the first time that an algorithm with all these features is proposed. We prove its linear convergence under certain conditions. Our approach takes a step towards better understanding of two so-far distinct worlds of communication-efficient distributed learning.

Submitted 6 March, 2023; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: Conference NeurIPS 2022
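
The two compressor families named in the abstract above can be illustrated with a small sketch. The functions `top_k` and `rand_k` below are illustrative helper names, not code from the paper: top-k keeps the k largest-magnitude coordinates (biased, contractive), while rand-k keeps k uniformly chosen coordinates and rescales them by d/k so that the compressed vector equals the input in expectation (unbiased).

```python
import random


def top_k(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest (biased)."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def rand_k(x, k, rng=random):
    """Keep k uniformly chosen entries of x, scaled by d/k (unbiased)."""
    d = len(x)
    keep = set(rng.sample(range(d), k))
    return [v * d / k if i in keep else 0.0 for i, v in enumerate(x)]


if __name__ == "__main__":
    g = [1.0, -5.0, 3.0, 0.5]
    print(top_k(g, 2))                     # deterministic
    print(rand_k(g, 2, random.Random(0)))  # depends on the seed
```

In both cases only k values and their indices would need to be transmitted instead of the full d-dimensional vector; the d/k scaling in `rand_k` is what removes the bias, at the cost of added variance.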

arXiv:2108.02602 [pdf, ps, other]

doi 10.1109/TSP.2022.3179816

Tikhonov Regularization of Circle-Valued Signals

Authors: Laurent Condat

Abstract: It is common to have to process signals or images whose values are cyclic and can be represented as points on the complex circle, like wrapped phases, angles, orientations, or color hues. We consider a Tikhonov-type regularization model to smoothen or interpolate circle-valued signals defined on arbitrary graphs. We propose a convex relaxation of this nonconvex problem as a semidefinite program, and an efficient algorithm to solve it.

Submitted 7 June, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

arXiv:2106.03056 [pdf, ps, other]

MURANA: A Generic Framework for Stochastic Variance-Reduced Optimization

Authors: Laurent Condat, Peter Richtárik

Abstract: We propose a generic variance-reduced algorithm, which we call MUltiple RANdomized Algorithm (MURANA), for minimizing a sum of several smooth functions plus a regularizer, in a sequential or distributed manner. Our method is formulated with general stochastic operators, which allow us to model various strategies for reducing the computational complexity. For example, MURANA supports sparse activation of the gradients, and also reduction of the communication load via compression of the update vectors. This versatility allows MURANA to cover many existing randomization mechanisms within a unified framework. However, MURANA also encodes new methods as special cases. We highlight one of them, which we call ELVIRA, and show that it improves upon Loopless SVRG.

Submitted 7 March, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

arXiv:2102.11079 [pdf, ps, other]

An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints

Authors: Adil Salim, Laurent Condat, Dmitry Kovalev, Peter Richtárik

Abstract: Optimization problems under affine constraints appear in various areas of machine learning. We consider the task of minimizing a smooth strongly convex function F(x) under the affine constraint Kx=b, with an oracle providing evaluations of the gradient of F and multiplications by K and its transpose. We provide lower bounds on the number of gradient computations and matrix multiplications to achieve a given accuracy. Then we propose an accelerated primal-dual algorithm achieving these lower bounds. Our algorithm is the first optimal algorithm for this class of problems.

Submitted 10 April, 2022; v1 submitted 22 February, 2021; originally announced February 2021.

arXiv:2010.03246 [pdf, other]

Optimal Gradient Compression for Distributed and Federated Learning

Authors: Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik

Abstract: Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques, in the form of sparsification, quantization, or low-rank approximation. Since compression is a lossy, or inexact, process, the iteration complexity is typically worsened; but the total communication complexity can improve significantly, possibly leading to large computation time savings. In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error. We perform both worst-case and average-case analysis, providing tight lower bounds. In the worst-case analysis, we introduce an efficient compression operator, Sparse Dithering, which is very close to the lower bound. In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound. Thus, our new compression schemes significantly outperform the state of the art. We conduct numerical experiments to illustrate this improvement.

Submitted 7 October, 2020; originally announced October 2020.

arXiv:2010.00952 [pdf, other]

doi 10.3389/frsip.2021.776825

Distributed Proximal Splitting Algorithms with Rates and Acceleration

Authors: Laurent Condat, Grigory Malinovsky, Peter Richtárik

Abstract: We analyze several generic proximal splitting algorithms well suited for large-scale convex nonsmooth optimization. We derive sublinear and linear convergence results with new rates on the function value suboptimality or distance to the solution, as well as new accelerated versions, using varying stepsizes. In addition, we propose distributed variants of these algorithms, which can be accelerated as well. While most existing results are ergodic, our nonergodic results significantly broaden our understanding of primal-dual optimization algorithms.

Submitted 27 January, 2022; v1 submitted 2 October, 2020; originally announced October 2020.

arXiv:2004.02635 [pdf, other]

doi 10.1007/s10957-022-02061-8

Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms

Authors: Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

Abstract: We consider minimizing the sum of three convex functions, where the first one F is smooth, the second one is nonsmooth and proximable and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning. First, we propose a new primal-dual algorithm, which we call PDDY, for this problem. It is constructed by applying Davis-Yin splitting to a monotone inclusion in a primal-dual product space, where the operators are monotone under a specific metric depending on L. We show that three existing algorithms (the two forms of the Condat-Vu algorithm and the PD3O algorithm) have the same structure, so that PDDY is the fourth missing link in this self-consistent class of primal-dual algorithms. This representation eases the convergence analysis: it allows us to derive sublinear convergence rates in general, and linear convergence results in presence of strong convexity. Moreover, within our broad and flexible analysis framework, we propose new stochastic generalizations of the algorithms, in which a variance-reduced random estimate of the gradient of F is used, instead of the true gradient. Furthermore, we obtain, as a special case of PDDY, a linearly converging algorithm for the minimization of a strongly convex function F under a linear constraint; we discuss its important application to decentralized optimization.

Submitted 26 July, 2022; v1 submitted 3 April, 2020; originally announced April 2020.

arXiv:2004.01442 [pdf, other]

From Local SGD to Local Fixed-Point Methods for Federated Learning

Authors: Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik

Abstract: Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computations done locally on a mobile device. We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations. In both cases, the goal is to limit communication of the locally-computed variables, which is often the bottleneck in distributed frameworks. We perform convergence analysis of both methods and conduct a number of experiments highlighting the benefits of our approach.

Submitted 16 June, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

Comments: Accepted to ICML 2020
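
The first strategy named in the abstract above, a fixed number of local steps followed by averaging, can be illustrated with a minimal sketch. This is not the paper's algorithm or code: the function names (`local_fixed_point`, `make_gradient_step`), the scalar quadratic operators, and all parameter values are assumptions chosen for illustration. In this toy case every operator has the same contraction factor, so the iterates reach the exact fixed point of the averaged operator; with heterogeneous operators, several local steps generally give only an approximation.

```python
from typing import Callable, List, Sequence

Operator = Callable[[float], float]


def local_fixed_point(
    operators: Sequence[Operator],
    x0: float,
    local_steps: int,
    rounds: int,
) -> float:
    """Each round: every client applies its own operator `local_steps`
    times starting from the shared point, then the server averages."""
    x = x0
    for _ in range(rounds):
        local_results: List[float] = []
        for op in operators:
            y = x
            for _ in range(local_steps):
                y = op(y)  # local computation, no communication
            local_results.append(y)
        # One communication per round: average the local results.
        x = sum(local_results) / len(local_results)
    return x


def make_gradient_step(target: float, step_size: float = 0.5) -> Operator:
    """Hypothetical local operator: one gradient step on 0.5 * (x - target)**2."""
    return lambda x: x - step_size * (x - target)


targets = [1.0, 2.0, 6.0]
operators = [make_gradient_step(t) for t in targets]
x_final = local_fixed_point(operators, x0=0.0, local_steps=3, rounds=50)
print(x_final)  # close to the mean of the targets
```

Raising `local_steps` reduces the number of communication rounds needed in this toy setting, which is the trade-off the abstract points to when it calls communication the bottleneck.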

arXiv:1912.00137 [pdf, ps, other]

Proximal Splitting Algorithms for Convex Optimization: A Tour of Recent Advances, with New Twists

Authors: Laurent Condat, Daichi Kitahara, Andrés Contreras, Akira Hirabayashi

Abstract: Convex nonsmooth optimization problems, whose solutions live in very high dimensional spaces, have become ubiquitous. To solve them, the class of first-order algorithms known as proximal splitting algorithms is particularly adequate: they consist of simple operations, handling the terms in the objective function separately. In this overview, we demystify a selection of recent proximal splitting algorithms: we present them within a unified framework, which consists in applying splitting methods for monotone inclusions in primal-dual product spaces, with well-chosen metrics. Along the way, we easily derive new variants of the algorithms and revisit existing convergence results, extending the parameter ranges in several cases. In particular, we emphasize that when the smooth term in the objective function is quadratic, e.g., for least-squares problems, convergence is guaranteed with larger values of the relaxation parameter than previously known. Such larger values are usually beneficial for the convergence speed in practice.

Submitted 24 February, 2023; v1 submitted 30 November, 2019; originally announced December 2019.

Comments: To appear in SIAM Review

MSC Class: 90C25; 90C30; 90C06; 47J25; 47J26; 68W15; 65K05
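
As a small illustration of what "handling the terms in the objective function separately" means, the sketch below runs the classical forward-backward (proximal gradient) iteration on a toy problem. It is an assumption-laden example, not one of the primal-dual algorithms surveyed in the paper: the objective 0.5 * ||x - b||^2 + lam * ||x||_1, the function names, and the parameter values are all chosen here for illustration, and no relaxation parameter is used.

```python
from typing import List


def soft_threshold(v: float, tau: float) -> float:
    """Proximity operator of tau * |.| evaluated at v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def forward_backward(
    b: List[float], lam: float, step: float, iterations: int
) -> List[float]:
    """Minimize 0.5 * ||x - b||^2 + lam * ||x||_1 by alternating a
    gradient step on the smooth term and a prox step on the nonsmooth term."""
    x = [0.0] * len(b)
    for _ in range(iterations):
        # Forward step: the gradient of 0.5 * ||x - b||^2 is x - b.
        forward = [xi - step * (xi - bi) for xi, bi in zip(x, b)]
        # Backward step: prox of step * lam * ||.||_1, applied coordinate-wise.
        x = [soft_threshold(v, step * lam) for v in forward]
    return x


x_final = forward_backward([3.0, -0.5, 1.5], lam=1.0, step=0.5, iterations=100)
print(x_final)  # approaches the soft-thresholding of b at level lam
```

For this separable toy objective the minimizer is known in closed form (soft-thresholding of `b` at level `lam`), which makes the iteration easy to check by hand.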

arXiv:1708.00267 [pdf, ps, other]

Riesz-based orientation of localizable Gaussian fields

Authors: Kévin Polisano, Marianne Clausel, Valérie Perrier, Laurent Condat

Abstract: In this work we give a sense to the notion of orientation for self-similar Gaussian fields with stationary increments, based on a Riesz analysis of these fields, with isotropic zero-mean analysis functions. We propose a structure tensor formulation and provide an intrinsic definition of the orientation vector as eigenvector of this tensor. That is, we show that the orientation vector does not depend on the analysis function, but only on the anisotropy encoded in the spectral density of the field. Then, we generalize this definition to a larger class of random fields called localizable Gaussian fields, whose orientation is derived from the orientation of their tangent fields. Two classes of Gaussian models with prescribed orientation are studied in the light of these new analysis tools.

Submitted 29 November, 2018; v1 submitted 1 August, 2017; originally announced August 2017.

arXiv:1504.05854 [pdf, ps, other]

doi 10.1109/TSP.2016.2516962

On-the-fly Approximation of Multivariate Total Variation Minimization

Authors: Jordan Frecon, Nelly Pustelnik, Patrice Abry, Laurent Condat

Abstract: In the context of change-point detection, addressed by Total Variation minimization strategies, an efficient on-the-fly algorithm has been designed leading to exact solutions for univariate data. In this contribution, an extension of such an on-the-fly strategy to multivariate data is investigated. The proposed algorithm relies on the local validation of the Karush-Kuhn-Tucker conditions on the dual problem. Showing that the non-local nature of the multivariate setting precludes to obtain an exact on-the-fly solution, we devise an on-the-fly algorithm delivering an approximate solution, whose quality is controlled by a practitioner-tunable parameter, acting as a trade-off between quality and computational cost. Performance assessment shows that high quality solutions are obtained on-the-fly while benefiting of computational costs several orders of magnitude lower than standard iterative procedures. The proposed algorithm thus provides practitioners with an efficient multivariate change-point detection on-the-fly procedure.

Submitted 28 August, 2016; v1 submitted 22 April, 2015; originally announced April 2015.

arXiv:1503.06716 [pdf, ps, other]

Modélisations de textures par champ gaussien à orientation locale prescrite

Authors: Kévin Polisano, Marianne Clausel, Valérie Perrier, Laurent Condat

Abstract: This paper presents two new models of oriented texture, based on a new class of Gaussian fields, called locally anisotropic fractional Brownian fields, with prescribed local orientation at any point. These fields are a local version of a specific class of anisotropic self-similar Gaussian fields with stationary increments. The simulation of such textures is obtained using a new algorithm mixing the tangent field formulation with the Cholesky method or the turning band method, this latter method having proved its efficiency for generating stationary anisotropic textures. Numerical experiments show the ability of the method for synthesis of textures with prescribed local orientation.

Submitted 26 June, 2015; v1 submitted 20 March, 2015; originally announced March 2015.

Comments: in French

Journal ref: Gretsi, Sep 2015, Lyon, France

arXiv:1406.5439 [pdf, ps, other]

A forward-backward view of some primal-dual optimization methods in image recovery

Authors: Patrick L. Combettes, Laurent Condat, Jean-Christophe Pesquet, Bang Cong Vu

Abstract: A wide array of image recovery problems can be abstracted into the problem of minimizing a sum of composite convex functions in a Hilbert space. To solve such problems, primal-dual proximal approaches have been developed which provide efficient solutions to large-scale optimization problems. The objective of this paper is to show that a number of existing algorithms can be derived from a general form of the forward-backward algorithm applied in a suitable product space. Our approach also allows us to develop useful extensions of existing algorithms by introducing a variable metric. An illustration to image restoration is provided.

Submitted 20 June, 2014; originally announced June 2014.

arXiv:1405.5891 [pdf, ps, other]

Texture Modeling by Gaussian fields with prescribed local orientation

Authors: Kévin Polisano, Marianne Clausel, Valérie Perrier, Laurent Condat

Abstract: This paper presents a new framework for oriented texture modeling. We introduce a new class of Gaussian fields, called Locally Anisotropic Fractional Brownian Fields, with prescribed local orientation at any point. These fields are a local version of a specific class of anisotropic self-similar Gaussian fields with stationary increments. The simulation of such textures is obtained using a new algorithm mixing the tangent field formulation and a turning band method, this latter method having proved its efficiency for generating stationary anisotropic textures. Numerical experiments show the ability of the method for synthesis of textures with prescribed local orientation.

Submitted 22 May, 2014; originally announced May 2014.

Comments: 4-page article accepted at ICIP 2014 (http://www.icip2014.org)

arXiv:1404.7680 [pdf, other]

doi 10.1109/TIP.2014.2363000

2-D Prony-Huang Transform: A New Tool for 2-D Spectral Analysis

Authors: Jérémy Schmitt, Nelly Pustelnik, Pierre Borgnat, Patrick Flandrin, Laurent Condat

Abstract: This work proposes an extension of the 1-D Hilbert Huang transform for the analysis of images. The proposed method consists in (i) adaptively decomposing an image into oscillating parts called intrinsic mode functions (IMFs) using a mode decomposition procedure, and (ii) providing a local spectral analysis of the obtained IMFs in order to get the local amplitudes, frequencies, and orientations. For the decomposition step, we propose two robust 2-D mode decompositions based on non-smooth convex optimization: a "Genuine 2-D" approach, that constrains the local extrema of the IMFs, and a "Pseudo 2-D" approach, which constrains separately the extrema of lines, columns, and diagonals. The spectral analysis step is based on Prony annihilation property that is applied on small square patches of the IMFs. The resulting 2-D Prony-Huang transform is validated on simulated and real data.

Submitted 30 April, 2014; originally announced April 2014.

Comments: 24 pages, 7 figures

Showing 1–28 of 28 results for author: Condat, L