Search | arXiv e-print repository

Minimax Excess Risk of First-Order Methods for Statistical Learning with Data-Dependent Oracles

Authors: Kevin Scaman, Mathieu Even, Batiste Le Bars, Laurent Massoulié

Abstract: In this paper, our aim is to analyse the generalization capabilities of first-order methods for statistical learning in multiple, different yet related, scenarios including supervised learning, transfer learning, robust learning and federated learning. To do so, we provide sharp upper and lower bounds for the minimax excess risk of strongly convex and smooth statistical learning when the gradient… ▽ More In this paper, our aim is to analyse the generalization capabilities of first-order methods for statistical learning in multiple, different yet related, scenarios including supervised learning, transfer learning, robust learning and federated learning. To do so, we provide sharp upper and lower bounds for the minimax excess risk of strongly convex and smooth statistical learning when the gradient is accessed through partial observations given by a data-dependent oracle. This novel class of oracles can query the gradient with any given data distribution, and is thus well suited to scenarios in which the training data distribution does not match the target (or test) distribution. In particular, our upper and lower bounds are proportional to the smallest mean square error achievable by gradient estimators, thus allowing us to easily derive multiple sharp bounds in the aforementioned scenarios using the extensive literature on parameter estimation. △ Less

Submitted 1 July, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: 22 pages, 0 figures

arXiv:2306.02939 [pdf, other]

Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm

Authors: Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia

Abstract: This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex… ▽ More This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result is coming from a worst-case analysis, and we provide a refined optimization-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that surprisingly, a poorly-connected graph can even be beneficial for generalization. △ Less

Submitted 13 June, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2204.04452 [pdf, other]

Refined Convergence and Topology Learning for Decentralized SGD with Heterogeneous Data

Authors: Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Erick Lavoie, Anne-Marie Kermarrec

Abstract: One of the key challenges in decentralized and federated learning is to design algorithms that efficiently deal with highly heterogeneous data distributions across agents. In this paper, we revisit the analysis of the popular Decentralized Stochastic Gradient Descent algorithm (D-SGD) under data heterogeneity. We exhibit the key role played by a new quantity, called neighborhood heterogeneity, on… ▽ More One of the key challenges in decentralized and federated learning is to design algorithms that efficiently deal with highly heterogeneous data distributions across agents. In this paper, we revisit the analysis of the popular Decentralized Stochastic Gradient Descent algorithm (D-SGD) under data heterogeneity. We exhibit the key role played by a new quantity, called neighborhood heterogeneity, on the convergence rate of D-SGD. By coupling the communication topology and the heterogeneity, our analysis sheds light on the poorly understood interplay between these two concepts. We then argue that neighborhood heterogeneity provides a natural criterion to learn data-dependent topologies that reduce (and can even eliminate) the otherwise detrimental effect of data heterogeneity on the convergence time of D-SGD. For the important case of classification with label skew, we formulate the problem of learning such a good topology as a tractable optimization problem that we solve with a Frank-Wolfe algorithm. As illustrated over a set of simulated and real-world experiments, our approach provides a principled way to design a sparse topology that balances the convergence speed and the per-iteration communication costs of D-SGD under data heterogeneity. △ Less

Submitted 21 October, 2022; v1 submitted 9 April, 2022; originally announced April 2022.

arXiv:1910.08512 [pdf, other]

Learning the piece-wise constant graph structure of a varying Ising model

Authors: Batiste Le Bars, Pierre Humbert, Argyris Kalogeratos, Nicolas Vayatis

Abstract: This work focuses on the estimation of multiple change-points in a time-varying Ising model that evolves piece-wise constantly. The aim is to identify both the moments at which significant changes occur in the Ising model, as well as the underlying graph structures. For this purpose, we propose to estimate the neighborhood of each node by maximizing a penalized version of its conditional log-likel… ▽ More This work focuses on the estimation of multiple change-points in a time-varying Ising model that evolves piece-wise constantly. The aim is to identify both the moments at which significant changes occur in the Ising model, as well as the underlying graph structures. For this purpose, we propose to estimate the neighborhood of each node by maximizing a penalized version of its conditional log-likelihood. The objective of the penalization is twofold: it imposes sparsity in the learned graphs and, thanks to a fused-type penalty, it also enforces them to evolve piece-wise constantly. Using few assumptions, we provide two change-points consistency theorems. Those are the first in the context of unknown number of change-points detection in time-varying Ising model. Finally, experimental results on several synthetic datasets and a real-world dataset demonstrate the performance of our method. △ Less

Submitted 30 June, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: 18 pages (9 pages for Appendix), 4 figures, 2 tables

arXiv:1902.04521 [pdf, other]

A Probabilistic Framework to Node-level Anomaly Detection in Communication Networks

Authors: Batiste Le Bars, Argyris Kalogeratos

Abstract: In this paper we consider the task of detecting abnormal communication volume occurring at node-level in communication networks. The signal of the communication activity is modeled by means of a clique stream: each occurring communication event is instantaneous and activates an undirected subgraph spanning over a set of equally participating nodes. We present a probabilistic framework to model and… ▽ More In this paper we consider the task of detecting abnormal communication volume occurring at node-level in communication networks. The signal of the communication activity is modeled by means of a clique stream: each occurring communication event is instantaneous and activates an undirected subgraph spanning over a set of equally participating nodes. We present a probabilistic framework to model and assess the communication volume observed at any single node. Specifically, we employ non-parametric regression to learn the probability that a node takes part in a certain event knowing the set of other nodes that are involved. On the top of that, we present a concentration inequality around the estimated volume of events in which a node could participate, which in turn allows us to build an efficient and interpretable anomaly scoring function. Finally, the superior performance of the proposed approach is empirically demonstrated in real-world sensor network data, as well as using synthetic communication activity that is in accordance with that latter setting. △ Less

Submitted 12 February, 2019; originally announced February 2019.

Comments: 9 pages, 4 figures

MSC Class: 68T99 ACM Class: F.2.2; I.5

Journal ref: IEEE International Conference on Computer Communications 2019 (INFOCOM), 2019

Showing 1–5 of 5 results for author: Bars, B L