Decentralized Learning Made Practical with Client Sampling

de Vos, Martijn; Dhasade, Akash; Kermarrec, Anne-Marie; Lavoie, Erick; Pouwelse, Johan; Sharma, Rishi

arXiv:2302.13837 (cs)

[Submitted on 27 Feb 2023 (v1), last revised 7 May 2024 (this version, v2)]

Abstract: Decentralized learning (DL) leverages edge devices for collaborative model training while avoiding coordination by a central server. Due to privacy concerns, DL has become an attractive alternative to centralized learning schemes since training data never leaves the device. In a round of DL, all nodes participate in model training and exchange their model with some other nodes. Performing DL in large-scale heterogeneous networks results in high communication costs and prolonged round durations due to slow nodes, effectively inflating the total training time. Furthermore, current DL algorithms also assume all nodes are available for training and aggregation at all times, diminishing the practicality of DL. This paper presents Plexus, an efficient, scalable, and practical DL system. Plexus (1) avoids network-wide participation by introducing a decentralized peer sampler that selects small subsets of available nodes that train the model each round, and (2) aggregates the trained models produced by nodes every round. Plexus is designed to handle joining and leaving nodes (churn). We extensively evaluate Plexus by incorporating realistic traces for compute speed, pairwise latency, network capacity, and availability of edge devices in our experiments. Our experiments on four common learning tasks empirically show that Plexus reduces time-to-accuracy by 1.2-8.3x, communication volume by 2.4-15.3x and training resources needed for convergence by 6.4-370x compared to baseline DL algorithms.
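To make the round structure described in the abstract concrete, the following is a minimal sketch of per-round sampling followed by aggregation. It is not the Plexus implementation: the hash-based ranking, the plain element-wise averaging, the toy `local_train` step, and all function and parameter names (`sample_nodes`, `aggregate`, `run`, `is_available`) are assumptions chosen for illustration only.

```python
import hashlib
import random


def sample_nodes(available, round_nr, sample_size):
    """Pick a small subset of the available nodes for one round.

    Assumption: nodes are ranked by a hash of (node id, round number), so
    every node holding the same membership list derives the same sample
    without asking a central server.
    """
    def rank(node_id):
        return hashlib.sha256(f"{node_id}:{round_nr}".encode()).hexdigest()

    return sorted(available, key=rank)[:sample_size]


def aggregate(models):
    """Average the parameter vectors element-wise (illustrative choice)."""
    n = len(models)
    return [sum(values) / n for values in zip(*models)]


def local_train(model, node_id, round_nr):
    """Hypothetical stand-in for local training: perturbs each parameter."""
    rng = random.Random(f"{node_id}:{round_nr}")
    return [w + rng.uniform(-0.1, 0.1) for w in model]


def run(nodes, rounds, sample_size, is_available):
    """Train for several rounds using only a sampled subset per round."""
    model = [0.0, 0.0, 0.0]
    for round_nr in range(rounds):
        # Nodes that are offline this round (churn) are never sampled.
        available = [n for n in nodes if is_available(n, round_nr)]
        sample = sample_nodes(available, round_nr, sample_size)
        if not sample:
            continue  # nobody available: carry the model over unchanged
        trained = [local_train(model, n, round_nr) for n in sample]
        model = aggregate(trained)
    return model


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(20)]
    # Toy availability pattern: each node is offline one round in four.
    final = run(nodes, rounds=5, sample_size=4,
                is_available=lambda n, r: (int(n.split("-")[1]) + r) % 4 != 0)
    print(final)
```

In this sketch only `sample_size` nodes train and communicate per round, rather than all nodes, which is the source of the communication and resource savings the abstract reports.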

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2302.13837 [cs.DC]
	(or arXiv:2302.13837v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2302.13837

Submission history

From: Martijn de Vos
[v1] Mon, 27 Feb 2023 14:39:41 UTC (1,342 KB)
[v2] Tue, 7 May 2024 14:52:56 UTC (363 KB)
