SSD-SGD: Communication Sparsification for Distributed Deep Learning Training

Xu, Yemao; Dong, Dezun; Zhao, Yawei; Xu, Weixia; Liao, Xiangke

arXiv:2012.05396v1 (cs)

[Submitted on 10 Dec 2020 (this version), latest version 9 Apr 2021 (v3)]

Abstract: Intensive communication and synchronization costs for gradients and parameters are a well-known bottleneck of distributed deep learning training. Based on the observations that synchronous SGD (SSGD) achieves good convergence accuracy while asynchronous SGD (ASGD) delivers faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits, tackling the communication bottleneck via communication sparsification. SSD-SGD uses both global synchronous updates in the parameter servers and asynchronous local updates in the workers in each periodic iteration. This periodic and flexible synchronization lets SSD-SGD achieve good convergence accuracy and fast training speed. To the best of our knowledge, we strike a new balance between synchronization quality and communication sparsification, and improve the trade-off between accuracy and training speed. Specifically, the core components of SSD-SGD are a proper warm-up stage, a steps-delay stage, and our novel global gradient for local update (GLU) algorithm. GLU is critical for local update operations to effectively compensate for the delayed local weights. Furthermore, we implement SSD-SGD on the MXNet framework and comprehensively evaluate its performance on the CIFAR-10 and ImageNet datasets. Experimental results show that SSD-SGD can accelerate distributed training by up to 110% under different experimental configurations, while achieving good convergence accuracy.
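
The general idea of a synchronous warm-up followed by sparser, periodic synchronization can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function `train`, its parameters `warmup` and `delay`, the scalar quadratic loss, and the plain weight averaging used at each synchronization point are all hypothetical. In particular, the sketch does not implement GLU or a parameter server; it only shows how synchronizing every `delay` steps reduces the number of communication rounds.

```python
def train(targets, steps, warmup, delay, lr=0.1):
    """Toy periodic-synchronization loop (illustrative only, not SSD-SGD itself).

    Each worker i minimizes 0.5 * (w - targets[i]) ** 2 on a scalar weight w.
    Returns the average worker weight and the number of synchronization rounds.
    """
    n = len(targets)
    weights = [0.0] * n  # one scalar weight per simulated worker
    sync_rounds = 0
    for step in range(steps):
        # Per-worker gradient of the quadratic loss.
        grads = [w - t for w, t in zip(weights, targets)]
        if step < warmup:
            # Warm-up: fully synchronous, every worker applies the mean gradient.
            mean_grad = sum(grads) / n
            weights = [w - lr * mean_grad for w in weights]
            sync_rounds += 1
        else:
            # Delay stage: each worker updates locally with its own gradient.
            weights = [w - lr * g for w, g in zip(weights, grads)]
            # Synchronize (here: average the weights) only every `delay` steps.
            if (step - warmup + 1) % delay == 0:
                avg = sum(weights) / n
                weights = [avg] * n
                sync_rounds += 1
    return sum(weights) / n, sync_rounds


if __name__ == "__main__":
    sparse = train([1.0, 2.0, 3.0, 6.0], steps=200, warmup=10, delay=4)
    dense = train([1.0, 2.0, 3.0, 6.0], steps=200, warmup=200, delay=4)
    print("periodic sync:", sparse)
    print("fully synchronous:", dense)
```

Setting `warmup` equal to `steps` turns the loop into a fully synchronous baseline, which makes the difference in synchronization rounds easy to compare. On this simple quadratic problem both settings reach the same average weight; real training would additionally need some compensation for stale local weights, which is the role the abstract assigns to GLU.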

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2012.05396 [cs.DC]
	(or arXiv:2012.05396v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2012.05396

Submission history

From: Yemao Xu
[v1] Thu, 10 Dec 2020 01:32:11 UTC (3,177 KB)
[v2] Tue, 12 Jan 2021 03:35:05 UTC (3,176 KB)
[v3] Fri, 9 Apr 2021 04:53:29 UTC (3,357 KB)
