PETRA: Parallel End-to-end Training with Reversible Architectures

Rivaud, Stéphane; Fournier, Louis; Pumir, Thomas; Belilovsky, Eugene; Eickenberg, Michael; Oyallon, Edouard

Computer Science > Machine Learning

arXiv:2406.02052 (cs)

[Submitted on 4 Jun 2024]

Title: PETRA: Parallel End-to-end Training with Reversible Architectures

Authors: Stéphane Rivaud (MLIA), Louis Fournier (MLIA), Thomas Pumir, Eugene Belilovsky (MILA), Michael Eickenberg, Edouard Oyallon

Abstract: Reversible architectures have been shown to perform on par with their non-reversible counterparts and have been applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., sets of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving accuracies competitive with backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
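
The abstract relies on the defining property of a reversible block: its input can be recomputed from its output, so activations need not be stored for the backward pass. The abstract does not spell out the block structure, so the following is only a minimal sketch that assumes the additive coupling scheme familiar from RevNet-style reversible residual networks. The functions `f`, `g`, `forward`, `inverse`, and `max_abs_error` are hypothetical names chosen for illustration and are not taken from the PETRA framework.

```python
import math


def f(v):
    # Hypothetical residual branch F; it does not need to be invertible.
    return [math.tanh(2.0 * a) for a in v]


def g(v):
    # Hypothetical residual branch G; also an arbitrary deterministic function.
    return [0.5 * a * a for a in v]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def forward(x1, x2):
    # Additive coupling: each half is updated using only the other half.
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def inverse(y1, y2):
    # Undo the two updates in reverse order to recover the inputs.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


def max_abs_error(u, v):
    # Largest element-wise difference between two equal-length vectors.
    return max(abs(a - b) for a, b in zip(u, v))


# Usage: reconstruct the inputs from the outputs alone.
x1, x2 = [0.1, -0.4], [0.7, 0.2]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(max_abs_error(x1, r1), max_abs_error(x2, r2))
```

The reconstruction is exact up to floating-point rounding, whatever `f` and `g` compute. Under this assumption, a stage could recover its input during the backward pass from the activation it receives instead of keeping a stored copy.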

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2406.02052 [cs.LG]
	(or arXiv:2406.02052v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.02052

Submission history

From: Edouard Oyallon [via CCSD proxy]
[v1] Tue, 4 Jun 2024 07:35:23 UTC (514 KB)
