A spherical analysis of Adam with Batch Normalization

Roburin, Simon; de Mont-Marin, Yann; Bursuc, Andrei; Marlet, Renaud; Pérez, Patrick; Aubry, Mathieu

arXiv:2006.13382v2 (cs)

[Submitted on 23 Jun 2020 (v1), revised 21 Oct 2020 (this version, v2), latest version 19 May 2022 (v3)]

Abstract: Batch Normalization (BN) is a prominent deep learning technique. Despite its apparent simplicity, its implications for optimization are not yet fully understood. While previous studies mostly focus on the interaction between BN and stochastic gradient descent (SGD), we develop a geometric perspective that allows us to precisely characterize the relation between BN and Adam. More precisely, we leverage the radial invariance of groups of parameters, such as filters for convolutional neural networks, to translate the optimization steps onto the $L_2$ unit hypersphere. This formulation and the associated geometric interpretation shed new light on the training dynamics. First, we use it to derive the first effective learning rate expression of Adam. Then we show that, in the presence of BN layers, performing SGD alone is actually equivalent to a variant of Adam constrained to the unit hypersphere. Finally, our analysis outlines phenomena that previous variants of Adam act on, and we experimentally validate their importance in the optimization process.
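
The radial invariance mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's derivation: the toy loss, the data, and the helper names `batch_norm`, `loss`, and `num_grad` are assumptions made for illustration only. It checks three standard consequences of scale invariance for a weight vector `w` whose pre-activations are batch-normalized: the loss is unchanged when `w` is rescaled, the gradient has no radial component, and the gradient shrinks in proportion to the rescaling factor.

```python
import math

def batch_norm(values):
    """Normalize a batch of scalars to zero mean and unit variance (no epsilon)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var) for v in values]

def loss(w, batch, targets):
    """Mean squared error on the batch-normalized pre-activations w . x."""
    pre = [sum(wi * xi for wi, xi in zip(w, x)) for x in batch]
    normed = batch_norm(pre)
    return sum((n - t) ** 2 for n, t in zip(normed, targets)) / len(batch)

def num_grad(f, w, h=1e-6):
    """Central finite-difference approximation of the gradient of f at w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += h
        dn = list(w)
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

# Hypothetical toy data: four inputs of dimension three, with scalar targets.
batch = [[1.0, 2.0, 0.5], [0.3, -1.0, 2.0], [-0.7, 0.4, 1.5], [2.0, 1.0, -1.0]]
targets = [1.0, -1.0, 0.5, -0.5]
w = [0.5, -1.2, 0.8]
f = lambda v: loss(v, batch, targets)

scale = 3.0
scaled = [scale * wi for wi in w]

g = num_grad(f, w)
g_scaled = num_grad(f, scaled)

# Radial component of the gradient: the dot product of gradient and weights.
radial = sum(gi * wi for gi, wi in zip(g, w))

print("loss(w) - loss(3w):", f(w) - f(scaled))
print("gradient . w:", radial)
print("scale * grad(3w) - grad(w):", [scale * a - b for a, b in zip(g_scaled, g)])
```

All three printed quantities should be zero up to floating-point and finite-difference error. Because only the direction of `w` affects the loss, it is natural to track the optimization on the unit hypersphere, and the shrinking of the gradient as the norm grows is why the norm of `w` affects how far each update moves that direction.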

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2006.13382 [cs.LG] (or arXiv:2006.13382v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2006.13382

Submission history

From: Andrei Bursuc
[v1] Tue, 23 Jun 2020 23:29:51 UTC (3,551 KB)
[v2] Wed, 21 Oct 2020 13:49:26 UTC (10,258 KB)
[v3] Thu, 19 May 2022 13:29:31 UTC (19,573 KB)
