Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks

Oldewage, Elre T.; Clarke, Ross M.; Hernández-Lobato, José Miguel

arXiv:2310.14901 (cs)

[Submitted on 23 Oct 2023 (v1), last revised 27 Feb 2024 (this version, v2)]

Abstract: Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns - to our knowledge, the first efficiently-scalable optimisation algorithm to asymptotically use the exact inverse Hessian with absolute-value eigenvalues. Our method frames the problem as a series which principally square-roots and inverts the squared Hessian, then uses it to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this infinite series provides a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.
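The idea of preconditioning a gradient with the inverse absolute-value Hessian through Hessian-vector products alone can be illustrated with a small sketch. The series used here is an assumption chosen for illustration, namely the binomial expansion (1 - x)^(-1/2) = sum_k C(2k, k) / 4^k * x^k applied to x = I - H^2 / scale; the abstract does not state the paper's exact series, which may differ. The names `saddle_free_direction`, `toy_hvp`, `scale` and `num_terms` are hypothetical, and the explicit 2x2 matrix only stands in for a Hessian-vector product that would normally come from automatic differentiation.

```python
from typing import Callable, List

Vector = List[float]


def saddle_free_direction(
    hvp: Callable[[Vector], Vector],
    grad: Vector,
    scale: float,
    num_terms: int,
) -> Vector:
    """Approximate |H|^{-1} grad using only Hessian-vector products.

    Assumed series (illustrative, not necessarily the paper's):
        (H^2)^{-1/2} = scale^{-1/2} * sum_k c_k (I - H^2 / scale)^k,
        c_k = C(2k, k) / 4^k.
    It converges when `scale` exceeds the largest eigenvalue of H^2
    and H has no zero eigenvalue.
    """
    term = list(grad)              # (I - H^2/scale)^k grad, starting at k = 0
    coeff = 1.0                    # c_0
    total = [0.0] * len(grad)
    for k in range(num_terms):
        total = [t + coeff * w for t, w in zip(total, term)]
        hh = hvp(hvp(term))        # H (H term): two Hessian-vector products
        term = [w - h / scale for w, h in zip(term, hh)]
        coeff *= (2 * k + 1) / (2 * k + 2)   # c_{k+1} from c_k
    return [t / scale ** 0.5 for t in total]


# Toy indefinite Hessian with eigenvalues 3 and -1 (a saddle point).
H = [[1.0, 2.0], [2.0, 1.0]]


def toy_hvp(v: Vector) -> Vector:
    """Stand-in for an autodiff Hessian-vector product."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


# H^2 has eigenvalues 9 and 1, so scale = 10 satisfies the convergence condition.
direction = saddle_free_direction(toy_hvp, [1.0, 0.0], scale=10.0, num_terms=400)
```

For this toy matrix, |H| = [[2, 1], [1, 2]], so the truncated series approaches |H|^{-1} g = [2/3, -1/3], whereas a plain Newton step H^{-1} g = [-1/3, 2/3] points the other way along the negative-curvature direction. Convergence of this particular series slows as eigenvalues of H^2 become small relative to `scale`, so the number of terms needed depends on the problem.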

Comments:	37 pages, 10 figures, 5 tables. To appear in TMLR. First two authors' order randomised
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2310.14901 [cs.LG]
	(or arXiv:2310.14901v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.14901

Submission history

From: Ross M Clarke
[v1] Mon, 23 Oct 2023 13:11:30 UTC (13,797 KB)
[v2] Tue, 27 Feb 2024 14:13:44 UTC (14,567 KB)
