Showing 1–3 of 3 results for author: Clarke, R M

Searching in archive cs.
  1. arXiv:2310.14963  [pdf, other]

    cs.LG stat.ML

    Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens

    Authors: Ross M. Clarke, José Miguel Hernández-Lobato

    Abstract: Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). Noting that second-order methods often only function effectively with the addition of stabilising heuristics (su…

    Submitted 13 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: 33 pages, 21 figures, 7 tables. Published at ICML 2024

  2. arXiv:2310.14901  [pdf, other]

    cs.LG stat.ML

    Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks

    Authors: Elre T. Oldewage, Ross M. Clarke, José Miguel Hernández-Lobato

    Abstract: Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which ad…

    Submitted 27 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: 37 pages, 10 figures, 5 tables. To appear in TMLR. First two authors' order randomised

  3. arXiv:2110.10461  [pdf, other]

    cs.LG stat.ML

    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation

    Authors: Ross M. Clarke, Elre T. Oldewage, José Miguel Hernández-Lobato

    Abstract: Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning…

    Submitted 21 April, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: 41 pages, 19 figures, 15 tables; minor CIFAR-10 normalisation updates from ICLR 2022 camera-ready version