Showing 1–5 of 5 results for author: Osawa, K

  1. arXiv:2210.02720  [pdf, other]

    cs.LG stat.ML

    Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

    Authors: Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa

    Abstract: Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. While some studies have reported that GR can improve generalization performance, little attention has been paid to it from the algorithmic perspective, that is, the algorithms of GR that efficiently improve the performance. In this study, we first reveal that a specific finite-difference…

    Submitted 2 February, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

  2. arXiv:2010.00879  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

    Authors: Ryo Karakida, Kazuki Osawa

    Abstract: Natural Gradient Descent (NGD) helps to accelerate the convergence of gradient descent dynamics, but it requires approximations in large-scale deep neural networks because of its high computational cost. Empirical studies have confirmed that some NGD methods with approximate Fisher information converge sufficiently fast in practice. Nevertheless, it remains unclear from the theoretical perspective…

    Submitted 7 December, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020

  3. arXiv:2002.06015  [pdf, other]

    cs.LG stat.ML

    Scalable and Practical Natural Gradient for Large-Scale Deep Learning

    Authors: Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota

    Abstract: Large-scale distributed training of deep neural networks results in models with worse generalization performance as a result of the increase in the effective mini-batch size. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or ad hoc modifications of batch normalization. We propose Scalable and Practical Natural Gradient Descen…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: arXiv admin note: text overlap with arXiv:1811.12019

  4. arXiv:1906.02506  [pdf, other]

    stat.ML cs.LG

    Practical Deep Learning with Bayesian Principles

    Authors: Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan

    Abstract: Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar pe…

    Submitted 29 October, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019

  5. arXiv:1811.12019  [pdf, other]

    cs.LG cs.CV stat.ML

    Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks

    Authors: Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

    Abstract: Large-scale distributed training of deep neural networks suffers from the generalization gap caused by the increase in the effective mini-batch size. Previous approaches try to solve this problem by varying the learning rate and batch size over epochs and layers, or some ad hoc modification of the batch normalization. We propose an alternative approach using a second-order optimization method that…

    Submitted 30 March, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: 10 pages, 7 figures. Accepted at CVPR 2019, Long Beach, CA