Showing 1–23 of 23 results for author: Nitanda, A

  1. arXiv:2405.15767  [pdf, other]

    cs.LG stat.ML

    Improved Particle Approximation Error for Mean Field Neural Networks

    Authors: Atsushi Nitanda

    Abstract: Mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional defined over the space of probability distributions. MFLD has gained attention due to its connection with noisy gradient descent for mean-field two-layer neural networks. Unlike standard Langevin dynamics, the nonlinearity of the objective functional induces particle interactions, necessitating multipl…

    Submitted 14 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 16 pages

  2. arXiv:2306.07221  [pdf, ps, other]

    cs.LG stat.ML

    Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

    Authors: Taiji Suzuki, Denny Wu, Atsushi Nitanda

    Abstract: The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift, and it naturally arises from the optimization of two-layer neural networks via (noisy) gradient descent. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures. However, all prior analyses as…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 37 pages

  3. arXiv:2305.07971  [pdf, ps, other]

    stat.ML cs.LG

    Tight and fast generalization error bound of graph embedding in metric space

    Authors: Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, Kenji Yamanishi

    Abstract: Recent studies have experimentally shown that we can achieve in non-Euclidean metric space effective and efficient graph embedding, which aims to obtain the vertices' representations reflecting the graph's structure in the metric space. Specifically, graph embedding in hyperbolic space has experimentally succeeded in embedding graphs with hierarchical-tree structure, e.g., data in natural language…

    Submitted 13 May, 2023; originally announced May 2023.

  4. arXiv:2303.02957  [pdf, other]

    stat.ML cs.LG math.OC

    Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems

    Authors: Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki

    Abstract: The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime. In this work, we provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structur…

    Submitted 6 March, 2023; originally announced March 2023.

  5. arXiv:2302.09376  [pdf, other]

    stat.ML cs.LG

    Why is parameter averaging beneficial in SGD? An objective smoothing perspective

    Authors: Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, Denny Wu

    Abstract: It is often observed that stochastic gradient descent (SGD) and its variants implicitly select a solution with good generalization performance; such implicit bias is often characterized in terms of the sharpness of the minima. Kleinberg et al. (2018) connected this bias with the smoothing effect of SGD which eliminates sharp local minima by the convolution using the stochastic gradient noise. We f…

    Submitted 26 May, 2024; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: 27 pages, AISTATS2024

  6. arXiv:2302.05825  [pdf, other]

    cs.LG math.FA stat.ML

    Koopman-based generalization bound: New aspect for full-rank weights

    Authors: Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki

    Abstract: We propose a new bound for generalization of neural networks using Koopman operators. Whereas most of existing works focus on low-rank weight matrices, we focus on full-rank weight matrices. Our bound is tighter than existing norm-based bounds when the condition numbers of weight matrices are small. Especially, it is completely independent of the width of the network if the weight matrices are ort…

    Submitted 16 March, 2024; v1 submitted 11 February, 2023; originally announced February 2023.

    Journal ref: ICLR 2024

  7. arXiv:2201.10469  [pdf, other]

    stat.ML cs.LG math.PR

    Convex Analysis of the Mean Field Langevin Dynamics

    Authors: Atsushi Nitanda, Denny Wu, Taiji Suzuki

    Abstract: As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics recently attracts attention due to its connection to (noisy) gradient descent on infinitely wide neural networks in the mean field regime, and hence the convergence property of the dynamics is of great theoretical interest. In this work, we give a concise and self-contained convergence rate analysis of the mean…

    Submitted 24 February, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: AISTATS2022

  8. arXiv:2105.10475  [pdf, ps, other]

    cs.LG

    Generalization Error Bound for Hyperbolic Ordinal Embedding

    Authors: Atsushi Suzuki, Atsushi Nitanda, Jing Wang, Linchuan Xu, Marc Cavazza, Kenji Yamanishi

    Abstract: Hyperbolic ordinal embedding (HOE) represents entities as points in hyperbolic space so that they agree as well as possible with given constraints in the form of entity i is more similar to entity j than to entity k. It has been experimentally shown that HOE can obtain representations of hierarchical data such as a knowledge base and a citation network effectively, owing to hyperbolic space's expo…

    Submitted 21 May, 2021; originally announced May 2021.

  9. arXiv:2103.06797  [pdf, other]

    cs.LG cs.CR

    BODAME: Bilevel Optimization for Defense Against Model Extraction

    Authors: Yuto Mori, Atsushi Nitanda, Akiko Takeda

    Abstract: Model extraction attacks have become serious issues for service providers using machine learning. We consider an adversarial setting to prevent model extraction under the assumption that attackers will make their best guess on the service provider's model using query accesses, and propose to build a surrogate model that significantly keeps away the predictions of the attacker's model from those of…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: 18 pages

  10. arXiv:2012.15477  [pdf, other]

    stat.ML cs.LG

    Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

    Authors: Atsushi Nitanda, Denny Wu, Taiji Suzuki

    Abstract: We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization to the optimization over probability distributions with quantitative runtime guarantee. The algorithm consists of an inner loop and outer loop: the inner loop utilizes the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the oute…

    Submitted 22 January, 2022; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: NeurIPS 2021

  11. arXiv:2007.15897  [pdf, other]

    eess.IV cs.CV cs.LG

    A Novel Global Spatial Attention Mechanism in Convolutional Neural Network for Medical Image Classification

    Authors: Linchuan Xu, Jun Huang, Atsushi Nitanda, Ryo Asaoka, Kenji Yamanishi

    Abstract: Spatial attention has been introduced to convolutional neural networks (CNNs) for improving both their performance and interpretability in visual tasks including image classification. The essence of the spatial attention is to learn a weight map which represents the relative importance of activations within the same layer or channel. All existing attention mechanisms are local attentions in the se…

    Submitted 31 July, 2020; originally announced July 2020.

  12. arXiv:2007.12160  [pdf, other]

    stat.ML cs.LG

    Online Robust and Adaptive Learning from Data Streams

    Authors: Shintaro Fukushima, Atsushi Nitanda, Kenji Yamanishi

    Abstract: In online learning from non-stationary data streams, it is necessary to learn robustly to outliers and to adapt quickly to changes in the underlying data generating mechanism. In this paper, we refer to the former attribute of online learning algorithms as robustness and to the latter as adaptivity. There is an obvious tradeoff between the two attributes. It is a fundamental issue to quantify and…

    Submitted 27 September, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: 42 pages

  13. arXiv:2006.12297  [pdf, other]

    stat.ML cs.LG

    Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

    Authors: Atsushi Nitanda, Taiji Suzuki

    Abstract: We analyze the convergence of the averaged stochastic gradient descent for overparameterized two-layer neural networks for regression problems. It was recently found that a neural tangent kernel (NTK) plays an important role in showing the global convergence of gradient-based methods under the NTK regime, where the learning dynamics for overparameterized neural networks can be almost characterized…

    Submitted 11 June, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 35 pages

  14. arXiv:2006.10732  [pdf, other]

    stat.ML cs.LG

    When Does Preconditioning Help or Hurt Generalization?

    Authors: Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

    Abstract: While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question. This work presents a more nuanced view on how the \textit{implicit bias} of first- and second-order methods affects the comparison of generalization properties. We provide an exact asymptotic bias-variance decomposition of the generalizatio…

    Submitted 8 December, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: 42 pages

  15. arXiv:1911.05350  [pdf, other]

    stat.ML cs.LG

    Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features

    Authors: Shingo Yashima, Atsushi Nitanda, Taiji Suzuki

    Abstract: Although kernel methods are widely used in many learning problems, they have poor scalability to large datasets. To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms. In this study, we consider solving a binary classification problem using random features and stochastic gradient descent. In rece…

    Submitted 2 June, 2022; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: AISTATS2021

  16. arXiv:1910.12799  [pdf, other]

    stat.ML cs.LG

    Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space

    Authors: Taiji Suzuki, Atsushi Nitanda

    Abstract: Deep learning has exhibited superior performance for various tasks, especially for high-dimensional datasets, such as images. To understand this property, we investigate the approximation and estimation ability of deep learning on anisotropic Besov spaces. The anisotropic Besov space is characterized by direction-dependent smoothness and includes several function classes that have been investigate…

    Submitted 30 September, 2021; v1 submitted 28 October, 2019; originally announced October 2019.

    Comments: Accepted in NeurIPS2021

  17. arXiv:1906.08473  [pdf, other]

    stat.ML cs.LG

    Data Cleansing for Models Trained with SGD

    Authors: Satoshi Hara, Atsushi Nitanda, Takanori Maehara

    Abstract: Data cleansing is a typical approach used to improve the accuracy of machine learning models, which, however, requires extensive domain knowledge to identify the influential instances that affect the models. In this paper, we propose an algorithm that can suggest influential instances without using any domain knowledge. With the proposed method, users only need to inspect the instances suggested b…

    Submitted 20 June, 2019; originally announced June 2019.

  18. arXiv:1905.09870  [pdf, ps, other]

    stat.ML cs.LG

    Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems

    Authors: Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki

    Abstract: Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks. Most studies especially focused on the regression problems with the squared loss function, except for a few, and the importance of the positivity of the neural tangent kernel has been pointed out. On the other hand, the performance of gradient descen…

    Submitted 18 March, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: 29 pages

  19. arXiv:1806.05438  [pdf, other]

    stat.ML cs.LG math.OC

    Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors

    Authors: Atsushi Nitanda, Taiji Suzuki

    Abstract: We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequ…

    Submitted 25 July, 2022; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: 15 pages, 2 figures

  20. arXiv:1802.09031  [pdf, other]

    stat.ML cs.LG

    Functional Gradient Boosting based on Residual Network Perception

    Authors: Atsushi Nitanda, Taiji Suzuki

    Abstract: Residual Networks (ResNets) have become state-of-the-art models in deep learning and several theoretical studies have been devoted to understanding why ResNet works so well. One attractive viewpoint on ResNet is that it is optimizing the risk in a functional space by combining an ensemble of effective features. In this paper, we adopt this viewpoint to construct a new gradient boosting method, whi…

    Submitted 7 July, 2018; v1 submitted 25 February, 2018; originally announced February 2018.

    Comments: 22 pages, 1 figure, 1 table. An extended version of ICML 2018 paper

  21. arXiv:1801.02227  [pdf, other]

    stat.ML cs.LG

    Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models

    Authors: Atsushi Nitanda, Taiji Suzuki

    Abstract: We propose a new technique that boosts the convergence of training generative adversarial networks. Generally, the rate of training deep models reduces severely after multiple iterations. A key reason for this phenomenon is that a deep network is expressed using a highly non-convex finite-dimensional model, and thus the parameter gets stuck in a local optimum. Because of this, methods often suffer…

    Submitted 14 June, 2018; v1 submitted 7 January, 2018; originally announced January 2018.

    Comments: 14 pages, 4 figures, AISTATS2018

  22. arXiv:1712.05438  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Stochastic Particle Gradient Descent for Infinite Ensembles

    Authors: Atsushi Nitanda, Taiji Suzuki

    Abstract: The superior performance of ensemble methods with infinite models are well known. Most of these methods are based on optimization problems in infinite-dimensional spaces with some regularization, for instance, boosting methods and convex neural networks use $L^1$-regularization with the non-negative constraint. However, due to the difficulty of handling $L^1$-regularization, these problems require…

    Submitted 14 December, 2017; originally announced December 2017.

    Comments: 33 pages, 1 figure

  23. arXiv:1506.03016  [pdf, ps, other]

    stat.ML cs.LG

    Accelerated Stochastic Gradient Descent for Minimizing Finite Sums

    Authors: Atsushi Nitanda

    Abstract: We propose an optimization method for minimizing the finite sums of smooth convex functions. Our method incorporates an accelerated gradient descent (AGD) and a stochastic variance reduction gradient (SVRG) in a mini-batch setting. Unlike SVRG, our method can be directly applied to non-strongly and strongly convex problems. We show that our method achieves a lower overall complexity than the recen…

    Submitted 10 June, 2015; v1 submitted 9 June, 2015; originally announced June 2015.

    Comments: [v2] corrected citation to proxSVRG, corrected typos in Figure 1(option2) and 3(R4 -> R3)