
Showing 1–23 of 23 results for author: Sonoda, S

  1. arXiv:2405.13682  [pdf, other]

    cs.LG math.RT stat.ML

    Constructive Universal Approximation Theorems for Deep Joint-Equivariant Networks by Schur's Lemma

    Authors: Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda

    Abstract: We present a unified constructive universal approximation theorem covering a wide range of learning machines including both shallow and deep neural networks based on the group representation theory. Constructive here means that the distribution of parameters is given in a closed-form expression (called the ridgelet transform). Contrary to the case of shallow models, expressive power analysis of de…

    Submitted 22 May, 2024; originally announced May 2024.

  2. arXiv:2402.15984  [pdf, other]

    cs.LG math.FA stat.ML

    A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks

    Authors: Sho Sonoda, Isao Ishikawa, Masahiro Ikeda

    Abstract: To investigate neural network parameters, it is easier to study the distribution of parameters than to study the parameters in each neuron. The ridgelet transform is a pseudo-inverse operator that maps a given function $f$ to the parameter distribution $γ$ so that a network $\mathtt{NN}[γ]$ reproduces $f$, i.e. $\mathtt{NN}[γ]=f$. For depth-2 fully-connected networks on a Euclidean space, the ridg…

    Submitted 18 April, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Journal ref: Journal of Statistical Planning and Inference, 2024

  3. arXiv:2401.17780  [pdf, other]

    cs.LG

    A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees

    Authors: Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo

    Abstract: We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs). Despite its widespread practical use, the existing theoretical literature on PD-RL algorithms for this problem only provides sublinear regret guarantees and fails to ensure convergence to optimal policies. In this paper, we introduce a novel policy gradient PD algorithm with…

    Submitted 1 July, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  4. arXiv:2310.03530  [pdf, other]

    cs.LG stat.ML

    Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks

    Authors: Sho Sonoda, Hideyuki Ishi, Isao Ishikawa, Masahiro Ikeda

    Abstract: The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. In this study, we present a systematic method to induce a generalized neural network and its right inverse operator, called the ridgelet transform, from a joint group invariant function on the data-parameter d…

    Submitted 13 November, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: NeurReps 2023

  5. arXiv:2310.03529  [pdf, other]

    cs.LG stat.ML

    Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks

    Authors: Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda

    Abstract: We identify hidden layers inside a deep neural network (DNN) with group actions on the data domain, and formulate a formal deep network as a dual voice transform with respect to the Koopman operator, a linear representation of the group action. Based on the group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of DNNs.

    Submitted 13 November, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: NeurReps 2023

  6. arXiv:2309.13078  [pdf, other]

    cs.AI cs.LG cs.PL

    LPML: LLM-Prompting Markup Language for Mathematical Reasoning

    Authors: Ryutaro Yamauchi, Sho Sonoda, Akiyoshi Sannai, Wataru Kumagai

    Abstract: In utilizing large language models (LLMs) for mathematical reasoning, addressing the errors in the reasoning and calculation present in the generated text by LLMs is a crucial challenge. In this paper, we propose a novel framework that integrates the Chain-of-Thought (CoT) method with an external tool (Python REPL). We discovered that by prompting LLMs to generate structured text in XML-like marku…

    Submitted 11 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  7. arXiv:2302.05825  [pdf, other]

    cs.LG math.FA stat.ML

    Koopman-based generalization bound: New aspect for full-rank weights

    Authors: Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki

    Abstract: We propose a new bound for generalization of neural networks using Koopman operators. Whereas most of existing works focus on low-rank weight matrices, we focus on full-rank weight matrices. Our bound is tighter than existing norm-based bounds when the condition numbers of weight matrices are small. Especially, it is completely independent of the width of the network if the weight matrices are ort…

    Submitted 16 March, 2024; v1 submitted 11 February, 2023; originally announced February 2023.

    Journal ref: ICLR 2024

  8. arXiv:2301.11936  [pdf, other]

    quant-ph cs.LG stat.ML

    Quantum Ridgelet Transform: Winning Lottery Ticket of Neural Networks with Quantum Computation

    Authors: Hayata Yamasaki, Sathyawageeswar Subramanian, Satoshi Hayakawa, Sho Sonoda

    Abstract: A significant challenge in the field of quantum machine learning (QML) is to establish applications of quantum computation to accelerate common tasks in machine learning such as those for neural networks. Ridgelet transform has been a fundamental mathematical tool in the theoretical studies of neural networks, but the practical applicability of ridgelet transform to conducting learning tasks was l…

    Submitted 11 September, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: 27 pages, 4 figures

    Journal ref: Proceedings of the 40th International Conference on Machine Learning (ICML2023) https://proceedings.mlr.press/v202/yamasaki23a.html

  9. arXiv:2205.14819  [pdf, other]

    cs.LG math.RT

    Universality of Group Convolutional Neural Networks Based on Ridgelet Analysis on Groups

    Authors: Sho Sonoda, Isao Ishikawa, Masahiro Ikeda

    Abstract: We show the universality of depth-2 group convolutional neural networks (GCNNs) in a unified and constructive manner based on the ridgelet theory. Despite widespread use in applications, the approximation property of (G)CNNs has not been well investigated. The universality of (G)CNNs has been shown since the late 2010s. Yet, our understanding on how (G)CNNs represent functions is incomplete becaus…

    Submitted 12 October, 2022; v1 submitted 29 May, 2022; originally announced May 2022.

    Comments: replaced with the published version (NeurIPS2022)

  10. arXiv:2203.01631  [pdf, other]

    cs.LG

    Fully-Connected Network on Noncompact Symmetric Space and Ridgelet Transform based on Helgason-Fourier Analysis

    Authors: Sho Sonoda, Isao Ishikawa, Masahiro Ikeda

    Abstract: Neural network on Riemannian symmetric space such as hyperbolic space and the manifold of symmetric positive definite (SPD) matrices is an emerging subject of research in geometric deep learning. Based on the well-established framework of the Helgason-Fourier transform on the noncompact symmetric space, we present a fully-connected network and its associated ridgelet transform on the noncompact sy…

    Submitted 5 October, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: replaced with the published version (ICML2022)

  11. arXiv:2202.05254  [pdf, other]

    cs.LG

    Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel

    Authors: Kaito Watanabe, Kotaro Sakamoto, Ryo Karakida, Sho Sonoda, Shun-ichi Amari

    Abstract: A biological neural network in the cortex forms a neural field. Neurons in the field have their own receptive fields, and connection weights between two neurons are random but highly correlated when they are in close proximity in receptive fields. In this paper, we investigate such neural fields in a multilayer architecture to investigate the supervised learning of the fields. We empirically compa…

    Submitted 6 January, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

  12. arXiv:2106.09028  [pdf, ps, other]

    quant-ph cs.LG stat.ML

    Exponential Error Convergence in Data Classification with Optimized Random Features: Acceleration by Quantum Machine Learning

    Authors: Hayata Yamasaki, Sho Sonoda

    Abstract: Classification is a common task in machine learning. Random features (RFs) stand as a central technique for scalable learning algorithms based on kernel methods, and more recently proposed optimized random features, sampled depending on the model and the data distribution, can significantly reduce and provably minimize the required number of features. However, existing research on classification u…

    Submitted 13 June, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: 42 pages, no figure

  13. arXiv:2106.04770  [pdf, other]

    cs.LG stat.ML

    Ghosts in Neural Networks: Existence, Structure and Role of Infinite-Dimensional Null Space

    Authors: Sho Sonoda, Isao Ishikawa, Masahiro Ikeda

    Abstract: Overparametrization has been remarkably successful for deep learning studies. This study investigates an overlooked but important aspect of overparametrized neural networks, that is, the null components in the parameters of neural networks, or the ghosts. Since deep learning is not explicitly regularized, typical deep learning solutions contain null components. In this paper, we present a structur…

    Submitted 8 June, 2021; originally announced June 2021.

  14. arXiv:2106.03885  [pdf, other]

    cs.LG math.DS math.OC stat.ML

    Differentiable Multiple Shooting Layers

    Authors: Stefano Massaroli, Michael Poli, Sho Sonoda, Taiji Suzuki, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

    Abstract: We detail a novel class of implicit neural models. Leveraging time-parallel methods for differential equations, Multiple Shooting Layers (MSLs) seek solutions of initial value problems via parallelizable root-finding algorithms. MSLs broadly serve as drop-in replacements for neural ordinary differential equations (Neural ODEs) with improved efficiency in number of function evaluations (NFEs) and w…

    Submitted 7 June, 2021; originally announced June 2021.

  15. arXiv:2008.08427  [pdf, other]

    cs.LG stat.ML

    How Powerful are Shallow Neural Networks with Bandlimited Random Weights?

    Authors: Ming Li, Sho Sonoda, Feilong Cao, Yu Guang Wang, Jiye Liang

    Abstract: We investigate the expressive power of depth-2 bandlimited random neural networks. A random net is a neural network where the hidden layer parameters are frozen with random assignment, and only the output layer parameters are trained by loss minimization. Using random weights for a hidden layer is an effective method to avoid non-convex optimization in standard gradient descent learning. It has al…

    Submitted 31 May, 2023; v1 submitted 19 August, 2020; originally announced August 2020.

    Comments: Published as a conference paper at ICML 2023

  16. arXiv:2007.03441  [pdf, other]

    cs.LG stat.ML

    Ridge Regression with Over-Parametrized Two-Layer Networks Converge to Ridgelet Spectrum

    Authors: Sho Sonoda, Isao Ishikawa, Masahiro Ikeda

    Abstract: Characterization of local minima draws much attention in theoretical studies of deep learning. In this study, we investigate the distribution of parameters in an over-parametrized finite neural network trained by ridge regularized empirical square risk minimization (RERM). We develop a new theory of ridgelet transform, a wavelet-like integral transform that provides a powerful and general framewor…

    Submitted 19 February, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: published at AISTATS2021

  17. arXiv:2004.10756  [pdf, other]

    quant-ph cs.LG stat.ML

    Learning with Optimized Random Features: Exponential Speedup by Quantum Machine Learning without Sparsity and Low-Rank Assumptions

    Authors: Hayata Yamasaki, Sathyawageeswar Subramanian, Sho Sonoda, Masato Koashi

    Abstract: Kernel methods augmented with random features give scalable algorithms for learning from big data. But it has been computationally hard to sample random features according to a probability distribution that is optimized for the data, so as to minimize the required number of features for achieving the learning to a desired accuracy. Here, we develop a quantum algorithm for sampling from this optimi…

    Submitted 9 November, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

    Comments: 37 pages, 2 figures, accepted at Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)

    Journal ref: https://proceedings.neurips.cc/paper/2020/hash/9ddb9dd5d8aee9a76bf217a2a3c54833-Abstract.html

  18. arXiv:1902.00648  [pdf, other]

    stat.ML cs.LG

    Fast Approximation and Estimation Bounds of Kernel Quadrature for Infinitely Wide Models

    Authors: Sho Sonoda

    Abstract: An infinitely wide model is a weighted integration $\int \varphi(x,v) d μ(v)$ of feature maps. This model excels at handling an infinite number of features, and thus it has been adopted to the theoretical study of deep learning. Kernel quadrature is a kernel-based numerical integration scheme developed for fast approximation of expectations $\int f(x) d p(x)$. In this study, regarding the weight…

    Submitted 7 July, 2020; v1 submitted 2 February, 2019; originally announced February 2019.

  19. arXiv:1805.07517  [pdf, other]

    stat.ML cs.LG

    The global optimum of shallow neural network is attained by ridgelet transform

    Authors: Sho Sonoda, Isao Ishikawa, Masahiro Ikeda, Kei Hagihara, Yoshihiro Sawano, Takuo Matsubara, Noboru Murata

    Abstract: We prove that the global minimum of the backpropagation (BP) training problem of neural networks with an arbitrary nonlinear activation is given by the ridgelet transform. A series of computational experiments show that there exists an interesting similarity between the scatter plot of hidden parameters in a shallow neural network after the BP training and the spectrum of the ridgelet transform. B…

    Submitted 28 January, 2019; v1 submitted 19 May, 2018; originally announced May 2018.

    Comments: under review

  20. arXiv:1712.04145  [pdf, other]

    cs.LG stat.ML

    Transportation analysis of denoising autoencoders: a novel method for analyzing deep neural networks

    Authors: Sho Sonoda, Noboru Murata

    Abstract: The feature map obtained from the denoising autoencoder (DAE) is investigated by determining transportation dynamics of the DAE, which is a cornerstone for deep learning. Despite the rapid development in its application, deep neural networks remain analytically unexplained, because the feature maps are nested and parameters are not faithful. In this paper, we address the problem of the formulation…

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: Accepted at NIPS 2017 workshop on Optimal Transport & Machine Learning (OTML2017)

  21. arXiv:1605.02832  [pdf, other]

    cs.LG stat.ML

    Transport Analysis of Infinitely Deep Neural Network

    Authors: Sho Sonoda, Noboru Murata

    Abstract: We investigated the feature map inside deep neural networks (DNNs) by tracking the transport map. We are interested in the role of depth (why do DNNs perform better than shallow models?) and the interpretation of DNNs (what do intermediate layers do?) Despite the rapid development in their application, DNNs remain analytically unexplained because the hidden layers are nested and the parameters are…

    Submitted 31 October, 2018; v1 submitted 9 May, 2016; originally announced May 2016.

    Journal ref: Journal of Machine Learning Research 20(2):1-52, 2019

  22. arXiv:1505.03654  [pdf, other]

    cs.NE cs.LG math.FA

    Neural Network with Unbounded Activation Functions is Universal Approximator

    Authors: Sho Sonoda, Noboru Murata

    Abstract: This paper presents an investigation of the approximation property of neural networks with unbounded activation functions, such as the rectified linear unit (ReLU), which is the new de-facto standard of deep learning. The ReLU network can be analyzed by the ridgelet transform with respect to Lizorkin distributions. By showing three reconstruction formulas by using the Fourier slice theorem, the Ra…

    Submitted 29 November, 2015; v1 submitted 14 May, 2015; originally announced May 2015.

    Comments: under review; first revised version

    Journal ref: Applied and Computational Harmonic Analysis, 43(2):233-268, 2017

  23. arXiv:1312.6461  [pdf, other]

    cs.LG cs.NE

    Nonparametric Weight Initialization of Neural Networks via Integral Representation

    Authors: Sho Sonoda, Noboru Murata

    Abstract: A new initialization method for hidden parameters in a neural network is proposed. Derived from the integral representation of the neural network, a nonparametric probability distribution of hidden parameters is introduced. In this proposal, hidden parameters are initialized by samples drawn from this distribution, and output parameters are fitted by ordinary linear regression. Numerical experimen…

    Submitted 19 February, 2014; v1 submitted 22 December, 2013; originally announced December 2013.

    Comments: For ICLR2014, revised into 9 pages; revised into 12 pages (with supplements)