Showing 1–26 of 26 results for author: Ziyin, L

  1. arXiv:2402.07193

    cs.LG math.OC stat.ML

    Loss Symmetry and Noise Equilibrium of Stochastic Gradient Descent

    Authors: Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu

    Abstract: Symmetries exist abundantly in the loss function of neural networks. We characterize the learning dynamics of stochastic gradient descent (SGD) when exponential symmetries, a broad subclass of continuous symmetries, exist in the loss function. We establish that when gradient noises do not balance, SGD has the tendency to move the model parameters toward a point where noises from different directio…

    Submitted 3 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: preprint

  2. arXiv:2401.07085

    cs.LG cs.AI

    Three Mechanisms of Feature Learning in the Exact Solution of a Latent Variable Model

    Authors: Yizhou Xu, Liu Ziyin

    Abstract: We identify and exactly solve the learning dynamics of a one-hidden-layer linear model at any finite width whose limits exhibit both the kernel phase and the feature learning phase. We analyze the phase diagram of this model in different limits of common hyperparameters including width, layer-wise learning rates, scale of output, and scale of initialization. Our solution identifies three novel pro…

    Submitted 4 May, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

  3. arXiv:2309.16932

    cs.LG stat.ML

    Symmetry Induces Structure and Constraint of Learning

    Authors: Liu Ziyin

    Abstract: Due to common architecture designs, symmetries exist extensively in contemporary neural networks. In this work, we unveil the importance of the loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models. We prove that every mirror-reflection symmetry, with reflection surface $O$, in the loss function leads to the emergence of a constraint on the model…

    Submitted 1 June, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: ICML 2024 Camera Ready Version

  4. arXiv:2308.06671

    cs.LG cs.AI stat.ML

    Law of Balance and Stationary Distribution of Stochastic Gradient Descent

    Authors: Liu Ziyin, Hongchao Li, Masahito Ueda

    Abstract: The stochastic gradient descent (SGD) algorithm is the algorithm we use to train neural networks. However, it remains poorly understood how the SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we prove that the minibatch noise of SGD regularizes the solution towards a balanced solution whenever the loss function contains a rescaling symmetry. Beca…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Preprint

  5. arXiv:2303.15438

    cs.LG

    On the Stepwise Nature of Self-Supervised Learning

    Authors: James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht

    Abstract: We present a simple picture of the training process of joint embedding self-supervised learning methods. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We…

    Submitted 30 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 9 pages (main text) + 14 pages (refs + appendices). ICML '23

  6. arXiv:2303.13093

    cs.LG math.OC physics.data-an

    Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent

    Authors: Liu Ziyin, Botao Li, Tomer Galanti, Masahito Ueda

    Abstract: Characterizing and understanding the dynamics of stochastic gradient descent (SGD) around saddle points remains an open problem. We first show that saddle points in neural networks can be divided into two types, among which the Type-II saddles are especially difficult to escape from because the gradient noise vanishes at the saddle. The dynamics of SGD around these saddles are thus to leading orde…

    Submitted 2 July, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: preprint

  7. arXiv:2210.01212

    cs.LG stat.ML

    spred: Solving $L_1$ Penalty with SGD

    Authors: Liu Ziyin, Zihao Wang

    Abstract: We propose to minimize a generic differentiable objective with $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal is the direct generalization of previous ideas that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, \textit{spred}, is an exact differentiable so…

    Submitted 12 July, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: ICML 2023, 16 pages, 10 figures, and 2 tables

  8. arXiv:2210.00638

    cs.LG physics.data-an

    What shapes the loss landscape of self-supervised learning?

    Authors: Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka

    Abstract: Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL). However, questions remain in our theoretical understanding: When do those collapses occur? What are the mechanisms and causes? We answer these questions by deriving and thoroughly analyzing an analytically tractable theory of SSL loss landscapes. In this the…

    Submitted 11 March, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: Published at ICLR 2023

  9. arXiv:2205.12510

    cs.LG cond-mat.dis-nn physics.app-ph

    Exact Phase Transitions in Deep Learning

    Authors: Liu Ziyin, Masahito Ueda

    Abstract: This work reports deep-learning-unique first-order and second-order phase transitions, whose phenomenology closely follows that in statistical physics. In particular, we prove that the competition between prediction error and model complexity in the training loss leads to the second-order phase transition for nets with one hidden layer and the first-order phase transition for nets with more than o…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: preprint

  10. arXiv:2205.04009

    cs.LG stat.ML

    Posterior Collapse of a Linear Latent Variable Model

    Authors: Zihao Wang, Liu Ziyin

    Abstract: This work identifies the existence and cause of a type of posterior collapse that frequently occurs in the Bayesian deep learning practice. For a general linear latent variable model that includes linear variational autoencoders as a special case, we precisely identify the nature of posterior collapse to be the competition between the likelihood and the regularization of the mean due to the prior.…

    Submitted 13 October, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022; 25 pages, 5 figures, 1 Table

  11. Exact Solutions of a Deep Linear Network

    Authors: Liu Ziyin, Botao Li, Xiangming Meng

    Abstract: This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that the origin is a special point in the deep neural network loss landscape where highly nonlinear phenomena emerge. We show that weight decay strongly interacts with the model arc…

    Submitted 13 June, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  12. arXiv:2201.12724

    cs.LG stat.ML

    Stochastic Neural Networks with Infinite Width are Deterministic

    Authors: Liu Ziyin, Hanlin Zhang, Xiangming Meng, Yuting Lu, Eric Xing, Masahito Ueda

    Abstract: This work theoretically studies stochastic neural networks, a main type of neural network in use. We prove that as the width of an optimized stochastic neural network tends to infinity, its predictive variance on the training set decreases to zero. Our theory justifies the common intuition that adding stochasticity to the model can help regularize the model by introducing an averaging effect. Two…

    Submitted 24 May, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

  13. arXiv:2107.11774

    cs.LG math.OC stat.ML

    SGD with a Constant Large Learning Rate Can Converge to Local Maxima

    Authors: Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda

    Abstract: Previous works on stochastic gradient descent (SGD) often focus on its success. In this work, we construct worst-case optimization problems illustrating that, when not in the regimes that the previous works often assume, SGD can exhibit many strange and potentially undesirable behaviors. Specifically, we construct landscapes and data distributions such that (1) SGD converges to local maxima, (2) S…

    Submitted 27 May, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

    Comments: Fixed typos

  14. arXiv:2106.04114

    cs.LG q-fin.GN q-fin.PM

    Theoretically Motivated Data Augmentation and Regularization for Portfolio Construction

    Authors: Liu Ziyin, Kentaro Minami, Kentaro Imajo

    Abstract: The task we consider is portfolio construction in a speculative market, a fundamental problem in modern finance. While various empirical works now exist to explore deep learning in finance, the theory side is almost non-existent. In this work, we focus on developing a theoretical framework for understanding the use of data augmentation for deep-learning-based approaches to quantitative finance. Th…

    Submitted 22 December, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: The full version of our work published at 3rd ACM International Conference on AI in Finance (ICAIF'22)

  15. arXiv:2105.09557

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Power-law escape rate of SGD

    Authors: Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda

    Abstract: Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. Using this formalism, we show that the log loss barrier $Δ\log L=\log[L(θ^s)/L(θ^*)]$ between a local minimum $θ^*$ and a saddle $θ^s$ determines th…

    Submitted 29 January, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: 17+8 pages

  16. arXiv:2105.07222

    cs.LG stat.ML

    On the Distributional Properties of Adaptive Gradients

    Authors: Zhang Zhiyi, Liu Ziyin

    Abstract: Adaptive gradient methods have achieved remarkable success in training deep neural networks on a wide variety of tasks. However, not much is known about the mathematical and statistical properties of this family of methods. This work aims at providing a series of theoretical analyses of its statistical properties justified by experiments. In particular, we show that when the underlying gradient ob…

    Submitted 15 May, 2021; originally announced May 2021.

  17. arXiv:2102.05375

    cs.LG stat.ML

    Strength of Minibatch Noise in SGD

    Authors: Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda

    Abstract: The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning. This work presents the first systematic study of the SGD noise and fluctuations close to a local minimum. We first analyze the SGD noise in linear regression in detail and then derive a general formula for approximating SGD noise in different types o…

    Submitted 8 March, 2022; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: ICLR 2022 spotlight

  18. arXiv:2012.03636

    stat.ML cs.LG

    Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent

    Authors: Kangqiao Liu, Liu Ziyin, Masahito Ueda

    Abstract: In the vanishing learning rate regime, stochastic gradient descent (SGD) is now relatively well understood. In this work, we propose to study the basic properties of SGD and its variants in the non-vanishing learning rate regime. The focus is on deriving exactly solvable results and discussing their implications. The main contributions of this work are to derive the stationary distribution for dis…

    Submitted 11 June, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: Camera-ready version for the Thirty-eighth International Conference on Machine Learning (ICML 2021). 12 + 14 pages, 6 + 3 figures, 1 + 0 table. *First two authors contributed equally

  19. arXiv:2012.02813

    cs.LG cs.AI cs.CL cs.CV

    Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment

    Authors: Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: The natural world is abundant with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities are present at train and test time, which makes learning in low-resource modalities particularly difficult. In this work, we propose algorithms for cross-modal ge…

    Submitted 4 December, 2020; originally announced December 2020.

  20. arXiv:2010.12648

    cs.LG stat.ML

    An Investigation of how Label Smoothing Affects Generalization

    Authors: Blair Chen, Liu Ziyin, Zihao Wang, Paul Pu Liang

    Abstract: It has been hypothesized that label smoothing can reduce overfitting and improve generalization, and current empirical evidence seems to corroborate these effects. However, there is a lack of mathematical understanding of when and why such empirical improvements occur. In this paper, as a step towards understanding why label smoothing is effective, we propose a theoretical framework to show how la…

    Submitted 23 October, 2020; originally announced October 2020.

  21. arXiv:2006.08195

    cs.LG stat.ML

    Neural Networks Fail to Learn Periodic Functions and How to Fix It

    Authors: Liu Ziyin, Tilman Hartwig, Masahito Ueda

    Abstract: Previous literature offers limited clues on how to learn a periodic function using modern neural networks. We start with a study of the extrapolation properties of neural networks; we prove and demonstrate experimentally that the standard activation functions, such as ReLU, tanh, sigmoid, along with their variants, all fail to learn to extrapolate simple periodic functions. We hypothesize that th…

    Submitted 24 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 Camera Ready version

  22. arXiv:2003.11243

    cs.LG stat.ML

    Volumization as a Natural Generalization of Weight Decay

    Authors: Liu Ziyin, Zihao Wang, Makoto Yamada, Masahito Ueda

    Abstract: We propose a novel regularization method, called \textit{volumization}, for neural networks. Inspired by physics, we define a physical volume for the weight parameters in neural networks, and we show that this method is an effective way of regularizing neural networks. Intuitively, this method interpolates between an $L_2$ and $L_\infty$ regularization. Therefore, weight decay and weight clipping…

    Submitted 1 April, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

    Comments: 18 pages, 20 figures

  23. arXiv:2002.06541

    cs.LG cs.IT stat.ML

    Learning Not to Learn in the Presence of Noisy Labels

    Authors: Liu Ziyin, Blair Chen, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

    Abstract: Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets. In this paper, we discover that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption. We show that training with this loss function encourages the model to…

    Submitted 16 February, 2020; originally announced February 2020.

  24. arXiv:2002.04839

    cs.LG stat.ML

    LaProp: Separating Momentum and Adaptivity in Adam

    Authors: Liu Ziyin, Zhikang T. Wang, Masahito Ueda

    Abstract: We identify a by-far-unrecognized problem of Adam-style optimizers which results from unnecessary coupling between momentum and adaptivity. The coupling leads to instability and divergence when the momentum and adaptivity parameters are mismatched. In this work, we propose a method, LaProp, which decouples momentum and adaptivity in the Adam-style methods. We show that the decoupling leads to grea…

    Submitted 13 June, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

  25. arXiv:2001.01523

    cs.LG cs.DC stat.ML

    Think Locally, Act Globally: Federated Learning with Local and Global Representations

    Authors: Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B. Allen, Randy P. Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Federated learning is a method of training models on private data distributed over multiple devices. To keep device data private, the global model is trained by only communicating parameters and updates which poses scalability challenges for large models. To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model a…

    Submitted 14 July, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

    Comments: NeurIPS 2019 Workshop on Federated Learning distinguished student paper award. Code: https://github.com/pliang279/LG-FedAvg

  26. arXiv:1907.00208

    cs.LG stat.ML

    Deep Gamblers: Learning to Abstain with Portfolio Theory

    Authors: Liu Ziyin, Zhikang Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

    Abstract: We deal with the \textit{selective classification} problem (supervised-learning problem with a rejection option), where we want to achieve the best performance at a certain level of coverage of the data. We transform the original $m$-class classification problem to $(m+1)$-class where the $(m+1)$-th class represents the model abstaining from making a prediction due to disconfidence. Inspired by po…

    Submitted 1 October, 2019; v1 submitted 29 June, 2019; originally announced July 2019.

    Comments: Camera-Ready version for NeurIPS2019. Link to our code updated