-
Symmetries, flat minima, and the conserved quantities of gradient flow
Authors:
Bo Zhao,
Iordan Ganev,
Robin Walters,
Rose Yu,
Nima Dehmamy
Abstract:
Empirical studies of the loss landscape of deep networks have revealed that many local minima are connected through low-loss valleys. Yet, little is known about the theoretical origin of such valleys. We present a general framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries. These symmetries can transform a trained model such that it performs similarly on new samples, which allows ensemble building that improves robustness under certain adversarial attacks. We then show that conserved quantities associated with linear symmetries can be used to define coordinates along low-loss valleys. The conserved quantities help reveal that using common initialization methods, gradient flow only explores a small part of the global minimum. By relating conserved quantities to convergence rate and sharpness of the minimum, we provide insights on how initialization impacts convergence and generalizability.
Submitted 23 March, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Quiver neural networks
Authors:
Iordan Ganev,
Robin Walters
Abstract:
We develop a uniform theoretical approach towards the analysis of various neural network connectivity architectures by introducing the notion of a quiver neural network. Inspired by quiver representation theory in mathematics, this approach gives a compact way to capture elaborate data flows in complex network architectures. As an application, we use parameter space symmetries to prove a lossless model compression algorithm for quiver neural networks with certain non-pointwise activations known as rescaling activations. In the case of radial rescaling activations, we prove that training the compressed model with gradient descent is equivalent to training the original model with projected gradient descent.
Submitted 26 July, 2022;
originally announced July 2022.
-
Universal approximation and model compression for radial neural networks
Authors:
Iordan Ganev,
Twan van Laarhoven,
Robin Walters
Abstract:
We introduce a class of fully-connected neural networks whose activation functions, rather than being pointwise, rescale feature vectors by a function depending only on their norm. We call such networks radial neural networks, extending previous work on rotation equivariant networks that considers rescaling activations in less generality. We prove universal approximation theorems for radial neural networks, including in the more difficult cases of bounded widths and unbounded domains. Our proof techniques are novel, distinct from those in the pointwise case. Additionally, radial neural networks exhibit a rich group of orthogonal change-of-basis symmetries on the vector space of trainable parameters. Factoring out these symmetries leads to a practical lossless model compression algorithm. Optimization of the compressed model by gradient descent is equivalent to projected gradient descent for the full model.
Submitted 16 February, 2023; v1 submitted 6 July, 2021;
originally announced July 2021.
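Illustration (not from the paper): a minimal sketch of an activation that rescales a whole feature vector by a function of its norm, as the abstract above describes, together with a numerical check that it commutes with a rotation. The names `radial_activation`, `rotate2d`, and the rescaling function `lam(r) = 1 / (1 + r)` are hypothetical choices made only for this example; the paper's own definitions may differ.

```python
import math

def radial_activation(v, lam=lambda r: 1.0 / (1.0 + r)):
    """Rescale the whole vector v by lam(|v|).

    lam is a hypothetical choice for illustration; any scalar function
    of the norm gives an activation of this general shape.
    """
    r = math.sqrt(sum(x * x for x in v))
    scale = lam(r)
    return [scale * x for x in v]

def rotate2d(v, theta):
    """Apply a 2D rotation (an orthogonal change of basis) to v."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

v = [3.0, 4.0]
theta = 0.7

# Rotating first and then activating should match activating first and
# then rotating, because a rotation preserves the norm.
lhs = radial_activation(rotate2d(v, theta))
rhs = rotate2d(radial_activation(v), theta)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Because the rescaling factor depends only on the norm, which a rotation leaves unchanged, `max_gap` should be zero up to floating-point error. A pointwise activation such as ReLU would not pass the same check in general.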
-
PIUMA: Programmable Integrated Unified Memory Architecture
Authors:
Sriram Aananthakrishnan,
Nesreen K. Ahmed,
Vincent Cave,
Marcelo Cintra,
Yigit Demir,
Kristof Du Bois,
Stijn Eyerman,
Joshua B. Fryman,
Ivan Ganev,
Wim Heirman,
Hans-Christian Hoppe,
Jason Howard,
Ibrahim Hur,
MidhunChandra Kodiyath,
Samkit Jain,
Daniel S. Klowden,
Marek M. Landowski,
Laurent Montigny,
Ankit More,
Przemyslaw Ossowski,
Robert Pawlowski,
Nick Pepperling,
Fabrizio Petrini,
Mariusz Sikora,
Balasubramanian Seshasayee,
et al. (6 additional authors not shown)
Abstract:
High performance large scale graph analytics is essential to timely analyze relationships in big data sets. Conventional processor architectures suffer from inefficient resource usage and bad scaling on graph workloads. To enable efficient and scalable graph analysis, Intel developed the Programmable Integrated Unified Memory Architecture (PIUMA). PIUMA consists of many multi-threaded cores, fine-grained memory and network accesses, a globally shared address space and powerful offload engines. This paper presents the PIUMA architecture, and provides initial performance estimations, projecting that a PIUMA node will outperform a conventional compute node by one to two orders of magnitude. Furthermore, PIUMA continues to scale across multiple nodes, which is a challenge in conventional multinode setups.
Submitted 13 October, 2020;
originally announced October 2020.