-
Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model
Authors:
Connall Garrod,
Jonathan P. Keating
Abstract:
Modern deep neural networks have achieved high performance across various tasks. Recently, researchers have noted occurrences of low-dimensional structure in the weights, Hessians, gradients, and feature vectors of these networks, spanning different datasets and architectures when trained to convergence. In this analysis, we demonstrate theoretically how these observations arise, and show how they can be unified within a generalized unconstrained feature model that can be considered analytically. Specifically, we consider a previously described structure called Neural Collapse, and its multi-layer counterpart, Deep Neural Collapse, which emerges when the network approaches global optima. This phenomenon explains the other observed low-dimensional behaviours on a layer-wise level, such as the bulk and outlier structure seen in Hessian spectra, and the alignment of gradient descent with the outlier eigenspace of the Hessian. Empirical results in both the deep linear unconstrained feature model and its non-linear equivalent support these predicted observations.
Submitted 9 April, 2024;
originally announced April 2024.
-
vSPACE: Voting in a Scalable, Privacy-Aware and Confidential Election
Authors:
Se Elnour,
William J Buchanan,
Paul Keating,
Mwrwan Abubakar,
Sirag Elnour
Abstract:
The vSPACE experimental proof-of-concept (PoC) on the TrueElect[Anon][Creds] protocol presents a novel approach to secure, private, and scalable elections, extending the TrueElect and ElectAnon protocols with the integration of AnonCreds SSI (Self-Sovereign Identity). Such a protocol PoC is situated within a Zero-Trust Architecture (ZTA) and leverages confidential computing, continuous authentication, multi-party computation (MPC), and well-architected framework (WAF) principles to address the challenges of cybersecurity, privacy, and trust over IP (ToIP) protection. Employing a Kubernetes confidential cluster within an Enterprise-Scale Landing Zone (ESLZ), vSPACE integrates Distributed Ledger Technology (DLT) for immutable and certifiable audit trails. The Infrastructure as Code (IaC) model ensures rapid deployment, consistent management, and adherence to security standards, making vSPACE a future-proof solution for digital voting systems.
Submitted 8 March, 2024;
originally announced March 2024.
-
Universal characteristics of deep neural network loss surfaces from random matrix theory
Authors:
Nicholas P Baskerville,
Jonathan P Keating,
Francesco Mezzadri,
Joseph Najnudel,
Diego Granziol
Abstract:
This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks based on a realistic model of their Hessians. In particular we derive universal aspects of outliers in the spectra of deep neural networks and demonstrate the important role of random matrix local laws in popular pre-conditioning gradient descent algorithms. We also present insights into deep neural network loss surfaces from quite general arguments based on tools from statistical physics and random matrix theory.
Submitted 20 June, 2022; v1 submitted 17 May, 2022;
originally announced May 2022.
-
Appearance of Random Matrix Theory in Deep Learning
Authors:
Nicholas P Baskerville,
Diego Granziol,
Jonathan P Keating
Abstract:
We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks, where we discover excellent agreement with Gaussian Orthogonal Ensemble statistics across several network architectures and datasets. These results shed new light on the applicability of Random Matrix Theory to modelling neural networks and suggest a previously unrecognised role for it in the study of loss surfaces in deep learning. Inspired by these observations, we propose a novel model for the true loss surfaces of neural networks, consistent with our observations, which allows for Hessian spectral densities with rank degeneracy and outliers, extensively observed in practice, and predicts a growing independence of loss gradients as a function of distance in weight-space. We further investigate the importance of the true loss surface in neural networks and find, in contrast to previous work, that the exponential hardness of locating the global minimum has practical consequences for achieving state-of-the-art performance.
Submitted 24 December, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
A spin-glass model for the loss surfaces of generative adversarial networks
Authors:
Nicholas P Baskerville,
Jonathan P Keating,
Francesco Mezzadri,
Joseph Najnudel
Abstract:
We present a novel mathematical model that seeks to capture the key design feature of generative adversarial networks (GANs). Our model consists of two interacting spin glasses, and we conduct an extensive theoretical analysis of the complexity of the model's critical points using techniques from Random Matrix Theory. The result is insights into the loss surfaces of large GANs that build upon prior insights for simpler networks, but also reveal new structure unique to this setting.
Submitted 7 January, 2021;
originally announced January 2021.
-
The Loss Surfaces of Neural Networks with General Activation Functions
Authors:
Nicholas P. Baskerville,
Jonathan P. Keating,
Francesco Mezzadri,
Joseph Najnudel
Abstract:
The loss surfaces of deep neural networks have been the subject of several studies, theoretical and experimental, over the last few years. One strand of work considers the complexity, in the sense of local optima, of high dimensional random functions with the aim of informing how local optimisation methods may perform in such complicated settings. Prior work of Choromanska et al (2015) established a direct link between the training loss surfaces of deep multi-layer perceptron networks and spherical multi-spin glass models under some very strong assumptions on the network and its data. In this work, we test the validity of this approach by removing the undesirable restriction to ReLU activation functions. In doing so, we chart a new path through the spin glass complexity calculations using supersymmetric methods in Random Matrix Theory which may prove useful in other contexts. Our results shed new light on both the strengths and the weaknesses of spin glass models in this context.
Submitted 8 June, 2021; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Maximum Absolute Determinants of Upper Hessenberg Bohemian Matrices
Authors:
Jonathan P. Keating,
Ahmet Abdullah Keleş
Abstract:
A matrix is called Bohemian if its entries are sampled from a finite set of integers. We determine the maximum absolute determinant of upper Hessenberg Bohemian Matrices for which the subdiagonal entries are fixed to be $1$ and upper triangular entries are sampled from $\{0,1,\cdots,n\}$, extending previous results for $n=1$ and $n=2$ and proving a recent conjecture of Fasi & Negri Porzio [8]. Furthermore, we generalize the problem to non-integer-valued entries.
Submitted 8 May, 2020; v1 submitted 1 March, 2020;
originally announced March 2020.
-
Detecting squarefree numbers
Authors:
Andrew R. Booker,
Ghaith A. Hiary,
Jon P. Keating
Abstract:
We present an algorithm, based on the explicit formula for $L$-functions and conditional on GRH, for proving that a given integer is squarefree with little or no knowledge of its factorization. We analyze the algorithm both theoretically and practically, and use it to prove that several RSA challenge numbers are not squarefull.
Submitted 5 January, 2015; v1 submitted 25 April, 2013;
originally announced April 2013.