
Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model

Authors: Connall Garrod, Jonathan P. Keating

Abstract: Modern deep neural networks have achieved high performance across various tasks. Recently, researchers have noted occurrences of low-dimensional structure in the weights, Hessians, gradients, and feature vectors of these networks, spanning different datasets and architectures when trained to convergence. In this analysis, we theoretically demonstrate these observations arising, and show how they can be unified within a generalized unconstrained feature model that can be considered analytically. Specifically, we consider a previously described structure called Neural Collapse, and its multi-layer counterpart, Deep Neural Collapse, which emerges when the network approaches global optima. This phenomenon explains the other observed low-dimensional behaviours on a layer-wise level, such as the bulk and outlier structure seen in Hessian spectra, and the alignment of gradient descent with the outlier eigenspace of the Hessian. Empirical results in both the deep linear unconstrained feature model and its non-linear equivalent support these predicted observations.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 35 pages, 14 figures

arXiv:2403.05275

vSPACE: Voting in a Scalable, Privacy-Aware and Confidential Election

Authors: Se Elnour, William J Buchanan, Paul Keating, Mwrwan Abubakar, Sirag Elnour

Abstract: The vSPACE experimental proof-of-concept (PoC) on the TrueElect[Anon][Creds] protocol presents a novel approach to secure, private, and scalable elections, extending the TrueElect and ElectAnon protocols with the integration of AnonCreds SSI (Self-Sovereign Identity). Such a protocol PoC is situated within a Zero-Trust Architecture (ZTA) and leverages confidential computing, continuous authentication, multi-party computation (MPC), and well-architected framework (WAF) principles to address the challenges of cybersecurity, privacy, and trust over IP (ToIP) protection. Employing a Kubernetes confidential cluster within an Enterprise-Scale Landing Zone (ESLZ), vSPACE integrates Distributed Ledger Technology (DLT) for immutable and certifiable audit trails. The Infrastructure as Code (IaC) model ensures rapid deployment, consistent management, and adherence to security standards, making vSPACE a future-proof solution for digital voting systems.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2205.08601

doi 10.1088/1751-8121/aca7f5

Universal characteristics of deep neural network loss surfaces from random matrix theory

Authors: Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri, Joseph Najnudel, Diego Granziol

Abstract: This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks based on a realistic model of their Hessians. In particular we derive universal aspects of outliers in the spectra of deep neural networks and demonstrate the important role of random matrix local laws in popular pre-conditioning gradient descent algorithms. We also present insights into deep neural network loss surfaces from quite general arguments based on tools from statistical physics and random matrix theory.

Submitted 20 June, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

Comments: 42 pages

arXiv:2102.06740

doi 10.1016/j.physa.2021.126742

Appearance of Random Matrix Theory in Deep Learning

Authors: Nicholas P Baskerville, Diego Granziol, Jonathan P Keating

Abstract: We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks, where we discover excellent agreement with Gaussian Orthogonal Ensemble statistics across several network architectures and datasets. These results shed new light on the applicability of Random Matrix Theory to modelling neural networks and suggest a previously unrecognised role for it in the study of loss surfaces in deep learning. Inspired by these observations, we propose a novel model for the true loss surfaces of neural networks, consistent with our observations, which allows for Hessian spectral densities with rank degeneracy and outliers, extensively observed in practice, and predicts a growing independence of loss gradients as a function of distance in weight-space. We further investigate the importance of the true loss surface in neural networks and find, in contrast to previous work, that the exponential hardness of locating the global minimum has practical consequences for achieving state of the art performance.

Submitted 24 December, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: 33 pages, 14 figures

arXiv:2101.02524

doi 10.1007/s10955-022-02875-w

A spin-glass model for the loss surfaces of generative adversarial networks

Authors: Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri, Joseph Najnudel

Abstract: We present a novel mathematical model that seeks to capture the key design feature of generative adversarial networks (GANs). Our model consists of two interacting spin glasses, and we conduct an extensive theoretical analysis of the complexity of the model's critical points using techniques from Random Matrix Theory. The result is insights into the loss surfaces of large GANs that build upon prior insights for simpler networks, but also reveal new structure unique to this setting.

Submitted 7 January, 2021; originally announced January 2021.

Comments: 26 pages, 9 figures

arXiv:2004.03959

doi 10.1088/1742-5468/abfa1e

The Loss Surfaces of Neural Networks with General Activation Functions

Authors: Nicholas P. Baskerville, Jonathan P. Keating, Francesco Mezzadri, Joseph Najnudel

Abstract: The loss surfaces of deep neural networks have been the subject of several studies, theoretical and experimental, over the last few years. One strand of work considers the complexity, in the sense of local optima, of high dimensional random functions with the aim of informing how local optimisation methods may perform in such complicated settings. Prior work of Choromanska et al (2015) established a direct link between the training loss surfaces of deep multi-layer perceptron networks and spherical multi-spin glass models under some very strong assumptions on the network and its data. In this work, we test the validity of this approach by removing the undesirable restriction to ReLU activation functions. In doing so, we chart a new path through the spin glass complexity calculations using supersymmetric methods in Random Matrix Theory which may prove useful in other contexts. Our results shed new light on both the strengths and the weaknesses of spin glass models in this context.

Submitted 8 June, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

Comments: 50 pages, 11 figures; references added for Kac-Rice reduction to RMT method; updates following JSTAT review and publication

arXiv:2003.00454

Maximum Absolute Determinants of Upper Hessenberg Bohemian Matrices

Authors: Jonathan P. Keating, Ahmet Abdullah Keleş

Abstract: A matrix is called Bohemian if its entries are sampled from a finite set of integers. We determine the maximum absolute determinant of upper Hessenberg Bohemian Matrices for which the subdiagonal entries are fixed to be $1$ and upper triangular entries are sampled from $\{0,1,\cdots,n\}$, extending previous results for $n=1$ and $n=2$ and proving a recent conjecture of Fasi & Negri Porzio [8]. Furthermore, we generalize the problem to non-integer-valued entries.

Submitted 8 May, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

arXiv:1304.6937

doi 10.1215/00127094-2856619

Detecting squarefree numbers

Authors: Andrew R. Booker, Ghaith A. Hiary, Jon P. Keating

Abstract: We present an algorithm, based on the explicit formula for $L$-functions and conditional on GRH, for proving that a given integer is squarefree with little or no knowledge of its factorization. We analyze the algorithm both theoretically and practically, and use it to prove that several RSA challenge numbers are not squarefull.

Submitted 5 January, 2015; v1 submitted 25 April, 2013; originally announced April 2013.

Comments: 31 pages, 3 figures, latest version

Journal ref: Duke Math. J. 164, no. 2 (2015), 235-275

Showing 1–8 of 8 results for author: Keating, P