Showing 1–17 of 17 results for author: Granziol, D

  1. arXiv:2205.08601  [pdf, other]

    math-ph cond-mat.dis-nn cs.LG

    Universal characteristics of deep neural network loss surfaces from random matrix theory

    Authors: Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri, Joseph Najnudel, Diego Granziol

    Abstract: This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks based on a realistic model of their Hessians. In particular we derive universal aspects of outliers in the spectra of deep neural networ…

    Submitted 20 June, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: 42 pages

  2. arXiv:2107.13327  [pdf, other]

    cs.IR

    Ranker-agnostic Contextual Position Bias Estimation

    Authors: Oriol Barbany Mayor, Vito Bellini, Alexander Buchholz, Giuseppe Di Benedetto, Diego Marco Granziol, Matteo Ruffini, Yannik Stein

    Abstract: Learning-to-rank (LTR) algorithms are ubiquitous and necessary to explore the extensive catalogs of media providers. To avoid the user examining all the results, its preferences are used to provide a subset of relatively small size. The user preferences can be inferred from the interactions with the presented content if explicit ratings are unavailable. However, directly using implicit feedback ca…

    Submitted 28 July, 2021; originally announced July 2021.

  3. arXiv:2102.06740  [pdf, other]

    cs.LG math-ph stat.ML

    Appearance of Random Matrix Theory in Deep Learning

    Authors: Nicholas P Baskerville, Diego Granziol, Jonathan P Keating

    Abstract: We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks, where we discover excellent agreement with Gaussian Orthogonal Ensemble statistics across several network architectures and datasets. These results shed new light on the applicability of Random Matrix Theory to modelling neural networks and suggest a previously unrecognised role for it in the s…

    Submitted 24 December, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: 33 pages, 14 figures

  4. arXiv:2011.08181  [pdf, other]

    stat.ML cs.LG

    A Random Matrix Theory Approach to Damping in Deep Learning

    Authors: Diego Granziol, Nicholas Baskerville

    Abstract: We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise in the flattest directions of the true loss surface. We demonstrate that typical schedules used for adaptive methods (with low numerical stability or damping constants) serve to bias relative movement towards flat directions rela…

    Submitted 16 March, 2022; v1 submitted 15 November, 2020; originally announced November 2020.

  5. arXiv:2006.09092  [pdf, other]

    stat.ML cs.LG

    Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training

    Authors: Diego Granziol, Stefan Zohren, Stephen Roberts

    Abstract: We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We demonstrate that the magnitude of the extremal values of the batch Hessian are larger than those of the empirical Hessian. We also derive similar results for the Generalised Gauss-Newton matrix approximation of the Hessian. As a consequence of our theorems we de…

    Submitted 5 November, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

  6. arXiv:2006.09091  [pdf, other]

    stat.ML cs.LG

    Flatness is a False Friend

    Authors: Diego Granziol

    Abstract: Hessian based measures of flatness, such as the trace, Frobenius and spectral norms, have been argued, used and shown to relate to generalisation. In this paper we demonstrate that for feed forward neural networks under the cross entropy loss, we would expect low loss solutions with large weights to have small Hessian based measures of flatness. This implies that solutions obtained using $L2$ regu…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: 9 pages, 10 figures

  7. arXiv:2006.07721  [pdf, other]

    stat.ML cs.LG

    Beyond Random Matrix Theory for Deep Networks

    Authors: Diego Granziol

    Abstract: We investigate whether the Wigner semi-circle and Marcenko-Pastur distributions, often used for deep neural network theoretical analysis, match empirically observed spectral densities. We find that even allowing for outliers, the observed spectral shapes strongly deviate from such theoretical predictions. This raises major questions about the usefulness of these models in deep learning. We further…

    Submitted 3 November, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: 8 pages, 5 figures

  8. arXiv:2003.01247  [pdf, other]

    stat.ML cs.LG

    Iterative Averaging in the Quest for Best Test Error

    Authors: Diego Granziol, Xingchen Wan, Samuel Albanie, Stephen Roberts

    Abstract: We analyse and explain the increased generalisation performance of iterate averaging using a Gaussian process perturbation model between the true and batch risk surface on the high dimensional quadratic. We derive three phenomena from our theoretical results: (1) The importance of combining iterate averaging (IA) with large learning rates and regularisation for improved regularisatio…

    Submitted 31 October, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

  9. arXiv:1912.09656  [pdf, other]

    stat.ML cs.LG

    Deep Curvature Suite

    Authors: Diego Granziol, Xingchen Wan, Timur Garipov

    Abstract: We present MLRG Deep Curvature suite, a PyTorch-based, open-source package for analysis and visualisation of neural network curvature and loss landscape. Despite of providing rich information into properties of neural network and useful for a various designed tasks, curvature information is still not made sufficient use for various reasons, and our method aims to bridge this gap. We present a prim…

    Submitted 22 May, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

    Comments: 11 pages, 11 figures

  10. arXiv:1912.09068  [pdf, other]

    stat.ML cs.LG

    A Maximum Entropy approach to Massive Graph Spectra

    Authors: Diego Granziol, Robin Ru, Stefan Zohren, Xiaowen Dong, Michael Osborne, Stephen Roberts

    Abstract: Graph spectral techniques for measuring graph similarity, or for learning the cluster number, require kernel smoothing. The choice of kernel function and bandwidth are typically chosen in an ad-hoc manner and heavily affect the resulting output. We prove that kernel smoothing biases the moments of the spectral density. We propose an information theoretically optimal approach to learn a smooth grap…

    Submitted 19 December, 2019; originally announced December 2019.

    Comments: 12 pages, 9 figures

  11. arXiv:1906.01101  [pdf, other]

    stat.ML cs.LG

    MEMe: An Accurate Maximum Entropy Method for Efficient Approximations in Large-Scale Machine Learning

    Authors: Diego Granziol, Binxin Ru, Stefan Zohren, Xiaowen Dong, Michael Osborne, Stephen Roberts

    Abstract: Efficient approximation lies at the heart of large-scale machine learning problems. In this paper, we propose a novel, robust maximum entropy algorithm, which is capable of dealing with hundreds of moments and allows for computationally efficient approximations. We showcase the usefulness of the proposed method, its equivalence to constrained Bayesian variational inference and demonstrate its supe…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: 18 pages, 3 figures, Published at Entropy 2019: Special Issue Entropy Based Inference and Optimization in Machine Learning

    Journal ref: MEMe: An Accurate Maximum Entropy Method for Efficient Approximations in Large-Scale Machine Learning. Entropy, 21(6), 551 (2019)

  12. arXiv:1804.06802  [pdf, other]

    stat.ML cs.IT cs.LG

    Entropic Spectral Learning for Large-Scale Graphs

    Authors: Diego Granziol, Binxin Ru, Stefan Zohren, Xiaowen Dong, Michael Osborne, Stephen Roberts

    Abstract: Graph spectra have been successfully used to classify network types, compute the similarity between graphs, and determine the number of communities in a network. For large graphs, where an eigen-decomposition is infeasible, iterative moment matched approximations to the spectra and kernel smoothing are typically used. We show that the underlying moment information is lost when using kernel smoothi…

    Submitted 25 March, 2019; v1 submitted 18 April, 2018; originally announced April 2018.

    Comments: 13 pages, 12 figures

  13. arXiv:1802.08054  [pdf, other]

    cs.LG cs.IT stat.ML

    VBALD - Variational Bayesian Approximation of Log Determinants

    Authors: Diego Granziol, Edward Wagstaff, Bin Xin Ru, Michael Osborne, Stephen Roberts

    Abstract: Evaluating the log determinant of a positive definite matrix is ubiquitous in machine learning. Applications thereof range from Gaussian processes, minimum-volume ellipsoids, metric learning, kernel learning, Bayesian neural networks, Determinental Point Processes, Markov random fields to partition functions of discrete graphical models. In order to avoid the canonical, yet prohibitive, Cholesky…

    Submitted 21 February, 2018; originally announced February 2018.

  14. arXiv:1711.00673  [pdf, other]

    stat.ML

    Fast Information-theoretic Bayesian Optimisation

    Authors: Binxin Ru, Mark McLeod, Diego Granziol, Michael A. Osborne

    Abstract: Information-theoretic Bayesian optimisation techniques have demonstrated state-of-the-art performance in tackling important global optimisation problems. However, current information-theoretic approaches require many approximations in implementation, introduce often-prohibitive computational overhead and limit the choice of kernels available to model the objective. We develop a fast information-th…

    Submitted 6 June, 2018; v1 submitted 2 November, 2017; originally announced November 2017.

    Comments: Main Paper: 9 pages, 6 figures, 2 tables; Accepted by ICML 2018

  15. arXiv:1709.02702  [pdf, other]

    stat.ML

    Entropic Determinants

    Authors: Diego Granziol, Stephen Roberts

    Abstract: The ability of many powerful machine learning algorithms to deal with large data sets without compromise is often hampered by computationally expensive linear algebra tasks, of which calculating the log determinant is a canonical example. In this paper we demonstrate the optimality of Maximum Entropy methods in approximating such calculations. We prove the equivalence between mean value constraint…

    Submitted 8 September, 2017; originally announced September 2017.

    Comments: 9 pages, 10 figures, 2 tables

  16. arXiv:1704.07223  [pdf, other]

    math.NA cs.IT stat.CO stat.ML

    Entropic Trace Estimates for Log Determinants

    Authors: Jack Fitzsimons, Diego Granziol, Kurt Cutajar, Michael Osborne, Maurizio Filippone, Stephen Roberts

    Abstract: The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others. In this work, we estimate log determinants under the framework of maximum entropy, given information in the form of moment constraints from stoc…

    Submitted 24 April, 2017; originally announced April 2017.

    Comments: 16 pages, 4 figures, 2 tables, 2 algorithms

  17. arXiv:1703.10099  [pdf, ps, other]

    cond-mat.stat-mech

    An information and field theoretic approach to the grand canonical ensemble

    Authors: Diego Granziol, Stephen Roberts

    Abstract: We present a novel derivation of the constraints required to obtain the underlying principles of statistical mechanics using a maximum entropy framework. We derive the mean value constraints by use of the central limit theorem and the scaling properties of Lagrange multipliers. We then arrive at the same result using a quantum free field theory and the Ward identities. The work provides a principl…

    Submitted 29 March, 2017; originally announced March 2017.

    Comments: 7 pages, 3 pages of Appendix