Showing 1–16 of 16 results for author: Hodgkinson, L

  1. arXiv:2311.07013  [pdf, ps, other]

    stat.ML cs.LG

    A PAC-Bayesian Perspective on the Interpolating Information Criterion

    Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

    Abstract: Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 9 pages

  2. arXiv:2307.07785  [pdf, other]

    stat.ML cs.LG

    The Interpolating Information Criterion for Overparameterized Models

    Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

    Abstract: The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit, penalizing model size. However, these criteria are not appropriate in modern settings where overparameterized models tend to perform well. For any overparameterized mod…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: 23 pages, 2 figures

  3. arXiv:2307.02501  [pdf, ps, other]

    stat.ML cs.LG

    Generalization Guarantees via Algorithm-dependent Rademacher Complexity

    Authors: Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli

    Abstract: Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure…

    Submitted 4 July, 2023; originally announced July 2023.

  4. arXiv:2306.09262  [pdf, other]

    stat.ML cs.LG cs.PL

    A Heavy-Tailed Algebra for Probabilistic Programming

    Authors: Feynman Liang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated. To overcome this deficiency, we propose a systematic approach for analyzing the tails of random variables, and we illustrate how this approac…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: 21 pages, 6 figures

  5. arXiv:2305.12313  [pdf, other]

    stat.ML cs.LG

    When are ensembles really effective?

    Authors: Ryan Theisen, Hyunsuk Kim, Yaoqing Yang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: Ensembling has a long history in statistical data analysis, with many impactful applications. However, in many modern machine learning settings, the benefits of ensembling are less ubiquitous and less obvious. We study, both theoretically and empirically, the fundamental question of when ensembling yields significant performance improvements in classification tasks. Theoretically, we prove new res…

    Submitted 20 May, 2023; originally announced May 2023.

  6. arXiv:2210.07612  [pdf, other]

    stat.ML cs.LG

    Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes

    Authors: Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney

    Abstract: Despite their importance for assessing reliability of predictions, uncertainty quantification (UQ) measures for machine learning models have only recently begun to be rigorously characterized. One prominent issue is the curse of dimensionality: it is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input d…

    Submitted 25 July, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: 33 pages, 21 figures

  7. arXiv:2205.07918  [pdf, other]

    stat.ML cs.LG

    Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows

    Authors: Feynman Liang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they present challenges when Gaussian-based variational inference fails to capture tail decay accurately. We first improve previous theory on tails of Lipschitz flows by quantifying how the tails affect the rate of tail decay and by expanding the theory to non-Lipschitz polynomial…

    Submitted 16 May, 2022; originally announced May 2022.

  8. arXiv:2108.00781  [pdf, other]

    stat.ML cs.LG

    Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

    Authors: Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney

    Abstract: Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, this work mainly relied on continuous-time ap…

    Submitted 11 July, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

    Comments: 22 pages, 6 figures

  9. arXiv:2106.10820  [pdf, other]

    cs.LG stat.ML

    Stateful ODE-Nets using Basis Function Expansions

    Authors: Alejandro Queiruga, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney

    Abstract: The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-in-depth functions using linear combinations of basis functions which enables us to leverage parameter transformations such as function projections. In turn, this view…

    Submitted 6 November, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted at 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  10. arXiv:2102.04877  [pdf, other]

    stat.ML cs.LG math.DS math.PR

    Noisy Recurrent Neural Networks

    Authors: Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney

    Abstract: We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states. Specifically, we consider RNNs that can be viewed as discretizations of stochastic differential equations driven by input data. This framework allows us to study the implicit regularization effect of general noise injection schemes by deriving an approximate explicit regulari…

    Submitted 1 December, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: 38 pages

    Journal ref: NeurIPS 2021 (https://proceedings.neurips.cc/paper/2021/hash/29301521774ff3cbd26652b2d5c95996-Abstract.html)

  11. arXiv:2006.12070  [pdf, other]

    cs.LG math.DS stat.ML

    Lipschitz Recurrent Neural Networks

    Authors: N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney

    Abstract: Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity. This particular functional form facilitates stability analysis of the long-term behavior of the recurrent unit using tools from nonlinear systems theory. In turn, this…

    Submitted 23 April, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Published as a conference paper at ICLR 2021

  12. arXiv:2006.06293  [pdf, other]

    stat.ML cs.LG math.OC math.ST

    Multiplicative noise and heavy tails in stochastic optimization

    Authors: Liam Hodgkinson, Michael W. Mahoney

    Abstract: Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular, the precise role of the stochasticity, still remain unclear. Modelling stochastic optimization algorithms as discrete random recurrence relations, we show that multiplicative noise, as it commonly arises due to variance in local rates of convergence, results in…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: 30 pages, 7 figures

  13. arXiv:2002.09547  [pdf, other]

    stat.ML cs.LG

    Stochastic Normalizing Flows

    Authors: Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney

    Abstract: We introduce stochastic normalizing flows, an extension of continuous normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs). Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs as random neural ordinary differential equ…

    Submitted 25 February, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: 17 pages, 4 figures

  14. arXiv:2001.09266  [pdf, other]

    math.ST stat.ML

    The reproducing Stein kernel approach for post-hoc corrected sampling

    Authors: Liam Hodgkinson, Robert Salomone, Fred Roosta

    Abstract: Stein importance sampling is a widely applicable technique based on kernelized Stein discrepancy, which corrects the output of approximate sampling algorithms by reweighting the empirical distribution of the samples. A general analysis of this technique is conducted for the previously unconsidered setting where samples are obtained via the simulation of a Markov chain, and applies to an arbitrary…

    Submitted 13 September, 2021; v1 submitted 25 January, 2020; originally announced January 2020.

    Comments: 26 pages, 2 figures

    MSC Class: 65C05 (Primary) 60J22; 60B10 (Secondary)

  15. arXiv:1907.08410  [pdf, other]

    stat.ML cs.LG

    Geometric Rates of Convergence for Kernel-based Sampling Algorithms

    Authors: Rajiv Khanna, Liam Hodgkinson, Michael W. Mahoney

    Abstract: The rate of convergence of weighted kernel herding (WKH) and sequential Bayesian quadrature (SBQ), two kernel-based sampling algorithms for estimating integrals with respect to some target probability measure, is investigated. Under verifiable conditions on the chosen kernel and target measure, we establish a near-geometric rate of convergence for target measures that are nearly atomic. Furthermor…

    Submitted 31 October, 2021; v1 submitted 19 July, 2019; originally announced July 2019.

    Comments: Accepted to UAI 2021 (Oral)

  16. arXiv:1903.12322  [pdf, other]

    stat.ML cs.LG stat.CO

    Implicit Langevin Algorithms for Sampling From Log-concave Densities

    Authors: Liam Hodgkinson, Robert Salomone, Fred Roosta

    Abstract: For sampling from a log-concave density, we study implicit integrators resulting from $\theta$-method discretization of the overdamped Langevin diffusion stochastic differential equation. Theoretical and algorithmic properties of the resulting sampling methods for $\theta \in [0,1]$ and a range of step sizes are established. Our results generalize and extend prior works in several directions. In particular…

    Submitted 10 July, 2021; v1 submitted 28 March, 2019; originally announced March 2019.