Showing 1–6 of 6 results for author: Maennel, H

  1. arXiv:2106.09647

    cs.LG stat.ML

    Deep Learning Through the Lens of Example Difficulty

    Authors: Robert J. N. Baldock, Hartmut Maennel, Behnam Neyshabur

    Abstract: Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers. In this work, we adopt a perspective based on the role of individual examples. We introduce a measure of the computational difficulty of making a prediction for a given input: the (effective) prediction depth. Our extensive investigation reveals surprising yet simple…

    Submitted 18 June, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: Main paper: 15 pages, 8 figures. Appendix: 31 pages, 40 figures

  2. arXiv:2006.10455

    stat.ML cs.LG

    What Do Neural Networks Learn When Trained With Random Labels?

    Authors: Hartmut Maennel, Ibrahim Alabdulmohsin, Ilya Tolstikhin, Robert J. N. Baldock, Olivier Bousquet, Sylvain Gelly, Daniel Keysers

    Abstract: We study deep neural networks (DNNs) trained on natural image data with entirely random labels. Despite its popularity in the literature, where it is often used to study memorization, generalization, and other phenomena, little is known about what DNNs learn in this setting. In this paper, we show analytically for convolutional and fully connected networks that an alignment between the principal c…

    Submitted 11 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted, NeurIPS2020

  3. arXiv:2004.00115

    stat.ML cs.LG

    Exact marginal inference in Latent Dirichlet Allocation

    Authors: Hartmut Maennel

    Abstract: Assume we have potential "causes" $z\in Z$, which produce "events" $w$ with known probabilities $β(w|z)$. We observe $w_1,w_2,...,w_n$, what can we say about the distribution of the causes? A Bayesian estimate will assume a prior on distributions on $Z$ (we assume a Dirichlet prior) and calculate a posterior. An average over that posterior then gives a distribution on $Z$, which estimates how much…

    Submitted 31 March, 2020; originally announced April 2020.

  4. arXiv:1906.07987

    cs.LG cs.AI stat.ML

    Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

    Authors: Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

    Abstract: We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this pa…

    Submitted 19 June, 2019; originally announced June 2019.

  5. arXiv:1807.03064

    cs.LG stat.ML

    Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem

    Authors: Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy Mann, Andre Barreto

    Abstract: Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation. To increase our understanding of the problem, we investigate the issue of approximation errors in areas of sharp discontinuities of the value function being further propagated by bootstr…

    Submitted 9 July, 2018; originally announced July 2018.

  6. arXiv:1803.08367

    stat.ML cs.LG

    Gradient Descent Quantizes ReLU Network Features

    Authors: Hartmut Maennel, Olivier Bousquet, Sylvain Gelly

    Abstract: Deep neural networks are often trained in the over-parametrized regime (i.e. with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem. Several studies have highlighted the fact that the training procedure, i.e. mini-batch Stochastic Gradient Descent (SGD) leads to solutions that have specific properties in t…

    Submitted 22 March, 2018; originally announced March 2018.