Showing 1–11 of 11 results for author: Mattern, C

  1. arXiv:2401.14953  [pdf, other]

    cs.LG cs.AI

    Learning Universal Predictors

    Authors: Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness

    Abstract: Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling general problem solving. But, what are the limits of meta-learning? In this work, we explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neu…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 32 pages, 11 figures

  2. arXiv:2309.10668  [pdf, other]

    cs.LG cs.AI cs.CL cs.IT

    Language Modeling Is Compression

    Authors: Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness

    Abstract: It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In th…

    Submitted 18 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  3. arXiv:2305.13063  [pdf, ps, other]

    cs.LG

    Hierarchical Partitioning Forecaster

    Authors: Christopher Mattern

    Abstract: In this work we consider a new family of algorithms for sequential prediction, Hierarchical Partitioning Forecasters (HPFs). Our goal is to provide appealing theoretical - regret guarantees on a powerful model class - and practical - empirical performance comparable to deep networks - properties at the same time. We built upon three principles: hierarchically partitioning the feature space into su…

    Submitted 22 May, 2023; originally announced May 2023.

  4. arXiv:1910.01526  [pdf, other]

    cs.LG cs.IT stat.ML

    Gated Linear Networks

    Authors: Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter

    Abstract: This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs). What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons can model…

    Submitted 11 June, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1712.01897

  5. arXiv:1712.02151  [pdf, other]

    cs.IT

    Generalized Probability Smoothing

    Authors: Christopher Mattern

    Abstract: In this work we consider a generalized version of Probability Smoothing, the core elementary model for sequential prediction in the state of the art PAQ family of data compression algorithms. Our main contribution is a code length analysis that considers the redundancy of Probability Smoothing with respect to a Piecewise Stationary Source. The analysis holds for a finite alphabet and expresses red…

    Submitted 10 January, 2018; v1 submitted 6 December, 2017; originally announced December 2017.

  6. arXiv:1712.01897  [pdf, other]

    cs.LG cs.IT

    Online Learning with Gated Linear Networks

    Authors: Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, Peter Toth

    Abstract: This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss. Rather than relying on non-linear transfer functions, our method gains representational power by the use of data conditioning. We state under general conditions a learnable capacity theorem that shows this approach can in principle learn any bounded Borel-measurable function on a c…

    Submitted 5 December, 2017; originally announced December 2017.

    Comments: 40 pages

  7. arXiv:1501.01202  [pdf, other]

    cs.IT

    On Probability Estimation by Exponential Smoothing

    Authors: Christopher Mattern

    Abstract: Probability estimation is essential for every statistical data compression algorithm. In practice probability estimation should be adaptive, recent observations should receive a higher weight than older observations. We present a probability estimation method based on exponential smoothing that satisfies this requirement and runs in constant time per letter. Our main contribution is a theoretical…

    Submitted 9 January, 2015; v1 submitted 6 January, 2015; originally announced January 2015.

  8. arXiv:1311.1723  [pdf, ps, other]

    cs.IT

    On Probability Estimation via Relative Frequencies and Discount

    Authors: Christopher Mattern

    Abstract: Probability estimation is an elementary building block of every statistical data compression algorithm. In practice probability estimation is often based on relative letter frequencies which get scaled down, when their sum is too large. Such algorithms are attractive in terms of memory requirements, running time and practical performance. However, there still is a lack of theoretical understanding…

    Submitted 9 January, 2015; v1 submitted 7 November, 2013; originally announced November 2013.

  9. Combining non-stationary prediction, optimization and mixing for data compression

    Authors: Christopher Mattern

    Abstract: In this paper an approach to modelling nonstationary binary sequences, i.e., predicting the probability of upcoming symbols, is presented. After studying the prediction model we evaluate its performance in two non-artificial test cases. First the model is compared to the Laplace and Krichevsky-Trofimov estimators. Secondly a statistical ensemble model for compressing Burrows-Wheeler-Transform outp…

    Submitted 12 February, 2013; originally announced February 2013.

    Comments: International Conference on Data Compression, Communication and Processing (CCP) 2011

  10. Mixing Strategies in Data Compression

    Authors: Christopher Mattern

    Abstract: We propose geometric weighting as a novel method to combine multiple models in data compression. Our results reveal the rationale behind PAQ-weighting and generalize it to a non-binary alphabet. Based on a similar technique we present a new, generic linear mixture technique. All novel mixture techniques rely on given weight vectors. We consider the problem of finding optimal weights and show that…

    Submitted 12 February, 2013; originally announced February 2013.

    Comments: Data Compression Conference (DCC) 2012

  11. arXiv:1302.2820  [pdf, other]

    cs.IT

    Linear and Geometric Mixtures - Analysis

    Authors: Christopher Mattern

    Abstract: Linear and geometric mixtures are two methods to combine arbitrary models in data compression. Geometric mixtures generalize the empirically well-performing PAQ7 mixture. Both mixture schemes rely on weight vectors, which heavily determine their performance. Typically weight vectors are identified via Online Gradient Descent. In this work we show that one can obtain strong code length bounds for s…

    Submitted 12 February, 2013; originally announced February 2013.

    Comments: Data Compression Conference (DCC) 2013