
Showing 1–43 of 43 results for author: Grünwald, P

Searching in archive cs.
  1. arXiv:2306.16646

    cs.IT math.ST

    Universal Reverse Information Projections and Optimal E-statistics

    Authors: Tyron Lardy, Peter Grünwald, Peter Harremoës

    Abstract: Information projections have found important applications in probability theory, statistics, and related areas. In the field of hypothesis testing in particular, the reverse information projection (RIPr) has recently been shown to lead to so-called growth-rate optimal (GRO) e-statistics for testing simple alternatives against composite null hypotheses. However, the RIPr as well as the GRO criterio…

    Submitted 4 December, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: A five-page abstract of this paper, containing a subset of the theorems but no proofs, was presented at ISIT 2023, Taipei

    MSC Class: 62B10 (primary); 94A17 (secondary)

  2. arXiv:2210.01948

    math.ST cs.GT cs.IT stat.ME

    Game-theoretic statistics and safe anytime-valid inference

    Authors: Aaditya Ramdas, Peter Grünwald, Vladimir Vovk, Glenn Shafer

    Abstract: Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty (e-processes for testing and confidence sequences for estimation) that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative…

    Submitted 17 June, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: 25 pages. Under review. ArXiv does not compile/space some references properly
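
    The test martingales mentioned in the abstract can be illustrated in their simplest form: a running product of likelihood ratios of a simple alternative against a simple null, stopped the first time it reaches 1/α. By Ville's inequality, under the null this process reaches 1/α with probability at most α, whatever stopping rule is used. A minimal sketch, assuming a Bernoulli null p0 = 0.5 and a hypothetical alternative p1 = 0.7; the function names are illustrative and not taken from the paper.

```python
from typing import Iterable, Optional, Tuple


def lr_factor(x: int, p0: float, p1: float) -> float:
    """Likelihood ratio p1(x) / p0(x) for a single Bernoulli outcome x in {0, 1}."""
    if x == 1:
        return p1 / p0
    return (1.0 - p1) / (1.0 - p0)


def run_test_martingale(
    xs: Iterable[int], p0: float, p1: float, alpha: float
) -> Tuple[float, Optional[int]]:
    """Multiply likelihood ratios and stop once the product reaches 1/alpha.

    Returns the last value of the process and the 1-based stopping time,
    or None if the threshold 1/alpha was never reached.
    """
    e = 1.0
    for t, x in enumerate(xs, start=1):
        e *= lr_factor(x, p0, p1)
        if e >= 1.0 / alpha:
            return e, t  # reject H0; Ville's inequality bounds the type-I error by alpha
    return e, None


p0, p1, alpha = 0.5, 0.7, 0.05
# Under H0 each factor has expectation 1, so the running product is a
# nonnegative martingale starting at 1 (a test martingale).
mean_factor_under_h0 = p0 * lr_factor(1, p0, p1) + (1 - p0) * lr_factor(0, p0, p1)
print(mean_factor_under_h0)

e_value, stop_time = run_test_martingale([1] * 20, p0, p1, alpha)
print(stop_time, e_value)
```

    With a run of ones the process crosses 1/α = 20 after nine observations, since 1.4^8 is below 20 and 1.4^9 is above it; the decision to stop there needs no correction.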

  3. The no-free-lunch theorems of supervised learning

    Authors: Tom F. Sterkenburg, Peter D. Grünwald

    Abstract: The no-free-lunch theorems promote a skeptical conclusion that all possible machine learning algorithms equally lack justification. But how could this leave room for a learning theory, that shows that some algorithms are better than others? Drawing parallels to the philosophy of induction, we point out that the no-free-lunch results presuppose a conception of learning algorithms as purely data-dri…

    Submitted 9 February, 2022; originally announced February 2022.

    Journal ref: Synthese 199:9979-10015 (2021)

  4. arXiv:2201.06487

    stat.ML cs.LG

    Minimax risk classifiers with 0-1 loss

    Authors: Santiago Mazuelas, Mauricio Romero, Peter Grünwald

    Abstract: Supervised classification techniques use training samples to learn a classification rule with small expected 0-1 loss (error probability). Conventional methods enable tractable learning and provide out-of-sample generalization by using surrogate losses instead of the 0-1 loss and considering specific families of rules (hypothesis classes). This paper presents minimax risk classifiers (MRCs) that m…

    Submitted 16 August, 2023; v1 submitted 17 January, 2022; originally announced January 2022.

  5. arXiv:2106.09683

    cs.LG cs.IT stat.ML

    PAC-Bayes, MAC-Bayes and Conditional Mutual Information: Fast rate bounds that handle general VC classes

    Authors: Peter Grünwald, Thomas Steinke, Lydia Zakynthinou

    Abstract: We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds. We derive conditional MI bounds as an instance, with special choice of prior, of conditional MAC-Bayesian (Mean Approximately Correct) bounds, themselves derived from conditional PAC-Bayesian bounds, where 'conditional' means that one can use priors conditioned on a joint training and gho…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 24 pages, accepted for publication at COLT 2021

  6. arXiv:2106.02693

    stat.ME cs.LG math.ST

    Generic E-Variables for Exact Sequential k-Sample Tests that allow for Optional Stopping

    Authors: Rosanne Turner, Alexander Ly, Peter Grünwald

    Abstract: We develop E-variables for testing whether two or more data streams come from the same source or not, and more generally, whether the difference between the sources is larger than some minimal effect size. These E-variables lead to exact, nonasymptotic tests that remain safe, i.e. keep their type-I error guarantees, under flexible sampling scenarios such as optional stopping and continuation. In s…

    Submitted 22 June, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

  7. arXiv:2103.13686

    cs.LG cs.AI stat.ML

    Robust subgroup discovery

    Authors: Hugo Manuel Proença, Peter Grünwald, Thomas Bäck, Matthijs van Leeuwen

    Abstract: We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same tim…

    Submitted 30 June, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: For associated code, see https://github.com/HMProenca/RuleList ; submitted to Data Mining and Knowledge Discovery Journal

    Journal ref: Data Mining and Knowledge Discovery 36 (2022) 1885-1970

  8. Discovering outstanding subgroup lists for numeric targets using MDL

    Authors: Hugo M. Proença, Peter Grünwald, Thomas Bäck, Matthijs van Leeuwen

    Abstract: The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set discovery (SSD) has been proposed. State-of-the-art SSD methods have their limitations though, as they typically heavily rely on heuristics and/or user-chosen hyperpar…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: Extended version of conference paper at ECML-PKDD

    Journal ref: ECML PKDD 2020, LNAI 12457, pp. 19-35, 2021

  9. arXiv:1910.09227

    math.ST cs.LG stat.ME

    Safe-Bayesian Generalized Linear Regression

    Authors: Rianne de Heide, Alisa Kirichenko, Nishant Mehta, Peter Grünwald

    Abstract: We study generalized Bayesian inference under misspecification, i.e. when the model is 'wrong but useful'. Generalized Bayes equips the likelihood with a learning rate $η$. We show that for generalized linear models (GLMs), $η$-generalized Bayes concentrates around the best approximation of the truth within the model for specific $η\neq 1$, even under severely misspecified noise, as long as the ta…

    Submitted 29 May, 2021; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: Final version. Accepted to AISTATS 2020
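
    Equipping the likelihood with a learning rate η means raising it to the power η before multiplying by the prior. In a conjugate Gaussian model this has a closed form, which shows the effect directly: η scales the data's contribution to the posterior precision. A toy sketch, assuming a one-parameter linear model y = θx + noise with known noise variance σ² and a N(0, τ²) prior; it is not the GLM setting of the paper and does not choose η, and `eta_posterior` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def eta_posterior(
    xs: Sequence[float],
    ys: Sequence[float],
    eta: float,
    sigma2: float = 1.0,
    tau2: float = 1.0,
) -> Tuple[float, float]:
    """Posterior mean and variance of theta under an eta-tempered likelihood.

    Assumed toy model: y_i = theta * x_i + N(0, sigma2) noise, prior
    theta ~ N(0, tau2), Gaussian likelihood raised to the power eta.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    precision = 1.0 / tau2 + eta * sxx / sigma2  # eta scales the data term only
    mean = (eta * sxy / sigma2) / precision
    return mean, 1.0 / precision


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
for eta in (0.0, 0.5, 1.0):
    print(eta, eta_posterior(xs, ys, eta))
```

    Setting η = 1 recovers ordinary Bayes, η = 0 returns the prior, and intermediate values give a wider posterior that is pulled more strongly toward the prior.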

  10. arXiv:1908.08484

    stat.ME cs.IT cs.LG stat.ML

    Minimum Description Length Revisited

    Authors: Peter Grünwald, Teemu Roos

    Abstract: This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since…

    Submitted 18 December, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: to appear in International Journal of Mathematics for Industry

  11. arXiv:1906.07801

    math.ST cs.IT cs.LG stat.ME

    Safe Testing

    Authors: Peter Grünwald, Rianne de Heide, Wouter Koolen

    Abstract: We develop the theory of hypothesis testing based on the e-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study may depend on previous outcomes. Tests based on e-values are safe, i.e. they preserve Type-I error guarantees, under such optional continuation. We define grow…

    Submitted 10 March, 2023; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted as discussion paper to the Journal of the Royal Statistical Society series B

  12. arXiv:1905.13367

    cs.LG stat.ML

    PAC-Bayes Un-Expected Bernstein Inequality

    Authors: Zakaria Mhammedi, Peter D. Grunwald, Benjamin Guedj

    Abstract: We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset a…

    Submitted 3 November, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: 24 pages, 6 figures. To appear in NeurIPS 2019

    Journal ref: NeurIPS 2019

  13. arXiv:1807.09077

    math.ST cs.LG

    Optional Stopping with Bayes Factors: a categorization and extension of folklore results, with an application to invariant situations

    Authors: Allard Hendriksen, Rianne de Heide, Peter Grünwald

    Abstract: It is often claimed that Bayesian methods, in particular Bayes factor methods for hypothesis testing, can deal with optional stopping. We first give an overview, using elementary probability theory, of three different mathematical meanings that various authors give to this claim: (1) stopping rule independence, (2) posterior calibration and (3) (semi-) frequentist robustness to optional stopping.…

    Submitted 29 April, 2020; v1 submitted 24 July, 2018; originally announced July 2018.

    Comments: 29 pages

  14. arXiv:1710.07732

    cs.LG stat.ML

    A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

    Authors: Peter D. Grünwald, Nishant A. Mehta

    Abstract: We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexit…

    Submitted 20 October, 2017; originally announced October 2017.

    Comments: 38 pages

  15. arXiv:1605.06439

    cs.LG

    Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning

    Authors: Wouter M. Koolen, Peter Grünwald, Tim van Erven

    Abstract: We consider online learning algorithms that guarantee worst-case regret rates in adversarial environments (so they can be deployed safely and will perform robustly), yet adapt optimally to favorable stochastic environments (so they will perform well in a variety of settings of practical importance). We quantify the friendliness of stochastic environments by means of the well-known Bernstein (a.k.a…

    Submitted 20 May, 2016; originally announced May 2016.

    Journal ref: Advances in Neural Information Processing Systems 29 (NeurIPS), 4457-4465, 2016

  16. arXiv:1605.00252

    cs.LG stat.ML

    Fast Rates for General Unbounded Loss Functions: from ERM to Generalized Bayes

    Authors: Peter D. Grünwald, Nishant A. Mehta

    Abstract: We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to $η$-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Baye…

    Submitted 5 November, 2019; v1 submitted 1 May, 2016; originally announced May 2016.

    Comments: accepted to JMLR pending minor final modifications

  17. arXiv:1604.01785

    stat.ME cs.AI cs.LG math.ST

    Safe Probability

    Authors: Peter Grünwald

    Abstract: We formalize the idea of probability distributions that lead to reliable predictions about some, but not all, aspects of a domain. The resulting notion of 'safety' provides a fresh perspective on foundational issues in statistics, providing a middle ground between imprecise probability and multiple-prior models on the one hand and strictly Bayesian approaches on the other. It also allows us to form…

    Submitted 6 April, 2016; originally announced April 2016.

    Comments: Submitted to a journal

    MSC Class: 62A01

  18. Robust Probability Updating

    Authors: Thijs van Ommen, Wouter M. Koolen, Thijs E. Feenstra, Peter D. Grünwald

    Abstract: This paper discusses an alternative to conditioning that may be used when the probability distribution is not fully specified. It does not require any assumptions (such as CAR: coarsening at random) on the unknown distribution. The well-known Monty Hall problem is the simplest scenario where neither naive conditioning nor the CAR assumption suffices to determine an updated probability distribution.…

    Submitted 2 May, 2016; v1 submitted 10 December, 2015; originally announced December 2015.

    Comments: 47 pages, 4 figures. This second version is the accepted manuscript: it incorporates reviewer comments and has a new title

    Journal ref: International Journal of Approximate Reasoning 74 (2016) 30-57

  19. arXiv:1507.02592

    cs.LG stat.ML

    Fast rates in statistical and online learning

    Authors: Tim van Erven, Peter D. Grünwald, Nishant A. Mehta, Mark D. Reid, Robert C. Williamson

    Abstract: The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning: a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most…

    Submitted 1 September, 2015; v1 submitted 9 July, 2015; originally announced July 2015.

    Comments: 69 pages, 3 figures

    Journal ref: Journal of Machine Learning Research 16(54):1793-1861, 2015

  20. arXiv:1407.7190

    cs.AI

    A Game-Theoretic Analysis of Updating Sets of Probabilities

    Authors: Peter D. Grunwald, Joseph Y. Halpern

    Abstract: We consider how an agent should update her uncertainty when it is represented by a set P of probability distributions and the agent observes that a random variable X takes on value x, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly used criterion in the literature. We adopt a game-theoretic framework, where the agent plays against a book…

    Submitted 27 July, 2014; originally announced July 2014.

    Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

    Report number: UAI-P-2008-PG-240-247

  21. arXiv:1407.7188

    cs.AI

    When Ignorance is Bliss

    Authors: Peter D. Grunwald, Joseph Y. Halpern

    Abstract: It is commonly accepted wisdom that more information is better, and that information should never be ignored. Here we argue, using both a Bayesian and a non-Bayesian analysis, that in some situations you are better off ignoring information if your uncertainty is represented by a set of probability measures. These include situations in which the information is relevant for the prediction task at ha…

    Submitted 27 July, 2014; originally announced July 2014.

    Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

    Report number: UAI-P-2004-PG-226-234

  22. arXiv:1407.7183

    cs.AI

    Updating Probabilities

    Authors: Peter D. Grunwald, Joseph Y. Halpern

    Abstract: As examples such as the Monty Hall puzzle show, applying conditioning to update a probability distribution on a "naive space", which does not take into account the protocol used, can often lead to counterintuitive results. Here we examine why. A criterion known as CAR (coarsening at random) in the statistical literature characterizes when "naive" conditioning in a naive space works. We show…

    Submitted 27 July, 2014; originally announced July 2014.

    Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

    Report number: UAI-P-2002-PG-187-196
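
    The Monty Hall example in the abstract above can be checked by exact enumeration. Conditioning in the naive space (the car is behind one of the doors the host did not open) gives 1/2 for the contestant's door, while conditioning in a space that also models the host's choice gives 1/3. A small sketch, assuming the standard protocol: the contestant picks door 1, and the host opens a goat door other than door 1, choosing uniformly when two are available. Both helper functions are hypothetical names for illustration.

```python
from fractions import Fraction


def naive_posterior(opened: int) -> Fraction:
    """P(car behind door 1 | car not behind `opened`) in the naive 3-outcome space."""
    prior = {d: Fraction(1, 3) for d in (1, 2, 3)}
    remaining = {d: p for d, p in prior.items() if d != opened}
    return remaining[1] / sum(remaining.values())


def protocol_posterior(opened: int) -> Fraction:
    """P(car behind door 1 | host opened `opened`), modelling the host's protocol."""
    joint = {}  # (car location, door opened by host) -> probability
    for car in (1, 2, 3):
        # The host never opens the contestant's door 1 or the door hiding the car.
        goat_doors = [d for d in (2, 3) if d != car]
        for d in goat_doors:
            joint[(car, d)] = Fraction(1, 3) / len(goat_doors)
    evidence = sum(p for (_, d), p in joint.items() if d == opened)
    return joint.get((1, opened), Fraction(0)) / evidence


print(naive_posterior(3), protocol_posterior(3))
```

    The gap between the two answers comes entirely from the host's choice rule, which the naive space leaves out.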

  23. arXiv:1401.3906

    cs.AI cs.GT

    Making Decisions Using Sets of Probabilities: Updating, Time Consistency, and Calibration

    Authors: Peter D Grunwald, Joseph Y Halpern

    Abstract: We consider how an agent should update her beliefs when her beliefs are represented by a set P of probability distributions, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly-used criterion in the literature. We adopt a game-theoretic framework, where the agent plays against a bookie, who chooses some distribution from P. We consider two r…

    Submitted 16 January, 2014; originally announced January 2014.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 42, pages 393-426, 2011

  24. arXiv:1305.4324

    cs.LG stat.ML

    Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families

    Authors: Peter Bartlett, Peter Grunwald, Peter Harremoes, Fares Hedayati, Wojciech Kotlowski

    Abstract: We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They…

    Submitted 19 May, 2013; originally announced May 2013.

    Comments: 23 pages

  25. arXiv:1301.7378

    cs.LG stat.ML

    Minimum Encoding Approaches for Predictive Modeling

    Authors: Peter D Grunwald, Petri Kontkanen, Petri Myllymaki, Tomi Silander, Henry Tirri

    Abstract: We analyze differences between two information-theoretically motivated approaches to statistical inference and model selection: the Minimum Description Length (MDL) principle, and the Minimum Message Length (MML) principle. Based on this analysis, we present two revised versions of MML: a pointwise estimator which gives the MML-optimal single parameter model, and a volumewise estimator which give…

    Submitted 30 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998)

    Report number: UAI-P-1998-PG-183-192

  26. arXiv:1301.3860

    cs.AI

    Maximum Entropy and the Glasses You Are Looking Through

    Authors: Peter D. Grunwald

    Abstract: We give an interpretation of the Maximum Entropy (MaxEnt) Principle in game-theoretic terms. Based on this interpretation, we make a formal distinction between different ways of *applying* Maximum Entropy distributions. MaxEnt has frequently been criticized on the grounds that it leads to highly representation-dependent results. Our distinction allows us to avoid this problem in many cases.

    Submitted 16 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

    Report number: UAI-P-2000-PG-238-246

  27. arXiv:1301.0534

    cs.LG stat.ML

    Follow the Leader If You Can, Hedge If You Must

    Authors: Steven de Rooij, Tim van Erven, Peter D. Grünwald, Wouter M. Koolen

    Abstract: Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has terrible performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combi…

    Submitted 17 January, 2013; v1 submitted 3 January, 2013; originally announced January 2013.

    Comments: under submission

    Journal ref: Journal of Machine Learning Research 15(37):1281-1316, 2014
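
    The abstract contrasts FTL, which can suffer regret linear in the number of rounds on worst-case data, with hedging strategies that have worst-case guarantees. A minimal sketch of the two baselines on two experts, using Hedge (exponential weights) with the standard tuning η = sqrt(8 ln K / T) as the hedging strategy; this is not the FlipFlop algorithm from the paper, and the alternating loss sequence is a textbook worst case for FTL chosen for illustration.

```python
import math
from typing import Callable, List, Sequence

Strategy = Callable[[List[float]], List[float]]


def follow_the_leader(cum_losses: List[float]) -> List[float]:
    """All weight on the expert with the smallest cumulative loss (ties: lowest index)."""
    best = min(range(len(cum_losses)), key=lambda i: cum_losses[i])
    return [1.0 if i == best else 0.0 for i in range(len(cum_losses))]


def hedge(eta: float) -> Strategy:
    """Exponential weights with learning rate eta."""
    def weights(cum_losses: List[float]) -> List[float]:
        m = min(cum_losses)  # shift by the minimum for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum_losses]
        total = sum(w)
        return [x / total for x in w]
    return weights


def regret(loss_rounds: Sequence[Sequence[float]], strategy: Strategy) -> float:
    """Cumulative mixture loss minus the loss of the best expert in hindsight."""
    cum = [0.0] * len(loss_rounds[0])
    total = 0.0
    for losses in loss_rounds:
        w = strategy(cum)
        total += sum(wi * li for wi, li in zip(w, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return total - min(cum)


# Adversarial sequence for FTL: after the first round the leader flips every round.
T = 101
rounds = [[0.5, 0.0]] + [[0.0, 1.0] if t % 2 == 1 else [1.0, 0.0] for t in range(1, T)]
eta = math.sqrt(8 * math.log(2) / T)
print(regret(rounds, follow_the_leader), regret(rounds, hedge(eta)))
```

    On this sequence FTL always follows the expert about to lose, so its regret grows by about 1/2 per round, while Hedge stays within the standard bound sqrt(T ln K / 2).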

  28. arXiv:1205.2597   

    cs.AI

    Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (2010)

    Authors: Peter Grunwald, Peter Spirtes

    Abstract: This is the Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, which was held on Catalina Island, CA, July 8-11, 2010.

    Submitted 28 August, 2014; v1 submitted 11 May, 2012; originally announced May 2012.

    Report number: UAI2010

  29. arXiv:1107.6004

    cs.IT physics.data-an

    Explicit Bounds for Entropy Concentration under Linear Constraints

    Authors: Kostas N. Oikonomou, Peter D. Grunwald

    Abstract: Consider the set of all sequences of $n$ outcomes, each taking one of $m$ values, that satisfy a number of linear constraints. If $m$ is fixed while $n$ increases, most sequences that satisfy the constraints result in frequency vectors whose entropy approaches that of the maximum entropy vector satisfying the constraints. This well-known "entropy concentration" phenomenon underlies the maximum ent…

    Submitted 30 September, 2015; v1 submitted 29 July, 2011; originally announced July 2011.

    Comments: 1) An error affecting sec. 3 has been corrected: the parameters delta and theta cannot be chosen independently. Sec. 3 has been revised up to Theorem 3.15 in sec. 3.6. 2) Some minor updates in sec. 4. 3) Some proofs used in both sec. 3 and sec. 4 have been unified (This version to appear in IEEE Transactions on Information Theory, December 2015)
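
    The maximum entropy vector that the abstract's concentration result refers to can be computed numerically in simple cases. A sketch for a single linear constraint, assuming a six-sided die whose mean is fixed at 4.5 (Jaynes' classic dice example). It uses the known exponential-family form of the solution, p_i ∝ exp(λ x_i), and finds λ by bisection; it does not reproduce the paper's explicit concentration bounds, and `maxent_with_mean` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def maxent_with_mean(values: Sequence[float], target_mean: float) -> List[float]:
    """Maximum-entropy distribution on `values` subject to E[X] = target_mean.

    The solution has the Gibbs form p_i proportional to exp(lam * x_i); the mean
    is increasing in lam, so lam is found by bisection. Requires
    min(values) < target_mean < max(values).
    """
    def dist(lam: float) -> List[float]:
        shift = max(lam * v for v in values)  # avoid overflow in exp
        w = [math.exp(lam * v - shift) for v in values]
        z = sum(w)
        return [x / z for x in w]

    def mean(lam: float) -> float:
        return sum(p * v for p, v in zip(dist(lam), values))

    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


faces = [1, 2, 3, 4, 5, 6]
p = maxent_with_mean(faces, 4.5)
print([round(x, 4) for x in p])
```

    With target mean 3.5 the constraint is inactive and the result is the uniform distribution; with 4.5 the probabilities increase geometrically with the face value.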

  30. arXiv:1002.0757

    cs.IT cs.LG math.ST

    Prequential Plug-In Codes that Achieve Optimal Redundancy Rates even if the Model is Wrong

    Authors: Peter Grünwald, Wojciech Kotłowski

    Abstract: We analyse the prequential plug-in codes relative to one-parameter exponential families M. We show that if data are sampled i.i.d. from some distribution outside M, then the redundancy of any plug-in prequential code grows at rate larger than 1/2 ln(n) in the worst case. This means that plug-in codes, such as the Rissanen-Dawid ML code, may perform worse than other important universal codes such…

    Submitted 3 February, 2010; originally announced February 2010.

  31. arXiv:0903.5399

    cs.IT

    Regret and Jeffreys Integrals in Exp. Families

    Authors: Peter Grunwald, Peter Harremoes

    Abstract: We discuss whether minimax redundancy, minimax regret and Jeffreys integrals are finite or infinite.

    Submitted 31 March, 2009; originally announced March 2009.

  32. arXiv:0809.2754

    cs.IT cs.LG math.ST

    Algorithmic information theory

    Authors: Peter D. Grunwald, Paul M. B. Vitanyi

    Abstract: We introduce algorithmic information theory, also known as the theory of Kolmogorov complexity. We explain the main concepts of this quantitative approach to defining "information". We discuss the extent to which Kolmogorov's and Shannon's information theory have a common purpose, and where they are fundamentally different. We indicate how recent developments within the theory allow one to forma…

    Submitted 17 September, 2008; v1 submitted 16 September, 2008; originally announced September 2008.

    Comments: 37 pages, 2 figures, pdf, in: Philosophy of Information, P. Adriaans and J. van Benthem, Eds., A volume in Handbook of the philosophy of science, D. Gabbay, P. Thagard, and J. Woods, Eds., Elsevier, 2008. In version 1 of September 16 the refs are missing. Corrected in version 2 of September 17

  33. arXiv:0809.1017

    cs.IT cs.LG math.ST stat.ME

    Entropy Concentration and the Empirical Coding Game

    Authors: Peter Grunwald

    Abstract: We give a characterization of Maximum Entropy/Minimum Relative Entropy inference by providing two "strong entropy concentration" theorems. These theorems unify and generalize Jaynes' "concentration phenomenon" and Van Campenhout and Cover's "conditional limit theorem". The theorems characterize exactly in what sense a prior distribution Q conditioned on a given constraint, and the distribution P…

    Submitted 5 September, 2008; originally announced September 2008.

    Comments: A somewhat modified version of this paper was published in Statistica Neerlandica 62(3), pages 374-392, 2008

  34. arXiv:0807.1005

    math.ST cs.IT cs.LG stat.ME stat.ML

    Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma

    Authors: Tim van Erven, Peter Grunwald, Steven de Rooij

    Abstract: Bayesian model averaging, model selection and its approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the "catch-up phenomenon" as a novel explanation for the slow convergence of Bayesian method…

    Submitted 7 July, 2008; originally announced July 2008.

    Comments: A preliminary version of a part of this paper appeared at the NIPS 2007 conference

    MSC Class: 62G99; 94A99

  35. arXiv:0711.3235

    cs.AI math.ST

    A Game-Theoretic Analysis of Updating Sets of Probabilities

    Authors: Peter D. Grunwald, Joseph Y. Halpern

    Abstract: We consider how an agent should update her uncertainty when it is represented by a set $\mathcal{P}$ of probability distributions and the agent observes that a random variable $X$ takes on value $x$, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly used criterion in the literature. We adopt a game-theoretic framework, where the agent plays agains…

    Submitted 20 November, 2007; originally announced November 2007.

    ACM Class: I.2.4

  36. arXiv:math/0510276

    math.ST cs.AI stat.ME

    An algorithmic and a geometric characterization of Coarsening At Random

    Authors: Richard D. Gill, Peter D. Grunwald

    Abstract: We show that the class of conditional distributions satisfying the coarsening at random (CAR) property for discrete data has a simple and robust algorithmic description based on randomized uniform multicovers: combinatorial objects generalizing the notion of partition of a set. However, the complexity of a given CAR mechanism can be large: the maximal "height" of the needed multicovers can be ex…

    Submitted 13 September, 2007; v1 submitted 13 October, 2005; originally announced October 2005.

    Comments: 16 pages; accepted in this form for publication by Annals of Statistics

    Report number: See also 0811.0683 (duplicate submission)

    MSC Class: 62A01 (Primary); 62N01; 60A99; 68T37 (Secondary)

    Journal ref: The Annals of Statistics 2008, Vol. 36, No. 5, 2409-2422

  37. arXiv:cs/0510080

    cs.AI cs.LG

    When Ignorance is Bliss

    Authors: Peter D. Grunwald, Joseph Y. Halpern

    Abstract: It is commonly accepted wisdom that more information is better, and that information should never be ignored. Here we argue, using both a Bayesian and a non-Bayesian analysis, that in some situations you are better off ignoring information if your uncertainty is represented by a set of probability measures. These include situations in which the information is relevant for the prediction task at…

    Submitted 25 October, 2005; originally announced October 2005.

    Comments: In Proceedings of the Twentieth Conference on Uncertainty in AI, 2004, pp. 226-234

    ACM Class: I.2.4

  38. arXiv:cs/0502004

    cs.LG cs.IT

    Asymptotic Log-loss of Prequential Maximum Likelihood Codes

    Authors: Peter Grunwald, Steven de Rooij

    Abstract: We analyze the Dawid-Rissanen prequential maximum likelihood codes relative to one-parameter exponential family models M. If data are i.i.d. according to an (essentially) arbitrary P, then the redundancy grows at rate (c/2) ln n. We show that c = v1/v2, where v1 is the variance of P, and v2 is the variance of the distribution m* in M that is closest to P in KL divergence. This shows that prequential…

    Submitted 1 February, 2005; originally announced February 2005.

    Comments: 22 pages, an abstract has been submitted to COLT 2005

    ACM Class: E.4
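
    The coefficient c = v1/v2 from the abstract can be evaluated directly in simple cases. A sketch, assuming the model M is the Poisson family and P is a geometric distribution on {0, 1, 2, ...} (our choice for illustration). For an exponential family, the member closest to P in KL divergence matches the expectation of the sufficient statistic, so m* is the Poisson with mean E_P[X], whose variance is also E_P[X]. The helper names are hypothetical.

```python
import math


def geometric_pmf(k: int, q: float) -> float:
    """P(X = k) = q * (1 - q)^k for k = 0, 1, 2, ... (q is the success probability)."""
    return q * (1.0 - q) ** k


def redundancy_coefficient(q: float, kmax: int = 5000) -> float:
    """c = v1 / v2 for the Poisson model when data are geometric(q).

    v1: variance of the data-generating P (truncated sums; the tail is negligible).
    v2: variance of the KL-closest Poisson m*, which has the same mean as P.
    """
    mean = sum(k * geometric_pmf(k, q) for k in range(kmax))
    second = sum(k * k * geometric_pmf(k, q) for k in range(kmax))
    v1 = second - mean ** 2
    v2 = mean  # a Poisson distribution's variance equals its mean
    return v1 / v2


c = redundancy_coefficient(0.5)
n = 10_000
print(c, c / 2 * math.log(n), 0.5 * math.log(n))
```

    For the geometric family c = 1/q in closed form, so with q = 0.5 the leading redundancy term (c/2) ln n is twice the (1/2) ln n obtained when the data are actually Poisson.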

  39. arXiv:cs/0501028

    cs.LG cs.IT

    An Empirical Study of MDL Model Selection with Infinite Parametric Complexity

    Authors: Steven de Rooij, Peter Grunwald

    Abstract: Parametric complexity is a central concept in MDL model selection. In practice it often turns out to be infinite, even for quite simple models such as the Poisson and Geometric families. In such cases, MDL model selection as based on NML and Bayesian inference based on Jeffreys' prior cannot be used. Several ways to resolve this problem have been proposed. We conduct experiments to compare and…

    Submitted 14 January, 2005; originally announced January 2005.

    Comments: 23 pages, 11 graphs

    ACM Class: E.3; G.4
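
    Parametric complexity is the normalizer of the NML distribution: the sum, over all possible data, of the maximized likelihood. A sketch contrasting a finite case with an infinite one, assuming n observations for the Bernoulli family and a single observation (n = 1) for the Poisson family. For the Poisson, the terms x^x e^{-x} / x! decay only like 1/sqrt(2πx), so the partial sums keep growing without bound, one instance of the infinite complexity the abstract refers to. The helper names are hypothetical.

```python
import math


def bernoulli_complexity(n: int) -> float:
    """Bernoulli NML normalizer: sum over k of C(n, k) (k/n)^k (1 - k/n)^(n - k)."""
    return sum(
        math.comb(n, k) * (k / n) ** k * (1 - k / n) ** (n - k) for k in range(n + 1)
    )


def poisson_partial_complexity(xmax: int) -> float:
    """Partial sum over x < xmax of max_mu Poisson(x; mu) = x^x e^{-x} / x! (n = 1)."""
    total = 1.0  # x = 0: the maximum-likelihood mean is 0, giving probability 1
    for x in range(1, xmax):
        total += math.exp(x * math.log(x) - x - math.lgamma(x + 1))
    return total


print(bernoulli_complexity(1), bernoulli_complexity(2), bernoulli_complexity(100))
for xmax in (10, 100, 1000, 10000):
    print(xmax, poisson_partial_complexity(xmax))
```

    The Bernoulli values are finite for every n, while the Poisson partial sums grow roughly like sqrt(2 xmax / π), so the full sum diverges.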

  40. arXiv:cs/0410002

    cs.IT

    Shannon Information and Kolmogorov Complexity

    Authors: Peter Grunwald, Paul Vitanyi

    Abstract: We compare the elementary theories of Shannon information and Kolmogorov complexity, the extent to which they have a common purpose, and where they are fundamentally different. We discuss and relate the basic notions of both theories: Shannon entropy versus Kolmogorov complexity, the relation of both to universal coding, Shannon mutual information versus Kolmogorov ("algorithmic") mutual informa…

    Submitted 1 October, 2004; originally announced October 2004.

    Comments: Survey, LaTeX, 54 pages, 3 figures. Submitted to IEEE Trans. Information Theory

    ACM Class: E.4, H.1.1

    Journal ref: There are some errors in this paper draft; when in doubt see the textbook Li, Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 1993, 1997, 2008, 2019
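
    Kolmogorov complexity is uncomputable, but any fixed lossless compressor gives an upper bound on it: the compressed length (in bits) plus the length of the decompressor, an additive constant. A sketch using Python's zlib as the compressor (our choice, not a method from the survey), contrasting a highly regular string with pseudorandom bytes. Note that seeded pseudorandom bytes actually have low Kolmogorov complexity, since a short program generates them; a general-purpose compressor simply cannot exploit that.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of zlib's encoding; 8 times this upper-bounds K(data) in bits, up to a constant."""
    return len(zlib.compress(data, 9))


regular = b"ab" * 5000  # 10,000 bytes with an obvious pattern
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))  # 10,000 pseudorandom bytes

print(len(regular), compressed_length(regular))
print(len(noisy), compressed_length(noisy))
```

    The regular string shrinks to a few dozen bytes, while the pseudorandom one does not shrink at all, matching the intuition that incompressible strings are those with (apparently) high complexity.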

  41. arXiv:math/0406221

    math.ST cs.IT cs.LG

    Suboptimal behaviour of Bayes and MDL in classification under misspecification

    Authors: Peter Grunwald, John Langford

    Abstract: We show that forms of Bayesian and MDL inference that are often applied to classification problems can be *inconsistent*. This means there exists a learning problem such that for all amounts of data the generalization errors of the MDL classifier and the Bayes classifier relative to the Bayesian posterior both remain bounded away from the smallest achievable generalization error.

    Submitted 10 June, 2004; originally announced June 2004.

    Comments: This is a slightly longer version of our paper at the COLT (Computational Learning Theory) 2004 Conference, containing two extra pages of discussion of the main results

    MSC Class: 62A01; 68T05; 68T10

  42. arXiv:math/0406077

    math.ST cs.IT cs.LG

    A tutorial introduction to the minimum description length principle

    Authors: Peter Grunwald

    Abstract: This tutorial provides an overview of and introduction to Rissanen's Minimum Description Length (MDL) Principle. The first chapter provides a conceptual, entirely non-technical introduction to the subject. It serves as a basis for the technical introduction given in the second chapter, in which all the ideas of the first chapter are made mathematically precise. The main ideas are discussed in gr…

    Submitted 4 June, 2004; originally announced June 2004.

    Comments: 80 pages, 5 figures. Report with 2 chapters

    MSC Class: 62-01; 68-01; 68T05; 68T10; 94-01

  43. arXiv:cs/0306124

    cs.AI

    Updating Probabilities

    Authors: Peter D. Grunwald, Joseph Y. Halpern

    Abstract: As examples such as the Monty Hall puzzle show, applying conditioning to update a probability distribution on a "naive space", which does not take into account the protocol used, can often lead to counterintuitive results. Here we examine why. A criterion known as CAR ("coarsening at random") in the statistical literature characterizes when "naive" conditioning in a naive space works. We s…

    Submitted 23 June, 2003; originally announced June 2003.

    Comments: This is an expanded version of a paper that appeared in Proceedings of the Eighteenth Conference on Uncertainty in AI, 2002, pp. 187-196. To appear in Journal of AI Research

    ACM Class: I.2.4