-
Optimal E-Values for Exponential Families: the Simple Case
Authors:
Peter Grünwald,
Tyron Lardy,
Yunda Hao,
Shaul K. Bar-Lev,
Martijn de Jong
Abstract:
We provide a general condition under which e-variables in the form of a simple-vs.-simple likelihood ratio exist when the null hypothesis is a composite, multivariate exponential family. Such `simple' e-variables are easy to compute and expected-log-optimal with respect to any stopping time. Simple e-variables were previously only known to exist in quite specific settings, but we offer a unifying theorem on their existence for testing exponential families. We start with a simple alternative $Q$ and a regular exponential family null. Together these induce a second exponential family ${\cal Q}$ containing $Q$, with the same sufficient statistic as the null. Our theorem shows that simple e-variables exist whenever the covariance matrices of ${\cal Q}$ and the null are in a certain relation. Examples in which this relation holds include some $k$-sample tests, Gaussian location- and scale tests, and tests for more general classes of natural exponential families.
Submitted 30 April, 2024;
originally announced April 2024.
-
Description length of canonical and microcanonical models
Authors:
Francesca Giuffrida,
Tiziano Squartini,
Peter Grünwald,
Diego Garlaschelli
Abstract:
Non-equivalence between the canonical and the microcanonical ensemble has been shown to arise for models defined by an extensive (i.e. scaling with the size of the system) number of constraints (e.g. the Configuration Model). Here, we compare the description length of binary canonical and microcanonical models in light of ensemble non-equivalence. Specifically, we consider the description length induced by the Normalized Maximum Likelihood (NML), which consists of two terms, i.e. a model log-likelihood and its complexity. While the effects of ensemble non-equivalence on the log-likelihood term are well understood, its effects on the complexity term have not yet been systematically studied. Here, we find that i) microcanonical models are always more complex than their canonical counterparts, ii) the best-scoring model in terms of description length highly depends on the numerical values of the constraints, and iii) the difference between the canonical and the microcanonical description length is strongly influenced by the degree of non-equivalence, a result suggesting that non-equivalence should be taken into account when selecting models. Finally, we compare the NML-based approach to model selection with the Bayesian one in light of our results, showing that the Bayesian description length becomes much more sensitive to the choice of the prior when an extensive number of constraints is involved.
Submitted 10 June, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Universal Reverse Information Projections and Optimal E-statistics
Authors:
Tyron Lardy,
Peter Grünwald,
Peter Harremoës
Abstract:
Information projections have found important applications in probability theory, statistics, and related areas. In the field of hypothesis testing in particular, the reverse information projection (RIPr) has recently been shown to lead to so-called growth-rate optimal (GRO) e-statistics for testing simple alternatives against composite null hypotheses. However, the RIPr as well as the GRO criterion are undefined whenever the infimum information divergence between the null and alternative is infinite. We show that in such scenarios there often still exists an element in the alternative that is 'closest' to the null: the universal reverse information projection. The universal reverse information projection and its non-universal counterpart coincide whenever information divergence is finite. Furthermore, the universal RIPr is shown to lead to optimal e-statistics in a sense that is a novel, but natural, extension of the GRO criterion. We also give conditions under which the universal RIPr is a strict sub-probability distribution, as well as conditions under which an approximation of the universal RIPr leads to approximate e-statistics. For this case we provide tight relations between the corresponding approximation rates.
Submitted 4 December, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Exponential Stochastic Inequality
Authors:
Peter D. Grünwald,
Muriel F. Pérez-Ortiz,
Zakaria Mhammedi
Abstract:
We develop the concept of exponential stochastic inequality (ESI), a novel notation that simultaneously captures high-probability and in-expectation statements. It is especially well suited to succinctly state, prove, and reason about excess-risk and generalization bounds in statistical learning, specifically, but not restricted to, the PAC-Bayesian type. We show that the ESI satisfies transitivity and other properties which allow us to use it like standard, nonstochastic inequalities. We substantially extend the original definition from Koolen et al. (2016) and show that general ESIs satisfy a host of useful additional properties, including a novel Markov-like inequality. We show how ESIs relate to, and clarify, PAC-Bayesian bounds, subcentered subgamma random variables and *fast-rate conditions* such as the central and Bernstein conditions. We also show how the ideas can be extended to random scaling factors (learning rates).
Submitted 27 April, 2023;
originally announced April 2023.
-
E-values for k-Sample Tests With Exponential Families
Authors:
Yunda Hao,
Peter Grünwald,
Tyron Lardy,
Long Long,
Reuben Adams
Abstract:
We develop and compare e-variables for testing whether $k$ samples of data are drawn from the same distribution, the alternative being that they come from different elements of an exponential family. We consider the GRO (growth-rate optimal) e-variables for (1) a `small' null inside the same exponential family, and (2) a `large' nonparametric null, as well as (3) an e-variable arrived at by conditioning on the sum of the sufficient statistics. (2) and (3) are efficiently computable, and extend ideas from Turner et al. [2021] and Wald [1947] respectively from Bernoulli to general exponential families. We provide theoretical and simulation-based comparisons of these e-variables in terms of their logarithmic growth rate, and find that for small effects all four e-variables behave surprisingly similarly; for the Gaussian location and Poisson families, e-variables (1) and (3) coincide; for Bernoulli, (1) and (2) coincide; but in general, whether (2) or (3) grows faster under the alternative is family-dependent. We furthermore discuss algorithms for numerically approximating (1).
Submitted 8 January, 2024; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Safe Sequential Testing and Effect Estimation in Stratified Count Data
Authors:
Rosanne J. Turner,
Peter D. Grünwald
Abstract:
Sequential decision making significantly speeds up research and is more cost-effective compared to fixed-n methods. We present a method for sequential decision making for stratified count data that retains Type-I error guarantee or false discovery rate under optional stopping, using e-variables. We invert the method to construct stratified anytime-valid confidence sequences, where cross-talk between subpopulations in the data can be allowed during data collection to improve power. Finally, we combine information collected in separate subpopulations through pseudo-Bayesian averaging and switching to create effective estimates for the minimal, mean and maximal treatment effects in the subpopulations.
Submitted 22 February, 2023;
originally announced February 2023.
-
The E-Posterior
Authors:
Peter Grünwald
Abstract:
We develop a representation of a decision maker's uncertainty based on e-variables. Like the Bayesian posterior, this *e-posterior* allows for making predictions against arbitrary loss functions that may not be specified ex ante. Unlike the Bayesian posterior, it provides risk bounds that have frequentist validity irrespective of prior adequacy: if the e-collection (which plays a role analogous to the Bayesian prior) is chosen badly, the bounds get loose rather than wrong, making *e-posterior minimax* decision rules safer than Bayesian ones. The resulting *quasi-conditional paradigm* is illustrated by re-interpreting a previous influential partial Bayes-frequentist unification, *Kiefer-Berger-Brown-Wolpert conditional frequentist tests*, in terms of e-posteriors.
Submitted 18 September, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
Game-theoretic statistics and safe anytime-valid inference
Authors:
Aaditya Ramdas,
Peter Grünwald,
Vladimir Vovk,
Glenn Shafer
Abstract:
Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty -- e-processes for testing and confidence sequences for estimation -- that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative martingales starting at one. Since a test martingale is the wealth process of a player in a betting game, SAVI centrally employs game-theoretic intuition, language and mathematics. We summarize the SAVI goals and philosophy, and report recent advances in testing composite hypotheses and estimating functionals in nonparametric settings.
Submitted 17 June, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Anytime Valid Tests of Conditional Independence Under Model-X
Authors:
Peter Grünwald,
Alexander Henzi,
Tyron Lardy
Abstract:
We propose a sequential, anytime-valid method to test the conditional independence of a response $Y$ and a predictor $X$ given a random vector $Z$. The proposed test is based on e-statistics and test martingales, which generalize likelihood ratios and allow valid inference at arbitrary stopping times. In accordance with the recently introduced model-X setting, our test depends on the availability of the conditional distribution of $X$ given $Z$, or at least a sufficiently sharp approximation thereof. Within this setting, we derive a general method for constructing e-statistics for testing conditional independence, show that it leads to growth-rate optimal e-statistics for simple alternatives, and prove that our method yields tests with asymptotic power one in the special case of a logistic regression model. A simulation study is done to demonstrate that the approach is competitive in terms of power when compared to established sequential and nonsequential testing methods, and robust with respect to violations of the model-X assumption.
Submitted 21 February, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
-
E-Statistics, Group Invariance and Anytime Valid Testing
Authors:
Muriel Felipe Pérez-Ortiz,
Tyron Lardy,
Rianne de Heide,
Peter Grünwald
Abstract:
We study worst-case-growth-rate-optimal (GROW) e-statistics for hypothesis testing between two group models. It is known that under a mild condition on the action of the underlying group G on the data, there exists a maximally invariant statistic. We show that among all e-statistics, invariant or not, the likelihood ratio of the maximally invariant statistic is GROW, both in the absolute and in the relative sense, and that an anytime-valid test can be based on it. The GROW e-statistic is equal to a Bayes factor with a right Haar prior on G. Our treatment avoids nonuniqueness issues that sometimes arise for such priors in Bayesian contexts. A crucial assumption on the group G is its amenability, a well-known group-theoretical condition, which holds, for instance, in scale-location families. Our results also apply to finite-dimensional linear regression.
Submitted 17 October, 2023; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Beyond Neyman-Pearson: e-values enable hypothesis testing with a data-driven alpha
Authors:
Peter Grünwald
Abstract:
A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show the advantages of mentioning an e-value instead. With p-values, it is not clear how to use an extreme observation (e.g. p $\ll α$) for getting better frequentist decisions. With e-values it is straightforward, since they provide Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post-hoc, after observation of the data -- thereby providing a handle on `roving $α$'s'. When Type-II risks are taken into consideration, the only admissible decision rules in the post-hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail whereas e-confidence sets and e-posteriors still provide valid risk guarantees. Sufficiently powerful e-values have by now been developed for a range of classical testing problems. We discuss the main challenges for wider development and deployment.
Submitted 2 April, 2024; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Exact Anytime-valid Confidence Intervals for Contingency Tables and Beyond
Authors:
Rosanne Turner,
Peter Grünwald
Abstract:
E-variables are tools for retaining type-I error guarantee with optional stopping. We extend E-variables for sequential two-sample tests to general null hypotheses and anytime-valid confidence sequences. We provide implementations for estimating risk difference, relative risk and odds-ratios in contingency tables.
Submitted 24 June, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
The no-free-lunch theorems of supervised learning
Authors:
Tom F. Sterkenburg,
Peter D. Grünwald
Abstract:
The no-free-lunch theorems promote a skeptical conclusion that all possible machine learning algorithms equally lack justification. But how could this leave room for a learning theory, that shows that some algorithms are better than others? Drawing parallels to the philosophy of induction, we point out that the no-free-lunch results presuppose a conception of learning algorithms as purely data-driven. On this conception, every algorithm must have an inherent inductive bias, that wants justification. We argue that many standard learning algorithms should rather be understood as model-dependent: in each application they also require for input a model, representing a bias. Generic algorithms themselves, they can be given a model-relative justification.
Submitted 9 February, 2022;
originally announced February 2022.
-
Minimax risk classifiers with 0-1 loss
Authors:
Santiago Mazuelas,
Mauricio Romero,
Peter Grünwald
Abstract:
Supervised classification techniques use training samples to learn a classification rule with small expected 0-1 loss (error probability). Conventional methods enable tractable learning and provide out-of-sample generalization by using surrogate losses instead of the 0-1 loss and considering specific families of rules (hypothesis classes). This paper presents minimax risk classifiers (MRCs) that minimize the worst-case 0-1 loss with respect to uncertainty sets of distributions that can include the underlying distribution, with a tunable confidence. We show that MRCs can provide tight performance guarantees at learning and are strongly universally consistent using feature mappings given by characteristic kernels. The paper also proposes efficient optimization techniques for MRC learning and shows that the methods presented can provide accurate classification together with tight performance guarantees in practice.
Submitted 16 August, 2023; v1 submitted 17 January, 2022;
originally announced January 2022.
-
ALL-IN meta-analysis: breathing life into living systematic reviews
Authors:
Judith ter Schure,
Peter Grünwald
Abstract:
Science is idolized as a cumulative process ("standing on the shoulders of giants"), yet scientific knowledge is typically built on a patchwork of research contributions without much coordination. This lack of efficiency has specifically been addressed in clinical research by recommendations for living systematic reviews and against research waste. We propose to further those recommendations with ALL-IN meta-analysis: Anytime Live and Leading INterim meta-analysis. ALL-IN provides statistical methodology for a meta-analysis that can be updated at any time -- reanalyzing after each new observation while retaining type-I error guarantees, live -- no need to prespecify the looks, and leading -- in the decisions on whether individual studies should be initiated, stopped or expanded, the meta-analysis can be the leading source of information. We illustrate the method for time-to-event data, showing how synthesizing data at interim stages of studies can increase efficiency when studies are slow in themselves to provide the necessary number of events for completion. The meta-analysis can be performed on interim data, but does not have to. The analysis design requires no information about the number of patients in trials or the number of trials eventually included. So it can breathe life into living systematic reviews, through better and simpler statistics, efficiency, collaboration and communication.
Submitted 24 September, 2021;
originally announced September 2021.
-
Microwave-optical coupling via Rydberg excitons in cuprous oxide
Authors:
Liam A. P. Gallagher,
Joshua P. Rogers,
Jon D. Pritchett,
Rajan A. Mistry,
Danielle Pizzey,
Charles S. Adams,
Matthew P. A. Jones,
Peter Grünwald,
Valentin Walther,
Chris Hodges,
Wolfgang Langbein,
Stephen A. Lynch
Abstract:
We report exciton-mediated coupling between microwave and optical fields in cuprous oxide (Cu$_2$O) at low temperatures. Rydberg excitonic states with principal quantum number up to $n=12$ were observed at 4~K using both one-photon (absorption) and two-photon (second harmonic generation) spectroscopy. Near resonance with an excitonic state, the addition of a microwave field significantly changed the absorption lineshape, and added sidebands at the microwave frequency to the coherent second harmonic. Both effects showed a complex dependence on $n$ and angular momentum, $l$. All of these features are in semi-quantitative agreement with a model based on intraband electric dipole transitions between Rydberg exciton states. With a simple microwave antenna we already reach a regime where the microwave coupling (Rabi frequency) is comparable to the nonradiatively broadened linewidth of the Rydberg excitons. The results provide a new way to manipulate excitonic states, and open up the possibility of a cryogenic microwave to optical transducer based on Rydberg excitons.
Submitted 7 October, 2021; v1 submitted 20 September, 2021;
originally announced September 2021.
-
PAC-Bayes, MAC-Bayes and Conditional Mutual Information: Fast rate bounds that handle general VC classes
Authors:
Peter Grünwald,
Thomas Steinke,
Lydia Zakynthinou
Abstract:
We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds. We derive conditional MI bounds as an instance, with special choice of prior, of conditional MAC-Bayesian (Mean Approximately Correct) bounds, itself derived from conditional PAC-Bayesian bounds, where `conditional' means that one can use priors conditioned on a joint training and ghost sample. This allows us to get nontrivial PAC-Bayes and MI-style bounds for general VC classes, something recently shown to be impossible with standard PAC-Bayesian/MI bounds. Second, it allows us to get faster rates of order $O \left(({\text{KL}}/n)^γ\right)$ for $γ> 1/2$ if a Bernstein condition holds and for exp-concave losses (with $γ=1$), which is impossible with both standard PAC-Bayes generalization and MI bounds. Our work extends the recent work by Steinke and Zakynthinou [2020] who handle MI with VC but neither PAC-Bayes nor fast rates, the recent work of Hellström and Durisi [2020] who extend the latter to the PAC-Bayes setting via a unifying exponential inequality, and Mhammedi et al. [2019] who initiated fast rate PAC-Bayes generalization error bounds but handle neither MI nor general VC classes.
Submitted 17 June, 2021;
originally announced June 2021.
-
Generic E-Variables for Exact Sequential k-Sample Tests that allow for Optional Stopping
Authors:
Rosanne Turner,
Alexander Ly,
Peter Grünwald
Abstract:
We develop E-variables for testing whether two or more data streams come from the same source or not, and more generally, whether the difference between the sources is larger than some minimal effect size. These E-variables lead to exact, nonasymptotic tests that remain safe, i.e. keep their type-I error guarantees, under flexible sampling scenarios such as optional stopping and continuation. In special cases our E-variables also have an optimal 'growth' property under the alternative. While the construction is generic, we illustrate it through the special case of k x 2 contingency tables, where we also allow for the incorporation of different restrictions on a composite alternative. Comparison to p-value analysis in simulations and a real-world example show that E-variables, through their flexibility, often allow for early stopping of data collection, thereby retaining similar power as classical methods, while also retaining the option of extending or combining data afterwards.
Submitted 22 June, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Robust subgroup discovery
Authors:
Hugo Manuel Proença,
Peter Grünwald,
Thomas Bäck,
Matthijs van Leeuwen
Abstract:
We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, including traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, finding optimal subgroup lists is NP-hard. Therefore, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration. In fact, the greedy gain is shown to be equivalent to a Bayesian one-sample proportion, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. Furthermore, we empirically show on 54 datasets that SSD++ outperforms previous subgroup discovery methods in terms of quality, generalisation on unseen data, and subgroup list size.
Submitted 30 June, 2022; v1 submitted 25 March, 2021;
originally announced March 2021.
-
The Anytime-Valid Logrank Test: Error Control Under Continuous Monitoring with Unlimited Horizon
Authors:
J. ter Schure,
M. F. Perez-Ortiz,
A. Ly,
P. Grunwald
Abstract:
We introduce the anytime-valid (AV) logrank test, a version of the logrank test that provides type-I error guarantees under optional stopping and optional continuation. The test is sequential without the need to specify a maximum sample size or stopping rule, and allows for cumulative meta-analysis with type-I error control. The method can be extended to define anytime-valid confidence intervals. The logrank test is an instance of the martingale tests based on E-variables that have been recently developed. We demonstrate type-I error guarantees for the test in a semiparametric setting of proportional hazards and show how to extend it to ties, Cox' regression and confidence sequences. Using a Gaussian approximation on the logrank statistic, we show that the AV logrank test (which itself is always exact) has a similar rejection region to O'Brien-Fleming alpha-spending but with the potential to achieve 100% power by optional continuation. Although our approach to study design requires a larger sample size, the *expected* sample size is competitive by optional stopping.
Submitted 1 May, 2023; v1 submitted 13 November, 2020;
originally announced November 2020.
-
Electromagnetically induced transparency with Cu$_2$O excitons in the presence of phonon coupling
Authors:
Valentin Walther,
Peter Grünwald,
Thomas Pohl
Abstract:
Highly excited Rydberg states of excitons in Cu$_2$O semiconductors provide a promising approach to explore and control strong particle interactions in a solid-state environment. A major obstacle has been the substantial absorption background that stems from exciton-phonon coupling and lies under the Rydberg excitation spectrum, weakening the effects of exciton interactions. Here, we demonstrate that two-photon excitation of Rydberg excitons under conditions of electromagnetically induced transparency (EIT) can be used to control this background. Based on a microscopic theory that describes the known single-photon absorption spectrum, we analyze the conditions under which two-photon EIT permits separating the optical Rydberg excitation from the phonon-induced absorption background, and even suppressing it entirely. Our findings thereby pave the way for the exploitation of Rydberg blockade with Cu$_2$O excitons in nonlinear optics and other applications.
Submitted 14 July, 2020;
originally announced July 2020.
-
A universal generic description of the dynamics of the current COVID-19 pandemic
Authors:
Heinrich Stolz,
Dirk Semkat,
Peter Grünwald
Abstract:
The ongoing COVID-19 pandemic is challenging every part of society. From a scientific point of view the first major task is to predict the dynamics of the pandemic, allowing governments to allocate proper resources and measures to fight it, as well as gauging the success of these measures by comparison with the predictions in hindsight. The vast majority of pandemic models are based on extensive models with large numbers of fit parameters, leading to individual descriptions for every hot spot in the world. This makes predictions and comparisons cumbersome, if not impossible. We here propose a different approach, by moving away from a description over time, and instead choosing the total number of infected people in an enclosed area as the independent variable. Analyzing data from a few hot spots, we derive an empirical formula for the dynamics, dependent only on three variables. The final number of infections is strictly connected to one fit parameter we call mitigation factor, which in turn is mostly dependent only on the enclosed population. Despite its simplicity, this description applies to each of the roughly 50 countries we have analyzed, allows different waves of the pandemic to be separated, provides a figure of merit for the overall usefulness of government measures, and shows when a pandemic is ending. Our model is robust against undetected cases, and allows all nations, in particular those with fewer resources, to reasonably predict the outcome of the pandemic in their country.
Submitted 28 September, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Minimax rates without the fixed sample size assumption
Authors:
Alisa Kirichenko,
Peter Grünwald
Abstract:
We generalize the notion of minimax convergence rate. In contrast to the standard definition, we do not assume that the sample size is fixed in advance. Allowing for varying sample size results in time-robust minimax rates and estimators. These can be either strongly adversarial, based on the worst-case over all sample sizes, or weakly adversarial, based on the worst-case over all stopping times. We show that standard and time-robust rates usually differ by at most a logarithmic factor, and that for some (and we conjecture for all) exponential families, they differ by exactly an iterated logarithmic factor. In many situations, time-robust rates are arguably more natural to consider. For example, they allow us to simultaneously obtain strong model selection consistency and optimal estimation rates, thus avoiding the "AIC-BIC dilemma".
Submitted 29 May, 2021; v1 submitted 19 June, 2020;
originally announced June 2020.
-
Discovering outstanding subgroup lists for numeric targets using MDL
Authors:
Hugo M. Proença,
Peter Grünwald,
Thomas Bäck,
Matthijs van Leeuwen
Abstract:
The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set discovery (SSD) has been proposed. State-of-the-art SSD methods have their limitations though, as they typically heavily rely on heuristics and/or user-chosen hyperparameters.
We propose a dispersion-aware problem formulation for subgroup set discovery that is based on the minimum description length (MDL) principle and subgroup lists. We argue that the best subgroup list is the one that best summarizes the data given the overall distribution of the target. We restrict our focus to a single numeric target variable and show that our formalization coincides with an existing quality measure when finding a single subgroup, but that, in addition, it allows trading off subgroup quality against the complexity of the subgroup. We next propose SSD++, a heuristic algorithm for which we empirically demonstrate that it returns outstanding subgroup lists: non-redundant sets of compact subgroups that stand out by having strongly deviating means and small spread.
Submitted 16 June, 2020;
originally announced June 2020.
-
Estimating the single-photon projection of low-intensity light sources
Authors:
Jorge Rolando Chavez-Mackay,
Peter Grünwald,
Blas Manuel Rodríguez-Lara
Abstract:
Estimating the quality of a single-photon source is crucial for its use in quantum technologies. The standard test for semiconductor sources is a value of the second-order correlation function of the emitted field below $1/2$ at zero time-delay. This criterion alone provides no information regarding the amplitude of the single-photon contribution for general quantum states. Addressing this question requires the knowledge of additional observables. We derive an effective second-order correlation function, strongly connected to the Mandel-$Q$ parameter and given in terms of both the second-order correlation and the average photon number, that provides a lower bound on the single-to-multi-photon projection ratio. Using both observables individually allows for lower and upper bounds for the single-photon projection. Comparing the tightness of our bounds with those in the literature, we find that relative bounds may be better described using the average photon number, while absolute bounds for low excitation states are tighter using the vacuum projection. Our results show that estimating the quality of a single-photon source based on additional information is very much dependent on what aspect of the quantum state of light one is interested in.
Submitted 9 May, 2020; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Safe-Bayesian Generalized Linear Regression
Authors:
Rianne de Heide,
Alisa Kirichenko,
Nishant Mehta,
Peter Grünwald
Abstract:
We study generalized Bayesian inference under misspecification, i.e. when the model is 'wrong but useful'. Generalized Bayes equips the likelihood with a learning rate $η$. We show that for generalized linear models (GLMs), $η$-generalized Bayes concentrates around the best approximation of the truth within the model for specific $η\neq 1$, even under severely misspecified noise, as long as the tails of the true distribution are exponential. We derive MCMC samplers for generalized Bayesian lasso and logistic regression and give examples of both simulated and real-world data in which generalized Bayes substantially outperforms standard Bayes.
Submitted 29 May, 2021; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Minimum Description Length Revisited
Authors:
Peter Grünwald,
Teemu Roos
Abstract:
This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes can, to a large extent, be viewed from a unified perspective.
Submitted 18 December, 2019; v1 submitted 21 August, 2019;
originally announced August 2019.
-
Safe Testing
Authors:
Peter Grünwald,
Rianne de Heide,
Wouter Koolen
Abstract:
We develop the theory of hypothesis testing based on the e-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study may depend on previous outcomes. Tests based on e-values are safe, i.e. they preserve Type-I error guarantees, under such optional continuation. We define growth-rate optimality (GRO) as an analogue of power in an optional continuation context, and we show how to construct GRO e-variables for general testing problems with composite null and alternative, emphasizing models with nuisance parameters. GRO e-values take the form of Bayes factors with special priors. We illustrate the theory using several classic examples including a one-sample safe t-test and the 2 x 2 contingency table. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, e-values may provide a methodology acceptable to adherents of all three schools.
Submitted 10 March, 2023; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Accumulation Bias in Meta-Analysis: The Need to Consider Time in Error Control
Authors:
Judith ter Schure,
Peter D. Grünwald
Abstract:
Studies accumulate over time and meta-analyses are mainly retrospective. These two characteristics introduce dependencies between the analysis time, at which a series of studies is up for meta-analysis, and results within the series. Dependencies introduce bias --- Accumulation Bias --- and invalidate the sampling distribution assumed for p-value tests, thus inflating type-I errors. But dependencies are also inevitable, since for science to accumulate efficiently, new research needs to be informed by past results. Here, we investigate various ways in which time influences error control in meta-analysis testing. We introduce an Accumulation Bias Framework that allows us to model a wide variety of practically occurring dependencies, including study series accumulation, meta-analysis timing, and approaches to multiple testing in living systematic reviews. The strength of this framework is that it shows how all dependencies affect p-value-based tests in a similar manner. This leads to two main conclusions. First, Accumulation Bias is inevitable, and even if it can be approximated and accounted for, no valid p-value tests can be constructed. Second, tests based on likelihood ratios withstand Accumulation Bias: they provide bounds on error probabilities that remain valid despite the bias. We leave the reader with a choice between two proposals to consider time in error control: either treat individual (primary) studies and meta-analyses as two separate worlds --- each with their own timing --- or integrate individual studies in the meta-analysis world. Taking up likelihood ratios in either approach allows for valid tests that relate well to the accumulating nature of scientific knowledge. Likelihood ratios can be interpreted as betting profits, earned in previous studies and invested in new ones, while the meta-analyst is allowed to cash out at any time and advise against future studies.
Submitted 31 May, 2019;
originally announced May 2019.
-
PAC-Bayes Un-Expected Bernstein Inequality
Authors:
Zakaria Mhammedi,
Peter D. Grunwald,
Benjamin Guedj
Abstract:
We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to $0$ faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation.
Submitted 3 November, 2019; v1 submitted 30 May, 2019;
originally announced May 2019.
-
Nonquantum Information Gain from Higher-order Correlation Functions
Authors:
Peter Grünwald
Abstract:
Nonlinear correlation functions are at the heart of quantum theory. The second-order correlation function $g^{(2)}(τ)$ has been a cornerstone of quantum optics for over half a century and a myriad of quantum and classical applications has been discovered. In contrast, higher-order correlation functions have so far only been used to reveal the nonclassical character of the emitted fields. In this paper, we study the relation between the $k$th-order correlation function $g^{(k)}(0)$ and the projection of the underlying quantum state of light onto the subspace of Fock states with photon number less than $k$. We show that when $g^{(k)}(0)$ falls below a critical value, lower bounds for the projection on this subspace can be concluded as well as on the ratio of the subspace with one up to $k-1$ photons and $k$ to infinity. These bounds are at face value only valid for nonclassical quantum states. However, when the quantum state includes a nonzero projection on the vacuum state, the value of $g^{(k)}(0)$ is artificially enhanced, potentially covering these projections. We derive an effective $k$th-order correlation function, which accounts for the effect of vacuum. We show that the information gained from the effective correlation function is not limited to nonclassical quantum states and thus constitutes a quantum and classical application of higher-order correlation functions.
Submitted 9 May, 2020; v1 submitted 25 May, 2019;
originally announced May 2019.
-
Radiation pressure in finite Fabry-Pérot cavities
Authors:
P. Grünwald,
B. M. Rodríguez-Lara
Abstract:
We study the effect of finite size and misalignment on a fundamental optomechanical setup: a Fabry-Pérot cavity with one fixed and one movable mirror. We describe in detail light confinement under these real world imperfections and compare the behaviour of the intracavity and output fields to the well-known ideal case. In particular, we show that it is possible to trace the motion of the movable mirror itself by measuring intensity changes in the output field even in the presence of fabrication shortcomings and thermal noise. Our result might be relevant to the transition from high precision research experiments to everyday commercial applications of optomechanics, such as high-precision stepper-motor or actuator positioning.
Submitted 16 January, 2019; v1 submitted 10 August, 2018;
originally announced August 2018.
-
Optional Stopping with Bayes Factors: a categorization and extension of folklore results, with an application to invariant situations
Authors:
Allard Hendriksen,
Rianne de Heide,
Peter Grünwald
Abstract:
It is often claimed that Bayesian methods, in particular Bayes factor methods for hypothesis testing, can deal with optional stopping. We first give an overview, using elementary probability theory, of three different mathematical meanings that various authors give to this claim: (1) stopping rule independence, (2) posterior calibration and (3) (semi-) frequentist robustness to optional stopping. We then prove theorems to the effect that these claims do indeed hold in a general measure-theoretic setting. For claims of type (2) and (3), such results are new. By allowing for non-integrable measures based on improper priors, we obtain particularly strong results for the practically important case of models with nuisance parameters satisfying a group invariance (such as location or scale). We also discuss the practical relevance of (1)--(3), and conclude that whether Bayes factor methods actually perform well under optional stopping crucially depends on details of models, priors and the goal of the analysis.
Submitted 29 April, 2020; v1 submitted 24 July, 2018;
originally announced July 2018.
-
Effective second-order correlation function and single-photon detection
Authors:
Peter Grünwald
Abstract:
Quantum-optical research on semiconductor single-photon sources puts special emphasis on the measurement of the second-order correlation function $g^{(2)}(τ)$, arguing that $g^{(2)}(0)<1/2$ implies the source field represents a good single-photon light source. We analyze the gain of information from $g^{(2)}(0)$ with respect to single photons. Any quantum state, for which the second-order correlation function falls below $1/2$, has a nonzero projection on the single-photon Fock state. The amplitude $p$ of this projection is arbitrary, independent of $g^{(2)}(0)$. However, one can extract a lower bound on the single-to-multi-photon-projection ratio. A vacuum contribution in the quantum state of light artificially increases the value of $g^{(2)}(0)$, cloaking actual single-photon projection. Thus, we propose an effective second-order correlation function $\tilde g^{(2)}(0)$, which takes the influence of vacuum into account and also yields lower and upper bounds on $p$. We consider the single-photon purity as a standard figure of merit in experiments, reinterpret it within our results and provide an effective version of that physical quantity. Besides comparing different experimental and theoretical results, we also provide a possible measurement scheme for determining $\tilde g^{(2)}(0)$.
Submitted 12 September, 2019; v1 submitted 15 November, 2017;
originally announced November 2017.
-
A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
Authors:
Peter D. Grünwald,
Nishant A. Mehta
Abstract:
We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian, $\mathrm{KL}(\text{posterior} \operatorname{\|} \text{prior})$ complexity). For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with $L_\infty$. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity.
Submitted 20 October, 2017;
originally announced October 2017.
-
Rydberg excitons in the presence of an ultralow-density electron-hole plasma
Authors:
J. Heckötter,
M. Freitag,
D. Fröhlich,
M. Aßmann,
M. Bayer,
P. Grünwald,
F. Schöne,
D. Semkat,
H. Stolz,
S. Scheel
Abstract:
We use two-color pump-probe spectroscopy to study Rydberg excitons in Cu$_2$O in the presence of free carriers injected by above-band-gap excitation. Already at plasma densities $ρ_\text{eh}$ below one hundredth electron-hole pair per $\mu\mathrm{m}^{3}$, the Rydberg exciton absorption lines are bleached while their energies remain constant, until they finally disappear, starting from the highest observed principal quantum number $n_\text{max}$. As confirmed by calculations, the band gap is reduced by many-particle effects caused by free carriers scaling as $ρ_\text{eh}^{1/2}$. An exciton line loses oscillator strength when the band edge approaches the exciton energy, vanishing completely at the crossing point. We quantitatively describe this plasma blockade by introducing an effective Bohr radius that determines the energy distance to the shifted band edge. In combination with the negligible associated decoherence this opens the possibility to control the Rydberg exciton absorption through the plasma-induced band gap modulation.
Submitted 4 September, 2017;
originally announced September 2017.
-
Why optional stopping can be a problem for Bayesians
Authors:
Rianne de Heide,
Peter D. Grünwald
Abstract:
Recently, optional stopping has been a subject of debate in the Bayesian psychology community. Rouder (2014) argues that optional stopping is no problem for Bayesians, and even recommends the use of optional stopping in practice, as do Wagenmakers et al. (2012). This article addresses the question whether optional stopping is problematic for Bayesian methods, and specifies under which circumstances and in which sense it is and is not. By slightly varying and extending Rouder's (2014) experiments, we illustrate that, as soon as the parameters of interest are equipped with default or pragmatic priors - which means, in most practical applications of Bayes factor hypothesis testing - resilience to optional stopping can break down. We distinguish between three types of default priors, each having their own specific issues with optional stopping, ranging from no-problem-at-all (Type 0 priors) to quite severe (Type II priors).
Submitted 25 March, 2021; v1 submitted 28 August, 2017;
originally announced August 2017.
-
Excitonic giant-dipole potentials in cuprous oxide
Authors:
Markus Kurz,
Peter Grünwald,
Stefan Scheel
Abstract:
In this work we predict the existence of a novel species of Wannier excitons when exposed to crossed electric and magnetic fields. In particular, we present a theory of giant-dipole excitons in $\textrm{Cu}_2\rm O$ in crossed fields. Within our theoretical approach we perform a pseudoseparation of the center-of-mass motion for the field-dressed excitonic species, thereby obtaining an effective single-particle Hamiltonian for the relative motion. For arbitrary gauge fields we exactly separate the gauge-dependent kinetic energy terms from the effective single-particle interaction potential. Depending on the applied field strengths and the specific field orientation, the potential for the relative motion of electron and hole exhibits an outer well at spatial separations up to several micrometers and depths up to $380\, μ\rm eV$, leading to possible permanent excitonic electric dipole moments of around three million Debye.
Submitted 29 May, 2017; v1 submitted 25 January, 2017;
originally announced January 2017.
-
Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
Authors:
Wouter M. Koolen,
Peter Grünwald,
Tim van Erven
Abstract:
We consider online learning algorithms that guarantee worst-case regret rates in adversarial environments (so they can be deployed safely and will perform robustly), yet adapt optimally to favorable stochastic environments (so they will perform well in a variety of settings of practical importance). We quantify the friendliness of stochastic environments by means of the well-known Bernstein (a.k.a. generalized Tsybakov margin) condition. For two recent algorithms (Squint for the Hedge setting and MetaGrad for online convex optimization) we show that the particular form of their data-dependent individual-sequence regret guarantees implies that they adapt automatically to the Bernstein parameters of the stochastic environment. We prove that these algorithms attain fast rates in their respective settings both in expectation and with high probability.
Submitted 20 May, 2016;
originally announced May 2016.
-
Fast Rates for General Unbounded Loss Functions: from ERM to Generalized Bayes
Authors:
Peter D. Grünwald,
Nishant A. Mehta
Abstract:
We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to $η$-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Bayesian inference under misspecification in terms of a generalization of the Hellinger metric as long as the learning rate $η$ is set correctly. For general loss functions, our bounds rely on two separate conditions: the $v$-GRIP (generalized reversed information projection) conditions, which control the lower tail of the excess loss; and the newly introduced witness condition, which controls the upper tail. The parameter $v$ in the $v$-GRIP conditions determines the achievable rate and is akin to the exponent in the Tsybakov margin condition and the Bernstein condition for bounded losses, which the $v$-GRIP conditions generalize; favorable $v$ in combination with small model complexity leads to $\tilde{O}(1/n)$ rates. The witness condition allows us to connect the excess risk to an "annealed" version thereof, by which we generalize several previous results connecting Hellinger and Rényi divergence to KL divergence.
Submitted 5 November, 2019; v1 submitted 1 May, 2016;
originally announced May 2016.
-
Safe Probability
Authors:
Peter Grünwald
Abstract:
We formalize the idea of probability distributions that lead to reliable predictions about some, but not all aspects of a domain. The resulting notion of `safety' provides a fresh perspective on foundational issues in statistics, providing a middle ground between imprecise probability and multiple-prior models on the one hand and strictly Bayesian approaches on the other. It also allows us to formalize fiducial distributions in terms of the set of random variables that they can safely predict, thus taking some of the sting out of the fiducial idea. By restricting probabilistic inference to safe uses, one also automatically avoids paradoxes such as the Monty Hall problem. Safety comes in a variety of degrees, such as "validity" (the strongest notion), "calibration", "confidence safety" and "unbiasedness" (almost the weakest notion).
Submitted 6 April, 2016;
originally announced April 2016.
-
Robust Probability Updating
Authors:
Thijs van Ommen,
Wouter M. Koolen,
Thijs E. Feenstra,
Peter D. Grünwald
Abstract:
This paper discusses an alternative to conditioning that may be used when the probability distribution is not fully specified. It does not require any assumptions (such as CAR: coarsening at random) on the unknown distribution. The well-known Monty Hall problem is the simplest scenario where neither naive conditioning nor the CAR assumption suffice to determine an updated probability distribution. This paper thus addresses a generalization of that problem to arbitrary distributions on finite outcome spaces, arbitrary sets of `messages', and (almost) arbitrary loss functions, and provides existence and characterization theorems for robust probability updating strategies. We find that for logarithmic loss, optimality is characterized by an elegant condition, which we call RCAR (reverse coarsening at random). Under certain conditions, the same condition also characterizes optimality for a much larger class of loss functions, and we obtain an objective and general answer to how one should update probabilities in the light of new information.
Submitted 2 May, 2016; v1 submitted 10 December, 2015;
originally announced December 2015.
-
Signatures of Quantum Coherence in Rydberg Excitons
Authors:
P. Grünwald,
M. Aßmann,
J. Heckötter,
D. Fröhlich,
M. Bayer,
H. Stolz,
S. Scheel
Abstract:
Coherent optical control of individual particles has been demonstrated both for atoms and semiconductor quantum dots. Here we demonstrate the emergence of quantum coherent effects in semiconductor Rydberg excitons in bulk Cu$_2$O. Due to the spectral proximity between two adjacent Rydberg exciton states, a single-frequency laser may pump both resonances with little dissipation from the detuning. As a consequence, additional resonances appear in the absorption spectrum that correspond to dressed states consisting of two Rydberg exciton levels coupled to the excitonic vacuum, forming a V-type three-level system, but driven only by one laser light source. We show that the level of pure dephasing in this system is extremely low. These observations are a crucial step towards coherently controlled quantum technologies in a bulk semiconductor.
Submitted 27 September, 2016; v1 submitted 24 November, 2015;
originally announced November 2015.
-
Deviations of the exciton level spectrum in cuprous oxide from the hydrogen series
Authors:
Florian Schöne,
Sjard-Ole Krüger,
Peter Grünwald,
Marc Aßmann,
Julian Heckötter,
Johannes Thewes,
Dietmar Fröhlich,
Manfred Bayer,
Heinrich Stolz,
Stefan Scheel
Abstract:
Recent high-resolution absorption spectroscopy on excited excitons in cuprous oxide [Nature (London) 514, 343 (2014)] has revealed significant deviations of their spectrum from that of the ideal hydrogen-like series. Here we show that the complex band dispersion of the crystal, which determines the kinetic energy of electrons and holes, strongly affects the exciton binding energy. Specifically, we show that the nonparabolicity of the band dispersion is the main cause of the deviation from the hydrogen series. Experimental data collected from high-resolution absorption spectroscopy in electric fields validate the assignment of the deviation to the nonparabolicity of the band dispersion.
Submitted 22 February, 2016; v1 submitted 17 November, 2015;
originally announced November 2015.
-
Nonclassical light from an incoherently pumped quantum dot in a microcavity
Authors:
L. Teuber,
P. Grünwald,
W. Vogel
Abstract:
Semiconductor microcavities with artificial single-photon emitters have become one of the backbones of semiconductor quantum optics. In many cases, however, technical and physical issues limit the study of optical fields to incoherently excited systems. We analyze the model of a two-level system in a single-mode cavity, where the former is incoherently driven. The specific structure of the applied master equation yields a recurrence relation for the steady-state values of correlations of the intracavity field and the emitter. We provide boundary conditions that permit a systematic, easy-to-implement solution, which is numerically less demanding than standard methods. Different cavity systems from previous experiments are analyzed. The derived boundary conditions also allow us to make direct analytical statements about the overall quantum state and its higher-order moments. With this we can give very good approximations for the full quantum state of the field and show that, for every physically reasonable set of system parameters, the state of the intracavity field is nonclassical.
Submitted 28 November, 2015; v1 submitted 3 September, 2015;
originally announced September 2015.
-
Fast rates in statistical and online learning
Authors:
Tim van Erven,
Peter D. Grünwald,
Nishant A. Mehta,
Mark D. Reid,
Robert C. Williamson
Abstract:
The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting.
Submitted 1 September, 2015; v1 submitted 9 July, 2015;
originally announced July 2015.
-
Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It
Authors:
Peter Grünwald,
Thijs van Ommen
Abstract:
We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic, and observe that the posterior puts its mass on ever more high-dimensional models as the sample size increases. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the Safe Bayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates as soon as the standard posterior is not `cumulatively concentrated', and its results on our data are quite encouraging.
Submitted 29 October, 2018; v1 submitted 11 December, 2014;
originally announced December 2014.
-
Quantum Measurement of Broadband Nonclassical Light Fields
Authors:
P. Grünwald,
D. Vasylyev,
J. Häggblad,
W. Vogel
Abstract:
Based on the measurement of quantum correlation functions, the quantum statistical properties of spectral measurements are studied for broadband radiation fields. The spectral filtering of light before its detection is compared with the direct detection followed by the spectral analysis of the recorded photocurrents. As an example, the squeezing spectra of the atomic resonance fluorescence are studied for both types of filtering procedures. The conditions for which the detection of the nonclassical signatures of the radiation is possible are analyzed. For the considered example, photocurrent filtering appears to be the superior option to detect nonclassicality, due to the vacuum-noise effects in the optical filtering.
Submitted 14 November, 2014;
originally announced November 2014.
-
Almost the Best of Three Worlds: Risk, Consistency and Optional Stopping for the Switch Criterion in Nested Model Selection
Authors:
Stéphanie van der Pas,
Peter Grünwald
Abstract:
We study the switch distribution, introduced by Van Erven et al. (2012), applied to model selection and subsequent estimation. While switching was known to be strongly consistent, here we show that it achieves minimax optimal parametric risk rates up to a $\log\log n$ factor when comparing two nested exponential families, partially confirming a conjecture by Lauritzen (2012) and Cavanaugh (2012) that switching behaves asymptotically like the Hannan-Quinn criterion. Moreover, like Bayes factor model selection but unlike standard significance testing, when one of the models represents a simple hypothesis, the switch criterion defines a robust null hypothesis test, meaning that its Type-I error probability can be bounded irrespective of the stopping rule. Hence, switching is consistent, insensitive to optional stopping and almost minimax risk optimal, showing that, Yang's (2005) impossibility result notwithstanding, it is possible to `almost' combine the strengths of AIC and Bayes factor model selection.
Submitted 15 December, 2016; v1 submitted 25 August, 2014;
originally announced August 2014.
-
A Game-Theoretic Analysis of Updating Sets of Probabilities
Authors:
Peter D. Grunwald,
Joseph Y. Halpern
Abstract:
We consider how an agent should update her uncertainty when it is represented by a set P of probability distributions and the agent observes that a random variable X takes on value x, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly-used criterion in the literature. We adopt a game-theoretic framework, where the agent plays against a bookie, who chooses some distribution from P. We consider two reasonable games that differ in what the bookie knows when he makes his choice. Anomalies that have been observed before, like time inconsistency, can be understood as arising because different games are being played, against bookies with different information. We characterize the important special cases in which the optimal decision rules according to the minimax criterion amount to either conditioning or simply ignoring the information. Finally, we consider the relationship between conditioning and calibration when uncertainty is described by sets of probabilities.
Submitted 27 July, 2014;
originally announced July 2014.