Search | arXiv e-print repository

arXiv:2406.19940 [pdf, other]

Closed-Form Power and Sample Size Calculations for Bayes Factors

Abstract: Determining an appropriate sample size is a critical element of study design, and the method used to determine it should be consistent with the planned analysis. When the planned analysis involves Bayes factor hypothesis testing, the sample size is usually chosen to ensure a sufficiently high probability of obtaining a Bayes factor indicating compelling evidence for a hypothesis, given that the hypothesis is true. In practice, Bayes factor sample size determination is typically performed using computationally intensive Monte Carlo simulation. Here, we summarize alternative approaches that enable sample size determination without simulation. We show how, under approximate normality assumptions, sample sizes can be determined numerically, and provide the R package bfpwr for this purpose. Additionally, we identify conditions under which sample sizes can even be determined in closed form, resulting in novel, easy-to-use formulas that also help foster intuition, enable asymptotic analysis, and can also be used for hybrid Bayesian/likelihoodist design. Furthermore, we show how in our framework power and sample size can be computed without simulation for more complex analysis priors, such as Jeffreys-Zellner-Siow priors or nonlocal normal moment priors. Case studies from medicine and psychology illustrate how researchers can use our methods to design informative yet cost-efficient studies.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19152 [pdf, other]

Mixture priors for replication studies

Authors: Roberto Macrì Demartino, Leonardo Egidi, Leonhard Held, Samuel Pawel

Abstract: Replication of scientific studies is important for assessing the credibility of their results. However, there is no consensus on how to quantify the extent to which a replication study replicates an original result. We propose a novel Bayesian approach based on mixture priors. The idea is to use a mixture of the posterior distribution based on the original study and a non-informative distribution as the prior for the analysis of the replication study. The mixture weight then determines the extent to which the original and replication data are pooled. Two distinct strategies are presented: one with fixed mixture weights, and one that introduces uncertainty by assigning a prior distribution to the mixture weight itself. Furthermore, it is shown how within this framework Bayes factors can be used for formal testing of scientific hypotheses, such as tests regarding the presence or absence of an effect. To showcase the practical application of the methodology, we analyze data from three replication studies. Our findings suggest that mixture priors are a valuable and intuitive alternative to other Bayesian methods for analyzing replication studies, such as hierarchical models and power priors. We provide the free and open source R package repmix that implements the proposed methodology.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2401.13615 [pdf, other]

The assessment of replicability using the sum of p-values

Authors: Leonhard Held, Samuel Pawel, Charlotte Micheloud

Abstract: Statistical significance of both the original and the replication study is a commonly used criterion to assess replication attempts, also known as the two-trials rule in drug development. However, replication studies are sometimes conducted although the original study is non-significant, in which case Type-I error rate control across both studies is no longer guaranteed. We propose an alternative method to assess replicability using the sum of p-values from the two studies. The approach provides a combined p-value and can be calibrated to control the overall Type-I error rate at the same level as the two-trials rule but allows for replication success even if the original study is non-significant. The unweighted version requires a less restrictive level of significance at replication if the original study is already convincing, which facilitates sample size reductions of up to 10%. Downweighting the original study accounts for possible bias and requires a more stringent significance level and larger sample sizes at replication. Data from four large-scale replication projects are used to illustrate and compare the proposed method with the two-trials rule, meta-analysis and Fisher's combination method.

Submitted 30 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: 6 figures, 0 tables, 1 box
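
The sum-of-p-values idea in the abstract above can be illustrated with a minimal sketch. It assumes two independent one-sided p-values that are uniform on [0, 1] under the null, so that their sum follows a triangular (Irwin-Hall, n = 2) distribution. The function names `combined_p_from_sum` and `sum_threshold` are hypothetical, and the sketch covers only the unweighted case; it is not the authors' implementation.

```python
import math


def combined_p_from_sum(p_orig: float, p_rep: float) -> float:
    """Combined p-value based on the sum of two independent p-values.

    Assumption: under the null each p-value is Uniform(0, 1), so the sum S
    has CDF s^2 / 2 for s <= 1 and 1 - (2 - s)^2 / 2 for 1 < s <= 2.
    """
    for p in (p_orig, p_rep):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    s = p_orig + p_rep
    if s <= 1.0:
        return s * s / 2.0
    return 1.0 - (2.0 - s) ** 2 / 2.0


def sum_threshold(alpha: float) -> float:
    """Largest sum of p-values whose combined p-value is at most alpha^2.

    alpha^2 is the overall Type-I error rate of the two-trials rule when
    each study is tested at one-sided level alpha. Solving s^2 / 2 = alpha^2
    gives s = sqrt(2) * alpha (valid while this value does not exceed 1).
    """
    return math.sqrt(2.0) * alpha


if __name__ == "__main__":
    # Hypothetical p-values chosen for illustration only.
    p_o, p_r = 0.001, 0.03
    print(combined_p_from_sum(p_o, p_r))
    print(p_o + p_r <= sum_threshold(0.025))
```

With a one-sided level of 0.025 per study, the threshold on the sum is about 0.0354, so a very small original p-value leaves room for a replication p-value above 0.025, in line with the "less restrictive level of significance at replication" mentioned in the abstract.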

arXiv:2312.11991 [pdf, other]

Outcomes truncated by death in RCTs: a simulation study on the survivor average causal effect

Authors: Stefanie von Felten, Chiara Vanetta, Christoph M. Rüegger, Sven Wellmann, Leonhard Held

Abstract: Continuous outcome measurements truncated by death present a challenge for the estimation of unbiased treatment effects in randomized controlled trials (RCTs). One way to deal with such situations is to estimate the survivor average causal effect (SACE), but this requires making non-testable assumptions. Motivated by an ongoing RCT in very preterm infants with intraventricular hemorrhage, we performed a simulation study to compare a SACE estimator with complete case analysis (CCA) and an analysis after multiple imputation of missing outcomes. We set up 9 scenarios combining positive, negative and no treatment effect on the outcome (cognitive development) and on survival at 2 years of age. Treatment effect estimates from all methods were compared in terms of bias, mean squared error and coverage with regard to two true treatment effects: the treatment effect on the outcome used in the simulation and the SACE, which was derived by simulation of both potential outcomes per patient. Despite targeting different estimands (principal stratum estimand, hypothetical estimand), the SACE-estimator and multiple imputation gave similar estimates of the treatment effect and efficiently reduced the bias compared to CCA. Also, both methods were relatively robust to omission of one covariate in the analysis, and thus violation of relevant assumptions. Although the SACE is not without controversy, we find it useful if mortality is inherent to the study population. Some degree of violation of the required assumptions is almost certain, but may be acceptable in practice.

Submitted 12 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.05061 [pdf, other]

LaCour!: Enabling Research on Argumentation in Hearings of the European Court of Human Rights

Authors: Lena Held, Ivan Habernal

Abstract: Why does an argument end up in the final court decision? Was it deliberated or questioned during the oral hearings? Was there something in the hearings that triggered a particular judge to write a dissenting opinion? Despite the availability of the final judgments of the European Court of Human Rights (ECHR), none of these legal research questions can currently be answered as the ECHR's multilingual oral hearings are not transcribed, structured, or speaker-attributed. We address this fundamental gap by presenting LaCour!, the first corpus of textual oral arguments of the ECHR, consisting of 154 full hearings (2.1 million tokens from over 267 hours of video footage) in English, French, and other court languages, each linked to the corresponding final judgment documents. In addition to the transcribed and partially manually corrected text from the video, we provide sentence-level timestamps and manually annotated role and language labels. We also showcase LaCour! in a set of preliminary experiments that explore the interplay between questions and dissenting opinions. Apart from the use cases in legal NLP, we hope that law students or other interested parties will also use LaCour! as a learning resource, as it is freely available in various formats at https://huggingface.co/datasets/TrustHLT/LaCour.

Submitted 14 June, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

arXiv:2310.00453 [pdf, other]

doi 10.1093/mnras/stae929

MRI turbulence in vertically stratified accretion discs at large magnetic Prandtl numbers

Authors: Loren E. Held, George Mamatsashvili, Martin E. Pessah

Abstract: The discovery of the first binary neutron star merger, GW170817, has spawned a plethora of global numerical relativity simulations. These simulations are often ideal (with dissipation determined by the grid) and/or axisymmetric (invoking ad hoc mean-field dynamos). However, binary neutron star mergers (similar to X-ray binaries and active galactic nuclei inner discs) are characterised by large magnetic Prandtl numbers, $\rm Pm$, (the ratio of viscosity to resistivity). $\rm Pm$ is a key parameter determining dynamo action and dissipation but it is ill-defined (and likely of order unity) in ideal simulations. To bridge this gap, we investigate the magnetorotational instability (MRI) and associated dynamo at large magnetic Prandtl numbers using fully compressible, three-dimensional, vertically stratified, isothermal simulations of a local patch of a disc. We find that, within the bulk of the disc ($z\lesssim2H$, where $H$ is the scale-height), the turbulent intensity (parameterized by the stress-to-thermal-pressure ratio $α$), and the saturated magnetic field energy density, $E_\text{mag}$, produced by the MRI dynamo, both scale as a power with Pm at moderate Pm ($4\lesssim \text{Pm} \lesssim 32$): $E_\text{mag} \sim \text{Pm}^{0.74}$ and $α\sim \text{Pm}^{0.71}$, respectively. At larger Pm ($\gtrsim 32$) we find deviations from power-law scaling and the onset of a plateau. Compared to our recent unstratified study, this scaling with Pm becomes weaker further away from the disc mid-plane, where the Parker instability dominates. We perform a thorough spectral analysis to understand the underlying dynamics of small-scale MRI-driven turbulence in the mid-plane and of large-scale Parker-unstable structures in the atmosphere.

Submitted 9 April, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: Accepted by MNRAS (17 pages + 2 pages appendices, 14 figures, 3 tables)

arXiv:2307.04548 [pdf, other]

Beyond the Two-Trials Rule

Authors: Leonhard Held

Abstract: The two-trials rule for drug approval requires "at least two adequate and well-controlled studies, each convincing on its own, to establish effectiveness". This is usually implemented by requiring two significant pivotal trials and is the standard regulatory requirement to provide evidence for a new drug's efficacy. However, there is a need to develop suitable alternatives to this rule for a number of reasons, including the possible availability of data from more than two trials. I consider the case of up to 3 studies and stress the importance of controlling the partial Type-I error rate, where only some studies have a true null effect, while maintaining the overall Type-I error rate of the two-trials rule, where all studies have a null effect. Some less-known $p$-value combination methods are useful to achieve this: Pearson's method, Edgington's method and the recently proposed harmonic mean $χ^2$-test. I study their properties and discuss how they can be extended to a sequential assessment of success while still ensuring overall Type-I error control. I compare the different methods in terms of partial Type-I error rate, project power and the expected number of studies required. Edgington's method is eventually recommended as it is easy to implement and communicate, and has only moderate partial Type-I error rate inflation but substantially increased project power.

Submitted 9 November, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

arXiv:2305.04587 [pdf, other]

doi 10.7554/eLife.92311.1

Replication of "null results" -- Absence of evidence or evidence of absence?

Authors: Samuel Pawel, Rachel Heyard, Charlotte Micheloud, Leonhard Held

Abstract: In several large-scale replication projects, statistically non-significant results in both the original and the replication study have been interpreted as a "replication success". Here we discuss the logical problems with this approach: Non-significance in both studies does not ensure that the studies provide evidence for the absence of an effect and "replication success" can virtually always be achieved if the sample sizes are small enough. In addition, the relevant error rates are not controlled. We show how methods, such as equivalence testing and Bayes factors, can be used to adequately quantify the evidence for the absence of an effect and how they can be applied in the replication setting. Using data from the Reproducibility Project: Cancer Biology, the Experimental Philosophy Replicability Project, and the Reproducibility Project: Psychology we illustrate that many original and replication studies with "null results" are in fact inconclusive. We conclude that it is important to also replicate studies with statistically non-significant results, but that they should be designed, analyzed, and interpreted appropriately.

Submitted 18 December, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Journal ref: eLife (2023). 12:RP92311

arXiv:2302.12651 [pdf, other]

Simulating and reporting frequentist operating characteristics of clinical trials that borrow external information

Authors: Annette Kopp-Schneider, Manuel Wiesenfarth, Leonhard Held, Silvia Calderazzo

Abstract: Borrowing of information from historical or external data to inform inference in a current trial is an expanding field in the era of precision medicine, where trials are often performed in small patient cohorts for practical or ethical reasons. Many approaches for borrowing from external data have been proposed. Even though these methods are mainly based on Bayesian approaches by incorporating external information into the prior for the current analysis, frequentist operating characteristics of the analysis strategy are of interest. In particular, type I error and power at a prespecified point alternative are the focus. It is well-known that borrowing from external information may alter the type I error rate. We propose a procedure to investigate and report the frequentist operating characteristics in this context. The approach evaluates type I error rate of the test with borrowing from external data and calibrates the test without borrowing to this type I error rate. On this basis, a fair comparison of power between the test with and without borrowing is achieved.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2211.02950 [pdf, other]

The Legal Argument Reasoning Task in Civil Procedure

Authors: Leonard Bongard, Lena Held, Ivan Habernal

Abstract: We present a new NLP task and dataset from the domain of U.S. civil procedure. Each instance of the dataset consists of a general introduction to the case, a particular question, and a possible solution argument, accompanied by a detailed analysis of why the argument applies in that case. Since the dataset is based on a book aimed at law students, we believe that it represents a truly complex task for benchmarking modern legal language models. Our baseline evaluation shows that fine-tuning a legal transformer provides some advantage over random baseline models, but our analysis reveals that the actual ability to infer legal arguments remains a challenging open research question.

Submitted 5 November, 2022; originally announced November 2022.

Comments: Camera ready, to appear at the Natural Legal Language Processing Workshop 2022 co-located with EMNLP

arXiv:2211.02552 [pdf, other]

doi 10.1037/met0000604

Bayesian Approaches to Designing Replication Studies

Authors: Samuel Pawel, Guido Consonni, Leonhard Held

Abstract: Replication studies are essential for assessing the credibility of claims from original studies. A critical aspect of designing replication studies is determining their sample size; a too small sample size may lead to inconclusive studies whereas a too large sample size may waste resources that could be allocated better in other studies. Here, we show how Bayesian approaches can be used for tackling this problem. The Bayesian framework allows researchers to combine the original data and external knowledge in a design prior distribution for the underlying parameters. Based on a design prior, predictions about the replication data can be made, and the replication sample size can be chosen to ensure a sufficiently high probability of replication success. Replication success may be defined by Bayesian or non-Bayesian criteria, and different criteria may also be combined to meet distinct stakeholders and enable conclusive inferences based on multiple analysis approaches. We investigate sample size determination in the normal-normal hierarchical model where analytical results are available and traditional sample size determination is a special case where the uncertainty on parameter values is not accounted for. We use data from a multisite replication project of social-behavioral experiments to illustrate how Bayesian approaches can help design informative and cost-effective replication studies. Our methods can be used through the R package BayesRepDesign.

Submitted 11 August, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

Journal ref: 2023, Psychological Methods

arXiv:2207.14720 [pdf, other]

doi 10.1007/s11749-023-00888-5

Power priors for replication studies

Authors: Samuel Pawel, Frederik Aust, Leonhard Held, Eric-Jan Wagenmakers

Abstract: The ongoing replication crisis in science has increased interest in the methodology of replication studies. We propose a novel Bayesian analysis approach using power priors: The likelihood of the original study's data is raised to the power of $α$, and then used as the prior distribution in the analysis of the replication data. Posterior distribution and Bayes factor hypothesis tests related to the power parameter $α$ quantify the degree of compatibility between the original and replication study. Inferences for other parameters, such as effect sizes, dynamically borrow information from the original study. The degree of borrowing depends on the conflict between the two studies. The practical value of the approach is illustrated on data from three replication studies, and the connection to hierarchical modeling approaches is explored. We generalize the known connection between normal power priors and normal hierarchical models for fixed parameters and show that normal power prior inferences with a beta prior on the power parameter $α$ align with normal hierarchical model inferences using a generalized beta prior on the relative heterogeneity variance $I^2$. The connection illustrates that power prior modeling is unnatural from the perspective of hierarchical modeling since it corresponds to specifying priors on a relative rather than an absolute heterogeneity scale.

Submitted 27 September, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

Journal ref: TEST, 2023
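
As a worked illustration of the power prior construction described in the abstract above, consider the simplest normal case with a fixed power parameter. The notation ($\hat\theta_o$, $\hat\theta_r$, $\sigma_o$, $\sigma_r$) and the flat initial prior are assumptions made here for illustration and are not taken from the paper, which also treats $\alpha$ as unknown.

```latex
% Assumed setup: normal effect estimates from the original (o) and
% replication (r) study with known standard errors, flat initial prior,
% and a FIXED power parameter 0 < \alpha \le 1.
\begin{align*}
  \hat\theta_o \mid \theta &\sim \mathrm{N}(\theta, \sigma_o^2), \qquad
  \hat\theta_r \mid \theta \sim \mathrm{N}(\theta, \sigma_r^2), \\
  \pi(\theta \mid \hat\theta_o, \alpha)
    &\propto L(\theta; \hat\theta_o)^{\alpha}
    \propto \exp\!\left\{ -\frac{\alpha (\hat\theta_o - \theta)^2}{2 \sigma_o^2} \right\}
    \;\Longrightarrow\;
    \theta \mid \hat\theta_o, \alpha \sim \mathrm{N}\!\left(\hat\theta_o, \frac{\sigma_o^2}{\alpha}\right), \\
  \theta \mid \hat\theta_o, \hat\theta_r, \alpha
    &\sim \mathrm{N}\!\left(
      \frac{\alpha \hat\theta_o / \sigma_o^2 + \hat\theta_r / \sigma_r^2}
           {\alpha / \sigma_o^2 + 1 / \sigma_r^2},\;
      \frac{1}{\alpha / \sigma_o^2 + 1 / \sigma_r^2}
    \right).
\end{align*}
```

Under these assumptions, $\alpha = 1$ pools both studies completely, while $\alpha \to 0$ makes the prior arbitrarily diffuse so that the posterior is driven by the replication data alone.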

arXiv:2207.03199 [pdf, other]

Comparing Confidence Intervals for a Binomial Proportion with the Interval Score

Authors: Lisa J. Hofer, Leonhard Held

Abstract: There are over 55 different ways to construct a confidence or credible interval (CI) for the binomial proportion. Methods to compare them are necessary to decide which should be used in practice. The interval score has been suggested to compare prediction intervals. This score is a proper scoring rule that combines the coverage as a measure of calibration and the width as a measure of sharpness. We evaluate eleven CIs for the binomial proportion based on the expected interval score and propose a summary measure which can take into account different weighting of the underlying true proportion. Under uniform weighting, the expected interval score recommends the Wilson CI or Bayesian credible intervals with a uniform prior. If extremely low or high proportions receive more weight, the score recommends Bayesian credible intervals based on Jeffreys' prior. While more work is needed to theoretically justify the use of the interval score for the comparison of CIs, our results suggest that it constitutes a useful method to combine coverage and width in one measure. This novel approach could also be used in other applications.

Submitted 7 July, 2022; originally announced July 2022.

Comments: 22 pages
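
The interval score mentioned in the abstract above can be sketched as follows. The score formula is the standard one for a central (1 - alpha) interval: width plus a penalty of 2/alpha times the distance by which the target falls outside the interval. Scoring the Wilson interval against the true proportion and averaging over the binomial distribution is an assumed reading of "expected interval score"; the function names are hypothetical and the paper's exact evaluation may differ.

```python
import math


def interval_score(lower: float, upper: float, x: float, alpha: float = 0.05) -> float:
    """Interval score of a central (1 - alpha) interval for target value x.

    Smaller is better: narrow intervals are rewarded, misses are penalised.
    """
    width = upper - lower
    penalty_below = (2.0 / alpha) * max(lower - x, 0.0)  # x lies below the interval
    penalty_above = (2.0 / alpha) * max(x - upper, 0.0)  # x lies above the interval
    return width + penalty_below + penalty_above


def wilson_ci(k: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for k successes in n trials (z = normal quantile)."""
    p_hat = k / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


def expected_interval_score(p: float, n: int, alpha: float = 0.05, z: float = 1.959964) -> float:
    """Expected interval score of the Wilson CI when the true proportion is p.

    Assumption: the interval is scored against the true proportion p, and the
    expectation is taken over the Binomial(n, p) number of successes.
    """
    total = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        lower, upper = wilson_ci(k, n, z)
        total += prob * interval_score(lower, upper, p, alpha)
    return total


if __name__ == "__main__":
    # Hypothetical settings for illustration only.
    print(expected_interval_score(p=0.3, n=20))
```

Replacing `wilson_ci` with another interval constructor would allow different methods to be compared on the same scale for a given true proportion and sample size.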

arXiv:2207.00464 [pdf, other]

doi 10.1111/stan.12312

Assessing replicability with the sceptical p-value: Type-I error control and sample size planning

Authors: Charlotte Micheloud, Fadoua Balabdaoui, Leonhard Held

Abstract: We study a statistical framework for replicability based on a recently proposed quantitative measure of replication success, the sceptical $p$-value. A recalibration is proposed to obtain exact overall Type-I error control if the effect is null in both studies and additional bounds on the partial and conditional Type-I error rate, which represent the case where only one study has a null effect. The approach avoids the double dichotomization for significance of the two-trials rule and has larger project power to detect existing effects over both studies in combination. It can also be used for power calculations and requires a smaller replication sample size than the two-trials rule for already convincing original studies. We illustrate the performance of the proposed methodology in an application to data from the Experimental Economics Replication Project.

Submitted 19 May, 2023; v1 submitted 1 July, 2022; originally announced July 2022.

Comments: Supporting Information is available after the references. Title of previous submission was "A statistical framework for replicability"

arXiv:2206.04379 [pdf, other]

doi 10.1002/sta4.591

Normalized power priors always discount historical data

Authors: Samuel Pawel, Frederik Aust, Leonhard Held, Eric-Jan Wagenmakers

Abstract: Power priors are used for incorporating historical data in Bayesian analyses by taking the likelihood of the historical data raised to the power $α$ as the prior distribution for the model parameters. The power parameter $α$ is typically unknown and assigned a prior distribution, most commonly a beta distribution. Here, we give a novel theoretical result on the resulting marginal posterior distribution of $α$ in the case of the normal and binomial models. Counterintuitively, when the current data perfectly mirror the historical data and the sample sizes from both data sets become arbitrarily large, the marginal posterior of $α$ does not converge to a point mass at $α= 1$ but approaches a distribution that hardly differs from the prior. The result implies that a complete pooling of historical and current data is impossible if a power prior with beta prior for $α$ is used.

Submitted 26 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Journal ref: Stat, 2023, 12(1), e591

arXiv:2206.00497 [pdf, other]

doi 10.1093/mnras/stac2656

MRI turbulence in accretion discs at large magnetic Prandtl numbers

Authors: Loren E. Held, George Mamatsashvili

Abstract: The effect of large magnetic Prandtl number $\text{Pm}$ (the ratio of viscosity to resistivity) on the turbulent transport and energetics of the magnetorotational instability (MRI) is poorly understood, despite the realization of this regime in astrophysical environments as disparate as discs from binary neutron star mergers, the inner regions of low mass X-ray binaries and active galactic nuclei, and the interiors of protoneutron stars. We investigate the MRI dynamo and associated turbulence in the regime $\text{Pm}>1$ by carrying out fully compressible, 3D MHD shearing box simulations using the finite-volume code PLUTO, focusing mostly on the case of Keplerian shear relevant to accretion discs. We find that when the magnetic Reynolds number is kept fixed, the turbulent transport (as parameterized by $α$, the ratio of stress to thermal pressure) scales with the magnetic Prandtl number as $α\sim \text{Pm}^δ$, with $δ\sim 0.5-0.7$ up to $\text{Pm} \sim 128$. However, this scaling weakens as the magnetic Reynolds number is increased. Importantly, compared to previous studies, we find a new effect at very large $\text{Pm}$ -- the turbulent energy and stress begin to plateau, no longer depending on ${\rm Pm}$. To understand these results we have carried out a detailed analysis of the turbulent dynamics in Fourier space, focusing on the effect of increasing $\text{Pm}$ on the transverse cascade -- a key non-linear process induced by the disc shear flow that is responsible for the sustenance of MRI turbulence. Finally, we find that $α$-$\text{Pm}$ scaling is sensitive to the box vertical-to-radial aspect ratio, as well as to the background shear.

Submitted 15 September, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Accepted for publication in MNRAS (22 pages, 18 figures, 2 tables)

arXiv:2205.04487 [pdf, other]

doi 10.1103/PhysRevD.106.124041

Implementation of advanced Riemann solvers in a neutrino-radiation magnetohydrodynamics code in numerical relativity and its application to a binary neutron star merger

Authors: Kenta Kiuchi, Loren E. Held, Yuichiro Sekiguchi, Masaru Shibata

Abstract: We implement advanced Riemann solvers HLLC and HLLD \cite{Mignone:2005ft,MUB:2009} together with an advanced constrained transport scheme \cite{Gardiner:2007nc} in a numerical-relativity neutrino-radiation magnetohydrodynamics code. We validate our implementation by performing a series of one- and multi-dimensional test problems for relativistic hydrodynamics and magnetohydrodynamics in both Minkowski spacetime and a static black hole spacetime. We find that the numerical solutions with the advanced Riemann solvers are more accurate than those with the HLLE solver \cite{DelZanna:2002rv}, which was originally implemented in our code. As an application to numerical relativity, we simulate an asymmetric binary neutron star merger leading to a short-lived massive neutron star both with and without magnetic fields. We find that the lifetime of the rotating massive neutron star formed after the merger and also the amount of the tidally-driven dynamical ejecta are overestimated when we employ the diffusive HLLE solver. We also find that the magnetorotational instability is less resolved when we employ the HLLE solver because of the solver's large numerical diffusivity. This causes a spurious enhancement both of magnetic winding resulting from large scale poloidal magnetic fields, and also of the energy of the outflow induced by magnetic pressure.

Submitted 9 May, 2022; originally announced May 2022.

Comments: 40 pages, 27 figures

arXiv:2204.06960 [pdf, other]

The replication of equivalence studies

Authors: Charlotte Micheloud, Leonhard Held

Abstract: Replication studies are increasingly conducted to assess the credibility of scientific findings. Most of these replication attempts target studies with a superiority design, but there is a lack of methodology regarding the analysis of replication studies with alternative types of designs, such as equivalence. In order to fill this gap, we propose two approaches, the two-trials rule and the sceptical TOST procedure, adapted from methods used in superiority settings. Both methods have the same overall Type-I error rate, but the sceptical TOST procedure allows replication success even for non-significant original or replication studies. This leads to a larger project power and other differences in relevant operating characteristics. Both methods can be used for sample size calculation of the replication study, based on the results from the original one. The two methods are applied to data from the Reproducibility Project: Cancer Biology.

Submitted 24 April, 2024; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: 24 pages, 6 figures

arXiv:2112.11898 [pdf, other]

Combining Evidence from Clinical Trials in Conditional or Accelerated Approval

Authors: Manja Deforth, Charlotte Micheloud, Kit C Roes, Leonhard Held

Abstract: Conditional (European Medicines Agency) or accelerated (U.S. Food and Drug Administration) approval of drugs allows earlier access to promising new treatments that address unmet medical needs. Certain post-marketing requirements must typically be met in order to obtain full approval, such as conducting a new post-market clinical trial. We study the applicability of the recently developed harmonic mean Chi-squared test to this conditional or accelerated approval framework. The proposed approach can be used both to support the design of the post-market trial and the analysis of the combined evidence provided by both trials. Other methods considered are the two-trials rule, Fisher's criterion and Stouffer's method. In contrast to some of the traditional methods, the harmonic mean Chi-squared test always requires a post-market clinical trial. If the p-value from the pre-market clinical trial is much smaller than 0.025, a smaller sample size for the post-market clinical trial is needed than with the two-trials rule. For illustration, we apply the harmonic mean Chi-squared test to a drug which received conditional (and later full) market licensing by the EMA. A simulation study is conducted to study the operating characteristics of the harmonic mean Chi-squared test and two-trials rule in more detail. We finally investigate the applicability of these two methods to compute the power at interim of an ongoing post-market trial. These results are expected to aid in the design and assessment of the required post-market studies in terms of the level of evidence required for full approval.

Submitted 18 October, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

arXiv:2111.11226 [pdf, other]

doi 10.1093/mnras/stab3398

The stress-pressure lag in MRI turbulence and its implications for thermal instability in accretion discs

Authors: Loren E. Held, Henrik N. Latter

Abstract: The classical alpha-disc model assumes that the turbulent stress scales linearly with -- and responds instantaneously to -- the pressure. It is likely, however, that the stress possesses a non-negligible relaxation time and will lag behind the pressure on some timescale. To measure the size of this lag we carry out unstratified 3D magnetohydrodynamic shearing box simulations with zero-net-magnetic-flux using the finite-volume code PLUTO. We impose thermal oscillations of varying periods via a cooling term, which in turn drives oscillations in the turbulent stress. Our simulations reveal that the stress oscillations lag behind the pressure by $\sim 5$ orbits in cases where the oscillation period is several tens of orbits or more. We discuss the implication of our results for thermal and viscous overstability in discs around compact objects.

Submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted for publication in MNRAS (8 pages, 7 figures)

arXiv:2104.02473 [pdf, other]

doi 10.1093/mnras/stab974

Magnetohydrodynamic convection in accretion discs

Authors: Loren E. Held, Henrik N. Latter

Abstract: Convection has been discussed in the field of accretion discs for several decades, both as a means of angular momentum transport and also because of its role in controlling discs' vertical structure via heat transport. If the gas is sufficiently ionized and threaded by a weak magnetic field, convection might interact in non-trivial ways with the magnetorotational instability (MRI). Recently, vertically stratified local simulations of the MRI have reported considerable variation in the angular momentum transport, as measured by the stress to thermal pressure ratio $α$, when convection is thought to be present. Although MRI turbulence can act as a heat source for convection, it is not clear how the instabilities will interact dynamically. Here we aim to investigate the interplay between the two instabilities in controlled numerical experiments, and thus isolate the generic features of their interaction. We perform vertically stratified, 3D MHD shearing box simulations with a perfect gas equation of state with the conservative, finite-volume code PLUTO. We find two characteristic outcomes of the interaction between the two instabilities: straight MRI and MRI/convective cycles, with the latter exhibiting alternating phases of convection-dominated flow (during which the turbulent transport is weak) and MRI-dominated flow. During the latter phase we find that $α$ is enhanced by nearly an order of magnitude, reaching peak values of $\sim 0.08$. In addition, we find that convection in the non-linear phase takes the form of large-scale and oscillatory convective cells. Convection can also help the MRI persist to lower Rm than it would otherwise do. Finally we discuss how our results help interpret simulations of dwarf novae.

Submitted 6 April, 2021; originally announced April 2021.

Comments: Accepted for publication in MNRAS (23 pages, 14 figures, 1 table)

arXiv:2102.13443 [pdf, other]

doi 10.1002/jrsm.1538

Reverse-Bayes methods for evidence assessment and research synthesis

Authors: Leonhard Held, Robert Matthews, Manuela Ott, Samuel Pawel

Abstract: It is now widely accepted that the standard inferential toolkit used by the scientific research community -- null-hypothesis significance testing (NHST) -- is not fit for purpose. Yet despite the threat posed to the scientific enterprise, there is no agreement concerning alternative approaches for evidence assessment. This lack of consensus reflects long-standing issues concerning Bayesian methods, the principal alternative to NHST. We report on recent work that builds on an approach to inference put forward over 70 years ago to address the well-known "Problem of Priors" in Bayesian analysis, by reversing the conventional prior-likelihood-posterior ("forward") use of Bayes's Theorem. Such Reverse-Bayes analysis allows priors to be deduced from the likelihood by requiring that the posterior achieve a specified level of credibility. We summarise the technical underpinning of this approach, and show how it opens up new approaches to common inferential challenges, such as assessing the credibility of scientific findings, setting them in appropriate context, estimating the probability of successful replications, and extracting more insight from NHST while reducing the risk of misinterpretation. We argue that Reverse-Bayes methods have a key role to play in making Bayesian methods more accessible and attractive for evidence assessment and research synthesis. As a running example we consider a recently published meta-analysis from several randomized controlled clinical trials investigating the association between corticosteroids and mortality in hospitalized patients with COVID-19.

Submitted 14 July, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

Comments: revised version of original manuscript "Reverse-Bayes methods: a review of recent technical advances"

arXiv:2009.07782 [pdf, other]

doi 10.1214/21-AOAS1502

The assessment of replication success based on relative effect size

Authors: Leonhard Held, Charlotte Micheloud, Samuel Pawel

Abstract: Replication studies are increasingly conducted in order to confirm original findings. However, there is no established standard for how to assess replication success and in practice many different approaches are used. The purpose of this paper is to refine and extend a recently proposed reverse-Bayes approach for the analysis of replication studies. We show how this method is directly related to the relative effect size, the ratio of the replication to the original effect estimate. This perspective leads to a new proposal to recalibrate the assessment of replication success, the golden level. The recalibration ensures that for borderline significant original studies replication success can only be achieved if the replication effect estimate is larger than the original one. Conditional power for replication success can then take any desired value if the original study is significant and the replication sample size is large enough. Compared to the standard approach of requiring statistical significance of both the original and replication study, replication success at the golden level offers uniform gains in project power and controls the Type-I error rate if the replication sample size is not smaller than the original one. An application to data from four large replication projects shows that the new approach leads to more appropriate inferences, as it penalizes shrinkage of the replication estimate compared to the original one, while ensuring that both effect estimates are sufficiently convincing on their own.

Submitted 8 April, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

Comments: revision, 16 pages, 7 figures, 2 tables

Journal ref: The Annals of Applied Statistics 2022, Vol. 16, No. 2, 706-720

arXiv:2009.01520 [pdf, other]

doi 10.1111/rssb.12491

The sceptical Bayes factor for the assessment of replication success

Authors: Samuel Pawel, Leonhard Held

Abstract: Replication studies are increasingly conducted but there is no established statistical criterion for replication success. We propose a novel approach combining reverse-Bayes analysis with Bayesian hypothesis testing: a sceptical prior is determined for the effect size such that the original finding is no longer convincing in terms of a Bayes factor. This prior is then contrasted to an advocacy prior (the reference posterior of the effect size based on the original study), and replication success is declared if the replication data favour the advocacy over the sceptical prior at a higher level than the original data favoured the sceptical prior over the null hypothesis. The sceptical Bayes factor is the highest level where replication success can be declared. A comparison to existing methods reveals that the sceptical Bayes factor combines several notions of replicability: it ensures that both studies show sufficient evidence against the null and penalises incompatibility of their effect estimates. Analysis of asymptotic properties and error rates, as well as case studies from the Social Sciences Replication Project show the advantages of the method for the assessment of replicability.

Submitted 23 August, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

arXiv:2004.10814 [pdf, other]

doi 10.1214/21-STS828

Power Calculations for Replication Studies

Authors: Charlotte Micheloud, Leonhard Held

Abstract: The reproducibility crisis has led to an increasing number of replication studies being conducted. Sample sizes for replication studies are often calculated using conditional power based on the effect estimate from the original study. However, this approach is not well suited as it ignores the uncertainty of the original result. Bayesian methods are used in clinical trials to incorporate prior information into power calculations. We propose to adapt this methodology to the replication framework and promote the use of predictive instead of conditional power in the design of replication studies. Moreover, we describe how extensions of the methodology to sequential clinical trials can be tailored to replication studies. Conditional and predictive power calculated at an interim analysis are compared and we argue that predictive power is a useful tool to decide whether to stop a replication study prematurely. A recent project on the replicability of social sciences is used to illustrate the properties of the different methods.

Submitted 21 May, 2021; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: 28 pages, 4 figures (+ 1 in Appendix), 3 tables (+2 in Appendix)

Journal ref: Statistical Science. 2022;37:369 -- 379

arXiv:2003.05885 [pdf, other]

A marginal moment matching approach for fitting endemic-epidemic models to underreported disease surveillance counts

Authors: Johannes Bracher, Leonhard Held

Abstract: Count data are often subject to underreporting, especially in infectious disease surveillance. We propose an approximate maximum likelihood method to fit count time series models from the endemic-epidemic class to underreported data. The approach is based on marginal moment matching where underreported processes are approximated through completely observed processes from the same class. Moreover, the form of the bias when underreporting is ignored or taken into account via multiplication factors is analysed. Notably, we show that this leads to a downward bias in model-based estimates of the effective reproductive number. A marginal moment matching approach can also be used to account for reporting intervals which are longer than the mean serial interval of a disease. The good performance of the proposed methodology is demonstrated in simulation studies. An extension to time-varying parameters and reporting probabilities is discussed and applied in a case study on weekly rotavirus gastroenteritis counts in Berlin, Germany.

Submitted 24 August, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

arXiv:1911.10633 [pdf, other]

doi 10.1111/rssc.12410

The harmonic mean $χ^2$ test to substantiate scientific findings

Authors: Leonhard Held

Abstract: Statistical methodology plays a crucial role in drug regulation. Decisions by the FDA or EMA are typically made based on multiple primary studies testing the same medical product, where the two-trials rule is the standard requirement, despite a number of shortcomings. A new approach is proposed for this task based on the (weighted) harmonic mean of the squared study-specific test statistics. Appropriate scaling ensures that, for any number of independent studies, the null distribution is a $χ^2$-distribution with one degree of freedom. This gives rise to a new method for combining one-sided $p$-values and calculating confidence intervals for the overall treatment effect. Further properties are discussed and a comparison with the two-trials rule is made, as well as with alternative research synthesis methods. An attractive feature of the new approach is that a claim of success requires each study to be convincing on its own to a certain degree depending on the overall significance level and the number of studies. A real example with 5 clinical trials investigating the effect of Carvedilol for the treatment of patients with moderate to severe heart failure is used to illustrate the methodology.

Submitted 6 March, 2020; v1 submitted 24 November, 2019; originally announced November 2019.

Comments: Final version, to be published in JRSSC

Journal ref: Journal of the Royal Statistical Society, Series C, 69, 697-708 2020
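
The statistic described in the abstract above can be sketched for the unweighted case. This is an illustrative sketch, not the paper's implementation: the function names are hypothetical, and the scaling n^2 / sum(1 / z_i^2) is assumed here as one scaling that is consistent with the stated $χ^2$ null distribution with one degree of freedom for independent standard normal test statistics. The weighted version and the paper's construction of one-sided $p$-values are not reproduced.

```python
import math


def harmonic_mean_chisq(z_values):
    """Scaled harmonic mean of squared z-statistics (unweighted sketch).

    Assumed form: n^2 / sum(1 / z_i^2), where n is the number of studies.
    """
    n = len(z_values)
    if n == 0:
        raise ValueError("need at least one study")
    if any(z == 0 for z in z_values):
        # A single zero statistic drives the harmonic mean to zero.
        return 0.0
    return n ** 2 / sum(1.0 / z ** 2 for z in z_values)


def chisq1_upper_tail(x):
    """P(X >= x) for X chi-squared with one degree of freedom.

    Uses the identity P(Z^2 >= x) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical z-statistics from two independent studies.
z = [2.0, 2.0]
stat = harmonic_mean_chisq(z)
tail = chisq1_upper_tail(stat)
print(f"statistic = {stat:.3f}, chi-squared(1) upper tail = {tail:.6f}")
```

Because the harmonic mean is dominated by its smallest term, one weak study pulls the statistic down sharply, which matches the abstract's remark that each study must be convincing on its own to a certain degree.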

arXiv:1901.03090 [pdf, other]

Endemic-epidemic models with discrete-time serial interval distributions for infectious disease prediction

Authors: Johannes Bracher, Leonhard Held

Abstract: Multivariate count time series models are an important tool for the analysis and prediction of infectious disease spread. We consider the endemic-epidemic framework, an autoregressive model class for infectious disease surveillance counts, and replace the default autoregression on counts from the previous time period with more flexible weighting schemes inspired by discrete-time serial interval distributions. We employ three different parametric formulations, each with an additional unknown weighting parameter estimated via a profile likelihood approach, and compare them to an unrestricted nonparametric approach. The new methods are illustrated in a univariate analysis of dengue fever incidence in San Juan, Puerto Rico, and a spatio-temporal study of viral gastroenteritis in the twelve districts of Berlin. We assess the predictive performance of the suggested models and several reference models at various forecast horizons. In both applications, the performance of the endemic-epidemic models is considerably improved by the proposed weighting schemes.

Submitted 13 March, 2020; v1 submitted 10 January, 2019; originally announced January 2019.

Comments: A previous version of this paper had the title "Multivariate endemic-epidemic models with higher-order lags and an application to outbreak detection" (see v1)

arXiv:1811.10287 [pdf, other]

doi 10.1111/rssa.12493

A New Standard for the Analysis and Design of Replication Studies

Authors: Leonhard Held

Abstract: A new standard is proposed for the evidential assessment of replication studies. The approach combines a specific reverse-Bayes technique with prior-predictive tail probabilities to define replication success. The method gives rise to a quantitative measure for replication success, called the sceptical p-value. The sceptical p-value integrates traditional significance of both the original and replication study with a comparison of the respective effect sizes. It incorporates the uncertainty of both the original and replication effect estimates and reduces to the ordinary p-value of the replication study if the uncertainty of the original effect estimate is ignored. The proposed framework can also be used to determine the power or the required replication sample size to achieve replication success. Numerical calculations highlight the difficulty of achieving replication success if the evidence from the original study is only suggestive. An application to data from the Open Science Collaboration project on the replicability of psychological science illustrates the proposed methodology.

Submitted 11 May, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: Revised Manuscript for RSS Discussion Meeting

Journal ref: Journal of the Royal Statistical Society, Series A, 183, 431-448, 2020

arXiv:1809.03735 [pdf, other]

doi 10.1201/9781315222912-25

Forecasting Based on Surveillance Data

Authors: Leonhard Held, Sebastian Meyer

Abstract: Forecasting the future course of epidemics has always been one of the main goals of epidemic modelling. This chapter reviews statistical methods to quantify the accuracy of epidemic forecasts. We distinguish point and probabilistic forecasts and describe different methods to evaluate and compare the predictive performance across models. Two case studies demonstrate how to apply the different techniques to uni- and multivariate forecasts. We focus on forecasting count time series from routine public health surveillance: weekly counts of influenza-like illness in Switzerland, and age-stratified counts of norovirus gastroenteritis in Berlin, Germany. Data and code for all analyses are available in a supplementary R package.

Submitted 11 September, 2018; originally announced September 2018.

Comments: This is an author-created preprint of a book chapter to appear in the Handbook of Infectious Disease Data Analysis edited by Leonhard Held, Niel Hens, Philip D O'Neill and Jacco Wallinga, Chapman and Hall/CRC, 2019. 19 pages, including 9 figures and 4 tables; supplementary R package 'HIDDA.forecasting' available https://HIDDA.github.io/forecasting/

Journal ref: Handbook of Infectious Disease Data Analysis; Chapman & Hall/CRC, 2019; Chapter 25

arXiv:1808.00267 [pdf, other]

doi 10.1093/mnras/sty2097

Hydrodynamic convection in accretion discs

Authors: Loren E. Held, Henrik N. Latter

Abstract: The prevalence and consequences of convection perpendicular to the plane of accretion discs have been discussed for several decades. Recent simulations combining convection and the magnetorotational instability have given fresh impetus to the debate, as the interplay of the two processes can enhance angular momentum transport, at least in the optically thick outburst stage of dwarf novae. In this paper we seek to isolate and understand the most generic features of disc convection, and so undertake its study in purely hydrodynamical models. First, we investigate the linear phase of the instability, obtaining estimates of the growth rates both semi-analytically, using one-dimensional spectral computations, as well as analytically, using WKBJ methods. Next we perform three-dimensional, vertically stratified, shearing box simulations with the conservative, finite-volume code PLUTO, both with and without explicit diffusion coefficients. We find that hydrodynamic convection can, in general, drive outward angular momentum transport, a result that we confirm with ATHENA, an alternative finite-volume code. Moreover, we establish that the sign of the angular momentum flux is sensitive to the diffusivity of the numerical scheme. Finally, in sustained convection, whereby the system is continuously forced to an unstable state, we observe the formation of various coherent structures, including large-scale and oscillatory convective cells, zonal flows, and small vortices.

Submitted 1 August, 2018; originally announced August 2018.

Comments: Accepted for publication in MNRAS (20 pages, 16 figures, 4 tables)

arXiv:1803.10052 [pdf, other]

doi 10.1098/rsos.181534

The Assessment of Intrinsic Credibility and a New Argument for p<0.005

Authors: Leonhard Held

Abstract: The concept of intrinsic credibility has been recently introduced to check the credibility of "out of the blue" findings without any prior support. A significant result is deemed intrinsically credible if it is in conflict with a sceptical prior derived from the very same data that would make the effect non-significant. In this paper I propose to use Bayesian prior-predictive tail probabilities to assess intrinsic credibility. For the standard 5% significance level, this leads to a new p-value threshold that is remarkably close to the recently proposed p<0.005 standard. I also introduce the credibility ratio, the ratio of the upper to the lower limit of a standard confidence interval for the corresponding effect size. I show that the credibility ratio has to be smaller than 5.8 such that a significant finding is also intrinsically credible. Finally, a p-value for intrinsic credibility is proposed that is a simple function of the ordinary p-value and has a direct frequentist interpretation in terms of the probability of replicating an effect.

Submitted 11 September, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

Comments: arXiv admin note: text overlap with arXiv:1712.03032

Journal ref: Royal Society Open Science, 6, 2019
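
The credibility-ratio check described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the interval is taken to be a standard 95% confidence interval, and both limits are assumed positive (a significant positive effect). The 5.8 cut-off is the value quoted in the abstract.

```python
def credibility_ratio(lower, upper):
    """Ratio of the upper to the lower confidence limit.

    Assumes both limits are positive, i.e. a significant positive effect.
    """
    if not (0 < lower <= upper):
        raise ValueError("expected 0 < lower <= upper")
    return upper / lower


def is_intrinsically_credible(lower, upper, threshold=5.8):
    """Apply the ratio < 5.8 rule quoted in the abstract (5% level assumed)."""
    return credibility_ratio(lower, upper) < threshold


# Hypothetical confidence limits for two effect estimates.
print(is_intrinsically_credible(0.5, 2.0))   # ratio 4.0
print(is_intrinsically_credible(0.25, 2.0))  # ratio 8.0
```

The check needs only the reported interval, so it can be applied to published results without access to the underlying data.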

arXiv:1712.03032 [pdf, other]

p-Values for Credibility

Authors: Leonhard Held

Abstract: Analysis of credibility is a reverse-Bayes technique that has been proposed by Matthews (2001) to overcome some of the shortcomings of significance tests. A significant result is deemed credible if current knowledge about the effect size is in conflict with any sceptical prior that would make the effect non-significant. In this paper I formalize the approach and propose to use Bayesian predictive tail probabilities to quantify the evidence for credibility. This gives rise to a p-value for extrinsic credibility, taking into account both the internal and the external evidence for an effect. The assessment of intrinsic credibility leads to a new threshold for ordinary significance that is remarkably close to the recently proposed 0.005 level. Finally, a p-value for intrinsic credibility is proposed that is a simple function of the ordinary p-value for significance and has a direct frequentist interpretation in terms of the replication probability that a future study under identical conditions will give an estimated effect in the same direction as the first study.

Submitted 8 December, 2017; originally announced December 2017.

Comments: 21 pages, 6 figures

arXiv:1708.08239 [pdf, other]

Power Priors Based on Multiple Historical Studies for Binary Outcomes

Authors: Isaac Gravestock, Leonhard Held

Abstract: Incorporating historical information into the design and analysis of a new clinical trial has been the subject of much recent discussion. For example, in the context of clinical trials of antibiotics for drug resistant infections, where patients with specific infections can be difficult to recruit, there is often only limited and heterogeneous information available from the historical trials. To make the best use of the combined information at hand, we consider an approach based on the multiple power prior which allows the prior weight of each historical study to be chosen adaptively by empirical Bayes. This choice of weight has advantages in that it varies commensurably with differences in the historical and current data and can choose weights near 1 if the data from the corresponding historical study are similar enough to the data from the current study. Fully Bayesian approaches are also considered. The methods are applied to data from antibiotics trials. An analysis of the operating characteristics in a binomial setting shows that the proposed empirical Bayes adaptive method works well, compared to several alternative approaches, including the meta-analytic prior.

Submitted 7 June, 2018; v1 submitted 28 August, 2017; originally announced August 2017.

arXiv:1708.03272 [pdf, other]

doi 10.1002/sta4.163

Fast and accurate Bayesian model criticism and conflict diagnostics using R-INLA

Authors: Egil Ferkingstad, Leonhard Held, Håvard Rue

Abstract: Bayesian hierarchical models are increasingly popular for realistic modelling and analysis of complex data. This trend is accompanied by the need for flexible, general, and computationally efficient methods for model criticism and conflict detection. Usually, a Bayesian hierarchical model incorporates a grouping of the individual data points, for example individuals in repeated measurement data. In such cases, the following question arises: Are any of the groups "outliers", or in conflict with the remaining groups? Existing general approaches aiming to answer such questions tend to be extremely computationally demanding when model fitting is based on MCMC. We show how group-level model criticism and conflict detection can be done quickly and accurately through integrated nested Laplace approximations (INLA). The new method is implemented as a part of the open source R-INLA package for Bayesian computing (http://r-inla.org).

Submitted 1 November, 2017; v1 submitted 10 August, 2017; originally announced August 2017.

Journal ref: Stat 6(1):331-344, 2017

arXiv:1707.04635 [pdf, other]

Periodically stationary multivariate autoregressive models

Authors: Johannes Bracher, Leonhard Held

Abstract: A class of multivariate periodic autoregressive models is proposed where coupling between time series is achieved through linear mean functions. Various response distributions with quadratic mean-variance relationships fit into the framework, including the negative binomial, gamma and Gaussian distributions. We develop an iterative algorithm to obtain unconditional means, variances and auto-/cross-covariances for models with higher order lags. Analytical solutions are given for the univariate model with lag one and multivariate models with linear mean-variance relationship. A special case of the model class is an established framework for modelling multivariate time series of counts from routine surveillance of infectious diseases. We extend this model class to allow for distributed lags and apply it to a dataset on norovirus gastroenteritis in two German states. The availability of unconditional moments and auto/cross-correlations enhances model assessment and interpretation.

Submitted 15 December, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

arXiv:1608.05292 [pdf, ps, other]

Efficient real-time monitoring of an emerging influenza epidemic: how feasible?

Authors: Paul J Birrell, Lorenz Wernisch, Brian D M Tom, Leonhard Held, Gareth O Roberts, Richard G Pebody, Daniela De Angelis

Abstract: A prompt public health response to a new epidemic relies on the ability to monitor and predict its evolution in real time as data accumulate. The 2009 A/H1N1 outbreak in the UK revealed pandemic data as noisy, contaminated, potentially biased, and originating from multiple sources. This seriously challenges the capacity for real-time monitoring. Here we assess the feasibility of real-time inference based on such data by constructing an analytic tool combining an age-stratified SEIR transmission model with various observation models describing the data generation mechanisms. As batches of data become available, a sequential Monte Carlo (SMC) algorithm is developed to synthesise multiple imperfect data streams, iterate epidemic inferences and assess model adequacy amidst a rapidly evolving epidemic environment, substantially reducing computation time in comparison to standard MCMC, to ensure timely delivery of real-time epidemic assessments. In application to simulated data designed to mimic the 2009 A/H1N1 epidemic, SMC is shown to have additional benefits in terms of assessing predictive performance and coping with parameter non-identifiability.

Submitted 3 May, 2019; v1 submitted 18 August, 2016; originally announced August 2016.

Comments: 30 pages, 8 figures

arXiv:1512.09052 [pdf, other]

doi 10.1016/j.sste.2016.03.002

Model-based testing for space-time interaction using point processes: An application to psychiatric hospital admissions in an urban area

Authors: Sebastian Meyer, Ingeborg Warnke, Wulf Rössler, Leonhard Held

Abstract: Spatio-temporal interaction is inherent to cases of infectious diseases and occurrences of earthquakes, whereas the spread of other events, such as cancer or crime, is less evident. Statistical significance tests of space-time clustering usually assess the correlation between the spatial and temporal (transformed) distances of the events. Although appealing through simplicity, these classical tests do not adjust for the underlying population nor can they account for a distance decay of interaction. We propose to use the framework of an endemic-epidemic point process model to jointly estimate a background event rate explained by seasonal and areal characteristics, as well as a superposed epidemic component representing the hypothesis of interest. We illustrate this new model-based test for space-time interaction by analysing psychiatric inpatient admissions in Zurich, Switzerland (2007-2012). Several socio-economic factors were found to be associated with the admission rate, but there was no evidence of general clustering of the cases.

Submitted 2 May, 2016; v1 submitted 30 December, 2015; originally announced December 2015.

Comments: 21 pages including 4 figures and 5 tables; methods are implemented in the R package surveillance (https://CRAN.R-project.org/package=surveillance)

Journal ref: Spatial and Spatio-temporal Epidemiology 17, 15-25 (2016)

arXiv:1512.01065 [pdf, other]

doi 10.1093/biostatistics/kxw051

Incorporating social contact data in spatio-temporal models for infectious disease spread

Authors: Sebastian Meyer, Leonhard Held

Abstract: Routine public health surveillance of notifiable infectious diseases gives rise to weekly counts of reported cases -- possibly stratified by region and/or age group. We investigate how an age-structured social contact matrix can be incorporated into a spatio-temporal endemic-epidemic model for infectious disease counts. To illustrate the approach, we analyze the spread of norovirus gastroenteritis over 6 age groups within the 12 districts of Berlin, 2011-2015, using contact data from the POLYMOD study. The proposed age-structured model outperforms alternative scenarios with homogeneous or no mixing between age groups. An extended contact model suggests a power transformation of the survey-based contact matrix towards more within-group transmission.

Submitted 17 November, 2016; v1 submitted 3 December, 2015; originally announced December 2015.

Comments: accepted manuscript; 14 pages, including 4 figures and 1 table

Journal ref: Biostatistics (2017); 18(2):338-351

arXiv:1411.0416 [pdf, other]

doi 10.18637/jss.v077.i11

Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance

Authors: Sebastian Meyer, Leonhard Held, Michael Höhle

Abstract: The availability of geocoded health data and the inherent temporal structure of communicable diseases have led to an increased interest in statistical models and software for spatio-temporal data with epidemic features. The open source R package surveillance can handle various levels of aggregation at which infective events have been recorded: individual-level time-stamped geo-referenced data (case reports) in either continuous space or discrete space, as well as counts aggregated by period and region. For each of these data types, the surveillance package implements tools for visualization, likelihood inference and simulation from recently developed statistical regression frameworks capturing endemic and epidemic dynamics. Altogether, this paper is a guide to the spatio-temporal modeling of epidemic phenomena, exemplified by analyses of public health surveillance data on measles and invasive meningococcal disease.

Submitted 6 November, 2015; v1 submitted 3 November, 2014; originally announced November 2014.

Comments: 53 pages, 20 figures, package homepage: http://surveillance.r-forge.r-project.org/

MSC Class: 62-04 ACM Class: G.3

Journal ref: Journal of Statistical Software (2017); 77 (11): 1-55

arXiv:1312.4797 [pdf, ps, other]

Sensitivity analysis for Bayesian hierarchical models

Authors: Malgorzata Roos, Thiago G. Martins, Leonhard Held, Havard Rue

Abstract: Prior sensitivity examination plays an important role in applied Bayesian analyses. This is especially true for Bayesian hierarchical models, where interpretability of the parameters within deeper layers in the hierarchy becomes challenging. In addition, lack of information together with identifiability issues may imply that the prior distributions for such models have an undesired influence on the posterior inference. Despite its relevance, informal approaches to prior sensitivity analysis are currently used. They require repetitive re-runs of the model with ad-hoc modified base prior parameter values. Other formal approaches to prior sensitivity analysis suffer from a lack of popularity in practice, mainly due to their high computational cost and absence of software implementation. We propose a novel formal approach to prior sensitivity analysis which is fast and accurate. It quantifies sensitivity without the need for a model re-run. We develop a ready-to-use priorSens package in R for routine prior sensitivity investigation by R-INLA. Throughout a series of examples we show how our approach can be used to detect high prior sensitivities of some parameters as well as identifiability issues in possibly over-parametrized Bayesian hierarchical models.

Submitted 17 December, 2013; originally announced December 2013.

Comments: 25 pages, 3 figures

arXiv:1308.6780 [pdf, ps, other]

doi 10.1214/14-STS510

Approximate Bayesian Model Selection with the Deviance Statistic

Authors: Leonhard Held, Daniel Sabanés Bové, Isaac Gravestock

Abstract: Bayesian model selection poses two main challenges: the specification of parameter priors for all models, and the computation of the resulting Bayes factors between models. There is now a large literature on automatic and objective parameter priors in the linear model. One important class are $g$-priors, which were recently extended from linear to generalized linear models (GLMs). We show that the resulting Bayes factors can be approximated by test-based Bayes factors (Johnson [Scand. J. Stat. 35 (2008) 354-368]) using the deviance statistics of the models. To estimate the hyperparameter $g$, we propose empirical and fully Bayes approaches and link the former to minimum Bayes factors and shrinkage estimates from the literature. Furthermore, we describe how to approximate the corresponding posterior distribution of the regression coefficients based on the standard GLM output. We illustrate the approach with the development of a clinical prediction model for 30-day survival in the GUSTO-I trial using logistic regression.

Submitted 18 August, 2015; v1 submitted 30 August, 2013; originally announced August 2013.

Comments: Published at http://dx.doi.org/10.1214/14-STS510 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS510

Journal ref: Statistical Science 2015, Vol. 30, No. 2, 242-257

arXiv:1308.5115 [pdf, ps, other]

doi 10.1214/14-AOAS743

Power-law models for infectious disease spread

Authors: Sebastian Meyer, Leonhard Held

Abstract: Short-time human travel behaviour can be described by a power law with respect to distance. We incorporate this information in space-time models for infectious disease surveillance data to better capture the dynamics of disease spread. Two previously established model classes are extended, which both decompose disease risk additively into endemic and epidemic components: a spatio-temporal point process model for individual-level data and a multivariate time-series model for aggregated count data. In both frameworks, a power-law decay of spatial interaction is embedded into the epidemic component and estimated jointly with all other unknown parameters using (penalised) likelihood inference. Whereas the power law can be based on Euclidean distance in the point process model, a novel formulation is proposed for count data where the power law depends on the order of the neighbourhood of discrete spatial units. The performance of the new approach is investigated by a reanalysis of individual cases of invasive meningococcal disease in Germany (2002-2008) and count data on influenza in 140 administrative districts of Southern Germany (2001-2008). In both applications, the power law substantially improves model fit and predictions, and is reasonably close to alternative qualitative formulations, where distance and order of neighbourhood, respectively, are treated as a factor. Implementation in the R package surveillance allows the approach to be applied in other settings.

Submitted 24 November, 2014; v1 submitted 23 August, 2013; originally announced August 2013.

Comments: Published at http://dx.doi.org/10.1214/14-AOAS743 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS743

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 3, 1612-1639

arXiv:1302.3065 [pdf, ps, other]

Bayesian analysis of measurement error models using INLA

Authors: Stefanie Muff, Andrea Riebler, Havard Rue, Philippe Saner, Leonhard Held

Abstract: To account for measurement error (ME) in explanatory variables, Bayesian approaches provide a flexible framework, as expert knowledge about unobserved covariates can be incorporated in the prior distributions. However, given the analytic intractability of the posterior distribution, model inference so far has to be performed via time-consuming and complex Markov chain Monte Carlo implementations. In this paper we extend the Integrated nested Laplace approximations (INLA) approach to formulate Gaussian ME models in generalized linear mixed models. We present three applications, and show how parameter estimates are obtained for common ME models, such as the classical and Berkson error model including heteroscedastic variances. To illustrate the practical feasibility, R-code is provided.

Submitted 16 August, 2013; v1 submitted 13 February, 2013; originally announced February 2013.

Comments: 37 pages, 10 figures

arXiv:1108.3520 [pdf, ps, other]

Mixtures of g-Priors for Generalised Additive Model Selection with Penalised Splines

Authors: Daniel Sabanés Bové, Leonhard Held, Göran Kauermann

Abstract: We propose an objective Bayesian approach to the selection of covariates and their penalised splines transformations in generalised additive models. Specification of a reasonable default prior for the model parameters and combination with a multiplicity-correction prior for the models themselves is crucial for this task. Here we use well-studied and well-behaved continuous mixtures of g-priors as default priors. We introduce the methodology in the normal model and extend it to non-normal exponential families. A simulation study and an application from the literature illustrate the proposed approach. An efficient implementation is available in the R-package "hypergsplines".

Submitted 20 August, 2012; v1 submitted 17 August, 2011; originally announced August 2011.

Comments: 34 pages, 2 figures, 5 tables

arXiv:1108.0606 [pdf, ps, other]

doi 10.1214/11-AOAS498

Estimation and extrapolation of time trends in registry data---Borrowing strength from related populations

Authors: Andrea Riebler, Leonhard Held, Håvard Rue

Abstract: To analyze and project age-specific mortality or morbidity rates, age-period-cohort (APC) models are very popular. Bayesian approaches facilitate estimation and improve predictions by assigning smoothing priors to age, period and cohort effects. Adjustments for overdispersion are straightforward using additional random effects. When rates are further stratified, for example, by countries, multivariate APC models can be used, where differences of stratum-specific effects are interpretable as log relative risks. Here, we incorporate correlated stratum-specific smoothing priors and correlated overdispersion parameters into the multivariate APC model, and use Markov chain Monte Carlo and integrated nested Laplace approximations for inference. Compared to a model without correlation, the new approach may lead to more precise relative risk estimates, as shown in an application to chronic obstructive pulmonary disease mortality in three regions of England and Wales. Furthermore, the imputation of missing data for one particular stratum may be improved, since the new approach takes advantage of the remaining strata if the corresponding observations are available there. This is shown in an application to female mortality in Denmark, Sweden and Norway from the 20th century, where we treat for each country in turn either the first or second half of the observations as missing and then impute the omitted data. The projections are compared to those obtained from a univariate APC model and an extended Lee-Carter demographic forecasting approach using the proper Dawid-Sebastiani scoring rule.

Submitted 20 March, 2012; v1 submitted 2 August, 2011; originally announced August 2011.

Comments: Published at http://dx.doi.org/10.1214/11-AOAS498 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS498

Journal ref: Annals of Applied Statistics 2012, Vol. 6, No. 1, 304-333

arXiv:1008.1550 [pdf, other]

doi 10.1214/11-BA615

Hyper-g Priors for Generalized Linear Models

Authors: Daniel Sabanés Bové, Leonhard Held

Abstract: We develop an extension of the classical Zellner's g-prior to generalized linear models. The prior on the hyperparameter g is handled in a flexible way, so that any continuous proper hyperprior f(g) can be used, giving rise to a large class of hyper-g priors. Connections with the literature are described in detail. A fast and accurate integrated Laplace approximation of the marginal likelihood makes inference in large model spaces feasible. For posterior parameter estimation we propose an efficient and tuning-free Metropolis-Hastings sampler. The methodology is illustrated with variable selection and automatic covariate transformation in the Pima Indians diabetes data set.

Submitted 9 August, 2010; originally announced August 2010.

Comments: 30 pages, 12 figures, poster contribution at ISBA 2010

Journal ref: Published in Bayesian Analysis (2011) volume 6, number 3, pages 387-410

Showing 1–47 of 47 results for author: Held, L