Search | arXiv e-print repository

arXiv:2406.14182 [pdf, other]

Averaging polyhazard models using Piecewise deterministic Monte Carlo with applications to data with long-term survivors

Authors: Luke Hardcastle, Samuel Livingstone, Gianluca Baio

Abstract: Polyhazard models are a class of flexible parametric models for modelling survival over extended time horizons. Their additive hazard structure allows for flexible, non-proportional hazards whose characteristics can change over time while retaining a parametric form, which allows for survival to be extrapolated beyond the observation period of a study. Significant user input is required, however, in selecting the number of latent hazards to model, their distributions and the choice of which variables to associate with each hazard. The resulting set of models is too large to explore manually, limiting their practical usefulness. Motivated by applications to stroke survivor and kidney transplant patient survival times we extend the standard polyhazard model through a prior structure allowing for joint inference of parameters and structural quantities, and develop a sampling scheme that utilises state-of-the-art Piecewise Deterministic Markov Processes to sample from the resulting transdimensional posterior with minimal user tuning.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 22 pages, 9 figures
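
The additive hazard structure mentioned in the abstract above (arXiv:2406.14182) can be illustrated with a minimal sketch. It assumes two Weibull components with made-up shape and scale values; the function names and parameters are hypothetical and are not taken from the paper. Because the hazards add, the cumulative hazards add and the overall survival is the product of the component survival functions.

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull distribution: (shape/scale) * (t/scale)^(shape-1)."""
    t = np.asarray(t, dtype=float)
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_cum_hazard(t, shape, scale):
    """Cumulative hazard of a Weibull distribution: (t/scale)^shape."""
    return (np.asarray(t, dtype=float) / scale) ** shape

def polyhazard_hazard(t, components):
    """Overall hazard: the sum of the latent component hazards."""
    return sum(weibull_hazard(t, k, lam) for k, lam in components)

def polyhazard_survival(t, components):
    """Overall survival: exp(-sum of cumulative hazards)."""
    return np.exp(-sum(weibull_cum_hazard(t, k, lam) for k, lam in components))

# Illustrative (assumed) components: one decreasing and one increasing hazard,
# which together give a bathtub-shaped overall hazard.
components = [(0.5, 2.0), (3.0, 10.0)]
times = np.array([0.1, 3.0, 15.0])
print(polyhazard_hazard(times, components))
print(polyhazard_survival(times, components))
```

The sketch only evaluates a fixed model; choosing the number of components, their distributions and their covariates is the model-averaging problem the paper addresses.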

arXiv:2401.13820 [pdf, other]

A Bayesian hierarchical mixture cure modelling framework to utilize multiple survival datasets for long-term survivorship estimates: A case study from previously untreated metastatic melanoma

Authors: Nathan Green, Murat Kurt, Andriy Moshyk, James Larkin, Gianluca Baio

Abstract: Time to an event of interest over a lifetime is a central measure of the clinical benefit of an intervention used in a health technology assessment (HTA). Within the same trial multiple end-points may also be considered, for example overall and progression-free survival time for different drugs in oncology studies. A common challenge is when an intervention is only effective for some proportion of the population who are not clinically identifiable. Therefore, latent group membership as well as separate survival models for the groups identified need to be estimated. However, follow-up in trials may be relatively short, leading to substantial censoring. We present a general Bayesian hierarchical framework that can handle this complexity by exploiting the similarity of cure fractions between end-points, accounting for the correlation between them and improving the extrapolation beyond the observed data. Assuming exchangeability between cure fractions facilitates the borrowing of information between end-points. We show the benefits of using our approach with a motivating example, the CheckMate 067 phase 3 trial consisting of patients with metastatic melanoma treated with first line therapy.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2306.15525 [pdf, other]

Bayesian Interrupted Time Series for evaluating policy change on mental well-being: an application to England's welfare reform

Authors: Connor Gascoigne, Marta Blangiardo, Zejing Shao, Annie Jeffery, Sara Geneletti, James Kirkbride, Gianluca Baio

Abstract: Factors contributing to social inequalities are also associated with negative mental health outcomes leading to disparities in mental well-being. We propose a Bayesian hierarchical model which can evaluate the impact of policies on population well-being, accounting for spatial/temporal dependencies. Building on an interrupted time series framework, our approach can evaluate how different profiles of individuals are affected in different ways, whilst accounting for their uncertainty. We apply the framework to assess the impact of the United Kingdom's welfare reform, which took place throughout the 2010s, on mental well-being using data from the UK Household Longitudinal Study. The additional depth of knowledge is essential for effective evaluation of current policy and implementation of future policy.

Submitted 27 June, 2023; originally announced June 2023.

Comments: 13 pages, 5 figures, 2 tables

arXiv:2305.08651 [pdf, ps, other]

Methodological considerations for novel approaches to covariate-adjusted indirect treatment comparisons

Authors: Antonio Remiro-Azócar, Anna Heath, Gianluca Baio

Abstract: We examine four important considerations in the development of covariate adjustment methodologies for indirect treatment comparisons. Firstly, we consider potential advantages of weighting versus outcome modeling, placing focus on bias-robustness. Secondly, we outline why model-based extrapolation may be required and useful, in the specific context of indirect treatment comparisons with limited overlap. Thirdly, we describe challenges for covariate adjustment based on data-adaptive outcome modeling. Finally, we offer further perspectives on the promise of doubly-robust covariate adjustment frameworks.

Submitted 26 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 8 pages, discussion paper, counter-response to Vo (2023) accepted by Research Synthesis Methods

arXiv:2305.08284 [pdf, other]

Model-based standardization using multiple imputation

Authors: Antonio Remiro-Azócar, Anna Heath, Gianluca Baio

Abstract: When studying the association between treatment and a clinical outcome, a parametric multivariable model of the conditional outcome expectation is often used to adjust for covariates. The treatment coefficient of the outcome model targets a conditional treatment effect. Model-based standardization is typically applied to average the model predictions over the target covariate distribution, and generate a covariate-adjusted estimate of the marginal treatment effect. The standard approach to model-based standardization involves maximum-likelihood estimation and use of the non-parametric bootstrap. We introduce a novel, general-purpose, model-based standardization method based on multiple imputation that is easily applicable when the outcome model is a generalized linear model. We term our proposed approach multiple imputation marginalization (MIM). MIM consists of two main stages: the generation of synthetic datasets and their analysis. MIM accommodates a Bayesian statistical framework, which naturally allows for the principled propagation of uncertainty, integrates the analysis into a probabilistic framework, and allows for the incorporation of prior evidence. We conduct a simulation study to benchmark the finite-sample performance of MIM in conjunction with a parametric outcome model. The simulations provide proof-of-principle in scenarios with binary outcomes, continuous-valued covariates, a logistic outcome model and the marginal log odds ratio as the target effect measure. When parametric modeling assumptions hold, MIM yields unbiased estimation in the target covariate distribution, valid coverage rates, and similar precision and efficiency to the standard approach to model-based standardization.

Submitted 6 January, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

Comments: 23 pages, 2 figures, 1 table. Revised manuscript re-submitted to BMC Medical Research Methodology. See the ancillary files for Additional file 1

arXiv:2211.11119 [pdf, other]

Counterfactual Learning with Multioutput Deep Kernels

Authors: Alberto Caron, Gianluca Baio, Ioanna Manolopoulou

Abstract: In this paper, we address the challenge of performing counterfactual inference with observational data via Bayesian nonparametric regression adjustment, with a focus on high-dimensional settings featuring multiple actions and multiple correlated outcomes. We present a general class of counterfactual multi-task deep kernel models that estimate causal effects and learn policies proficiently thanks to their sample efficiency gains, while scaling well with high dimensions. In the first part of the work, we rely on Structural Causal Models (SCM) to formally introduce the setup and the problem of identifying counterfactual quantities under observed confounding. We then discuss the benefits of tackling the task of causal effects estimation via stacked coregionalized Gaussian Processes and Deep Kernels. Finally, we demonstrate the use of the proposed methods on simulated experiments that span individual causal effects estimation, off-policy evaluation and optimization.

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2206.14047 [pdf, other]

A Bayesian hierarchical model for improving exercise rehabilitation in mechanically ventilated ICU patients

Authors: Luke Hardcastle, Samuel Livingstone, Claire Black, Federico Ricciardi, Gianluca Baio

Abstract: Patients who are mechanically ventilated in the intensive care unit (ICU) participate in exercise as a component of their rehabilitation to ameliorate the long-term impact of critical illness on their physical function. The effective implementation of these programmes is hindered, however, by the lack of a scientific method for quantifying an individual patient's exercise intensity level in real time, which results in a broad one-size-fits-all approach to rehabilitation and sub-optimal patient outcomes. In this work we have developed a Bayesian hierarchical model with temporally correlated latent Gaussian processes to predict $\dot{V}O_2$, a physiological measure of exercise intensity, using readily available physiological data. Inference was performed using Integrated Nested Laplace Approximation. For practical use by clinicians $\dot{V}O_2$ was classified into exercise intensity categories. Internal validation using leave-one-patient-out cross-validation was conducted based on these classifications, and the role of probabilistic statements describing the classification uncertainty was investigated.

Submitted 28 June, 2022; originally announced June 2022.

Comments: 15 pages + 8 pages of supporting information, 12 figures, 7 tables

arXiv:2206.10261 [pdf, other]

Interpretable Deep Causal Learning for Moderation Effects

Authors: Alberto Caron, Gianluca Baio, Ioanna Manolopoulou

Abstract: In this extended abstract paper, we address the problem of interpretability and targeted regularization in causal machine learning models. In particular, we focus on the problem of estimating individual causal/treatment effects under observed confounders, which can be controlled for and moderate the effect of the treatment on the outcome of interest. Black-box ML models adjusted for the causal setting perform generally well in this task, but they lack interpretable output identifying the main drivers of treatment heterogeneity and their functional relationship. We propose a novel deep counterfactual learning architecture for estimating individual treatment effects that can simultaneously: i) convey targeted regularization on, and quantify uncertainty around, the quantity of interest (i.e., the Conditional Average Treatment Effect); ii) disentangle baseline prognostic and moderating effects of the covariates and output interpretable score functions describing their relationship with the outcome. Finally, we demonstrate the use of the method via a simple simulated experiment.

Submitted 11 July, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.00154 [pdf, other]

Blended Survival Curves: A New Approach to Extrapolation for Time-to-Event Outcomes from Clinical Trial in Health Technology Assessment

Authors: Zhaojing Che, Nathan Green, Gianluca Baio

Abstract: Background: Survival extrapolation is essential in the cost-effectiveness analysis to quantify the lifetime survival benefit associated with a new intervention, due to the restricted duration of randomized controlled trials (RCTs). Current approaches of extrapolation often assume that the treatment effect observed in the trial can continue indefinitely, which is unrealistic and may have a huge impact on decisions for resource allocation. Objective: We introduce a novel methodology as a possible solution to alleviate the problem of performing survival extrapolation with heavily censored data from clinical trials. Method: The main idea is to mix a flexible model (e.g., Cox semi-parametric) to fit as well as possible the observed data and a parametric model encoding assumptions on the expected behaviour of underlying long-term survival. The two are "blended" into a single survival curve that is identical with the Cox model over the range of observed times and gradually approaching the parametric model over the extrapolation period based on a weight function. The weight function regulates the way two survival curves are blended, determining how the internal and external sources contribute to the estimated survival over time. Results: A 4-year follow-up RCT of rituximab in combination with fludarabine and cyclophosphamide v. fludarabine and cyclophosphamide alone for the first-line treatment of chronic lymphocytic leukemia is used to illustrate the method. Conclusion: Long-term extrapolation from immature trial data may lead to significantly different estimates with various modelling assumptions. The blending approach provides sufficient flexibility, allowing a wide range of plausible scenarios to be considered as well as the inclusion of genuine external information, based e.g. on hard data or expert opinion. Both internal and external validity can be carefully examined.

Submitted 31 May, 2022; originally announced June 2022.

Comments: 14 pages, 6 figures, the abstract has been published in Value in Health, Jan 2022 (https://doi.org/10.1016/j.jval.2021.11.049)
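
The blending idea in the abstract above (arXiv:2206.00154) can be sketched roughly as follows. Everything specific here is an assumption for illustration: the geometric weighting of the two curves, the smoothstep weight function, the blending interval, and the two exponential curves standing in for the flexible "observed data" model and the parametric "external" model. The abstract does not state the functional form, so this is one possible construction, not the authors' implementation.

```python
import numpy as np

def blend_weight(t, t_start, t_end):
    """Hypothetical weight function: 0 up to t_start, 1 from t_end, smooth in between."""
    u = np.clip((np.asarray(t, dtype=float) - t_start) / (t_end - t_start), 0.0, 1.0)
    return 3 * u**2 - 2 * u**3  # smoothstep polynomial

def blended_survival(t, surv_obs, surv_ext, t_start, t_end):
    """Weighted geometric mean of the two survival curves (an assumed form)."""
    w = blend_weight(t, t_start, t_end)
    return surv_obs(t) ** (1 - w) * surv_ext(t) ** w

# Placeholder curves (assumptions): a fast-decaying fit to the trial data and a
# slower-decaying curve encoding long-term external beliefs.
surv_obs = lambda t: np.exp(-0.20 * np.asarray(t, dtype=float))
surv_ext = lambda t: np.exp(-0.05 * np.asarray(t, dtype=float))

times = np.array([1.0, 4.0, 7.0, 10.0, 20.0])
print(blended_survival(times, surv_obs, surv_ext, t_start=4.0, t_end=10.0))
```

With these settings the blended curve equals the observed-data curve up to year 4, equals the external curve from year 10, and lies between the two in the blending interval.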

arXiv:2203.09901 [pdf, other]

BCEA: An R Package for Cost-Effectiveness Analysis

Authors: Nathan Green, Anna Heath, Gianluca Baio

Abstract: We describe in detail how to perform health economic cost-effectiveness analyses (CEA) using the R package $\textbf{BCEA}$ (Bayesian Cost-Effectiveness Analysis). CEA consists of analytic approaches for combining costs and health consequences of intervention(s). These help to understand how much an intervention may cost (per unit of health gained) compared to an alternative intervention, such as a control or status quo. For resource allocation, a decision maker may wish to know if an intervention is cost saving, and if not then how much more it would cost to implement compared to a less effective intervention. Current guidance for cost-effectiveness analyses advocates the quantification of uncertainties, which can be represented by random samples obtained from a probabilistic sensitivity analysis or, more efficiently, a Bayesian model. $\textbf{BCEA}$ can be used to post-process the sampled costs and health impacts to perform advanced analyses producing standardised and highly customisable outputs. We present the features of the package, including its many functions and their practical application. $\textbf{BCEA}$ is valuable for statisticians and practitioners working in the field of health economic modelling wanting to simplify and standardise their workflow, for example in the preparation of dossiers in support of marketing authorisation, or academic and scientific publications.

Submitted 18 March, 2022; originally announced March 2022.

arXiv:2201.06458 [pdf, other]

A framework for estimating and visualising excess mortality during the COVID-19 pandemic

Authors: Garyfallos Konstantinoudis, Virgilio Gómez-Rubio, Michela Cameletti, Monica Pirani, Gianluca Baio, Marta Blangiardo

Abstract: COVID-19 related deaths underestimate the pandemic burden on mortality because they suffer from completeness and accuracy issues. Excess mortality is a popular alternative, as it compares observed with expected deaths based on the assumption that the pandemic did not occur. Expected deaths had the pandemic not occurred depend on population trends, temperature, and spatio-temporal patterns. In addition to this, high geographical resolution is required to examine within-country trends and the effectiveness of the different public health policies. In this tutorial, we propose a framework using R to estimate and visualise excess mortality at high geographical resolution. We show a case study estimating excess deaths during 2020 in Italy. The proposed framework is fast to implement and allows combining different models and presenting the results in any age, sex, spatial and temporal aggregation desired. This makes it particularly powerful and appealing for online monitoring of the pandemic burden and timely policy making.

Submitted 6 March, 2023; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: 14 pages, 7 figures

arXiv:2108.12208 [pdf, other]

doi 10.1002/jrsm.1565

Parametric G-computation for Compatible Indirect Treatment Comparisons with Limited Individual Patient Data

Authors: Antonio Remiro-Azócar, Anna Heath, Gianluca Baio

Abstract: Population adjustment methods such as matching-adjusted indirect comparison (MAIC) are increasingly used to compare marginal treatment effects when there are cross-trial differences in effect modifiers and limited patient-level data. MAIC is based on propensity score weighting, which is sensitive to poor covariate overlap and cannot extrapolate beyond the observed covariate space. Current outcome regression-based alternatives can extrapolate but target a conditional treatment effect that is incompatible in the indirect comparison. When adjusting for covariates, one must integrate or average the conditional estimate over the relevant population to recover a compatible marginal treatment effect. We propose a marginalization method based on parametric G-computation that can be easily applied where the outcome regression is a generalized linear model or a Cox model. The approach views the covariate adjustment regression as a nuisance model and separates its estimation from the evaluation of the marginal treatment effect of interest. The method can accommodate a Bayesian statistical framework, which naturally integrates the analysis into a probabilistic framework. A simulation study provides proof-of-principle and benchmarks the method's performance against MAIC and the conventional outcome regression. Parametric G-computation achieves more precise and more accurate estimates than MAIC, particularly when covariate overlap is poor, and yields unbiased marginal treatment effect estimates under no failures of assumptions. Furthermore, the marginalized regression-adjusted estimates provide greater precision and accuracy than the conditional estimates produced by the conventional outcome regression, which are systematically biased because the measure of effect is non-collapsible.

Submitted 10 May, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: 31 pages, 4 figures, 1 Table (19 additional pages in the Supplementary Material). This is the journal version of some of the research in the working paper arXiv:2008.05951. Accepted for publication by Research Synthesis Methods

Journal ref: Research Synthesis Methods, 13(6), pp. 716-744, 2022
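
The marginalization step described in the abstract above (arXiv:2108.12208) can be illustrated with a small sketch: fit a logistic outcome regression, predict each target individual's outcome under both treatments, average the predictions, and form the marginal log odds ratio. The simulated data, the coefficient values, the plain Newton-Raphson fitter and all function names are assumptions made for this example; it is a frequentist toy version and not the authors' code.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def marginal_log_odds_ratio(beta, x_target):
    """Average predictions over the target covariates under treatment 1 and 0."""
    n = len(x_target)
    p1 = expit(np.column_stack([np.ones(n), np.ones(n), x_target]) @ beta).mean()
    p0 = expit(np.column_stack([np.ones(n), np.zeros(n), x_target]) @ beta).mean()
    return np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))

# Simulated trial (assumed values): intercept, treatment effect, prognostic covariate.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
trt = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), trt, x])
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0, 1.5])))

beta_hat = fit_logistic(X, y)
x_target = rng.normal(loc=0.5, size=5000)  # assumed target covariate sample
print("conditional log OR:", beta_hat[1])
print("marginal log OR:   ", marginal_log_odds_ratio(beta_hat, x_target))
```

Because the odds ratio is non-collapsible, the marginal log odds ratio is closer to zero than the conditional treatment coefficient even though both come from the same fitted model, which is why the two estimands should not be conflated.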

arXiv:2102.06573 [pdf, other]

Shrinkage Bayesian Causal Forests for Heterogeneous Treatment Effects Estimation

Authors: Alberto Caron, Gianluca Baio, Ioanna Manolopoulou

Abstract: This paper develops a sparsity-inducing version of Bayesian Causal Forests, a recently proposed nonparametric causal regression model that employs Bayesian Additive Regression Trees and is specifically designed to estimate heterogeneous treatment effects using observational data. The sparsity-inducing component we introduce is motivated by empirical studies where not all the available covariates are relevant, leading to different degrees of sparsity underlying the surfaces of interest in the estimation of individual treatment effects. The extended version presented in this work, which we name Shrinkage Bayesian Causal Forest, is equipped with an additional pair of priors allowing the model to adjust the weight of each covariate through the corresponding number of splits in the tree ensemble. These priors improve the model's adaptability to sparse data generating processes and allow fully Bayesian feature shrinkage to be performed in a framework for treatment effects estimation, and thus the moderating factors driving heterogeneity to be uncovered. In addition, the method allows prior knowledge about the relevant confounding covariates and the relative magnitude of their impact on the outcome to be incorporated in the model. We illustrate the performance of our method in simulated studies, in comparison to Bayesian Causal Forest and other state-of-the-art models, to demonstrate how it scales up with an increasing number of covariates and how it handles strongly confounded scenarios. Finally, we also provide an example of application using real-world data.

Submitted 16 November, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

arXiv:2012.05127 [pdf, ps, other]

doi 10.1002/sim.9286

Effect modification in anchored indirect treatment comparisons: Comments on "Matching-adjusted indirect comparisons: Application to time-to-event data"

Authors: Antonio Remiro-Azócar, Anna Heath, Gianluca Baio

Abstract: This commentary regards a recent simulation study conducted by Aouni, Gaudel-Dedieu and Sebastien, evaluating the performance of different versions of matching-adjusted indirect comparison (MAIC) in an anchored scenario with a common comparator. The simulation study uses survival outcomes and the Cox proportional hazards regression as the outcome model. It concludes that using the LASSO for variable selection is preferable to balancing a maximal set of covariates. However, there are no treatment effect modifiers in imbalance in the study. The LASSO is more efficient because it selects a subset of the maximal set of covariates but there are no cross-study imbalances in effect modifiers inducing bias. We highlight the following points: (1) in the anchored setting, MAIC is necessary where there are cross-trial imbalances in effect modifiers; (2) the standard indirect comparison provides greater precision and accuracy than MAIC if there are no effect modifiers in imbalance; (3) while the target estimand of the simulation study is a conditional treatment effect, MAIC targets a marginal or population-average treatment effect; (4) in MAIC, variable selection is a problem of low dimensionality and sparsity-inducing methods like the LASSO may be problematic. Finally, data-driven approaches do not obviate the necessity for subject matter knowledge when selecting effect modifiers. R code is provided in the Appendix to replicate the analyses and illustrate our points.

Submitted 4 November, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

Comments: 14 pages, minor changes after conditional acceptance by Statistics in Medicine. This is a response to `Matching-adjusted indirect comparisons: Application to time-to-event data' by Aouni, Gaudel-Dedieu and Sebastien (2020)

Journal ref: Statistics in Medicine, 41(8), pp. 1541-1553, 2022

arXiv:2011.06334 [pdf, ps, other]

doi 10.1002/sim.8857

Conflating marginal and conditional treatment effects: Comments on 'Assessing the performance of population adjustment methods for anchored indirect comparisons: A simulation study'

Authors: Antonio Remiro-Azócar, Anna Heath, Gianluca Baio

Abstract: In this commentary, we highlight the importance of: (1) carefully considering and clarifying whether a marginal or conditional treatment effect is of interest in a population-adjusted indirect treatment comparison; and (2) developing distinct methodologies for estimating the different measures of effect. The appropriateness of each methodology depends on the preferred target of inference.

Submitted 2 December, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: 6 pages, submitted to Statistics in Medicine. Response to `Assessing the performance of population adjustment methods for anchored indirect comparisons: A simulation study' by Phillippo, Dias, Ades and Welton, published in Statistics in Medicine (2020). Updated after Ph.D. proposal defense/transfer viva comments

Journal ref: Statistics in Medicine, 40(11), pp. 2753-2758, 2021

arXiv:2009.06472 [pdf, other]

Estimating Individual Treatment Effects using Non-Parametric Regression Models: a Review

Authors: Alberto Caron, Gianluca Baio, Ioanna Manolopoulou

Abstract: Large observational data are increasingly available in disciplines such as health, economic and social sciences, where researchers are interested in causal questions rather than prediction. In this paper, we examine the problem of estimating heterogeneous treatment effects using non-parametric regression-based methods, starting from an empirical study aimed at investigating the effect of participation in school meal programs on health indicators. Firstly, we introduce the setup and the issues related to conducting causal inference with observational or non-fully randomized data, and how these issues can be tackled with the help of statistical learning tools. Then, we review and develop a unifying taxonomy of the existing state-of-the-art frameworks that allow for individual treatment effects estimation via non-parametric regression models. After presenting a brief overview on the problem of model selection, we illustrate the performance of some of the methods on three different simulated studies. We conclude by demonstrating the use of some of the methods on an empirical analysis of the school meal program data.

Submitted 23 November, 2021; v1 submitted 14 September, 2020; originally announced September 2020.

Comments: 24 pages, 7 figures

arXiv:2008.05951 [pdf, other]

Marginalization of Regression-Adjusted Treatment Effects in Indirect Comparisons with Limited Patient-Level Data

Authors: Antonio Remiro-Azócar, Anna Heath, Gianluca Baio

Abstract: Population adjustment methods such as matching-adjusted indirect comparison (MAIC) are increasingly used to compare marginal treatment effects when there are cross-trial differences in effect modifiers and limited patient-level data. MAIC is sensitive to poor covariate overlap and cannot extrapolate beyond the observed covariate space. Current outcome regression-based alternatives can extrapolate but target a conditional treatment effect that is incompatible in the indirect comparison. When adjusting for covariates, one must integrate or average the conditional estimate over the population of interest to recover a compatible marginal treatment effect. We propose a marginalization method based on parametric G-computation that can be easily applied where the outcome regression is a generalized linear model or a Cox model. In addition, we introduce a novel general-purpose method based on multiple imputation, which we term multiple imputation marginalization (MIM) and is applicable to a wide range of models. Both methods can accommodate a Bayesian statistical framework, which naturally integrates the analysis into a probabilistic framework. A simulation study provides proof-of-principle for the methods and benchmarks their performance against MAIC and the conventional outcome regression. The marginalized outcome regression approaches achieve more precise and more accurate estimates than MAIC, particularly when covariate overlap is poor, and yield unbiased marginal treatment effect estimates under no failures of assumptions. Furthermore, the marginalized regression-adjusted estimates provide greater precision and accuracy than the conditional estimates produced by the conventional outcome regression, which are systematically biased because the measure of effect is non-collapsible.

Submitted 11 May, 2022; v1 submitted 12 August, 2020; originally announced August 2020.

Comments: 87 pages (28 of supplementary appendices and references), 5 figures. Final update for consistency with a related paper (2108.12208) accepted for publication by Research Synthesis Methods (DOI: https://doi.org/10.1002/jrsm.1565). arXiv admin note: text overlap with arXiv:2004.14800

arXiv:2004.14800 [pdf, other]

doi 10.1002/jrsm.1511

Methods for Population Adjustment with Limited Access to Individual Patient Data: A Review and Simulation Study

Authors: Antonio Remiro-Azócar, Anna Heath, Gianluca Baio

Abstract: Population-adjusted indirect comparisons estimate treatment effects when access to individual patient data is limited and there are cross-trial differences in effect modifiers. Popular methods include matching-adjusted indirect comparison (MAIC) and simulated treatment comparison (STC). There is limited formal evaluation of these methods and whether they can be used to accurately compare treatments. Thus, we undertake a comprehensive simulation study to compare standard unadjusted indirect comparisons, MAIC and STC across 162 scenarios. This simulation study assumes that the trials are investigating survival outcomes and measure continuous covariates, with the log hazard ratio as the measure of effect. MAIC yields unbiased treatment effect estimates under no failures of assumptions. The typical usage of STC produces bias because it targets a conditional treatment effect where the target estimand should be a marginal treatment effect. The incompatibility of estimates in the indirect comparison leads to bias as the measure of effect is non-collapsible. Standard indirect comparisons are systematically biased, particularly under stronger covariate imbalance and interaction effects. Standard errors and coverage rates are often valid in MAIC but the robust sandwich variance estimator underestimates variability where effective sample sizes are small. Interval estimates for the standard indirect comparison are too narrow and STC suffers from bias-induced undercoverage. MAIC provides the most accurate estimates and, with lower degrees of covariate overlap, its bias reduction outweighs the loss in effective sample size and precision under no failures of assumptions. An important future objective is the development of an alternative formulation to STC that targets a marginal treatment effect.

Submitted 2 June, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

Comments: 73 pages (34 are supplementary appendices and references), 8 figures, 2 tables. Full article (following Round 4 of minor revisions). arXiv admin note: text overlap with arXiv:2008.05951

Journal ref: Research Synthesis Methods, 12(6), pp. 750-775, 2021

arXiv:2003.11862 [pdf, other]

Dirichlet Process Mixture Models for Regression Discontinuity Designs

Authors: Federico Ricciardi, Silvia Liverani, Gianluca Baio

Abstract: The Regression Discontinuity Design (RDD) is a quasi-experimental design that estimates the causal effect of a treatment when its assignment is defined by a threshold value for a continuous assignment variable. The RDD assumes that subjects with measurements within a bandwidth around the threshold belong to a common population, so that the threshold can be seen as a randomising device assigning treatment to those falling just above the threshold and withholding it from those who fall just below. Bandwidth selection represents a compelling decision for the RDD analysis as the results may be highly sensitive to its choice. A number of methods to select the optimal bandwidth, mainly originating from the econometric literature, have been proposed. However, their use in practice is limited. We propose a methodology that, tackling the problem from an applied point of view, considers units' exchangeability, i.e., their similarity with respect to measured covariates, as the main criterion to select subjects for the analysis, irrespective of their distance from the threshold. We carry out clustering on the sample using a Dirichlet process mixture model to identify balanced and homogeneous clusters. Our proposal exploits the posterior similarity matrix, which contains the pairwise probabilities that two observations are allocated to the same cluster in the MCMC sample. Thus we include in the RDD analysis only those clusters for which we have stronger evidence of exchangeability. We illustrate the validity of our methodology with both a simulated experiment and a motivating example on the effect of statins to lower cholesterol level, using UK primary care data.

Submitted 26 March, 2020; originally announced March 2020.

arXiv:1910.03368 [pdf]

Computing the Expected Value of Sample Information Efficiently: Expertise and Skills Required for Four Model-Based Methods

Authors: Natalia R. Kunst, Edward Wilson, Fernando Alarid-Escudero, Gianluca Baio, Alan Brennan, Michael Fairley, David Glynn, Jeremy D. Goldhaber-Fiebert, Chris Jackson, Hawre Jalal, Nicolas A. Menzies, Mark Strong, Howard Thom, Anna Heath

Abstract: Objectives: Value of information (VOI) analyses can help policy-makers make informed decisions about whether to conduct and how to design future studies. Historically, a computationally expensive method to compute the Expected Value of Sample Information (EVSI) restricted the use of VOI to simple decision models and study designs. Recently, four EVSI approximation methods have made such analyses more feasible and accessible. We provide practical recommendations for analysts computing EVSI by evaluating these novel methods. Methods: Members of the Collaborative Network for Value of Information (ConVOI) compared the inputs, analyst's expertise and skills, and software required for four recently developed approximation methods. Information was also collected on the strengths and limitations of each approximation method. Results: All four EVSI methods require a decision-analytic model's probabilistic sensitivity analysis (PSA) output. One of the methods also requires the model to be re-run to obtain new PSA outputs for each EVSI estimation. To compute EVSI, analysts must be familiar with at least one of the following skills: advanced regression modeling, likelihood specification, and Bayesian modeling. All methods have different strengths and limitations, e.g., some methods handle evaluation of study designs with more outcomes more efficiently while others quantify uncertainty in EVSI estimates. All methods are programmed in the statistical language R and two of the methods provide online applications. Conclusion: Our paper helps to inform the choice between four efficient EVSI estimation methods, enabling analysts to assess the methods' strengths and limitations and select the most appropriate EVSI method given their situation and skills.

Submitted 8 October, 2019; originally announced October 2019.

arXiv:1905.12013 [pdf, other]

doi 10.1177/0272989X20912402

Calculating the Expected Value of Sample Information in Practice: Considerations from Three Case Studies

Authors: Anna Heath, Natalia R. Kunst, Christopher Jackson, Mark Strong, Fernando Alarid-Escudero, Jeremy D. Goldhaber-Fiebert, Gianluca Baio, Nicolas A. Menzies, Hawre Jalal

Abstract: Investing efficiently in future research to improve policy decisions is an important goal. Expected Value of Sample Information (EVSI) can be used to select the specific design and sample size of a proposed study by assessing the benefit of a range of different studies. Estimating EVSI with the standard nested Monte Carlo algorithm has a notoriously high computational burden, especially when using a complex decision model or when optimizing over study sample sizes and designs. Therefore, a number of more efficient EVSI approximation methods have been developed. However, these approximation methods have not been compared and therefore their relative advantages and disadvantages are not clear. A consortium of EVSI researchers, including the developers of several approximation methods, compared four EVSI methods using three previously published health economic models. The examples were chosen to represent a range of real-world contexts, including situations with multiple study outcomes, missing data, and data from an observational rather than a randomized study. The computational speed and accuracy of each method were compared, and the relative advantages and implementation challenges of the methods were highlighted. In each example, the approximation methods took minutes or hours to achieve reasonably accurate EVSI estimates, whereas the traditional Monte Carlo method took weeks. Specific methods are particularly suited to problems where we wish to compare multiple proposed sample sizes, when the proposed sample size is large, or when the health economic model is computationally expensive. All the evaluated methods gave estimates similar to those given by traditional Monte Carlo, suggesting that EVSI can now be efficiently computed with confidence in realistic examples.

Submitted 28 May, 2019; originally announced May 2019.

Comments: 11 pages, 3 figures

Journal ref: Medical Decision Making (2020) Volume: 40 issue: 3, page(s): 314-326

arXiv:1807.01623 [pdf, other]

Modeling outcomes of soccer matches

Authors: Alkeos Tsokos, Santhosh Narayanan, Ioannis Kosmidis, Gianluca Baio, Mihai Cucuringu, Gavin Whitaker, Franz J. Király

Abstract: We compare various extensions of the Bradley-Terry model and a hierarchical Poisson log-linear model in terms of their performance in predicting the outcome of soccer matches (win, draw, or loss). The parameters of the Bradley-Terry extensions are estimated by maximizing the log-likelihood, or an appropriately penalized version of it, while the posterior densities of the parameters of the hierarchical Poisson log-linear model are approximated using integrated nested Laplace approximations. The prediction performance of the various modeling approaches is assessed using a novel, context-specific framework for temporal validation that is found to deliver accurate estimates of the test error. The direct modeling of outcomes via the various Bradley-Terry extensions and the modeling of match scores using the hierarchical Poisson log-linear model demonstrate similar behavior in terms of predictive performance.

Submitted 3 August, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

arXiv:1805.07149 [pdf, other]

Joint longitudinal models for dealing with missing at random data in trial-based economic evaluations

Authors: Andrea Gabrio, Rachael Hunter, Alexina J. Mason, Gianluca Baio

Abstract: Health economic evaluations based on patient-level data collected alongside clinical trials (e.g. health-related quality of life and resource use measures) are an important component of the process which informs resource allocation decisions. Almost inevitably, the analysis is complicated by the fact that some individuals drop out from the study, which causes their data to be unobserved at some time point. Current practice performs the evaluation by handling the missing data at the level of aggregated variables (e.g. QALYs), which are obtained by combining the economic data over the duration of the study, and are often conducted under a missing at random (MAR) assumption. However, this approach may lead to incorrect inferences since it ignores the longitudinal nature of the data and may end up discarding a considerable number of observations from the analysis. We propose the use of joint longitudinal models to extend standard cost-effectiveness analysis methods by taking into account the longitudinal structure and incorporating all available data to improve the estimation of the targeted quantities under MAR. Our approach is compared to popular missingness approaches in trial-based analyses, motivated by an exploratory simulation study, and applied to data from two real case studies.

Submitted 22 May, 2020; v1 submitted 18 May, 2018; originally announced May 2018.

arXiv:1805.07147 [pdf, other]

A Bayesian Parametric Approach to Handle Missing Longitudinal Outcome Data in Trial-Based Health Economic Evaluations

Authors: Andrea Gabrio, Michael J. Daniels, Gianluca Baio

Abstract: Trial-based economic evaluations are typically performed on cross-sectional variables, derived from the responses for only the completers in the study, using methods that ignore the complexities of utility and cost data (e.g. skewness and spikes). We present an alternative and more efficient Bayesian parametric approach to handle missing longitudinal outcomes in economic evaluations, while accounting for the complexities of the data. We specify a flexible parametric model for the observed data and partially identify the distribution of the missing data with partial identifying restrictions and sensitivity parameters. We explore alternative nonignorable scenarios through different priors for the sensitivity parameters, calibrated on the observed data. Our approach is motivated by, and applied to, data from a trial assessing the cost-effectiveness of a new treatment for intellectual disability and challenging behaviour.

Submitted 18 May, 2018; originally announced May 2018.

arXiv:1804.09590 [pdf, other]

Estimating the Expected Value of Sample Information across Different Sample Sizes using Moment Matching and Non-Linear Regression

Authors: Anna Heath, Ioanna Manolopoulou, Gianluca Baio

Abstract: Background: The Expected Value of Sample Information (EVSI) determines the economic value of any future study with a specific design aimed at reducing uncertainty in a health economic model. This has potential as a tool for trial design; the cost and value of different designs could be compared to find the trial with the greatest net benefit. However, despite recent developments, EVSI analysis can be slow especially when optimising over a large number of different designs. Methods: This paper develops a method to reduce the computation time required to calculate the EVSI across different sample sizes. Our method extends the moment matching approach to EVSI estimation to optimise over different sample sizes for the underlying trial with a similar computational cost to a single EVSI estimate. This extension calculates posterior variances across the alternative sample sizes and then uses Bayesian non-linear regression to calculate the EVSI. Results: A health economic model developed to assess the cost-effectiveness of interventions for chronic pain demonstrates that this EVSI calculation method is fast and accurate for realistic models. This example also highlights how different trial designs can be compared using the EVSI. Conclusion: The proposed estimation method is fast and accurate when calculating the EVSI across different sample sizes. This will allow researchers to realise the potential of using the EVSI to determine an economically optimal trial design for reducing uncertainty in health economic models. Limitations: Our method relies on some additional simulation, which can be expensive in models with very large computational cost.

Submitted 25 April, 2018; originally announced April 2018.

arXiv:1801.09541 [pdf, other]

A Full Bayesian Model to Handle Structural Ones and Missingness in Economic Evaluations from Individual-Level Data

Authors: Andrea Gabrio, Alexina J. Mason, Gianluca Baio

Abstract: Economic evaluations from individual-level data are an important component of the process of technology appraisal, with a view to informing resource allocation decisions. A critical problem in these analyses is that both effectiveness and cost data typically present some complexity (e.g. non-normality, spikes and missingness) that should be addressed using appropriate methods. However, in routine analyses, simple standardised approaches are typically used, possibly leading to biased inferences. We present a general Bayesian framework that can handle the complexity. We show the benefits of using our approach with a motivating example, the MenSS trial, for which there are spikes at one in the effectiveness and missingness in both outcomes. We contrast a set of increasingly complex models and perform sensitivity analysis to assess the robustness of the conclusions to a range of plausible missingness assumptions. This paper highlights the importance of adopting a comprehensive modelling approach to economic evaluations and the strategic advantages of building these complex models within a Bayesian framework.

Submitted 30 January, 2018; v1 submitted 29 January, 2018; originally announced January 2018.

Comments: 10 pages, 7 figures, 2 tables

arXiv:1709.02319 [pdf]

Calculating the Expected Value of Sample Information using Efficient Nested Monte Carlo: A Tutorial

Authors: Anna Heath, Gianluca Baio

Abstract: Objective: The Expected Value of Sample Information (EVSI) quantifies the economic benefit of reducing uncertainty in a health economic model by collecting additional information. This has the potential to improve the allocation of research budgets. Despite this, practical EVSI evaluations are limited, partly due to the computational cost of estimating this value using the "gold-standard" nested simulation methods. Recently, however, Heath et al. developed an estimation procedure that reduces the number of simulations required for this "gold-standard" calculation. Up to this point, this new method has been presented in purely technical terms. Study Design: This study presents the practical application of this new method to aid its implementation. We use a worked example to illustrate the key steps of the EVSI estimation procedure before discussing its optimal implementation using a practical health economic model. Methods: The worked example is based on a three-parameter linear health economic model. The more realistic model evaluates the cost-effectiveness of a new chemotherapy treatment which aims to reduce the number of side effects experienced by patients. We use a Markov Model structure to evaluate the health economic profile of experiencing side effects. Results: This EVSI estimation method offers accurate estimation within a feasible computation time, seconds compared to days, even for more complex model structures. The EVSI estimation is more accurate if a greater number of nested samples are used, even for a fixed computational cost. Conclusions: This new method reduces the computational cost of estimating the EVSI by nested simulation.

Submitted 25 April, 2018; v1 submitted 7 September, 2017; originally announced September 2017.

arXiv:1607.08169 [pdf, ps, other]

Bayesian modelling for binary outcomes in the Regression Discontinuity Design

Authors: Sara Geneletti, Federico Ricciardi, Aidan O'Keeffe, Gianluca Baio

Abstract: The Regression Discontinuity (RD) design is a quasi-experimental design which emulates a randomised study by exploiting situations where treatment is assigned according to a continuous variable as is common in many drug treatment guidelines. The RD design literature focuses principally on continuous outcomes. In this paper we exploit the link between the RD design and instrumental variables to obtain a causal effect estimator, the risk ratio for the treated (RRT), for the RD design when the outcome is binary. Occasionally the RRT estimator can give negative lower confidence bounds. In the Bayesian framework we impose prior constraints that prevent this from happening. This is novel and cannot be easily reproduced in a frequentist framework. We compare our estimators to those based on estimating equation and generalized methods of moments methods. Based on extensive simulations our methods compare favourably with both methods. We apply our method on a real example to estimate the effect of statins on the probability of Low-density Lipoprotein (LDL) cholesterol levels reaching recommended levels.

Submitted 27 July, 2016; originally announced July 2016.

Comments: 29 pages, 4 figures

arXiv:1607.06447 [pdf, other]

Handling Missing Data in Within-Trial Cost-Effectiveness Analysis: a Review with Future Guidelines

Authors: Andrea Gabrio, Alexina Mason, Gianluca Baio

Abstract: Cost-Effectiveness Analyses (CEAs) alongside randomised controlled trials (RCTs) are increasingly often designed to collect resource use and preference-based health status data for the purpose of healthcare technology assessment. However, because of the way these measures are collected, they are prone to missing data, which can ultimately affect the decision of whether an intervention is good value for money. We examine how missing cost and effect outcome data are handled in RCT-based CEAs, complementing a previous review (covering 2003-2009, 88 articles) with a new systematic review (2009-2015, 81 articles) focussing on two different perspectives. First, we review the description of the missing data, the statistical methods used to deal with them, and the quality of the judgement underpinning the choice of these methods. Second, we provide guidelines on how the information about missingness and related methods should be presented to improve the reporting and handling of missing data. Our review shows that missing data in within-RCT CEAs are still often inadequately handled and the overall level of information provided to support the chosen methods is rarely satisfactory.

Submitted 21 July, 2016; originally announced July 2016.

Comments: 13 pages, 5 figures, 1 table, references omitted

arXiv:1512.06881 [pdf, other]

doi 10.1186/s12874-018-0541-7

A dynamic Bayesian Markov model for health economic evaluations of interventions in infectious disease

Authors: Katrin Haeussler, Ardo van den Hout, Gianluca Baio

Abstract: Background. Health economic evaluations of interventions against infectious diseases are commonly based on the predictions of ordinary differential equation (ODE) systems or Markov models (MMs). Standard MMs are static, whereas ODE systems are usually dynamic and account for herd immunity which is crucial to prevent overestimation of infection prevalence. Complex ODE systems including probabilistic model parameters are computationally intensive. Thus, mainly ODE-based models including deterministic parameters are presented in the literature. These do not account for parameter uncertainty. As a consequence, probabilistic sensitivity analysis (PSA), a crucial component of health economic evaluations, cannot be conducted straightforwardly. Methods. We present a dynamic MM under a Bayesian framework. We extend a static MM by incorporating the force of infection into the state allocation algorithm. The corresponding output is based on dynamic changes in prevalence and thus accounts for herd immunity. In contrast to deterministic ODE-based models, PSA can be conducted straightforwardly. We introduce a case study of a fictional sexually transmitted infection and compare our dynamic Bayesian MM to a deterministic and a Bayesian ODE system. The models are calibrated to time series data. Results. By means of the case study, we show that our methodology produces outcome which is comparable to the "gold standard" of the Bayesian ODE system. Conclusions. In contrast to ODE systems in the literature, the setting of the dynamic MM is probabilistic at manageable computational effort (including calibration). The run time of the Bayesian ODE system is 44 times longer.

Submitted 1 September, 2018; v1 submitted 21 December, 2015; originally announced December 2015.

Journal ref: BMC Medical Research Methodology (2018) 18:82

arXiv:1508.00129 [pdf, other]

A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models

Authors: William Barcella, Maria De Iorio, Gianluca Baio

Abstract: Dirichlet Process Mixture (DPM) models have been increasingly employed to specify random partition models that take into account possible patterns within the covariates. Furthermore, to deal with large numbers of covariates, methods for selecting the most important covariates have been proposed. Commonly, the covariates are chosen either for their importance in determining the clustering of the observations or for their effect on the level of a response variable (when a regression model is specified). Typically both strategies involve the specification of latent indicators that regulate the inclusion of the covariates in the model. Common examples involve the use of spike and slab prior distributions. In this work we review the most relevant DPM models that include covariate information in the induced partition of the observations and we focus on available variable selection techniques for these models. We highlight the main features of each model and demonstrate them in simulations and in a real data application.

Submitted 29 October, 2016; v1 submitted 1 August, 2015; originally announced August 2015.

Comments: 26 pages, 5 figures

arXiv:1507.02513 [pdf, other]

A Review of Methods for the Analysis of the Expected Value of Information

Authors: Anna Heath, Ioanna Manolopoulou, Gianluca Baio

Abstract: Over recent years Value of Information analysis has become more widespread in health-economic evaluations, specifically as a tool to perform Probabilistic Sensitivity Analysis. This is largely due to methodological advancements allowing for the fast computation of a typical summary known as the Expected Value of Partial Perfect Information (EVPPI). A recent review discussed some estimation methods for calculating the EVPPI, but because research has been active over the intervening years, that review does not cover some key estimation methods. Therefore, this paper presents a comprehensive review of these new methods. We begin by providing the technical details of these computation methods. We then present a case study in order to compare the estimation performance of these new methods. We conclude that the most recent development based on non-parametric regression offers the best method for calculating the EVPPI efficiently. This means that the EVPPI can now be used practically in health economic evaluations, especially as all the methods are developed in parallel with R

Submitted 9 July, 2015; originally announced July 2015.

arXiv:1504.05436 [pdf, other]

doi 10.1002/sim.6983

Estimating the Expected Value of Partial Perfect Information in Health Economic Evaluations using Integrated Nested Laplace Approximation

Authors: Anna Heath, Ioanna Manolopoulou, Gianluca Baio

Abstract: The Expected Value of Partial Perfect Information (EVPPI) is a decision-theoretic measure of the "cost" of parametric uncertainty in decision making used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian Process regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA) and projecting from a high-dimensional into a low-dimensional input space allows us to decrease the computation time for fitting these high-dimensional Gaussian Processes, often substantially. We demonstrate that the EVPPI calculated using our method for Gaussian Process regression is in line with the standard Gaussian Process regression method and that, despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently.

Submitted 4 November, 2016; v1 submitted 21 April, 2015; originally announced April 2015.

arXiv:1501.03537 [pdf, other]

Variable Selection in Covariate Dependent Random Partition Models: an Application to Urinary Tract Infection

Authors: William Barcella, Maria De Iorio, Gianluca Baio, James Malone-Lee

Abstract: Lower urinary tract symptoms (LUTS) can indicate the presence of urinary tract infection (UTI), a condition that, if it becomes chronic, requires expensive and time-consuming care and leads to reduced quality of life. Detecting the presence and gravity of an infection from the earliest symptoms is therefore highly valuable. Typically, white blood cell count (WBC) measured in a sample of urine is used to assess UTI. We consider clinical data from 1341 patients at their first visit in which UTI (i.e. WBC$\geq 1$) is diagnosed. In addition, for each patient, a clinical profile of 34 symptoms was recorded. In this paper we propose a Bayesian nonparametric regression model based on the Dirichlet Process (DP) prior aimed at providing the clinicians with a meaningful clustering of the patients based on both the WBC (response variable) and possible patterns within the symptoms profiles (covariates). This is achieved by assuming a probability model for the symptoms as well as for the response variable. To identify the symptoms most associated with UTI, we specify a spike and slab base measure for the regression coefficients: this induces dependence of symptoms selection on cluster assignment. Posterior inference is performed through Markov Chain Monte Carlo methods.

Submitted 10 July, 2015; v1 submitted 14 January, 2015; originally announced January 2015.

Comments: Revised version. 24 pages, 6 figures

arXiv:1403.1806 [pdf, other]

Bayesian regression discontinuity designs: Incorporating clinical knowledge in the causal analysis of primary care data

Authors: Sara Geneletti, Aidan G. O'Keeffe, Linda D. Sharples, Sylvia Richardson, Gianluca Baio

Abstract: The regression discontinuity (RD) design is a quasi-experimental design that estimates the causal effects of a treatment by exploiting naturally occurring treatment rules. It can be applied in any context where a particular treatment or intervention is administered according to a pre-specified rule linked to a continuous variable. Such thresholds are common in primary care drug prescription where the RD design can be used to estimate the causal effect of medication in the general population. Such results can then be contrasted to those obtained from randomised controlled trials (RCTs) and inform prescription policy and guidelines based on a more realistic and less expensive context. In this paper we focus on statins, a class of cholesterol-lowering drugs; however, the methodology can be applied to many other drugs provided these are prescribed in accordance with pre-determined guidelines. NHS guidelines state that statins should be prescribed to patients with 10-year cardiovascular disease risk scores in excess of 20%. If we consider patients whose scores are close to this threshold we find that there is an element of random variation in both the risk score itself and its measurement. We can thus consider the threshold a randomising device, assigning the prescription to units just above the threshold and withholding it from those just below. Thus we are effectively replicating the conditions of an RCT in the area around the threshold, removing or at least mitigating confounding.
We frame the RD design in the language of conditional independence, which clarifies the assumptions necessary to apply it to data and makes the links with instrumental variables clear. We also have context-specific knowledge about the expected sizes of the effects of statin prescription and are thus able to incorporate this into Bayesian models by formulating informative priors on our causal parameters.

Submitted 7 March, 2014; originally announced March 2014.

Comments: 21 pages, 5 figures, 2 tables

arXiv:1308.6312 [pdf, ps, other]

Evidence of bias in the Eurovision song contest: modelling the votes using Bayesian hierarchical models

Authors: Marta Blangiardo, Gianluca Baio

Abstract: The Eurovision Song Contest is an annual musical competition held among active members of the European Broadcasting Union since 1956. The event is televised live across Europe. Each participating country presents a song and receives a vote based on a combination of tele-voting and jury. Over the years, this has led to speculations of tactical voting, discriminating against some participants and thus inducing bias in the final results. In this paper we investigate the presence of positive or negative bias (which may roughly indicate favouritism or discrimination) in the votes based on geographical proximity, migration and cultural characteristics of the participating countries through a Bayesian hierarchical model. Our analysis found no evidence of negative bias, although mild positive bias does seem to emerge systematically, linking voters to performers.

Submitted 28 August, 2013; originally announced August 2013.

Comments: 16 pages, 3 figures

Showing 1–36 of 36 results for author: Baio, G