-
Replicability of Simulation Studies for the Investigation of Statistical Methods: The RepliSims Project
Authors:
K. Luijken,
A. Lohmann,
U. Alter,
J. Claramunt Gonzalez,
F. J. Clouth,
J. L. Fossum,
L. Hesen,
A. H. J. Huizing,
J. Ketelaar,
A. K. Montoya,
L. Nab,
R. C. C. Nijman,
B. B. L. Penning de Vries,
T. D. Tibbe,
Y. A. Wang,
R. H. H. Groenwold
Abstract:
Results of simulation studies evaluating the performance of statistical methods are often considered actionable and thus can have a major impact on the way empirical research is implemented. However, so far there is limited evidence about the reproducibility and replicability of statistical simulation studies. Therefore, eight highly cited statistical simulation studies were selected, and their replicability was assessed by teams of replicators with formal training in quantitative methodology. The teams found relevant information in the original publications and used it to write simulation code with the aim of replicating the results. The primary outcome was the feasibility of replicability based on reported information in the original publications. Replicability varied greatly: Some original studies provided detailed information leading to almost perfect replication of results, whereas other studies did not provide enough information to implement any of the reported simulations. Replicators had to make choices regarding missing or ambiguous information in the original studies, error handling, and software environment. Factors facilitating replication included public availability of code, and descriptions of the data-generating procedure and methods in graphs, formulas, structured text, and publicly accessible additional resources such as technical reports. Replicability of statistical simulation studies was mainly impeded by lack of information and sustainability of information sources. Reproducibility could be achieved for simulation studies by providing open code and data as a supplement to the publication. Additionally, simulation studies should be transparently reported with all relevant information either in the research paper itself or in easily accessible supplementary material to allow for replicability.
Submitted 5 July, 2023;
originally announced July 2023.
-
Identification of causal effects in case-control studies
Authors:
Bas B. L. Penning de Vries,
Rolf H. H. Groenwold
Abstract:
Case-control designs are an important tool in contrasting the effects of well-defined treatments. In this paper, we reconsider classical concepts, assumptions and principles and explore when the results of case-control studies can be endowed with a causal interpretation. Our focus is on identification of target causal quantities, or estimands. We cover various estimands relating to intention-to-treat or per-protocol effects for popular sampling schemes (case-base, survivor, and risk-set sampling), each with and without matching. Our approach may inform future research on different estimands, other variations of the case-control design or settings with additional complexities.
Submitted 5 May, 2021;
originally announced May 2021.
-
A weighting method for simultaneous adjustment for confounding and joint exposure-outcome misclassifications
Authors:
Bas B. L. Penning de Vries,
Maarten van Smeden,
Rolf H. H. Groenwold
Abstract:
Joint misclassification of exposure and outcome variables can lead to considerable bias in epidemiological studies of causal exposure-outcome effects. In this paper, we present a new maximum-likelihood-based estimator for the marginal causal odds ratio that simultaneously adjusts for confounding and several forms of joint misclassification of the exposure and outcome variables. The proposed method relies on validation data for the construction of weights that account for both sources of bias. The weighting estimator, which is an extension of the exposure misclassification weighting estimator proposed by Gravel and Platt (Statistics in Medicine, 2018), is applied to reinfarction data. Simulation studies were carried out to study its finite sample properties and compare it with methods that do not account for confounding or misclassification. The new estimator showed favourable large sample properties in the simulations. Further research is needed to study the sensitivity of the proposed method and that of alternatives to violations of their assumptions. The implementation of the estimator is facilitated by a new R function in an existing R package.
Submitted 15 January, 2019;
originally announced January 2019.
-
Propensity score estimation using classification and regression trees in the presence of missing covariate data
Authors:
Bas B. L. Penning de Vries,
Maarten van Smeden,
Rolf H. H. Groenwold
Abstract:
Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the fitting of a logistic regression on all subjects, CART is appealing in part because some implementations allow for incomplete records to be incorporated in the tree fitting and provide propensity score estimates for all subjects. Based on theoretical considerations, we argue that the automatic handling of missing data by CART may, however, not be appropriate. Using a series of simulation experiments, we examined the performance of different approaches to handling missing covariate data: (i) applying the CART algorithm directly to the (partially) incomplete data, (ii) complete case analysis, and (iii) multiple imputation. Performance was assessed in terms of bias in estimating exposure-outcome effects among the exposed, standard error, mean squared error and coverage. Applying the CART algorithm directly to incomplete data resulted in bias, even in scenarios where data were missing completely at random. Overall, multiple imputation followed by CART resulted in the best performance. Our study showed that automatic handling of missing data in CART can cause serious bias and does not outperform multiple imputation as a means to account for missing data.
Submitted 25 July, 2018;
originally announced July 2018.