-
How to relate potential outcomes: Estimating individual treatment effects under a given specified partial correlation
Authors:
Mingyang Cai,
Stef van Buuren,
Gerko Vink
Abstract:
In most medical research, the average treatment effect is used to evaluate a treatment's performance. However, precision medicine requires knowledge of individual treatment effects: What is the difference between a unit's measurement under treatment and control conditions? In most treatment effect studies, such answers are not possible because the outcomes under both experimental conditions are not jointly observed. This makes the problem of causal inference a missing data problem. We propose to solve this problem by imputing the individual potential outcomes under a specified partial correlation (SPC), thereby allowing for heterogeneous treatment effects. We demonstrate in simulation that our proposed methodology yields valid inferences for the marginal distribution of potential outcomes. We highlight that the posterior distribution of individual treatment effects varies with different specified partial correlations. This property can be used to study the sensitivity of optimal treatment outcomes under different correlation specifications. In a practical example on HIV-1 treatment data, we demonstrate that the proposed methodology generalises to real-world data. Imputing under the SPC, therefore, opens up a wealth of possibilities for studying heterogeneous treatment effects on incomplete data and the further adaptation of individual treatment effects.
Submitted 27 August, 2022;
originally announced August 2022.
-
Joint distribution properties of Fully Conditional Specification under the normal linear model with normal inverse-gamma priors
Authors:
Mingyang Cai,
Stef van Buuren,
Gerko Vink
Abstract:
Fully conditional specification (FCS) is a convenient and flexible multiple imputation approach. It specifies a sequence of simple regression models instead of a potentially complex joint density for missing variables. However, FCS may not converge to a stationary distribution. Many authors have studied the convergence properties of FCS when priors of conditional models are non-informative. We extend this work to the case of informative priors. This paper evaluates the convergence properties of the normal linear model with normal inverse-gamma priors. The theoretical and simulation results prove the convergence of FCS and show the equivalence of prior specification under the joint model and a set of conditional models when the analysis model is a linear regression with normal inverse-gamma priors.
Submitted 27 August, 2022;
originally announced August 2022.
-
Graphical and numerical diagnostic tools to assess multiple imputation models by posterior predictive checking
Authors:
Mingyang Cai,
Stef van Buuren,
Gerko Vink
Abstract:
Missing data are often dealt with by multiple imputation. A crucial part of the multiple imputation process is selecting sensible models to generate plausible values for incomplete data. A method based on posterior predictive checking is proposed to diagnose imputation models. To assess the congeniality of imputation models, the proposed diagnostic method compares the observed data with their replicates generated under corresponding posterior predictive distributions. If the imputation model is congenial with the substantive model, the observed data are expected to be located in the centre of the corresponding posterior predictive distributions. Simulation and application studies are designed to investigate the proposed diagnostic method for parametric and semi-parametric imputation approaches, continuous and discrete incomplete variables, and univariate and multivariate missingness patterns. The results show the validity of the proposed diagnostic method.
Submitted 27 August, 2022;
originally announced August 2022.
-
A blended distance to define "people-like-me"
Authors:
Anaïs Fopma,
Mingyang Cai,
Stef van Buuren,
Gerko Vink
Abstract:
Curve matching is a prediction technique that relies on predictive mean matching, which matches donors that are most similar to a target based on the predictive distance. Even though this approach leads to high prediction accuracy, the predictive distance may make matches look unconvincing, as the profiles of the matched donors can substantially differ from the profile of the target. To counterbalance this, similarity between the curves of the donors and the target can be taken into account by combining the predictive distance with the Mahalanobis distance into a 'blended distance' measure. The properties of this measure are evaluated in two simulation studies. Simulation study I evaluates the performance of the blended distance under different data-generating conditions. The results show that blending towards the Mahalanobis distance leads to worse performance in terms of bias, coverage, and predictive power. Simulation study II evaluates the blended metric in a setting where a single value is imputed. The results show that blending involves a bias-variance trade-off: giving more weight to the Mahalanobis distance leads to less variance in the imputations, but less accuracy as well. The main conclusion is that the high prediction accuracy achieved with the predictive distance necessitates the variability in the profiles of donors.
Submitted 11 July, 2022;
originally announced July 2022.
-
Missing the Point: Non-Convergence in Iterative Imputation Algorithms
Authors:
Hanne Ida Oberman,
Stef van Buuren,
Gerko Vink
Abstract:
Iterative imputation is a popular tool to accommodate missing data. While it is widely accepted that valid inferences can be obtained with this technique, these inferences all rely on algorithmic convergence. There is no consensus on how to evaluate the convergence properties of the method. Our study provides insight into identifying non-convergence in iterative imputation algorithms. We found that, in the cases considered, inferential validity was achieved after five to ten iterations, much earlier than indicated by diagnostic methods. We conclude that it never hurts to iterate longer, but such calculations hardly bring added value.
Submitted 22 October, 2021;
originally announced October 2021.
-
Pooling multiple imputations when the sample happens to be the population
Authors:
Gerko Vink,
Stef van Buuren
Abstract:
Current pooling rules for multiply imputed data assume infinite populations. In some situations this assumption is not feasible as every unit in the population has been observed, potentially leading to over-covered population estimates. We simplify the existing pooling rules for situations where the sampling variance is not of interest. We compare these rules to the conventional pooling rules and demonstrate their use in a situation where there is no sampling variance. Using the standard pooling rules in situations where sampling variance should not be considered leads to overestimation of the variance of the estimates of interest, especially when the amount of missingness is not very large. As a result, population estimates are over-covered, which may lead to a loss of statistical power. We conclude that the theory of multiple imputation can be extended to the situation where the sample happens to be the population. The simplified pooling rules can be easily implemented to obtain valid inference in cases where we have observed essentially all units and in simulation studies addressing the missingness mechanism only.
Submitted 30 September, 2014;
originally announced September 2014.
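The pooling idea in the last abstract can be illustrated with a minimal sketch. It assumes the conventional rules are Rubin's rules, with total variance T = W + (1 + 1/m)B, and that the simplification for a fully observed population amounts to dropping the within-imputation (sampling) variance W, leaving T = (1 + 1/m)B. The abstract does not state the formula, so that form, the function name `pool_estimates`, and its arguments are assumptions made for illustration only.

```python
from statistics import mean, variance


def pool_estimates(estimates, within_variances=None):
    """Pool m point estimates obtained from m imputed data sets.

    estimates: one point estimate per imputed data set.
    within_variances: the squared standard error from each imputed data
        set, or None when sampling variance is not of interest
        (assumed form of the simplified rule).
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations to estimate B")

    q_bar = mean(estimates)   # pooled point estimate
    b = variance(estimates)   # between-imputation variance B (divisor m - 1)

    # Within-imputation variance W; set to zero when the sample is the
    # population (assumption: the simplified rule drops this term).
    w = mean(within_variances) if within_variances is not None else 0.0

    total = w + (1 + 1 / m) * b
    return q_bar, total


if __name__ == "__main__":
    est = [1.0, 2.0, 3.0]
    print(pool_estimates(est, [0.5, 0.5, 0.5]))  # conventional pooling
    print(pool_estimates(est))                   # sampling variance dropped
```

With the same estimates, the second call returns a smaller total variance than the first, which matches the abstract's claim that the standard rules overestimate the variance when sampling variance should not be considered.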