Sensitivity Analysis for Attributable Effects in Case$^2$ Studies

Authors: Kan Chen, Ting Ye, Dylan S. Small

Abstract: The case$^2$ study, also referred to as the case-case study design, is a valuable approach for conducting inference for treatment effects. Unlike traditional case-control studies, the case$^2$ design compares treatment in two types of cases with the same disease. A key quantity of interest is the attributable effect, which is the number of cases of disease among treated units which are caused by the treatment. Two key assumptions that are usually made for making inferences about the attributable effect in case$^2$ studies are 1.) treatment does not cause the second type of case, and 2.) the treatment does not alter an individual's case type. However, these assumptions are not realistic in many real-data applications. In this article, we present a sensitivity analysis framework to scrutinize the impact of deviations from these assumptions on obtained results. We also include sensitivity analyses related to the assumption of unmeasured confounding, recognizing the potential bias introduced by unobserved covariates. The proposed methodology is exemplified through an investigation into whether having violent behavior in the last year of life increases suicide risk via 1993 National Mortality Followback Survey dataset.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 25 pages, 2 Figures, 4 Tables

arXiv:2403.13260 [pdf, other]

A Bayesian Approach for Selecting Relevant External Data (BASE): Application to a study of Long-Term Outcomes in a Hemophilia Gene Therapy Trial

Authors: Tianyu Pan, Xiang Zhang, Weining Shen, Ting Ye

Abstract: Gene therapies aim to address the root causes of diseases, particularly those stemming from rare genetic defects that can be life-threatening or severely debilitating. While there has been notable progress in the development of gene therapies in recent years, understanding their long-term effectiveness remains challenging due to a lack of data on long-term outcomes, especially during the early stages of their introduction to the market. To address the critical question of estimating long-term efficacy without waiting for the completion of lengthy clinical trials, we propose a novel Bayesian framework. This framework selects pertinent data from external sources, often early-phase clinical trials with more comprehensive longitudinal efficacy data that could lead to an improved inference of the long-term efficacy outcome. We apply this methodology to predict the long-term factor IX (FIX) levels of HEMGENIX (etranacogene dezaparvovec), the first FDA-approved gene therapy to treat adults with severe Hemophilia B, in a phase 3 study. Our application showcases the capability of the framework to estimate the 5-year FIX levels following HEMGENIX therapy, demonstrating sustained FIX levels induced by HEMGENIX infusion. Additionally, we provide theoretical insights into the methodology by establishing its posterior convergence properties.

Submitted 9 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.00307 [pdf, other]

Debiased Multivariable Mendelian Randomization

Authors: Yinxiang Wu, Hyunseung Kang, Ting Ye

Abstract: Multivariable Mendelian randomization (MVMR) uses genetic variants as instrumental variables to infer the direct effect of multiple exposures on an outcome. Compared to univariable Mendelian randomization, MVMR is less prone to horizontal pleiotropy and enables estimation of the direct effect of each exposure on the outcome. However, MVMR faces greater challenges with weak instruments -- genetic variants that are weakly associated with some exposures conditional on the other exposures. This article focuses on MVMR using summary data from genome-wide association studies (GWAS). We provide a new asymptotic regime to analyze MVMR estimators with many weak instruments, allowing for linear combinations of exposures to have different degrees of instrument strength, and formally show that the popular multivariable inverse-variance weighted (MV-IVW) estimator's asymptotic behavior is highly sensitive to instruments' strength. We then propose a multivariable debiased IVW (MV-dIVW) estimator, which effectively reduces the asymptotic bias from weak instruments in MV-IVW, and introduce an adjusted version, MV-adIVW, for improved finite-sample robustness. We establish the theoretical properties of our proposed estimators and extend them to handle balanced horizontal pleiotropy. We conclude by demonstrating the performance of our proposed methods in simulated and real datasets. We implement this method in the R package mr.divw.

Submitted 23 March, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

arXiv:2306.10213 [pdf, other]

A General Form of Covariate Adjustment in Randomized Clinical Trials

Authors: Marlena S. Bannick, Jun Shao, Jingyi Liu, Yu Du, Yanyao Yi, Ting Ye

Abstract: In randomized clinical trials, adjusting for baseline covariates can improve credibility and efficiency for demonstrating and quantifying treatment effects. This article studies the augmented inverse propensity weighted (AIPW) estimator, which is a general form of covariate adjustment that uses linear, generalized linear, and non-parametric or machine learning models for the conditional mean of the response given covariates. Under covariate-adaptive randomization, we establish general theorems that show a complete picture of the asymptotic normality, efficiency gain, and applicability of AIPW estimators. In particular, we provide for the first time a rigorous theoretical justification of using machine learning methods with cross-fitting for dependent data under covariate-adaptive randomization. Based on the general theorems, we offer insights on the conditions for guaranteed efficiency gain and universal applicability under different randomization schemes, which also motivate a joint calibration strategy using some constructed covariates after applying AIPW. Our methods are implemented in the R package RobinCar.

Submitted 25 March, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2302.10404 [pdf, ps, other]

Robust Variance Estimation for Covariate-Adjusted Unconditional Treatment Effect in Randomized Clinical Trials with Binary Outcomes

Authors: Ting Ye, Marlena Bannick, Yanyao Yi, Jun Shao

Abstract: To improve precision of estimation and power of testing hypothesis for an unconditional treatment effect in randomized clinical trials with binary outcomes, researchers and regulatory agencies recommend using g-computation as a reliable method of covariate adjustment. However, the practical application of g-computation is hindered by the lack of an explicit robust variance formula that can be used for different unconditional treatment effects of interest. To fill this gap, we provide explicit and robust variance estimators for g-computation estimators and demonstrate through simulations that the variance estimators can be reliably applied in practice.

Submitted 27 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2302.01269 [pdf, other]

Adjusting for Incomplete Baseline Covariates in Randomized Controlled Trials: A Cross-World Imputation Framework

Authors: Yilin Song, James P. Hughes, Ting Ye

Abstract: In randomized controlled trials, adjusting for baseline covariates is often applied to improve the precision of treatment effect estimation. However, missingness in covariates is common. Recently, Zhao & Ding (2022) studied two simple strategies, the single imputation method and missingness indicator method (MIM), to deal with missing covariates, and showed that both methods can provide efficiency gain. To better understand and compare these two strategies, we propose and investigate a novel imputation framework termed cross-world imputation (CWI), which includes single imputation and MIM as special cases. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM.

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2302.01246 [pdf, other]

Behavioral Carry-Over Effect and Power Consideration in Crossover Trials

Authors: Danni Shi, Ting Ye

Abstract: A crossover trial is an efficient trial design when there is no carry-over effect. To reduce the impact of the biological carry-over effect, a washout period is often designed. However, the carry-over effect remains an outstanding concern when a washout period is unethical or cannot sufficiently diminish the impact of the carry-over effect. The latter can occur in comparative effectiveness research where the carry-over effect is often non-biological but behavioral. In this paper, we investigate the crossover design under a potential outcomes framework with and without the carry-over effect. We find that when the carry-over effect exists and satisfies a sign condition, the basic estimator underestimates the treatment effect, which does not inflate the type I error of one-sided tests but negatively impacts the power. This leads to a power trade-off between the crossover design and the parallel-group design, and we derive the condition under which the crossover design does not lead to type I error inflation and is still more powerful than the parallel-group design. We also develop covariate adjustment methods for crossover trials. We evaluate the performance of cross-over design and covariate adjustment using data from the MTN-034/REACH study.

Submitted 4 March, 2024; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2211.15849 [pdf, other]

Association between author metadata and acceptance: A feature-rich, matched observational study of a corpus of ICLR submissions between 2017-2022

Authors: Chang Chen, Jiayao Zhang, Dan Roth, Ting Ye, Bo Zhang

Abstract: Many recent studies have probed status bias in the peer-review process of academic journals and conferences. In this article, we investigated the association between author metadata and area chairs' final decisions (Accept/Reject) using our compiled database of 5,313 borderline submissions to the International Conference on Learning Representations (ICLR) from 2017 to 2022. We carefully defined elements in a cause-and-effect analysis, including the treatment and its timing, pre-treatment variables, potential outcomes and causal null hypothesis of interest, all in the context of study units being textual data and under Neyman and Rubin's potential outcomes (PO) framework. We found some weak evidence that author metadata was associated with articles' final decisions. We also found that, under an additional stability assumption, borderline articles from high-ranking institutions (top-30% or top-20%) were less favored by area chairs compared to their matched counterparts. The results were consistent in two different matched designs (odds ratio = 0.82 [95% CI: 0.67 to 1.00] in a first design and 0.83 [95% CI: 0.64 to 1.07] in a strengthened design). We discussed how to interpret these results in the context of multiple interactions between a study unit and different agents (reviewers and area chairs) in the peer-review system.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2210.04360 [pdf, other]

A unified analysis of regression adjustment in randomized experiments

Authors: Katarzyna Reluga, Ting Ye, Qingyuan Zhao

Abstract: Regression adjustment is broadly applied in randomized trials under the premise that it usually improves the precision of a treatment effect estimator. However, previous work has shown that this is not always true. To further understand this phenomenon, we develop a unified comparison of the asymptotic variance of a class of linear regression-adjusted estimators. Our analysis is based on the classical theory for linear regression with heteroscedastic errors and thus does not assume that the postulated linear model is correct. For a completely randomized binary treatment, we provide sufficient conditions under which some regression-adjusted estimators are guaranteed to be more asymptotically efficient than others. We explore other settings such as general treatment assignment mechanisms and generalized linear models, and find that the variance dominance phenomenon no longer occurs.

Submitted 9 October, 2022; originally announced October 2022.

Comments: 17 pages, 1 figure, 2 tables

MSC Class: 62F10; 62J99 ACM Class: G.3

arXiv:2209.10339 [pdf, other]

Structural mean models for instrumented difference-in-differences

Authors: Tat-Thang Vo, Ting Ye, Ashkan Ertefaie, Samrat Roy, James Flory, Sean Hennessy, Stijn Vansteelandt, Dylan S. Small

Abstract: In the standard difference-in-differences research design, the parallel trends assumption may be violated when the relationship between the exposure trend and the outcome trend is confounded by unmeasured confounders. Progress can be made if there is an exogenous variable that (i) does not directly influence the change in outcome means (i.e. the outcome trend) except through influencing the change in exposure means (i.e. the exposure trend), and (ii) is not related to the unmeasured exposure - outcome confounders on the trend scale. Such exogenous variable is called an instrument for difference-in-differences. For continuous outcomes that lend themselves to linear modelling, so-called instrumented difference-in-differences methods have been proposed. In this paper, we will suggest novel multiplicative structural mean models for instrumented difference-in-differences, which allow one to identify and estimate the average treatment effect on count and rare binary outcomes, in the whole population or among the treated, when a valid instrument for difference-in-differences is available. We discuss the identifiability of these models, then develop efficient semi-parametric estimation approaches that allow the use of flexible, data-adaptive or machine learning methods to estimate the nuisance parameters. We apply our proposal on health care data to investigate the risk of moderate to severe weight gain under sulfonylurea treatment compared to metformin treatment, among new users of antihyperglycemic drugs.

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2206.10364 [pdf, other]

Nonparametric identification of causal effects in clustered observational studies with differential selection

Authors: Ting Ye, Ted Westling, Lindsay Page, Luke Keele

Abstract: The clustered observational study (COS) design is the observational study counterpart to the clustered randomized trial. In a COS, a treatment is assigned to intact groups, and all units within the group are exposed to the treatment. However, the treatment is non-randomly assigned. COSs are common in both education and health services research. In education, treatments may be given to all students within some schools but withheld from all students in other schools. In health studies, treatments may be applied to clusters such as hospitals or groups of patients treated by the same physician. In this manuscript, we study the identification of causal effects in clustered observational study designs. We focus on the prospect of differential selection of units to clusters, which occurs when the units' cluster selections depend on the clusters' treatment assignments. Extant work on COSs has made an implicit assumption that rules out the presence of differential selection. We derive the identification results for designs with differential selection and show that contexts with differential cluster selection require different adjustment sets than standard designs. We outline estimators for designs with and without differential selection. Using a series of simulations, we outline the magnitude of the bias that can occur with differential selection. We then present two empirical applications focusing on the likelihood of differential selection.

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.02792 [pdf, other]

FIFA: Making Fairness More Generalizable in Classifiers Trained on Imbalanced Data

Authors: Zhun Deng, Jiayao Zhang, Linjun Zhang, Ting Ye, Yates Coley, Weijie J. Su, James Zou

Abstract: Algorithmic fairness plays an important role in machine learning and imposing fairness constraints during learning is a common approach. However, many datasets are imbalanced in certain label classes (e.g. "healthy") and sensitive subgroups (e.g. "older patients"). Empirically, this imbalance leads to a lack of generalizability not only of classification, but also of fairness properties, especially in over-parameterized models. For example, fairness-aware training may ensure equalized odds (EO) on the training data, but EO is far from being satisfied on new users. In this paper, we propose a theoretically-principled, yet Flexible approach that is Imbalance-Fairness-Aware (FIFA). Specifically, FIFA encourages both classification and fairness generalization and can be flexibly combined with many existing fair learning methods with logits-based losses. While our main focus is on EO, FIFA can be directly applied to achieve equalized opportunity (EqOpt); and under certain conditions, it can also be applied to other fairness notions. We demonstrate the power of FIFA by combining it with a popular fair classification algorithm, and the resulting algorithm achieves significantly better fairness generalization on several real-world datasets.

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.10761 [pdf, other]

The Role of Placebo Samples in Observational Studies

Authors: Ting Ye, Shuxiao Chen, Bo Zhang

Abstract: In an observational study, it is common to leverage known null effect to detect bias. One such strategy is to set aside a placebo sample -- a subset of data immune from the hypothesized cause-and-effect relationship. Existence of an effect in the placebo sample raises concern of unmeasured confounding bias while absence of it corroborates the causal conclusion. This paper establishes a formal framework for using a placebo sample to detect and remove bias. We state identification assumption, and develop estimation and inference methods based on outcome regression, inverse probability weighting, and doubly-robust approaches. Simulation studies and an empirical application illustrate the finite-sample performance of the proposed methods.

Submitted 1 July, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

arXiv:2203.06887 [pdf, ps, other]

A Focusing Framework for Testing Bi-Directional Causal Effects with GWAS Summary Data

Authors: Sai Li, Ting Ye

Abstract: Mendelian randomization (MR) is a powerful method that uses genetic variants as instrumental variables (IVs) to infer the causal effect of a modifiable exposure on an outcome. Although recent years have seen many extensions of basic MR methods to be robust to certain violations of assumptions, few methods were proposed to infer bi-directional causal relationships, especially for phenotypes with limited biological understandings. The presence of horizontal pleiotropy adds another layer of complexity. In this article, we show that assumptions for common MR methods are often impossible or too stringent in the existence of bi-directional relationships. We then propose a new focusing framework for testing bi-directional causal effects between two traits with possibly pleiotropic genetic variants. Our proposal can be coupled with many state-of-art MR methods. We provide theoretical guarantees on the Type I error and power of the proposed methods. We demonstrate the robustness of the proposed methods using several simulated and real datasets.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 27 pages

arXiv:2203.04194 [pdf, ps, other]

Testing for Treatment Effect Twice Using Internal and External Controls in Clinical Trials

Authors: Yanyao Yi, Ying Zhang, Yu Du, Ting Ye

Abstract: Leveraging external controls -- relevant individual patient data under control from external trials or real-world data -- has the potential to reduce the cost of randomized controlled trials (RCTs) while increasing the proportion of trial patients given access to novel treatments. However, due to lack of randomization, RCT patients and external controls may differ with respect to covariates that may or may not have been measured. Hence, after controlling for measured covariates, for instance by matching, testing for treatment effect using external controls may still be subject to unmeasured biases. In this paper, we propose a sensitivity analysis approach to quantify the magnitude of unmeasured bias that would be needed to alter the study conclusion that presumed no unmeasured biases are introduced by employing external controls. Whether leveraging external controls increases power or not depends on the interplay between sample sizes and the magnitude of treatment effect and unmeasured biases, which may be difficult to anticipate. This motivates a combined testing procedure that performs two highly correlated analyses, one with and one without external controls, with a small correction for multiple testing using the joint distribution of the two test statistics. The combined test provides a new method of sensitivity analysis designed for data fusion problems, which anchors at the unbiased analysis based on RCT only and spends a small proportion of the type I error to also test using the external controls. In this way, if leveraging external controls increases power, the power gain compared to the analysis based on RCT only can be substantial; if not, the power loss is small. The proposed method is evaluated in theory and power calculations, and applied to a real trial.

Submitted 12 July, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

arXiv:2201.11948 [pdf, other]

Covariate-Adjusted Log-Rank Test: Guaranteed Efficiency Gain and Universal Applicability

Authors: Ting Ye, Jun Shao, Yanyao Yi

Abstract: Nonparametric covariate adjustment is considered for log-rank type tests of treatment effect with right-censored time-to-event data from clinical trials applying covariate-adaptive randomization. Our proposed covariate-adjusted log-rank test has a simple explicit formula and a guaranteed efficiency gain over the unadjusted test. We also show that our proposed test achieves universal applicability in the sense that the same formula of test can be universally applied to simple randomization and all commonly used covariate-adaptive randomization schemes such as the stratified permuted block and Pocock and Simon's minimization, which is not a property enjoyed by the unadjusted log-rank test. Our method is supported by novel asymptotic theory and empirical results for type I error and power of tests.

Submitted 19 January, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2112.04243 [pdf]

Hybrid Data-driven Framework for Shale Gas Production Performance Analysis via Game Theory, Machine Learning and Optimization Approaches

Authors: Jin Meng, Yujie Zhou, Tianrui Ye, Yitian Xiao

Abstract: A comprehensive and precise analysis of shale gas production performance is crucial for evaluating resource potential, designing field development plan, and making investment decisions. However, quantitative analysis can be challenging because production performance is dominated by a complex interaction among a series of geological and engineering factors. In this study, we propose a hybrid data-driven procedure for analyzing shale gas production performance, which consists of a complete workflow for dominant factor analysis, production forecast, and development optimization. More specifically, game theory and machine learning models are coupled to determine the dominating geological and engineering factors. The Shapley value with definite physical meanings is employed to quantitatively measure the effects of individual factors. A multi-model-fused stacked model is trained for production forecast, on the basis of which derivative-free optimization algorithms are introduced to optimize the development plan. The complete workflow is validated with actual production data collected from the Fuling shale gas field, Sichuan Basin, China. The validation results show that the proposed procedure can draw rigorous conclusions with quantified evidence and thereby provide specific and reliable suggestions for development plan optimization. Comparing with traditional and experience-based approaches, the hybrid data-driven procedure is advanced in terms of both efficiency and accuracy.

Submitted 7 June, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: 37 pages, 15 figures, 6 tables

arXiv:2109.10522 [pdf, other]

Minimax Rates and Adaptivity in Combining Experimental and Observational Data

Authors: Shuxiao Chen, Bo Zhang, Ting Ye

Abstract: Randomized controlled trials (RCTs) are the gold standard for evaluating the causal effect of a treatment; however, they often have limited sample sizes and sometimes poor generalizability. On the other hand, non-randomized, observational data derived from large administrative databases have massive sample sizes and better generalizability, but they are prone to unmeasured confounding bias. It is thus of considerable interest to reconcile effect estimates obtained from randomized controlled trials and observational studies investigating the same intervention, potentially harvesting the best from both realms. In this paper, we theoretically characterize the potential efficiency gain of integrating observational data into the RCT-based analysis from a minimax point of view. For estimation, we derive the minimax rate of convergence for the mean squared error, and propose a fully adaptive anchored thresholding estimator that attains the optimal rate up to poly-log factors. For inference, we characterize the minimax rate for the length of confidence intervals and show that adaptation (to unknown confounding bias) is in general impossible. A curious phenomenon thus emerges: for estimation, the efficiency gain from data integration can be achieved without prior knowledge on the magnitude of the confounding bias; for inference, the same task becomes information-theoretically impossible in general. We corroborate our theoretical findings using simulations and a real data example from the RCT DUPLICATE initiative [Franklin et al., 2021b].

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2107.06238 [pdf, other]

GENIUS-MAWII: For Robust Mendelian Randomization with Many Weak Invalid Instruments

Authors: Ting Ye, Zhonghua Liu, Baoluo Sun, Eric Tchetgen Tchetgen

Abstract: Mendelian randomization (MR) has become a popular approach to study causal effects by using genetic variants as instrumental variables. We propose a new MR method, GENIUS-MAWII, which simultaneously addresses the two salient phenomena that adversely affect MR analyses: many weak instruments and widespread horizontal pleiotropy. Similar to MR GENIUS (Tchetgen Tchetgen et al., 2021), we achieve identification of the treatment effect by leveraging heteroscedasticity of the exposure. We then derive the class of influence functions of the treatment effect, based on which, we construct a continuous updating estimator and establish its consistency and asymptotic normality under a many weak invalid instruments asymptotic regime by developing novel semiparametric theory. We also provide a measure of weak identification, an overidentification test, and a graphical diagnostic tool. We demonstrate in simulations that GENIUS-MAWII has clear advantages in the presence of directional or correlated horizontal pleiotropy compared to other methods. We apply our method to study the effect of body mass index on systolic blood pressure using UK Biobank.

Submitted 24 February, 2024; v1 submitted 13 July, 2021; originally announced July 2021.

arXiv:2106.14289 [pdf, ps, other]

Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization

Authors: Tian Ye, Simon S. Du

Abstract: We study the asymmetric low-rank factorization problem: \[\min_{\mathbf{U} \in \mathbb{R}^{m \times d}, \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2}\|\mathbf{U}\mathbf{V}^\top -\mathbf{\Sigma}\|_F^2\] where $\mathbf{\Sigma}$ is a given matrix of size $m \times n$ and rank $d$. This is a canonical problem that admits two difficulties in optimization: 1) non-convexity and 2) non-smoothness (due to unbalancedness of $\mathbf{U}$ and $\mathbf{V}$). This is also a prototype for more complex problems such as asymmetric matrix sensing and matrix completion. Despite being non-convex and non-smooth, it has been observed empirically that the randomly initialized gradient descent algorithm can solve this problem in polynomial time. Existing theories to explain this phenomenon all require artificial modifications of the algorithm, such as adding noise in each iteration and adding a balancing regularizer to balance the $\mathbf{U}$ and $\mathbf{V}$. This paper presents the first proof that shows randomly initialized gradient descent converges to a global minimum of the asymmetric low-rank factorization problem with a polynomial rate. For the proof, we develop 1) a new symmetrization technique to capture the magnitudes of the symmetry and asymmetry, and 2) a quantitative perturbation analysis to approximate matrix derivatives. We believe both are useful for other related non-convex problems.

Submitted 27 June, 2021; originally announced June 2021.
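
To make the objective in the abstract above concrete, the following is a minimal NumPy sketch of plain gradient descent on $\frac{1}{2}\|\mathbf{U}\mathbf{V}^\top - \mathbf{\Sigma}\|_F^2$ from a small random start, with no added noise or balancing regularizer. The function names, step size, iteration count, initialization scale, and the synthetic target matrix are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def make_target(m=6, n=5, d=2, seed=1):
    """Build a synthetic rank-d target matrix, scaled to spectral norm 1 (assumed setup)."""
    rng = np.random.default_rng(seed)
    sigma = rng.standard_normal((m, d)) @ rng.standard_normal((n, d)).T
    return sigma / np.linalg.norm(sigma, 2)

def loss(U, V, sigma):
    """Objective 0.5 * ||U V^T - sigma||_F^2."""
    return 0.5 * np.linalg.norm(U @ V.T - sigma, "fro") ** 2

def factorize(sigma, d, step=0.1, iters=5000, init_scale=0.1, seed=0):
    """Run unmodified gradient descent from a small Gaussian initialization."""
    rng = np.random.default_rng(seed)
    m, n = sigma.shape
    U = init_scale * rng.standard_normal((m, d))
    V = init_scale * rng.standard_normal((n, d))
    for _ in range(iters):
        R = U @ V.T - sigma  # residual matrix
        # Gradients are R V (w.r.t. U) and R^T U (w.r.t. V); update both at once.
        U, V = U - step * R @ V, V - step * R.T @ U
    return U, V
```

Calling `factorize(make_target(), d=2)` and comparing `loss` before and after gives a quick empirical check of the behavior the abstract describes; it says nothing about the convergence rate proved in the paper.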

arXiv:2105.01124 [pdf, other]

Combining Broad and Narrow Case Definitions in Matched Case-Control Studies: Firearms in the Home and Suicide Risk

Authors: Ting Ye, Kan Chen, Dylan S. Small

Abstract: Does having firearms in the home increase suicide risk? To test this hypothesis, a matched case-control study can be performed, in which suicide case subjects are compared to living controls who are similar in observed covariates in terms of their retrospective exposure to firearms at home. In this application, cases can be defined using a broad case definition (suicide) or a narrow case definition (suicide occurred at home). The broad case definition offers a larger number of cases but the narrow case definition may offer a larger effect size. Moreover, restricting to the narrow case definition may introduce selection bias (i.e., bias due to selecting samples based on characteristics affected by the treatment) because exposure to firearms in the home may affect the location of suicide and thus the type of a case a subject is. We propose a new sensitivity analysis framework for combining broad and narrow case definitions in matched case-control studies, that considers the unmeasured confounding bias and selection bias simultaneously. We develop a valid randomization-based testing procedure using only the narrow case matched sets when the effect of the unmeasured confounder on receiving treatment and the effect of the treatment on case definition among the always-cases are controlled by sensitivity parameters. We then use the Bonferroni method to combine the testing procedures using the broad and narrow case definitions. With the proposed methods, we find robust evidence that having firearms at home increases suicide risk.

Submitted 26 July, 2023; v1 submitted 3 May, 2021; originally announced May 2021.

arXiv:2012.06762 [pdf, ps, other]

Semiparametric causal mediation analysis with unmeasured mediator-outcome confounding

Authors: BaoLuo Sun, Ting Ye

Abstract: Although the exposure can be randomly assigned in studies of mediation effects, any form of direct intervention on the mediator is often infeasible. As a result, unmeasured mediator-outcome confounding can seldom be ruled out. We propose semiparametric identification of natural direct and indirect effects in the presence of unmeasured mediator-outcome confounding by leveraging heteroskedasticity restrictions on the observed data law. For inference, we develop semiparametric estimators that remain consistent under partial misspecifications of the observed data model. We illustrate the proposed estimators through both simulations and an application to evaluate the effect of self-efficacy on fatigue among health care workers during the COVID-19 outbreak.

Submitted 29 September, 2021; v1 submitted 12 December, 2020; originally announced December 2020.

Comments: 26 pages, 1 figure

arXiv:2011.06917 [pdf, other]

Social Distancing and COVID-19: Randomization Inference for a Structured Dose-Response Relationship

Authors: Bo Zhang, Siyu Heng, Ting Ye, Dylan S. Small

Abstract: Social distancing is widely acknowledged as an effective public health policy combating the novel coronavirus. But extreme social distancing has costs and it is not clear how much social distancing is needed to achieve public health effects. In this article, we develop a design-based framework to make inference about the dose-response relationship between social distancing and COVID-19 related death toll and case numbers. We first discuss how to embed observational data with a time-independent, continuous treatment dose into an approximate randomized experiment, and develop a randomization-based procedure that tests if a structured dose-response relationship fits the data. We then generalize the design and testing procedure to accommodate a time-dependent, treatment dose trajectory, and generalize a dose-response relationship to a longitudinal setting. Finally, we apply the proposed design and testing procedures to investigate the effect of social distancing during the phased reopening in the United States on public health outcomes using data compiled from sources including Unacast, the United States Census Bureau, and the County Health Rankings and Roadmaps Program. We rejected a primary analysis null hypothesis that stated the social distancing from April 27, 2020, to June 28, 2020, had no effect on the COVID-19-related death toll from June 29, 2020, to August 2, 2020 (p-value < 0.001), and found that it took more reduction in mobility to prevent exponential growth in case numbers for non-rural counties compared to rural counties.

Submitted 9 August, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

arXiv:2011.03593 [pdf, other]

Instrumented Difference-in-Differences

Authors: Ting Ye, Ashkan Ertefaie, James Flory, Sean Hennessy, Dylan S. Small

Abstract: Unmeasured confounding is a key threat to reliable causal inference based on observational studies. Motivated from two powerful natural experiment devices, the instrumental variables and difference-in-differences, we propose a new method called instrumented difference-in-differences that explicitly leverages exogenous randomness in an exposure trend to estimate the average and conditional average treatment effect in the presence of unmeasured confounding. We develop the identification assumptions using the potential outcomes framework. We propose a Wald estimator and a class of multiply robust and efficient semiparametric estimators, with provable consistency and asymptotic normality. In addition, we extend the instrumented difference-in-differences to a two-sample design to facilitate investigations of delayed treatment effect and provide a measure of weak identification. We demonstrate our results in simulated and real datasets.

Submitted 7 November, 2021; v1 submitted 6 November, 2020; originally announced November 2020.
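
As a rough illustration of the Wald estimator mentioned in the abstract above, the sketch below assumes it takes the usual ratio form: the difference-in-differences of the outcome across instrument groups divided by the difference-in-differences of the exposure. That form, the function names, and the cell means are assumptions made for illustration; the abstract itself does not spell out the estimator.

```python
def did(means):
    """Difference-in-differences of cell means keyed by (z, t), z = instrument, t = period."""
    trend_z1 = means[(1, 1)] - means[(1, 0)]
    trend_z0 = means[(0, 1)] - means[(0, 0)]
    return trend_z1 - trend_z0

def wald_idid(outcome_means, exposure_means):
    """Assumed Wald-type ratio: outcome DID divided by exposure DID."""
    denom = did(exposure_means)
    if denom == 0:
        raise ValueError("the instrument does not shift the exposure trend")
    return did(outcome_means) / denom

# Hypothetical cell means, invented purely for this example.
outcome_means = {(1, 0): 10.0, (1, 1): 13.0, (0, 0): 9.0, (0, 1): 10.0}
exposure_means = {(1, 0): 0.2, (1, 1): 0.7, (0, 0): 0.2, (0, 1): 0.3}
estimate = wald_idid(outcome_means, exposure_means)  # about (3 - 1) / (0.5 - 0.1)
```

A denominator close to zero corresponds to the weak-identification situation the abstract refers to, which is why the sketch refuses to divide by an exactly zero exposure trend difference.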

arXiv:2009.14484 [pdf, other]

On Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) and Estimation for Causal Inference

Authors: Zhonghua Liu, Ting Ye, Baoluo Sun, Mary Schooling, Eric Tchetgen Tchetgen

Abstract: Standard Mendelian randomization analysis can produce biased results if the genetic variant defining the instrumental variable (IV) is confounded and/or has a horizontal pleiotropic effect on the outcome of interest not mediated by the treatment. We provide novel identification conditions for the causal effect of a treatment in presence of unmeasured confounding, by leveraging an invalid IV for which both the IV independence and exclusion restriction assumptions may be violated. The proposed Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) approach relies on (i) an assumption that the treatment effect does not vary with the invalid IV on the additive scale; and (ii) that the selection bias due to confounding does not vary with the invalid IV on the odds ratio scale; and (iii) that the residual variance for the outcome is heteroscedastic and thus varies with the invalid IV. We formally establish that their conjunction can identify a causal effect even with an invalid IV subject to pleiotropy. MiSTERI is shown to be particularly advantageous in presence of pervasive heterogeneity of pleiotropic effects on additive scale, a setting in which two recently proposed robust estimation methods MR GxE and MR GENIUS can be severely biased.
In order to incorporate multiple, possibly correlated and weak IVs, a common challenge in MR studies, we develop a MAny Weak Invalid Instruments (MR MaWII MiSTERI) approach for strengthened identification and improved accuracy. MaWII MiSTERI is shown to be robust to horizontal pleiotropy, violation of IV independence assumption and weak IV bias. Both simulation studies and real data analysis results demonstrate the robustness of the proposed MR MiSTERI methods.

Submitted 29 March, 2021; v1 submitted 30 September, 2020; originally announced September 2020.

Comments: 2 figures, 22 pages

arXiv:2009.11828 [pdf, other]

Toward Better Practice of Covariate Adjustment in Analyzing Randomized Clinical Trials

Authors: Ting Ye, Jun Shao, Yanyao Yi, Qingyuan Zhao

Abstract: In randomized clinical trials, adjustments for baseline covariates at both design and analysis stages are highly encouraged by regulatory agencies. A recent trend is to use a model-assisted approach for covariate adjustment to gain credibility and efficiency while producing asymptotically valid inference even when the model is incorrect. In this article we present three considerations for better practice when model-assisted inference is applied to adjust for covariates under simple or covariate-adaptive randomized trials: (1) guaranteed efficiency gain: a model-assisted method should often gain but never hurt efficiency; (2) wide applicability: a valid procedure should be applicable, and preferably universally applicable, to all commonly used randomization schemes; (3) robust standard error: variance estimation should be robust to model misspecification and heteroscedasticity. To achieve these, we recommend a model-assisted estimator under an analysis of heterogeneous covariance working model including all covariates utilized in randomization. Our conclusions are based on an asymptotic theory that provides a clear picture of how covariate-adaptive randomization and regression adjustment alter statistical efficiency. Our theory is more general than the existing ones in terms of studying arbitrary functions of response means (including linear contrasts, ratios, and odds ratios), multiple arms, guaranteed efficiency gain, optimality, and universal applicability.

Submitted 13 July, 2021; v1 submitted 24 September, 2020; originally announced September 2020.

arXiv:2008.07364 [pdf, other]

Predicting Individual Treatment Effects of Large-scale Team Competitions in a Ride-sharing Economy

Authors: Teng Ye, Wei Ai, Lingyu Zhang, Ning Luo, Lulu Zhang, Jie** Ye, Qiaozhu Mei

Abstract: Millions of drivers worldwide have enjoyed financial benefits and work schedule flexibility through a ride-sharing economy, but meanwhile they have suffered from the lack of a sense of identity and career achievement. Equipped with social identity and contest theories, financially incentivized team competitions have been an effective instrument to increase drivers' productivity, job satisfaction, and retention, and to improve revenue over cost for ride-sharing platforms. While these competitions are overall effective, the decisive factors behind the treatment effects and how they affect the outcomes of individual drivers have been largely mysterious. In this study, we analyze data collected from more than 500 large-scale team competitions organized by a leading ride-sharing platform, building machine learning models to predict individual treatment effects. Through a careful investigation of features and predictors, we are able to reduce out-sample prediction error by more than 24%. Through interpreting the best-performing models, we discover many novel and actionable insights regarding how to optimize the design and the execution of team competitions on ride-sharing platforms. A simulated analysis demonstrates that by simply changing a few contest design options, the average treatment effect of a real competition is expected to increase by as much as 26%. Our procedure and findings shed light on how to analyze and optimize large-scale online field experiments in general.

Submitted 7 August, 2020; originally announced August 2020.

Comments: Accepted to KDD 2020

arXiv:2007.09576 [pdf, other]

Inference on Average Treatment Effect under Minimization and Other Covariate-Adaptive Randomization Methods

Authors: Ting Ye, Yanyao Yi, Jun Shao

Abstract: Covariate-adaptive randomization schemes such as the minimization and stratified permuted blocks are often applied in clinical trials to balance treatment assignments across prognostic factors. The existing theoretical developments on inference after covariate-adaptive randomization are mostly limited to situations where a correct model between the response and covariates can be specified or the randomization method has well-understood properties. Based on stratification with covariate levels utilized in randomization and a further adjusting for covariates not used in randomization, in this article we propose several estimators for model free inference on average treatment effect defined as the difference between response means under two treatments. We establish asymptotic normality of the proposed estimators under all popular covariate-adaptive randomization schemes including the minimization whose theoretical property is unclear, and we show that the asymptotic distributions are invariant with respect to covariate-adaptive randomization methods. Consistent variance estimators are constructed for asymptotic inference. Asymptotic relative efficiencies and finite sample properties of estimators are also studied. We recommend using one of our proposed estimators for valid and model free inference after covariate-adaptive randomization.

Submitted 18 July, 2020; originally announced July 2020.

arXiv:2007.06772 [pdf, other]

Bridging preference-based instrumental variable studies and cluster-randomized encouragement experiments: study design, noncompliance, and average cluster effect ratio

Authors: Bo Zhang, Siyu Heng, Emily J. MacKay, Ting Ye

Abstract: Instrumental variable methods are widely used in medical and social science research to draw causal conclusions when the treatment and outcome are confounded by unmeasured confounding variables. One important feature of such studies is that the instrumental variable is often applied at the cluster level, e.g., hospitals' or physicians' preference for a certain treatment where each hospital or physician naturally defines a cluster. This paper proposes to embed such observational instrumental variable data into a cluster-randomized encouragement experiment using statistical matching. Potential outcomes and causal assumptions underpinning the design are formalized and examined. Testing procedures for two commonly-used estimands, Fisher's sharp null hypothesis and the pooled effect ratio, are extended to the current setting. We then introduce a novel cluster-heterogeneous proportional treatment effect model and the relevant estimand: the average cluster effect ratio. This new estimand is advantageous over the structural parameter in a constant proportional treatment effect model in that it allows treatment heterogeneity, and is advantageous over the pooled effect ratio estimand in that it is immune to Simpson's paradox. We develop an asymptotically valid randomization-based testing procedure for this new estimand based on solving a mixed integer quadratically-constrained optimization problem.
The proposed design and inferential methods are applied to a study of the effect of using transesophageal echocardiography during CABG surgery on patients' 30-day mortality rate.

Submitted 23 May, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

arXiv:2006.02423 [pdf, other]

A Negative Correlation Strategy for Bracketing in Difference-in-Differences

Authors: Ting Ye, Luke Keele, Raiden Hasegawa, Dylan S. Small

Abstract: The method of difference-in-differences (DID) is widely used to study the causal effect of policy interventions in observational studies. DID employs a before and after comparison of the treated and control units to remove bias due to time-invariant unmeasured confounders under the parallel trends assumption. Estimates from DID, however, will be biased if the outcomes for the treated and control units evolve differently in the absence of treatment, namely if the parallel trends assumption is violated. We propose a general identification strategy that leverages two groups of control units whose outcomes relative to the treated units exhibit a negative correlation, and achieves partial identification of the average treatment effect for the treated. The identified set is of a union bounds form that involves the minimum and maximum operators, which makes the canonical bootstrap generally inconsistent and naive methods overly conservative. By utilizing the directional inconsistency of the bootstrap distribution, we develop a novel bootstrap method to construct uniformly valid confidence intervals for the identified set and parameter of interest when the identified set is of a union bounds form, and we establish the method's theoretical properties. We develop a simple falsification test and sensitivity analysis. We apply the proposed strategy for bracketing to study whether minimum wage laws affect employment levels.

Submitted 13 June, 2022; v1 submitted 3 June, 2020; originally announced June 2020.
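
The bracketing idea in the abstract above can be pictured with a small arithmetic sketch: compute one DID estimate per control group and take the minimum and maximum as the ends of the bracket. This is only a point-estimate illustration under the assumption that the two control groups do bracket the treated group's counterfactual trend. The group means and function names are hypothetical, and the sketch does not implement the paper's bootstrap confidence intervals.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Standard 2x2 difference-in-differences from four group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)

def bracket(treated, control_a, control_b):
    """Return (lower, upper) from the two DID estimates; each argument is a (pre, post) pair."""
    est_a = did_estimate(*treated, *control_a)
    est_b = did_estimate(*treated, *control_b)
    return min(est_a, est_b), max(est_a, est_b)

# Hypothetical group means, invented for illustration.
treated = (10.0, 15.0)    # treated units: trend of 5
control_a = (8.0, 11.0)   # control group A: trend of 3
control_b = (12.0, 13.0)  # control group B: trend of 1
lower, upper = bracket(treated, control_a, control_b)
```

Because the bounds come from `min` and `max` of estimated quantities, their sampling distribution is irregular, which is the difficulty the abstract attributes to the canonical bootstrap.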

arXiv:1911.09802 [pdf, other]

Debiased Inverse-Variance Weighted Estimator in Two-Sample Summary-Data Mendelian Randomization

Authors: Ting Ye, Jun Shao, Hyunseung Kang

Abstract: Mendelian randomization (MR) has become a popular approach to study the effect of a modifiable exposure on an outcome by using genetic variants as instrumental variables. A challenge in MR is that each genetic variant explains a relatively small proportion of variance in the exposure and there are many such variants, a setting known as many weak instruments. To this end, we provide a theoretical characterization of the statistical properties of two popular estimators in MR, the inverse-variance weighted (IVW) estimator and the IVW estimator with screened instruments using an independent selection dataset, under many weak instruments. We then propose a debiased IVW estimator, a simple modification of the IVW estimator, that is robust to many weak instruments and doesn't require screening. Additionally, we present two instrument selection methods to improve the efficiency of the new estimator when a selection dataset is available. An extension of the debiased IVW estimator to handle balanced horizontal pleiotropy is also discussed. We conclude by demonstrating our results in simulated and real datasets.

Submitted 10 October, 2020; v1 submitted 21 November, 2019; originally announced November 2019.
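
The contrast between the IVW estimator and a debiased variant, as described in the abstract above, can be sketched on simulated summary statistics. The sketch assumes the debiasing step subtracts the exposure standard error squared from each squared SNP-exposure estimate in the IVW denominator. That form, the function names, and every simulation setting (number of variants, effect size, standard errors) are assumptions for illustration and are not stated in the abstract.

```python
import numpy as np

def ivw(gamma_hat, Gamma_hat, se_Y):
    """Inverse-variance weighted estimate from SNP-exposure and SNP-outcome estimates."""
    w = 1.0 / se_Y**2
    return np.sum(w * gamma_hat * Gamma_hat) / np.sum(w * gamma_hat**2)

def debiased_ivw(gamma_hat, Gamma_hat, se_X, se_Y):
    """Assumed debiased form: remove se_X^2 from each squared exposure estimate."""
    w = 1.0 / se_Y**2
    return np.sum(w * gamma_hat * Gamma_hat) / np.sum(w * (gamma_hat**2 - se_X**2))

def simulate(p=5000, beta=0.5, se=0.02, seed=0):
    """Simulate p weak instruments whose true effects are comparable in size to their noise."""
    rng = np.random.default_rng(seed)
    gamma = rng.normal(0.0, 0.02, p)                 # true SNP-exposure effects
    se_X = np.full(p, se)
    se_Y = np.full(p, se)
    gamma_hat = gamma + rng.normal(0.0, se_X)        # noisy exposure estimates
    Gamma_hat = beta * gamma + rng.normal(0.0, se_Y) # noisy outcome estimates
    return gamma_hat, Gamma_hat, se_X, se_Y
```

With settings like these, the squared noise in `gamma_hat` inflates the IVW denominator and pulls the estimate toward zero, while the corrected denominator removes that inflation on average.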

Showing 1–31 of 31 results for author: Ye, T