Search | arXiv e-print repository

Graph Machine Learning based Doubly Robust Estimator for Network Causal Effects

Authors: Seyedeh Baharan Khatami, Harsh Parikh, Haowei Chen, Sudeepa Roy, Babak Salimi

Abstract: We address the challenge of inferring causal effects in social network data. This results in challenges due to interference -- where a unit's outcome is affected by neighbors' treatments -- and network-induced confounding factors. While there is extensive literature focusing on estimating causal effects in social network setups, a majority of them make prior assumptions about the form of network-i… ▽ More We address the challenge of inferring causal effects in social network data. This results in challenges due to interference -- where a unit's outcome is affected by neighbors' treatments -- and network-induced confounding factors. While there is extensive literature focusing on estimating causal effects in social network setups, a majority of them make prior assumptions about the form of network-induced confounding mechanisms. Such strong assumptions are rarely likely to hold especially in high-dimensional networks. We propose a novel methodology that combines graph machine learning approaches with the double machine learning framework to enable accurate and efficient estimation of direct and peer effects using a single observational social network. We demonstrate the semiparametric efficiency of our proposed estimator under mild regularity conditions, allowing for consistent uncertainty quantification. We demonstrate that our method is accurate, robust, and scalable via an extensive simulation study. We use our method to investigate the impact of Self-Help Group participation on financial risk tolerance. △ Less

Submitted 31 May, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2402.17042 [pdf, ps, other]

Towards Generalizing Inferences from Trials to Target Populations

Authors: Melody Y Huang, Harsh Parikh

Abstract: Randomized Controlled Trials (RCTs) are pivotal in generating internally valid estimates with minimal assumptions, serving as a cornerstone for researchers dedicated to advancing causal inference methods. However, extending these findings beyond the experimental cohort to achieve externally valid estimates is crucial for broader scientific inquiry. This paper delves into the forefront of addressin… ▽ More Randomized Controlled Trials (RCTs) are pivotal in generating internally valid estimates with minimal assumptions, serving as a cornerstone for researchers dedicated to advancing causal inference methods. However, extending these findings beyond the experimental cohort to achieve externally valid estimates is crucial for broader scientific inquiry. This paper delves into the forefront of addressing these external validity challenges, encapsulating the essence of a multidisciplinary workshop held at the Institute for Computational and Experimental Research in Mathematics (ICERM), Brown University, in Fall 2023. The workshop congregated experts from diverse fields including social science, medicine, public health, statistics, computer science, and education, to tackle the unique obstacles each discipline faces in extrapolating experimental findings. Our study presents three key contributions: we integrate ongoing efforts, highlighting methodological synergies across fields; provide an exhaustive review of generalizability and transportability based on the workshop's discourse; and identify persistent hurdles while suggesting avenues for future research. By doing so, this paper aims to enhance the collective understanding of the generalizability and transportability of causal effects, fostering cross-disciplinary collaboration and offering valuable insights for researchers working on refining and applying causal inference methods. △ Less

Submitted 24 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2401.14512 [pdf, other]

Who Are We Missing? A Principled Approach to Characterizing the Underrepresented Population

Authors: Harsh Parikh, Rachael Ross, Elizabeth Stuart, Kara Rudolph

Abstract: Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending inferences to target populations presents challenges due to effect heterogeneity and underrepresentation. Our paper addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve gener… ▽ More Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending inferences to target populations presents challenges due to effect heterogeneity and underrepresentation. Our paper addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, ensuring more precise treatment effect estimations. Notably, ROOT generates interpretable characteristics of the underrepresented population, aiding researchers in effective communication. Our approach demonstrates improved precision and interpretability compared to alternatives, as illustrated with synthetic data experiments. We apply our methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial -- investigating the effectiveness of medication for opioid use disorder -- to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations using ROOT, our framework offers a systematic approach to enhance decision-making accuracy and inform future trials in diverse populations. △ Less

Submitted 10 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: MOUD Analysis Included

arXiv:2312.10569 [pdf, other]

Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data

Authors: Srikar Katta, Harsh Parikh, Cynthia Rudin, Alexander Volfovsky

Abstract: Many modern causal questions ask how treatments affect complex outcomes that are measured using wearable devices and sensors. Current analysis approaches require summarizing these data into scalar statistics (e.g., the mean), but these summaries can be misleading. For example, disparate distributions can have the same means, variances, and other statistics. Researchers can overcome the loss of inf… ▽ More Many modern causal questions ask how treatments affect complex outcomes that are measured using wearable devices and sensors. Current analysis approaches require summarizing these data into scalar statistics (e.g., the mean), but these summaries can be misleading. For example, disparate distributions can have the same means, variances, and other statistics. Researchers can overcome the loss of information by instead representing the data as distributions. We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making: Analyzing Distributional Data via Matching After Learning to Stretch (ADD MALTS). We (i) provide analytical guarantees of the correctness of our estimation strategy, (ii) demonstrate via simulation that ADD MALTS outperforms other distributional data analysis methods at estimating treatment effects, and (iii) illustrate ADD MALTS' ability to verify whether there is enough cohesion between treatment and control units within subpopulations to trustworthily estimate treatment effects. We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks. △ Less

Submitted 20 March, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2310.15333 [pdf, other]

Safe and Interpretable Estimation of Optimal Treatment Regimes

Authors: Harsh Parikh, Quinn Lanners, Zade Akras, Sahar F. Zafar, M. Brandon Westover, Cynthia Rudin, Alexander Volfovsky

Abstract: Recent statistical and reinforcement learning methods have significantly advanced patient care strategies. However, these approaches face substantial challenges in high-stakes contexts, including missing data, inherent stochasticity, and the critical requirements for interpretability and patient safety. Our work operationalizes a safe and interpretable framework to identify optimal treatment regim… ▽ More Recent statistical and reinforcement learning methods have significantly advanced patient care strategies. However, these approaches face substantial challenges in high-stakes contexts, including missing data, inherent stochasticity, and the critical requirements for interpretability and patient safety. Our work operationalizes a safe and interpretable framework to identify optimal treatment regimes. This approach involves matching patients with similar medical and pharmacological characteristics, allowing us to construct an optimal policy via interpolation. We perform a comprehensive simulation study to demonstrate the framework's ability to identify optimal policies even in complex settings. Ultimately, we operationalize our approach to study regimes for treating seizures in critically ill patients. Our findings strongly support personalized treatment strategies based on a patient's medical history and pharmacological features. Notably, we identify that reducing medication doses for patients with mild and brief seizure episodes while adopting aggressive treatment for patients in intensive care unit experiencing intense seizures leads to more favorable outcomes. △ Less

Submitted 1 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted for publication in the proceedings of AISTATS 2025

arXiv:2307.01449 [pdf, other]

A Double Machine Learning Approach to Combining Experimental and Observational Data

Authors: Harsh Parikh, Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky

Abstract: Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only on… ▽ More Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one of these assumptions is violated, we provide semiparametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. Through comparative analyses, we show our framework's superiority over existing data fusion methods. The practical utility of our approach is further exemplified by three real-world case studies, underscoring its potential for widespread application in empirical research. △ Less

Submitted 2 April, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

arXiv:2302.11715 [pdf, other]

Variable Importance Matching for Causal Inference

Authors: Quinn Lanners, Harsh Parikh, Alexander Volfovsky, Cynthia Rudin, David Page

Abstract: Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the… ▽ More Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the matched groups to estimate treatment effects. Model-to-Match uses variable importance measurements to construct a distance metric, making it a flexible framework that can be adapted to various applications. Concentrating on the scalability of the problem in the number of potential confounders, we operationalize the Model-to-Match framework with LASSO. We derive performance guarantees for settings where LASSO outcome modeling consistently identifies all confounders (importantly without requiring the linear model to be correctly specified). We also provide experimental results demonstrating the method's auditability, accuracy, and scalability as well as extensions to more general nonparametric outcome modeling. △ Less

Submitted 28 June, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Journal ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1174-1184, 2023

arXiv:2211.01575 [pdf, other]

Are Synthetic Control Weights Balancing Score?

Authors: Harsh Parikh

Abstract: In this short note, I outline conditions under which conditioning on Synthetic Control (SC) weights emulates a randomized control trial where the treatment status is independent of potential outcomes. Specifically, I demonstrate that if there exist SC weights such that (i) the treatment effects are exactly identified and (ii) these weights are uniformly and cumulatively bounded, then SC weights ar… ▽ More In this short note, I outline conditions under which conditioning on Synthetic Control (SC) weights emulates a randomized control trial where the treatment status is independent of potential outcomes. Specifically, I demonstrate that if there exist SC weights such that (i) the treatment effects are exactly identified and (ii) these weights are uniformly and cumulatively bounded, then SC weights are balancing scores. △ Less

Submitted 2 November, 2022; originally announced November 2022.

Comments: 2 pages, 2 figures

arXiv:2203.04920 [pdf]

doi 10.1016/S2589-7500(23)00088-2

Effects of Epileptiform Activity on Discharge Outcome in Critically Ill Patients

Authors: Harsh Parikh, Kentaro Hoffman, Haoqi Sun, Wendong Ge, ** **g, Rajesh Amerineni, Lin Liu, Jimeng Sun, Sahar Zafar, Aaron Struck, Alexander Volfovsky, Cynthia Rudin, M. Brandon Westover

Abstract: Epileptiform activity (EA) is associated with worse outcomes including increased risk of disability and death. However, the effect of EA on the neurologic outcome is confounded by the feedback between treatment with anti-seizure medications (ASM) and EA burden. A randomized clinical trial is challenging due to the sequential nature of EA-ASM feedback, as well as ethical reasons. However, some mech… ▽ More Epileptiform activity (EA) is associated with worse outcomes including increased risk of disability and death. However, the effect of EA on the neurologic outcome is confounded by the feedback between treatment with anti-seizure medications (ASM) and EA burden. A randomized clinical trial is challenging due to the sequential nature of EA-ASM feedback, as well as ethical reasons. However, some mechanistic knowledge is available, e.g., how drugs are absorbed. This knowledge together with observational data could provide a more accurate effect estimate using causal inference. We performed a retrospective cross-sectional study with 995 patients with the modified Rankin Scale (mRS) at discharge as the outcome and the EA burden defined as the mean or maximum proportion of time spent with EA in six-hour windows in the first 24 hours of electroencephalography as the exposure. We estimated the change in discharge mRS if everyone in the dataset had experienced a certain EA burden and were untreated. We combined pharmacological modeling with an interpretable matching method to account for confounding and EA-ASM feedback. Our matched groups' quality was validated by the neurologists. Having a maximum EA burden greater than 75% when untreated had a 22% increased chance of a poor outcome (severe disability or death), and mild but long-lasting EA increased the risk of a poor outcome by 14%. The effect sizes were heterogeneous depending on pre-admission profile, e.g., patients with hypoxic-ischemic encephalopathy (HIE) or acquired brain injury (ABI) were more affected. Interventions should put a higher priority on patients with an average EA burden higher than 10%, while treatment should be more conservative when the maximum EA burden is low. △ Less

Submitted 11 March, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: 4 Figures

arXiv:2202.04208 [pdf, other]

Validating Causal Inference Methods

Authors: Harsh Parikh, Carlos Varjao, Louise Xu, Eric Tchetgen Tchetgen

Abstract: The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based… ▽ More The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based methods, and doubly robust methods. Unfortunately for applied researchers, there is no `one-size-fits-all' causal method that can perform optimally universally. In practice, causal methods are primarily evaluated quantitatively on handcrafted simulated data. Such data-generative procedures can be of limited value because they are typically stylized models of reality. They are simplified for tractability and lack the complexities of real-world data. For applied researchers, it is critical to understand how well a method performs for the data at hand. Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. The framework's novelty stems from its ability to generate synthetic data anchored at the empirical distribution for the observed sample, and therefore virtually indistinguishable from the latter. The approach allows the user to specify ground truth for the form and magnitude of causal effects and confounding bias as functions of covariates. Thus simulated data sets are used to evaluate the potential performance of various causal estimation methods when applied to data similar to the observed sample. We demonstrate Credence's ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and two real-world data applications from Lalonde and Project STAR studies. △ Less

Submitted 29 July, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: 5 figures, 13 pages

Journal ref: PMLR 162:17346-17358, 2022

arXiv:2105.10591 [pdf, other]

Heterogeneous Treatment Effects in Social Networks

Authors: Amir Gilad, Harsh Parikh, Sudeepa Roy, Babak Salimi

Abstract: We study treatment effect modifiers for causal analysis in a social network, where neighbors' characteristics or network structure may affect the outcome of a unit, and the goal is to identify sub-populations with varying treatment effects using such network properties. We propose a novel framework for this purpose that facilitates data-driven decision making by testing hypotheses about complex ef… ▽ More We study treatment effect modifiers for causal analysis in a social network, where neighbors' characteristics or network structure may affect the outcome of a unit, and the goal is to identify sub-populations with varying treatment effects using such network properties. We propose a novel framework for this purpose that facilitates data-driven decision making by testing hypotheses about complex effect modifiers in terms of network features or network patterns (e.g., characteristics of neighbors of a unit or belonging to a triangle), and by identifying sub-populations for which a treatment is likely to be effective or harmful. We describe a hypothesis testing approach that accounts for a unit's covariates, their neighbors' covariates, and patterns in the social network, and devise an algorithm incorporating ideas from causal inference, hypothesis testing, and graph theory to verify a hypothesized effect modifier. In addition, we develop a novel algorithm for the discovery of network patterns that are potential effect modifiers. We perform extensive experimental evaluations with a real development economics dataset about the treatment effect of belonging to a financial support network called self-help groups on risk tolerance, and also with a synthetic dataset with known ground truths simulating a vaccine efficacy trial, to evaluate our framework and algorithms. △ Less

Submitted 7 November, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

arXiv:2004.03644 [pdf, other]

Causal Relational Learning

Authors: Babak Salimi, Harsh Parikh, Moe Kayali, Sudeepa Roy, Lise Getoor, Dan Suciu

Abstract: Causal inference is at the heart of empirical research in natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; unfortunately these are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational… ▽ More Causal inference is at the heart of empirical research in natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; unfortunately these are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational data have been developed in statistical studies and social sciences. However, existing methods critically rely on restrictive assumptions such as the study population consisting of homogeneous elements that can be represented in a single flat table, where each row is referred to as a unit. In contrast, in many real-world settings, the study domain naturally consists of heterogeneous elements with complex relational structure, where the data is naturally represented in multiple related tables. In this paper, we present a formal framework for causal inference from such relational data. We propose a declarative language called CaRL for capturing causal background knowledge and assumptions and specifying causal queries using simple Datalog-like rules.CaRL provides a foundation for inferring causality and reasoning about the effect of complex interventions in relational domains. We present an extensive experimental evaluation on real relational data to illustrate the applicability of CaRL in social sciences and healthcare. △ Less

Submitted 7 April, 2020; originally announced April 2020.

arXiv:1811.07415 [pdf, other]

MALTS: Matching After Learning to Stretch

Authors: Harsh Parikh, Cynthia Rudin, Alexander Volfovsky

Abstract: We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metr… ▽ More We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects. △ Less

Submitted 7 June, 2023; v1 submitted 18 November, 2018; originally announced November 2018.

Comments: 40 pages, 5 Tables, 12 Figures

Journal ref: Journal.of.Machine.Learning.Research 23(240) (2022) 1-42

Showing 1–13 of 13 results for author: Parikh, H