Search | arXiv e-print repository

Neyman Meets Causal Machine Learning: Experimental Evaluation of Individualized Treatment Rules

Authors: Michael Lingzhi Li, Kosuke Imai

Abstract: A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today's scientists across disciplines. In this paper, we demonstrate that Neyman's methodology can also be used to experimentally evaluate the efficacy of i… ▽ More A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today's scientists across disciplines. In this paper, we demonstrate that Neyman's methodology can also be used to experimentally evaluate the efficacy of individualized treatment rules (ITRs), which are derived by modern causal machine learning algorithms. In particular, we show how to account for additional uncertainty resulting from a training process based on cross-fitting. The primary advantage of Neyman's approach is that it can be applied to any ITR regardless of the properties of machine learning algorithms that are used to derive the ITR. We also show, somewhat surprisingly, that for certain metrics, it is more efficient to conduct this ex-post experimental evaluation of an ITR than to conduct an ex-ante experimental evaluation that randomly assigns some units to the ITR. Our analysis demonstrates that Neyman's repeated sampling framework is as relevant for causal inference today as it has been since its inception. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.12108 [pdf, other]

Does AI help humans make better decisions? A methodological framework for experimental evaluation

Authors: Eli Ben-Michael, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, Sooahn Shin

Abstract: The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI an alone. We introduce a new methodological framework that can be used to ans… ▽ More The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI an alone. We introduce a new methodological framework that can be used to answer experimentally this question with no additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with a human making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that AI recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that AI-alone decisions generally perform worse than human decisions with or without AI assistance. Finally, AI recommendations tend to impose cash bail on non-white arrestees more often than necessary when compared to white arrestees. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.07031 [pdf, other]

The Cram Method for Efficient Simultaneous Learning and Evaluation

Authors: Zeyang Jia, Kosuke Imai, Michael Lingzhi Li

Abstract: We introduce the "cram" method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning (ML) algorithm. In a single pass of batched data, the proposed method repeatedly trains an ML algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient tha… ▽ More We introduce the "cram" method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning (ML) algorithm. In a single pass of batched data, the proposed method repeatedly trains an ML algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting. The cram method also naturally accommodates online learning algorithms, making its implementation computationally efficient. To demonstrate the power of the cram method, we consider the standard policy learning setting where cramming is applied to the same data to both develop an individualized treatment rule (ITR) and estimate the average outcome that would result if the learned ITR were to be deployed. We show that under a minimal set of assumptions, the resulting crammed evaluation estimator is consistent and asymptotically normal. While our asymptotic results require a relatively weak stabilization condition of ML algorithm, we develop a simple, generic method that can be used with any policy learning algorithm to satisfy this condition. Our extensive simulation studies show that, when compared to sample-splitting, cramming reduces the evaluation standard error by more than 40% while improving the performance of learned policy. We also apply the cram method to a randomized clinical trial to demonstrate its applicability to real-world problems. Finally, we briefly discuss future extensions of the cram method to other learning and evaluation settings. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2312.03268 [pdf, other]

Design-based inference for generalized network experiments with stochastic interventions

Authors: Ambarish Chattopadhyay, Kosuke Imai, Jose R. Zubizarreta

Abstract: A growing number of scholars and data scientists are conducting randomized experiments to analyze causal relationships in network settings where units influence one another. A dominant methodology for analyzing these network experiments has been design-based, leveraging randomization of treatment assignment as the basis for inference. In this paper, we generalize this design-based approach so that… ▽ More A growing number of scholars and data scientists are conducting randomized experiments to analyze causal relationships in network settings where units influence one another. A dominant methodology for analyzing these network experiments has been design-based, leveraging randomization of treatment assignment as the basis for inference. In this paper, we generalize this design-based approach so that it can be applied to more complex experiments with a variety of causal estimands with different target populations. An important special case of such generalized network experiments is a bipartite network experiment, in which the treatment assignment is randomized among one set of units and the outcome is measured for a separate set of units. We propose a broad class of causal estimands based on stochastic intervention for generalized network experiments. Using a design-based approach, we show how to estimate the proposed causal quantities without bias, and develop conservative variance estimators. We apply our methodology to a randomized experiment in education where a group of selected students in middle schools are eligible for the anti-conflict promotion program, and the program participation is randomized within this group. In particular, our analysis estimates the causal effects of treating each student or his/her close friends, for different target populations in the network. We find that while the treatment improves the overall awareness against conflict among students, it does not significantly reduce the total number of conflicts. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.02467 [pdf, other]

Individualized Policy Evaluation and Learning under Clustered Network Interference

Authors: Yi Zhang, Kosuke Imai

Abstract: While there now exists a large literature on policy evaluation and learning, much of prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover e… ▽ More While there now exists a large literature on policy evaluation and learning, much of prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network interference (also known as partial interference) where clusters of units are sampled from a population and units may influence one another within each cluster. Unlike previous methods that impose strong restrictions on spillover effects, the proposed methodology only assumes a semiparametric structural model where each unit's outcome is an additive function of individual treatments within the cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology. △ Less

Submitted 4 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

arXiv:2310.07973 [pdf, other]

Statistical Performance Guarantee for Subgroup Identification with Generic Machine Learning

Authors: Michael Lingzhi Li, Kosuke Imai

Abstract: Across a wide array of disciplines, many researchers use machine learning (ML) algorithms to identify a subgroup of individuals who are likely to benefit from a treatment the most (``exceptional responders'') or those who are harmed by it. A common approach to this subgroup identification problem consists of two steps. First, researchers estimate the conditional average treatment effect (CATE) usi… ▽ More Across a wide array of disciplines, many researchers use machine learning (ML) algorithms to identify a subgroup of individuals who are likely to benefit from a treatment the most (``exceptional responders'') or those who are harmed by it. A common approach to this subgroup identification problem consists of two steps. First, researchers estimate the conditional average treatment effect (CATE) using an ML algorithm. Next, they use the estimated CATE to select those individuals who are predicted to be most affected by the treatment, either positively or negatively. Unfortunately, CATE estimates are often biased and noisy. In addition, utilizing the same data to both identify a subgroup and estimate its group average treatment effect results in a multiple testing problem. To address these challenges, we develop uniform confidence bands for estimation of the group average treatment effect sorted by generic ML algorithm (GATES). Using these uniform confidence bands, researchers can identify, with a statistical guarantee, a subgroup whose GATES exceeds a certain effect size, regardless of how this effect size is chosen. The validity of the proposed methodology depends solely on randomization of treatment and random sampling of units. Importantly, our method does not require modeling assumptions and avoids a computationally intensive resampling procedure. A simulation study shows that the proposed uniform confidence bands are reasonably informative and have an appropriate empirical coverage even when the sample size is as small as 100. We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders. △ Less

Submitted 20 December, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2307.08840 [pdf, other]

Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War

Authors: Zeyang Jia, Eli Ben-Michael, Kosuke Imai

Abstract: Algorithmic decisions and recommendations are used in many high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodo… ▽ More Algorithmic decisions and recommendations are used in many high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors. △ Less

Submitted 27 May, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2306.07521 [pdf, other]

doi 10.1126/sciadv.adl2524

Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods

Authors: Christopher T. Kenny, Cory McCartan, Shiro Kuriwaki, Tyler Simko, Kosuke Imai

Abstract: The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swap** algorithm implemented for the three previous Censuses. Our… ▽ More The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swap** algorithm implemented for the three previous Censuses. Our evaluation leverages the Noisy Measure File (NMF) as well as two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swap**. While the estimated errors for both TopDown and swap** algorithms are generally no greater than other sources of Census error, they can be relatively substantial for geographies with small total populations. △ Less

Submitted 10 February, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: 25 pages, 6 figures, 2 tables, plus appendices

Journal ref: Science advances, 10(18) (2024) eadl2524

arXiv:2306.01211 [pdf, other]

Priming bias versus post-treatment bias in experimental designs

Authors: Matthew Blackwell, Jacob R. Brown, Sophie Hill, Kosuke Imai, Teppei Yamamoto

Abstract: Conditioning on variables affected by treatment can induce post-treatment bias when estimating causal effects. Although this suggests that researchers should measure potential moderators before administering the treatment in an experiment, doing so may also bias causal effect estimation if the covariate measurement primes respondents to react differently to the treatment. This paper formally analy… ▽ More Conditioning on variables affected by treatment can induce post-treatment bias when estimating causal effects. Although this suggests that researchers should measure potential moderators before administering the treatment in an experiment, doing so may also bias causal effect estimation if the covariate measurement primes respondents to react differently to the treatment. This paper formally analyzes this trade-off between post-treatment and priming biases in three experimental designs that vary when moderators are measured: pre-treatment, post-treatment, or a randomized choice between the two. We derive nonparametric bounds for interactions between the treatment and the moderator under each design and show how to use substantive assumptions to narrow these bounds. These bounds allow researchers to assess the sensitivity of their empirical findings to either source of bias. We then apply the proposed methodology to a survey experiment on electoral messaging. △ Less

Submitted 28 June, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 32 pages (main text), 22 pages (supplementary materials), 5 figures

arXiv:2305.05833 [pdf, other]

A Statistical Model of Bipartite Networks: Application to Cosponsorship in the United States Senate

Authors: Adeline Lo, Santiago Olivella, Kosuke Imai

Abstract: Many networks in political and social research are bipartite, with edges connecting exclusively across two distinct types of nodes. A common example includes cosponsorship networks, in which legislators are connected indirectly through the bills they support. Yet most existing network models are designed for unipartite networks, where edges can arise between any pair of nodes. However, using a uni… ▽ More Many networks in political and social research are bipartite, with edges connecting exclusively across two distinct types of nodes. A common example includes cosponsorship networks, in which legislators are connected indirectly through the bills they support. Yet most existing network models are designed for unipartite networks, where edges can arise between any pair of nodes. However, using a unipartite network model to analyze bipartite networks, as often done in practice, can result in aggregation bias and artificially high-clustering -- a particularly insidious problem when studying the role groups play in network formation. To address these methodological problems, we develop a statistical model of bipartite networks theorized to be generated through group interactions by extending the popular mixed-membership stochastic blockmodel. Our model allows researchers to identify the groups of nodes, within each node type in the bipartite structure, that share common patterns of edge formation. The model also incorporates both node and dyad-level covariates as the predictors of group membership and of observed dyadic relations. We develop an efficient computational algorithm for fitting the model, and apply it to cosponsorship data from the United States Senate. We show that legislators in a Senate that was perfectly split along party lines were able to remain productive and pass major legislation by forming non-partisan, power-brokering coalitions that found common ground through their collaboration on low-stakes bills. We also find evidence for norms of reciprocity, and uncover the substantial role played by policy expertise in the formation of cosponsorships between senators and legislation. We make an open-source software package available that makes it possible for other researchers to uncover similar insights from bipartite networks. △ Less

Submitted 27 June, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 41 pages (main text), 6 pages (appendix), 19 pages (online SI)

arXiv:2303.02580 [pdf, other]

Estimating Racial Disparities When Race is Not Observed

Authors: Cory McCartan, Robin Fisher, Jacob Goldin, Daniel E. Ho, Kosuke Imai

Abstract: The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census da… ▽ More The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census data to predict race. Unfortunately, the residuals of BISG are often correlated with the outcomes of interest, generally attenuating estimates of racial disparities. To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics. We introduce a new class of models, Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that take BISG probabilities as inputs and produce racial disparity estimates by using surnames as an instrumental variable for race. Our estimation method is scalable, making it possible to analyze large-scale administrative data. We also show how to address potential violations of the key identification assumptions. A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration. Finally, we apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service. Open-source software is available which implements the proposed methodology. △ Less

Submitted 16 April, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: 28 pages, 9 figures, plus references and appendices

arXiv:2210.08383 [pdf, ps, other]

doi 10.1162/99608f92.abc2c765

Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System

Authors: Christopher T. Kenny, Shiro Kuriwaki, Cory McCartan, Evan T. R. Rosenman, Tyler Simko, Kosuke Imai

Abstract: In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary,… ▽ More In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swap** and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increases the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices. △ Less

Submitted 15 October, 2022; originally announced October 2022.

Comments: Version accepted to Harvard Data Science Review

Journal ref: Harvard Data Science Review, (Special Issue 2, 2023)

arXiv:2210.08326 [pdf, ps, other]

Distributionally Robust Causal Inference with Observational Data

Authors: Dimitris Bertsimas, Kosuke Imai, Michael Lingzhi Li

Abstract: We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then… ▽ More We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then derive sharp bounds on the average treatment effects under this assumption. Our framework encompasses the popular marginal sensitivity model as a special case, and we demonstrate how the proposed methodology can address a primary challenge of the marginal sensitivity model that it produces uninformative results when unobserved confounders substantially affect treatment and outcome. Specifically, we develop an alternative sensitivity model, called the distributional sensitivity model, under the assumption that heterogeneity of treatment effect due to unobserved variables is relatively small. Unlike the marginal sensitivity model, the distributional sensitivity model allows for potential lack of overlap and often produces informative bounds even when unobserved variables substantially affect both treatment and outcome. Finally, we show how to extend the distributional sensitivity model to difference-in-differences designs and settings with instrumental variables. Through simulation and empirical studies, we demonstrate the applicability of the proposed methodology. △ Less

Submitted 2 February, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

arXiv:2208.13323 [pdf, other]

Safe Policy Learning under Regression Discontinuity Designs with Multiple Cutoffs

Authors: Yi Zhang, Eli Ben-Michael, Kosuke Imai

Abstract: The regression discontinuity (RD) design is widely used for program evaluation with observational data. The primary focus of the existing literature has been the estimation of the local average treatment effect at the existing treatment cutoff. In contrast, we consider policy learning under the RD design. Because the treatment assignment mechanism is deterministic, learning better treatment cutoff… ▽ More The regression discontinuity (RD) design is widely used for program evaluation with observational data. The primary focus of the existing literature has been the estimation of the local average treatment effect at the existing treatment cutoff. In contrast, we consider policy learning under the RD design. Because the treatment assignment mechanism is deterministic, learning better treatment cutoffs requires extrapolation. We develop a robust optimization approach to finding optimal treatment cutoffs that improve upon the existing ones. We first decompose the expected utility into point-identifiable and unidentifiable components. We then propose an efficient doubly-robust estimator for the identifiable parts. To account for the unidentifiable components, we leverage the existence of multiple cutoffs that are common under the RD design. Specifically, we assume that the heterogeneity in the conditional expectations of potential outcomes across different groups vary smoothly along the running variable. Under this assumption, we minimize the worst case utility loss relative to the status quo policy. The resulting new treatment cutoffs have a safety guarantee that they will not yield a worse overall outcome than the existing cutoffs. Finally, we establish the asymptotic regret bounds for the learned policy using semi-parametric efficiency theory. We apply the proposed methodology to empirical and simulated data sets. △ Less

Submitted 8 July, 2023; v1 submitted 28 August, 2022; originally announced August 2022.

arXiv:2208.12443 [pdf, other]

Race and ethnicity data for first, middle, and last names

Authors: Evan T. R. Rosenman, Santiago Olivella, Kosuke Imai

Abstract: We provide the largest compiled publicly available dictionaries of first, middle, and last names for the purpose of imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six Southern states that collect self-reported racial data upon voter registration. Our data cover a much larger scope of names than any compar… ▽ More We provide the largest compiled publicly available dictionaries of first, middle, and last names for the purpose of imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six Southern states that collect self-reported racial data upon voter registration. Our data cover a much larger scope of names than any comparable dataset, containing roughly one million first names, 1.1 million middle names, and 1.4 million surnames. Individuals are categorized into five mutually exclusive racial and ethnic groups -- White, Black, Hispanic, Asian, and Other -- and racial/ethnic counts by name are provided for every name in each dictionary. Counts can then be normalized row-wise or column-wise to obtain conditional probabilities of race given name or name given race. These conditional probabilities can then be deployed for imputation in a data analytic task for which ground truth racial and ethnic data is not available. △ Less

Submitted 26 August, 2022; originally announced August 2022.

arXiv:2208.06968 [pdf, other]

doi 10.1073/pnas.2217322120

Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition

Authors: Christopher T. Kenny, Cory McCartan, Tyler Simko, Shiro Kuriwaki, Kosuke Imai

Abstract: Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To separate the partisan effects of redistricting from the effects of other factors including geography and redistricting rules, we compare possible party compositions of the U.S. House under the enacted plan to those under a set of alternative simulated plans that serve as a non-p… ▽ More Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To separate the partisan effects of redistricting from the effects of other factors including geography and redistricting rules, we compare possible party compositions of the U.S. House under the enacted plan to those under a set of alternative simulated plans that serve as a non-partisan baseline. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the electoral bias it creates cancels at the national level, giving Republicans two additional seats on average. Geography and redistricting rules separately contribute a moderate pro-Republican bias. Finally, we find that partisan gerrymandering reduces electoral competition and makes the partisan composition of the U.S. House less responsive to shifts in the national vote. △ Less

Submitted 13 April, 2023; v1 submitted 14 August, 2022; originally announced August 2022.

Comments: 10 pages, 4 figures, plus references and appendix

Journal ref: Proc. Natl. Acad. Sci. 120(25), 2023

arXiv:2206.10763 [pdf, other]

doi 10.1038/s41597-022-01808-2

Simulated redistricting plans for the analysis and evaluation of redistricting in the United States

Authors: Cory McCartan, Christopher T. Kenny, Tyler Simko, George Garcia III, Kevin Wang, Melissa Wu, Shiro Kuriwaki, Kosuke Imai

Abstract: This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standa… ▽ More This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data. △ Less

Submitted 20 October, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: 11 pages, 3 figures

Journal ref: Sci Data (2022) 9, 689

arXiv:2206.10479 [pdf, other]

Policy Learning with Asymmetric Counterfactual Utilities

Authors: Eli Ben-Michael, Kosuke Imai, Zhichao Jiang

Abstract: Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility… ▽ More Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility function is more properly characterized by the joint set of potential outcomes under all actions. For example, the Hippocratic principle to "do no harm" implies that the cost of causing death to a patient who would otherwise survive without treatment is greater than the cost of forgoing life-saving treatment. We consider optimal policy learning with asymmetric counterfactual utility functions of this form that consider the joint set of potential outcomes. We show that asymmetric counterfactual utilities lead to an unidentifiable expected utility function, and so we first partially identify it. Drawing on statistical decision theory, we then derive minimax decision rules by minimizing the maximum expected utility loss relative to different alternative policies. We show that one can learn minimax loss decision rules from observed data by solving intermediate classification problems, and establish that the finite sample excess expected utility loss of this procedure is bounded by the regret of these intermediate classifiers. We apply this conceptual framework and methodology to the decision about whether or not to use right heart catheterization for patients with possible pulmonary hypertension. △ Less

Submitted 28 November, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2205.06129 [pdf, other]

Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements

Authors: Kosuke Imai, Santiago Olivella, Evan T. R. Rosenman

Abstract: Prediction of individual's race and ethnicity plays an important role in social science and public health research. Examples include studies of racial disparity in health and voting. Recently, Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to combine information from Census surname files with the geocoding of an individual's residence, has emerged as a leading methodology for t… ▽ More Prediction of individual's race and ethnicity plays an important role in social science and public health research. Examples include studies of racial disparity in health and voting. Recently, Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to combine information from Census surname files with the geocoding of an individual's residence, has emerged as a leading methodology for this prediction task. Unfortunately, BISG suffers from two Census data problems that contribute to unsatisfactory predictive performance for minorities. First, the decennial Census often contains zero counts for minority racial groups in the Census blocks where some members of those groups reside. Second, because the Census surname files only include frequent names, many surnames -- especially those of minorities -- are missing from the list. To address the zero counts problem, we introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for potential measurement error in Census counts by extending the naive Bayesian inference of the BISG methodology to full posterior inference. To address the missing surname problem, we supplement the Census surname data with additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available. Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation across all racial groups, and especially for Asians. The proposed methodology, together with additional name data, is available via the open-source software WRU. △ Less

Submitted 31 August, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

arXiv:2203.14511 [pdf, ps, other]

Statistical Inference for Heterogeneous Treatment Effects Discovered by Generic Machine Learning in Randomized Experiments

Authors: Kosuke Imai, Michael Lingzhi Li

Abstract: Researchers are increasingly turning to machine learning (ML) algorithms to investigate causal heterogeneity in randomized experiments. Despite their promise, ML algorithms may fail to accurately ascertain heterogeneous treatment effects under practical settings with many covariates and small sample size. In addition, the quantification of estimation uncertainty remains a challenge. We develop a g… ▽ More Researchers are increasingly turning to machine learning (ML) algorithms to investigate causal heterogeneity in randomized experiments. Despite their promise, ML algorithms may fail to accurately ascertain heterogeneous treatment effects under practical settings with many covariates and small sample size. In addition, the quantification of estimation uncertainty remains a challenge. We develop a general approach to statistical inference for heterogeneous treatment effects discovered by a generic ML algorithm. We apply the Neyman's repeated sampling framework to a common setting, in which researchers use an ML algorithm to estimate the conditional average treatment effect and then divide the sample into several groups based on the magnitude of the estimated effects. We show how to estimate the average treatment effect within each of these groups, and construct a valid confidence interval. In addition, we develop nonparametric tests of treatment effect homogeneity across groups, and rank-consistency of within-group average treatment effects. The validity of our methodology does not rely on the properties of ML algorithms because it is solely based on the randomization of treatment assignment and random sampling of units. Finally, we generalize our methodology to the cross-fitting procedure by accounting for the additional uncertainty induced by the random splitting of data. △ Less

Submitted 20 April, 2024; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2201.08343 [pdf, other]

doi 10.1017/pan.2023.41

Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis

Authors: Dae Woong Ham, Kosuke Imai, Lucas Janson

Abstract: Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Researchers examine how varying a factor of interest, while controlling for other relevant factors, influences decision-making. Currently, there exist two methodological approaches to analyzing data from a conjoint experiment. The first focuses on estimating the average marginal effects of each factor… ▽ More Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Researchers examine how varying a factor of interest, while controlling for other relevant factors, influences decision-making. Currently, there exist two methodological approaches to analyzing data from a conjoint experiment. The first focuses on estimating the average marginal effects of each factor while averaging over the other factors. Although this allows for straightforward design-based estimation, the results critically depend on the distribution of other factors and how interaction effects are aggregated. An alternative model-based approach can compute various quantities of interest, but requires researchers to correctly specify the model, a challenging task for conjoint analysis with many factors and possible interactions. In addition, a commonly used logistic regression has poor statistical properties even with a moderate number of factors when incorporating interactions. We propose a new hypothesis testing approach based on the conditional randomization test to answer the most fundamental question of conjoint analysis: Does a factor of interest matter in any way given the other factors? Our methodology is solely based on the randomization of factors, and hence is free from assumptions. Yet, it allows researchers to use any test statistic, including those based on complex machine learning algorithms. As a result, we are able to combine the strengths of the existing design-based and model-based approaches. We illustrate the proposed methodology through conjoint analysis of immigration preferences and political candidate evaluation. We also extend the proposed approach to test for regularity assumptions commonly used in conjoint analysis. An open-source software package is available for implementing the proposed methodology. △ Less

Submitted 17 August, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Journal ref: Political Analysis, pg. 1-16, 2024

arXiv:2201.01357 [pdf, other]

Estimating Heterogeneous Causal Effects of High-Dimensional Treatments: Application to Conjoint Analysis

Authors: Max Goplerud, Kosuke Imai, Nicole E. Pashley

Abstract: Estimation of heterogeneous treatment effects is an active area of research. Most of the existing methods, however, focus on estimating the conditional average treatment effects of a single, binary treatment given a set of pre-treatment covariates. In this paper, we propose a method to estimate the heterogeneous causal effects of high-dimensional treatments, which poses unique challenges in terms… ▽ More Estimation of heterogeneous treatment effects is an active area of research. Most of the existing methods, however, focus on estimating the conditional average treatment effects of a single, binary treatment given a set of pre-treatment covariates. In this paper, we propose a method to estimate the heterogeneous causal effects of high-dimensional treatments, which poses unique challenges in terms of estimation and interpretation. The proposed approach finds maximally heterogeneous groups and uses a Bayesian mixture of regularized logistic regressions to identify groups of units who exhibit similar patterns of treatment effects. By directly modeling group membership with covariates, the proposed methodology allows one to explore the unit characteristics that are associated with different patterns of treatment effects. Our motivating application is conjoint analysis, which is a popular type of survey experiment in social science and marketing research and is based on a high-dimensional factorial design. We apply the proposed methodology to the conjoint data, where survey respondents are asked to select one of two immigrant profiles with randomly selected attributes. We find that a group of respondents with a relatively high degree of prejudice appears to discriminate against immigrants from non-European countries like Iraq. An open-source software package is available for implementing the proposed methodology. △ Less

Submitted 16 June, 2024; v1 submitted 4 January, 2022; originally announced January 2022.

Comments: Major revision; added Propositions 1 and 2. 27 pages (main text); 34 pages (supplementary information)

arXiv:2110.14014 [pdf, other]

Measuring and Modeling Neighborhoods

Authors: Cory McCartan, Jacob R. Brown, Kosuke Imai

Abstract: Granular geographic data present new opportunities to understand how neighborhoods are formed, and how they influence politics. At the same time, the inherent subjectivity of neighborhoods creates methodological challenges in measuring and modeling them. We develop an open-source survey instrument that allows respondents to draw their neighborhoods on a map. We also propose a statistical model to… ▽ More Granular geographic data present new opportunities to understand how neighborhoods are formed, and how they influence politics. At the same time, the inherent subjectivity of neighborhoods creates methodological challenges in measuring and modeling them. We develop an open-source survey instrument that allows respondents to draw their neighborhoods on a map. We also propose a statistical model to analyze how the characteristics of respondents and local areas determine subjective neighborhoods. We conduct two surveys: collecting subjective neighborhoods from voters in Miami, New York City, and Phoenix, and asking New York City residents to draw a community of interest for inclusion in their city council district. Our analysis shows that, holding other factors constant, White respondents include census blocks with more White residents in their neighborhoods. Similarly, Democrats and Republicans are more likely to include co-partisan areas. Furthermore, our model provides more accurate out-of-sample predictions than standard neighborhood measures. △ Less

Submitted 19 January, 2024; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: 34 pages, 11 figures, and supplementary material

arXiv:2109.11679 [pdf, other]

Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment

Authors: Eli Ben-Michael, D. James Greiner, Kosuke Imai, Zhichao Jiang

Abstract: Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these and other data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. For example, algorithmic pre-trial risk assessments, which serve as our motivating application, provide relatively simple, deterministic… ▽ More Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these and other data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. For example, algorithmic pre-trial risk assessments, which serve as our motivating application, provide relatively simple, deterministic classification scores and recommendations to help judges make release decisions. How can we use the data based on existing deterministic policies to learn new and better policies? Unfortunately, prior methods for policy learning are not applicable because they require existing policies to be stochastic rather than deterministic. We develop a robust optimization approach that partially identifies the expected utility of a policy, and then finds an optimal policy by minimizing the worst-case regret. The resulting policy is conservative but has a statistical safety guarantee, allowing the policy-maker to limit the probability of producing a worse outcome than the existing policy. We extend this approach to common and important settings where humans make decisions with the aid of algorithmic recommendations. Lastly, we apply the proposed methodology to a unique field experiment on pre-trial risk assessment instruments. We derive new classification and recommendation rules that retain the transparency and interpretability of the existing instrument while potentially leading to better overall outcomes at a lower cost. △ Less

Submitted 15 February, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2108.01255 [pdf, ps, other]

Optimal Covariate Balancing Conditions in Propensity Score Estimation

Authors: Jianqing Fan, Kosuke Imai, Inbeom Lee, Han Liu, Yang Ning, Xiaolin Yang

Abstract: Inverse probability of treatment weighting (IPTW) is a popular method for estimating the average treatment effect (ATE). However, empirical studies show that the IPTW estimators can be sensitive to the misspecification of the propensity score model. To address this problem, researchers have proposed to estimate propensity score by directly optimizing the balance of pre-treatment covariates. While… ▽ More Inverse probability of treatment weighting (IPTW) is a popular method for estimating the average treatment effect (ATE). However, empirical studies show that the IPTW estimators can be sensitive to the misspecification of the propensity score model. To address this problem, researchers have proposed to estimate propensity score by directly optimizing the balance of pre-treatment covariates. While these methods appear to empirically perform well, little is known about how the choice of balancing conditions affects their theoretical properties. To fill this gap, we first characterize the asymptotic bias and efficiency of the IPTW estimator based on the Covariate Balancing Propensity Score (CBPS) methodology under local model misspecification. Based on this analysis, we show how to optimally choose the covariate balancing functions and propose an optimal CBPS-based IPTW estimator. This estimator is doubly robust; it is consistent for the ATE if either the propensity score model or the outcome model is correct. In addition, the proposed estimator is locally semiparametric efficient when both models are correctly specified. To further relax the parametric assumptions, we extend our method by using a sieve estimation approach. We show that the resulting estimator is globally efficient under a set of much weaker assumptions and has a smaller asymptotic bias than the existing estimators. Finally, we evaluate the finite sample performance of the proposed estimators via simulation and empirical studies. An open-source software package is available for implementing the proposed methods. △ Less

Submitted 2 August, 2021; originally announced August 2021.

arXiv:2105.14197 [pdf, other]

doi 10.1126/sciadv.abk3283

The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis

Authors: Christopher T. Kenny, Shiro Kuriwaki, Cory McCartan, Evan Rosenman, Tyler Simko, Kosuke Imai

Abstract: The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS), which attempts to achieve differential privacy guarantees by adding noise to the Census microdata. By applying redistricting simulation and analysis methods to DAS-protected 2010 Census data, we find that the protected data are not of sufficient quality for redistricting purp… ▽ More The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS), which attempts to achieve differential privacy guarantees by adding noise to the Census microdata. By applying redistricting simulation and analysis methods to DAS-protected 2010 Census data, we find that the protected data are not of sufficient quality for redistricting purposes. We demonstrate that the injected noise makes it impossible for states to accurately comply with the One Person, One Vote principle. Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition, and that these biases lead to large and unpredictable errors in the analysis of partisan and racial gerrymanders. Finally, we show that the DAS algorithm does not universally protect respondent privacy. Based on the names and addresses of registered voters, we are able to predict their race as accurately using the DAS-protected data as when using the 2010 Census data. Despite this, the DAS-protected data can still inaccurately estimate the number of majority-minority districts. We conclude with recommendations for how the Census Bureau should proceed with privacy protection for the 2020 Census. △ Less

Submitted 20 August, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: 42 pages, 22 figures. New postscript analyzing newly-released DAS-19.61 revision

Journal ref: Science advances, 7(41) (2021) eabk3283

arXiv:2103.00702 [pdf, other]

Dynamic Stochastic Blockmodel Regression for Network Data: Application to International Militarized Conflicts

Authors: Santiago Olivella, Tyler Pratt, Kosuke Imai

Abstract: A primary goal of social science research is to understand how latent group memberships predict the dynamic process of network evolution. In the modeling of international militarized conflicts, for instance, scholars hypothesize that membership in geopolitical coalitions shapes the decision to engage in conflict. Such theories explain the ways in which nodal and dyadic characteristics affect the e… ▽ More A primary goal of social science research is to understand how latent group memberships predict the dynamic process of network evolution. In the modeling of international militarized conflicts, for instance, scholars hypothesize that membership in geopolitical coalitions shapes the decision to engage in conflict. Such theories explain the ways in which nodal and dyadic characteristics affect the evolution of conflict patterns over time via their effects on group memberships. To aid the empirical testing of these arguments, we develop a dynamic model of network data by combining a hidden Markov model with a mixed-membership stochastic blockmodel that identifies latent groups underlying the network structure. Unlike existing models, we incorporate covariates that predict dynamic node memberships in latent groups as well as the direct formation of edges between dyads. While prior substantive research often assumes the decision to engage in international militarized conflict is independent across states and static over time, we demonstrate that conflict is driven by states' evolving membership in geopolitical blocs. Changes in monadic covariates like democracy shift states between coalitions, generating heterogeneous effects on conflict over time and across states. The proposed methodology, which relies on a variational approximation to a collapsed posterior distribution as well as stochastic optimization for scalability, is implemented through an open-source software package. △ Less

Submitted 25 October, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

Comments: 34 pages (main text), 34 pages (supplementary information), 21 figures

arXiv:2102.11926 [pdf, other]

Estimating Average Treatment Effects with Support Vector Machines

Authors: Alexander Tarr, Kosuke Imai

Abstract: Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature. We demonstrate that SVM can be used to balance covariates and estimate average causal effects under the unconfoundedness assumption. Specifically, we adapt the SVM classifier as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and… ▽ More Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature. We demonstrate that SVM can be used to balance covariates and estimate average causal effects under the unconfoundedness assumption. Specifically, we adapt the SVM classifier as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups while simultaneously maximizing effective sample size. We also show that SVM is a continuous relaxation of the quadratic integer program for computing the largest balanced subset, establishing its direct relation to the cardinality matching method. Another important feature of SVM is that the regularization parameter controls the trade-off between covariate balance and effective sample size. As a result, the existing SVM path algorithm can be used to compute the balance-sample size frontier. We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to the existing kernel balancing methods. Finally, we conduct simulation and empirical studies to evaluate the performance of the proposed methodology and find that SVM is competitive with the state-of-the-art covariate balancing methods. △ Less

Submitted 1 July, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: 37 pages, 8 figures

arXiv:2012.02845 [pdf, other]

Experimental Evaluation of Algorithm-Assisted Human Decision-Making: Application to Pretrial Public Safety Assessment

Authors: Kosuke Imai, Zhichao Jiang, James Greiner, Ryan Halen, Sooahn Shin

Abstract: Despite an increasing reliance on fully-automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairne… ▽ More Despite an increasing reliance on fully-automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairness of such algorithmic recommendations, an overlooked question is whether they help humans make better decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial that evaluates the pretrial Public Safety Assessment (PSA) in the criminal justice system. A goal of the PSA is to help judges decide which arrested individuals should be released. On the basis of the preliminary data available, we find that providing the PSA to the judge has little overall impact on the judge's decisions and subsequent arrestee behavior. However, our analysis yields some potentially suggestive evidence that the PSA may help avoid unnecessarily harsh decisions for female arrestees regardless of their risk levels while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. In terms of fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in judges' decision. Finally, we find that the PSA's recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high. △ Less

Submitted 11 December, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2011.07677 [pdf, other]

Statistical Inference and Power Analysis for Direct and Spillover Effects in Two-Stage Randomized Experiments

Authors: Zhichao Jiang, Kosuke Imai, Anup Malani

Abstract: Two-stage randomized experiments are becoming an increasingly popular experimental design for causal inference when the outcome of one unit may be affected by the treatment assignments of other units in the same cluster. In this paper, we provide a methodological framework for general tools of statistical inference and power analysis for two-stage randomized experiments. Under the randomization-ba… ▽ More Two-stage randomized experiments are becoming an increasingly popular experimental design for causal inference when the outcome of one unit may be affected by the treatment assignments of other units in the same cluster. In this paper, we provide a methodological framework for general tools of statistical inference and power analysis for two-stage randomized experiments. Under the randomization-based framework, we consider the estimation of a new direct effect of interest as well as the average direct and spillover effects studied in the literature. We provide unbiased estimators of these causal quantities and their conservative variance estimators in a general setting. Using these results, we then develop hypothesis testing procedures and derive sample size formulas. We theoretically compare the two-stage randomized design with the completely randomized and cluster randomized designs, which represent two limiting designs. Finally, we conduct simulation studies to evaluate the empirical performance of our sample size formulas. For empirical illustration, the proposed methodology is applied to the randomized evaluation of the Indian national health insurance program. An open-source software package is available for implementing the proposed methodology. △ Less

Submitted 20 October, 2022; v1 submitted 15 November, 2020; originally announced November 2020.

arXiv:2008.06131 [pdf, other]

doi 10.1214/23-AOAS1763

Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans

Authors: Cory McCartan, Kosuke Imai

Abstract: Random sampling of graph partitions under constraints has become a popular tool for evaluating legislative redistricting plans. Analysts detect partisan gerrymandering by comparing a proposed redistricting plan with an ensemble of sampled alternative plans. For successful application, sampling methods must scale to maps with a moderate or large number of districts, incorporate realistic legal cons… ▽ More Random sampling of graph partitions under constraints has become a popular tool for evaluating legislative redistricting plans. Analysts detect partisan gerrymandering by comparing a proposed redistricting plan with an ensemble of sampled alternative plans. For successful application, sampling methods must scale to maps with a moderate or large number of districts, incorporate realistic legal constraints, and accurately and efficiently sample from a selected target distribution. Unfortunately, most existing methods struggle in at least one of these areas. We present a new Sequential Monte Carlo (SMC) algorithm that generates a sample of redistricting plans converging to a realistic target distribution. Because it draws many plans in parallel, the SMC algorithm can efficiently explore the relevant space of redistricting plans better than the existing Markov chain Monte Carlo (MCMC) algorithms that generate plans sequentially. Our algorithm can simultaneously incorporate several constraints commonly imposed in real-world redistricting problems, including equal population, compactness, and preservation of administrative boundaries. We validate the accuracy of the proposed algorithm by using a small map where all redistricting plans can be enumerated. We then apply the SMC algorithm to evaluate the partisan implications of several maps submitted by relevant parties in a recent high-profile redistricting case in the state of Pennsylvania. We find that the proposed algorithm converges faster and with fewer samples than a comparable MCMC algorithm. Open-source software is available for implementing the proposed methodology. △ Less

Submitted 14 February, 2023; v1 submitted 13 August, 2020; originally announced August 2020.

Comments: 19 pages, 7 figures, plus appendices; revised validation section, discussion, and appendices

Journal ref: Annals of Applied Statistics 14(7), 2023

arXiv:2006.10148 [pdf, other]

The Essential Role of Empirical Validation in Legislative Redistricting Simulation

Authors: Benjamin Fifield, Kosuke Imai, Jun Kawahara, Christopher T. Kenny

Abstract: As granular data about elections and voters become available, redistricting simulation methods are playing an increasingly important role when legislatures adopt redistricting plans and courts determine their legality. These simulation methods are designed to yield a representative sample of all redistricting plans that satisfy statutory guidelines and requirements such as contiguity, population p… ▽ More As granular data about elections and voters become available, redistricting simulation methods are playing an increasingly important role when legislatures adopt redistricting plans and courts determine their legality. These simulation methods are designed to yield a representative sample of all redistricting plans that satisfy statutory guidelines and requirements such as contiguity, population parity, and compactness. A proposed redistricting plan can be considered gerrymandered if it constitutes an outlier relative to this sample according to partisan fairness metrics. Despite their growing use, an insufficient effort has been made to empirically validate the accuracy of the simulation methods. We apply a recently developed computational method that can efficiently enumerate all possible redistricting plans and yield an independent uniform sample from this population. We show that this algorithm scales to a state with a couple of hundred geographical units. Finally, we empirically examine how existing simulation methods perform on realistic validation data sets. △ Less

Submitted 17 June, 2020; originally announced June 2020.

Comments: 32 pages, 14 figures

arXiv:2005.10400 [pdf, other]

Principal Fairness for Human and Algorithmic Decision-Making

Authors: Kosuke Imai, Zhichao Jiang

Abstract: Using the concept of principal stratification from the causal inference literature, we introduce a new notion of fairness, called principal fairness, for human and algorithmic decision-making. The key idea is that one should not discriminate among individuals who would be similarly affected by the decision. Unlike the existing statistical definitions of fairness, principal fairness explicitly acco… ▽ More Using the concept of principal stratification from the causal inference literature, we introduce a new notion of fairness, called principal fairness, for human and algorithmic decision-making. The key idea is that one should not discriminate among individuals who would be similarly affected by the decision. Unlike the existing statistical definitions of fairness, principal fairness explicitly accounts for the fact that individuals can be impacted by the decision. Furthermore, we explain how principal fairness differs from the existing causality-based fairness criteria. In contrast to the counterfactual fairness criteria, for example, principal fairness considers the effects of decision in question rather than those of protected attributes of interest. We briefly discuss how to approach empirical evaluation and policy learning problems under the proposed principal fairness criterion. △ Less

Submitted 24 March, 2022; v1 submitted 20 May, 2020; originally announced May 2020.

arXiv:2004.05964 [pdf, other]

Keyword Assisted Topic Models

Authors: Shusei Eshima, Kosuke Imai, Tomoya Sasaki

Abstract: In recent years, fully automated content analysis based on probabilistic topic models has become popular among social scientists because of their scalability. The unsupervised nature of the models makes them suitable for exploring topics in a corpus without prior knowledge. However, researchers find that these models often fail to measure specific concepts of substantive interest by inadvertently… ▽ More In recent years, fully automated content analysis based on probabilistic topic models has become popular among social scientists because of their scalability. The unsupervised nature of the models makes them suitable for exploring topics in a corpus without prior knowledge. However, researchers find that these models often fail to measure specific concepts of substantive interest by inadvertently creating multiple topics with similar content and combining distinct themes into a single topic. In this paper, we empirically demonstrate that providing a small number of keywords can substantially enhance the measurement performance of topic models. An important advantage of the proposed keyword assisted topic model (keyATM) is that the specification of keywords requires researchers to label topics prior to fitting a model to the data. This contrasts with a widespread practice of post-hoc topic interpretation and adjustments that compromises the objectivity of empirical findings. In our application, we find that keyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than the standard topic models. Finally, we show that keyATM can also incorporate covariates and model time trends. An open-source software package is available for implementing the proposed methodology. △ Less

Submitted 2 February, 2023; v1 submitted 13 April, 2020; originally announced April 2020.

arXiv:2003.13555 [pdf, other]

Causal Inference with Spatio-temporal Data: Estimating the Effects of Airstrikes on Insurgent Violence in Iraq

Authors: Georgia Papadogeorgou, Kosuke Imai, Jason Lyall, Fan Li

Abstract: Many causal processes have spatial and temporal dimensions. Yet the classic causal inference framework is not directly applicable when the treatment and outcome variables are generated by spatio-temporal point processes. We extend the potential outcomes framework to these settings by formulating the treatment point process as a stochastic intervention. Our causal estimands include the expected num… ▽ More Many causal processes have spatial and temporal dimensions. Yet the classic causal inference framework is not directly applicable when the treatment and outcome variables are generated by spatio-temporal point processes. We extend the potential outcomes framework to these settings by formulating the treatment point process as a stochastic intervention. Our causal estimands include the expected number of outcome events in a specified area under a particular stochastic treatment assignment strategy. Our methodology allows for arbitrary patterns of spatial spillover and temporal carryover effects. Using martingale theory, we show that the proposed estimator is consistent and asymptotically normal as the number of time periods increases. We propose a sensitivity analysis for the possible existence of unmeasured confounders, and extend it to the Hajek estimator. Simulation studies are conducted to examine the estimators' finite sample performance. Finally, we illustrate the proposed methods by estimating the effects of American airstrikes on insurgent violence in Iraq from February 2007 to July 2008. Our analysis suggests that increasing the average number of daily airstrikes for up to one month may result in more insurgent attacks. We also find some evidence that airstrikes can displace attacks from Baghdad to new locations up to 400 kilometers away △ Less

Submitted 8 June, 2022; v1 submitted 30 March, 2020; originally announced March 2020.

arXiv:1910.06991 [pdf, other]

Discussion of "The Blessings of Multiple Causes" by Wang and Blei

Authors: Kosuke Imai, Zhichao Jiang

Abstract: This commentary has two goals. We first critically review the deconfounder method and point out its advantages and limitations. We then briefly consider three possible ways to address some of the limitations of the deconfounder method. This commentary has two goals. We first critically review the deconfounder method and point out its advantages and limitations. We then briefly consider three possible ways to address some of the limitations of the deconfounder method. △ Less

Submitted 15 October, 2019; originally announced October 2019.

arXiv:1905.05389 [pdf, other]

doi 10.1080/01621459.2021.1923511

Experimental Evaluation of Individualized Treatment Rules

Authors: Kosuke Imai, Michael Lingzhi Li

Abstract: The increasing availability of individual-level data has led to numerous applications of individualized (or personalized) treatment rules (ITRs). Policy makers often wish to empirically evaluate ITRs and compare their relative performance before implementing them in a target population. We propose a new evaluation metric, the population average prescriptive effect (PAPE). The PAPE compares the per… ▽ More The increasing availability of individual-level data has led to numerous applications of individualized (or personalized) treatment rules (ITRs). Policy makers often wish to empirically evaluate ITRs and compare their relative performance before implementing them in a target population. We propose a new evaluation metric, the population average prescriptive effect (PAPE). The PAPE compares the performance of ITR with that of non-individualized treatment rule, which randomly treats the same proportion of units. Averaging the PAPE over a range of budget constraints yields our second evaluation metric, the area under the prescriptive effect curve (AUPEC). The AUPEC represents an overall performance measure for evaluation, like the area under the receiver and operating characteristic curve (AUROC) does for classification, and is a generalization of the QINI coefficient utilized in uplift modeling. We use Neyman's repeated sampling framework to estimate the PAPE and AUPEC and derive their exact finite-sample variances based on random sampling of units and random assignment of treatment. We extend our methodology to a common setting, in which the same experimental data is used to both estimate and evaluate ITRs. In this case, our variance calculation incorporates the additional uncertainty due to random splits of data used for cross-validation. The proposed evaluation metrics can be estimated without requiring modeling assumptions, asymptotic approximation, or resampling methods. As a result, it is applicable to any ITR including those based on complex machine learning algorithms. The open-source software package is available for implementing the proposed methodology. △ Less

Submitted 5 May, 2021; v1 submitted 14 May, 2019; originally announced May 2019.

Comments: Accepted at JASA

arXiv:1812.08683 [pdf, other]

Robust Estimation of Causal Effects via High-Dimensional Covariate Balancing Propensity Score

Authors: Yang Ning, Sida Peng, Kosuke Imai

Abstract: In this paper, we propose a robust method to estimate the average treatment effects in observational studies when the number of potential confounders is possibly much greater than the sample size. We first use a class of penalized M-estimators for the propensity score and outcome models. We then calibrate the initial estimate of the propensity score by balancing a carefully selected subset of cova… ▽ More In this paper, we propose a robust method to estimate the average treatment effects in observational studies when the number of potential confounders is possibly much greater than the sample size. We first use a class of penalized M-estimators for the propensity score and outcome models. We then calibrate the initial estimate of the propensity score by balancing a carefully selected subset of covariates that are predictive of the outcome. Finally, the estimated propensity score is used to construct the inverse probability weighting estimator. We prove that the proposed estimator, which has the sample boundedness property, is root-n consistent, asymptotically normal, and semiparametrically efficient when the propensity score model is correctly specified and the outcome model is linear in covariates. More importantly, we show that our estimator remains root-n consistent and asymptotically normal so long as either the propensity score model or the outcome model is correctly specified. We provide valid confidence intervals in both cases and further extend these results to the case where the outcome model is a generalized linear model. In simulation studies, we find that the proposed methodology often estimates the average treatment effect more accurately than the existing methods. We also present an empirical application, in which we estimate the average causal effect of college attendance on adulthood political participation. Open-source software is available for implementing the proposed methodology. △ Less

Submitted 20 December, 2018; originally announced December 2018.

arXiv:1601.03501 [pdf, ps, other]

Efficient nonparametric estimation of causal mediation effects

Authors: K. C. G. Chan, K. Imai, S. C. P. Yam, Z. Zhang

Abstract: An essential goal of program evaluation and scientific research is the investigation of causal mechanisms. Over the past several decades, causal mediation analysis has been used in medical and social sciences to decompose the treatment effect into the natural direct and indirect effects. However, all of the existing mediation analysis methods rely on parametric modeling assumptions in one way or a… ▽ More An essential goal of program evaluation and scientific research is the investigation of causal mechanisms. Over the past several decades, causal mediation analysis has been used in medical and social sciences to decompose the treatment effect into the natural direct and indirect effects. However, all of the existing mediation analysis methods rely on parametric modeling assumptions in one way or another, typically requiring researchers to specify multiple regression models involving the treatment, mediator, outcome, and pre-treatment confounders. To overcome this limitation, we propose a novel nonparametric estimation method for causal mediation analysis that eliminates the need for applied researchers to model multiple conditional distributions. The proposed method balances a certain set of empirical moments between the treatment and control groups by weighting each observation; in particular, we establish that the proposed estimator is globally semiparametric efficient. We also show how to consistently estimate the asymptotic variance of the proposed estimator without additional efforts. Finally, we extend the proposed method to other relevant settings including the causal mediation analysis with multiple mediators. △ Less

Submitted 14 January, 2016; originally announced January 2016.

Comments: Nonparametric Estimation, Natural direct effects, Natural indirect effects, Treatment effects, Semiparametric efficiency

MSC Class: 62G05

arXiv:1309.6361 [pdf, other]

Causal Inference in Observational Studies with Non-Binary Treatments

Authors: Shandong Zhao, David A. van Dyk, Kosuke Imai

Abstract: Propensity score methods have become a part of the standard toolkit for applied researchers who wish to ascertain causal effects from observational data. While they were originally developed for binary treatments, several researchers have proposed generalizations of the propensity score methodology for non-binary treatment regimes. Such extensions have widened the applicability of propensity score… ▽ More Propensity score methods have become a part of the standard toolkit for applied researchers who wish to ascertain causal effects from observational data. While they were originally developed for binary treatments, several researchers have proposed generalizations of the propensity score methodology for non-binary treatment regimes. Such extensions have widened the applicability of propensity score methods and are indeed becoming increasingly popular themselves. In this article, we closely examine the two main generalizations of propensity score methods, namely, the propensity function (P-FUNCTION) of Imai and van Dyk (2004) and the generalized propensity score (GPS) of Hirano and Imbens (2004), along with recent extensions of the GPS that aim to improve its robustness. We compare the assumptions, theoretical properties, and empirical performance of these alternative methodologies. On a theoretical level, the GPS and its extensions are advantageous in that they can be used to estimate the full dose response function rather than the simple average treatment effect that is typically estimated with the P-FUNCTION. Unfortunately, our analysis shows that in practice response models often used with the original GPS are less flexible than those typically used with propensity score methods and are prone to misspecification. We compare new and existing methods that improve the robustness of the GPS and propose methods that use the P-FUNCTION to estimate the dose response function. We illustrate our findings and proposals through simulation studies, including one based on an empirical application. △ Less

Submitted 24 September, 2013; originally announced September 2013.

arXiv:1305.5682 [pdf, ps, other]

doi 10.1214/12-AOAS593

Estimating treatment effect heterogeneity in randomized program evaluation

Authors: Kosuke Imai, Marc Ratkovic

Abstract: When evaluating the efficacy of social programs and medical treatments using randomized experiments, the estimated overall average causal effect alone is often of limited value and the researchers must investigate when the treatments do and do not work. Indeed, the estimation of treatment effect heterogeneity plays an essential role in (1) selecting the most effective treatment from a large number… ▽ More When evaluating the efficacy of social programs and medical treatments using randomized experiments, the estimated overall average causal effect alone is often of limited value and the researchers must investigate when the treatments do and do not work. Indeed, the estimation of treatment effect heterogeneity plays an essential role in (1) selecting the most effective treatment from a large number of available treatments, (2) ascertaining subpopulations for which a treatment is effective or harmful, (3) designing individualized optimal treatment regimes, (4) testing for the existence or lack of heterogeneous treatment effects, and (5) generalizing causal effect estimates obtained from an experimental sample to a target population. In this paper, we formulate the estimation of heterogeneous treatment effects as a variable selection problem. We propose a method that adapts the Support Vector Machine classifier by placing separate sparsity constraints over the pre-treatment parameters and causal heterogeneity parameters of interest. The proposed method is motivated by and applied to two well-known randomized evaluation studies in the social sciences. Our method selects the most effective voter mobilization strategies from a large number of alternative strategies, and it also identifies the characteristics of workers who greatly benefit from (or are negatively affected by) a job training program. In our simulation studies, we find that the proposed method often outperforms some commonly used alternatives. △ Less

Submitted 24 May, 2013; originally announced May 2013.

Comments: Published in at http://dx.doi.org/10.1214/12-AOAS593 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS593

Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 1, 443-470

arXiv:1011.1079 [pdf, ps, other]

doi 10.1214/10-STS321

Identification, Inference and Sensitivity Analysis for Causal Mediation Effects

Authors: Kosuke Imai, Luke Keele, Teppei Yamamoto

Abstract: Causal mediation analysis is routinely conducted by applied researchers in a variety of disciplines. The goal of such an analysis is to investigate alternative causal mechanisms by examining the roles of intermediate variables that lie in the causal paths between the treatment and outcome variables. In this paper we first prove that under a particular version of sequential ignorability assumption,… ▽ More Causal mediation analysis is routinely conducted by applied researchers in a variety of disciplines. The goal of such an analysis is to investigate alternative causal mechanisms by examining the roles of intermediate variables that lie in the causal paths between the treatment and outcome variables. In this paper we first prove that under a particular version of sequential ignorability assumption, the average causal mediation effect (ACME) is nonparametrically identified. We compare our identification assumption with those proposed in the literature. Some practical implications of our identification result are also discussed. In particular, the popular estimator based on the linear structural equation model (LSEM) can be interpreted as an ACME estimator once additional parametric assumptions are made. We show that these assumptions can easily be relaxed within and outside of the LSEM framework and propose simple nonparametric estimation strategies. Second, and perhaps most importantly, we propose a new sensitivity analysis that can be easily implemented by applied researchers within the LSEM framework. Like the existing identifying assumptions, the proposed sequential ignorability assumption may be too strong in many applied settings. Thus, sensitivity analysis is essential in order to examine the robustness of empirical findings to the possible existence of an unmeasured confounder. Finally, we apply the proposed methods to a randomized experiment from political psychology. We also make easy-to-use software available to implement the proposed methods. △ Less

Submitted 4 November, 2010; originally announced November 2010.

Comments: Published in at http://dx.doi.org/10.1214/10-STS321 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS321

Journal ref: Statistical Science 2010, Vol. 25, No. 1, 51-71

arXiv:0910.3758 [pdf, ps, other]

doi 10.1214/09-STS274REJ

Rejoinder: Matched Pairs and the Future of Cluster-Randomized Experiments

Authors: Kosuke Imai, Gary King, Clayton Nall

Abstract: Rejoinder to "The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation" [arXiv:0910.3752] Rejoinder to "The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation" [arXiv:0910.3752] △ Less

Submitted 20 October, 2009; originally announced October 2009.

Comments: Published in at http://dx.doi.org/10.1214/09-STS274REJ the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS274REJ

Journal ref: Statistical Science 2009, Vol. 24, No. 1, 65-72

arXiv:0910.3752 [pdf, ps, other]

doi 10.1214/08-STS274

The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation

Authors: Kosuke Imai, Gary King, Clayton Nall

Abstract: A basic feature of many field experiments is that investigators are only able to randomize clusters of individuals--such as households, communities, firms, medical practices, schools or classrooms--even when the individual is the unit of interest. To recoup the resulting efficiency loss, some studies pair similar clusters and randomize treatment within pairs. However, many other studies avoid pa… ▽ More A basic feature of many field experiments is that investigators are only able to randomize clusters of individuals--such as households, communities, firms, medical practices, schools or classrooms--even when the individual is the unit of interest. To recoup the resulting efficiency loss, some studies pair similar clusters and randomize treatment within pairs. However, many other studies avoid pairing, in part because of claims in the literature, echoed by clinical trials standards organizations, that this matched-pair, cluster-randomization design has serious problems. We argue that all such claims are unfounded. We also prove that the estimator recommended for this design in the literature is unbiased only in situations when matching is unnecessary; its standard error is also invalid. To overcome this problem without modeling assumptions, we develop a simple design-based estimator with much improved statistical properties. We also propose a model-based approach that includes some of the benefits of our design-based estimator as well as the estimator in the literature. Our methods also address individual-level noncompliance, which is common in applications but not allowed for in most existing methods. We show that from the perspective of bias, efficiency, power, robustness or research costs, and in large or small samples, pairing should be used in cluster-randomized experiments whenever feasible; failing to do so is equivalent to discarding a considerable fraction of one's data. We develop these techniques in the context of a randomized evaluation we are conducting of the Mexican Universal Health Insurance Program. △ Less

Submitted 20 October, 2009; originally announced October 2009.

Comments: This paper commented in: [arXiv:0910.3754], [arXiv:0910.3756]. Rejoinder in [arXiv:0910.3758]. Published in at http://dx.doi.org/10.1214/08-STS274 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS274

Journal ref: Statistical Science 2009, Vol. 24, No. 1, 29-53

Showing 1–44 of 44 results for author: Imai, K