
Showing 1–28 of 28 results for author: Zubizarreta, J R

  1. arXiv:2312.03268  [pdf, other]

    stat.ME stat.AP

    Design-based inference for generalized network experiments with stochastic interventions

    Authors: Ambarish Chattopadhyay, Kosuke Imai, Jose R. Zubizarreta

    Abstract: A growing number of scholars and data scientists are conducting randomized experiments to analyze causal relationships in network settings where units influence one another. A dominant methodology for analyzing these network experiments has been design-based, leveraging randomization of treatment assignment as the basis for inference. In this paper, we generalize this design-based approach so that…

    Submitted 5 December, 2023; originally announced December 2023.

  2. arXiv:2311.00568  [pdf, other]

    stat.ME stat.CO stat.ML

    Scalable kernel balancing weights in a nationwide observational study of hospital profit status and heart attack outcomes

    Authors: Kwangho Kim, Bijan A. Niknam, José R. Zubizarreta

    Abstract: Weighting is a general and often-used method for statistical adjustment. Weighting has two objectives: first, to balance covariate distributions, and second, to ensure that the weights have minimal dispersion and thus produce a more stable estimator. A recent, increasingly common approach directly optimizes the weights toward these two objectives. However, this approach has not yet been feasible i…

    Submitted 1 November, 2023; originally announced November 2023.

  3. arXiv:2306.03625  [pdf, other]

    stat.ME cs.LG stat.ML

    Fair and Robust Estimation of Heterogeneous Treatment Effects for Policy Learning

    Authors: Kwangho Kim, José R. Zubizarreta

    Abstract: We propose a simple and general framework for nonparametric estimation of heterogeneous treatment effects under fairness constraints. Under standard regularity conditions, we show that the resulting estimators possess the double robustness property. We use this framework to characterize the trade-off between fairness and the maximum welfare achievable by the optimal policy. We evaluate the methods…

    Submitted 20 December, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 16997--17014, 2023

  4. arXiv:2305.14118  [pdf, other]

    stat.ME stat.AP

    Notes on Causation, Comparison, and Regression

    Authors: Ambarish Chattopadhyay, Jose R. Zubizarreta

    Abstract: Comparison and contrast are the basic means to unveil causation and learn which treatments work. To build good comparison groups, randomized experimentation is key, yet often infeasible. In such non-experimental settings, we illustrate and discuss diagnostics to assess how well the common linear regression approach to causal inference approximates desirable features of randomized experiments, such…

    Submitted 28 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

  5. arXiv:2305.04143  [pdf, other]

    stat.AP

    Risk Set Matched Difference-in-Differences for the Analysis of Effect Modification in an Observational Study on the Impact of Gun Violence on Health Outcomes

    Authors: Eric R. Cohn, Zirui Song, Jose R. Zubizarreta

    Abstract: Gun violence is a major source of injury and death in the United States. However, relatively little is known about the effects of firearm injuries on survivors and their family members and how these effects vary across subpopulations. To study these questions and, more generally, to address a gap in the causal inference literature, we present a framework for the study of effect modification or het…

    Submitted 31 May, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

  6. arXiv:2303.08790  [pdf, other]

    stat.ME stat.AP

    lmw: Linear Model Weights for Causal Inference

    Authors: Ambarish Chattopadhyay, Noah Greifer, Jose R. Zubizarreta

    Abstract: The linear regression model is widely used in the biomedical and social sciences as well as in policy and business research to adjust for covariates and estimate the average effects of treatments. Behind every causal inference endeavor there is a hypothetical randomized experiment. However, in routine regression analyses in observational studies, it is unclear how well the adjustments made by regr…

    Submitted 20 April, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

  7. arXiv:2301.06199  [pdf, other]

    cs.LG stat.ME stat.ML

    Doubly Robust Counterfactual Classification

    Authors: Kwangho Kim, Edward H. Kennedy, José R. Zubizarreta

    Abstract: We study counterfactual classification as a new tool for decision-making under hypothetical (contrary to fact) scenarios. We propose a doubly-robust nonparametric estimator for a general counterfactual classifier, where we can incorporate flexible constraints by casting the classification problem as a nonlinear mathematical program involving counterfactuals. We go on to analyze the rates of conver…

    Submitted 15 January, 2023; originally announced January 2023.

    Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  8. arXiv:2209.09538  [pdf, other]

    stat.ME

    Counterfactual Mean-variance Optimization

    Authors: Kwangho Kim, Alan Mishler, José R. Zubizarreta

    Abstract: We study a new class of estimands in causal inference, which are the solutions to a stochastic nonlinear optimization problem that in general cannot be obtained in closed form. The optimization problem describes the counterfactual state of a system after an intervention, and the solutions represent the optimal decisions in that counterfactual state. In particular, we develop a counterfactual mean-…

    Submitted 20 September, 2022; originally announced September 2022.

  9. arXiv:2205.09736  [pdf, other]

    stat.ME stat.AP

    Balanced and Robust Randomized Treatment Assignments: The Finite Selection Model for the Health Insurance Experiment and Beyond

    Authors: Ambarish Chattopadhyay, Carl N. Morris, Jose R. Zubizarreta

    Abstract: The Finite Selection Model (FSM) was developed by Carl Morris in the 1970s for the design of the RAND Health Insurance Experiment (HIE) (Morris 1979, Newhouse et al. 1993), one of the largest and most comprehensive social science experiments conducted in the U.S. The idea behind the FSM is that each treatment group takes its turns selecting units in a fair and random order to optimize a common ass…

    Submitted 4 July, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

  10. arXiv:2203.08701  [pdf, other]

    stat.ME

    One-Step weighting to generalize and transport treatment effect estimates to a target population

    Authors: Ambarish Chattopadhyay, Eric R. Cohn, Jose R. Zubizarreta

    Abstract: The problem of generalization and transportation of treatment effect estimates from a study sample to a target population is central to empirical research and statistical methodology. In both randomized experiments and observational studies, weighting methods are often used with this objective. Traditional methods construct the weights by separately modeling the treatment assignment and study sele…

    Submitted 15 June, 2023; v1 submitted 16 March, 2022; originally announced March 2022.

  11. arXiv:2203.00768  [pdf, other]

    stat.ME stat.AP

    Privacy-Preserving, Communication-Efficient, and Target-Flexible Hospital Quality Measurement

    Authors: Larry Han, Yige Li, Bijan A. Niknam, Jose R. Zubizarreta

    Abstract: Integrating information from multiple data sources can enable more precise, timely, and generalizable decisions. However, it is challenging to make valid causal inferences using observational data from multiple data sources. For example, in healthcare, learning from electronic health records contained in different hospitals is desirable but difficult due to heterogeneity in patient case mix, diffe…

    Submitted 6 February, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: 49 pages of main text + 28 pages of supplemental material

  12. Using Cardinality Matching to Design Balanced and Representative Samples for Observational Studies

    Authors: Bijan A. Niknam, Jose R. Zubizarreta

    Abstract: Cardinality matching is a computational method for finding the largest possible number of matched pairs of exposed and unexposed individuals from an observational dataset, with specified patterns of baseline characteristics that represent a target population for analysis. This article explains the process of cardinality matching and how it simultaneously addresses the concerns of balance, sample s…

    Submitted 11 January, 2022; originally announced January 2022.

    Journal ref: JAMA. 2022;327(2):173-174

  13. arXiv:2110.14831  [pdf, ps, other]

    stat.ME

    The Balancing Act in Causal Inference

    Authors: Eli Ben-Michael, Avi Feller, David A. Hirshberg, José R. Zubizarreta

    Abstract: The idea of covariate balance is at the core of causal inference. Inverse propensity weights play a central role because they are the unique set of weights that balance the covariate distributions of different treatment groups. We discuss two broad approaches to estimating these weights: the more traditional one, which fits a propensity score model and then uses the reciprocal of the estimated pro…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: 42 pages, 0 figures

    MSC Class: 62Gxx

  14. arXiv:2105.10060  [pdf, other]

    stat.ME

    Profile Matching for the Generalization and Personalization of Causal Inferences

    Authors: Eric R. Cohn, Jose R. Zubizarreta

    Abstract: We introduce profile matching, a multivariate matching method for randomized experiments and observational studies that finds the largest possible unweighted samples across multiple treatment groups that are balanced relative to a covariate profile. This covariate profile can represent a specific population or a target individual, facilitating the generalization and personalization of causal infer…

    Submitted 6 July, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

  15. arXiv:2105.02393  [pdf, other]

    stat.ME

    Randomized and Balanced Allocation of Units into Treatment Groups Using the Finite Selection Model for R

    Authors: Ambarish Chattopadhyay, Carl N. Morris, Jose R. Zubizarreta

    Abstract: The original Finite Selection Model (FSM) was developed in the 1970s to enhance the design of the RAND Health Insurance Experiment (HIE; Newhouse et al. 1993). At the time of its development by Carl Morris (Morris 1979), there were fundamental computational limitations to make the method widely available for practitioners. Today, as randomized experiments increasingly become more common, there is…

    Submitted 5 May, 2021; originally announced May 2021.

  16. arXiv:2105.02379  [pdf, other]

    stat.AP

    Targeted Quality Measurement of Health Care Providers

    Authors: Jose R. Zubizarreta, Yige Li, Nancy L. Keating, Mary Beth Landrum

    Abstract: Measuring quality of cancer care delivered by US health providers is challenging. Patients receiving oncology care greatly vary in disease presentation among other key characteristics. In this paper we discuss a framework for institutional quality measurement which addresses the heterogeneity of patient populations. For this, we follow recent statistical developments on health outcomes research an…

    Submitted 27 October, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

  17. arXiv:2104.06581  [pdf, other]

    stat.ME

    On the implied weights of linear regression for causal inference

    Authors: Ambarish Chattopadhyay, Jose R. Zubizarreta

    Abstract: A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under controlled circumstances. Now, linear regression models are commonly used to analyze observational data and estimate causal effects. How do linear regression adjustments in observational studies emulate key features of randomized experiments, such as covariate b…

    Submitted 7 July, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

  18. arXiv:2004.05641  [pdf, other]

    stat.ME stat.AP

    Complex Discontinuity Designs Using Covariates: Impact of School Grade Retention on Later Life Outcomes in Chile

    Authors: Juan D. Diaz, Jose R. Zubizarreta

    Abstract: Regression discontinuity designs are extensively used for causal inference in observational studies. However, they are usually confined to settings with simple treatment rules, determined by a single running variable, with a single cutoff. Motivated by the problem of estimating the impact of grade retention on educational and juvenile crime outcomes in Chile, we propose a framework and methods for…

    Submitted 9 February, 2022; v1 submitted 12 April, 2020; originally announced April 2020.

  19. Large Sample Properties of Matching for Balance

    Authors: Yixin Wang, José R. Zubizarreta

    Abstract: Matching methods are widely used for causal inference in observational studies. Among them, nearest neighbor matching is arguably the most popular. However, nearest neighbor matching does not generally yield an average treatment effect estimator that is $\sqrt{n}$-consistent (Abadie and Imbens, 2006). Are matching methods not $\sqrt{n}$-consistent in general? In this paper, we study a recent class…

    Submitted 11 September, 2021; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: 32 pages

  20. arXiv:1901.10296  [pdf, ps, other]

    math.ST stat.ME

    Minimax Linear Estimation of the Retargeted Mean

    Authors: David A. Hirshberg, Arian Maleki, Jose R. Zubizarreta

    Abstract: Evaluating treatments received by one population for application to a different target population of scientific interest is a central problem in causal inference from observational studies. We study the minimax linear estimator of the treatment-specific mean outcome on a target population and provide a theoretical basis for inference based on it. In particular, we provide a justification for the c…

    Submitted 26 February, 2021; v1 submitted 10 January, 2019; originally announced January 2019.

    Comments: 25 pages, 4 figures

  21. arXiv:1810.06707  [pdf, ps, other]

    stat.AP

    Building Representative Matched Samples with Multi-valued Treatments in Large Observational Studies

    Authors: Magdalena Bennett, Juan Pablo Vielma, Jose R. Zubizarreta

    Abstract: In this paper, we present a new way of matching in observational studies that overcomes three limitations of existing matching approaches. First, it directly balances covariates with multi-valued treatments without requiring the generalized propensity score. Second, it builds self-weighted matched samples that are representative of a target population by design. Third, it can handle large data set…

    Submitted 9 July, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

  22. arXiv:1706.07550  [pdf, other]

    math.ST

    Shape-constrained partial identification of a population mean under unknown probabilities of sample selection

    Authors: Luke W. Miratrix, Stefan Wager, Jose R. Zubizarreta

    Abstract: A prevailing challenge in the biomedical and social sciences is to estimate a population mean from a sample obtained with unknown selection probabilities. Using a well-known ratio estimator, Aronow and Lee (2013) proposed a method for partial identification of the mean by allowing the unknown selection probabilities to vary arbitrarily between two fixed extreme values. In this paper, we show how t…

    Submitted 22 June, 2017; originally announced June 2017.

  23. arXiv:1705.00998  [pdf, other]

    stat.ME math.ST stat.AP

    Minimal Dispersion Approximately Balancing Weights: Asymptotic Properties and Practical Considerations

    Authors: Yixin Wang, José R. Zubizarreta

    Abstract: Weighting methods are widely used to adjust for covariates in observational studies, sample surveys, and regression settings. In this paper, we study a class of recently proposed weighting methods which find the weights of minimum dispersion that approximately balance the covariates. We call these weights "minimal weights" and study them under a common optimization framework. The key observation i…

    Submitted 24 April, 2019; v1 submitted 2 May, 2017; originally announced May 2017.

    Comments: 41 pages

  24. arXiv:1602.00359  [pdf, ps, other]

    math.ST

    Confidence intervals for means under constrained dependence

    Authors: Peter M. Aronow, Forrest W. Crawford, José R. Zubizarreta

    Abstract: We develop a general framework for conducting inference on the mean of dependent random variables given constraints on their dependency graph. We establish the consistency of an oracle variance estimator of the mean when the dependency graph is known, along with an associated central limit theorem. We derive an integer linear program for finding an upper bound for the estimated variance when the g…

    Submitted 31 January, 2016; originally announced February 2016.

  25. Isolation in the construction of natural experiments

    Authors: José R. Zubizarreta, Dylan S. Small, Paul R. Rosenbaum

    Abstract: A natural experiment is a type of observational study in which treatment assignment, though not randomized by the investigator, is plausibly close to random. A process that assigns treatments in a highly nonrandom, inequitable manner may, in rare and brief moments, assign aspects of treatments at random or nearly so. Isolating those moments and aspects may extract a natural experiment from a setti…

    Submitted 19 January, 2015; originally announced January 2015.

    Comments: Published at http://dx.doi.org/10.1214/14-AOAS770 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS770

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 4, 2096-2121

  26. arXiv:1409.8597  [pdf, other]

    stat.ME

    Optimal Multilevel Matching in Clustered Observational Studies: A Case Study of the Effectiveness of Private Schools Under a Large-Scale Voucher System

    Authors: Luke Keele, Jose R. Zubizarreta

    Abstract: A distinctive feature of a clustered observational study is its multilevel or nested data structure arising from the assignment of treatment, in a non-random manner, to groups or clusters of units or individuals. Examples are ubiquitous in the health and social sciences including patients in hospitals, employees in firms, and students in schools. What is the optimal matching strategy in a clustere…

    Submitted 28 April, 2016; v1 submitted 30 September, 2014; originally announced September 2014.

  27. Matching for balance, pairing for heterogeneity in an observational study of the effectiveness of for-profit and not-for-profit high schools in Chile

    Authors: José R. Zubizarreta, Ricardo D. Paredes, Paul R. Rosenbaum

    Abstract: Conventionally, the construction of a pair-matched sample selects treated and control units and pairs them in a single step with a view to balancing observed covariates $\mathbf{x}$ and reducing the heterogeneity or dispersion of treated-minus-control response differences, $Y$. In contrast, the method of cardinality matching developed here first selects the maximum number of units subject to covar…

    Submitted 14 April, 2014; originally announced April 2014.

    Comments: Published at http://dx.doi.org/10.1214/13-AOAS713 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS713

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 1, 204-231

  28. Stronger instruments via integer programming in an observational study of late preterm birth outcomes

    Authors: José R. Zubizarreta, Dylan S. Small, Neera K. Goyal, Scott Lorch, Paul R. Rosenbaum

    Abstract: In an optimal nonbipartite match, a single population is divided into matched pairs to minimize a total distance within matched pairs. Nonbipartite matching has been used to strengthen instrumental variables in observational studies of treatment effects, essentially by forming pairs that are similar in terms of covariates but very different in the strength of encouragement to accept the treatment.…

    Submitted 15 April, 2013; originally announced April 2013.

    Comments: Published at http://dx.doi.org/10.1214/12-AOAS582 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS582

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 1, 25-50