Showing 1–42 of 42 results for author: Miratrix, L

  1. arXiv:2309.13666  [pdf, other]

    stat.ME

    Decreasing the human coding burden in randomized trials with text-based outcomes via model-assisted impact analysis

    Authors: Reagan Mozer, Luke Miratrix

    Abstract: For randomized trials that use text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by trained human raters. This process, the current standard, is both time-consuming and limiting: even the largest human coding efforts are typically constrained to measure only a small set of dimensions across a subs…

    Submitted 24 September, 2023; originally announced September 2023.

  2. arXiv:2309.06727  [pdf, ps, other]

    stat.ME

    Empirical Bayes Double Shrinkage for Combining Biased and Unbiased Causal Estimates

    Authors: Evan T. R. Rosenman, Francesca Dominici, Luke Miratrix

    Abstract: Motivated by the proliferation of observational datasets and the need to integrate non-randomized evidence with randomized controlled trials, causal inference researchers have recently proposed several new methodologies for combining biased and unbiased estimators. We contribute to this growing literature by developing a new class of estimators for the data-combination problem: double-shrinkage es…

    Submitted 13 September, 2023; originally announced September 2023.

  3. arXiv:2308.06913  [pdf, other]

    stat.ME stat.AP

    Improving the Estimation of Site-Specific Effects and their Distribution in Multisite Trials

    Authors: JoonHo Lee, Jonathan Che, Sophia Rabe-Hesketh, Avi Feller, Luke Miratrix

    Abstract: In multisite trials, researchers are often interested in several inferential goals: estimating treatment effects for each site, ranking these effects, and studying their distribution. This study seeks to identify optimal methods for estimating these targets. Through a comprehensive simulation study, we assess two strategies and their combined effects: semiparametric modeling of the prior distribut…

    Submitted 1 April, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

  4. arXiv:2307.03687  [pdf, other]

    cs.CL stat.AP stat.ME

    Leveraging text data for causal inference using electronic health records

    Authors: Reagan Mozer, Aaron R. Kaufman, Leo A. Celi, Luke Miratrix

    Abstract: In studies that rely on data from electronic health records (EHRs), unstructured text data such as clinical progress notes offer a rich source of information about patient characteristics and care that may be missing from structured data. Despite the prevalence of text in clinical research, these data are often ignored for the purposes of quantitative analysis due to their complexity. This paper pres…

    Submitted 20 May, 2024; v1 submitted 9 June, 2023; originally announced July 2023.

  5. arXiv:2303.10016  [pdf, other]

    stat.ME

    Improving instrumental variable estimators with post-stratification

    Authors: Nicole E. Pashley, Luke Keele, Luke W. Miratrix

    Abstract: Experiments studying get-out-the-vote (GOTV) efforts estimate the causal effect of various mobilization efforts on voter turnout. However, there is often substantial noncompliance in these studies. A usual approach is to use an instrumental variable (IV) analysis to estimate impacts for compliers, here being those actually contacted by the investigators. Unfortunately, popular IV estimators can be…

    Submitted 28 June, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

  6. arXiv:2205.08644  [pdf, other]

    stat.ME math.ST

    Benefits and costs of matching prior to a Difference in Differences analysis when parallel trends does not hold

    Authors: Dae Woong Ham, Luke Miratrix

    Abstract: The Difference in Difference (DiD) estimator is a popular estimator built on the "parallel trends" assumption, which is an assertion that the treatment group, absent treatment, would change "similarly" to the control group over time. To bolster such a claim, one might generate a comparison group, via matching, that is similar to the treated group with respect to pre-treatment outcomes and/or pre-t…

    Submitted 7 February, 2024; v1 submitted 17 May, 2022; originally announced May 2022.

  7. arXiv:2204.06687  [pdf, ps, other]

    stat.ME

    Designing Experiments Toward Shrinkage Estimation

    Authors: Evan T. R. Rosenman, Luke Miratrix

    Abstract: We consider how increasingly available observational data can be used to improve the design of randomized controlled trials (RCTs). We seek to design a prospective RCT, with the intent of using an Empirical Bayes estimator to shrink the causal estimates from our trial toward causal estimates obtained from an observational study. We ask: how might we design the experiment to better complement the o…

    Submitted 13 April, 2022; originally announced April 2022.

  8. arXiv:2112.15273  [pdf, other]

    stat.ME

    Power Under Multiplicity Project (PUMP): Estimating Power, Minimum Detectable Effect Size, and Sample Size When Adjusting for Multiple Outcomes in Multi-level Experiments

    Authors: Kristen Hunter, Luke Miratrix, Kristin Porter

    Abstract: For randomized controlled trials (RCTs) with a single intervention being measured on multiple outcomes, researchers often apply a multiple testing procedure (such as Bonferroni or Benjamini-Hochberg) to adjust $p$-values. Such an adjustment reduces the likelihood of spurious findings, but also changes the statistical power, sometimes substantially, which reduces the probability of detecting effect…

    Submitted 15 May, 2023; v1 submitted 30 December, 2021; originally announced December 2021.

    Comments: 60 pages, 6 figures

  9. arXiv:2111.01357  [pdf, other]

    stat.ME

    Leveraging Population Outcomes to Improve the Generalization of Experimental Results

    Authors: Melody Huang, Naoki Egami, Erin Hartman, Luke Miratrix

    Abstract: Generalizing causal estimates in randomized experiments to a broader target population is essential for guiding decisions by policymakers and practitioners in the social and biomedical sciences. While recent papers developed various weighting estimators for the population average treatment effect (PATE), many of these methods result in large variance because the experimental sample often differs s…

    Submitted 1 November, 2021; originally announced November 2021.

  10. arXiv:2105.03529  [pdf, other]

    stat.AP

    Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data

    Authors: Johann A. Gagnon-Bartsch, Adam C. Sales, Edward Wu, Anthony F. Botelho, John A. Erickson, Luke W. Miratrix, Neil T. Heffernan

    Abstract: Randomized controlled trials (RCTs) are increasingly prevalent in education research, and are often regarded as a gold standard of causal inference. Two main virtues of randomized experiments are that they (1) do not suffer from confounding, thereby allowing for an unbiased estimate of an intervention's causal impact, and (2) allow for design-based inference, meaning that the physical act of rando…

    Submitted 19 May, 2023; v1 submitted 7 May, 2021; originally announced May 2021.

    Comments: Forthcoming in Journal of Causal Inference. Replication materials at https://osf.io/d9ujq/ . Results differ very slightly from previous versions due to changes made in the process of making the analysis replicable. For details, compare https://github.com/adamSales/rebarLoop/tree/ReplicateArxiv2-2023 (previous version) to https://github.com/adamSales/rebarLoop/tree/docker (current version)

  11. arXiv:2103.14765  [pdf, ps, other]

    stat.ME

    Is it who you are or where you are? Accounting for compositional differences in cross-site treatment variation

    Authors: Benjamin Lu, Eli Ben-Michael, Avi Feller, Luke Miratrix

    Abstract: Multisite trials, in which treatment is randomized separately in multiple sites, offer a unique opportunity to disentangle treatment effect variation due to "compositional" differences in the distributions of unit-level features from variation due to "contextual" differences in site-level features. In particular, if we can re-weight (or "transport") each site to have a common distribution of unit-…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: 22 pages, 9 figures

  12. arXiv:2101.09195  [pdf, other]

    stat.ME math.ST

    Randomization Inference beyond the Sharp Null: Bounded Null Hypotheses and Quantiles of Individual Treatment Effects

    Authors: Devin Caughey, Allan Dafoe, Xinran Li, Luke Miratrix

    Abstract: Randomization inference (RI) is typically interpreted as testing Fisher's "sharp" null hypothesis that all unit-level effects are exactly zero. This hypothesis is often criticized as restrictive and implausible, making its rejection scientifically uninteresting. We show, however, that many randomization tests are also valid for a "bounded" null hypothesis under which the unit-level effects are all…

    Submitted 28 August, 2023; v1 submitted 22 January, 2021; originally announced January 2021.

  13. Block what you can, except when you shouldn't

    Authors: Nicole E. Pashley, Luke W. Miratrix

    Abstract: Several branches of the potential outcome causal inference literature have discussed the merits of blocking versus complete randomization. Some have concluded it can never hurt the precision of estimates, and some have concluded it can hurt. In this paper, we reconcile these apparently conflicting views, give a more thorough discussion of what guarantees no harm, and discuss how other aspects of a…

    Submitted 27 May, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:1710.10342

    Journal ref: Journal of Educational and Behavioral Statistics, 2022; 47(1):69-100

  14. Conditional As-If Analyses in Randomized Experiments

    Authors: Nicole E. Pashley, Guillaume W. Basse, Luke W. Miratrix

    Abstract: The injunction to 'analyze the way you randomize' is well-known to statisticians since Fisher advocated for randomization as the basis of inference. Yet even those convinced by the merits of randomization-based inference seldom follow this injunction to the letter. Bernoulli randomized experiments are often analyzed as completely randomized experiments, and completely randomized experiments are an…

    Submitted 19 August, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Journal ref: Journal of Causal Inference, vol. 9, no. 1, 2021, pp. 264-284

  15. arXiv:2007.09056  [pdf, other]

    stat.AP

    Hospital Quality Risk Standardization via Approximate Balancing Weights

    Authors: Luke Keele, Eli Ben-Michael, Avi Feller, Rachel Kelz, Luke Miratrix

    Abstract: Comparing outcomes across hospitals, often to identify underperforming hospitals, is a critical task in health services research. However, naive comparisons of average outcomes, such as surgery complication rates, can be misleading because hospital case mixes differ -- a hospital's overall complication rate may be lower due to more effective treatments or simply because the hospital serves a healt…

    Submitted 15 February, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

  16. arXiv:2002.05746  [pdf, other]

    stat.ME stat.AP

    Using Simulation to Analyze Interrupted Time Series Designs

    Authors: Luke Miratrix

    Abstract: We are sometimes forced to use the Interrupted Time Series (ITS) design as an identification strategy for potential policy change, such as when we only have a single treated unit and no comparable controls. For example, with recent county- and state-wide criminal justice reform efforts, where judicial bodies have changed bail setting practices for everyone in their jurisdiction in order to reduce…

    Submitted 13 February, 2020; originally announced February 2020.

  17. Design-Based Ratio Estimators and Central Limit Theorems for Clustered, Blocked RCTs

    Authors: Peter Z. Schochet, Nicole E. Pashley, Luke W. Miratrix, Tim Kautz

    Abstract: This article develops design-based ratio estimators for clustered, blocked randomized controlled trials (RCTs), with an application to a federally funded, school-based RCT testing the effects of behavioral health interventions. We consider finite population weighted least squares estimators for average treatment effects (ATEs), allowing for general weighting schemes and covariates. We consider mod…

    Submitted 25 February, 2021; v1 submitted 4 February, 2020; originally announced February 2020.

    Journal ref: Journal of the American Statistical Association 117, no. 540 (2022)

  18. arXiv:1910.07091  [pdf]

    stat.AP

    Lurking Inferential Monsters? Quantifying bias in non-experimental evaluations of school programs

    Authors: Ben Weidmann, Luke Miratrix

    Abstract: This study examines whether unobserved factors substantially bias education evaluations that rely on the Conditional Independence Assumption. We add 14 new within-study comparisons to the literature, all from primary schools in England. Across these 14 studies, we generate 42 estimates of selection bias using a simple matching approach. A meta-analysis of the estimates suggests that the distributi…

    Submitted 15 October, 2019; originally announced October 2019.

  19. arXiv:1807.04516  [pdf, other]

    stat.AP

    A Bayesian Nonparametric Approach to Geographic Regression Discontinuity Designs: Do School Districts Affect NYC House Prices?

    Authors: Maxime Rischard, Zach Branson, Luke Miratrix, Luke Bornn

    Abstract: Most research on regression discontinuity designs (RDDs) has focused on univariate cases, where only those units with a "forcing" variable on one side of a threshold value receive a treatment. Geographical regression discontinuity designs (GeoRDDs) extend the RDD to multivariate settings with spatial forcing variables. We propose a framework for analysing GeoRDDs, which we implement using Gaussian…

    Submitted 11 July, 2018; originally announced July 2018.

    Comments: 40 pages, 12 figures

  20. arXiv:1803.06048  [pdf, other]

    stat.ME

    Identifying and Estimating Principal Causal Effects in Multi-site Trials

    Authors: Lo-Hua Yuan, Avi Feller, Luke W. Miratrix

    Abstract: Randomized trials are often conducted with separate randomizations across multiple sites such as schools, voting districts, or hospitals. These sites can differ in important ways, including the site's implementation, local conditions, and the composition of individuals. An important question in practice is whether---and under what assumptions---researchers can leverage this cross-site variation to…

    Submitted 15 March, 2018; originally announced March 2018.

  21. Randomization Tests that Condition on Non-Categorical Covariate Balance

    Authors: Zach Branson, Luke Miratrix

    Abstract: A benefit of randomized experiments is that covariate distributions of treatment and control groups are balanced on average, resulting in simple unbiased estimators for treatment effects. However, it is possible that a particular randomization yields covariate imbalances that researchers want to address in the analysis stage through adjustment or other methods. Here we present a randomization test…

    Submitted 4 October, 2018; v1 submitted 3 February, 2018; originally announced February 2018.

    Comments: 54 pages, 12 Figures

  22. arXiv:1801.00644  [pdf, other]

    stat.ME cs.CL

    Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

    Authors: Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, L. Jason Anastasopoulos

    Abstract: Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes…

    Submitted 13 March, 2019; v1 submitted 2 January, 2018; originally announced January 2018.

  23. Insights on Variance Estimation for Blocked and Matched Pairs Designs

    Authors: Nicole E. Pashley, Luke W. Miratrix

    Abstract: Evaluating blocked randomized experiments from a potential outcomes perspective has two primary branches of work. The first focuses on larger blocks, with multiple treatment and control units in each block. The second focuses on matched pairs, with a single treatment and control unit in each block. These literatures not only provide different estimators for the standard errors of the estimated ave…

    Submitted 29 June, 2020; v1 submitted 27 October, 2017; originally announced October 2017.

    Journal ref: Journal of Educational and Behavioral Statistics, 46(3) (2021) p. 271-296

  24. arXiv:1709.07339  [pdf, other]

    stat.ME

    Beyond the Sharp Null: Randomization Inference, Bounded Null Hypotheses, and Confidence Intervals for Maximum Effects

    Authors: Devin Caughey, Allan Dafoe, Luke Miratrix

    Abstract: Fisherian randomization inference is often dismissed as testing an uninteresting and implausible hypothesis: the sharp null of no effects whatsoever. We show that this view is overly narrow. Many randomization tests are also valid under a more general "bounded" null hypothesis under which all effects are weakly negative (or positive), thus accommodating heterogenous effects. By inverting such test…

    Submitted 21 September, 2017; originally announced September 2017.

  25. arXiv:1706.07550  [pdf, other]

    math.ST

    Shape-constrained partial identification of a population mean under unknown probabilities of sample selection

    Authors: Luke W. Miratrix, Stefan Wager, Jose R. Zubizarreta

    Abstract: A prevailing challenge in the biomedical and social sciences is to estimate a population mean from a sample obtained with unknown selection probabilities. Using a well-known ratio estimator, Aronow and Lee (2013) proposed a method for partial identification of the mean by allowing the unknown selection probabilities to vary arbitrarily between two fixed extreme values. In this paper, we show how t…

    Submitted 22 June, 2017; originally announced June 2017.

  26. arXiv:1705.08526  [pdf, other]

    stat.ME

    Model-free causal inference of binary experimental data

    Authors: Peng Ding, Luke W. Miratrix

    Abstract: For binary experimental data, we discuss randomization-based inferential procedures that do not need to invoke any modeling assumptions. We also introduce methods for likelihood and Bayesian inference based solely on the physical randomization without any hypothetical super population assumptions about the potential outcomes. These estimators have some properties superior to moment-based ones such…

    Submitted 23 May, 2017; originally announced May 2017.

  27. A Nonparametric Bayesian Methodology for Regression Discontinuity Designs

    Authors: Zach Branson, Maxime Rischard, Luke Bornn, Luke Miratrix

    Abstract: One of the most popular methodologies for estimating the average treatment effect at the threshold in a regression discontinuity design is local linear regression (LLR), which places larger weight on units closer to the threshold. We propose a Gaussian process regression methodology that acts as a Bayesian analog to LLR for regression discontinuity designs. Our methodology provides a flexible fit…

    Submitted 30 September, 2018; v1 submitted 16 April, 2017; originally announced April 2017.

    Comments: 40 pages, 5 figures, 5 tables

  28. arXiv:1703.06808  [pdf, other]

    stat.ME stat.AP

    Worth Weighting? How to Think About and Use Weights in Survey Experiments

    Authors: Luke W. Miratrix, Jasjeet S. Sekhon, Alexander G. Theodoridis, Luis F. Campos

    Abstract: The popularity of online surveys has increased the prominence of using weights that capture units' probabilities of inclusion for claims of representativeness. Yet, much uncertainty remains regarding how these weights should be employed in the analysis of survey experiments: Should they be used or ignored? If they are used, which estimators are preferred? We offer practical advice, rooted in the N…

    Submitted 15 August, 2017; v1 submitted 20 March, 2017; originally announced March 2017.

    Comments: 26 pages, 4 figures

  29. arXiv:1702.08615  [pdf, ps, other]

    math.ST

    Bridging Finite and Super Population Causal Inference

    Authors: Peng Ding, Xinran Li, Luke W. Miratrix

    Abstract: There are two general views in causal analysis of experimental data: the super population view that the units are an independent sample from some hypothetical infinite populations, and the finite population view that the potential outcomes of the experimental units are fixed and the randomness comes solely from the physical randomization of the treatment assignment. These two views differ concept…

    Submitted 27 February, 2017; originally announced February 2017.

    Journal ref: Journal of Causal Inference, 2017

  30. arXiv:1701.03227  [pdf, other]

    cs.CL cs.IR cs.LG

    Prior matters: simple and general methods for evaluating and improving topic quality in topic modeling

    Authors: Angela Fan, Finale Doshi-Velez, Luke Miratrix

    Abstract: Latent Dirichlet Allocation (LDA) models trained without stopword removal often produce topics with high posterior probabilities on uninformative words, obscuring the underlying corpus content. Even when canonical stopwords are manually removed, uninformative words common in that corpus will still dominate the most probable words in a topic. In this work, we first show how the standard topic quali…

    Submitted 14 October, 2017; v1 submitted 11 January, 2017; originally announced January 2017.

  31. arXiv:1701.03139  [pdf, other]

    stat.ME stat.AP

    Bounding, an accessible method for estimating principal causal effects, examined and explained

    Authors: Luke Miratrix, Jane Furey, Avi Feller, Todd Grindal, Lindsay C. Page

    Abstract: Estimating treatment effects for subgroups defined by post-treatment behavior (i.e., estimating causal effects in a principal stratification framework) can be technically challenging and heavily reliant on strong assumptions. We investigate an alternative path: using bounds to identify ranges of possible effects that are consistent with the data. This simple approach relies on fewer assumptions an…

    Submitted 16 August, 2017; v1 submitted 11 January, 2017; originally announced January 2017.

  32. arXiv:1606.02682  [pdf, other]

    stat.ME

    Principal Score Methods: Assumptions and Extensions

    Authors: Avi Feller, Fabrizia Mealli, Luke Miratrix

    Abstract: Researchers addressing post-treatment complications in randomized trials often turn to principal stratification to define relevant assumptions and quantities of interest. One approach for estimating causal effects in this framework is to use methods based on the "principal score," typically assuming that stratum membership is as-good-as-randomly assigned given a set of covariates. In this paper, w…

    Submitted 8 June, 2016; originally announced June 2016.

  33. arXiv:1605.07242  [pdf, other]

    stat.ME stat.AP

    More Powerful Multiple Testing in Randomized Experiments with Non-Compliance

    Authors: Joseph J. Lee, Laura Forastiere, Luke Miratrix, Natesh S. Pillai

    Abstract: Two common concerns raised in analyses of randomized experiments are (i) appropriately handling issues of non-compliance, and (ii) appropriately adjusting for multiple tests (e.g., on multiple outcomes or subgroups). Although simple intention-to-treat (ITT) and Bonferroni methods are valid in terms of type I error, they can each lead to a substantial loss of power; when employing both simultaneous…

    Submitted 23 May, 2016; originally announced May 2016.

    Comments: To appear in Statistica Sinica

  34. arXiv:1605.06566  [pdf, other]

    math.ST stat.ME

    Decomposing Treatment Effect Variation

    Authors: Peng Ding, Avi Feller, Luke Miratrix

    Abstract: Understanding and characterizing treatment effect variation in randomized experiments has become essential for going beyond the "black box" of the average treatment effect. Nonetheless, traditional statistical approaches often ignore or assume away such variation. In the context of randomized experiments, this paper proposes a framework for decomposing overall treatment effect variation into a sys…

    Submitted 28 July, 2017; v1 submitted 20 May, 2016; originally announced May 2016.

  35. arXiv:1602.06595  [pdf, other]

    stat.ME

    Weak separation in mixture models and implications for principal stratification

    Authors: Avi Feller, Evan Greif, Nhat Ho, Luke Miratrix, Natesh Pillai

    Abstract: Principal stratification is a widely used framework for addressing post-randomization complications. After using principal stratification to define causal effects of interest, researchers are increasingly turning to finite mixture models to estimate these quantities. Unfortunately, standard estimators of mixture parameters, like the MLE, are known to exhibit pathological behavior. We study this be…

    Submitted 17 August, 2019; v1 submitted 21 February, 2016; originally announced February 2016.

  36. arXiv:1511.06798  [pdf, other]

    cs.CL cs.IR stat.AP

    Conducting sparse feature selection on arbitrarily long phrases in text corpora with a focus on interpretability

    Authors: Luke Miratrix, Robin Ackerman

    Abstract: We propose a general framework for topic-specific summarization of large text corpora, and illustrate how it can be used for analysis in two quite different contexts: an OSHA database of fatality and catastrophe reports (to facilitate surveillance for patterns in circumstances leading to injury or death) and legal decisions on workers' compensation claims (to explore relevant case law). Our summar…

    Submitted 22 July, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

  37. arXiv:1511.00521  [pdf, other]

    stat.ME

    Posterior Predictive P-values with Fisher Randomization Tests in Noncompliance Settings: Test Statistics vs Discrepancy Variables

    Authors: Laura Forastiere, Fabrizia Mealli, Luke Miratrix

    Abstract: In randomized experiments with noncompliance, tests may focus on compliers rather than on the overall sample. Rubin (1998) put forth such a method, and argued that testing for the complier average causal effect and averaging permutation based p-values over the posterior distribution of the compliance status could increase power, as compared to general intent-to-treat tests. The general scheme is t…

    Submitted 20 February, 2016; v1 submitted 2 November, 2015; originally announced November 2015.

  38. arXiv:1510.06817  [pdf, other]

    stat.ME

    A conditional randomization test to account for covariate imbalance in randomized experiments

    Authors: Jonathan Hennessy, Tirthankar Dasgupta, Luke Miratrix, Cassandra Pattanayak, Pradipta Sarkar

    Abstract: We consider the conditional randomization test as a way to account for covariate imbalance in randomized experiments. The test accounts for covariate imbalance by comparing the observed test statistic to the null distribution of the test statistic conditional on the observed covariate imbalance. We prove that the conditional randomization test has the correct significance level and introduce origi…

    Submitted 20 April, 2017; v1 submitted 22 October, 2015; originally announced October 2015.

  39. arXiv:1412.5000  [pdf, other]

    stat.ME stat.AP

    Randomization Inference for Treatment Effect Variation

    Authors: Peng Ding, Avi Feller, Luke Miratrix

    Abstract: Applied researchers are increasingly interested in whether and how treatment effects vary in randomized evaluations, especially variation not explained by observed covariates. We propose a model-free approach for testing for the presence of such unexplained variation. To use this randomization-based approach, we must address the fact that the average treatment effect, generally the object of inter…

    Submitted 16 December, 2014; originally announced December 2014.

  40. arXiv:1408.0324  [pdf, other]

    math.ST

    To Adjust or Not to Adjust? Sensitivity Analysis of M-Bias and Butterfly-Bias

    Authors: Peng Ding, Luke Miratrix

    Abstract: "M-Bias," as it is called in the epidemiologic literature, is the bias introduced by conditioning on a pretreatment covariate due to a particular "M-Structure" between two latent factors, an observed treatment, an outcome, and a "collider." This potential source of bias, which can occur even when the treatment and the outcome are not confounded, has been a source of considerable controversy. We he…

    Submitted 1 August, 2014; originally announced August 2014.

    Comments: Journal of Causal Inference 2014

  41. arXiv:1404.7362  [pdf, ps, other]

    cs.CL stat.AP

    Concise comparative summaries (CCS) of large text corpora with a human experiment

    Authors: Jinzhu Jia, Luke Miratrix, Bin Yu, Brian Gawalt, Laurent El Ghaoui, Luke Barnesmoore, Sophie Clavier

    Abstract: In this paper we propose a general framework for topic-specific summarization of large text corpora and illustrate how it can be used for the analysis of news databases. Our framework, concise comparative summarization (CCS), is built on sparse classification methods. CCS is a lightweight and flexible tool that offers a compromise between simple word frequency based methods currently in wide use a…

    Submitted 29 April, 2014; originally announced April 2014.

    Comments: Published in at http://dx.doi.org/10.1214/13-AOAS698 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS698

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 1, 499-529

  42. arXiv:0905.4691  [pdf, ps, other]

    stat.AP

    Implementing Risk-Limiting Post-Election Audits in California

    Authors: Joseph Lorenzo Hall, Luke W. Miratrix, Philip B. Stark, Melvin Briones, Elaine Ginnold, Freddie Oakley, Martin Peaden, Gail Pellerin, Tom Stanionis, Tricia Webber

    Abstract: Risk-limiting post-election audits limit the chance of certifying an electoral outcome if the outcome is not what a full hand count would show. Building on previous work, we report on pilot risk-limiting audits in four elections during 2008 in three California counties: one during the February 2008 Primary Election in Marin County and three during the November 2008 General Elections in Marin, Sa…

    Submitted 10 July, 2009; v1 submitted 28 May, 2009; originally announced May 2009.

    Comments: Accepted to the Electronic Voting Technology Workshop/Workshop on Trustworthy Elections 2009 (EVT/WOTE '09), http://www.usenix.org/events/evtwote09/