Search | arXiv e-print repository

Sequential Validation of Treatment Heterogeneity

Abstract: We use the martingale construction of Luedtke and van der Laan (2016) to develop tests for the presence of treatment heterogeneity. The resulting sequential validation approach can be instantiated using various validation metrics, such as BLPs, GATES, QINI curves, etc., and provides an alternative to cross-validation-like cross-fold application of these metrics.

Submitted 9 May, 2024; originally announced May 2024.

Comments: This note was prepared as a comment on the Fisher-Schultz paper by Chernozhukov, Demirer, Duflo and Fernandez-Val, forthcoming in Econometrica

arXiv:2403.01386 [pdf, other]

Minimax-Regret Sample Selection in Randomized Experiments

Authors: Yuchen Hu, Henry Zhu, Emma Brunskill, Stefan Wager

Abstract: Randomized controlled trials are often run in settings with many subpopulations that may have differential benefits from the treatment being evaluated. We consider the problem of sample selection, i.e., whom to enroll in a randomized trial, such as to optimize welfare in a heterogeneous population. We formalize this problem within the minimax-regret framework, and derive optimal sample-selection schemes under a variety of conditions. Using data from a COVID-19 vaccine trial, we also highlight how different objectives and decision rules can lead to meaningfully different guidance regarding optimal sample allocation.

Submitted 25 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.08201 [pdf, other]

Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap

Authors: Mohammad Mehrabi, Stefan Wager

Abstract: Doubly robust methods hold considerable promise for off-policy evaluation in Markov decision processes (MDPs) under sequential ignorability: They have been shown to converge as $1/\sqrt{T}$ with the horizon $T$, to be statistically efficient in large samples, and to allow for modular implementation where preliminary estimation tasks can be executed using standard reinforcement learning techniques. Existing results, however, make heavy use of a strong distributional overlap assumption whereby the stationary distributions of the target policy and the data-collection policy are within a bounded factor of each other -- and this assumption is typically only credible when the state space of the MDP is bounded. In this paper, we re-visit the task of off-policy evaluation in MDPs under a weaker notion of distributional overlap, and introduce a class of truncated doubly robust (TDR) estimators which we find to perform well in this setting. When the distribution ratio of the target and data-collection policies is square-integrable (but not necessarily bounded), our approach recovers the large-sample behavior previously established under strong distributional overlap. When this ratio is not square-integrable, TDR is still consistent but with a slower-than-$1/\sqrt{T}$ rate; furthermore, this rate of convergence is minimax over a class of MDPs defined only using mixing conditions. We validate our approach numerically and find that, in our experiments, appropriate truncation plays a major role in enabling accurate off-policy evaluation when strong distributional overlap does not hold.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 50 pages, 4 figures

arXiv:2312.02482 [pdf, other]

Treatment heterogeneity with right-censored outcomes using grf

Authors: Erik Sverdrup, Stefan Wager

Abstract: This article walks through how to estimate conditional average treatment effects (CATEs) with right-censored time-to-event outcomes using the function causal_survival_forest (Cui et al., 2023) in the R package grf (Athey et al., 2019, Tibshirani et al., 2024) using data from the National Job Training Partnership Act.

Submitted 25 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Software review article prepared for January 2024 ASA Lifetime Data Science newsletter

arXiv:2306.11979 [pdf, other]

Qini Curves for Multi-Armed Treatment Rules

Authors: Erik Sverdrup, Han Wu, Susan Athey, Stefan Wager

Abstract: Qini curves have emerged as an attractive and popular approach for evaluating the benefit of data-driven targeting rules for treatment allocation. We propose a generalization of the Qini curve to multiple costly treatment arms, that quantifies the value of optimally selecting among both units and treatment arms at different budget levels. We develop an efficient algorithm for computing these curves and propose bootstrap-based confidence intervals that are exact in large samples for any point on the curve. These confidence intervals can be used to conduct hypothesis tests comparing the value of treatment targeting using an optimal combination of arms with using just a subset of arms, or with a non-targeting assignment rule ignoring covariates, at different budget levels. We demonstrate the statistical performance in a simulation experiment and an application to treatment targeting for election turnout.

Submitted 23 April, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2304.11735 [pdf, other]

Policy Learning under Biased Sample Selection

Authors: Lihua Lei, Roshni Sahoo, Stefan Wager

Abstract: Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed (i.e., internal validity holds), the study participants may not represent a random sample of the target population (i.e., external validity fails)--and this may lead to policies that perform suboptimally on the target population. We consider a model where observable attributes can impact sample selection probabilities arbitrarily but the effect of unobservable attributes is bounded by a constant, and we aim to learn policies with the best possible performance guarantees that hold under any sampling bias of this type. In particular, we derive the partial identification result for the worst-case welfare in the presence of sampling bias and show that the optimal max-min, max-min gain, and minimax regret policies depend on both the conditional average treatment effect (CATE) and the conditional value-at-risk (CVaR) of potential outcomes given covariates. To avoid finite-sample inefficiencies of plug-in estimates, we further provide an end-to-end procedure for learning the optimal max-min and max-min gain policies that does not require the separate estimation of nuisance parameters.

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2302.12093 [pdf, other]

Experimenting under Stochastic Congestion

Authors: Shuangning Li, Ramesh Johari, Xu Kuang, Stefan Wager

Abstract: We study randomized experiments in a service system when stochastic congestion can arise from temporarily limited supply and/or demand. Such congestion gives rise to cross-unit interference between the waiting customers, and analytic strategies that do not account for this interference may be biased. In current practice, one of the most widely used ways to address stochastic congestion is to use switchback experiments that alternately turn a target intervention on and off for the whole system. We find, however, that under a queueing model for stochastic congestion, the standard way of analyzing switchbacks is inefficient, and that estimators that leverage the queueing model can be materially more accurate. We also consider a new class of experimental design, which can be used to estimate a policy gradient of the dynamic system using only unit-level randomization, thus alleviating key practical challenges that arise in running a switchback.

Submitted 25 September, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2209.01754 [pdf, other]

Learning from a Biased Sample

Authors: Roshni Sahoo, Lihua Lei, Stefan Wager

Abstract: The empirical risk minimization approach to data-driven decision making assumes that we can learn a decision rule from training data drawn under the same conditions as the ones we want to deploy it in. However, in a number of settings, we may be concerned that our training sample is biased, and that some groups (characterized by either observable or unobservable attributes) may be under- or over-represented relative to the general population; and in this setting empirical risk minimization over the training set may fail to yield rules that perform well at deployment. We propose a model of sampling bias called $Γ$-biased sampling, where observed covariates can affect the probability of sample selection arbitrarily much but the amount of unexplained variation in the probability of sample selection is bounded by a constant factor. Applying the distributionally robust optimization framework, we propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions that can generate the training distribution under $Γ$-biased sampling. We apply a result of Rockafellar and Uryasev to show that this problem is equivalent to an augmented convex risk minimization problem. We give statistical guarantees for learning a model that is robust to sampling bias via the method of sieves, and propose a deep learning algorithm whose loss function captures our robust learning target. We empirically validate our proposed method in simulations and a case study on ICU length of stay prediction.

Submitted 5 January, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

arXiv:2209.00197 [pdf, other]

Switchback Experiments under Geometric Mixing

Authors: Yuchen Hu, Stefan Wager

Abstract: The switchback is an experimental design that measures treatment effects by repeatedly turning an intervention on and off for a whole system. Switchback experiments are a robust way to overcome cross-unit spillover effects; however, they are vulnerable to bias from temporal carryovers. In this paper, we consider properties of switchback experiments in Markovian systems that mix at a geometric rate. We find that, in this setting, standard switchback designs suffer considerably from carryover bias: Their estimation error decays as $T^{-1/3}$ in terms of the experiment horizon $T$, whereas in the absence of carryovers a faster rate of $T^{-1/2}$ would have been possible. We also show, however, that judicious use of burn-in periods can considerably improve the situation, and enables errors that decay as $\log(T)^{1/2}T^{-1/2}$. Our formal results are mirrored in an empirical evaluation.

Submitted 2 April, 2024; v1 submitted 31 August, 2022; originally announced September 2022.

arXiv:2207.07758 [pdf, other]

Treatment Heterogeneity for Survival Outcomes

Authors: Yizhe Xu, Nikolaos Ignatiadis, Erik Sverdrup, Scott Fleming, Stefan Wager, Nigam Shah

Abstract: Estimation of conditional average treatment effects (CATEs) plays an essential role in modern medicine by informing treatment decision-making at a patient level. Several metalearners have been proposed recently to estimate CATEs in an effective and flexible way by re-purposing predictive machine learning models for causal estimation. In this chapter, we summarize the literature on metalearners and provide concrete guidance for their application for treatment heterogeneity estimation from randomized controlled trials' data with survival outcomes. The guidance we provide is supported by a comprehensive simulation study in which we vary the complexity of the underlying baseline risk and CATE functions, the magnitude of the heterogeneity in the treatment effect, the censoring mechanism, and the balance in treatment assignment. To demonstrate the applicability of our findings, we reanalyze the data from the Systolic Blood Pressure Intervention Trial (SPRINT) and the Action to Control Cardiovascular Risk in Diabetes (ACCORD) study. While recent literature reports the existence of heterogeneous effects of intensive blood pressure treatment with multiple treatment effect modifiers, our results suggest that many of these modifiers may be spurious discoveries. This chapter is accompanied by survlearners, an R package that provides well-documented implementations of the CATE estimation strategies described in this work, to allow easy use of our recommendations as well as the reproduction of our numerical study.

Submitted 6 September, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: A chapter of the 'Handbook of Matching and Weighting Adjustments for Causal Inference'

arXiv:2206.10323 [pdf, other]

What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work?

Authors: Susanne Dandl, Torsten Hothorn, Heidi Seibold, Erik Sverdrup, Stefan Wager, Achim Zeileis

Abstract: Estimation of heterogeneous treatment effects (HTE) is of prime importance in many disciplines, ranging from personalized medicine to economics among many others. Random forests have been shown to be a flexible and powerful approach to HTE estimation in both randomized trials and observational studies. In particular "causal forests", introduced by Athey, Tibshirani and Wager (2019), along with the R implementation in package grf were rapidly adopted. A related approach, called "model-based forests", that is geared towards randomized trials and simultaneously captures effects of both prognostic and predictive variables, was introduced by Seibold, Zeileis and Hothorn (2018) along with a modular implementation in the R package model4you. Here, we present a unifying view that goes beyond the theoretical motivations and investigates which computational elements make causal forests so successful and how these can be blended with the strengths of model-based forests. To do so, we show that both methods can be understood in terms of the same parameters and model assumptions for an additive model under L2 loss. This theoretical insight allows us to implement several flavors of "model-based causal forests" and dissect their different elements in silico. The original causal forests and model-based forests are compared with the new blended versions in a benchmark study exploring both randomized trials and observational settings. In the randomized setting, both approaches performed similarly. If confounding was present in the data generating process, we found local centering of the treatment indicator with the corresponding propensities to be the main driver for good performance. Local centering of the outcome was less important, and might be replaced or enhanced by simultaneous split selection with respect to both prognostic and predictive effects.

Submitted 20 December, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: Contribution has been accepted for publication in the Annals of Applied Statistics

arXiv:2204.01884 [pdf, other]

Policy Learning with Competing Agents

Authors: Roshni Sahoo, Stefan Wager

Abstract: Decision makers often aim to learn a treatment assignment policy under a capacity constraint on the number of agents that they can treat. When agents can respond strategically to such policies, competition arises, complicating estimation of the optimal policy. In this paper, we study capacity-constrained treatment assignment in the presence of such interference. We consider a dynamic model where the decision maker allocates treatments at each time step and heterogeneous agents myopically best respond to the previous treatment assignment policy. When the number of agents is large but finite, we show that the threshold for receiving treatment under a given policy converges to the policy's mean-field equilibrium threshold. Based on this result, we develop a consistent estimator for the policy gradient. In a semi-synthetic experiment with data from the National Education Longitudinal Study of 1988, we demonstrate that this estimator can be used for learning capacity-constrained policies in the presence of strategic behavior.

Submitted 17 April, 2024; v1 submitted 4 April, 2022; originally announced April 2022.

arXiv:2203.12053 [pdf, other]

Upmixing via style transfer: a variational autoencoder for disentangling spatial images and musical content

Authors: Haici Yang, Sanna Wager, Spencer Russell, Mike Luo, Minje Kim, Wontak Kim

Abstract: In the stereo-to-multichannel upmixing problem for music, one of the main tasks is to set the directionality of the instrument sources in the multichannel rendering results. In this paper, we propose a modified variational autoencoder model that learns a latent space to describe the spatial images in multichannel music. We seek to disentangle the spatial images and music content, so the learned latent variables are invariant to the music. At test time, we use the latent variables to control the panning of sources. We propose two upmixing use cases: transferring the spatial images from one song to another and blind panning based on the generative model. We report objective and subjective evaluation results to empirically show that our model captures spatial images separately from music content and achieves transfer-based interactive panning.

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.00820 [pdf, other]

Partial Likelihood Thompson Sampling

Authors: Han Wu, Stefan Wager

Abstract: We consider the problem of deciding how best to target and prioritize existing vaccines that may offer protection against new variants of an infectious disease. Sequential experiments are a promising approach; however, challenges due to delayed feedback and the overall ebb and flow of disease prevalence make available methods inapplicable for this task. We present a method, partial likelihood Thompson sampling, that can handle these challenges. Our method involves running Thompson sampling with belief updates determined by partial likelihood each time we observe an event. To test our approach, we ran a semi-synthetic experiment based on 200 days of COVID-19 infection data in the US.

Submitted 19 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

arXiv:2202.12431 [pdf, other]

Thompson Sampling with Unrestricted Delays

Authors: Han Wu, Stefan Wager

Abstract: We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d. delays, we establish to our knowledge the first regret bounds for Thompson Sampling with arbitrary delay distributions, including ones with unbounded expectation. Our bounds are qualitatively comparable to the best available bounds derived via ad-hoc algorithms, and only depend on delays via selected quantiles of the delay distributions. Furthermore, in extensive simulation experiments, we find that Thompson Sampling outperforms a number of alternative proposals, including methods specifically designed for settings with delayed feedback.

Submitted 22 May, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

arXiv:2202.05356 [pdf, ps, other]

Network Interference in Micro-Randomized Trials

Authors: Shuangning Li, Stefan Wager

Abstract: The micro-randomized trial (MRT) is an experimental design that can be used to develop optimal mobile health interventions. In MRTs, interventions in the form of notifications or messages are sent through smart phones to individuals, targeting a health-related outcome such as physical activity or weight management. Often, mobile health interventions have a social media component; an individual's outcome could thus depend on other individuals' treatments and outcomes. In this paper, we study the micro-randomized trial in the presence of such cross-unit interference. We model the cross-unit interference with a network interference model; the outcome of one individual may affect the outcome of another individual if and only if they are connected by an edge in the network. Assuming the dynamics can be represented as a Markov decision process, we analyze the behavior of the outcomes in large sample asymptotics and show that they converge to a mean-field limit when the sample size goes to infinity. Based on the mean-field result, we give characterization results and estimation strategies for various causal estimands including the short-term direct effect of a binary intervention, its long-term direct effect and its long-term total effect.

Submitted 10 February, 2022; originally announced February 2022.

arXiv:2112.04723 [pdf, other]

Covariate Balancing Sensitivity Analysis for Extrapolating Randomized Trials across Locations

Authors: Xinkun Nie, Guido Imbens, Stefan Wager

Abstract: The ability to generalize experimental results from randomized control trials (RCTs) across locations is crucial for informing policy decisions in targeted regions. Such generalization is often hindered by the lack of identifiability due to unmeasured effect modifiers that compromise direct transport of treatment effect estimates from one location to another. We build upon sensitivity analysis in observational studies and propose an optimization procedure that allows us to get bounds on the treatment effects in targeted regions. Furthermore, we construct more informative bounds by balancing on the moments of covariates. In simulation experiments, we show that the covariate balancing approach is promising in getting sharper identification intervals.

Submitted 9 December, 2021; originally announced December 2021.

arXiv:2111.07966 [pdf, other]

Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects

Authors: Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, Stefan Wager

Abstract: There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic as to how the prioritization rules were derived, and only assess how well they identify individuals that benefit the most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in a wide variety of randomized and observational study settings. RATE metrics subsume a number of existing metrics, including the Qini coefficient, and our analysis directly yields inference methods for these metrics. We showcase RATE in the context of a number of applications, including optimal targeting of aspirin to stroke patients.

Submitted 28 November, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

arXiv:2110.12343 [pdf, other]

Off-Policy Evaluation in Partially Observed Markov Decision Processes under Sequential Ignorability

Authors: Yuchen Hu, Stefan Wager

Abstract: We consider off-policy evaluation of dynamic treatment rules under sequential ignorability, given an assumption that the underlying system can be modeled as a partially observed Markov decision process (POMDP). We propose an estimator, partial history importance weighting, and show that it can consistently estimate the stationary mean rewards of a target policy given long enough draws from the behavior policy. We provide an upper bound on its error that decays polynomially in the number of observations (i.e., the number of trajectories times their length), with an exponent that depends on the overlap of the target and behavior policies, and on the mixing time of the underlying system. Furthermore, we show that this rate of convergence is minimax given only our assumptions on mixing and overlap. Our results establish that off-policy evaluation in POMDPs is strictly harder than off-policy evaluation in (fully observed) Markov decision processes, but strictly easier than model-free off-policy evaluation.

Submitted 9 May, 2023; v1 submitted 23 October, 2021; originally announced October 2021.

arXiv:2109.11647 [pdf, other]

Treatment Effects in Market Equilibrium

Authors: Evan Munro, Xu Kuang, Stefan Wager

Abstract: Policy-relevant treatment effect estimation in a marketplace setting requires taking into account both the direct benefit of the treatment and any spillovers induced by changes to the market equilibrium. The standard way to address these challenges is to evaluate interventions via cluster-randomized experiments, where each cluster corresponds to an isolated market. This approach, however, cannot be used when we only have access to a single market (or a small number of markets). Here, we show how to identify and estimate policy-relevant treatment effects using a unit-level randomized trial run within a single large market. A standard Bernoulli-randomized trial allows consistent estimation of direct effects, and of treatment heterogeneity measures that can be used for welfare-improving targeting. Estimating spillovers - as well as providing confidence intervals for the direct effect - requires estimates of price elasticities, which we provide using an augmented experimental design. Our results rely on all spillovers being mediated via the (observed) prices of a finite number of traded goods, and the market power of any single unit decaying as the market gets large. We illustrate our results using a simulation calibrated to a conditional cash transfer experiment in the Philippines.

Submitted 17 June, 2024; v1 submitted 23 September, 2021; originally announced September 2021.

arXiv:2104.03802 [pdf, other]

Average Direct and Indirect Causal Effects under Interference

Authors: Yuchen Hu, Shuangning Li, Stefan Wager

Abstract: We propose a definition for the average indirect effect of a binary treatment in the potential outcomes model for causal inference under cross-unit interference. Our definition is analogous to the standard definition of the average direct effect, and can be expressed without needing to compare outcomes across multiple randomized experiments. We show that the proposed indirect effect satisfies a decomposition theorem whereby, in a Bernoulli trial, the sum of the average direct and indirect effects always corresponds to the effect of a policy intervention that infinitesimally increases treatment probabilities. We also consider a number of parametric models for interference, and find that our non-parametric indirect effect remains a natural estimand when re-expressed in the context of these models.

Submitted 11 January, 2022; v1 submitted 8 April, 2021; originally announced April 2021.

arXiv:2103.11066 [pdf, other]

Treatment Allocation under Uncertain Costs

Authors: Hao Sun, Evan Munro, Georgy Kalashnov, Shuyang Du, Stefan Wager

Abstract: We consider the problem of learning how to optimally allocate treatments whose cost is uncertain and can vary with pre-treatment covariates. This setting may arise in medicine if we need to prioritize access to a scarce resource that different patients would use for different amounts of time, or in marketing if we want to target discounts whose cost to the company depends on how much the discounts are used. Here, we show that the optimal treatment allocation rule under budget constraints is a thresholding rule based on priority scores, and we propose a number of practical methods for learning these priority scores using data from a randomized trial. Our formal results leverage a statistical connection between our problem and that of learning heterogeneous treatment effects under endogeneity using an instrumental variable. We find our method to perform well in a number of empirical evaluations.

Submitted 11 March, 2024; v1 submitted 19 March, 2021; originally announced March 2021.

arXiv:2101.09855 [pdf, other]

Weak Signal Asymptotics for Sequentially Randomized Experiments

Authors: Xu Kuang, Stefan Wager

Abstract: We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the sample paths of a class of sequentially randomized experiments -- adapted to this scaling regime and with arm selection probabilities that vary continuously with state -- converge weakly to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit enables us to derive refined, instance-specific characterization of stochastic dynamics, and to obtain several insights on the regret and belief evolution of a number of sequential experiments including Thompson sampling (but not UCB, which does not satisfy our continuity assumption). We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret performance when the reward gaps are relatively large. Conversely, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps, but these good regret properties come at the cost of highly unstable posterior beliefs.

Submitted 22 June, 2023; v1 submitted 24 January, 2021; originally announced January 2021.

Comments: Forthcoming in Management Science. An earlier draft of this paper was circulated under the title "Diffusion Asymptotics for Sequential Experiments." Xu Kuang published under a different full name in earlier versions of this manuscript. Please use X. Kuang and S. Wager when citing this paper.

MSC Class: 62B15; 60J70

arXiv:2007.13302 [pdf, other]

Random Graph Asymptotics for Treatment Effect Estimation under Network Interference

Authors: Shuangning Li, Stefan Wager

Abstract: The network interference model for causal inference places all experimental units at the vertices of an undirected exposure graph, such that treatment assigned to one unit may affect the outcome of another unit if and only if these two units are connected by an edge. This model has recently gained popularity as a means of incorporating interference effects into the Neyman--Rubin potential outcomes framework; and several authors have considered estimation of various causal targets, including the direct and indirect effects of treatment. In this paper, we consider large-sample asymptotics for treatment effect estimation under network interference in a setting where the exposure graph is a random draw from a graphon. When targeting the direct effect, we show that -- in our setting -- popular estimators are considerably more accurate than existing results suggest, and provide a central limit theorem in terms of moments of the graphon. Meanwhile, when targeting the indirect effect, we leverage our generative assumptions to propose a consistent estimator in a setting where no other consistent estimators are currently available. We also show how our results can be used to conduct a practical assessment of the sensitivity of randomized study inference to potential interference effects. Overall, our results highlight the promise of random graph asymptotics in understanding the practicality and limits of causal inference under network interference.

Submitted 16 March, 2022; v1 submitted 27 July, 2020; originally announced July 2020.

arXiv:2007.12581 [pdf, other]

Dereverberation using joint estimation of dry speech signal and acoustic system

Authors: Sanna Wager, Keunwoo Choi, Simon Durand

Abstract: The purpose of speech dereverberation is to remove quality-degrading effects of a time-invariant impulse response filter from the signal. In this report, we describe an approach to speech dereverberation that involves joint estimation of the dry speech signal and of the room impulse response. We explore deep learning models that apply to each task separately, and how these can be combined in a joint model with shared parameters.

Submitted 24 July, 2020; originally announced July 2020.

arXiv:2004.09458 [pdf, other]

Noise-Induced Randomization in Regression Discontinuity Designs

Authors: Dean Eckles, Nikolaos Ignatiadis, Stefan Wager, Han Wu

Abstract: Regression discontinuity designs assess causal effects in settings where treatment is determined by whether an observed running variable crosses a pre-specified threshold. Here we propose a new approach to identification, estimation, and inference in regression discontinuity designs that uses knowledge about exogenous noise (e.g., measurement error) in the running variable. In our strategy, we weight treated and control units to balance a latent variable of which the running variable is a noisy measure. Our approach is explicitly randomization-based and complements standard formal analyses that appeal to continuity arguments while ignoring the stochastic nature of the assignment mechanism.

Submitted 26 November, 2023; v1 submitted 20 April, 2020; originally announced April 2020.

arXiv:2002.05511 [pdf, other]

Deep Autotuner: a Pitch Correcting Network for Singing Performances

Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Minje Kim

Abstract: We introduce a data-driven approach to automatic pitch correction of solo singing performances. The proposed approach predicts note-wise pitch shifts from the relationship between the respective spectrograms of the singing and accompaniment. This approach differs from commercial systems, where vocal track notes are usually shifted to be centered around pitches in a user-defined score, or mapped to the closest pitch among the twelve equal-tempered scale degrees. The proposed system treats pitch as a continuous value rather than relying on a set of discretized notes found in musical scores, thus allowing for improvisation and harmonization in the singing performance. We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation. Our model is trained on both incorrect intonation, for which it learns a correction, and intentional pitch variation, which it learns to preserve. The proposed deep neural network with gated recurrent units on top of convolutional layers shows promising performance on the real-world score-free singing pitch correction task of autotuning.

Submitted 11 February, 2020; originally announced February 2020.

Comments: arXiv admin note: text overlap with arXiv:1902.00956

Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

arXiv:2002.00125 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053367

Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

Authors: Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram

Abstract: In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system. For the student, both multi-channel feature extraction layers and the higher classification layers were jointly trained using the logits from the teacher model. In our experiments, compared to a baseline model trained on about 600 hours of transcribed data, a relative word-error rate (WER) reduction of about 27.3% was achieved when using an additional 1800 hours of untranscribed data. We also investigated the benefit of pre-training the multi-channel front end to output the beamformed log-mel filter bank energies (LFBE) using L2 loss. We find that pre-training improves the word error rate by 10.7% when compared to a multi-channel model directly initialized with a beamformer and mel-filter bank coefficients for the front end. Finally, combining pre-training and teacher-student training produces a WER reduction of 31% compared to our baseline.

Submitted 31 January, 2020; originally announced February 2020.

Comments: To appear in ICASSP 2020

arXiv:2001.09887 [pdf, other]

Estimating heterogeneous treatment effects with right-censored data via causal survival forests

Authors: Yifan Cui, Michael R. Kosorok, Erik Sverdrup, Stefan Wager, Ruoqing Zhu

Abstract: Forest-based methods have recently gained in popularity for non-parametric treatment effect estimation. Building on this line of work, we introduce causal survival forests, which can be used to estimate heterogeneous treatment effects in a survival and observational setting where outcomes may be right-censored. Our approach relies on orthogonal estimating equations to robustly adjust for both censoring and selection effects under unconfoundedness. In our experiments, we find our approach to perform well relative to a number of baselines.

Submitted 28 February, 2023; v1 submitted 27 January, 2020; originally announced January 2020.

Comments: To appear in the Journal of the Royal Statistical Society, Series B

MSC Class: 62N01

arXiv:1911.02768 [pdf, other]

Confidence Intervals for Policy Evaluation in Adaptive Experiments

Authors: Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, Susan Athey

Abstract: Adaptive experiment designs can dramatically improve statistical efficiency in randomized trials, but they also complicate statistical inference. For example, it is now well known that the sample mean is biased in adaptive trials. Inferential challenges are exacerbated when our parameter of interest differs from the parameter the trial was designed to target, such as when we are interested in estimating the value of a sub-optimal treatment after running a trial to determine the optimal treatment using a stochastic bandit design. In this context, typical estimators that use inverse propensity weighting to eliminate sampling bias can be problematic: their distributions become skewed and heavy-tailed as the propensity scores decay to zero. In this paper, we present a class of estimators that overcome these issues. Our approach is to adaptively reweight the terms of an augmented inverse propensity weighting estimator to control the contribution of each term to the estimator's variance. This adaptive weighting scheme prevents estimates from becoming heavy-tailed, ensuring asymptotically correct coverage. It also reduces variance, allowing us to test hypotheses with greater power - especially hypotheses that were not targeted by the experimental design. We validate the accuracy of the resulting estimates and their confidence intervals in numerical experiments and show our methods compare favorably to existing alternatives in terms of RMSE and coverage.

Submitted 12 February, 2021; v1 submitted 7 November, 2019; originally announced November 2019.

arXiv:1910.10624 [pdf, other]

Doubly robust treatment effect estimation with missing attributes

Authors: Imke Mayer, Erik Sverdrup, Tobias Gauss, Jean-Denis Moyer, Stefan Wager, Julie Josse

Abstract: Missing attributes are ubiquitous in causal inference, as they are in most applied statistical work. In this paper, we consider various sets of assumptions under which causal inference is possible despite missing attributes and discuss corresponding approaches to average treatment effect estimation, including generalized propensity score methods and multiple imputation. Across an extensive simulation study, we show that no single method systematically out-performs others. We find, however, that doubly robust modifications of standard methods for average treatment effect estimation with missing data repeatedly perform better than their non-doubly robust baselines; for example, doubly robust generalized propensity score methods beat inverse-weighting with the generalized propensity score. This finding is reinforced in an analysis of an observational study on the effect on mortality of tranexamic acid administration among patients with traumatic brain injury in the context of critical care management. Here, doubly robust estimators recover confidence intervals that are consistent with evidence from randomized trials, whereas non-doubly robust estimators do not.

Submitted 22 May, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

MSC Class: 93C41; 62G35; 62F35; 62P10

arXiv:1910.09714 [pdf, other]

Smoothness-Adaptive Contextual Bandits

Authors: Yonatan Gur, Ahmadreza Momeni, Stefan Wager

Abstract: We study a non-parametric multi-armed bandit problem with stochastic covariates, where a key complexity driver is the smoothness of payoff functions with respect to covariates. Previous studies have focused on deriving minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, the smoothness of payoff functions is typically not known in advance, and misspecification of smoothness may severely deteriorate the performance of existing methods. In this work, we consider a framework where the smoothness of payoff functions is not known, and study when and how algorithms may adapt to unknown smoothness. First, we establish that designing algorithms that adapt to unknown smoothness of payoff functions is, in general, impossible. However, under a self-similarity condition (which does not reduce the minimax complexity of the dynamic optimization problem at hand), we establish that adapting to unknown smoothness is possible, and further devise a general policy for achieving smoothness-adaptive performance. Our policy infers the smoothness of payoffs throughout the decision-making process, while leveraging the structure of off-the-shelf non-adaptive policies. We establish that for problem settings with either differentiable or non-differentiable payoff functions, this policy matches (up to a logarithmic scale) the regret rate that is achievable when the smoothness of payoffs is known a priori.

Submitted 15 October, 2021; v1 submitted 21 October, 2019; originally announced October 2019.

arXiv:1909.11696 [pdf, other]

Cross-Validation, Risk Estimation, and Model Selection

Authors: Stefan Wager

Abstract: Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting, and find that cross-validation is asymptotically uninformative about the expected test error of any given predictive rule, but allows for asymptotically consistent model selection. The reason for this phenomenon is that the leading-order error term of cross-validation doesn't depend on the model being evaluated, and so cancels out when we compare two models.

Submitted 25 September, 2019; originally announced September 2019.

Comments: This note was prepared as a comment on a paper by Rosset and Tibshirani, forthcoming in the Journal of the American Statistical Association

arXiv:1908.09874 [pdf, other]

Sufficient Representations for Categorical Variables

Authors: Jonathan Johannemann, Vitor Hadad, Susan Athey, Stefan Wager

Abstract: Many learning algorithms require categorical data to be transformed into real vectors before it can be used as input. Often, categorical variables are encoded as one-hot (or dummy) vectors. However, this mode of representation can be wasteful since it adds many low-signal regressors, especially when the number of unique categories is large. In this paper, we investigate simple alternative solutions for universally consistent estimators that rely on lower-dimensional real-valued representations of categorical variables that are "sufficient" in the sense that no predictive information is lost. We then compare preexisting and proposed methods on simulated and observational datasets.

Submitted 28 October, 2021; v1 submitted 26 August, 2019; originally announced August 2019.

arXiv:1906.01611 [pdf, other]

Covariate-Powered Empirical Bayes Estimation

Authors: Nikolaos Ignatiadis, Stefan Wager

Abstract: We study methods for simultaneous analysis of many noisy experiments in the presence of rich covariate information. The goal of the analyst is to optimally estimate the true effect underlying each experiment. Both the noisy experimental results and the auxiliary covariates are useful for this purpose, but neither data source on its own captures all the information available to the analyst. In this paper, we propose a flexible plug-in empirical Bayes estimator that synthesizes both sources of information and may leverage any black-box predictive model. We show that our approach is within a constant factor of minimax for a simple data-generating model. Furthermore, we establish robust convergence guarantees for our method that hold under considerable generality, and exhibit promising empirical performance on both real and simulated data.

Submitted 12 January, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

arXiv:1905.11622 [pdf, other]

Nonparametric Heterogeneous Treatment Effect Estimation in Repeated Cross Sectional Designs

Authors: Xinkun Nie, Chen Lu, Stefan Wager

Abstract: Identifying heterogeneity in a population's response to a health or policy intervention is crucial for evaluating and informing policy decisions. We propose a novel heterogeneous treatment effect estimator in the difference-in-differences design with repeated cross sectional data, where we observe different samples of a population at two time periods separated by the onset of a policy intervention, as well as samples of a population that serves as the control. Our estimator has orthogonality properties that enable fast rates on learning the treatment effect while allowing slower rates for estimating nuisance components. Our proposal shows promising empirical performance across a variety of simulation setups.

Submitted 22 August, 2021; v1 submitted 28 May, 2019; originally announced May 2019.

arXiv:1905.09751 [pdf, other]

Learning When-to-Treat Policies

Authors: Xinkun Nie, Emma Brunskill, Stefan Wager

Abstract: Many applied decision-making problems have a dynamic component: The policymaker needs not only to choose whom to treat, but also when to start which treatment. For example, a medical doctor may choose between postponing treatment (watchful waiting) and prescribing one of several available treatments during the many visits from a patient. We develop an "advantage doubly robust" estimator for learning such dynamic treatment rules using observational data under the assumption of sequential ignorability. We prove welfare regret bounds that generalize results for doubly robust learning in the single-step setting, and show promising empirical performance in several different contexts. Our approach is practical for policy optimization, and does not need any structural (e.g., Markovian) assumptions.

Submitted 30 April, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

arXiv:1905.00744 [pdf, ps, other]

Sparsity Double Robust Inference of Average Treatment Effects

Authors: Jelena Bradic, Stefan Wager, Yinchu Zhu

Abstract: Many popular methods for building confidence intervals on causal effects under high-dimensional confounding require strong "ultra-sparsity" assumptions that may be difficult to validate in practice. To alleviate this difficulty, we here study a new method for average treatment effect estimation that yields asymptotically exact confidence intervals assuming that either the conditional response surface or the conditional probability of treatment allows for an ultra-sparse representation (but not necessarily both). This guarantee allows us to provide valid inference for average treatment effect in high dimensions under considerably more generality than available baselines. In addition, we showcase that our results are semi-parametrically efficient.

Submitted 2 May, 2019; originally announced May 2019.

arXiv:1903.02124 [pdf, other]

Experimenting in Equilibrium

Authors: Stefan Wager, Kuang Xu

Abstract: Classical approaches to experimental design assume that intervening on one unit does not affect other units. There are many important settings, however, where this non-interference assumption does not hold, as when running experiments on supply-side incentives on a ride-sharing platform or subsidies in an energy marketplace. In this paper, we introduce a new approach to experimental design in large-scale stochastic systems with considerable cross-unit interference, under an assumption that the interference is structured enough that it can be captured via mean-field modeling. Our approach enables us to accurately estimate the effect of small changes to system parameters by combining unobtrusive randomization with lightweight modeling, all while remaining in equilibrium. We can then use these estimates to optimize the system by gradient descent. Concretely, we focus on the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and show that our approach enables the platform to optimize p in large systems using vanishingly small perturbations.

Submitted 30 June, 2020; v1 submitted 5 March, 2019; originally announced March 2019.

Comments: Forthcoming in Management Science

arXiv:1902.07409 [pdf, other]

Estimating Treatment Effects with Causal Forests: An Application

Authors: Susan Athey, Stefan Wager

Abstract: We apply causal forests to a dataset derived from the National Study of Learning Mindsets, and consider resulting practical and conceptual challenges. In particular, we discuss how causal forests use estimated propensity scores to be more robust to confounding, and how they handle data with clustered errors.

Submitted 20 February, 2019; originally announced February 2019.

Comments: This note will appear in an upcoming issue of Observational Studies, Empirical Investigation of Methods for Heterogeneity, that compiles several analyses of the same dataset

arXiv:1902.02774 [pdf, other]

Confidence Intervals for Nonparametric Empirical Bayes Analysis

Authors: Nikolaos Ignatiadis, Stefan Wager

Abstract: In an empirical Bayes analysis, we use data from repeated sampling to imitate inferences made by an oracle Bayesian with extensive knowledge of the data-generating distribution. Existing results provide a comprehensive characterization of when and why empirical Bayes point estimates accurately recover oracle Bayes behavior. In this paper, we develop flexible and practical confidence intervals that provide asymptotic frequentist coverage of empirical Bayes estimands, such as the posterior mean or the local false sign rate. The coverage statements hold even when the estimands are only partially identified or when empirical Bayes point estimates converge very slowly.

Submitted 8 September, 2021; v1 submitted 7 February, 2019; originally announced February 2019.

arXiv:1902.00956 [pdf, ps, other]

Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances

Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Lijiang Guo, Aswin Sivaraman, Minje Kim

Abstract: We describe a machine-learning approach to pitch correcting a solo singing performance in a karaoke setting, where the solo voice and accompaniment are on separate tracks. The proposed approach addresses the situation where no musical score of the vocals nor the accompaniment exists: It predicts the amount of correction from the relationship between the spectral contents of the vocal and accompaniment tracks. Hence, the pitch shift in cents suggested by the model can be used to make the voice sound in tune with the accompaniment. This approach differs from commercially used automatic pitch correction systems, where notes in the vocal tracks are shifted to be centered around notes in a user-defined score or mapped to the closest pitch among the twelve equal-tempered scale degrees. We train the model using a dataset of 4,702 amateur karaoke performances selected for good intonation. We present a Convolutional Gated Recurrent Unit (CGRU) model to accomplish this task. This method can be extended into unsupervised pitch correction of a vocal performance, popularly referred to as autotuning.

Submitted 3 February, 2019; originally announced February 2019.

arXiv:1812.09970 [pdf, other]

Synthetic Difference in Differences

Authors: Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens, Stefan Wager

Abstract: We present a new estimator for causal effects with panel data that builds on insights behind the widely used difference in differences and synthetic control methods. Relative to these methods we find, both theoretically and empirically, that this "synthetic difference in differences" estimator has desirable robustness properties, and that it performs well in settings where the conventional estimators are commonly used in practice. We study the asymptotic behavior of the estimator when the systematic part of the outcome model includes latent unit factors interacted with latent time factors, and we present conditions for consistency and asymptotic normality.

Submitted 2 July, 2021; v1 submitted 24 December, 2018; originally announced December 2018.

arXiv:1811.02547 [pdf, other]

Debiased Inference of Average Partial Effects in Single-Index Models

Authors: David A. Hirshberg, Stefan Wager

Abstract: We propose a method for average partial effect estimation in high-dimensional single-index models that is root-n-consistent and asymptotically unbiased given sparsity assumptions on the underlying regression model. This note was prepared as a comment on Wooldridge and Zhu [2018], forthcoming in the Journal of Business and Economic Statistics.

Submitted 6 November, 2018; originally announced November 2018.

arXiv:1810.04778 [pdf, other]

Offline Multi-Action Policy Learning: Generalization and Optimization

Authors: Zhengyuan Zhou, Susan Athey, Stefan Wager

Abstract: In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as the problem of determining which medication to prescribe to a patient. While there is a growing body of literature devoted to this problem, most existing results are focused on the case where data comes from a randomized experiment, and further, there are only two possible actions, such as giving a drug to a patient or not. In this paper, we study the offline multi-action policy learning problem with observational data and where the policy may need to respect budget constraints or belong to a restricted policy class such as decision trees. We build on the theory of efficient semi-parametric inference in order to propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup, and it provides a substantial performance improvement over the existing learning algorithms. We then consider additional computational challenges that arise in implementing our method for the case where the policy is restricted to take the form of a decision tree. We propose two different approaches, one using a mixed integer program formulation and the other using a tree-search based algorithm.

Submitted 19 November, 2018; v1 submitted 10 October, 2018; originally announced October 2018.

arXiv:1807.11408 [pdf, other]

Local Linear Forests

Authors: Rina Friedberg, Julie Tibshirani, Susan Athey, Stefan Wager

Abstract: Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random forests as an adaptive kernel method, we pair the forest kernel with a local linear regression adjustment to better capture smoothness. The resulting procedure, local linear forests, enables us to improve on asymptotic rates of convergence for random forests with smooth signals, and provides substantial gains in accuracy on both real and simulated data. We prove a central limit theorem valid under regularity conditions on the forest and smoothness constraints, and propose a computationally efficient construction for confidence intervals. Moving to a causal inference application, we discuss the merits of local regression adjustments for heterogeneous treatment effect estimation, and give an example on a dataset exploring the effect word choice has on attitudes to the social safety net. Last, we include simulation results on real and generated data.

Submitted 4 September, 2020; v1 submitted 30 July, 2018; originally announced July 2018.

Comments: Forthcoming in the Journal of Computational and Graphical Statistics

arXiv:1805.02603 [pdf, ps, other]

A Data-Driven Approach to Smooth Pitch Correction for Singing Voice in Pop Music

Authors: Sanna Wager, Lijiang Guo, Aswin Sivaraman, Minje Kim

Abstract: In this paper, we present a machine-learning approach to pitch correction for voice in a karaoke setting, where the vocals and accompaniment are on separate tracks and time-aligned. The network takes as input the time-frequency representation of the two tracks and predicts the amount of pitch-shifting in cents required to make the voice sound in-tune with the accompaniment. It is trained on examples of semi-professional singing. The proposed approach differs from existing real-time pitch correction methods by replacing pitch tracking and mapping to a discrete set of notes---for example, the twelve classes of the equal-tempered scale---with learning a correction that is continuous both in frequency and in time directly from the harmonics of the vocal and accompaniment tracks. A Recurrent Neural Network (RNN) model provides a correction that takes context into account, preserving expressive pitch bending and vibrato. This method can be extended into unsupervised pitch correction of a vocal performance---popularly referred to as autotuning.

Submitted 7 May, 2018; originally announced May 2018.

arXiv:1712.04912 [pdf, other]

Quasi-Oracle Estimation of Heterogeneous Treatment Effects

Authors: Xinkun Nie, Stefan Wager

Abstract: Flexible estimation of heterogeneous treatment effects lies at the heart of many statistical challenges, such as personalized medicine and optimal resource allocation. In this paper, we develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. We first estimate marginal effects and treatment propensities in order to form an objective function that isolates the causal component of the signal. Then, we optimize this data-adaptive objective function. Our approach has several advantages over existing methods. From a practical perspective, our method is flexible and easy to use: In both steps, we can use any loss-minimization method, e.g., penalized regression, deep neural networks, or boosting; moreover, these methods can be fine-tuned by cross validation. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property: Even if the pilot estimates for marginal effects and treatment propensities are not particularly accurate, we achieve the same error bounds as an oracle who has a priori knowledge of these two nuisance components. We implement variants of our approach based on penalized regression, kernel ridge regression, and boosting in a variety of simulation setups, and find promising performance relative to existing baselines.

Submitted 6 August, 2020; v1 submitted 13 December, 2017; originally announced December 2017.

Comments: Biometrika, forthcoming

arXiv:1712.00038 [pdf, other]

Augmented Minimax Linear Estimation

Authors: David A. Hirshberg, Stefan Wager

Abstract: Many statistical estimands can be expressed as continuous linear functionals of a conditional expectation function. This includes the average treatment effect under unconfoundedness and generalizations for continuous-valued and personalized treatments. In this paper, we discuss a general approach to estimating such quantities: we begin with a simple plug-in estimator based on an estimate of the conditional expectation function, and then correct the plug-in estimator by subtracting a minimax linear estimate of its error. We show that our method is semiparametrically efficient under weak conditions and observe promising performance on both real and simulated data.

Submitted 19 November, 2020; v1 submitted 30 November, 2017; originally announced December 2017.

Comments: 67 pages, 3 figures

MSC Class: 62F12

arXiv:1706.07550 [pdf, other]

Shape-constrained partial identification of a population mean under unknown probabilities of sample selection

Authors: Luke W. Miratrix, Stefan Wager, Jose R. Zubizarreta

Abstract: A prevailing challenge in the biomedical and social sciences is to estimate a population mean from a sample obtained with unknown selection probabilities. Using a well-known ratio estimator, Aronow and Lee (2013) proposed a method for partial identification of the mean by allowing the unknown selection probabilities to vary arbitrarily between two fixed extreme values. In this paper, we show how to leverage auxiliary shape constraints on the population outcome distribution, such as symmetry or log-concavity, to obtain tighter bounds on the population mean. We use this method to estimate the performance of Aymara students---an ethnic minority in the north of Chile---in a national educational standardized test. We implement this method in the new statistical software package scbounds for R.

Submitted 22 June, 2017; originally announced June 2017.

Showing 1–50 of 75 results for author: Wager, S