Showing 1–31 of 31 results for author: Hahn, P R

  1. arXiv:2406.02530  [pdf, other]

    stat.ME

    LongBet: Heterogeneous Treatment Effect Estimation in Panel Data

    Authors: Meijia Wang, Ignacio Martinez, P. Richard Hahn

    Abstract: This paper introduces a novel approach for estimating heterogeneous treatment effects of binary treatment in panel data, particularly focusing on short panel data with large cross-sectional data and observed confoundings. In contrast to traditional literature in difference-in-differences method that often relies on the parallel trend assumption, our proposed model does not necessitate such an assu…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2405.03130  [pdf, other]

    stat.ML cs.LG

    Deep Learning for Causal Inference: A Comparison of Architectures for Heterogeneous Treatment Effect Estimation

    Authors: Demetrios Papakostas, Andrew Herren, P. Richard Hahn, Francisco Castillo

    Abstract: Causal inference has gained much popularity in recent years, with interests ranging from academic, to industrial, to educational, and all in between. Concurrently, the study and usage of neural networks has also grown profoundly (albeit at a far faster rate). What we aim to do in this blog write-up is demonstrate a Neural Network causal inference architecture. We develop a fully connected neural n…

    Submitted 5 May, 2024; originally announced May 2024.

  3. arXiv:2305.11163  [pdf, ps, other]

    stat.ME math.ST

    On true versus estimated propensity scores for treatment effect estimation with discrete controls

    Authors: Andrew Herren, P. Richard Hahn

    Abstract: The finite sample variance of an inverse propensity weighted estimator is derived in the case of discrete control variables with finite support. The obtained expressions generally corroborate widely-cited asymptotic theory showing that estimated propensity scores are superior to true propensity scores in the context of inverse propensity weighting. However, similar analysis of a modified estimator…

    Submitted 18 May, 2023; originally announced May 2023.

  4. arXiv:2209.11400  [pdf, other]

    stat.ME stat.ML

    Feature selection in stratification estimators of causal effects: lessons from potential outcomes, causal diagrams, and structural equations

    Authors: P. Richard Hahn, Andrew Herren

    Abstract: What is the ideal regression (if any) for estimating average causal effects? We study this question in the setting of discrete covariates, deriving expressions for the finite-sample variance of various stratification estimators. This approach clarifies the fundamental statistical phenomena underlying many widely-cited results. Our exposition combines insights from three distinct methodological tra…

    Submitted 23 September, 2022; originally announced September 2022.

  5. arXiv:2209.06998  [pdf, other]

    stat.ML cs.LG

    Stochastic Tree Ensembles for Estimating Heterogeneous Effects

    Authors: Nikolay Krantsevich, Jingyu He, P. Richard Hahn

    Abstract: Determining subgroups that respond especially well (or poorly) to specific interventions (medical or policy) requires new supervised learning methods tailored specifically for causal inference. Bayesian Causal Forest (BCF) is a recent method that has been documented to perform well on data generating processes with strong confounding of the sort that is plausible in many applications. This paper d…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: 12 pages, 1 figure

  6. arXiv:2208.09970  [pdf, ps, other]

    stat.ME stat.ML

    Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation

    Authors: Andrew Herren, P. Richard Hahn

    Abstract: SHAP is a popular method for measuring variable importance in machine learning models. In this paper, we study the algorithm used to estimate SHAP scores and outline its connection to the functional ANOVA decomposition. We use this connection to show that challenges in SHAP approximations largely relate to the choice of a feature distribution and the number of $2^p$ ANOVA terms estimated. We argue…

    Submitted 11 November, 2022; v1 submitted 21 August, 2022; originally announced August 2022.

  7. arXiv:2204.10963  [pdf, other]

    stat.ME econ.EM stat.CO stat.ML

    Local Gaussian process extrapolation for BART models with applications to causal inference

    Authors: Meijia Wang, Jingyu He, P. Richard Hahn

    Abstract: Bayesian additive regression trees (BART) is a semi-parametric regression model offering state-of-the-art performance on out-of-sample prediction. Despite this success, standard implementations of BART typically provide inaccurate prediction and overly narrow prediction intervals at points outside the range of the training data. This paper proposes a novel extrapolation strategy that grafts Gaussi…

    Submitted 24 February, 2023; v1 submitted 22 April, 2022; originally announced April 2022.

  8. arXiv:2106.10364  [pdf, other]

    stat.AP stat.ME

    Bayesian decision theory for tree-based adaptive screening tests with an application to youth delinquency

    Authors: Chelsea Krantsevich, P. Richard Hahn, Yi Zheng, Charles Katz

    Abstract: Crime prevention strategies based on early intervention depend on accurate risk assessment instruments for identifying high risk youth. It is important in this context that the instruments be convenient to administer, which means, in particular, that they should also be reasonably brief; adaptive screening tests are useful for this purpose. Adaptive tests constructed using classification and regre…

    Submitted 27 June, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: 22 pages, 10 figures

  9. arXiv:2106.04503  [pdf, other]

    stat.ME

    Do forecasts of bankruptcy cause bankruptcy? A machine learning sensitivity analysis

    Authors: Demetrios Papakostas, P. Richard Hahn, Jared Murray, Frank Zhou, Joseph Gerakos

    Abstract: It is widely speculated that auditors' public forecasts of bankruptcy are, at least in part, self-fulfilling prophecies in the sense that they might actually cause bankruptcies that would not have otherwise occurred. This conjecture is hard to prove, however, because the strong association between bankruptcies and bankruptcy forecasts could simply indicate that auditors are skillful forecasters wi…

    Submitted 23 June, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: 26 pages, 13 figures

  10. arXiv:2009.06183  [pdf, ps, other]

    stat.ME stat.AP stat.ML

    Semi-supervised learning and the question of true versus estimated propensity scores

    Authors: Andrew Herren, P. Richard Hahn

    Abstract: A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are unobserved. According to this formulation, large unlabeled data sets could be used to estimate a high dimensional propensity function and causal inference using a much smaller la…

    Submitted 14 September, 2020; originally announced September 2020.

  11. arXiv:2007.09845  [pdf, other]

    stat.AP

    Estimating heterogeneous effects of continuous exposures using Bayesian tree ensembles: revisiting the impact of abortion rates on crime

    Authors: Spencer Woody, Carlos M. Carvalho, P. Richard Hahn, Jared S. Murray

    Abstract: In estimating the causal effect of a continuous exposure or treatment, it is important to control for all confounding factors. However, most existing methods require parametric specification for how control variables influence the outcome or generalized propensity score, and inference on treatment effects is usually sensitive to this choice. Additionally, it is often the goal to estimate how the t…

    Submitted 19 July, 2020; originally announced July 2020.

  12. arXiv:2002.03375  [pdf, other]

    stat.ML cs.LG stat.ME

    Stochastic tree ensembles for regularized nonlinear regression

    Authors: Jingyu He, P. Richard Hahn

    Abstract: This paper develops a novel stochastic tree ensemble method for nonlinear regression, which we refer to as XBART, short for Accelerated Bayesian Additive Regression Trees. By combining regularization and stochastic search strategies from Bayesian modeling with computationally efficient techniques from recursive partitioning approaches, the new method attains state-of-the-art performance: in many s…

    Submitted 3 June, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

  13. arXiv:1912.10334  [pdf, other]

    stat.ME

    A Symmetric Prior for Multinomial Probit Models

    Authors: Lane F. Burgette, David Puelz, P. Richard Hahn

    Abstract: Fitted probabilities from widely used Bayesian multinomial probit models can depend strongly on the choice of a base category, which is used to uniquely identify the parameters of the model. This paper proposes a novel identification strategy, and associated prior distribution for the model parameters, that renders the prior symmetric with respect to relabeling the outcome categories. The new prio…

    Submitted 17 May, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

  14. arXiv:1905.09715  [pdf, other]

    stat.OT

    An illustration of the risk of borrowing information via a shared likelihood

    Authors: P. Richard Hahn

    Abstract: A concrete, stylized example illustrates that inferences may be degraded, rather than improved, by incorporating supplementary data via a joint likelihood. In the example, the likelihood is assumed to be correctly specified, as is the prior over the parameter of interest; all that is necessary for the joint modeling approach to suffer is misspecification of the prior over a nuisance parameter.

    Submitted 23 May, 2019; originally announced May 2019.

  15. arXiv:1905.09515  [pdf, ps, other]

    stat.ME stat.OT

    Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge 2017

    Authors: P. Richard Hahn, Vincent Dorie, Jared S. Murray

    Abstract: This brief note documents the data generating processes used in the 2017 Data Analysis Challenge associated with the Atlantic Causal Inference Conference (ACIC). The focus of the challenge was estimation and inference for conditional average treatment effects (CATEs) in the presence of targeted selection, which leads to strong confounding. The associated data files and further plots can be found o…

    Submitted 23 May, 2019; originally announced May 2019.

  16. arXiv:1810.02215  [pdf, ps, other]

    stat.ML cs.LG

    XBART: Accelerated Bayesian Additive Regression Trees

    Authors: Jingyu He, Saar Yalov, P. Richard Hahn

    Abstract: Bayesian additive regression trees (BART) (Chipman et al., 2010) is a powerful predictive model that often outperforms alternative models at out-of-sample prediction. BART is especially well-suited to settings with unstructured predictor variables and substantial sources of unmeasured variation as is typical in the social, behavioral and health sciences. This paper develops a modified version of…

    Submitted 14 March, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

  17. arXiv:1809.09337  [pdf, other]

    cs.AI stat.ME

    A Survey of Learning Causality with Data: Problems and Methods

    Authors: Ruocheng Guo, Lu Cheng, Jundong Li, P. Richard Hahn, Huan Liu

    Abstract: This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and r…

    Submitted 5 May, 2020; v1 submitted 25 September, 2018; originally announced September 2018.

    Comments: 35 pages, accepted by ACM CSUR

  18. arXiv:1806.05738  [pdf, other]

    stat.CO stat.ML

    Efficient sampling for Gaussian linear regression with arbitrary priors

    Authors: P. Richard Hahn, Jingyu He, Hedibert Lopes

    Abstract: This paper develops a slice sampler for Bayesian linear regression models with arbitrary priors. The new sampler has two advantages over current approaches. One, it is faster than many custom implementations that rely on auxiliary latent variables, if the number of regressors is large. Two, it can be used with any prior with a density function that can be evaluated up to a normalizing constant, ma…

    Submitted 14 June, 2018; originally announced June 2018.

  19. arXiv:1706.10180  [pdf, other]

    q-fin.PM stat.AP

    Regret-based Selection for Sparse Dynamic Portfolios

    Authors: David Puelz, P. Richard Hahn, Carlos Carvalho

    Abstract: This paper considers portfolio construction in a dynamic setting. We specify a loss function comprised of utility and complexity components with an unknown tradeoff parameter. We develop a novel regret-based criterion for selecting the tradeoff parameter to construct optimal sparse portfolios over time.

    Submitted 23 July, 2017; v1 submitted 30 June, 2017; originally announced June 2017.

  20. arXiv:1706.09523  [pdf, other]

    stat.ME

    Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects

    Authors: P. Richard Hahn, Jared S. Murray, Carlos Carvalho

    Abstract: This paper presents a novel nonlinear regression model for estimating heterogeneous treatment effects from observational data, geared specifically towards situations with small effect sizes, heterogeneous effects, and strong confounding. Standard nonlinear regression models, which may work quite well for prediction, have two notable weaknesses when used to estimate heterogeneous treatment effects.…

    Submitted 13 November, 2019; v1 submitted 28 June, 2017; originally announced June 2017.

  21. arXiv:1605.08963  [pdf, other]

    stat.ME

    Variable Selection in Seemingly Unrelated Regressions with Random Predictors

    Authors: David Puelz, P. Richard Hahn, Carlos Carvalho

    Abstract: This paper considers linear model selection when the response is vector-valued and the predictors are randomly observed. We propose a new approach that decouples statistical inference from the selection step in a "post-inference model summarization" strategy. We study the impact of predictor uncertainty on the model selection procedure. The method is demonstrated through an application to asset pr…

    Submitted 3 June, 2016; v1 submitted 29 May, 2016; originally announced May 2016.

  22. arXiv:1602.02176  [pdf, other]

    stat.ME

    Regularization and confounding in linear regression for treatment effect estimation

    Authors: P. Richard Hahn, Carlos M. Carvalho, Jingyu He, David Puelz

    Abstract: This paper investigates the use of regularization priors in the context of treatment effect estimation using observational data where the number of control variables is large relative to the number of observations. First, the phenomenon of regularization-induced confounding is introduced, which refers to the tendency of regularization priors to adversely bias treatment effect estimates by over-shr…

    Submitted 27 December, 2016; v1 submitted 5 February, 2016; originally announced February 2016.

  23. arXiv:1510.03385  [pdf, other]

    q-fin.ST stat.AP

    Optimal ETF Selection for Passive Investing

    Authors: David Puelz, Carlos M. Carvalho, P. Richard Hahn

    Abstract: This paper considers the problem of isolating a small number of exchange traded funds (ETFs) that suffice to capture the fundamental dimensions of variation in U.S. financial markets. First, the data is fit to a vector-valued Bayesian regression model, which is a matrix-variate generalization of the well known stochastic search variable selection (SSVS) of George and McCulloch (1993). ETF selectio…

    Submitted 28 November, 2015; v1 submitted 12 October, 2015; originally announced October 2015.

  24. On recursive Bayesian predictive distributions

    Authors: P. Richard Hahn, Ryan Martin, Stephen G. Walker

    Abstract: A Bayesian framework is attractive in the context of prediction, but a fast recursive update of the predictive distribution has apparently been out of reach, in part because Monte Carlo methods are generally used to compute the predictive. This paper shows that online Bayesian prediction is possible by characterizing the Bayesian predictive update in terms of a bivariate copula, making it unnecess…

    Submitted 30 April, 2017; v1 submitted 29 August, 2015; originally announced August 2015.

    Comments: 22 pages, 3 figures, 3 tables

    Journal ref: Journal of the American Statistical Association, 2018, volume 113, number 523, pages 1085--1093

  25. arXiv:1502.06045  [pdf, other]

    stat.ME

    Model specification via sequential coherence and backward induction

    Authors: P. Richard Hahn

    Abstract: This paper describes how to specify probability models for data analysis via a backward induction procedure. The new approach yields coherent, prior-free uncertainty assessment. After presenting some intuition-building examples, the new approach is applied to a kernel density estimator, which leads to a novel method for computing point-wise credible intervals in nonparametric density estimation. T…

    Submitted 20 February, 2015; originally announced February 2015.

    Comments: 25

  26. arXiv:1409.4815  [pdf, other]

    stat.AP

    A Bayesian hierarchical model for inferring player strategy types in a number guessing game

    Authors: P. Richard Hahn, Indranil Goswami, Carl Mela

    Abstract: This paper presents an in-depth statistical analysis of an experiment designed to measure the extent to which players in a simple game behave according to a popular behavioral economic model. The p-beauty contest is a multi-player number guessing game that has been widely used to study strategic behavior. This paper describes beauty contest experiments for an audience of data analysts, with a spec…

    Submitted 16 September, 2014; originally announced September 2014.

    Comments: 46 pages, 14 figures, 2 tables

  27. arXiv:1408.0464  [pdf, other]

    stat.ME

    Decoupling shrinkage and selection in Bayesian linear models: a posterior summary perspective

    Authors: P. Richard Hahn, Carlos M. Carvalho

    Abstract: Selecting a subset of variables for linear models remains an active area of research. This paper reviews many of the recent contributions to the Bayesian model selection and shrinkage prior literature. A posterior variable selection summary is proposed, which distills a full posterior distribution over regression coefficients into a sequence of sparse linear predictors.

    Submitted 3 August, 2014; originally announced August 2014.

    Comments: 30 pages, 6 figures, 2 tables

  28. arXiv:1408.0462  [pdf, other]

    stat.ME

    Shrinkage priors for linear instrumental variable models with many instruments

    Authors: P. Richard Hahn, Hedibert Lopes

    Abstract: This paper addresses the weak instruments problem in linear instrumental variable models from a Bayesian perspective. The new approach has two components. First, a novel predictor-dependent shrinkage prior is developed for the many instruments setting. The prior is constructed based on a factor model decomposition of the matrix of observed instruments, allowing many instruments to be incorporated…

    Submitted 3 August, 2014; originally announced August 2014.

    Comments: 27 pages, 6 figures, 3 tables

  29. arXiv:1407.8430  [pdf, other]

    stat.ME

    A Bayesian partial identification approach to inferring the prevalence of accounting misconduct

    Authors: P. Richard Hahn, Jared S. Murray, Ioanna Manolopoulou

    Abstract: This paper describes the use of flexible Bayesian regression models for estimating a partially identified probability function. Our approach permits efficient sensitivity analysis concerning the posterior impact of priors on the partially identified component of the regression model. The new methodology is illustrated on an important problem where only partially observed data is available - inferr…

    Submitted 6 March, 2015; v1 submitted 31 July, 2014; originally announced July 2014.

  30. arXiv:1405.0110  [pdf, ps, other]

    math.PR math.FA math.ST stat.ML

    A Structural Approach to Coordinate-Free Statistics

    Authors: Tom LaGatta, P. Richard Hahn

    Abstract: We consider the question of learning in general topological vector spaces. By exploiting known (or parametrized) covariance structures, our Main Theorem demonstrates that any continuous linear map corresponds to a certain isomorphism of embedded Hilbert spaces. By inverting this isomorphism and extending continuously, we construct a version of the Ordinary Least Squares estimator in absolute gener…

    Submitted 5 May, 2014; v1 submitted 1 May, 2014; originally announced May 2014.

    Comments: 31 pages

  31. arXiv:1011.3725  [pdf, other]

    stat.ME

    Predictor-dependent shrinkage for linear regression via partial factor modeling

    Authors: P. Richard Hahn, Sayan Mukherjee, Carlos Carvalho

    Abstract: In prediction problems with more predictors than observations, it can sometimes be helpful to use a joint probability model, $π(Y,X)$, rather than a purely conditional model, $π(Y \mid X)$, where $Y$ is a scalar response variable and $X$ is a vector of predictors. This approach is motivated by the fact that in many situations the marginal predictor distribution $π(X)$ can provide useful informatio…

    Submitted 16 November, 2010; originally announced November 2010.

    Comments: 16 pages, 1 figure, 2 tables