-
Toward a Taxonomy of Trust for Probabilistic Machine Learning
Authors:
Tamara Broderick,
Andrew Gelman,
Rachael Meager,
Anna L. Smith,
Tian Zheng
Abstract:
Probabilistic machine learning increasingly informs critical decisions in medicine, economics, politics, and beyond. We need evidence that the resulting decisions are well-founded. To aid the development of trust in these decisions, we develop a taxonomy delineating where trust in an analysis can break down: (1) in the translation of real-world goals to goals on a particular set of available training data, (2) in the translation of abstract goals on the training data to a concrete mathematical problem, (3) in the use of an algorithm to solve the stated mathematical problem, and (4) in the use of a particular code implementation of the chosen algorithm. We detail how trust can fail at each step and illustrate our taxonomy with two case studies: an analysis of the efficacy of microcredit and The Economist's predictions of the 2020 US presidential election. Finally, we describe a wide variety of methods that can be used to increase trust at each step of our taxonomy. Our taxonomy highlights both the steps where existing research on trust tends to concentrate and the steps where establishing trust is particularly challenging.
Submitted 5 December, 2021;
originally announced December 2021.
-
An Automatic Finite-Sample Robustness Metric: When Can Dropping a Little Data Make a Big Difference?
Authors:
Tamara Broderick,
Ryan Giordano,
Rachael Meager
Abstract:
Study samples often differ from the target populations of inference and policy decisions in non-random ways. Researchers typically believe that such departures from random sampling -- due to changes in the population over time and space, or difficulties in sampling truly randomly -- are small, and their corresponding impact on the inference should be small as well. We might therefore be concerned if the conclusions of our studies are excessively sensitive to a very small proportion of our sample data. We propose a method to assess the sensitivity of applied econometric conclusions to the removal of a small fraction of the sample. Manually checking the influence of all possible small subsets is computationally infeasible, so we use an approximation to find the most influential subset. Our metric, the "Approximate Maximum Influence Perturbation," is based on the classical influence function, and is automatically computable for common methods including (but not limited to) OLS, IV, MLE, GMM, and variational Bayes. We provide finite-sample error bounds on approximation performance. At minimal extra cost, we provide an exact finite-sample lower bound on sensitivity. We find that sensitivity is driven by a signal-to-noise ratio in the inference problem, is not reflected in standard errors, does not disappear asymptotically, and is not due to misspecification. While some empirical applications are robust, results of several influential economics papers can be overturned by removing less than 1% of the sample.
Submitted 19 July, 2023; v1 submitted 30 November, 2020;
originally announced November 2020.
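The core computation described in the abstract above can be sketched for the simplest case: the slope of a one-regressor OLS fit. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation; the function names, the toy data, and the choice to look for the largest *decrease* in the slope are illustrative. Each observation carries a data weight, and the derivative of the slope with respect to that weight at the full data (its empirical influence) predicts how far dropping the observation moves the estimate. Ranking by influence and dropping the top floor(frac * n) points gives the approximate worst case; refitting without those points gives a change that some actual subset attains.

```python
import math


def ols_fit(xs, ys):
    """Intercept and slope of y on x by ordinary least squares."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def slope_influence(xs, ys):
    """Derivative of the OLS slope with respect to each observation's weight, at w = 1.

    psi_i = (x_i - xbar) * e_i / Sxx, with e_i the OLS residual, so dropping
    observation i changes the slope by approximately -psi_i.
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    intercept, slope = ols_fit(xs, ys)
    return [(x - xbar) * (y - intercept - slope * x) / sxx for x, y in zip(xs, ys)]


def approx_max_decrease(xs, ys, frac):
    """Pick at most floor(frac * n) points predicted to lower the slope the most,
    then refit without them to get the change actually achieved."""
    n = len(xs)
    k = math.floor(frac * n)
    psi = slope_influence(xs, ys)
    ranked = sorted(range(n), key=lambda i: psi[i], reverse=True)
    # Only drop points whose removal is predicted to lower the slope.
    drop = [i for i in ranked[:k] if psi[i] > 0]
    approx_change = -sum(psi[i] for i in drop)
    kept = [i for i in range(n) if i not in drop]
    _, full_slope = ols_fit(xs, ys)
    _, refit_slope = ols_fit([xs[i] for i in kept], [ys[i] for i in kept])
    return sorted(drop), approx_change, refit_slope - full_slope


# Hypothetical toy data: one high-leverage point drives a positive slope.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 0, 0, 0, 0, 5]
dropped, approx, exact = approx_max_decrease(xs, ys, frac=0.1)
print(dropped, round(approx, 3), round(exact, 3))  # prints [9] -0.179 -0.273
```

On this toy data the linear approximation understates the effect (about -0.179 predicted versus -0.273 after refitting). Because the refit removes a real subset, its change is genuinely achievable, which is the sense in which a refit can serve as a lower bound on worst-case sensitivity.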
-
Fast robustness quantification with variational Bayes
Authors:
Ryan Giordano,
Tamara Broderick,
Rachael Meager,
Jonathan Huggins,
Michael Jordan
Abstract:
Bayesian hierarchical models are increasingly popular in economics. When using hierarchical models, it is useful not only to calculate posterior expectations, but also to measure the robustness of these expectations to reasonable alternative prior choices. We use variational Bayes and linear response methods to provide fast, accurate posterior means and robustness measures with an application to measuring the effectiveness of microcredit in the developing world.
Submitted 22 June, 2016;
originally announced June 2016.
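The kind of prior robustness measure described in the abstract above is a local sensitivity: the derivative of a posterior expectation with respect to a prior parameter. A general identity gives that derivative as the posterior covariance between the quantity of interest and the derivative of the log prior. The sketch below illustrates only this notion in a conjugate normal model where the posterior is exact; it does not use variational Bayes or linear response, which is how the paper computes such measures for models without closed forms. Function names and data are hypothetical.

```python
def normal_posterior(ys, sigma, mu0, tau):
    """Exact posterior mean and variance of theta in the conjugate model
    y_i ~ N(theta, sigma^2), theta ~ N(mu0, tau^2)."""
    precision = 1.0 / tau**2 + len(ys) / sigma**2
    mean = (mu0 / tau**2 + sum(ys) / sigma**2) / precision
    return mean, 1.0 / precision


def prior_mean_sensitivity(ys, sigma, mu0, tau):
    """Local sensitivity d E[theta | y] / d mu0 via the covariance identity
    Cov_post(theta, d/d mu0 log p(theta)) = Cov_post(theta, (theta - mu0) / tau^2)
                                          = Var_post(theta) / tau^2."""
    _, var = normal_posterior(ys, sigma, mu0, tau)
    return var / tau**2


def finite_difference_sensitivity(ys, sigma, mu0, tau, h=1e-4):
    """Brute-force check: re-solve the posterior at perturbed prior means."""
    up, _ = normal_posterior(ys, sigma, mu0 + h, tau)
    down, _ = normal_posterior(ys, sigma, mu0 - h, tau)
    return (up - down) / (2 * h)


ys = [1.0, 2.0, 3.0]  # hypothetical observations
print(prior_mean_sensitivity(ys, sigma=1.0, mu0=0.0, tau=1.0))  # 0.25
print(finite_difference_sensitivity(ys, sigma=1.0, mu0=0.0, tau=1.0))
```

The covariance form needs only one posterior (here exact, in the paper approximated), whereas the finite-difference check re-solves the model for every perturbation; that saving is what makes such robustness measures cheap to report.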
-
Understanding the Impact of Microcredit Expansions: A Bayesian Hierarchical Analysis of 7 Randomised Experiments
Authors:
Rachael Meager
Abstract:
Bayesian hierarchical models are a methodology for aggregation and synthesis of data from heterogeneous settings, used widely in statistics and other disciplines. I apply this framework to the evidence from 7 randomized experiments of expanding access to microcredit to assess the general impact of the intervention on household outcomes and the heterogeneity in this impact across sites. The results suggest that the effect of microcredit is likely to be positive but small relative to control group average levels, and the possibility of a negative impact cannot be ruled out. By contrast, common meta-analytic methods that pool all the data without assessing the heterogeneity misleadingly produce "statistically significant" results in 2 of the 6 household outcomes. Standard pooling metrics for the studies indicate on average 60% pooling on the treatment effects, suggesting that the site-specific effects are reasonably externally valid, and thus informative for each other and for the general case. The cross-study heterogeneity is almost entirely generated by heterogeneous effects for the 27% of households who previously operated businesses before microcredit expansion, although this group is likely to see much larger impacts overall. A ridge regression procedure to assess the correlations between site-specific covariates and treatment effects indicates that the remaining heterogeneity is strongly correlated with differences in economic variables, but not with differences in study design protocols. The average interest rate and the average loan size have the strongest correlation with the treatment effects, and both are negative.
Submitted 12 July, 2016; v1 submitted 22 June, 2015;
originally announced June 2015.
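The partial pooling summarized in the abstract above can be illustrated with the normal hierarchical model: conditional on the common mean mu and the between-site standard deviation tau, each site's effect is estimated as a compromise between its own estimate and mu. The sketch below computes one conventional pooling factor, omega_k = se_k^2 / (se_k^2 + tau^2); whether this matches any of the specific metrics used in the paper is an assumption, and mu and tau are treated as known rather than estimated. The function names and all numbers are hypothetical and are not taken from the microcredit studies.

```python
def pooling_factors(site_ses, tau):
    """Per-site pooling factor omega_k = se_k^2 / (se_k^2 + tau^2).

    omega_k near 1: the site's estimate is pulled almost fully to the common mean.
    omega_k near 0: the site's own estimate is left nearly untouched.
    """
    return [se**2 / (se**2 + tau**2) for se in site_ses]


def partially_pooled(site_estimates, site_ses, mu, tau):
    """Conditional posterior mean of each site's effect given mu and tau
    in the normal hierarchical model: omega_k * mu + (1 - omega_k) * estimate_k."""
    omegas = pooling_factors(site_ses, tau)
    return [w * mu + (1 - w) * est for w, est in zip(omegas, site_estimates)]


ses = [1.0, 2.0, 3.0]          # hypothetical site standard errors
estimates = [3.0, -1.0, 5.0]   # hypothetical site treatment-effect estimates
omegas = pooling_factors(ses, tau=2.0)
print(sum(omegas) / len(omegas))  # average pooling across sites
print(partially_pooled(estimates, ses, mu=1.0, tau=2.0))
```

Noisier sites (larger se_k) are pooled more heavily toward mu, and a larger tau (more genuine cross-site heterogeneity) lowers pooling for every site, which is why high average pooling is read as evidence that site effects are informative for each other.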