-
Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk
Authors:
Colleen Chan,
Kisung You,
Sunny Chung,
Mauro Giuffrè,
Theo Saarinen,
Niroop Rajashekar,
Yuan Pu,
Yeo Eun Shin,
Loren Laine,
Ambrose Wong,
René Kizilcec,
Jasjeet Sekhon,
Dennis Shung
Abstract:
Applications of large language models (LLMs) like ChatGPT have the potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithmic interaction and clinician trust are poorly understood. GutGPT, an LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electronic health record (EHR) with emergency medicine physicians, internal medicine physicians, and medical students to evaluate its effect on physician acceptance and trust in AI clinical decision support systems (AI-CDSS). GutGPT provides risk predictions from a validated machine learning model and evidence-based answers by querying extracted clinical guidelines. Participants were randomized to GutGPT and an interactive dashboard, or the interactive dashboard and a search engine. Surveys and educational assessments taken before and after measured technology acceptance and content mastery. Preliminary results showed mixed effects on acceptance after using GutGPT compared to the dashboard or search engine, but GutGPT appeared to improve content mastery based on simulation performance. Overall, this study demonstrates that LLMs like GutGPT could enhance effective AI-CDSS if implemented optimally and paired with interactive interfaces.
Submitted 6 December, 2023;
originally announced December 2023.
-
Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator
Authors:
Dennis Shen,
Dogyoon Song,
Peng Ding,
Jasjeet S. Sekhon
Abstract:
Deep learning research has uncovered the phenomenon of benign overfitting for overparameterized statistical models, which has drawn significant theoretical interest in recent years. Given its simplicity and practicality, the ordinary least squares (OLS) interpolator has become essential to gain foundational insights into this phenomenon. While properties of OLS are well established in classical, underparameterized settings, its behavior in high-dimensional, overparameterized regimes is less explored (unlike for ridge or lasso regression) though significant progress has been made of late. We contribute to this growing literature by providing fundamental algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In particular, we provide algebraic equivalents of (i) the leave-$k$-out residual formula, (ii) Cochran's formula, and (iii) the Frisch-Waugh-Lovell theorem in the overparameterized regime. These results aid in understanding the OLS interpolator's ability to generalize and have substantive implications for causal inference. Under the Gauss-Markov model, we present statistical results such as an extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors for the overparameterized regime. To substantiate our theoretical contributions, we conduct simulations that further explore the stochastic properties of the OLS interpolator.
Submitted 30 May, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
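The object studied in this abstract, the minimum $\ell_2$-norm OLS interpolator, can be illustrated numerically. The following is a minimal sketch, assuming a random Gaussian design with more columns than rows; the names `min_norm_ols`, `X`, `y`, and `beta_other` are hypothetical and do not come from the paper.

```python
import numpy as np

def min_norm_ols(X, y):
    """Minimum l2-norm least-squares solution, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
n, p = 20, 50  # overparameterized regime: more features than observations
X = rng.normal(size=(n, p))  # assumed design; full row rank almost surely
y = rng.normal(size=n)

beta = min_norm_ols(X, y)

# Build a second interpolator by adding a vector from the null space of X.
null_dir = (np.eye(p) - np.linalg.pinv(X) @ X) @ rng.normal(size=p)
beta_other = beta + null_dir

print("min-norm fit interpolates:", np.allclose(X @ beta, y))
print("alternative fit interpolates:", np.allclose(X @ beta_other, y))
print("norms:", np.linalg.norm(beta), np.linalg.norm(beta_other))
```

Both coefficient vectors fit the training data exactly, but the pseudoinverse solution has the smaller norm because it lies in the row space of `X`, orthogonal to every null-space direction.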
-
Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data
Authors:
Dennis Shen,
Peng Ding,
Jasjeet Sekhon,
Bin Yu
Abstract:
A central goal in social science is to evaluate the causal effect of a policy. One dominant approach is through panel data analysis in which the behaviors of multiple units are observed over time. The information across time and space motivates two general approaches: (i) horizontal regression (i.e., unconfoundedness), which exploits time series patterns, and (ii) vertical regression (e.g., synthetic controls), which exploits cross-sectional patterns. Conventional wisdom states that the two approaches are fundamentally different. We establish this position to be partly false for estimation but generally true for inference. In particular, we prove that both approaches yield identical point estimates under several standard settings. For the same point estimate, however, each approach quantifies uncertainty with respect to a distinct estimand. In turn, the confidence interval developed for one estimand may have incorrect coverage for another. This emphasizes that the source of randomness that researchers assume has direct implications for the accuracy of inference.
Submitted 8 October, 2022; v1 submitted 29 July, 2022;
originally announced July 2022.
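The claim that horizontal and vertical regression can give identical point estimates can be checked numerically in one simple case. This is a minimal sketch, assuming minimum-norm least squares without an intercept as the regression method; whether this matches the exact settings treated in the paper is an assumption, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_controls, n_pre = 8, 5

# Hypothetical panel: control units' pre-period outcomes (units x time).
Y_pre = rng.normal(size=(n_controls, n_pre))
y_treated_pre = rng.normal(size=n_pre)          # treated unit, pre-period
y_controls_post = rng.normal(size=n_controls)   # control units, post-period

# Horizontal regression: across units, regress the controls' post-period
# outcome on their pre-period outcomes, then apply the fitted time weights
# to the treated unit's pre-period outcomes.
alpha = np.linalg.pinv(Y_pre) @ y_controls_post
horizontal = y_treated_pre @ alpha

# Vertical regression: across time, regress the treated unit's pre-period
# outcomes on the controls' pre-period outcomes, then apply the fitted unit
# weights to the controls' post-period outcomes.
beta = np.linalg.pinv(Y_pre.T) @ y_treated_pre
vertical = y_controls_post @ beta

print(horizontal, vertical)
```

In this sketch the two predictions of the treated unit's untreated post-period outcome agree because the pseudoinverse of a transpose equals the transpose of the pseudoinverse. The abstract's point about inference is not captured here: the code compares point estimates only.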
-
Nonparametric identification is not enough, but randomized controlled trials are
Authors:
P. M. Aronow,
James M. Robins,
Theo Saarinen,
Fredrik Sävje,
Jasjeet Sekhon
Abstract:
We argue that randomized controlled trials (RCTs) are special even among settings where average treatment effects are identified by a nonparametric unconfoundedness assumption. This claim follows from two results of Robins and Ritov (1997): (1) with at least one continuous covariate control, no estimator of the average treatment effect exists which is uniformly consistent without further assumptions, (2) knowledge of the propensity score yields a uniformly consistent estimator and honest confidence intervals that shrink at parametric rates with increasing sample size, regardless of how complicated the propensity score function is. We emphasize the latter point, and note that successfully-conducted RCTs provide knowledge of the propensity score to the researcher. We discuss modern developments in covariate adjustment for RCTs, noting that statistical models and machine learning methods can be used to improve efficiency while preserving finite sample unbiasedness. We conclude that statistical inference has the potential to be fundamentally more difficult in observational settings than it is in RCTs, even when all confounders are measured.
Submitted 26 September, 2021; v1 submitted 25 August, 2021;
originally announced August 2021.
-
Hybridized Threshold Clustering for Massive Data
Authors:
Jianmei Luo,
ChandraVyas Annakula,
Aruna Sai Kannamareddy,
Jasjeet S. Sekhon,
William Henry Hsu,
Michael Higgins
Abstract:
As the size $n$ of datasets becomes massive, many commonly-used clustering algorithms (for example, $k$-means or hierarchical agglomerative clustering (HAC)) require prohibitive computational cost and memory. In this paper, we propose a solution to these clustering problems by extending threshold clustering (TC) to problems of instance selection. TC is a recently developed clustering algorithm designed to partition data into many small clusters in linearithmic time (on average). Our proposed clustering method is as follows. First, TC is performed and clusters are reduced into single "prototype" points. Then, TC is applied repeatedly on these prototype points until sufficient data reduction has been obtained. Finally, a more sophisticated clustering algorithm is applied to the reduced prototype points, thereby obtaining a clustering on all $n$ data points. This entire procedure for clustering is called iterative hybridized threshold clustering (IHTC). Through simulation results and by applying our methodology to several real datasets, we show that IHTC combined with $k$-means or HAC substantially reduces the run time and memory usage of the original clustering algorithms while still preserving their performance. Additionally, IHTC helps prevent singular data points from being overfit by clustering algorithms.
Submitted 5 July, 2019;
originally announced July 2019.
-
Linear Aggregation in Tree-based Estimators
Authors:
Sören R. Künzel,
Theo F. Saarinen,
Edward W. Liu,
Jasjeet S. Sekhon
Abstract:
Regression trees and their ensemble methods are popular methods for nonparametric regression: they combine strong predictive performance with interpretable estimators. To improve their utility for locally smooth response surfaces, we study regression trees and random forests with linear aggregation functions. We introduce a new algorithm that finds the best axis-aligned split to fit linear aggregation functions on the corresponding nodes, and we offer a quasilinear time implementation. We demonstrate the algorithm's favorable performance on real-world benchmarks and in an extensive simulation study, and we demonstrate its improved interpretability using a large get-out-the-vote experiment. We provide an open-source software package that implements several tree-based estimators with linear aggregation functions.
Submitted 9 September, 2021; v1 submitted 15 June, 2019;
originally announced June 2019.
-
Shrinkage Estimators in Online Experiments
Authors:
Drew Dimmery,
Eytan Bakshy,
Jasjeet Sekhon
Abstract:
We develop and analyze empirical Bayes Stein-type estimators for use in the estimation of causal effects in large-scale online experiments. While online experiments are generally thought to be distinguished by their large sample size, we focus on the multiplicity of treatment groups. The typical analysis practice is to use simple differences-in-means (perhaps with covariate adjustment) as if all treatment arms were independent. In this work we develop consistent, small bias, shrinkage estimators for this setting. In addition to achieving lower mean squared error these estimators retain important frequentist properties such as coverage under most reasonable scenarios. Modern sequential methods of experimentation and optimization such as multi-armed bandit optimization (where treatment allocations adapt over time to prior responses) benefit from the use of our shrinkage estimators. Exploration under empirical Bayes focuses more efficiently on near-optimal arms, improving the resulting decisions made under uncertainty. We demonstrate these properties by examining seventeen large-scale experiments conducted on Facebook from April to June 2017.
Submitted 29 April, 2019;
originally announced April 2019.
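The idea of Stein-type shrinkage across many treatment arms can be sketched with a textbook estimator. This is a minimal sketch, assuming a positive-part James-Stein estimator that shrinks arm means toward their grand mean under a known, common sampling variance; it is a generic stand-in and not the estimator developed in the paper. The function name and the example numbers are hypothetical.

```python
import numpy as np

def shrink_toward_grand_mean(arm_means, sampling_var):
    """Positive-part James-Stein shrinkage of arm means toward the grand mean.

    Assumes more than three arms, a known common sampling variance, and
    arm means that are not all equal.
    """
    arm_means = np.asarray(arm_means, dtype=float)
    k = arm_means.size
    grand_mean = arm_means.mean()
    deviations = arm_means - grand_mean
    spread = np.sum(deviations ** 2)
    # Shrinkage factor in [0, 1): 0 pools all arms, values near 1 barely shrink.
    factor = max(0.0, 1.0 - (k - 3) * sampling_var / spread)
    return grand_mean + factor * deviations, factor

# Hypothetical difference-in-means estimates for six treatment arms.
raw = np.array([0.10, 0.12, 0.08, 0.15, 0.11, 0.09])
shrunk, factor = shrink_toward_grand_mean(raw, sampling_var=0.02 ** 2)

print("shrinkage factor:", factor)
print("raw:   ", raw)
print("shrunk:", shrunk)
```

Every arm estimate is pulled toward the grand mean by the same factor, so extreme arms move the most in absolute terms while the average across arms is unchanged.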
-
Active Matrix Factorization for Surveys
Authors:
Chelsea Zhang,
Sean J. Taylor,
Curtiss Cobb,
Jasjeet Sekhon
Abstract:
Amid historically low response rates, survey researchers seek ways to reduce respondent burden while measuring desired concepts with precision. We propose to ask fewer questions of respondents and impute missing responses via probabilistic matrix factorization. A variance-minimizing active learning criterion chooses the most informative questions per respondent. In simulations of our matrix sampling procedure on real-world surveys, as well as a Facebook survey experiment, we find active question selection achieves efficiency gains over baselines. The reduction in imputation error is heterogeneous across questions, and depends on the latent concepts they capture. The imputation procedure can benefit from incorporating respondent side information, modeling responses as ordered logit rather than Gaussian, and accounting for order effects. With our method, survey researchers obtain principled suggestions of questions to retain and, if desired, can automate the design of shorter instruments.
Submitted 18 June, 2019; v1 submitted 20 February, 2019;
originally announced February 2019.
-
Causaltoolbox---Estimator Stability for Heterogeneous Treatment Effects
Authors:
Sören R. Künzel,
Simon J. S. Walter,
Jasjeet S. Sekhon
Abstract:
Estimating heterogeneous treatment effects has become increasingly important in many fields, and life-and-death decisions are now based on these estimates: for example, selecting a personalized course of medical treatment. Recently, a variety of procedures relying on different assumptions have been suggested for estimating heterogeneous treatment effects. Unfortunately, there are no compelling approaches that allow identification of the procedure whose assumptions hew closest to the process generating the data set under study, and researchers often select one arbitrarily. This approach risks making inferences that rely on incorrect assumptions and gives the experimenter too much scope for $p$-hacking. A single estimator will also tend to overlook patterns other estimators could have picked up. We believe that the conclusion of many published papers might change had a different estimator been chosen, and we suggest that practitioners should evaluate many estimators and assess their similarity when investigating heterogeneous treatment effects. We demonstrate this by applying 28 different estimation procedures to an emulated observational data set; this analysis shows that different estimation procedures may give starkly different estimates. We also provide an extensible R package which makes it straightforward for practitioners to follow our recommendations.
Submitted 28 March, 2019; v1 submitted 7 November, 2018;
originally announced November 2018.
-
Time-uniform, nonparametric, nonasymptotic confidence sequences
Authors:
Steven R. Howard,
Aaditya Ramdas,
Jon McAuliffe,
Jasjeet Sekhon
Abstract:
A confidence sequence is a sequence of confidence intervals that is uniformly valid over an unbounded time horizon. Our work develops confidence sequences whose widths go to zero, with nonasymptotic coverage guarantees under nonparametric conditions. We draw connections between the Cramér-Chernoff method for exponential concentration, the law of the iterated logarithm (LIL), and the sequential probability ratio test -- our confidence sequences are time-uniform extensions of the first; provide tight, nonasymptotic characterizations of the second; and generalize the third to nonparametric settings, including sub-Gaussian and Bernstein conditions, self-normalized processes, and matrix martingales. We illustrate the generality of our proof techniques by deriving an empirical-Bernstein bound growing at a LIL rate, as well as a novel upper LIL for the maximum eigenvalue of a sum of random matrices. Finally, we apply our methods to covariance matrix estimation and to estimation of sample average treatment effect under the Neyman-Rubin potential outcomes model.
Submitted 6 August, 2022; v1 submitted 18 October, 2018;
originally announced October 2018.
-
Transfer Learning for Estimating Causal Effects using Neural Networks
Authors:
Sören R. Künzel,
Bradly C. Stadie,
Nikita Vemuri,
Varsha Ramakrishnan,
Jasjeet S. Sekhon,
Pieter Abbeel
Abstract:
We develop new algorithms for estimating heterogeneous treatment effects, combining recent developments in transfer learning for neural networks with insights from the causal inference literature. By taking advantage of transfer learning, we are able to efficiently use different data sources that are related to the same underlying causal mechanisms. We compare our algorithms with those in the extant literature using extensive simulation studies based on large-scale voter persuasion experiments and the MNIST database. Our methods can perform an order of magnitude better than existing benchmarks while using a fraction of the data.
Submitted 23 August, 2018;
originally announced August 2018.
-
Inference on a New Class of Sample Average Treatment Effects
Authors:
Jasjeet S. Sekhon,
Yotam Shem-Tov
Abstract:
We derive new variance formulas for inference on a general class of estimands of causal average treatment effects in a Randomized Control Trial (RCT). We generalize Robins (1988) and show that when the estimand of interest is the Sample Average Treatment Effect of the Treated (SATT or SATC for controls), a consistent variance estimator exists. Although these estimands are equal to the Sample Average Treatment Effect (SATE) in expectation, potentially large differences in both accuracy and coverage can occur by the change of estimand, even asymptotically. Inference on the SATE, even using a conservative confidence interval, provides incorrect coverage of the SATT or SATC. We derive the variance and limiting distribution of a new and general class of estimands---any mixing between SATT and SATC---for which the SATE is a specific case. We demonstrate the applicability of the new theoretical results using Monte-Carlo simulations and an empirical application with hundreds of online experiments with an average sample size of approximately one hundred million observations per experiment. An R package, estCI, that implements all the proposed estimation procedures is available.
Submitted 18 October, 2017; v1 submitted 7 August, 2017;
originally announced August 2017.
-
Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning
Authors:
Sören R. Künzel,
Jasjeet S. Sekhon,
Peter J. Bickel,
Bin Yu
Abstract:
There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of meta-algorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the Conditional Average Treatment Effect (CATE) function. Meta-algorithms build on base algorithms---such as Random Forests (RF), Bayesian Additive Regression Trees (BART) or neural networks---to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a new meta-algorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other, and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the meta-learners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our new X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods.
Submitted 23 April, 2019; v1 submitted 12 June, 2017;
originally announced June 2017.
-
Worth Weighting? How to Think About and Use Weights in Survey Experiments
Authors:
Luke W. Miratrix,
Jasjeet S. Sekhon,
Alexander G. Theodoridis,
Luis F. Campos
Abstract:
The popularity of online surveys has increased the prominence of using weights that capture units' probabilities of inclusion for claims of representativeness. Yet, much uncertainty remains regarding how these weights should be employed in the analysis of survey experiments: Should they be used or ignored? If they are used, which estimators are preferred? We offer practical advice, rooted in the Neyman-Rubin model, for researchers producing and working with survey experimental data. We examine simple, efficient estimators for analyzing these data, and give formulae for their biases and variances. We provide simulations that examine these estimators as well as real examples from experiments administered online through YouGov. We find that for examining the existence of population treatment effects using high-quality, broadly representative samples recruited by top online survey firms, sample quantities, which do not rely on weights, are often sufficient. We found that Sample Average Treatment Effect (SATE) estimates did not appear to differ substantially from their weighted counterparts, and they avoided the substantial loss of statistical power that accompanies weighting. When precise estimates of Population Average Treatment Effects (PATE) are essential, we analytically show post-stratifying on survey weights and/or covariates highly correlated with the outcome to be a conservative choice. While we show these substantial gains in simulations, we find limited evidence of them in practice.
Submitted 15 August, 2017; v1 submitted 20 March, 2017;
originally announced March 2017.
-
Generalized full matching and extrapolation of the results from a large-scale voter mobilization experiment
Authors:
Fredrik Sävje,
Michael J. Higgins,
Jasjeet S. Sekhon
Abstract:
Matching is an important tool in causal inference. The method provides a conceptually straightforward way to make groups of units comparable on observed characteristics. The use of the method is, however, limited to situations where the study design is fairly simple and the sample is moderately sized. We illustrate the issue by revisiting a large-scale voter mobilization experiment that took place in Michigan for the 2006 election. We ask what the causal effects would have been if the treatments in the experiment were scaled up to the full population. Matching could help us answer this question, but no existing matching method can accommodate the six treatment arms and the 6,762,701 observations involved in the study. To offer a solution to this and similar empirical problems, we introduce a generalization of the full matching method and an associated algorithm. The method can be used with any number of treatment conditions, and it is shown to produce near-optimal matchings. The worst case maximum within-group dissimilarity is no worse than four times the optimal solution, and simulation results indicate that its performance is considerably closer to the optimal solution on average. Despite its performance, the algorithm is fast and uses little memory. It terminates, on average, in linearithmic time using linear space. This enables investigators to construct well-performing matchings within minutes even in complex studies with samples of several million units.
Submitted 16 June, 2019; v1 submitted 10 March, 2017;
originally announced March 2017.
-
Blocking estimators and inference under the Neyman-Rubin model
Authors:
Michael J. Higgins,
Fredrik Sävje,
Jasjeet S. Sekhon
Abstract:
We derive the variances of estimators for sample average treatment effects under the Neyman-Rubin potential outcomes model for arbitrary blocking assignments and an arbitrary number of treatments.
Submitted 5 October, 2015;
originally announced October 2015.