-
Constrained Design of a Binary Instrument in a Partially Linear Model
Authors:
Tim Morrison,
Minh Nguyen,
Michael Baiocchi,
Art B. Owen
Abstract:
We study the question of how best to assign an encouragement in a randomized encouragement study. In our setting, units arrive with covariates, receive a nudge toward treatment or control, acquire one of those statuses in a way that need not align with the nudge, and finally have a response observed. The nudge can be seen as a binary instrument that affects the response only via the treatment status. Our goal is to assign the nudge as a function of covariates in a way that best estimates the local average treatment effect (LATE). We assume a partially linear model, wherein the baseline model is non-parametric and the treatment term is linear in the covariates. Under this model, we outline a two-stage procedure to consistently estimate the LATE. Though the variance of the LATE is intractable, we derive a finite sample approximation and thus a design criterion to minimize. This criterion is convex, allowing for constraints that might arise for budgetary or ethical reasons. We prove conditions under which our solution asymptotically recovers the lowest true variance among all possible nudge propensities. We apply our method to a semi-synthetic example involving triage in an emergency department and find significant gains relative to a regression discontinuity design.
Submitted 8 June, 2024;
originally announced June 2024.
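For orientation, the LATE targeted in the abstract above can be illustrated with the textbook Wald ratio for a binary instrument. This is a minimal sketch, not the authors' two-stage procedure or their design criterion; the function name `wald_late` and the toy data are assumptions made purely for illustration.

```python
def wald_late(z, d, y):
    """Textbook Wald estimator of the LATE for a binary instrument.

    z: nudge (instrument) indicators, 0 or 1
    d: treatment actually received, 0 or 1
    y: observed responses
    Returns (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0]).
    """
    def mean(values):
        return sum(values) / len(values)

    # Split responses and treatment uptake by nudge arm.
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d1 = [di for zi, di in zip(z, d) if zi == 1]
    d0 = [di for zi, di in zip(z, d) if zi == 0]

    # The denominator is the effect of the nudge on treatment uptake.
    uptake_gap = mean(d1) - mean(d0)
    if uptake_gap == 0:
        raise ValueError("the nudge does not change treatment uptake")
    return (mean(y1) - mean(y0)) / uptake_gap


# Hypothetical toy data in which the response is 1 + 2 * treatment.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [3, 3, 3, 1, 3, 1, 1, 1]
late_estimate = wald_late(z, d, y)
```

In this toy example the nudge is assigned without regard to covariates; the paper's contribution concerns choosing the nudge propensity as a function of covariates, which this sketch does not attempt.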
-
The High Energy Light Isotope eXperiment program of direct cosmic-ray studies
Authors:
HELIX Collaboration,
S. Coutu,
P. S. Allison,
M. Baiocchi,
J. J. Beatty,
L. Beaufore,
D. H. Calderon,
A. G. Castano,
Y. Chen,
N. Green,
D. Hanna,
H. B. Jeon,
S. B. Klein,
B. Kunkler,
M. Lang,
R. Mbarek,
K. McBride,
S. I. Mognet,
J. Musser,
S. Nutter,
S. O'Brien,
N. Park,
K. M. Powledge,
K. Sakai,
M. Tabata
, et al. (5 additional authors not shown)
Abstract:
HELIX is a new NASA-sponsored instrument aimed at measuring the spectra and composition of light cosmic-ray isotopes from hydrogen to neon nuclei, in particular the clock isotopes 10Be (radioactive, with 1.4 Myr lifetime) and 9Be (stable). The latter are unique markers of the production and Galactic propagation of secondary cosmic-ray nuclei, and are needed to resolve such important mysteries as the proportion of secondary positrons in the excess of antimatter observed by the AMS-02 experiment. By using a combination of a 1 T superconducting magnet spectrometer (with drift-chamber tracker) with a high-resolution time-of-flight detector system and ring-imaging Cherenkov detector, mass-resolved isotope measurements of light cosmic-ray nuclei will be possible up to 3 GeV/n in a first stratospheric balloon flight from Kiruna, Sweden to northern Canada, anticipated to take place in early summer 2024. An eventual longer Antarctic balloon flight of HELIX will yield measurements up to 10 GeV/n, sampling production from a larger volume of the Galaxy extending into the halo. We review the instrument design, testing, status and scientific prospects.
Submitted 11 December, 2023;
originally announced December 2023.
-
Electron-beam Calibration of Aerogel Tiles for the HELIX RICH Detector
Authors:
P. Allison,
M. Baiocchi,
J. J. Beatty,
L. Beaufore,
D. H. Calderone,
Y. Chen,
S. Coutu,
E. Ellingwood,
N. Green,
D. Hanna,
H. B. Jeon,
R. Mbarek,
K. McBride,
I. Mognet,
J. Musser,
S. Nutter,
S. O'Brien,
N. Park,
T. Rosin,
M. Tabata,
G. Tarlé,
G. Visser,
S. P. Wakely,
M. Yu
Abstract:
The HELIX cosmic-ray detector is a balloon-borne instrument designed to measure the flux of light isotopes in the energy range from 0.2 GeV/n to beyond 3 GeV/n. It will rely on a ring-imaging Cherenkov (RICH) detector for particle identification at energies greater than 1 GeV/n and will use aerogel tiles with refractive index near 1.15 as the radiator. To achieve the performance goals of the experiment it is necessary to know the refractive index and its position dependence over the lateral extent of the tiles to a precision of $O(10^{-4})$. In this paper we describe the apparatus and methods developed to calibrate the HELIX tiles in an electron beam, in order to meet this requirement.
Submitted 18 July, 2023;
originally announced July 2023.
-
Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection
Authors:
Conor K. Corbin,
Michael Baiocchi,
Jonathan H. Chen
Abstract:
When evaluating the performance of clinical machine learning models, one must consider the deployment population. When the population of patients with observed labels is only a subset of the deployment population (label selection), standard model performance estimates on the observed population may be misleading. In this study we describe three classes of label selection and simulate five causally distinct scenarios to assess how particular selection mechanisms bias a suite of commonly reported binary machine learning model performance metrics. Simulations reveal that when selection is affected by observed features, naive estimates of model discrimination may be misleading. When selection is affected by labels, naive estimates of calibration fail to reflect reality. We borrow traditional weighting estimators from causal inference literature and find that when selection probabilities are properly specified, they recover full population estimates. We then tackle the real-world task of monitoring the performance of deployed machine learning models whose interactions with clinicians feed back and affect the selection mechanism of the labels. We train three machine learning models to flag low-yield laboratory diagnostics, and simulate their intended consequence of reducing wasteful laboratory utilization. We find that naive estimates of AUROC on the observed population undershoot actual performance by up to 20%. Such a disparity could be large enough to lead to the wrongful termination of a successful clinical decision support tool. We propose an altered deployment procedure, one that combines injected randomization with traditional weighted estimates, and find it recovers true model performance.
Submitted 15 September, 2022;
originally announced September 2022.
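The weighting idea in the abstract above can be illustrated with a small sketch. It assumes the selection probabilities are known and uses a self-normalized inverse-probability-weighted accuracy; the function name `ipw_accuracy`, the choice of accuracy as the metric, and the toy data are assumptions for illustration, not the estimators or metrics evaluated in the paper.

```python
def ipw_accuracy(correct, selected, p_select):
    """Self-normalized inverse-probability-weighted accuracy.

    correct:  1 if the model was right, 0 if wrong (ignored when unlabeled)
    selected: True if the unit's label was observed
    p_select: probability that the unit's label is observed (assumed known)
    """
    weighted_hits = 0.0
    total_weight = 0.0
    for hit, is_observed, prob in zip(correct, selected, p_select):
        if not is_observed:
            continue  # unlabeled units contribute nothing directly
        weight = 1.0 / prob  # each labeled unit stands in for 1/prob units
        weighted_hits += weight * hit
        total_weight += weight
    return weighted_hits / total_weight


# Hypothetical toy data: labels are always observed in one group (p = 1.0)
# and observed half the time in another (p = 0.5).
correct = [1, 1, 0, None]
selected = [True, True, True, False]
p_select = [1.0, 1.0, 0.5, 0.5]

labeled_hits = [c for c, s in zip(correct, selected) if s]
naive = sum(labeled_hits) / len(labeled_hits)  # uses labeled units only
weighted = ipw_accuracy(correct, selected, p_select)
```

Here the naive estimate is 2/3 while the weighted estimate is 0.5, because the single labeled unit from the under-sampled group is counted twice. The correction is only as good as the assumed selection probabilities.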
-
Robust Designs for Prospective Randomized Trials Surveying Sensitive Topics
Authors:
Evan T. R. Rosenman,
Rina Friedberg,
Mike Baiocchi
Abstract:
We consider the problem of designing a prospective randomized trial in which the outcome data will be self-reported, and will involve sensitive topics. Our interest is in misreporting behavior, and how respondents' tendency to under- or overreport a binary outcome might affect the power of the experiment. We model the problem by assuming each individual in our study is a member of one "reporting class": a truth-teller, underreporter, overreporter, or false-teller. We show that the joint distribution of reporting classes and "response classes" (characterizing individuals' response to the treatment) will exactly define the bias and variance of the causal estimate in our experiment. Then, we propose a novel procedure for deriving sample sizes under the worst-case power corresponding to a given level of misreporting. Our problem is motivated by prior experience implementing a randomized controlled trial of a sexual violence prevention program among adolescent girls in Nairobi, Kenya.
Submitted 25 August, 2021; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Assignment-Control Plots: A Visual Companion for Causal Inference Study Design
Authors:
Rachael C. Aikens,
Michael Baiocchi
Abstract:
An important step for any causal inference study design is understanding the distribution of the treated and control subjects in terms of measured baseline covariates. However, not all baseline variation is equally important. In the observational context, balancing on baseline variation summarized in a propensity score can help reduce bias due to self-selection. In both observational and experimental studies, controlling baseline variation associated with the expected outcomes can help increase the precision of causal effect estimates. We propose a set of visualizations which decompose the space of measured covariates into the different types of baseline variation important to the study design. These "assignment-control plots" and variations thereof visually illustrate core concepts of causal inference and suggest new directions for methodological research on study design. As a practical demonstration, we illustrate one application of assignment-control plots to a study of cardiothoracic surgery. While the family of visualization tools for studies of causality is relatively sparse, simple visual tools can be an asset to education, application, and methods development.
Submitted 30 June, 2021;
originally announced July 2021.
-
Statistical matching and subclassification with a continuous dose: characterization, algorithm, and application to a health outcomes study
Authors:
Bo Zhang,
Emily J. Mackay,
Mike Baiocchi
Abstract:
Subclassification and matching are often used in empirical studies to adjust for observed covariates; however, they are largely restricted to relatively simple study designs with a binary treatment and less developed for designs with a continuous exposure. Matching with exposure doses is particularly useful in instrumental variable designs and in understanding the dose-response relationships. In this article, we propose two criteria for optimal subclassification based on subclass homogeneity in the context of having a continuous exposure dose, and propose an efficient polynomial-time algorithm that is guaranteed to find an optimal subclassification with respect to one criterion and serves as a 2-approximation algorithm for the other criterion. We discuss how to incorporate dose and use appropriate penalties to control the number of subclasses in the design. Via extensive simulations, we systematically compare our proposed design to optimal non-bipartite pair matching, and demonstrate that combining our proposed subclassification scheme with regression adjustment helps reduce model dependence for parametric causal inference with a continuous dose. We apply the new design and associated randomization-based inferential procedure to study the effect of transesophageal echocardiography (TEE) monitoring during coronary artery bypass graft (CABG) surgery on patients' post-surgery clinical outcomes using Medicare and Medicaid claims data, and find evidence that TEE monitoring lowers patients' all-cause 30-day mortality rate.
Submitted 26 January, 2022; v1 submitted 13 December, 2020;
originally announced December 2020.
-
A Causal Machine Learning Framework for Predicting Preventable Hospital Readmissions
Authors:
Ben J. Marafino,
Alejandro Schuler,
Vincent X. Liu,
Gabriel J. Escobar,
Mike Baiocchi
Abstract:
Clinical predictive algorithms are increasingly being used to form the basis for optimal treatment policies--that is, to enable interventions to be targeted to the patients who will presumably benefit most. Despite taking advantage of recent advances in supervised machine learning, these algorithms remain, in a sense, blunt instruments--often being developed and deployed without a full accounting of the causal aspects of the prediction problems they are intended to solve. Indeed, in many settings, including among patients at risk of readmission, the riskiest patients may derive less benefit from a preventative intervention compared to those at lower risk. Moreover, targeting an intervention to a population, rather than limiting it to a small group of high-risk patients, may lead to far greater overall utility if the patients with the most modifiable (or preventable) outcomes across the population could be identified. Based on these insights, we introduce a causal machine learning framework that decouples this prediction problem into causal and predictive parts, which clearly delineates the complementary roles of causal inference and prediction in this problem. We estimate treatment effects using causal forests, and characterize treatment effect heterogeneity across levels of predicted risk using these estimates. Furthermore, we show how these effect estimates could be used in concert with the modeled "payoffs" associated with successful prevention of individual readmissions to maximize overall utility. Based on data taken from before and after the implementation of a readmissions prevention intervention at Kaiser Permanente Northern California, our results suggest that nearly four times as many readmissions could be prevented annually with this approach compared to targeting this intervention using predicted risk.
Submitted 18 July, 2020; v1 submitted 29 May, 2020;
originally announced May 2020.
-
Understanding the spatial burden of gender-based violence: Modelling patterns of violence in Nairobi, Kenya through geospatial information
Authors:
Rina Friedberg,
Clea Sarnquist,
Gavin Nyairo,
Mary Amuyunzu-Nyamongo,
Michael Baiocchi
Abstract:
We present statistical techniques for analyzing global positioning system (GPS) data in order to understand, communicate about, and prevent patterns of violence. In this pilot study, participants in Nairobi, Kenya were asked to rate their safety at several locations, with the goal of predicting safety and learning important patterns. These approaches are meant to help articulate differences in experiences, fostering a discussion that will help communities identify issues and policymakers develop safer communities. A generalized linear mixed model incorporating spatial information taken from existing maps of Kibera showed significant predictors of perceived lack of safety included being alone and time of day; in debrief interviews, participants described feeling unsafe in spaces with hiding places, disease carrying animals, and dangerous individuals. This pilot study demonstrates promise for detecting spatial patterns of violence, which appear to be confirmed by actual rates of measured violence at schools. Several factors relevant to community building consistently predict perceived safety and emerge in participants' qualitative descriptions, telling a cohesive story about perceived safety and empowering communication to community stakeholders.
Submitted 16 February, 2020;
originally announced February 2020.
-
Combining Observational and Experimental Datasets Using Shrinkage Estimators
Authors:
Evan Rosenman,
Guillaume Basse,
Art Owen,
Michael Baiocchi
Abstract:
We consider the problem of combining data from observational and experimental sources to make causal conclusions. This problem is increasingly relevant, as the modern era has yielded passive collection of massive observational datasets in areas such as e-commerce and electronic health. These data may be used to supplement experimental data, which is frequently expensive to obtain. In Rosenman et al. (2018), we considered this problem under the assumption that all confounders were measured. Here, we relax the assumption of unconfoundedness. To derive combined estimators with desirable properties, we make use of results from the Stein Shrinkage literature. Our contributions are threefold. First, we propose a generic procedure for deriving shrinkage estimators in this setting, making use of a generalized unbiased risk estimate. Second, we develop two new estimators, prove finite sample conditions under which they have lower risk than an estimator using only experimental data, and show that each achieves a notion of asymptotic optimality. Third, we draw connections between our approach and results in sensitivity analysis, including proposing a method for evaluating the feasibility of our estimators.
Submitted 18 May, 2020; v1 submitted 16 February, 2020;
originally announced February 2020.
-
When black box algorithms are (not) appropriate: a principled prediction-problem ontology
Authors:
Jordan Rodu,
Michael Baiocchi
Abstract:
In the 1980s a new, extraordinarily productive way of reasoning about algorithms emerged. In this paper, we introduce the term "outcome reasoning" to refer to this form of reasoning. Though outcome reasoning has come to dominate areas of data science, it has been under-discussed and its impact under-appreciated. For example, outcome reasoning is the primary way we reason about whether "black box" algorithms are performing well. In this paper we analyze outcome reasoning's most common form (i.e., as "the common task framework") and its limitations. We discuss why a large class of prediction problems are inappropriate for outcome reasoning. As an example, we find the common task framework does not provide a foundation for the deployment of an algorithm in a real-world situation. Building off of its core features, we identify a class of problems where this new form of reasoning can be used in deployment. We purposefully develop a novel framework so both technical and non-technical people can discuss and identify key features of their prediction problem and whether or not it is suitable for outcome reasoning.
Submitted 14 February, 2023; v1 submitted 21 January, 2020;
originally announced January 2020.
-
stratamatch: Prognostic Score Stratification using a Pilot Design
Authors:
Rachael C. Aikens,
Joseph Rigdon,
Justin Lee,
Michael Baiocchi,
Andrew B. Goldstone,
Peter Chiu,
Y. Joseph Woo,
Jonathan H. Chen
Abstract:
Optimal propensity score matching has emerged as one of the most ubiquitous approaches for causal inference studies on observational data; however, outstanding critiques of the statistical properties of propensity score matching have cast doubt on the statistical efficiency of this technique, and the poor scalability of optimal matching to large data sets makes this approach inconvenient if not infeasible for sample sizes that are increasingly commonplace in modern observational data. The stratamatch package provides implementation support and diagnostics for "stratified matching designs," an approach which addresses both of these issues with optimal propensity score matching for large-sample observational studies. First, stratifying the data enables more computationally efficient matching of large data sets. Second, stratamatch implements a "pilot design" approach in order to stratify by a prognostic score, which may increase the precision of the effect estimate and increase power in sensitivity analyses of unmeasured confounding.
Submitted 25 February, 2021; v1 submitted 8 January, 2020;
originally announced January 2020.
-
A Pilot Design for Observational Studies: Using Abundant Data Thoughtfully
Authors:
Rachael C. Aikens,
Dylan Greaves,
Michael Baiocchi
Abstract:
Observational studies often benefit from an abundance of observational units. This can lead to studies that -- while challenged by issues of internal validity -- have inferences derived from sample sizes substantially larger than randomized controlled trials. But is the information provided by an observational unit best used in the analysis phase? We propose the use of "pilot design," in which observations are expended in the design phase of the study, and the post-treatment information from these observations is used to improve study design. In modern observational studies, which are data rich but control poor, pilot designs can be used to gain information about the structure of post-treatment variation. This information can then be used to improve instrumental variable designs, propensity score matching, doubly-robust estimation, and other observational study designs.
We illustrate one version of a pilot design, which aims to reduce within-set heterogeneity and improve performance in sensitivity analyses. This version of a pilot design expends observational units during the design phase to fit a prognostic model, avoiding concerns of overfitting. Additionally, it enables the construction of "Assignment-Control (AC) plots," which visualize the relationship between propensity and prognostic scores. We first show some examples of these plots, then we demonstrate in a simulation setting how this alternative use of the observations can lead to gains in terms of both treatment effect estimation and sensitivity analyses of unobserved confounding.
Submitted 20 August, 2020; v1 submitted 23 August, 2019;
originally announced August 2019.
-
Propensity Score Methods for Merging Observational and Experimental Datasets
Authors:
Evan Rosenman,
Art B. Owen,
Michael Baiocchi,
Hailey Banack
Abstract:
This project considers how one might augment a limited amount of data from a randomized controlled trial (RCT) with more plentiful data from an observational database (ODB), in order to estimate a causal effect. In our motivating setting, the ODB has better external validity, while the RCT has genuine randomization. We work with strata defined by the propensity score in the ODB. Subjects from the RCT are placed in strata defined by the propensity they would have had, had they been in the ODB. Our first method simply spikes the RCT data into their corresponding ODB strata. Our second method takes a data-driven convex combination of the ODB and RCT treatment effect estimates within each stratum. Using the delta method and simulations we show that the spike-in method works best when the RCT covariates are drawn from the same distribution as in the ODB. Our convex combination method is more robust than the spike-in to covariate-based inclusion criteria that bias the RCT data. We apply our methods to data from the Women's Health Initiative, a study of thousands of postmenopausal women which has both observational and experimental data on hormone therapy (HT). Using half of the RCT to define a gold standard, we find that a version of the spiked-in estimate yields stable estimates of the causal impact of HT on coronary heart disease.
Submitted 21 October, 2018; v1 submitted 20 April, 2018;
originally announced April 2018.
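The within-stratum convex combination described in the abstract above can be pictured with a generic sketch. The weight below is the standard mean-squared-error-minimizing choice for two independent estimates when one is unbiased and the other has a known squared bias; it is an assumption made for illustration and is not claimed to be the authors' data-driven weight. The function name `combine_stratum` and all inputs are hypothetical.

```python
def combine_stratum(est_odb, var_odb, est_rct, var_rct, bias_sq=0.0):
    """Convex combination of ODB and RCT effect estimates in one stratum.

    The weight on the ODB estimate minimizes
        lam**2 * (var_odb + bias_sq) + (1 - lam)**2 * var_rct,
    which assumes independent estimates, an unbiased RCT estimate,
    and a known squared bias for the ODB estimate.
    """
    lam = var_rct / (var_rct + var_odb + bias_sq)
    return lam * est_odb + (1.0 - lam) * est_rct


# With equal variances and no ODB bias, the two estimates are averaged.
equal_weight = combine_stratum(1.0, 1.0, 3.0, 1.0)

# A larger assumed ODB bias shifts weight toward the RCT estimate.
rct_leaning = combine_stratum(1.0, 1.0, 3.0, 1.0, bias_sq=2.0)
```

In practice the bias of the observational estimate is unknown, which is why a data-driven weight is needed; this sketch only shows the shape of the combination.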
-
A comparison of methods for model selection when estimating individual treatment effects
Authors:
Alejandro Schuler,
Michael Baiocchi,
Robert Tibshirani,
Nigam Shah
Abstract:
Practitioners in medicine, business, political science, and other fields are increasingly aware that decisions should be personalized to each patient, customer, or voter. A given treatment (e.g. a drug or advertisement) should be administered only to those who will respond most positively, and certainly not to those who will be harmed by it. Individual-level treatment effects can be estimated with tools adapted from machine learning, but different models can yield contradictory estimates. Unlike risk prediction models, however, treatment effect models cannot be easily evaluated against each other using a held-out test set because the true treatment effect itself is never directly observed. Besides outcome prediction accuracy, several metrics that can leverage held-out data to evaluate treatment effect models have been proposed, but they are not widely used. We provide a didactic framework that elucidates the relationships between the different approaches and compare them all using a variety of simulations of both randomized and observational data. Our results show that researchers estimating heterogeneous treatment effects need not limit themselves to a single model-fitting algorithm. Instead of relying on a single method, multiple models fit by a diverse set of algorithms should be evaluated against each other using an objective function learned from the validation set. The model minimizing that objective should be used for estimating the individual treatment effect for future individuals.
Submitted 13 June, 2018; v1 submitted 13 April, 2018;
originally announced April 2018.
-
The causal impact of bail on case outcomes for indigent defendants
Authors:
Kristian Lum,
Mike Baiocchi
Abstract:
We use near-far matching, a technique for estimating causal relationships, to explore whether bail causes a higher likelihood of conviction. We find evidence of a strong causal impact. This paper was compiled as a submission to the 2017 Fairness, Accountability, and Transparency in Machine Learning (FAT ML) workshop.
Submitted 14 July, 2017;
originally announced July 2017.
-
Protocol for an Observational Study on the Effects of Playing High School Football on Later Life Cognitive Functioning and Mental Health
Authors:
Sameer K. Deshpande,
Raiden B. Hasegawa,
Amanda R. Rabinowitz,
John Whyte,
Carol L. Roan,
Andrew Tabatabaei,
Michael Baiocchi,
Jason H. Karlawish,
Christina L. Master,
Dylan S. Small
Abstract:
A potential causal relationship between head injuries sustained by NFL players and later-life neurological decline may have broad implications for participants in youth and high school football programs. However, brain trauma risk at the professional level may be different than that at the youth and high school levels, and the long-term effects of participation at these levels are as yet unclear. To investigate the effect of playing high school football on later-life depression and cognitive functioning, we propose a retrospective observational study using data from the Wisconsin Longitudinal Study (WLS) of graduates from Wisconsin high schools in 1957.
We compare 1,153 high school males who played varsity football to 2,751 male students who did not. 1,951 of the control subjects did not play any sport and the remaining 800 controls played a non-contact sport. We focus on two primary outcomes measured at age 65: a composite cognitive outcome measuring verbal fluency and memory and the modified CES-D depression score. To control for potential confounders we adjust for pre-exposure covariates such as IQ with matching and model-based covariate adjustment. We will conduct an ordered testing procedure that uses all 2,751 controls while controlling for possible unmeasured differences between students who played sports and those who did not. We will quantitatively assess the sensitivity of the results to potential unmeasured confounding. The study will also consider several secondary outcomes of clinical interest such as aggression and heavy drinking. The rich set of pre-exposure variables, relatively unbiased sampling, and longitudinal nature of the WLS dataset make the proposed analysis unique among related studies that rely primarily on convenience samples of football players with reported neurological symptoms.
Submitted 6 July, 2016;
originally announced July 2016.
-
Peer assessment enhances student learning
Authors:
Dennis L. Sun,
Naftali Harris,
Guenther Walther,
Michael Baiocchi
Abstract:
Feedback has a powerful influence on learning, but it is also expensive to provide. In large classes, it may even be impossible for instructors to provide individualized feedback. Peer assessment has received attention lately as a way of providing personalized feedback that scales to large classes. Besides these obvious benefits, some researchers have also conjectured that students learn by peer assessing, although no studies have ever conclusively demonstrated this effect. By conducting a randomized controlled trial in an introductory statistics class, we provide evidence that peer assessment causes significant gains in student achievement. The strength of our conclusions depends critically on the careful design of the experiment, which was made possible by a web-based platform that we developed. Hence, our study is also a proof of concept of the high-quality experiments that are possible with online tools.
Submitted 14 October, 2014;
originally announced October 2014.