-
The Disparate Impacts of College Admissions Policies on Asian American Applicants
Authors:
Joshua Grossman,
Sabina Tomkins,
Lindsay Page,
Sharad Goel
Abstract:
There is debate over whether Asian American students are admitted to selective colleges and universities at lower rates than white students with similar academic qualifications. However, there have been few empirical investigations of this issue, in large part due to a dearth of data. Here we present the results from analyzing 685,709 applications from Asian American and white students to a subset of selective U.S. institutions over five application cycles, beginning with the 2015-2016 cycle. The dataset does not include admissions decisions, and so we construct a proxy based in part on enrollment choices. Based on this proxy, we estimate the odds that Asian American applicants were admitted to at least one of the schools we consider were 28% lower than the odds for white students with similar test scores, grade-point averages, and extracurricular activities. The gap was particularly pronounced for students of South Asian descent (49% lower odds). We trace this pattern in part to two factors. First, many selective colleges openly give preference to the children of alumni, and we find that white applicants were substantially more likely to have such legacy status than Asian applicants, especially South Asian applicants. Second, after adjusting for observed student characteristics, the institutions we consider appear less likely to admit students from geographic regions with relatively high shares of applicants who are Asian. We hope these results inform ongoing discussions on the equity of college admissions policies.
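To make "28% lower odds" concrete, the sketch below converts a baseline admission probability into odds, scales it by an odds ratio, and converts back. The 50% baseline and the function name are purely hypothetical illustrations, not figures or methods from the paper; only the odds ratios 0.72 and 0.51 correspond to the 28% and 49% reductions stated in the abstract.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the baseline odds by odds_ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    # odds = p / (1 - p)
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    scaled_odds = baseline_odds * odds_ratio
    # Convert odds back to a probability: p = odds / (1 + odds)
    return scaled_odds / (1.0 + scaled_odds)


# Hypothetical baseline: a comparison group with a 50% chance of admission.
# This baseline is an assumption for illustration only.
hypothetical_baseline = 0.5
overall = apply_odds_ratio(hypothetical_baseline, 1 - 0.28)      # 28% lower odds
south_asian = apply_odds_ratio(hypothetical_baseline, 1 - 0.49)  # 49% lower odds
print(f"{overall:.3f} {south_asian:.3f}")
```

Note that a 28% reduction in odds is not a 28% reduction in probability; the size of the probability gap depends on the baseline.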
Submitted 3 August, 2023;
originally announced August 2023.
-
Doubly robust nearest neighbors in factor models
Authors:
Raaz Dwivedi,
Katherine Tian,
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy,
Devavrat Shah
Abstract:
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for estimating the mean $f(u_i, v_t)$ rely on the existence of other rows $j$ with $u_j \approx u_i$. Similarly, the time-time NN strategy relies on the existence of columns $t'$ with $v_{t'} \approx v_t$. These strategies perform poorly when similar rows or similar columns, respectively, are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate is consistent. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval when compared to either unit-unit or time-time NN.
Submitted 29 January, 2024; v1 submitted 25 November, 2022;
originally announced November 2022.
-
Counterfactual inference for sequential experiments
Authors:
Raaz Dwivedi,
Katherine Tian,
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy,
Devavrat Shah
Abstract:
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.
Submitted 16 April, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Blocks as geographic discontinuities: The effect of polling place assignment on voting
Authors:
Sabina Tomkins,
Keniel Yao,
Johann Gaebler,
Tobias Konitzer,
David Rothschild,
Marc Meredith,
Sharad Goel
Abstract:
A potential voter must incur a number of costs in order to successfully cast an in-person ballot, including the costs associated with identifying and traveling to a polling place. In order to investigate how these costs affect voting behavior, we introduce two quasi-experimental designs that can be used to study how the political participation of registered voters is affected by differences in the relative distance that registrants must travel to their assigned Election Day polling place and whether their polling place remains at the same location as in a previous election. Our designs compare registrants who live on the same residential block but are assigned to vote at different polling places. We find that living farther from a polling place and being assigned to a new polling place reduce in-person Election Day voting, but that registrants largely offset this by casting more early in-person and mail ballots.
Submitted 4 May, 2021;
originally announced May 2021.
-
Fast Physical Activity Suggestions: Efficient Hyperparameter Learning in Mobile Health
Authors:
Marianne Menictas,
Sabina Tomkins,
Susan Murphy
Abstract:
Users can be supported to adopt healthy behaviors, such as regular physical activity, via relevant and timely suggestions on their mobile devices. Recently, reinforcement learning algorithms have been found to be effective for learning the optimal context under which to provide suggestions. However, these algorithms are not necessarily designed for the constraints posed by mobile health (mHealth) settings, namely that they be efficient, domain-informed, and computationally affordable. We propose an algorithm for providing physical activity suggestions in mHealth settings. Using domain science, we formulate a contextual bandit algorithm which makes use of a linear mixed effects model. We then introduce a procedure to efficiently perform hyper-parameter updating, using far fewer computational resources than competing approaches. Not only is our approach computationally efficient, it is also easily implemented with closed-form matrix algebraic updates, and we show improvements over state-of-the-art approaches in both speed and accuracy of up to 99% and 56%, respectively.
Submitted 21 December, 2020;
originally announced December 2020.
-
IntelligentPooling: Practical Thompson Sampling for mHealth
Authors:
Sabina Tomkins,
Peng Liao,
Predrag Klasnja,
Susan Murphy
Abstract:
In mobile health (mHealth), smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In this work we are concerned with the following challenges: 1) individuals who are in the same context can exhibit differential response to treatments, 2) only a limited amount of data is available for learning on any one individual, and 3) responses to treatment are non-stationary. To address these challenges we generalize Thompson-Sampling bandit algorithms to develop IntelligentPooling. IntelligentPooling learns personalized treatment policies, thus addressing challenge one. To address the second challenge, IntelligentPooling updates each user's degree of personalization while making use of available data on other users to speed up learning. Lastly, IntelligentPooling allows responsivity to vary as a function of a user's time since beginning treatment, thus addressing challenge three. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art. We demonstrate the promise of this approach and its ability to learn from even a small group of users in a live clinical trial.
Submitted 12 December, 2020; v1 submitted 31 July, 2020;
originally announced August 2020.
-
Noise Mitigation with Delay Pulses in the IBM Quantum Experience
Authors:
Sam Tomkins,
Rogério de Sousa
Abstract:
One of the greatest challenges for current quantum computing hardware is how to obtain reliable results from noisy devices. A recent paper [A. Kandala et al., Nature 567, 491 (2019)] described a method for injecting noise by stretching gate times, enabling the calculation of quantum expectation values as a function of the amount of noise in the IBM-Q devices. Extrapolating to zero noise led to excellent agreement with exact results. Here an alternative scheme is described that employs the intentional addition of identity pulses, pausing the device periodically in order to gradually subject the quantum computation to increased levels of noise. The scheme is implemented in a one qubit circuit on an IBM-Q device. It is determined that this is an effective method for controlled addition of noise, and further, that using noisy results to perform extrapolation can lead to improvements in the final output, provided careful attention is paid to how the extrapolation is carried out.
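As a rough illustration of the extrapolation step described above, the sketch below fits a low-degree polynomial to expectation values measured at increasing noise levels and evaluates the fit at zero noise. The function name, the noise levels, and the expectation values are all hypothetical and not taken from the paper. Because the abstract stresses that care is needed in how the extrapolation is carried out, the linear fit used here is only an assumption.

```python
import numpy as np


def extrapolate_to_zero_noise(noise_levels, expectation_values, degree=1):
    """Fit a polynomial of the given degree and return its value at zero noise."""
    coeffs = np.polyfit(noise_levels, expectation_values, deg=degree)
    return float(np.polyval(coeffs, 0.0))


# Hypothetical measurements: noise level 1 stands for the unmodified circuit,
# higher levels stand for runs with more inserted identity (delay) pulses.
noise_levels = [1.0, 2.0, 3.0, 4.0]
expectation_values = [0.90, 0.80, 0.70, 0.60]

estimate = extrapolate_to_zero_noise(noise_levels, expectation_values)
print(round(estimate, 6))
```

In practice the choice of fit (linear, higher-order polynomial, exponential) and the range of noise levels can change the estimate noticeably, so any such choice would need to be checked against the device data.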
Submitted 26 May, 2020;
originally announced May 2020.
-
Streamlined Empirical Bayes Fitting of Linear Mixed Models in Mobile Health
Authors:
Marianne Menictas,
Sabina Tomkins,
Susan A Murphy
Abstract:
To effect behavior change, a successful algorithm must make high-quality decisions in real time. For example, a mobile health (mHealth) application designed to increase physical activity must make contextually relevant suggestions to motivate users. While machine learning offers solutions for certain stylized settings, such as when batch data can be processed offline, there is a dearth of approaches which can deliver high-quality solutions under the specific constraints of mHealth. We propose an algorithm which provides users with contextualized and personalized physical activity suggestions. This algorithm is able to overcome a challenge critical to mHealth: that complex models be trained efficiently. We propose a tractable streamlined empirical Bayes procedure which fits linear mixed effects models in large-data settings. Our procedure takes advantage of sparsity introduced by hierarchical random effects to efficiently learn the posterior distribution of a linear mixed effects model. A key contribution of this work is that we provide explicit updates in order to learn fixed effects, random effects, and hyper-parameter values. We demonstrate the success of this approach in an mHealth reinforcement learning application, a domain in which fast computations are crucial for real-time interventions. Not only is our approach computationally efficient, it is also easily implemented with closed-form matrix algebraic updates, and we show improvements over state-of-the-art approaches in both speed and accuracy of up to 99% and 56%, respectively.
Submitted 28 March, 2020;
originally announced March 2020.
-
Rapidly Personalizing Mobile Health Treatment Policies with Limited Data
Authors:
Sabina Tomkins,
Peng Liao,
Predrag Klasnja,
Serena Yeung,
Susan Murphy
Abstract:
In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challenging. We present IntelligentPooling, which learns personalized policies via an adaptive, principled use of other users' data. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art across all generative models. Additionally, we inspect the behavior of this approach in a live clinical trial, demonstrating its ability to learn from even a small group of users.
Submitted 23 February, 2020;
originally announced February 2020.
-
Personalizing Intervention Probabilities By Pooling
Authors:
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy
Abstract:
In many mobile health interventions, treatments should only be delivered in a particular context, for example when a user is currently stressed, walking, or sedentary. Even in an optimal context, concerns about user burden can restrict which treatments are sent. To diffuse the treatment delivery over times when a user is in a desired context, it is critical to predict the future number of times the context will occur. The focus of this paper is on whether personalization can improve predictions in these settings. Though the variance between individuals' behavioral patterns suggests that personalization should be useful, the amount of individual-level data limits its capabilities. Thus, we investigate several methods which pool data across users to overcome these deficiencies and find that pooling lowers the overall error rate relative to both personalized and batch approaches.
Submitted 2 December, 2018;
originally announced December 2018.