As easy as APC: overcoming missing data and class imbalance in time series with self-supervised learning
Authors:
Fiorella Wever,
T. Anderson Keller,
Laura Symul,
Victor Garcia
Abstract:
High levels of missing data and strong class imbalance are ubiquitous challenges that are often presented simultaneously in real-world time series data. Existing methods approach these problems separately, frequently making significant assumptions about the underlying data generation process in order to lessen the impact of missing information. In this work, we instead demonstrate how a general se…
▽ More
High levels of missing data and strong class imbalance are ubiquitous challenges that are often presented simultaneously in real-world time series data. Existing methods approach these problems separately, frequently making significant assumptions about the underlying data generation process in order to lessen the impact of missing information. In this work, we instead demonstrate how a general self-supervised training method, namely Autoregressive Predictive Coding (APC), can be leveraged to overcome both missing data and class imbalance simultaneously without strong assumptions. Specifically, on a synthetic dataset, we show that standard baselines are substantially improved upon through the use of APC, yielding the greatest gains in the combined setting of high missingness and severe class imbalance. We further apply APC on two real-world medical time-series datasets, and show that APC improves the classification performance in all settings, ultimately achieving state-of-the-art AUPRC results on the Physionet benchmark.
△ Less
Submitted 26 January, 2022; v1 submitted 29 June, 2021;
originally announced June 2021.
Predicting pregnancy using large-scale data from a women's health tracking mobile application
Authors:
Bo Liu,
Shuyang Shi,
Yongshang Wu,
Daniel Thomas,
Laura Symul,
Emma Pierson,
Jure Leskovec
Abstract:
Predicting pregnancy has been a fundamental problem in women's health for more than 50 years. Previous datasets have been collected via carefully curated medical studies, but the recent growth of women's health tracking mobile apps offers potential for reaching a much broader population. However, the feasibility of predicting pregnancy from mobile health tracking data is unclear. Here we develop f…
▽ More
Predicting pregnancy has been a fundamental problem in women's health for more than 50 years. Previous datasets have been collected via carefully curated medical studies, but the recent growth of women's health tracking mobile apps offers potential for reaching a much broader population. However, the feasibility of predicting pregnancy from mobile health tracking data is unclear. Here we develop four models -- a logistic regression model, and 3 LSTM models -- to predict a woman's probability of becoming pregnant using data from a women's health tracking app, Clue by BioWink GmbH. Evaluating our models on a dataset of 79 million logs from 65,276 women with ground truth pregnancy test data, we show that our predicted pregnancy probabilities meaningfully stratify women: women in the top 10% of predicted probabilities have a 89% chance of becoming pregnant over 6 menstrual cycles, as compared to a 27% chance for women in the bottom 10%. We develop a technique for extracting interpretable time trends from our deep learning models, and show these trends are consistent with previous fertility research. Our findings illustrate the potential that women's health tracking data offers for predicting pregnancy on a broader population; we conclude by discussing the steps needed to fulfill this potential.
△ Less
Submitted 27 March, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.