On the importance of normative data in speech-based assessment
Authors:
Zeinab Noorian,
Chloé Pou-Prom,
Frank Rudzicz
Abstract:
Data sets for identifying Alzheimer's disease (AD) are often relatively sparse, which limits their usefulness for training generalizable models. Here, we augment one such data set, DementiaBank, with each of two normative data sets, the Wisconsin Longitudinal Study and Talk2Me, both of which employ a speech-based picture-description assessment. Using minority-class oversampling with ADASYN, we outperform state-of-the-art results in binary classification of people with and without AD in DementiaBank. This work highlights the effectiveness of combining sparse and difficult-to-acquire patient data with relatively large and easily accessible normative data sets.
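The ADASYN oversampling step mentioned in the abstract can be illustrated with a minimal, simplified sketch. This is not the authors' implementation: the function name `adasyn_sketch`, its parameters, and the toy data are all hypothetical, and in practice a library implementation (for example the `ADASYN` class in imbalanced-learn) would normally be used. The idea is that minority samples surrounded by more majority-class neighbours receive proportionally more synthetic samples, each created by interpolating toward a nearby minority sample.

```python
import math
import random


def adasyn_sketch(X, y, minority=1, k=3, beta=1.0, seed=0):
    """Simplified ADASYN: return a list of synthetic minority samples.

    Hypothetical helper for illustration only; all names are assumptions.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_min = len(minority_pts)
    n_maj = len(X) - n_min
    total = int((n_maj - n_min) * beta)  # synthetic samples to aim for
    if total <= 0 or n_min < 2:
        return []

    def nearest(point, pool, count):
        # Indices of the `count` closest points in `pool`, skipping `point` itself.
        others = [i for i in range(len(pool)) if pool[i] is not point]
        others.sort(key=lambda i: math.dist(point, pool[i]))
        return others[:count]

    # Difficulty of each minority sample: share of majority-class neighbours.
    ratios = []
    for p in minority_pts:
        nbrs = nearest(p, X, k)
        ratios.append(sum(1 for i in nbrs if y[i] != minority) / len(nbrs))
    norm = sum(ratios)
    weights = [r / norm for r in ratios] if norm > 0 else [1.0 / n_min] * n_min

    synthetic = []
    for p, w in zip(minority_pts, weights):
        nbrs = nearest(p, minority_pts, k)
        # Rounding means the final count may differ slightly from `total`.
        for _ in range(round(w * total)):
            q = minority_pts[rng.choice(nbrs)]
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(p, q)])
    return synthetic


# Toy feature vectors (made up): six majority-class and three minority-class points.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1],
     [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
synthetic = adasyn_sketch(X, y)
```

Because each synthetic point is a convex combination of two minority samples, it always lies on the line segment between them; the adaptive part is only in how many points each minority sample contributes.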
Submitted 30 November, 2017;
originally announced December 2017.