-
Statistical Design and Analysis for Robust Machine Learning: A Case Study from COVID-19
Authors:
Davide Pigoli,
Kieran Baker,
Jobie Budd,
Lorraine Butler,
Harry Coppock,
Sabrina Egglestone,
Steven G. Gilmour,
Chris Holmes,
David Hurley,
Radka Jersakova,
Ivan Kiskin,
Vasiliki Koutra,
Jonathon Mellor,
George Nicholson,
Joe Packham,
Selina Patel,
Richard Payne,
Stephen J. Roberts,
Björn W. Schuller,
Ana Tendero-Cañadas,
Tracey Thornley,
Alexander Titcomb
Abstract:
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performance of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant metadata. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features, and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
Submitted 27 February, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers
Authors:
Harry Coppock,
George Nicholson,
Ivan Kiskin,
Vasiliki Koutra,
Kieran Baker,
Jobie Budd,
Richard Payne,
Emma Karoune,
David Hurley,
Alexander Titcomb,
Sabrina Egglestone,
Ana Tendero Cañadas,
Lorraine Butler,
Radka Jersakova,
Jonathon Mellor,
Selina Patel,
Tracey Thornley,
Peter Diggle,
Sylvia Richardson,
Josef Packham,
Björn W. Schuller,
Davide Pigoli,
Steven Gilmour,
Stephen Roberts,
Chris Holmes
Abstract:
Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection status. Here, we undertake a large-scale study of audio-based deep learning classifiers as part of the UK government's pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata, including reverse transcription polymerase chain reaction (PCR) test outcomes, of whom 23,514 tested positive for SARS-CoV-2. Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission (REACT) randomised surveillance survey. In an unadjusted analysis of our dataset, AI classifiers predict SARS-CoV-2 infection status with high accuracy (Receiver Operating Characteristic Area Under the Curve (ROC-AUC) 0.846 [0.838, 0.854]), consistent with the findings of previous studies. However, after matching on measured confounders, such as age, gender, and self-reported symptoms, our classifiers' performance is much weaker (ROC-AUC 0.619 [0.594, 0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by simple predictive scores based on user-reported symptoms.
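The abstract reports ROC-AUC point estimates with bracketed intervals. As a minimal sketch of what such figures measure, assuming a rank-based (Mann-Whitney) AUC and a percentile bootstrap for the interval (neither is stated to be the authors' exact procedure), the computation might look like this; the function names `roc_auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based ROC-AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count one half).
    Labels are assumed to be 0 (negative) or 1 (positive)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:          # O(n_pos * n_neg) pairwise comparison, fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for ROC-AUC (assumed method).
    Participants are resampled with replacement; resamples that contain
    only one class are discarded because AUC is undefined for them."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples with both classes
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))  # 0.75: 3 of 4 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking, which is why the drop from 0.846 to 0.619 after matching is substantial; the interval width depends on sample size, so the matched analysis, which uses a smaller subset, shows a wider interval.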
Submitted 2 March, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
A large-scale and PCR-referenced vocal audio dataset for COVID-19
Authors:
Jobie Budd,
Kieran Baker,
Emma Karoune,
Harry Coppock,
Selina Patel,
Ana Tendero Cañadas,
Alexander Titcomb,
Richard Payne,
David Hurley,
Sabrina Egglestone,
Lorraine Butler,
Jonathon Mellor,
George Nicholson,
Ivan Kiskin,
Vasiliki Koutra,
Radka Jersakova,
Rachel A. McKendry,
Peter Diggle,
Sylvia Richardson,
Björn W. Schuller,
Steven Gilmour,
Davide Pigoli,
Stephen Roberts,
Josef Packham,
Tracey Thornley,
et al. (1 additional author not shown)
Abstract:
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during periods of dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma and 27.20% having linked influenza PCR test results.
Submitted 3 November, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.