Gamified Crowdsourcing as a Novel Approach to Lung Ultrasound Dataset Labeling
Authors:
Nicole M Duggan,
Mike **,
Maria Alejandra Duran Mendicuti,
Stephen Hallisey,
Denie Bernier,
Lauren A Selame,
Ameneh Asgari-Targhi,
Chanel E Fischetti,
Ruben Lucassen,
Anthony E Samir,
Erik Duhaime+,
Tina Kapur,
Andrew J Goldsmith
Abstract:
Study Objective: Machine learning models have advanced medical image processing and can yield faster, more accurate diagnoses. Despite a wealth of available medical imaging data, high-quality labeled data for model training is lacking. We investigated whether a gamified crowdsourcing platform enhanced with inbuilt quality control metrics can produce lung ultrasound clip labels comparable to those…
▽ More
Study Objective: Machine learning models have advanced medical image processing and can yield faster, more accurate diagnoses. Despite a wealth of available medical imaging data, high-quality labeled data for model training is lacking. We investigated whether a gamified crowdsourcing platform enhanced with inbuilt quality control metrics can produce lung ultrasound clip labels comparable to those from clinical experts.
Methods: 2,384 lung ultrasound clips were retrospectively collected from 203 patients. Six lung ultrasound experts classified 393 of these clips as having no B-lines, one or more discrete B-lines, or confluent B-lines to create two sets of reference standard labels (195 training set clips and 198 test set clips). Sets were respectively used to A) train users on a gamified crowdsourcing platform, and B) compare concordance of the resulting crowd labels to the concordance of individual experts to reference standards.
Results: 99,238 crowdsourced opinions on 2,384 lung ultrasound clips were collected from 426 unique users over 8 days. On the 198 test set clips, mean labeling concordance of individual experts relative to the reference standard was 85.0% +/- 2.0 (SEM), compared to 87.9% crowdsourced label concordance (p=0.15). When individual experts' opinions were compared to reference standard labels created by majority vote excluding their own opinion, crowd concordance was higher than the mean concordance of individual experts to reference standards (87.4% vs. 80.8% +/- 1.6; p<0.001).
Conclusion: Crowdsourced labels for B-line classification via a gamified approach achieved expert-level quality. Scalable, high-quality labeling approaches may facilitate training dataset creation for machine learning model development.
△ Less
Submitted 11 June, 2023;
originally announced June 2023.
Deep Learning for Detection and Localization of B-Lines in Lung Ultrasound
Authors:
Ruben T. Lucassen,
Mohammad H. Jafari,
Nicole M. Duggan,
Nick Jowkar,
Alireza Mehrtash,
Chanel Fischetti,
Denie Bernier,
Kira Prentice,
Erik P. Duhaime,
Mike **,
Purang Abolmaesumi,
Friso G. Heslinga,
Mitko Veta,
Maria A. Duran-Mendicuti,
Sarah Frisken,
Paul B. Shyn,
Alexandra J. Golby,
Edward Boyer,
William M. Wells,
Andrew J. Goldsmith,
Tina Kapur
Abstract:
Lung ultrasound (LUS) is an important imaging modality used by emergency physicians to assess pulmonary congestion at the patient bedside. B-line artifacts in LUS videos are key findings associated with pulmonary congestion. Not only can the interpretation of LUS be challenging for novice operators, but visual quantification of B-lines remains subject to observer variability. In this work, we inve…
▽ More
Lung ultrasound (LUS) is an important imaging modality used by emergency physicians to assess pulmonary congestion at the patient bedside. B-line artifacts in LUS videos are key findings associated with pulmonary congestion. Not only can the interpretation of LUS be challenging for novice operators, but visual quantification of B-lines remains subject to observer variability. In this work, we investigate the strengths and weaknesses of multiple deep learning approaches for automated B-line detection and localization in LUS videos. We curate and publish, BEDLUS, a new ultrasound dataset comprising 1,419 videos from 113 patients with a total of 15,755 expert-annotated B-lines. Based on this dataset, we present a benchmark of established deep learning methods applied to the task of B-line detection. To pave the way for interpretable quantification of B-lines, we propose a novel "single-point" approach to B-line localization using only the point of origin. Our results show that (a) the area under the receiver operating characteristic curve ranges from 0.864 to 0.955 for the benchmarked detection methods, (b) within this range, the best performance is achieved by models that leverage multiple successive frames as input, and (c) the proposed single-point approach for B-line localization reaches an F1-score of 0.65, performing on par with the inter-observer agreement. The dataset and developed methods can facilitate further biomedical research on automated interpretation of lung ultrasound with the potential to expand the clinical utility.
△ Less
Submitted 15 February, 2023;
originally announced February 2023.