-
RIDGE: Reproducibility, Integrity, Dependability, Generalizability, and Efficiency Assessment of Medical Image Segmentation Models
Authors:
Farhad Maleki,
Linda Moy,
Reza Forghani,
Tapotosh Ghosh,
Katie Ovens,
Steve Langer,
Pouria Rouzrokh,
Bardia Khosravi,
Ali Ganjizadeh,
Daniel Warren,
Roxana Daneshjou,
Mana Moassefi,
Atlas Haddadi Avval,
Susan Sotardi,
Neil Tenenholtz,
Felipe Kitamura,
Timothy Kline
Abstract:
Deep learning techniques, despite their potential, often suffer from a lack of reproducibility and generalizability, impeding their clinical adoption. Image segmentation is one of the critical tasks in medical image analysis, in which one or several regions/volumes of interest should be annotated. This paper introduces the RIDGE checklist, a framework for assessing the Reproducibility, Integrity, Dependability, Generalizability, and Efficiency of deep learning-based medical image segmentation models. The checklist serves as a guide for researchers to enhance the quality and transparency of their work, ensuring that segmentation models are not only scientifically sound but also clinically relevant.
Submitted 16 January, 2024;
originally announced January 2024.
-
Development and Clinical Evaluation of an AI Support Tool for Improving Telemedicine Photo Quality
Authors:
Kailas Vodrahalli,
Justin Ko,
Albert S. Chiou,
Roberto Novoa,
Abubakar Abid,
Michelle Phung,
Kiana Yekrang,
Paige Petrone,
James Zou,
Roxana Daneshjou
Abstract:
Telemedicine utilization was accelerated during the COVID-19 pandemic, and skin conditions were a common use case. However, the quality of photographs sent by patients remains a major limitation. To address this issue, we developed TrueImage 2.0, an artificial intelligence (AI) model for assessing patient photo quality for telemedicine and providing real-time feedback to patients for photo quality improvement. TrueImage 2.0 was trained on 1700 telemedicine images annotated by clinicians for photo quality. On a retrospective dataset of 357 telemedicine images, TrueImage 2.0 effectively identified poor-quality images (receiver operator curve area under the curve (ROC-AUC) = 0.78) and the reason for poor quality (Blurry ROC-AUC = 0.84, Lighting issues ROC-AUC = 0.70). The performance is consistent across age, gender, and skin tone. Next, we assessed whether patient-TrueImage 2.0 interaction led to an improvement in submitted photo quality through a prospective clinical pilot study with 98 patients. TrueImage 2.0 reduced the number of patients with a poor-quality image by 68.0%.
Submitted 12 September, 2022;
originally announced September 2022.
-
Disparities in Dermatology AI Performance on a Diverse, Curated Clinical Image Set
Authors:
Roxana Daneshjou,
Kailas Vodrahalli,
Roberto A Novoa,
Melissa Jenkins,
Weixin Liang,
Veronica Rotemberg,
Justin Ko,
Susan M Swetter,
Elizabeth E Bailey,
Olivier Gevaert,
Pritam Mukherjee,
Michelle Phung,
Kiana Yekrang,
Bradley Fong,
Rachna Sahasrabudhe,
Johan A. C. Allerup,
Utako Okata-Karigane,
James Zou,
Albert Chiou
Abstract:
Access to dermatological care is a major issue, with an estimated 3 billion people lacking access to care globally. Artificial intelligence (AI) may aid in triaging skin diseases. However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset, the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones. Using this dataset of 656 images, we show that state-of-the-art dermatology AI models perform substantially worse on DDI, with receiver operator curve area under the curve (ROC-AUC) dropping by 27-36 percent compared to the models' original test results. All the models performed worse on dark skin tones and uncommon diseases, which are represented in the DDI dataset. Additionally, we find that dermatologists, who typically provide visual labels for AI training and test datasets, also perform worse on images of dark skin tones and uncommon diseases compared to ground truth biopsy annotations. Finally, fine-tuning AI models on the well-characterized and diverse DDI images closed the performance gap between light and dark skin tones. Moreover, algorithms fine-tuned on diverse skin tones outperformed dermatologists on identifying malignancy on images of dark skin tones. Our findings identify important weaknesses and biases in dermatology AI that need to be addressed to ensure reliable application to diverse patients and diseases.
Submitted 15 March, 2022;
originally announced March 2022.
-
Disparities in Dermatology AI: Assessments Using Diverse Clinical Images
Authors:
Roxana Daneshjou,
Kailas Vodrahalli,
Weixin Liang,
Roberto A Novoa,
Melissa Jenkins,
Veronica Rotemberg,
Justin Ko,
Susan M Swetter,
Elizabeth E Bailey,
Olivier Gevaert,
Pritam Mukherjee,
Michelle Phung,
Kiana Yekrang,
Bradley Fong,
Rachna Sahasrabudhe,
James Zou,
Albert Chiou
Abstract:
More than 3 billion people lack access to care for skin disease. AI diagnostic tools may aid in early skin cancer detection; however, most models have not been assessed on images of diverse skin tones or uncommon diseases. To address this, we curated the Diverse Dermatology Images (DDI) dataset - the first publicly available, pathologically confirmed images featuring diverse skin tones. We show that state-of-the-art dermatology AI models perform substantially worse on DDI, with ROC-AUC dropping 29-40 percent compared to the models' original results. We find that dark skin tones and uncommon diseases, which are well represented in the DDI dataset, lead to performance drop-offs. Additionally, we show that state-of-the-art robust training methods cannot correct for these biases without diverse training data. Our findings identify important weaknesses and biases in dermatology AI that need to be addressed to ensure reliable application to diverse patients and across all diseases.
Submitted 15 November, 2021;
originally announced November 2021.
-
Towards Realization of Augmented Intelligence in Dermatology: Advances and Future Directions
Authors:
Roxana Daneshjou,
Carrie Kovarik,
Justin M Ko
Abstract:
Artificial intelligence (AI) algorithms using deep learning have advanced the classification of skin disease images; however, these algorithms have been mostly applied "in silico" and not validated clinically. Most dermatology AI algorithms perform binary classification tasks (e.g. malignancy versus benign lesions), but this task is not representative of dermatologists' diagnostic range. The American Academy of Dermatology Task Force on Augmented Intelligence published a position statement emphasizing the importance of clinical validation to create human-computer synergy, termed augmented intelligence (AuI). Liu et al.'s recent paper, "A deep learning system for differential diagnosis of skin diseases," represents a significant advancement of AI in dermatology, bringing it closer to clinical impact. However, significant issues must be addressed before this algorithm can be integrated into clinical workflow. These issues include accurate and equitable model development, defining and assessing appropriate clinical outcomes, and real-world integration.
Submitted 21 May, 2021;
originally announced May 2021.
-
TrueImage: A Machine Learning Algorithm to Improve the Quality of Telehealth Photos
Authors:
Kailas Vodrahalli,
Roxana Daneshjou,
Roberto A Novoa,
Albert Chiou,
Justin M Ko,
James Zou
Abstract:
Telehealth is an increasingly critical component of the health care ecosystem, especially due to the COVID-19 pandemic. Rapid adoption of telehealth has exposed limitations in the existing infrastructure. In this paper, we study and highlight photo quality as a major challenge in the telehealth workflow. We focus on teledermatology, where photo quality is particularly important; the framework proposed here can be generalized to other health domains. For telemedicine, dermatologists request that patients submit images of their lesions for assessment. However, these images are often of insufficient quality to make a clinical diagnosis since patients do not have experience taking clinical photos. A clinician has to manually triage poor quality images and request new images to be submitted, leading to wasted time for both the clinician and the patient. We propose an automated image assessment machine learning pipeline, TrueImage, to detect poor quality dermatology photos and to guide patients in taking better photos. Our experiments indicate that TrueImage can reject 50% of the sub-par quality images, while retaining 80% of good quality images patients send in, despite heterogeneity and limitations in the training data. These promising results suggest that our solution is feasible and can improve the quality of teledermatology care.
Submitted 1 October, 2020;
originally announced October 2020.