-
ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology
Authors:
Tan H. Nguyen,
Dinkar Juyal,
** Li,
Aaditya Prakash,
Shima Nofallah,
Chintan Shah,
Sai Chowdary Gullapally,
Limin Yu,
Michael Griffin,
Anand Sampat,
John Abel,
Justin Lee,
Amaro Taylor-Weiner
Abstract:
Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained on a different data source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization during test time. Many color augmentation methods leverage domain labels to generate synthetic images. This approach poses three significant challenges to scaling such a model. First, incorporating data from a new domain into deep-learning models trained on existing domain labels is not straightforward. Second, dependency on domain labels prevents the use of pathology images without domain labels to improve model performance. Finally, implementation of these methods becomes complicated when multiple domain labels (e.g., patient identification, medical center, etc.) are associated with a single image. We introduce ContriMix, a novel domain-label-free stain color augmentation method based on DRIT++, a style-transfer method. ContriMix leverages sample stain color variation within a training minibatch and random mixing to extract content and attribute information from pathology images. This information can be used by a trained ContriMix model to create synthetic images to improve the performance of existing classifiers. ContriMix outperforms competing methods on the Camelyon17-WILDS dataset. Its performance is consistent across different slides in the test set while being robust to the color variation from rare substances in pathology images. We make our code and trained ContriMix models available for research use. The code for ContriMix can be found at https://gitlab.com/huutan86/contrimix
Submitted 8 March, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Synthetic DOmain-Targeted Augmentation (S-DOTA) Improves Model Generalization in Digital Pathology
Authors:
Sai Chowdary Gullapally,
Yibo Zhang,
Nitin Kumar Mittal,
Deeksha Kartik,
Sandhya Srinivasan,
Kevin Rose,
Daniel Shenker,
Dinkar Juyal,
Harshith Padigela,
Raymond Biju,
Victor Minden,
Chirag Maheshwari,
Marc Thibault,
Zvi Goldstein,
Luke Novak,
Nidhi Chandra,
Justin Lee,
Aaditya Prakash,
Chintan Shah,
John Abel,
Darren Fahy,
Amaro Taylor-Weiner,
Anand Sampat
Abstract:
Machine learning algorithms have the potential to improve patient outcomes in digital pathology. However, generalization of these tools is currently limited by sensitivity to variations in tissue preparation, staining procedures and scanning equipment that lead to domain shift in digitized slides. To overcome this limitation and improve model generalization, we studied the effectiveness of two Synthetic DOmain-Targeted Augmentation (S-DOTA) methods, namely CycleGAN-enabled Scanner Transform (ST) and targeted Stain Vector Augmentation (SVA), and compared them against the International Color Consortium (ICC) profile-based color calibration (ICC Cal) method and a baseline method using traditional brightness, color and noise augmentations. We evaluated the ability of these techniques to improve model generalization across various tasks and settings: four models, two model types (tissue segmentation and cell classification), two loss functions, six labs, six scanners, and three indications (hepatocellular carcinoma (HCC), nonalcoholic steatohepatitis (NASH), prostate adenocarcinoma). We compared these methods based on the macro-averaged F1 scores on in-distribution (ID) and out-of-distribution (OOD) test sets across multiple domains, and found that S-DOTA methods (i.e., ST and SVA) led to significant improvements over ICC Cal and baseline on OOD data while maintaining comparable performance on ID data. Thus, we demonstrate that S-DOTA may help address poor generalization due to domain shift in real-world applications.
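The comparison metric named in this abstract, the macro-averaged F1 score, can be illustrated with a minimal sketch. It assumes the standard definition (per-class F1 averaged with equal weight per class); the function name `macro_f1` and the toy labels are hypothetical and are not taken from the paper's code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical labels): class 0 has F1 = 2/3, class 1 has F1 = 4/5
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Because every class contributes equally regardless of its frequency, this metric does not let a dominant class mask poor performance on a rare one, which is one plausible reason to prefer it when comparing ID and OOD test sets.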
Submitted 3 May, 2023;
originally announced May 2023.
-
SC-MIL: Supervised Contrastive Multiple Instance Learning for Imbalanced Classification in Pathology
Authors:
Dinkar Juyal,
Siddhant Shingi,
Syed Ashar Javed,
Harshith Padigela,
Chintan Shah,
Anand Sampat,
Archit Khosla,
John Abel,
Amaro Taylor-Weiner
Abstract:
Multiple instance learning (MIL) models have been extensively used in pathology to predict biomarkers and risk-stratify patients from gigapixel-sized images. Machine learning problems in medical imaging often deal with rare diseases, making it important for these models to work in a label-imbalanced setting. In pathology images, there is another level of imbalance, where given a positively labeled Whole Slide Image (WSI), only a fraction of pixels within it contribute to the positive label. This compounds the severity of imbalance and makes imbalanced classification in pathology challenging. Furthermore, these imbalances can occur in out-of-distribution (OOD) datasets when the models are deployed in the real world. We leverage the idea that decoupling feature and classifier learning can lead to improved decision boundaries for label-imbalanced datasets. To this end, we investigate the integration of supervised contrastive learning with multiple instance learning (SC-MIL). Specifically, we propose a joint-training MIL framework in the presence of label imbalance that progressively transitions from learning bag-level representations to optimal classifier learning. We perform experiments with different imbalance settings for two well-studied problems in cancer pathology: subtyping of non-small cell lung cancer and subtyping of renal cell carcinoma. SC-MIL provides large and consistent improvements over other techniques on both in-distribution (ID) and OOD held-out sets across multiple imbalanced settings.
Submitted 9 September, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Self-training of Machine Learning Models for Liver Histopathology: Generalization under Clinical Shifts
Authors:
** Li,
Deepta Rajan,
Chintan Shah,
Dinkar Juyal,
Shreya Chakraborty,
Chandan Akiti,
Filip Kos,
Janani Iyer,
Anand Sampat,
Ali Behrooz
Abstract:
Histopathology images are gigapixel-sized and include features and information at different resolutions. Collecting annotations in histopathology requires highly specialized pathologists, making it expensive and time-consuming. Self-training can alleviate annotation constraints by learning from both labeled and unlabeled data, reducing the number of annotations required from pathologists. We study the design of teacher-student self-training systems for Non-alcoholic Steatohepatitis (NASH) using clinical histopathology datasets with limited annotations. We evaluate the models on in-distribution and out-of-distribution test data under clinical data shifts. We demonstrate that through self-training, the best student model statistically outperforms the teacher with a $3\%$ absolute difference on the macro F1 score. The best student model also approaches the performance of a fully supervised model trained with twice as many annotations.
Submitted 14 November, 2022;
originally announced November 2022.