-
Deep Learning-Based Grading of Ductal Carcinoma In Situ in Breast Histopathology Images
Authors:
Suzanne C. Wetstein,
Nikolas Stathonikos,
Josien P. W. Pluim,
Yujing J. Heng,
Natalie D. ter Hoeve,
Celien P. H. Vreuls,
Paul J. van Diest,
Mitko Veta
Abstract:
Ductal carcinoma in situ (DCIS) is a non-invasive breast cancer that can progress into invasive ductal carcinoma (IDC). Studies suggest DCIS is often overtreated since a considerable part of DCIS lesions may never progress into IDC. Lower grade lesions have a lower progression speed and risk, possibly allowing treatment de-escalation. However, studies show significant inter-observer variation in DCIS grading. Automated image analysis may provide an objective solution to address high subjectivity of DCIS grading by pathologists.
In this study, we developed a deep learning-based DCIS grading system. It was developed using the consensus DCIS grade of three expert observers on a dataset of 1186 DCIS lesions from 59 patients. The inter-observer agreement, measured by quadratic weighted Cohen's kappa, was used to evaluate the system and compare its performance to that of expert observers. We present an analysis of the lesion-level and patient-level inter-observer agreement on an independent test set of 1001 lesions from 50 patients.
The deep learning system (dl) achieved on average slightly higher inter-observer agreement to the observers (o1, o2 and o3) ($κ_{o1,dl}=0.81, κ_{o2,dl}=0.53, κ_{o3,dl}=0.40$) than the observers amongst each other ($κ_{o1,o2}=0.58, κ_{o1,o3}=0.50, κ_{o2,o3}=0.42$) at the lesion-level. At the patient-level, the deep learning system achieved similar agreement to the observers ($κ_{o1,dl}=0.77, κ_{o2,dl}=0.75, κ_{o3,dl}=0.70$) as the observers amongst each other ($κ_{o1,o2}=0.77, κ_{o1,o3}=0.75, κ_{o2,o3}=0.72$).
In conclusion, we developed a deep learning-based DCIS grading system that achieved a performance similar to expert observers. We believe this is the first automated system that could assist pathologists by providing robust and reproducible second opinions on DCIS grade.
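The agreement metric named in the abstract, quadratic weighted Cohen's kappa, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function name `quadratic_weighted_kappa` and the 0-indexed integer encoding of grades (0, 1, 2 for DCIS grades 1 to 3) are assumptions made here for illustration only.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Quadratic weighted Cohen's kappa for two lists of integer labels.

    Labels are assumed to be integers in range(n_classes) (hypothetical
    encoding, e.g. DCIS grade 1..3 mapped to 0..2).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Confusion matrix of observed rating pairs.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(rater_a, rater_b):
        observed[i][j] += 1

    # Marginal label counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic penalty: grows with the squared grade distance.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical class")
    return 1.0 - weighted_observed / weighted_expected


kappa_same = quadratic_weighted_kappa([0, 1, 2, 1], [0, 1, 2, 1], 3)
kappa_opposite = quadratic_weighted_kappa([0, 0, 2, 2], [2, 2, 0, 0], 3)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and the quadratic weights penalize a grade 1 versus grade 3 disagreement more heavily than a disagreement between adjacent grades.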
Submitted 7 October, 2020;
originally announced October 2020.
-
Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors
Authors:
Gerda Bortsova,
Cristina González-Gonzalo,
Suzanne C. Wetstein,
Florian Dubost,
Ioannis Katramados,
Laurens Hogeweg,
Bart Liefers,
Bram van Ginneken,
Josien P. W. Pluim,
Mitko Veta,
Clara I. Sánchez,
Marleen de Bruijne
Abstract:
Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure.
In this paper, we study previously unexplored factors affecting adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. We focus on adversarial black-box settings, in which the attacker does not have full access to the target model and usually uses another model, commonly referred to as surrogate model, to craft adversarial examples. We consider this to be the most realistic scenario for MedIA systems.
Firstly, we study the effect of weight initialization (ImageNet vs. random) on the transferability of adversarial attacks from the surrogate model to the target model. Secondly, we study the influence of differences in development data between target and surrogate models. We further study the interaction of weight initialization and data differences with differences in model architecture. All experiments were done with a perturbation degree tuned to ensure maximal transferability at minimal visual perceptibility of the attacks.
Our experiments show that pre-training may dramatically increase the transferability of adversarial examples, even when the target and surrogate architectures are different: the larger the performance gain using pre-training, the larger the transferability. Differences in the development data between target and surrogate models considerably decrease the performance of the attack; this decrease is further amplified by differences in model architecture. We believe these factors should be considered when developing security-critical MedIA systems planned to be deployed in clinical practice.
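The black-box transfer setting described above can be sketched on a toy problem. The abstract does not name a specific attack; the example below uses the fast gradient sign method (FGSM) as a stand-in, and replaces deep networks with two hand-picked logistic regression models. All weights, inputs, and the perturbation size `eps` are hypothetical values chosen for illustration, not anything taken from the paper.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 under a logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(weights, bias, x, y, eps):
    """FGSM on a logistic model: step each input feature by eps in the
    direction that increases the binary cross-entropy loss."""
    p = predict(weights, bias, x)
    # Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
    grad = [(p - y) * w for w in weights]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical surrogate (known to the attacker) and target (unknown) models.
surrogate_w, surrogate_b = [1.0, -2.0], 0.0
target_w, target_b = [0.8, -1.5], 0.1

x, y = [1.0, 0.2], 1  # clean input with true label 1

# The adversarial example is crafted using only the surrogate model.
x_adv = fgsm(surrogate_w, surrogate_b, x, y, eps=0.5)

# The attack "transfers" if the target's decision also flips.
clean_prob = predict(target_w, target_b, x)
adv_prob = predict(target_w, target_b, x_adv)
transferred = clean_prob > 0.5 and adv_prob < 0.5
```

In this toy case the two models have similar decision boundaries, so the perturbation crafted on the surrogate also flips the target's prediction. The factors studied in the paper (initialization, development data, architecture) can be read as things that make the surrogate and target more or less alike in this sense.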
Submitted 17 June, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Deep learning assessment of breast terminal duct lobular unit involution: towards automated prediction of breast cancer risk
Authors:
Suzanne C Wetstein,
Allison M Onken,
Christina Luffman,
Gabrielle M Baker,
Michael E Pyle,
Kevin H Kensler,
Ying Liu,
Bart Bakker,
Ruud Vlutters,
Marinus B van Leeuwen,
Laura C Collins,
Stuart J Schnitt,
Josien PW Pluim,
Rulla M Tamimi,
Yujing J Heng,
Mitko Veta
Abstract:
Terminal ductal lobular unit (TDLU) involution is the regression of milk-producing structures in the breast. Women with less TDLU involution are more likely to develop breast cancer. A major bottleneck in studying TDLU involution in large cohort studies is the need for labor-intensive manual assessment of TDLUs. We developed a computational pathology solution to automatically capture TDLU involution measures. Whole slide images (WSIs) of benign breast biopsies were obtained from the Nurses' Health Study (NHS). A first set of 92 WSIs was annotated for TDLUs, acini and adipose tissue to train deep convolutional neural network (CNN) models for detection of acini, and segmentation of TDLUs and adipose tissue. These networks were integrated into a single computational method to capture TDLU involution measures including number of TDLUs per tissue area, median TDLU span and median number of acini per TDLU. We validated our method on 40 additional WSIs by comparing with manually acquired measures. Our CNN models detected acini with an F1 score of 0.73$\pm$0.09, and segmented TDLUs and adipose tissue with Dice scores of 0.86$\pm$0.11 and 0.86$\pm$0.04, respectively. The inter-observer ICC scores for manual assessments on the 40 WSIs were 0.71 (95% CI [0.51, 0.83]) for number of TDLUs per tissue area, 0.81 (95% CI [0.67, 0.90]) for median TDLU span, and 0.73 (95% CI [0.54, 0.85]) for median acini count per TDLU. Intra-observer reliability was evaluated on 10 of the 40 WSIs, with ICC scores >0.8. Inter-observer ICC scores between automated results and the mean of the two observers were 0.80 (95% CI [0.63, 0.90]) for number of TDLUs per tissue area, 0.57 (95% CI [0.19, 0.77]) for median TDLU span, and 0.80 (95% CI [0.62, 0.89]) for median acini count per TDLU. TDLU involution measures evaluated by manual and automated assessment were inversely associated with age and menopausal status.
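The two evaluation metrics reported above, the Dice score for segmentation and the F1 score for detection, can be written out in a few lines. This is a generic sketch rather than the authors' evaluation code: the function names, the flat 0/1 mask representation, and the conventions for empty inputs are assumptions made here.

```python
def dice_score(pred_mask, true_mask):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect overlap.
        return 1.0
    return 2.0 * intersection / total


def f1_score(true_pos, false_pos, false_neg):
    """F1 score from detection counts (harmonic mean of precision and recall)."""
    denom = 2 * true_pos + false_pos + false_neg
    # Assumed convention: no detections and no ground truth gives 0.0.
    return 2.0 * true_pos / denom if denom else 0.0


example_dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
example_f1 = f1_score(true_pos=3, false_pos=1, false_neg=1)
```

Both metrics have the same form, twice the matched quantity divided by the sum of predicted and reference quantities; Dice applies it to pixels and F1 applies it to detected objects such as acini.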
Submitted 31 October, 2019;
originally announced November 2019.