-
Higher Chest X-ray Resolution Improves Classification Performance
Authors:
Alessandro Wollek,
Sardi Hyska,
Bastian Sabel,
Michael Ingrisch,
Tobias Lasser
Abstract:
Deep learning models for image classification are often trained at a resolution of 224 x 224 pixels for historical and efficiency reasons. However, chest X-rays are acquired at a much higher resolution to display subtle pathologies. This study investigates the effect of training resolution on chest X-ray classification performance, using the chest X-ray 14 dataset. The results show that training w…
▽ More
Deep learning models for image classification are often trained at a resolution of 224 x 224 pixels for historical and efficiency reasons. However, chest X-rays are acquired at a much higher resolution to display subtle pathologies. This study investigates the effect of training resolution on chest X-ray classification performance, using the chest X-ray 14 dataset. The results show that training with a higher image resolution, specifically 1024 x 1024 pixels, results in the best overall classification performance with a mean AUC of 84.2 % compared to 82.7 % when trained with 256 x 256 pixel images. Additionally, comparison of bounding boxes and GradCAM saliency maps suggest that low resolutions, such as 256 x 256 pixels, are insufficient for identifying small pathologies and force the model to use spurious discriminating features. Our code is publicly available at https://gitlab.lrz.de/IP/cxr-resolution
△ Less
Submitted 3 August, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
WindowNet: Learnable Windows for Chest X-ray Classification
Authors:
Alessandro Wollek,
Sardi Hyska,
Bastian Sabel,
Michael Ingrisch,
Tobias Lasser
Abstract:
Chest X-ray (CXR) images are commonly compressed to a lower resolution and bit depth to reduce their size, potentially altering subtle diagnostic features.
Radiologists use windowing operations to enhance image contrast, but the impact of such operations on CXR classification performance is unclear.
In this study, we show that windowing can improve CXR classification performance, and propose W…
▽ More
Chest X-ray (CXR) images are commonly compressed to a lower resolution and bit depth to reduce their size, potentially altering subtle diagnostic features.
Radiologists use windowing operations to enhance image contrast, but the impact of such operations on CXR classification performance is unclear.
In this study, we show that windowing can improve CXR classification performance, and propose WindowNet, a model that learns optimal window settings.
We first investigate the impact of bit-depth on classification performance and find that a higher bit-depth (12-bit) leads to improved performance.
We then evaluate different windowing settings and show that training with a distinct window generally improves pathology-wise classification performance.
Finally, we propose and evaluate WindowNet, a model that learns optimal window settings, and show that it significantly improves performance compared to the baseline model without windowing.
△ Less
Submitted 3 August, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Automated Labeling of German Chest X-Ray Radiology Reports using Deep Learning
Authors:
Alessandro Wollek,
Philip Haitzer,
Thomas Sedlmeyr,
Sardi Hyska,
Johannes Rueckel,
Bastian Sabel,
Michael Ingrisch,
Tobias Lasser
Abstract:
Radiologists are in short supply globally, and deep learning models offer a promising solution to address this shortage as part of clinical decision-support systems. However, training such models often requires expensive and time-consuming manual labeling of large datasets. Automatic label extraction from radiology reports can reduce the time required to obtain labeled datasets, but this task is c…
▽ More
Radiologists are in short supply globally, and deep learning models offer a promising solution to address this shortage as part of clinical decision-support systems. However, training such models often requires expensive and time-consuming manual labeling of large datasets. Automatic label extraction from radiology reports can reduce the time required to obtain labeled datasets, but this task is challenging due to semantically similar words and missing annotated data. In this work, we explore the potential of weak supervision of a deep learning-based label prediction model, using a rule-based labeler. We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model and fine-tuned on a small dataset of manually labeled reports. Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks. Our findings highlight the benefits of employing deep learning-based models even in scenarios with sparse data and the use of the rule-based labeler as a tool for weak supervision.
△ Less
Submitted 7 July, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
German CheXpert Chest X-ray Radiology Report Labeler
Authors:
Alessandro Wollek,
Sardi Hyska,
Thomas Sedlmeyr,
Philip Haitzer,
Johannes Rueckel,
Bastian O. Sabel,
Michael Ingrisch,
Tobias Lasser
Abstract:
This study aimed to develop an algorithm to automatically extract annotations for chest X-ray classification models from German thoracic radiology reports. An automatic label extraction model was designed based on the CheXpert architecture, and a web-based annotation interface was created for iterative improvements. Results showed that automated label extraction can reduce time spent on manual lab…
▽ More
This study aimed to develop an algorithm to automatically extract annotations for chest X-ray classification models from German thoracic radiology reports. An automatic label extraction model was designed based on the CheXpert architecture, and a web-based annotation interface was created for iterative improvements. Results showed that automated label extraction can reduce time spent on manual labeling and improve overall modeling performance. The model trained on automatically extracted labels performed competitively to manually labeled data and strongly outperformed the model trained on publicly available data.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification
Authors:
Alessandro Wollek,
Robert Graf,
Saša Čečatka,
Nicola Fink,
Theresa Willem,
Bastian O. Sabel,
Tobias Lasser
Abstract:
Purpose: To investigate chest radiograph (CXR) classification performance of vision transformers (ViT) and interpretability of attention-based saliency using the example of pneumothorax classification.
Materials and Methods: In this retrospective study, ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData. Saliency…
▽ More
Purpose: To investigate chest radiograph (CXR) classification performance of vision transformers (ViT) and interpretability of attention-based saliency using the example of pneumothorax classification.
Materials and Methods: In this retrospective study, ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData. Saliency maps were generated using transformer multimodal explainability and gradient-weighted class activation map** (GradCAM). Classification performance was evaluated on the Chest X-Ray 14, VinBigData, and SIIM-ACR data sets using the area under the receiver operating characteristic curve analysis (AUC) and compared with convolutional neural networks (CNNs). The explainability methods were evaluated with positive/negative perturbation, sensitivity-n, effective heat ratio, intra-architecture repeatability and interarchitecture reproducibility. In the user study, three radiologists classified 160 CXRs with/without saliency maps for pneumothorax and rated their usefulness.
Results: ViTs had comparable CXR classification AUCs compared with state-of-the-art CNNs 0.95 (95% CI: 0.943, 0.950) versus 0.83 (95%, CI 0.826, 0.842) on Chest X-Ray 14, 0.84 (95% CI: 0.769, 0.912) versus 0.83 (95% CI: 0.760, 0.895) on VinBigData, and 0.85 (95% CI: 0.847, 0.861) versus 0.87 (95% CI: 0.868, 0.882) on SIIM ACR. Both saliency map methods unveiled a strong bias toward pneumothorax tubes in the models. Radiologists found 47% of the attention-based saliency maps useful and 39% of GradCAM. The attention-based methods outperformed GradCAM on all metrics.
Conclusion: ViTs performed similarly to CNNs in CXR classification, and their attention-based saliency maps were more useful to radiologists and outperformed GradCAM.
△ Less
Submitted 3 March, 2023;
originally announced March 2023.
-
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports
Authors:
Katharina Jeblick,
Balthasar Schachtner,
Jakob Dexl,
Andreas Mittermeier,
Anna Theresa Stüber,
Johanna Topalis,
Tobias Weber,
Philipp Wesp,
Bastian Sabel,
Jens Ricke,
Michael Ingrisch
Abstract:
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conduct…
▽ More
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
△ Less
Submitted 30 December, 2022;
originally announced December 2022.
-
A knee cannot have lung disease: out-of-distribution detection with in-distribution voting using the medical example of chest X-ray classification
Authors:
Alessandro Wollek,
Theresa Willem,
Michael Ingrisch,
Bastian Sabel,
Tobias Lasser
Abstract:
To investigate the impact of OOD radiographs on existing chest X-ray classification models and to increase their robustness against OOD data. The study employed the commonly used chest X-ray classification model, CheXnet, trained on the chest X-ray 14 data set, and tested its robustness against OOD data using three public radiography data sets: IRMA, Bone Age, and MURA, and the ImageNet data set.…
▽ More
To investigate the impact of OOD radiographs on existing chest X-ray classification models and to increase their robustness against OOD data. The study employed the commonly used chest X-ray classification model, CheXnet, trained on the chest X-ray 14 data set, and tested its robustness against OOD data using three public radiography data sets: IRMA, Bone Age, and MURA, and the ImageNet data set. To detect OOD data for multi-label classification, we proposed in-distribution voting (IDV). The OOD detection performance is measured across data sets using the area under the receiver operating characteristic curve (AUC) analysis and compared with Mahalanobis-based OOD detection, MaxLogit, MaxEnergy and self-supervised OOD detection (SS OOD). Without additional OOD detection, the chest X-ray classifier failed to discard any OOD images, with an AUC of 0.5. The proposed IDV approach trained on ID (chest X-ray 14) and OOD data (IRMA and ImageNet) achieved, on average, 0.999 OOD AUC across the three data sets, surpassing all other OOD detection methods. Mahalanobis-based OOD detection achieved an average OOD detection AUC of 0.982. IDV trained solely with a few thousand ImageNet images had an AUC 0.913, which was higher than MaxLogit (0.726), MaxEnergy (0.724), and SS OOD (0.476). The performance of all tested OOD detection methods did not translate well to radiography data sets, except Mahalanobis-based OOD detection and the proposed IDV method. Training solely on ID data led to incorrect classification of OOD images as ID, resulting in increased false positive rates. IDV substantially improved the model's ID classification performance, even when trained with data that will not occur in the intended use case or test set, without additional inference overhead.
△ Less
Submitted 8 May, 2023; v1 submitted 1 August, 2022;
originally announced August 2022.