-
Underestimation of lung regions on chest X-ray segmentation masks assessed by comparison with total lung volume evaluated on computed tomography
Authors:
Przemysław Bombiński,
Patryk Szatkowski,
Bartłomiej Sobieski,
Tymoteusz Kwieciński,
Szymon Płotka,
Mariusz Adamek,
Marcin Banasiuk,
Mariusz I. Furmanek,
Przemysław Biecek
Abstract:
Lung mask creation lacks well-defined criteria and standardized guidelines, leading to a high degree of subjectivity between annotators. In this study, we assess the underestimation of lung regions on chest X-ray segmentation masks created according to the current state-of-the-art method, by comparison with total lung volume evaluated on computed tomography (CT). We show that lung X-ray masks created by following the contours of the heart, mediastinum, and diaphragm significantly underestimate lung regions and exclude substantial portions of the lungs from further assessment, which may result in numerous clinical errors.
Submitted 18 February, 2024;
originally announced February 2024.
-
Prevention is better than cure: a case study of the abnormalities detection in the chest
Authors:
Weronika Hryniewska,
Piotr Czarnecki,
Jakub Wiśniewski,
Przemysław Bombiński,
Przemysław Biecek
Abstract:
Prevention is better than cure. This old truth applies not only to the prevention of diseases but also to the prevention of issues with AI models used in medicine. The source of malfunctioning of predictive models often lies not in the training process but reaches the data acquisition phase or design of the experiment phase.
In this paper, we analyze in detail a single use case: a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts, and it is difficult to remove this bias during or after the training. Errors made at the data collection stage make it difficult to validate the model correctly.
Based on this use case, we show how to monitor data and model balance (fairness) throughout the life cycle of a predictive model, from data acquisition to parity analysis of model scores.
Submitted 18 May, 2023;
originally announced May 2023.
-
Interpretable machine learning for time-to-event prediction in medicine and healthcare
Authors:
Hubert Baniecki,
Bartlomiej Sobieski,
Patryk Szatkowski,
Przemyslaw Bombinski,
Przemyslaw Biecek
Abstract:
Time-to-event prediction, e.g. cancer survival analysis or hospital length of stay, is a highly prominent machine learning task in medical and healthcare applications. However, only a few interpretable machine learning methods comply with its challenges. To facilitate a comprehensive explanatory analysis of survival models, we formally introduce time-dependent feature effects and global feature importance explanations. We show how post-hoc interpretation methods allow for finding biases in AI systems predicting length of stay using a novel multi-modal dataset created from 1235 X-ray images with textual radiology reports annotated by human experts. Moreover, we evaluate cancer survival models beyond predictive performance to include the importance of multi-omics feature groups based on a large-scale benchmark comprising 11 datasets from The Cancer Genome Atlas (TCGA). Model developers can use the proposed methods to debug and improve machine learning algorithms, while physicians can discover disease biomarkers and assess their significance. We hope the contributed open data and code resources facilitate future work in the emerging research direction of explainable survival analysis.
Submitted 27 March, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Checklist for responsible deep learning modeling of medical images based on COVID-19 detection studies
Authors:
Weronika Hryniewska,
Przemysław Bombiński,
Patryk Szatkowski,
Paulina Tomaszewska,
Artur Przelaskowski,
Przemysław Biecek
Abstract:
The sudden outbreak and uncontrolled spread of COVID-19 disease is one of the most important global problems today. In a short period of time, it has led to the development of many deep neural network models for COVID-19 detection with modules for explainability. In this work, we carry out a systematic analysis of various aspects of proposed models. Our analysis revealed numerous mistakes made at different stages of data acquisition, model development, and explanation construction. We overview the approaches proposed in the surveyed machine learning articles and indicate typical errors emerging from the lack of deep understanding of the radiography domain. We present the perspectives of both radiologists, as experts in the field, and deep learning engineers dealing with model explanations. The final result is a proposed checklist with the minimum conditions to be met by a reliable COVID-19 diagnostic model.
Submitted 23 April, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.