-
Synergizing Data Imputation and Electronic Health Records for Advancing Prostate Cancer Research: Challenges, and Practical Applications
Authors:
Abderrahim Oussama Batouche,
Eugen Czeizler,
Miika Koskinen,
Tuomas Mirtti,
Antti Sakari Rannikko
Abstract:
The presence of detailed clinical information in electronic health record (EHR) systems presents promising prospects for enhancing patient care through automated retrieval techniques. Nevertheless, it is widely acknowledged that accessing data within EHRs is hindered by various methodological challenges. Specifically, the clinical notes stored in EHRs are composed in a narrative form, making them…
▽ More
The presence of detailed clinical information in electronic health record (EHR) systems presents promising prospects for enhancing patient care through automated retrieval techniques. Nevertheless, it is widely acknowledged that accessing data within EHRs is hindered by various methodological challenges. Specifically, the clinical notes stored in EHRs are composed in a narrative form, making them prone to ambiguous formulations and highly unstructured data presentations, while structured reports commonly suffer from missing and/or erroneous data entries. This inherent complexity poses significant challenges when attempting automated large-scale medical knowledge extraction tasks, necessitating the application of advanced tools, such as natural language processing (NLP), as well as data audit techniques. This work aims to address these obstacles by creating and validating a novel pipeline designed to extract relevant data pertaining to prostate cancer patients. The objective is to exploit the inherent redundancies available within the integrated structured and unstructured data entries within EHRs in order to generate comprehensive and reliable medical databases, ready to be used in advanced research studies. Additionally, the study explores potential opportunities arising from these data, offering valuable prospects for advancing research in prostate cancer.
△ Less
Submitted 24 October, 2023;
originally announced November 2023.
-
Augment like there's no tomorrow: Consistently performing neural networks for medical imaging
Authors:
Joona Pohjonen,
Carolin Stürenberg,
Atte Föhr,
Reija Randen-Brady,
Lassi Luomala,
Jouni Lohi,
Esa Pitkänen,
Antti Rannikko,
Tuomas Mirtti
Abstract:
Deep neural networks have achieved impressive performance in a wide variety of medical imaging tasks. However, these models often fail on data not used during training, such as data originating from a different medical centre. How to recognize models suffering from this fragility, and how to design robust models are the main obstacles to clinical adoption. Here, we present general methods to ident…
▽ More
Deep neural networks have achieved impressive performance in a wide variety of medical imaging tasks. However, these models often fail on data not used during training, such as data originating from a different medical centre. How to recognize models suffering from this fragility, and how to design robust models are the main obstacles to clinical adoption. Here, we present general methods to identify causes for model generalisation failures and how to circumvent them. First, we use $\textit{distribution-shifted datasets}$ to show that models trained with current state-of-the-art methods are highly fragile to variability encountered in clinical practice, and then develop a $\textit{strong augmentation}$ strategy to address this fragility. Distribution-shifted datasets allow us to discover this fragility, which can otherwise remain undetected after validation against multiple external datasets. Strong augmentation allows us to train robust models achieving consistent performance under shifts from the training data distribution. Importantly, we demonstrate that strong augmentation yields biomedical imaging models which retain high performance when applied to real-world clinical data. Our results pave the way for the development and evaluation of reliable and robust neural networks in clinical practice.
△ Less
Submitted 2 December, 2022; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Classification of datasets with imputed missing values: does imputation quality matter?
Authors:
Tolou Shadbahr,
Michael Roberts,
Jan Stanczuk,
Julian Gilbey,
Philip Teare,
Sören Dittmer,
Matthew Thorpe,
Ramon Vinas Torne,
Evis Sala,
Pietro Lio,
Mishal Patel,
AIX-COVNET Collaboration,
James H. F. Rudd,
Tuomas Mirtti,
Antti Rannikko,
John A. D. Aston,
**g Tang,
Carola-Bibiane Schönlieb
Abstract:
Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods, followed by classification of the now complete, imputed, samples. The focus of the machine learning researcher is then to optimise the downstream classification…
▽ More
Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods, followed by classification of the now complete, imputed, samples. The focus of the machine learning researcher is then to optimise the downstream classification performance. In this study, we highlight that it is imperative to consider the quality of the imputation. We demonstrate how the commonly used measures for assessing quality are flawed and propose a new class of discrepancy scores which focus on how well the method recreates the overall distribution of the data. To conclude, we highlight the compromised interpretability of classifier models trained using poorly imputed data.
△ Less
Submitted 16 June, 2022;
originally announced June 2022.
-
Spectral decoupling allows training transferable neural networks in medical imaging
Authors:
Joona Pohjonen,
Carolin Stürenberg,
Antti Rannikko,
Tuomas Mirtti,
Esa Pitkänen
Abstract:
Many current neural networks for medical imaging generalise poorly to data unseen during training. Such behaviour can be caused by networks overfitting easy-to-learn, or statistically dominant, features while disregarding other potentially informative features. For example, indistinguishable differences in the sharpness of the images from two different scanners can degrade the performance of the n…
▽ More
Many current neural networks for medical imaging generalise poorly to data unseen during training. Such behaviour can be caused by networks overfitting easy-to-learn, or statistically dominant, features while disregarding other potentially informative features. For example, indistinguishable differences in the sharpness of the images from two different scanners can degrade the performance of the network significantly. All neural networks intended for clinical practice need to be robust to variation in data caused by differences in imaging equipment, sample preparation and patient populations.
To address these challenges, we evaluate the utility of spectral decoupling as an implicit bias mitigation method. Spectral decoupling encourages the neural network to learn more features by simply regularising the networks' unnormalised prediction scores with an L2 penalty, thus having no added computational costs.
We show that spectral decoupling allows training neural networks on datasets with strong spurious correlations and increases networks' robustness for data distribution shifts. To validate our findings, we train networks with and without spectral decoupling to detect prostate cancer tissue slides and COVID-19 in chest radiographs. Networks trained with spectral decoupling achieve up to 9.5 percent point higher performance on external datasets.
Our results show that spectral decoupling helps with generalisation issues associated with neural networks, and can be used to complement or replace computationally expensive explicit bias mitigation methods, such as stain normalization in histological images. We recommend using spectral decoupling as an implicit bias mitigation method in any neural network intended for clinical use.
△ Less
Submitted 17 December, 2021; v1 submitted 31 March, 2021;
originally announced March 2021.
-
Improving Prostate Cancer Detection with Breast Histopathology Images
Authors:
Umair Akhtar Hasan Khan,
Carolin Stürenberg,
Oguzhan Gencoglu,
Kevin Sandeman,
Timo Heikkinen,
Antti Rannikko,
Tuomas Mirtti
Abstract:
Deep neural networks have introduced significant advancements in the field of machine learning-based analysis of digital pathology images including prostate tissue images. With the help of transfer learning, classification and segmentation performance of neural network models have been further increased. However, due to the absence of large, extensively annotated, publicly available prostate histo…
▽ More
Deep neural networks have introduced significant advancements in the field of machine learning-based analysis of digital pathology images including prostate tissue images. With the help of transfer learning, classification and segmentation performance of neural network models have been further increased. However, due to the absence of large, extensively annotated, publicly available prostate histopathology datasets, several previous studies employ datasets from well-studied computer vision tasks such as ImageNet dataset. In this work, we propose a transfer learning scheme from breast histopathology images to improve prostate cancer detection performance. We validate our approach on annotated prostate whole slide images by using a publicly available breast histopathology dataset as pre-training. We show that the proposed cross-cancer approach outperforms transfer learning from ImageNet dataset.
△ Less
Submitted 13 March, 2019;
originally announced March 2019.