-
Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness
Authors:
David Fernández Llorca,
Ronan Hamon,
Henrik Junklewitz,
Kathrin Grosse,
Lars Kunze,
Patrick Seiniger,
Robert Swaim,
Nick Reed,
Alexandre Alahi,
Emilia Gómez,
Ignacio Sánchez,
Akos Kriston
Abstract:
This study explores the complexities of integrating Artificial Intelligence (AI) into Autonomous Vehicles (AVs), examining the challenges introduced by AI components and the impact on testing procedures, focusing on some of the essential requirements for trustworthy AI. Topics addressed include the role of AI at various operational layers of AVs, the implications of the EU's AI Act on AVs, and the…
▽ More
This study explores the complexities of integrating Artificial Intelligence (AI) into Autonomous Vehicles (AVs), examining the challenges introduced by AI components and the impact on testing procedures, focusing on some of the essential requirements for trustworthy AI. Topics addressed include the role of AI at various operational layers of AVs, the implications of the EU's AI Act on AVs, and the need for new testing methodologies for Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS). The study also provides a detailed analysis on the importance of cybersecurity audits, the need for explainability in AI decision-making processes and protocols for assessing the robustness and ethical behaviour of predictive systems in AVs. The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology, highlighting the need for multidisciplinary expertise.
△ Less
Submitted 21 February, 2024;
originally announced March 2024.
-
Nodule detection and generation on chest X-rays: NODE21 Challenge
Authors:
Ecem Sogancioglu,
Bram van Ginneken,
Finn Behrendt,
Marcel Bengs,
Alexander Schlaefer,
Miron Radu,
Di Xu,
Ke Sheng,
Fabien Scalzo,
Eric Marcus,
Samuele Papa,
Jonas Teuwen,
Ernst Th. Scholten,
Steven Schalekamp,
Nils Hendrix,
Colin Jacobs,
Ward Hendrix,
Clara I Sánchez,
Keelin Murphy
Abstract:
Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high-performance levels in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows down the progression of the research and prevents benchmarking of…
▽ More
Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high-performance levels in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows down the progression of the research and prevents benchmarking of methods for this task. To address this, we organized a public research challenge, NODE21, aimed at the detection and generation of lung nodules in chest X-rays. While the detection track assesses state-of-the-art nodule detection systems, the generation track determines the utility of nodule generation algorithms to augment training data and hence improve the performance of the detection systems. This paper summarizes the results of the NODE21 challenge and performs extensive additional experiments to examine the impact of the synthetically generated nodule training images on the detection algorithm performance.
△ Less
Submitted 4 January, 2024;
originally announced January 2024.
-
JSSL: Joint Supervised and Self-supervised Learning for MRI Reconstruction
Authors:
George Yiasemis,
Nikita Moriakov,
Clara I. Sánchez,
Jan-Jakob Sonke,
Jonas Teuwen
Abstract:
Magnetic Resonance Imaging represents an important diagnostic modality; however, its inherently slow acquisition process poses challenges in obtaining fully sampled k-space data under motion in clinical scenarios such as abdominal, cardiac, and prostate imaging. In the absence of fully sampled acquisitions, which can serve as ground truth data, training deep learning algorithms in a supervised man…
▽ More
Magnetic Resonance Imaging represents an important diagnostic modality; however, its inherently slow acquisition process poses challenges in obtaining fully sampled k-space data under motion in clinical scenarios such as abdominal, cardiac, and prostate imaging. In the absence of fully sampled acquisitions, which can serve as ground truth data, training deep learning algorithms in a supervised manner to predict the underlying ground truth image becomes an impossible task. To address this limitation, self-supervised methods have emerged as a viable alternative, leveraging available subsampled k-space data to train deep learning networks for MRI reconstruction. Nevertheless, these self-supervised approaches often fall short when compared to supervised methodologies. In this paper, we introduce JSSL (Joint Supervised and Self-supervised Learning), a novel training approach for deep learning-based MRI reconstruction algorithms aimed at enhancing reconstruction quality in scenarios where target dataset(s) containing fully sampled k-space measurements are unavailable. Our proposed method operates by simultaneously training a model in a self-supervised learning setting, using subsampled data from the target dataset(s), and in a supervised learning manner, utilizing data from other datasets, referred to as proxy datasets, where fully sampled k-space data is accessible. To demonstrate the efficacy of JSSL, we utilized subsampled prostate parallel MRI measurements as the target dataset, while employing fully sampled brain and knee k-space acquisitions as proxy datasets. Our results showcase a substantial improvement over conventional self-supervised training methods, thereby underscoring the effectiveness of our joint approach. We provide a theoretical motivation for JSSL and establish a practical "rule-of-thumb" for selecting the most appropriate training approach for deep MRI reconstruction.
△ Less
Submitted 27 November, 2023;
originally announced November 2023.
-
The potential of large language models for improving probability learning: A study on ChatGPT3.5 and first-year computer engineering students
Authors:
Angel Udias,
Antonio Alonso-Ayuso,
Ignacio Sanchez,
Sonia Hernandez,
Maria Eugenia Castellanos,
Raquel Montes Diez,
Emilio Lopez Cano
Abstract:
In this paper, we assess the efficacy of ChatGPT (version Feb 2023), a large-scale language model, in solving probability problems typically presented in introductory computer engineering exams. Our study comprised a set of 23 probability exercises administered to students at Rey Juan Carlos University (URJC) in Madrid. The responses produced by ChatGPT were evaluated by a group of five statistics…
▽ More
In this paper, we assess the efficacy of ChatGPT (version Feb 2023), a large-scale language model, in solving probability problems typically presented in introductory computer engineering exams. Our study comprised a set of 23 probability exercises administered to students at Rey Juan Carlos University (URJC) in Madrid. The responses produced by ChatGPT were evaluated by a group of five statistics professors, who assessed them qualitatively and assigned grades based on the same criteria used for students. Our results indicate that ChatGPT surpasses the average student in terms of phrasing, organization, and logical reasoning. The model's performance remained consistent for both the Spanish and English versions of the exercises. However, ChatGPT encountered difficulties in executing basic numerical operations. Our experiments demonstrate that requesting ChatGPT to provide the solution in the form of an R script proved to be an effective approach for overcoming these limitations. In summary, our results indicate that ChatGPT surpasses the average student in solving probability problems commonly presented in introductory computer engineering exams. Nonetheless, the model exhibits limitations in reasoning around certain probability concepts. The model's ability to deliver high-quality explanations and illustrate solutions in any programming language, coupled with its performance in solving probability exercises, suggests that large language models have the potential to serve as learning assistants.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
Understanding metric-related pitfalls in image analysis validation
Authors:
Annika Reinke,
Minu D. Tizabi,
Michael Baumgartner,
Matthias Eisenmann,
Doreen Heckmann-Nötzel,
A. Emre Kavur,
Tim Rädsch,
Carole H. Sudre,
Laura Acion,
Michela Antonelli,
Tal Arbel,
Spyridon Bakas,
Arriel Benis,
Matthew Blaschko,
Florian Buettner,
M. Jorge Cardoso,
Veronika Cheplygina,
Jianxu Chen,
Evangelia Christodoulou,
Beth A. Cimini,
Gary S. Collins,
Keyvan Farahani,
Luciana Ferrer,
Adrian Galdran,
Bram van Ginneken
, et al. (53 additional authors not shown)
Abstract:
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibilit…
▽ More
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
△ Less
Submitted 23 February, 2024; v1 submitted 3 February, 2023;
originally announced February 2023.
-
AIROGS: Artificial Intelligence for RObust Glaucoma Screening Challenge
Authors:
Coen de Vente,
Koenraad A. Vermeer,
Nicolas Jaccard,
He Wang,
Hongyi Sun,
Firas Khader,
Daniel Truhn,
Temirgali Aimyshev,
Yerkebulan Zhanibekuly,
Tien-Dung Le,
Adrian Galdran,
Miguel Ángel González Ballester,
Gustavo Carneiro,
Devika R G,
Hrishikesh P S,
Densen Puthussery,
Hong Liu,
Zekang Yang,
Satoshi Kondo,
Satoshi Kasai,
Edward Wang,
Ashritha Durvasula,
Jónathan Heras,
Miguel Ángel Zapata,
Teresa Araújo
, et al. (11 additional authors not shown)
Abstract:
The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios…
▽ More
The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios due to the presence of out-of-distribution and low-quality images. To address this issue, we propose the Artificial Intelligence for Robust Glaucoma Screening (AIROGS) challenge. This challenge includes a large dataset of around 113,000 images from about 60,000 patients and 500 different screening centers, and encourages the development of algorithms that are robust to ungradable and unexpected input data. We evaluated solutions from 14 teams in this paper, and found that the best teams performed similarly to a set of 20 expert ophthalmologists and optometrists. The highest-scoring team achieved an area under the receiver operating characteristic curve of 0.99 (95% CI: 0.98-0.99) for detecting ungradable images on-the-fly. Additionally, many of the algorithms showed robust performance when tested on three other publicly available datasets. These results demonstrate the feasibility of robust AI-enabled glaucoma screening.
△ Less
Submitted 10 February, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
On Retrospective k-space Subsampling schemes For Deep MRI Reconstruction
Authors:
George Yiasemis,
Clara I. Sánchez,
Jan-Jakob Sonke,
Jonas Teuwen
Abstract:
Acquiring fully-sampled MRI $k$-space data is time-consuming, and collecting accelerated data can reduce the acquisition time. Employing 2D Cartesian-rectilinear subsampling schemes is a conventional approach for accelerated acquisitions; however, this often results in imprecise reconstructions, even with the use of Deep Learning (DL), especially at high acceleration factors. Non-rectilinear or no…
▽ More
Acquiring fully-sampled MRI $k$-space data is time-consuming, and collecting accelerated data can reduce the acquisition time. Employing 2D Cartesian-rectilinear subsampling schemes is a conventional approach for accelerated acquisitions; however, this often results in imprecise reconstructions, even with the use of Deep Learning (DL), especially at high acceleration factors. Non-rectilinear or non-Cartesian trajectories can be implemented in MRI scanners as alternative subsampling options. This work investigates the impact of the $k$-space subsampling scheme on the quality of reconstructed accelerated MRI measurements produced by trained DL models. The Recurrent Variational Network (RecurrentVarNet) was used as the DL-based MRI-reconstruction architecture. Cartesian, fully-sampled multi-coil $k$-space measurements from three datasets were retrospectively subsampled with different accelerations using eight distinct subsampling schemes: four Cartesian-rectilinear, two Cartesian non-rectilinear, and two non-Cartesian. Experiments were conducted in two frameworks: scheme-specific, where a distinct model was trained and evaluated for each dataset-subsampling scheme pair, and multi-scheme, where for each dataset a single model was trained on data randomly subsampled by any of the eight schemes and evaluated on data subsampled by all schemes. In both frameworks, RecurrentVarNets trained and evaluated on non-rectilinearly subsampled data demonstrated superior performance, particularly for high accelerations. In the multi-scheme setting, reconstruction performance on rectilinearly subsampled data improved when compared to the scheme-specific experiments. Our findings demonstrate the potential for using DL-based methods, trained on non-rectilinearly subsampled measurements, to optimize scan time and image quality.
△ Less
Submitted 9 August, 2023; v1 submitted 19 January, 2023;
originally announced January 2023.
-
Liability regimes in the age of AI: a use-case driven analysis of the burden of proof
Authors:
David Fernández Llorca,
Vicky Charisi,
Ronan Hamon,
Ignacio Sánchez,
Emilia Gómez
Abstract:
New emerging technologies powered by Artificial Intelligence (AI) have the potential to disruptively transform our societies for the better. In particular, data-driven learning approaches (i.e., Machine Learning (ML)) have been a true revolution in the advancement of multiple technologies in various application domains. But at the same time there is growing concern about certain intrinsic characte…
▽ More
New emerging technologies powered by Artificial Intelligence (AI) have the potential to disruptively transform our societies for the better. In particular, data-driven learning approaches (i.e., Machine Learning (ML)) have been a true revolution in the advancement of multiple technologies in various application domains. But at the same time there is growing concern about certain intrinsic characteristics of these methodologies that carry potential risks to both safety and fundamental rights. Although there are mechanisms in the adoption process to minimize these risks (e.g., safety regulations), these do not exclude the possibility of harm occurring, and if this happens, victims should be able to seek compensation. Liability regimes will therefore play a key role in ensuring basic protection for victims using or interacting with these systems. However, the same characteristics that make AI systems inherently risky, such as lack of causality, opacity, unpredictability or their self and continuous learning capabilities, may lead to considerable difficulties when it comes to proving causation. This paper presents three case studies, as well as the methodology to reach them, that illustrate these difficulties. Specifically, we address the cases of cleaning robots, delivery drones and robots in education. The outcome of the proposed analysis suggests the need to revise liability regimes to alleviate the burden of proof on victims in cases involving AI technologies.
△ Less
Submitted 17 March, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Molecular information theory meets protein folding
Authors:
Ignacio E. Sánchez,
Ezequiel A. Galpern,
Martín M. Garibaldi,
Diego U. Ferreiro
Abstract:
We propose an application of molecular information theory to analyze the folding of single domain proteins. We analyze results from various areas of protein science, such as sequence-based potentials, reduced amino acid alphabets, backbone configurational entropy, secondary structure content, residue burial layers, and mutational studies of protein stability changes. We found that the average info…
▽ More
We propose an application of molecular information theory to analyze the folding of single domain proteins. We analyze results from various areas of protein science, such as sequence-based potentials, reduced amino acid alphabets, backbone configurational entropy, secondary structure content, residue burial layers, and mutational studies of protein stability changes. We found that the average information contained in the sequences of evolved proteins is very close to the average information needed to specify a fold ~2.2 $\pm$ 0.3 bits/(site operation). The effective alphabet size in evolved proteins equals the effective number of conformations of a residue in the compact unfolded state at around 5. We calculated an energy-to-information conversion efficiency upon folding of around 50%, lower than the theoretical limit of 70%, but much higher than human built macroscopic machines. We propose a simple map** between molecular information theory and energy landscape theory and explore the connections between sequence evolution, configurational entropy and the energetics of protein folding.
△ Less
Submitted 29 June, 2022;
originally announced June 2022.
-
Metrics reloaded: Recommendations for image analysis validation
Authors:
Lena Maier-Hein,
Annika Reinke,
Patrick Godau,
Minu D. Tizabi,
Florian Buettner,
Evangelia Christodoulou,
Ben Glocker,
Fabian Isensee,
Jens Kleesiek,
Michal Kozubek,
Mauricio Reyes,
Michael A. Riegler,
Manuel Wiesenfarth,
A. Emre Kavur,
Carole H. Sudre,
Michael Baumgartner,
Matthias Eisenmann,
Doreen Heckmann-Nötzel,
Tim Rädsch,
Laura Acion,
Michela Antonelli,
Tal Arbel,
Spyridon Bakas,
Arriel Benis,
Matthew Blaschko
, et al. (49 additional authors not shown)
Abstract:
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international ex…
▽ More
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases.
△ Less
Submitted 23 February, 2024; v1 submitted 3 June, 2022;
originally announced June 2022.
-
A deep learning framework for the detection and quantification of drusen and reticular pseudodrusen on optical coherence tomography
Authors:
Roy Schwartz,
Hagar Khalid,
Sandra Liakopoulos,
Yanling Ouyang,
Coen de Vente,
Cristina González-Gonzalo,
Aaron Y. Lee,
Robyn Guymer,
Emily Y. Chew,
Catherine Egan,
Zhichao Wu,
Himeesh Kumar,
Joseph Farrington,
Clara I. Sánchez,
Adnan Tufail
Abstract:
Purpose - To develop and validate a deep learning (DL) framework for the detection and quantification of drusen and reticular pseudodrusen (RPD) on optical coherence tomography scans.
Design - Development and validation of deep learning models for classification and feature segmentation.
Methods - A DL framework was developed consisting of a classification model and an out-of-distribution (OOD…
▽ More
Purpose - To develop and validate a deep learning (DL) framework for the detection and quantification of drusen and reticular pseudodrusen (RPD) on optical coherence tomography scans.
Design - Development and validation of deep learning models for classification and feature segmentation.
Methods - A DL framework was developed consisting of a classification model and an out-of-distribution (OOD) detection model for the identification of ungradable scans; a classification model to identify scans with drusen or RPD; and an image segmentation model to independently segment lesions as RPD or drusen. Data were obtained from 1284 participants in the UK Biobank (UKBB) with a self-reported diagnosis of age-related macular degeneration (AMD) and 250 UKBB controls. Drusen and RPD were manually delineated by five retina specialists. The main outcome measures were sensitivity, specificity, area under the ROC curve (AUC), kappa, accuracy and intraclass correlation coefficient (ICC).
Results - The classification models performed strongly at their respective tasks (0.95, 0.93, and 0.99 AUC, respectively, for the ungradable scans classifier, the OOD model, and the drusen and RPD classification model). The mean ICC for drusen and RPD area vs. graders was 0.74 and 0.61, respectively, compared with 0.69 and 0.68 for intergrader agreement. FROC curves showed that the model's sensitivity was close to human performance.
Conclusions - The models achieved high classification and segmentation performance, similar to human performance. Application of this robust framework will further our understanding of RPD as a separate entity from drusen in both research and clinical settings.
△ Less
Submitted 5 April, 2022;
originally announced April 2022.
-
Deep MRI Reconstruction with Radial Subsampling
Authors:
George Yiasemis,
Chao** Zhang,
Clara I. Sánchez,
Jan-Jakob Sonke,
Jonas Teuwen
Abstract:
In spite of its extensive adaptation in almost every medical diagnostic and examinatorial application, Magnetic Resonance Imaging (MRI) is still a slow imaging modality which limits its use for dynamic imaging. In recent years, Parallel Imaging (PI) and Compressed Sensing (CS) have been utilised to accelerate the MRI acquisition. In clinical settings, subsampling the k-space measurements during sc…
▽ More
In spite of its extensive adaptation in almost every medical diagnostic and examinatorial application, Magnetic Resonance Imaging (MRI) is still a slow imaging modality which limits its use for dynamic imaging. In recent years, Parallel Imaging (PI) and Compressed Sensing (CS) have been utilised to accelerate the MRI acquisition. In clinical settings, subsampling the k-space measurements during scanning time using Cartesian trajectories, such as rectilinear sampling, is currently the most conventional CS approach applied which, however, is prone to producing aliased reconstructions. With the advent of the involvement of Deep Learning (DL) in accelerating the MRI, reconstructing faithful images from subsampled data became increasingly promising. Retrospectively applying a subsampling mask onto the k-space data is a way of simulating the accelerated acquisition of k-space data in real clinical setting. In this paper we compare and provide a review for the effect of applying either rectilinear or radial retrospective subsampling on the quality of the reconstructions outputted by trained deep neural networks. With the same choice of hyper-parameters, we train and evaluate two distinct Recurrent Inference Machines (RIMs), one for each type of subsampling. The qualitative and quantitative results of our experiments indicate that the model trained on data with radial subsampling attains higher performance and learns to estimate reconstructions with higher fidelity paving the way for other DL approaches to involve radial subsampling.
△ Less
Submitted 12 January, 2022; v1 submitted 17 August, 2021;
originally announced August 2021.
-
Common Limitations of Image Processing Metrics: A Picture Story
Authors:
Annika Reinke,
Minu D. Tizabi,
Carole H. Sudre,
Matthias Eisenmann,
Tim Rädsch,
Michael Baumgartner,
Laura Acion,
Michela Antonelli,
Tal Arbel,
Spyridon Bakas,
Peter Bankhead,
Arriel Benis,
Matthew Blaschko,
Florian Buettner,
M. Jorge Cardoso,
Jianxu Chen,
Veronika Cheplygina,
Evangelia Christodoulou,
Beth Cimini,
Gary S. Collins,
Sandy Engelhardt,
Keyvan Farahani,
Luciana Ferrer,
Adrian Galdran,
Bram van Ginneken
, et al. (68 additional authors not shown)
Abstract:
While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the used automatic algorithms, but relatively little attention has been given to the practical pitfalls when using spe…
▽ More
While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the used automatic algorithms, but relatively little attention has been given to the practical pitfalls when using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. This living dynamically document has the purpose to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection task. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.
△ Less
Submitted 6 December, 2023; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Improving Automated COVID-19 Grading with Convolutional Neural Networks in Computed Tomography Scans: An Ablation Study
Authors:
Coen de Vente,
Luuk H. Boulogne,
Kiran Vaidhya Venkadesh,
Cheryl Sital,
Nikolas Lessmann,
Colin Jacobs,
Clara I. Sánchez,
Bram van Ginneken
Abstract:
Amidst the ongoing pandemic, several studies have shown that COVID-19 classification and grading using computed tomography (CT) images can be automated with convolutional neural networks (CNNs). Many of these studies focused on reporting initial results of algorithms that were assembled from commonly used components. The choice of these components was often pragmatic rather than systematic. For in…
▽ More
Amidst the ongoing pandemic, several studies have shown that COVID-19 classification and grading using computed tomography (CT) images can be automated with convolutional neural networks (CNNs). Many of these studies focused on reporting initial results of algorithms that were assembled from commonly used components. The choice of these components was often pragmatic rather than systematic. For instance, several studies used 2D CNNs even though these might not be optimal for handling 3D CT volumes. This paper identifies a variety of components that increase the performance of CNN-based algorithms for COVID-19 grading from CT images. We investigated the effectiveness of using a 3D CNN instead of a 2D CNN, of using transfer learning to initialize the network, of providing automatically computed lesion maps as additional network input, and of predicting a continuous instead of a categorical output. A 3D CNN with these components achieved an area under the ROC curve (AUC) of 0.934 on our test set of 105 CT scans and an AUC of 0.923 on a publicly available set of 742 CT scans, a substantial improvement in comparison with a previously published 2D CNN. An ablation study demonstrated that in addition to using a 3D CNN instead of a 2D CNN transfer learning contributed the most and continuous output contributed the least to improving the model performance.
△ Less
Submitted 21 September, 2020;
originally announced September 2020.
-
Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors
Authors:
Gerda Bortsova,
Cristina González-Gonzalo,
Suzanne C. Wetstein,
Florian Dubost,
Ioannis Katramados,
Laurens Hogeweg,
Bart Liefers,
Bram van Ginneken,
Josien P. W. Pluim,
Mitko Veta,
Clara I. Sánchez,
Marleen de Bruijne
Abstract:
Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure.
In this paper, we study previously unexplored factors affecting adversarial attack vulnerability of deep l…
▽ More
Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure.
In this paper, we study previously unexplored factors affecting adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. We focus on adversarial black-box settings, in which the attacker does not have full access to the target model and usually uses another model, commonly referred to as surrogate model, to craft adversarial examples. We consider this to be the most realistic scenario for MedIA systems.
Firstly, we study the effect of weight initialization (ImageNet vs. random) on the transferability of adversarial attacks from the surrogate model to the target model. Secondly, we study the influence of differences in development data between target and surrogate models. We further study the interaction of weight initialization and data differences with differences in model architecture. All experiments were done with a perturbation degree tuned to ensure maximal transferability at minimal visual perceptibility of the attacks.
Our experiments show that pre-training may dramatically increase the transferability of adversarial examples, even when the target and surrogate's architectures are different: the larger the performance gain using pre-training, the larger the transferability. Differences in the development data between target and surrogate models considerably decrease the performance of the attack; this decrease is further amplified by difference in the model architecture. We believe these factors should be considered when develo** security-critical MedIA systems planned to be deployed in clinical practice.
△ Less
Submitted 17 June, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Iterative Augmentation of Visual Evidence for Weakly-Supervised Lesion Localization in Deep Interpretability Frameworks: Application to Color Fundus Images
Authors:
Cristina González-Gonzalo,
Bart Liefers,
Bram van Ginneken,
Clara I. Sánchez
Abstract:
Interpretability of deep learning (DL) systems is gaining attention in medical imaging to increase experts' trust in the obtained predictions and facilitate their integration in clinical settings. We propose a deep visualization method to generate interpretability of DL classification tasks in medical imaging by means of visual evidence augmentation. The proposed method iteratively unveils abnorma…
▽ More
Interpretability of deep learning (DL) systems is gaining attention in medical imaging to increase experts' trust in the obtained predictions and facilitate their integration in clinical settings. We propose a deep visualization method to generate interpretability of DL classification tasks in medical imaging by means of visual evidence augmentation. The proposed method iteratively unveils abnormalities based on the prediction of a classifier trained only with image-level labels. For each image, initial visual evidence of the prediction is extracted with a given visual attribution technique. This provides localization of abnormalities that are then removed through selective inpainting. We iteratively apply this procedure until the system considers the image as normal. This yields augmented visual evidence, including less discriminative lesions which were not detected at first but should be considered for final diagnosis. We apply the method to grading of two retinal diseases in color fundus images: diabetic retinopathy (DR) and age-related macular degeneration (AMD). We evaluate the generated visual evidence and the performance of weakly-supervised localization of different types of DR and AMD abnormalities, both qualitatively and quantitatively. We show that the augmented visual evidence of the predictions highlights the biomarkers considered by experts for diagnosis and improves the final localization performance. It results in a relative increase of 11.2+/-2.0% per image regarding sensitivity averaged at 10 false positives/image on average, when applied to different classification tasks, visual attribution techniques and network architectures. This makes the proposed method a useful tool for exhaustive visual support of DL classifiers in medical imaging.
△ Less
Submitted 1 February, 2022; v1 submitted 16 October, 2019;
originally announced October 2019.
-
Global Planar Convolutions for improved context aggregation in Brain Tumor Segmentation
Authors:
Santi Puch,
Irina Sánchez,
Aura Hernández,
Gemma Piella,
Vesna Prchkovska
Abstract:
In this work, we introduce the Global Planar Convolution module as a building-block for fully-convolutional networks that aggregates global information and, therefore, enhances the context perception capabilities of segmentation networks in the context of brain tumor segmentation. We implement two baseline architectures (3D UNet and a residual version of 3D UNet, ResUNet) and present a novel archi…
▽ More
In this work, we introduce the Global Planar Convolution module as a building-block for fully-convolutional networks that aggregates global information and, therefore, enhances the context perception capabilities of segmentation networks in the context of brain tumor segmentation. We implement two baseline architectures (3D UNet and a residual version of 3D UNet, ResUNet) and present a novel architecture based on these two architectures, ContextNet, that includes the proposed Global Planar Convolution module. We show that the addition of such module eliminates the need of building networks with several representation levels, which tend to be over-parametrized and to showcase slow rates of convergence. Furthermore, we provide a visual demonstration of the behavior of GPC modules via visualization of intermediate representations. We finally participate in the 2018 edition of the BraTS challenge with our best performing models, that are based on ContextNet, and report the evaluation scores on the validation and the test sets of the challenge.
△ Less
Submitted 27 August, 2019;
originally announced August 2019.
-
Few-shot Learning with Deep Triplet Networks for Brain Imaging Modality Recognition
Authors:
Santi Puch,
Irina Sánchez,
Matt Rowe
Abstract:
Image modality recognition is essential for efficient imaging workflows in current clinical environments, where multiple imaging modalities are used to better comprehend complex diseases. Emerging biomarkers from novel, rare modalities are being developed to aid in such understanding, however the availability of these images is often limited. This scenario raises the necessity of recognising new i…
▽ More
Image modality recognition is essential for efficient imaging workflows in current clinical environments, where multiple imaging modalities are used to better comprehend complex diseases. Emerging biomarkers from novel, rare modalities are being developed to aid in such understanding, however the availability of these images is often limited. This scenario raises the necessity of recognising new imaging modalities without them being collected and annotated in large amounts. In this work, we present a few-shot learning model for limited training examples based on Deep Triplet Networks. We show that the proposed model is more accurate in distinguishing different modalities than a traditional Convolutional Neural Network classifier when limited samples are available. Furthermore, we evaluate the performance of both classifiers when presented with noisy samples and provide an initial inspection of how the proposed model can incorporate measures of uncertainty to be more robust against out-of-sample examples.
△ Less
Submitted 27 August, 2019;
originally announced August 2019.
-
A deep learning model for segmentation of geographic atrophy to study its long-term natural history
Authors:
Bart Liefers,
Johanna M. Colijn,
Cristina González-Gonzalo,
Timo Verzijden,
Paul Mitchell,
Carel B. Hoyng,
Bram van Ginneken,
Caroline C. W. Klaver,
Clara I. Sánchez
Abstract:
Purpose: To develop and validate a deep learning model for automatic segmentation of geographic atrophy (GA) in color fundus images (CFIs) and its application to study growth rate of GA. Participants: 409 CFIs of 238 eyes with GA from the Rotterdam Study (RS) and the Blue Mountain Eye Study (BMES) for model development, and 5,379 CFIs of 625 eyes from the Age-Related Eye Disease Study (AREDS) for…
▽ More
Purpose: To develop and validate a deep learning model for automatic segmentation of geographic atrophy (GA) in color fundus images (CFIs) and its application to study growth rate of GA. Participants: 409 CFIs of 238 eyes with GA from the Rotterdam Study (RS) and the Blue Mountain Eye Study (BMES) for model development, and 5,379 CFIs of 625 eyes from the Age-Related Eye Disease Study (AREDS) for analysis of GA growth rate. Methods: A deep learning model based on an ensemble of encoder-decoder architectures was implemented and optimized for the segmentation of GA in CFIs. Four experienced graders delineated GA in CFIs from RS and BMES. These manual delineations were used to evaluate the segmentation model using 5-fold cross-validation. The model was further applied to CFIs from the AREDS to study the growth rate of GA. Linear regression analysis was used to study associations between structural biomarkers at baseline and GA growth rate. A general estimate of the progression of GA area over time was made by combining growth rates of all eyes with GA from the AREDS set. Results: The model obtained an average Dice coefficient of 0.72 $\pm$ 0.26 on the BMES and RS. An intraclass correlation coefficient of 0.83 was reached between the automatically estimated GA area and the graders' consensus measures. Eight automatically calculated structural biomarkers (area, filled area, convex area, convex solidity, eccentricity, roundness, foveal involvement and perimeter) were significantly associated with growth rate. Combining all growth rates indicated that GA area grows quadratically up to an area of around 12 mm$^{2}$, after which growth rate stabilizes or decreases. Conclusion: The presented deep learning model allowed for fully automatic and robust segmentation of GA in CFIs. These segmentations can be used to extract structural characteristics of GA that predict its growth rate.
△ Less
Submitted 15 August, 2019;
originally announced August 2019.
-
Evaluation of a deep learning system for the joint automated detection of diabetic retinopathy and age-related macular degeneration
Authors:
Cristina González-Gonzalo,
Verónica Sánchez-Gutiérrez,
Paula Hernández-Martínez,
Inés Contreras,
Yara T. Lechanteur,
Artin Domanian,
Bram van Ginneken,
Clara I. Sánchez
Abstract:
Purpose: To validate the performance of a commercially-available, CE-certified deep learning (DL) system, RetCAD v.1.3.0 (Thirona, Nijmegen, The Netherlands), for the joint automatic detection of diabetic retinopathy (DR) and age-related macular degeneration (AMD) in color fundus (CF) images on a dataset with mixed presence of eye diseases.
Methods: Evaluation of joint detection of referable DR…
▽ More
Purpose: To validate the performance of a commercially-available, CE-certified deep learning (DL) system, RetCAD v.1.3.0 (Thirona, Nijmegen, The Netherlands), for the joint automatic detection of diabetic retinopathy (DR) and age-related macular degeneration (AMD) in color fundus (CF) images on a dataset with mixed presence of eye diseases.
Methods: Evaluation of joint detection of referable DR and AMD was performed on a DR-AMD dataset with 600 images acquired during routine clinical practice, containing referable and non-referable cases of both diseases. Each image was graded for DR and AMD by an experienced ophthalmologist to establish the reference standard (RS), and by four independent observers for comparison with human performance. Validation was furtherly assessed on Messidor (1200 images) for individual identification of referable DR, and the Age-Related Eye Disease Study (AREDS) dataset (133821 images) for referable AMD, against the corresponding RS.
Results: Regarding joint validation on the DR-AMD dataset, the system achieved an area under the ROC curve (AUC) of 95.1% for detection of referable DR (SE=90.1%, SP=90.6%). For referable AMD, the AUC was 94.9% (SE=91.8%, SP=87.5%). Average human performance for DR was SE=61.5% and SP=97.8%; for AMD, SE=76.5% and SP=96.1%. Regarding detection of referable DR in Messidor, AUC was 97.5% (SE=92.0%, SP=92.1%); for referable AMD in AREDS, AUC was 92.7% (SE=85.8%, SP=86.0%).
Conclusions: The validated system performs comparably to human experts at simultaneous detection of DR and AMD. This shows that DL systems can facilitate access to joint screening of eye diseases and become a quick and reliable support for ophthalmological experts.
△ Less
Submitted 22 March, 2019;
originally announced March 2019.
-
Cascaded V-Net using ROI masks for brain tumor segmentation
Authors:
Adrià Casamitjana,
Marcel Catà,
Irina Sánchez,
Marc Combalia,
Verónica Vilaplana
Abstract:
In this work we approach the brain tumor segmentation problem with a cascade of two CNNs inspired in the V-Net architecture \cite{VNet}, reformulating residual connections and making use of ROI masks to constrain the networks to train only on relevant voxels. This architecture allows dense training on problems with highly skewed class distributions, such as brain tumor segmentation, by focusing tr…
▽ More
In this work we approach the brain tumor segmentation problem with a cascade of two CNNs inspired in the V-Net architecture \cite{VNet}, reformulating residual connections and making use of ROI masks to constrain the networks to train only on relevant voxels. This architecture allows dense training on problems with highly skewed class distributions, such as brain tumor segmentation, by focusing training only on the vecinity of the tumor area. We report results on BraTS2017 Training and Validation sets.
△ Less
Submitted 30 December, 2018;
originally announced December 2018.
-
Brain MRI super-resolution using 3D generative adversarial networks
Authors:
Irina Sanchez,
Veronica Vilaplana
Abstract:
In this work we propose an adversarial learning approach to generate high resolution MRI scans from low resolution images. The architecture, based on the SRGAN model, adopts 3D convolutions to exploit volumetric information. For the discriminator, the adversarial loss uses least squares in order to stabilize the training. For the generator, the loss function is a combination of a least squares adv…
▽ More
In this work we propose an adversarial learning approach to generate high resolution MRI scans from low resolution images. The architecture, based on the SRGAN model, adopts 3D convolutions to exploit volumetric information. For the discriminator, the adversarial loss uses least squares in order to stabilize the training. For the generator, the loss function is a combination of a least squares adversarial loss and a content term based on mean square error and image gradients in order to improve the quality of the generated images. We explore different solutions for the upsampling phase. We present promising results that improve classical interpolation, showing the potential of the approach for 3D medical imaging super-resolution. Source code available at https://github.com/imatge-upc/3D-GAN-superresolution
△ Less
Submitted 29 December, 2018;
originally announced December 2018.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem…
▽ More
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
△ Less
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
A Survey on Deep Learning in Medical Image Analysis
Authors:
Geert Litjens,
Thijs Kooi,
Babak Ehteshami Bejnordi,
Arnaud Arindra Adiyoso Setio,
Francesco Ciompi,
Mohsen Ghafoorian,
Jeroen A. W. M. van der Laak,
Bram van Ginneken,
Clara I. Sánchez
Abstract:
Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, se…
▽ More
Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.
△ Less
Submitted 4 June, 2017; v1 submitted 19 February, 2017;
originally announced February 2017.
-
Advanced channel coding for space mission telecommand links
Authors:
Marco Baldi,
Marco Bianchi,
Franco Chiaraluce,
Roberto Garello,
Ignacio Aguilar Sanchez,
Stefano Cioni
Abstract:
We investigate and compare different options for updating the error correcting code currently used in space mission telecommand links. Taking as a reference the solutions recently emerged as the most promising ones, based on Low-Density Parity-Check codes, we explore the behavior of alternative schemes, based on parallel concatenated turbo codes and soft-decision decoded BCH codes. Our analysis sh…
▽ More
We investigate and compare different options for updating the error correcting code currently used in space mission telecommand links. Taking as a reference the solutions recently emerged as the most promising ones, based on Low-Density Parity-Check codes, we explore the behavior of alternative schemes, based on parallel concatenated turbo codes and soft-decision decoded BCH codes. Our analysis shows that these further options can offer similar or even better performance.
△ Less
Submitted 2 October, 2013;
originally announced October 2013.
-
Advanced coding schemes against jamming in telecommand links
Authors:
Marco Baldi,
Marco Bianchi,
Franco Chiaraluce,
Roberto Garello,
Nicola Maturo,
Ignacio Aguilar Sanchez,
Stefano Cioni
Abstract:
The aim of this paper is to study the performance of some coding schemes recently proposed for updating the TC channel coding standard for space applications, in the presence of jamming. Besides low-density parity-check codes, that appear as the most eligible candidates, we also consider other solutions based on parallel turbo codes and extended BCH codes. We show that all these schemes offer very…
▽ More
The aim of this paper is to study the performance of some coding schemes recently proposed for updating the TC channel coding standard for space applications, in the presence of jamming. Besides low-density parity-check codes, that appear as the most eligible candidates, we also consider other solutions based on parallel turbo codes and extended BCH codes. We show that all these schemes offer very good performance, which approaches the theoretical limits achievable.
△ Less
Submitted 2 October, 2013;
originally announced October 2013.