-
A review of deep learning-based information fusion techniques for multimodal medical image classification
Authors:
Yihao Li,
Mostafa El Habib Daho,
Pierre-Henri Conze,
Rachid Zeghlache,
Hugo Le Boité,
Ramin Tadayoni,
Béatrice Cochener,
Mathieu Lamard,
Gwenolé Quellec
Abstract:
Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, the management of incomplete multimodal data, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.
Submitted 23 April, 2024;
originally announced April 2024.
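The three fusion schemes named in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the use of plain Python lists as feature vectors, and the choice of concatenation and score averaging as the combination rules. It is not the architecture of any network covered by the review.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], Vector]


def input_fusion(modalities: Sequence[Vector], model: Model) -> Vector:
    """Input fusion: concatenate the raw inputs, then run a single model."""
    fused = [value for modality in modalities for value in modality]
    return model(fused)


def intermediate_fusion(
    modalities: Sequence[Vector], encoders: Sequence[Model], head: Model
) -> Vector:
    """Intermediate (single-level) fusion: encode each modality separately,
    concatenate the resulting features, then classify with a shared head."""
    features = [
        feature
        for encoder, modality in zip(encoders, modalities)
        for feature in encoder(modality)
    ]
    return head(features)


def output_fusion(modalities: Sequence[Vector], models: Sequence[Model]) -> Vector:
    """Output fusion: one model per modality, class scores averaged
    (averaging is an assumed combination rule)."""
    scores = [model(modality) for model, modality in zip(models, modalities)]
    return [sum(column) / len(scores) for column in zip(*scores)]


# Hypothetical stand-ins for trained networks.
def toy_classifier(v: Vector) -> Vector:
    return [sum(v), -sum(v)]


def sum_encoder(v: Vector) -> Vector:
    return [sum(v)]


def max_encoder(v: Vector) -> Vector:
    return [max(v)]
```

The schemes differ only in where the modalities meet: before any model, between encoders and classifier, or after per-modality predictions.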
-
LaTiM: Longitudinal representation learning in continuous-time models to predict disease progression
Authors:
Rachid Zeghlache,
Pierre-Henri Conze,
Mostafa El Habib Daho,
Yihao Li,
Hugo Le Boité,
Ramin Tadayoni,
Pascal Massin,
Béatrice Cochener,
Alireza Rezaei,
Ikram Brahim,
Gwenolé Quellec,
Mathieu Lamard
Abstract:
This work proposes a novel framework for analyzing disease progression using time-aware neural ordinary differential equations (NODE). We introduce a "time-aware head" in a framework trained through self-supervised learning (SSL) to leverage temporal information in latent space for data augmentation. This approach effectively integrates NODEs with SSL, offering significant performance improvements compared to traditional methods that lack explicit temporal integration. We demonstrate the effectiveness of our strategy for diabetic retinopathy progression prediction using the OPHDIAT database. Compared to the baseline, all NODE architectures achieve statistically significant improvements in area under the ROC curve (AUC) and Kappa metrics, highlighting the efficacy of pre-training with SSL-inspired approaches. Additionally, our framework promotes stable training for NODEs, a commonly encountered challenge in time-aware modeling.
Submitted 10 April, 2024;
originally announced April 2024.
-
Guidelines for Cerebrovascular Segmentation: Managing Imperfect Annotations in the context of Semi-Supervised Learning
Authors:
Pierre Rougé,
Pierre-Henri Conze,
Nicolas Passat,
Odyssée Merveille
Abstract:
Segmentation in medical imaging is an essential and often preliminary task in the image processing chain, driving numerous efforts towards the design of robust segmentation algorithms. Supervised learning methods achieve excellent performance when fed with a sufficient amount of labeled data. However, such labels are typically highly time-consuming, error-prone and expensive to produce. Alternatively, semi-supervised learning approaches leverage both labeled and unlabeled data, and are very useful when only a small fraction of the dataset is labeled. They are particularly useful for cerebrovascular segmentation, given that labeling a single volume requires several hours for an expert. In addition to the challenge posed by insufficient annotations, there are concerns regarding annotation consistency. The task of annotating the cerebrovascular tree is inherently ambiguous. Due to the discrete nature of images, the borders and extremities of vessels are often unclear. Consequently, annotations rely heavily on the expert's subjectivity and on the underlying clinical objective. These discrepancies significantly increase the complexity of the segmentation task for the model and thereby impair the results. It is therefore imperative to provide clinicians with precise guidelines to improve the annotation process and construct more uniform datasets. In this article, we investigate the data dependency of deep learning methods within the context of imperfect data and semi-supervised learning, for cerebrovascular segmentation. Specifically, this study compares various state-of-the-art semi-supervised methods based on unsupervised regularization and evaluates their performance in diverse quantity and quality data scenarios. Based on these experiments, we provide guidelines for the annotation and training of cerebrovascular segmentation models.
Submitted 2 April, 2024;
originally announced April 2024.
-
L-MAE: Longitudinal masked auto-encoder with time and severity-aware encoding for diabetic retinopathy progression prediction
Authors:
Rachid Zeghlache,
Pierre-Henri Conze,
Mostafa El Habib Daho,
Yihao Li,
Alireza Rezaei,
Hugo Le Boité,
Ramin Tadayoni,
Pascal Massin,
Béatrice Cochener,
Ikram Brahim,
Gwenolé Quellec,
Mathieu Lamard
Abstract:
Pre-training strategies based on self-supervised learning (SSL) have proven to be effective pretext tasks for many downstream tasks in computer vision. Due to the significant disparity between medical and natural images, the application of typical SSL is not straightforward in medical imaging. Additionally, those pretext tasks often lack context, which is critical for computer-aided clinical decision support. In this paper, we developed a longitudinal masked auto-encoder (MAE) based on the well-known Transformer-based MAE. In particular, we explored the importance of time-aware position embedding as well as disease progression-aware masking. Taking into account the time between examinations instead of just scheduling them offers the benefit of capturing temporal changes and trends. The masking strategy, for its part, evolves during follow-up to better capture pathological changes, ensuring a more accurate assessment of disease progression. Using OPHDIAT, a large follow-up screening dataset targeting diabetic retinopathy (DR), we evaluated the pre-trained weights on a longitudinal task, which is to predict the severity label of the next visit within 3 years based on the past time series examinations. Our results demonstrated the relevancy of both time-aware position embedding and masking strategies based on disease progression knowledge. Compared to popular baseline models and standard longitudinal Transformers, these simple yet effective extensions significantly enhance the predictive ability of deep classification models.
Submitted 24 March, 2024;
originally announced March 2024.
-
DISCOVER: 2-D Multiview Summarization of Optical Coherence Tomography Angiography for Automatic Diabetic Retinopathy Diagnosis
Authors:
Mostafa El Habib Daho,
Yihao Li,
Rachid Zeghlache,
Hugo Le Boité,
Pierre Deman,
Laurent Borderie,
Hugang Ren,
Niranchana Manivannan,
Capucine Lepicard,
Béatrice Cochener,
Aude Couturier,
Ramin Tadayoni,
Pierre-Henri Conze,
Mathieu Lamard,
Gwenolé Quellec
Abstract:
Diabetic Retinopathy (DR), an ocular complication of diabetes, is a leading cause of blindness worldwide. Traditionally, DR is monitored using Color Fundus Photography (CFP), a widespread 2-D imaging modality. However, DR classifications based on CFP have poor predictive power, resulting in suboptimal DR management. Optical Coherence Tomography Angiography (OCTA) is a recent 3-D imaging modality offering enhanced structural and functional information (blood flow) with a wider field of view. This paper investigates automatic DR severity assessment using 3-D OCTA. A straightforward solution to this task is a 3-D neural network classifier. However, 3-D architectures have numerous parameters and typically require many training samples. A lighter solution consists in using 2-D neural network classifiers processing 2-D en-face (or frontal) projections and/or 2-D cross-sectional slices. Such an approach mimics the way ophthalmologists analyze OCTA acquisitions: 1) en-face flow maps are often used to detect avascular zones and neovascularization, and 2) cross-sectional slices are commonly analyzed to detect macular edemas, for instance. However, arbitrary data reduction or selection might result in information loss. Two complementary strategies are thus proposed to optimally summarize OCTA volumes with 2-D images: 1) a parametric en-face projection optimized through deep learning and 2) a cross-sectional slice selection process controlled through gradient-based attribution. The full summarization and DR classification pipeline is trained from end to end. The automatic 2-D summary can be displayed in a viewer or printed in a report to support the decision. We show that the proposed 2-D summarization and classification pipeline outperforms direct 3-D classification with the advantage of improved interpretability.
Submitted 10 January, 2024;
originally announced January 2024.
-
Automated Detection of Myopic Maculopathy in MMAC 2023: Achievements in Classification, Segmentation, and Spherical Equivalent Prediction
Authors:
Yihao Li,
Philippe Zhang,
Yubo Tan,
Jing Zhang,
Zhihan Wang,
Weili Jiang,
Pierre-Henri Conze,
Mathieu Lamard,
Gwenolé Quellec,
Mostafa El Habib Daho
Abstract:
Myopic macular degeneration is the most common complication of myopia and the primary cause of vision loss in individuals with pathological myopia. Early detection and prompt treatment are crucial in preventing vision impairment due to myopic maculopathy. This was the focus of the Myopic Maculopathy Analysis Challenge (MMAC), in which we participated. In task 1, classification of myopic maculopathy, we employed the contrastive learning framework, specifically SimCLR, to enhance classification accuracy by effectively capturing enriched features from unlabeled data. This approach not only improved the intrinsic understanding of the data but also elevated the performance of our classification model. For Task 2 (segmentation of myopic maculopathy plus lesions), we have developed independent segmentation models tailored for different lesion segmentation tasks and implemented a test-time augmentation strategy to further enhance the model's performance. As for Task 3 (prediction of spherical equivalent), we have designed a deep regression model based on the data distribution of the dataset and employed an integration strategy to enhance the model's prediction accuracy. The results we obtained are promising and have allowed us to position ourselves in the Top 6 of the classification task, the Top 2 of the segmentation task, and the Top 1 of the prediction task. The code is available at \url{https://github.com/liyihao76/MMAC_LaTIM_Solution}.
Submitted 7 January, 2024;
originally announced January 2024.
-
Longitudinal Self-supervised Learning Using Neural Ordinary Differential Equation
Authors:
Rachid Zeghlache,
Pierre-Henri Conze,
Mostafa El Habib Daho,
Yihao Li,
Hugo Le Boité,
Ramin Tadayoni,
Pascal Massin,
Béatrice Cochener,
Ikram Brahim,
Gwenolé Quellec,
Mathieu Lamard
Abstract:
Longitudinal analysis in medical imaging is crucial to investigate the progressive changes in anatomical structures or disease progression over time. In recent years, a novel class of algorithms has emerged with the goal of learning disease progression in a self-supervised manner, using either pairs of consecutive images or time series of images. By capturing temporal patterns without external labels or supervision, longitudinal self-supervised learning (LSSL) has become a promising avenue. To better understand this core method, we explore in this paper the LSSL algorithm under different scenarios. The original LSSL is embedded in an auto-encoder (AE) structure. However, conventional self-supervised strategies are usually implemented in a Siamese-like manner. Therefore, as a first novelty, we explore the use of Siamese-like LSSL in this study. Another core framework considered here is the neural ordinary differential equation (NODE). NODE is a neural network architecture that learns the dynamics of ordinary differential equations (ODE) through the use of neural networks. Many temporal systems, including disease progression, can be described by ODEs. We believe that there is an interesting connection to make between LSSL and NODE. This paper aims to provide a better understanding of those core algorithms for learning disease progression with the mentioned changes. In our experiments, we employ a longitudinal dataset, named OPHDIAT, targeting diabetic retinopathy (DR) follow-up. Our results demonstrate the application of LSSL without including a reconstruction term, as well as the potential of incorporating NODE in conjunction with LSSL.
Submitted 16 October, 2023;
originally announced October 2023.
-
LMT: Longitudinal Mixing Training, a Framework to Predict Disease Progression from a Single Image
Authors:
Rachid Zeghlache,
Pierre-Henri Conze,
Mostafa El Habib Daho,
Yihao Li,
Hugo Le Boité,
Ramin Tadayoni,
Pascal Massin,
Béatrice Cochener,
Ikram Brahim,
Gwenolé Quellec,
Mathieu Lamard
Abstract:
Longitudinal imaging is able to capture both static anatomical structures and dynamic changes in disease progression toward earlier and better patient-specific pathology management. However, conventional approaches rarely take advantage of longitudinal information for detection and prediction purposes, especially for Diabetic Retinopathy (DR). In the past years, Mix-up training and pretext tasks with longitudinal context have effectively enhanced DR classification results and captured disease progression. In the meantime, a novel type of neural network named Neural Ordinary Differential Equation (NODE) has been proposed for solving ordinary differential equations, with a neural network treated as a black box. By definition, NODE is well suited for solving time-related problems. In this paper, we propose to combine these three aspects to detect and predict DR progression. Our framework, Longitudinal Mixing Training (LMT), can be considered both as a regularizer and as a pretext task that encodes the disease progression in the latent space. Additionally, we evaluate the trained model weights on a downstream task with a longitudinal context using standard and longitudinal pretext tasks. We introduce a new way to train time-aware models using $t_{mix}$, a weighted average time between two consecutive examinations. We compare our approach to standard mixing training on DR classification using OPHDIAT, a longitudinal retinal Color Fundus Photographs (CFP) dataset. We were able to predict whether an eye would develop severe DR at the following visit using a single image, with an AUC of 0.798 compared to baseline results of 0.641. Our results indicate that our longitudinal pretext task can learn the progression of DR and that introducing $t_{mix}$ augmentation is beneficial for time-aware models.
Submitted 16 October, 2023;
originally announced October 2023.
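The abstract above describes $t_{mix}$ only as a weighted average time between two consecutive examinations. The sketch below shows one way such a quantity could be computed alongside a standard Mix-up of two images. The Beta-distributed weight, the `alpha` value, and the function name `mix_pair` are assumptions borrowed from generic Mix-up practice and are not confirmed by the abstract.

```python
import random
from typing import List, Optional, Tuple


def mix_pair(
    x1: List[float],
    x2: List[float],
    t1: float,
    t2: float,
    alpha: float = 0.4,  # assumed Mix-up hyperparameter, not from the abstract
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float, float]:
    """Mix two consecutive examinations and their acquisition times.

    x1, x2 are flattened images (or feature vectors) of equal length;
    t1, t2 are the corresponding examination times in any common unit.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    t_mix = lam * t1 + (1.0 - lam) * t2  # weighted average time
    return x_mix, t_mix, lam
```

With the same weight applied to images and times, the mixed sample sits at a time between the two visits, which is what lets a time-aware model consume it.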
-
Improved Automatic Diabetic Retinopathy Severity Classification Using Deep Multimodal Fusion of UWF-CFP and OCTA Images
Authors:
Mostafa El Habib Daho,
Yihao Li,
Rachid Zeghlache,
Yapo Cedric Atse,
Hugo Le Boité,
Sophie Bonnin,
Deborah Cosette,
Pierre Deman,
Laurent Borderie,
Capucine Lepicard,
Ramin Tadayoni,
Béatrice Cochener,
Pierre-Henri Conze,
Mathieu Lamard,
Gwenolé Quellec
Abstract:
Diabetic Retinopathy (DR), a prevalent and severe complication of diabetes, affects millions of individuals globally, underscoring the need for accurate and timely diagnosis. Recent advancements in imaging technologies, such as Ultra-WideField Color Fundus Photography (UWF-CFP) imaging and Optical Coherence Tomography Angiography (OCTA), provide opportunities for the early detection of DR but also pose significant challenges given the disparate nature of the data they produce. This study introduces a novel multimodal approach that leverages these imaging modalities to notably enhance DR classification. Our approach integrates 2D UWF-CFP images and 3D high-resolution 6x6 mm$^3$ OCTA (both structure and flow) images using a fusion of ResNet50 and 3D-ResNet50 models, with Squeeze-and-Excitation (SE) blocks to amplify relevant features. Additionally, to increase the model's generalization capabilities, a multimodal extension of Manifold Mixup, applied to concatenated multimodal features, is implemented. Experimental results demonstrate a remarkable enhancement in DR classification performance with the proposed multimodal approach compared to methods relying on a single modality only. The methodology laid out in this work holds substantial promise for facilitating more accurate, early detection of DR, potentially improving clinical outcomes for patients.
Submitted 3 October, 2023;
originally announced October 2023.
-
Cross-dimensional transfer learning in medical image segmentation with deep learning
Authors:
Hicham Messaoudi,
Ahror Belaid,
Douraied Ben Salem,
Pierre-Henri Conze
Abstract:
Over the last decade, convolutional neural networks have emerged and advanced the state-of-the-art in various image analysis and computer vision applications. The performance of 2D image classification networks is constantly improving and being trained on databases made of millions of natural images. However, progress in medical image analysis has been hindered by limited annotated data and acquisition constraints. These limitations are even more pronounced given the volumetry of medical imaging data. In this paper, we introduce an efficient way to transfer the efficiency of a 2D classification network trained on natural images to 2D, 3D uni- and multi-modal medical image segmentation applications. In this direction, we designed novel architectures based on two key principles: weight transfer by embedding a 2D pre-trained encoder into a higher dimensional U-Net, and dimensional transfer by expanding a 2D segmentation network into a higher dimension one. The proposed networks were tested on benchmarks comprising different modalities: MR, CT, and ultrasound images. Our 2D network ranked first on the CAMUS challenge dedicated to echo-cardiographic data segmentation and surpassed the state-of-the-art. Regarding 2D/3D MR and CT abdominal images from the CHAOS challenge, our approach largely outperformed the other 2D-based methods described in the challenge paper on Dice, RAVD, ASSD, and MSSD scores and ranked third on the online evaluation platform. Our 3D network applied to the BraTS 2022 competition also achieved promising results, reaching an average Dice score of 91.69% (91.22%) for the whole tumor, 83.23% (84.77%) for the tumor core, and 81.75% (83.88%) for enhanced tumor using the approach based on weight (dimensional) transfer. Experimental and qualitative results illustrate the effectiveness of our methods for multi-dimensional medical image segmentation.
Submitted 28 July, 2023;
originally announced July 2023.
-
Cross-modal tumor segmentation using generative blending augmentation and self training
Authors:
Guillaume Sallé,
Pierre-Henri Conze,
Julien Bert,
Nicolas Boussion,
Dimitris Visvikis,
Vincent Jaouen
Abstract:
Objectives: Data scarcity and domain shifts lead to biased training sets that do not accurately represent deployment conditions. A related practical problem is cross-modal image segmentation, where the objective is to segment unlabelled images using previously labelled datasets from other imaging modalities. Methods: We propose a cross-modal segmentation method based on conventional image synthesis boosted by a new data augmentation technique called Generative Blending Augmentation (GBA). GBA leverages a SinGAN model to learn representative generative features from a single training image in order to realistically diversify tumor appearances. This way, we compensate for image synthesis errors, subsequently improving the generalization power of a downstream segmentation model. The proposed augmentation is further combined with an iterative self-training procedure leveraging pseudo labels at each pass. Results: The proposed solution ranked first for vestibular schwannoma (VS) segmentation during the validation and test phases of the MICCAI CrossMoDA 2022 challenge, with best mean Dice similarity and average symmetric surface distance measures. Conclusion and significance: Local contrast alteration of tumor appearances and iterative self-training with pseudo labels are likely to lead to performance improvements in a variety of segmentation contexts.
Submitted 29 March, 2024; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Multimodal Information Fusion For The Diagnosis Of Diabetic Retinopathy
Authors:
Yihao Li,
Hassan Al Hajj,
Pierre-Henri Conze,
Mostafa El Habib Daho,
Sophie Bonnin,
Hugang Ren,
Niranchana Manivannan,
Stephanie Magazzeni,
Ramin Tadayoni,
Mathieu Lamard,
Gwenolé Quellec
Abstract:
Diabetes is a chronic disease characterized by excess sugar in the blood and affects 422 million people worldwide, including 3.3 million in France. One of the frequent complications of diabetes is diabetic retinopathy (DR): it is the leading cause of blindness in the working population of developed countries. As a result, ophthalmology is on the verge of a revolution in screening, diagnosing, and managing pathologies. This upheaval is led by the arrival of technologies based on artificial intelligence. The "Evaluation intelligente de la rétinopathie diabétique" (EviRed) project uses artificial intelligence to answer a medical need: replacing the current classification of diabetic retinopathy, which is mainly based on outdated fundus photography and provides insufficient prediction precision. EviRed exploits modern fundus imaging devices and artificial intelligence to properly integrate the vast amount of data they provide with other available medical data of the patient. The goal is to improve diagnosis and prediction and help ophthalmologists make better decisions during diabetic retinopathy follow-up. In this study, we investigate the fusion of different modalities acquired simultaneously with a PLEXElite 9000 (Carl Zeiss Meditec Inc., Dublin, California, USA), namely 3-D structural optical coherence tomography (OCT), 3-D OCT angiography (OCTA) and 2-D Line Scanning Ophthalmoscope (LSO), for the automatic detection of proliferative DR.
Submitted 20 March, 2023;
originally announced April 2023.
-
Segmentation, Classification, and Quality Assessment of UW-OCTA Images for the Diagnosis of Diabetic Retinopathy
Authors:
Yihao Li,
Rachid Zeghlache,
Ikram Brahim,
Hui Xu,
Yubo Tan,
Pierre-Henri Conze,
Mathieu Lamard,
Gwenolé Quellec,
Mostafa El Habib Daho
Abstract:
Diabetic Retinopathy (DR) is a severe complication of diabetes that can cause blindness. Although effective treatments exist (notably laser) to slow the progression of the disease and prevent blindness, the best treatment remains prevention through regular check-ups (at least once a year) with an ophthalmologist. Optical Coherence Tomography Angiography (OCTA) allows for the visualization of the retinal vascularization and the choroid at the microvascular level in great detail. This allows doctors to diagnose DR with more precision. In recent years, algorithms for DR diagnosis have emerged along with the development of deep learning and the improvement of computer hardware. However, these usually focus on retina photography. There are no current methods that can automatically analyze DR using Ultra-Wide OCTA (UW-OCTA). The Diabetic Retinopathy Analysis Challenge 2022 (DRAC22) provides a standardized UW-OCTA dataset to train and test the effectiveness of various algorithms on three tasks: lesion segmentation, quality assessment, and DR grading. In this paper, we present our solutions for the three tasks of the DRAC22 challenge. The obtained results are promising and have allowed us to position ourselves in the TOP 5 of the segmentation task, the TOP 4 of the quality assessment task, and the TOP 3 of the DR grading task. The code is available at \url{https://github.com/Mostafa-EHD/Diabetic_Retinopathy_OCTA}.
Submitted 21 November, 2022;
originally announced November 2022.
-
Multimodal Information Fusion for Glaucoma and DR Classification
Authors:
Yihao Li,
Mostafa El Habib Daho,
Pierre-Henri Conze,
Hassan Al Hajj,
Sophie Bonnin,
Hugang Ren,
Niranchana Manivannan,
Stephanie Magazzeni,
Ramin Tadayoni,
Béatrice Cochener,
Mathieu Lamard,
Gwenolé Quellec
Abstract:
Multimodal information is frequently available in medical tasks. By combining information from multiple sources, clinicians are able to make more accurate judgments. In recent years, multiple imaging techniques have been used in clinical practice for retinal analysis: 2D fundus photographs, 3D optical coherence tomography (OCT) and 3D OCT angiography, etc. Our paper investigates three multimodal information fusion strategies based on deep learning to solve retinal analysis tasks: early fusion, intermediate fusion, and hierarchical fusion. The commonly used early and intermediate fusions are simple but do not fully exploit the complementary information between modalities. We developed a hierarchical fusion approach that focuses on combining features across multiple dimensions of the network, as well as exploring the correlation between modalities. These approaches were applied to glaucoma and diabetic retinopathy classification, using the public GAMMA dataset (fundus photographs and OCT) and a private dataset of PlexElite 9000 (Carl Zeiss Meditec Inc.) OCT angiography acquisitions, respectively. Our hierarchical fusion method performed the best in both cases and paved the way for better clinical diagnosis.
Submitted 5 September, 2022; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Detection of diabetic retinopathy using longitudinal self-supervised learning
Authors:
Rachid Zeghlache,
Pierre-Henri Conze,
Mostafa El Habib Daho,
Ramin Tadayoni,
Pascal Massin,
Béatrice Cochener,
Gwenolé Quellec,
Mathieu Lamard
Abstract:
Longitudinal imaging is able to capture both static anatomical structures and dynamic changes in disease progression towards earlier and better patient-specific pathology management. However, conventional approaches for detecting diabetic retinopathy (DR) rarely take advantage of longitudinal information to improve DR analysis. In this work, we investigate the benefit of exploiting self-supervised learning with a longitudinal nature for DR diagnosis purposes. We compare different longitudinal self-supervised learning (LSSL) methods to model the disease progression from longitudinal retinal color fundus photographs (CFP) to detect early DR severity changes using a pair of consecutive exams. The experiments were conducted on a longitudinal DR screening dataset with or without those trained encoders (LSSL) acting as a longitudinal pretext task. Results achieve an AUC of 0.875 for the baseline (model trained from scratch) and an AUC of 0.96 (95% CI: 0.9593-0.9655 DeLong test) with a p-value < 2.2e-16 on early fusion using a simple ResNet-like architecture with frozen LSSL weights, suggesting that the LSSL latent space can encode the dynamics of DR progression.
Submitted 24 March, 2024; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Mapping the ocular surface from monocular videos with an application to dry eye disease grading
Authors:
Ikram Brahim,
Mathieu Lamard,
Anas-Alexis Benyoussef,
Pierre-Henri Conze,
Béatrice Cochener,
Divi Cornec,
Gwenolé Quellec
Abstract:
With a prevalence of 5 to 50%, Dry Eye Disease (DED) is one of the leading reasons for ophthalmologist consultations. The diagnosis and quantification of DED usually rely on ocular surface analysis through slit-lamp examinations. However, evaluations are subjective and non-reproducible. To improve the diagnosis, we propose to 1) track the ocular surface in 3-D using video recordings acquired during examinations, and 2) grade the severity using registered frames. Our registration method uses unsupervised image-to-depth learning. These methods learn depth from lights and shadows and estimate pose based on depth maps. However, DED examinations present unresolved challenges including a moving light source, transparent ocular tissues, etc. To overcome these and estimate the ego-motion, we implement joint CNN architectures with multiple losses incorporating prior known information, namely the shape of the eye, through semantic segmentation as well as sphere fitting. The achieved tracking errors outperform the state-of-the-art, with a mean Euclidean distance as low as 0.48% of the image width on our test set. This registration improves the DED severity classification by a 0.20 AUC difference. The proposed approach is the first to address DED diagnosis with supervision from monocular videos.
Submitted 5 September, 2022; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Generalizable multi-task, multi-domain deep segmentation of sparse pediatric imaging datasets via multi-scale contrastive regularization and multi-joint anatomical priors
Authors:
Arnaud Boutillon,
Pierre-Henri Conze,
Christelle Pons,
Valérie Burdin,
Bhushan Borotikar
Abstract:
Clinical diagnosis of the pediatric musculoskeletal system relies on the analysis of medical imaging examinations. In the medical image processing pipeline, semantic segmentation using deep learning algorithms enables an automatic generation of patient-specific three-dimensional anatomical models which are crucial for morphological evaluation. However, the scarcity of pediatric imaging resources may result in reduced accuracy and generalization performance of individual deep segmentation models. In this study, we propose to design a novel multi-task, multi-domain learning framework in which a single segmentation network is optimized over the union of multiple datasets arising from distinct parts of the anatomy. Unlike previous approaches, we simultaneously consider multiple intensity domains and segmentation tasks to overcome the inherent scarcity of pediatric data while leveraging shared features between imaging datasets. To further improve generalization capabilities, we employ a transfer learning scheme from natural image classification, along with a multi-scale contrastive regularization aimed at promoting domain-specific clusters in the shared representations, and multi-joint anatomical priors to enforce anatomically consistent predictions. We evaluate our contributions for performing bone segmentation using three scarce and pediatric imaging datasets of the ankle, knee, and shoulder joints. Our results demonstrate that the proposed approach outperforms individual, transfer, and shared segmentation schemes in Dice metric with statistically sufficient margins. The proposed model brings new perspectives towards intelligent use of imaging resources and better management of pediatric musculoskeletal disorders.
Submitted 27 July, 2022;
originally announced July 2022.
-
Regularized directional representations for medical image registration
Authors:
Vincent Jaouen,
Pierre-Henri Conze,
Guillaume Dardenne,
Julien Bert,
Dimitris Visvikis
Abstract:
In image registration, many efforts have been devoted to the development of alternatives to the popular normalized mutual information criterion. Concurrently with these efforts, an increasing number of works have demonstrated that substantial gains in registration accuracy can also be achieved by aligning structural representations of images rather than images themselves. Following this research path, we propose a new method for mono- and multimodal image registration based on the alignment of regularized vector fields derived from structural information such as gradient vector flow fields, a technique we call vector field similarity. Our approach can be combined in a straightforward fashion with any existing registration framework by substituting vector field similarity for intensity-based registration. In our experiments, we show that the proposed approach compares favourably with conventional image alignment on several public image datasets using a diversity of imaging modalities and anatomical locations.
Submitted 30 November, 2021;
originally announced November 2021.
-
Multi-Task, Multi-Domain Deep Segmentation with Shared Representations and Contrastive Regularization for Sparse Pediatric Datasets
Authors:
Arnaud Boutillon,
Pierre-Henri Conze,
Christelle Pons,
Valérie Burdin,
Bhushan Borotikar
Abstract:
Automatic segmentation of magnetic resonance (MR) images is crucial for morphological evaluation of the pediatric musculoskeletal system in clinical practice. However, the accuracy and generalization performance of individual segmentation models are limited due to the restricted amount of annotated pediatric data. Hence, we propose to train a segmentation model on multiple datasets, arising from different parts of the anatomy, in a multi-task and multi-domain learning framework. This approach makes it possible to overcome the inherent scarcity of pediatric data while benefiting from a more robust shared representation. The proposed segmentation network comprises shared convolutional filters, domain-specific batch normalization parameters that compute the respective dataset statistics and a domain-specific segmentation layer. Furthermore, a supervised contrastive regularization is integrated to further improve generalization capabilities, by promoting intra-domain similarity and imposing inter-domain margins in embedded space. We evaluate our contributions on two pediatric imaging datasets of the ankle and shoulder joints for bone segmentation. Results demonstrate that the proposed model outperforms state-of-the-art approaches.
Submitted 2 February, 2022; v1 submitted 21 May, 2021;
originally announced May 2021.
-
Multi-Structure Deep Segmentation with Shape Priors and Latent Adversarial Regularization
Authors:
Arnaud Boutillon,
Bhushan Borotikar,
Christelle Pons,
Valérie Burdin,
Pierre-Henri Conze
Abstract:
Automatic segmentation of the musculoskeletal system in pediatric magnetic resonance (MR) images is a challenging but crucial task for morphological evaluation in clinical practice. We propose a deep learning-based regularized segmentation method for multi-structure bone delineation in MR images, designed to overcome the inherent scarcity and heterogeneity of pediatric data. Based on a newly devised shape code discriminator, our adversarial regularization scheme enforces the deep network to follow a learnt shape representation of the anatomy. The novel shape priors based adversarial regularization (SPAR) exploits latent shape codes arising from ground truth and predicted masks to guide the segmentation network towards more consistent and plausible predictions. Our contribution is compared to state-of-the-art regularization methods on two pediatric musculoskeletal imaging datasets from ankle and shoulder joints.
Submitted 25 January, 2021;
originally announced January 2021.
-
Efficient embedding network for 3D brain tumor segmentation
Authors:
Hicham Messaoudi,
Ahror Belaid,
Mohamed Lamine Allaoui,
Ahcene Zetout,
Mohand Said Allili,
Souhil Tliba,
Douraied Ben Salem,
Pierre-Henri Conze
Abstract:
3D medical image processing with deep learning greatly suffers from a lack of data. Thus, studies carried out in this field are limited compared to works related to 2D natural image analysis, where very large datasets exist. As a result, powerful and efficient 2D convolutional neural networks have been developed and trained. In this paper, we investigate a way to transfer the performance of a two-dimensional classification network for the purpose of three-dimensional semantic segmentation of brain tumors. We propose an asymmetric U-Net network by incorporating the EfficientNet model as part of the encoding branch. As the input data is in 3D, the first layers of the encoder are devoted to the reduction of the third dimension in order to fit the input of the EfficientNet network. Experimental results on validation and test data from the BraTS 2020 challenge demonstrate that the proposed method achieves promising performance.
Submitted 22 November, 2020;
originally announced November 2020.
-
Multi-structure bone segmentation in pediatric MR images with combined regularization from shape priors and adversarial network
Authors:
Arnaud Boutillon,
Bhushan Borotikar,
Valérie Burdin,
Pierre-Henri Conze
Abstract:
Morphological and diagnostic evaluation of the pediatric musculoskeletal system is crucial in clinical practice. However, most segmentation models do not perform well on scarce pediatric imaging data. We propose a new pre-trained regularized convolutional encoder-decoder network for the challenging task of segmenting heterogeneous pediatric magnetic resonance (MR) images. To this end, we have conceived a novel optimization scheme for the segmentation network which comprises additional regularization terms to the loss function. In order to obtain globally consistent predictions, we incorporate a shape priors based regularization, derived from a non-linear shape representation learnt by an auto-encoder. Additionally, an adversarial regularization computed by a discriminator is integrated to encourage precise delineations. The proposed method is evaluated for the task of multi-bone segmentation on two scarce pediatric imaging datasets from ankle and shoulder joints, comprising pathological as well as healthy examinations. The proposed method performed either better than or on par with previously proposed approaches for Dice, sensitivity, specificity, maximum symmetric surface distance, average symmetric surface distance, and relative absolute volume difference metrics. We illustrate that the proposed approach can be easily integrated into various bone segmentation strategies and can improve the prediction accuracy of models pre-trained on large non-medical image databases. The obtained results bring new perspectives for the management of pediatric musculoskeletal disorders.
Submitted 12 July, 2022; v1 submitted 15 September, 2020;
originally announced September 2020.
-
ExplAIn: Explanatory Artificial Intelligence for Diabetic Retinopathy Diagnosis
Authors:
Gwenolé Quellec,
Hassan Al Hajj,
Mathieu Lamard,
Pierre-Henri Conze,
Pascale Massin,
Béatrice Cochener
Abstract:
In recent years, Artificial Intelligence (AI) has proven its relevance for medical decision support. However, the "black-box" nature of successful AI algorithms still holds back their wide-spread deployment. In this paper, we describe an eXplanatory Artificial Intelligence (XAI) that reaches the same level of performance as black-box AI, for the task of classifying Diabetic Retinopathy (DR) severity using Color Fundus Photography (CFP). This algorithm, called ExplAIn, learns to segment and categorize lesions in images; the final image-level classification directly derives from these multivariate lesion segmentations. The novelty of this explanatory framework is that it is trained from end to end, with image supervision only, just like black-box AI algorithms: the concepts of lesions and lesion categories emerge by themselves. For improved lesion localization, foreground/background separation is trained through self-supervision, in such a way that occluding foreground pixels transforms the input image into a healthy-looking image. The advantage of such an architecture is that automatic diagnoses can be explained simply by an image and/or a few sentences. ExplAIn is evaluated at the image level and at the pixel level on various CFP image datasets. We expect this new framework, which jointly offers high classification performance and explainability, to facilitate AI deployment.
Submitted 22 July, 2021; v1 submitted 13 August, 2020;
originally announced August 2020.
-
Two-stage multi-scale breast mass segmentation for full mammogram analysis without user intervention
Authors:
Yutong Yan,
Pierre-Henri Conze,
Gwenolé Quellec,
Mathieu Lamard,
Béatrice Cochener,
Gouenou Coatrieux
Abstract:
Mammography is the primary imaging modality used for early detection and diagnosis of breast cancer. X-ray mammogram analysis mainly refers to the localization of suspicious regions of interest followed by segmentation, towards further lesion classification into benign versus malignant. Among diverse types of breast abnormalities, masses are the most important clinical findings of breast carcinomas. However, manually segmenting breast masses from native mammograms is time-consuming and error-prone. Therefore, an integrated computer-aided diagnosis system is required to assist clinicians with automatic and precise breast mass delineation. In this work, we present a two-stage multi-scale pipeline that provides accurate mass contours from high-resolution full mammograms. First, we propose an extended deep detector integrating a multi-scale fusion strategy for automated mass localization. Second, a convolutional encoder-decoder network using nested and dense skip connections is employed to fine-delineate candidate masses. Unlike most previous studies based on segmentation from regions, our framework handles mass segmentation from native full mammograms without any user intervention. Trained on INbreast and DDSM-CBIS public datasets, the pipeline achieves an overall average Dice of 80.44% on INbreast test images, outperforming the state of the art. Our system shows promising accuracy as an automatic full-image mass segmentation system. Extensive experiments reveal robustness against the diversity of size, shape and appearance of breast masses, towards better interaction-free computer-aided diagnosis.
Submitted 8 December, 2020; v1 submitted 27 February, 2020;
originally announced February 2020.
-
Abdominal multi-organ segmentation with cascaded convolutional and adversarial deep networks
Authors:
Pierre-Henri Conze,
Ali Emre Kavur,
Emilie Cornec-Le Gall,
Naciye Sinem Gezer,
Yannick Le Meur,
M. Alper Selver,
François Rousseau
Abstract:
Objective: Abdominal anatomy segmentation is crucial for numerous applications from computer-assisted diagnosis to image-guided surgery. In this context, we address fully-automated multi-organ segmentation from abdominal CT and MR images using deep learning. Methods: The proposed model extends standard conditional generative adversarial networks. In addition to the discriminator, which pushes the model to create realistic organ delineations, it embeds cascaded partially pre-trained convolutional encoder-decoders as generator. Encoder fine-tuning from a large amount of non-medical images alleviates data scarcity limitations. The network is trained end-to-end to benefit from simultaneous multi-level segmentation refinements using auto-context. Results: Employed for healthy liver, kidneys and spleen segmentation, our pipeline provides promising results by outperforming state-of-the-art encoder-decoder schemes. Used for the Combined Healthy Abdominal Organ Segmentation (CHAOS) challenge organized in conjunction with the IEEE International Symposium on Biomedical Imaging 2019, it gave us the first rank for three competition categories: liver CT, liver MR and multi-organ MR segmentation. Conclusion: Combining cascaded convolutional and adversarial networks strengthens the ability of deep learning pipelines to automatically delineate multiple abdominal organs, with good generalization capability. Significance: The comprehensive evaluation provided suggests that better guidance could be achieved to help clinicians in abdominal image interpretation and clinical decision making.
Submitted 26 January, 2020;
originally announced January 2020.
-
CHAOS Challenge -- Combined (CT-MR) Healthy Abdominal Organ Segmentation
Authors:
A. Emre Kavur,
N. Sinem Gezer,
Mustafa Barış,
Sinem Aslan,
Pierre-Henri Conze,
Vladimir Groza,
Duc Duy Pham,
Soumick Chatterjee,
Philipp Ernst,
Savaş Özkan,
Bora Baydar,
Dmitry Lachinov,
Shuo Han,
Josef Pauli,
Fabian Isensee,
Matthias Perkonigg,
Rachana Sathish,
Ronnie Rajan,
Debdoot Sheet,
Gurbandurdy Dovletov,
Oliver Speck,
Andreas Nürnberger,
Klaus H. Maier-Hein,
Gözde Bozdağı Akar,
Gözde Ünal
, et al. (2 additional authors not shown)
Abstract:
Segmentation of abdominal organs has been a comprehensive, yet unresolved, research field for many years. In the last decade, intensive developments in deep learning (DL) have introduced new state-of-the-art segmentation systems. In order to expand the knowledge on these topics, the CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation challenge has been organized in conjunction with IEEE International Symposium on Biomedical Imaging (ISBI), 2019, in Venice, Italy. CHAOS provides both abdominal CT and MR data from healthy subjects for single and multiple abdominal organ segmentation. Five different but complementary tasks have been designed to analyze the capabilities of current approaches from multiple perspectives. The results are investigated thoroughly, compared with manual annotations and interactive methods. The analysis shows that the performance of DL models for single modality (CT / MR) can show reliable volumetric analysis performance (DICE: 0.98 ± 0.00 / 0.95 ± 0.01) but the best MSSD performance remains limited (21.89 ± 13.94 / 20.85 ± 10.63 mm). The performances of participating models decrease significantly for cross-modality tasks for the liver (DICE: 0.88 ± 0.15 MSSD: 36.33 ± 21.97 mm) and all organs (DICE: 0.85 ± 0.21 MSSD: 33.17 ± 38.93 mm). Despite contrary examples on different applications, multi-tasking DL models designed to segment all organs seem to perform worse compared to organ-specific ones (performance drop around 5%). Besides, such directions of further research for cross-modality segmentation would significantly support real-world clinical applications. Moreover, having more than 1500 participants, another important contribution of the paper is the analysis of shortcomings of challenge organizations such as the effects of multiple submissions and peeking phenomena.
Submitted 7 January, 2021; v1 submitted 17 January, 2020;
originally announced January 2020.
-
Combining Shape Priors with Conditional Adversarial Networks for Improved Scapula Segmentation in MR images
Authors:
Arnaud Boutillon,
Bhushan Borotikar,
Valérie Burdin,
Pierre-Henri Conze
Abstract:
This paper proposes an automatic method for scapula bone segmentation from Magnetic Resonance (MR) images using deep learning. The purpose of this work is to incorporate anatomical priors into a conditional adversarial framework, given a limited amount of heterogeneous annotated images. Our approach encourages the segmentation model to follow the global anatomical properties of the underlying anatomy through a learnt non-linear shape representation while the adversarial contribution refines the model by promoting realistic delineations. These contributions are evaluated on a dataset of 15 pediatric shoulder examinations, and compared to state-of-the-art architectures including UNet and recent derivatives. The significant improvements achieved bring new perspectives for the pre-operative management of musculo-skeletal diseases.
Submitted 23 January, 2020; v1 submitted 20 October, 2019;
originally announced October 2019.
-
Automatic detection of rare pathologies in fundus photographs using few-shot learning
Authors:
Gwenolé Quellec,
Mathieu Lamard,
Pierre-Henri Conze,
Pascale Massin,
Béatrice Cochener
Abstract:
In the last decades, large datasets of fundus photographs have been collected in diabetic retinopathy (DR) screening networks. Through deep learning, these datasets were used to train automatic detectors for DR and a few other frequent pathologies, with the goal to automate screening. One challenge limits the adoption of such systems so far: automatic detectors ignore rare conditions that ophthalmologists currently detect, such as papilledema or anterior ischemic optic neuropathy. The reason is that standard deep learning requires too many examples of these conditions. However, this limitation can be addressed with few-shot learning, a machine learning paradigm where a classifier has to generalize to a new category not seen in training, given only a few examples of this category. This paper presents a new few-shot learning framework that extends convolutional neural networks (CNNs), trained for frequent conditions, with an unsupervised probabilistic model for rare condition detection. It is based on the observation that CNNs often perceive photographs containing the same anomalies as similar, even though these CNNs were trained to detect unrelated conditions. This observation was based on the t-SNE visualization tool, which we decided to incorporate in our probabilistic model. Experiments on a dataset of 164,660 screening examinations from the OPHDIAT screening network show that 37 conditions, out of 41, can be detected with an area under the ROC curve (AUC) greater than 0.8 (average AUC: 0.938). In particular, this framework significantly outperforms other frameworks for detecting rare conditions, including multitask learning, transfer learning and Siamese networks, another few-shot learning solution. We expect these richer predictions to trigger the adoption of automated eye pathology screening, which will revolutionize clinical practice in ophthalmology.
Submitted 10 February, 2020; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Unsupervised learning-based long-term superpixel tracking
Authors:
Pierre-Henri Conze,
Florian Tilquin,
Mathieu Lamard,
Fabrice Heitz,
Gwenolé Quellec
Abstract:
Finding correspondences between structural entities decomposing images is of high interest for computer vision applications. In particular, we analyze how to accurately track superpixels - visual primitives generated by aggregating adjacent pixels sharing similar characteristics - over extended time periods relying on unsupervised learning and temporal integration. A two-step video processing pipeline dedicated to long-term superpixel tracking is proposed. First, unsupervised learning-based superpixel matching provides correspondences between consecutive and distant frames using new context-rich features extended from greyscale to multi-channel and forward-backward consistency constraints. Resulting elementary matches are then combined along multi-step paths running through the whole sequence with various inter-frame distances. This produces a large set of candidate long-term superpixel pairings upon which majority voting is performed. Video object tracking experiments demonstrate the accuracy of our elementary estimator against state-of-the-art methods and prove the ability of multi-step integration to provide accurate long-term superpixel matches compared to usual direct and sequential integration.
Submitted 25 February, 2019;
originally announced February 2019.
-
Healthy versus pathological learning transferability in shoulder muscle MRI segmentation using deep convolutional encoder-decoders
Authors:
Pierre-Henri Conze,
Sylvain Brochard,
Valérie Burdin,
Frances T. Sheehan,
Christelle Pons
Abstract:
Automatic segmentation of pathological shoulder muscles in patients with musculo-skeletal diseases is a challenging task due to the huge variability in muscle shape, size, location, texture and injury. A reliable fully-automated segmentation method from magnetic resonance images could greatly help clinicians to plan therapeutic interventions and predict interventional outcomes while eliminating time-consuming manual segmentation efforts. The purpose of this work is three-fold. First, we investigate the feasibility of pathological shoulder muscle segmentation using deep learning techniques, given a very limited amount of available annotated pediatric data. Second, we address the learning transferability from healthy to pathological data by comparing different learning schemes in terms of model generalizability. Third, extended versions of deep convolutional encoder-decoder architectures using encoders pre-trained on non-medical data are proposed to improve the segmentation accuracy. Methodological aspects are evaluated in a leave-one-out fashion on a dataset of 24 shoulder examinations from patients with obstetrical brachial plexus palsy and focus on 4 different muscles: the deltoid as well as the infraspinatus, supraspinatus and subscapularis from the rotator cuff. The most relevant segmentation model is partially pre-trained on ImageNet and jointly exploits inter-patient healthy and pathological annotated data. Its performance reaches Dice scores of 82.4%, 82.0%, 71.0% and 82.8% for deltoid, infraspinatus, supraspinatus and subscapularis muscles. Absolute surface estimation errors are all below 83 mm$^2$ except for supraspinatus with 134.6 mm$^2$. These contributions offer new perspectives for force inference in the context of musculo-skeletal disorder management.
Submitted 27 April, 2020; v1 submitted 6 January, 2019;
originally announced January 2019.
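The Dice score reported in the abstract above measures the overlap between a predicted and a reference mask as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition using NumPy; the function name and the convention of returning 1.0 for two empty masks are assumptions for this example, not details taken from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks are empty: treated here as perfect agreement (assumption).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


# Toy 2x2 masks: the prediction covers two pixels, the reference one of them.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
score = dice_score(pred, target)
```

Multiplying the returned value by 100 gives a percentage comparable in form to the figures quoted in the abstract.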
-
Adaptive strategy for superpixel-based region-growing image segmentation
Authors:
Mahaman Sani Chaibou,
Pierre-Henri Conze,
Karim Kalti,
Basel Solaiman,
Mohamed Ali Mahjoub
Abstract:
This work presents a region-growing image segmentation approach based on superpixel decomposition. From an initial contour-constrained over-segmentation of the input image, the image segmentation is achieved by iteratively merging similar superpixels into regions. This approach raises two key issues: (1) how to compute the similarity between superpixels in order to perform accurate merging and (2) in which order those superpixels must be merged together. To address these, we first introduce a robust adaptive multi-scale superpixel similarity in which region comparisons are made both at content and common border level. Second, we propose a global merging strategy to efficiently guide the region merging process. This strategy uses an adaptive merging criterion to ensure that the best region aggregations are given the highest priority. This yields a final segmentation into consistent regions with strong boundary adherence. We perform experiments on the BSDS500 image dataset to show the extent to which our method compares favorably against other well-known image segmentation algorithms. The obtained results demonstrate the promising potential of the proposed approach.
Submitted 17 March, 2018;
originally announced March 2018.
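The idea of merging the best region pairs first, mentioned in the abstract above, can be sketched with a priority queue. This is only a generic best-first merging loop: the dissimilarity used here (absolute difference of mean intensities) and the fixed `threshold` are simple stand-ins, not the adaptive multi-scale similarity or merging criterion of the paper, and all names are hypothetical.

```python
import heapq


def merge_regions(means, sizes, edges, threshold):
    """Greedy best-first merging of adjacent regions.

    means, sizes: dicts mapping integer region id -> mean intensity / pixel count.
    edges: list of (a, b) pairs of adjacent region ids.
    threshold: merging stops once the best pair is more dissimilar than this.
    Returns a dict mapping each original id to its final region id.
    """
    means, sizes = dict(means), dict(sizes)
    parent = {r: r for r in means}
    version = {r: 0 for r in means}  # bumped whenever a region changes
    neighbors = {r: set() for r in means}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    heap = [(abs(means[a] - means[b]), a, b, 0, 0) for a, b in edges]
    heapq.heapify(heap)
    while heap:
        cost, a, b, va, vb = heapq.heappop(heap)
        # Skip entries that refer to merged or since-modified regions.
        if find(a) != a or find(b) != b:
            continue
        if version[a] != va or version[b] != vb:
            continue
        if cost > threshold:
            break  # every remaining valid pair is at least this dissimilar
        # Merge b into a and update the size-weighted mean.
        total = sizes[a] + sizes[b]
        means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[a] = total
        parent[b] = a
        version[a] += 1
        merged = {find(n) for n in neighbors[a] | neighbors[b]} - {a}
        neighbors[a] = merged
        for n in merged:
            neighbors[n].add(a)
            heapq.heappush(heap, (abs(means[a] - means[n]), a, n, version[a], version[n]))
    return {r: find(r) for r in parent}


# Four superpixels in a row: two dark ones followed by two bright ones.
labels = merge_regions(
    means={0: 10, 1: 12, 2: 50, 3: 52},
    sizes={0: 1, 1: 1, 2: 1, 3: 1},
    edges=[(0, 1), (1, 2), (2, 3)],
    threshold=5,
)
```

The version counters let outdated queue entries be discarded lazily instead of being removed from the heap, a common design choice when priorities change after each merge.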
-
Monitoring tool usage in surgery videos using boosted convolutional and recurrent neural networks
Authors:
Hassan Al Hajj,
Mathieu Lamard,
Pierre-Henri Conze,
Béatrice Cochener,
Gwenolé Quellec
Abstract:
This paper investigates the automatic monitoring of tool usage during surgery, with potential applications in report generation, surgical training and real-time decision support. Two surgeries are considered: cataract surgery, the most common surgical procedure, and cholecystectomy, one of the most common digestive surgeries. Tool usage is monitored in videos recorded either through a microscope (cataract surgery) or an endoscope (cholecystectomy). Following state-of-the-art video analysis solutions, each frame of the video is analyzed by convolutional neural networks (CNNs) whose outputs are fed to recurrent neural networks (RNNs) in order to take temporal relationships between events into account. The novelty lies in the way those CNNs and RNNs are trained. Computational complexity prevents the end-to-end training of "CNN+RNN" systems. Therefore, CNNs are usually trained first, independently from the RNNs. This approach is clearly suboptimal for surgical tool analysis: many tools are very similar to one another, but they can generally be differentiated based on past events. CNNs should be trained to extract the most useful visual features in combination with the temporal context. A novel boosting strategy is proposed to achieve this goal: the CNN and RNN parts of the system are simultaneously enriched by progressively adding weak classifiers (either CNNs or RNNs) trained to improve the overall classification accuracy. Experiments were performed on a dataset of 50 cataract surgery videos and a dataset of 80 cholecystectomy videos. Very good classification performance is achieved on both datasets: tool usage could be labeled with an average area under the ROC curve of $A_z = 0.9961$ and $A_z = 0.9939$, respectively, in offline mode (using past, present and future information), and $A_z = 0.9957$ and $A_z = 0.9936$, respectively, in online mode (using past and present information only).
Submitted 6 May, 2018; v1 submitted 4 October, 2017;
originally announced October 2017.
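The area under the ROC curve ($A_z$) quoted in the abstract above can be read as the probability that a frame where a tool is present receives a higher score than a frame where it is absent. The sketch below computes it by direct pairwise comparison for one tool; it illustrates the standard metric only, and the function name and toy data are assumptions, not material from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy per-frame scores for one tool and the matching ground-truth labels.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

The pairwise form is quadratic in the number of frames, so a rank-based computation would be preferable for full-length videos; it is used here only because it makes the definition explicit.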