Search | arXiv e-print repository

doi 10.59275/j.melba.2024-7fgd

Automated MRI Quality Assessment of Brain T1-weighted MRI in Clinical Data Warehouses: A Transfer Learning Approach Relying on Artefact Simulation

Authors: Sophie Loizillon, Simona Bottani, Stéphane Mabille, Yannick Jacob, Aurélien Maire, Sebastian Ströer, Didier Dormont, Olivier Colliot, Ninon Burgos

Abstract: The emergence of clinical data warehouses (CDWs), which contain the medical data of millions of patients, has paved the way for vast data sharing for research. The quality of MRIs gathered in CDWs differs greatly from what is observed in research settings and reflects a certain clinical reality. Consequently, a significant proportion of these images turns out to be unusable due to their poor quali… ▽ More The emergence of clinical data warehouses (CDWs), which contain the medical data of millions of patients, has paved the way for vast data sharing for research. The quality of MRIs gathered in CDWs differs greatly from what is observed in research settings and reflects a certain clinical reality. Consequently, a significant proportion of these images turns out to be unusable due to their poor quality. Given the massive volume of MRIs contained in CDWs, the manual rating of image quality is impossible. Thus, it is necessary to develop an automated solution capable of effectively identifying corrupted images in CDWs. This study presents an innovative transfer learning method for automated quality control of 3D gradient echo T1-weighted brain MRIs within a CDW, leveraging artefact simulation. We first intentionally corrupt images from research datasets by inducing poorer contrast, adding noise and introducing motion artefacts. Subsequently, three artefact-specific models are pre-trained using these corrupted images to detect distinct types of artefacts. Finally, the models are generalised to routine clinical data through a transfer learning technique, utilising 3660 manually annotated images. The overall image quality is inferred from the results of the three models, each designed to detect a specific type of artefact. Our method was validated on an independent test set of 385 3D gradient echo T1-weighted MRIs. Our proposed approach achieved excellent results for the detection of bad quality MRIs, with a balanced accuracy of over 87%, surpassing our previous approach by 3.5 percent points. Additionally, we achieved a satisfactory balanced accuracy of 79% for the detection of moderate quality MRIs, outperforming our previous performance by 5 percent points. Our framework provides a valuable tool for exploiting the potential of MRIs in CDWs. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:012

Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

arXiv:2401.16363 [pdf, other]

doi 10.59275/j.melba.2024-b87a

Evaluation of pseudo-healthy image reconstruction for anomaly detection with deep generative models: Application to brain FDG PET

Authors: Ravi Hassanaly, Camille Brianceau, Maëlys Solal, Olivier Colliot, Ninon Burgos

Abstract: Over the past years, pseudo-healthy reconstruction for unsupervised anomaly detection has gained in popularity. This approach has the great advantage of not requiring tedious pixel-wise data annotation and offers possibility to generalize to any kind of anomalies, including that corresponding to rare diseases. By training a deep generative model with only images from healthy subjects, the model wi… ▽ More Over the past years, pseudo-healthy reconstruction for unsupervised anomaly detection has gained in popularity. This approach has the great advantage of not requiring tedious pixel-wise data annotation and offers possibility to generalize to any kind of anomalies, including that corresponding to rare diseases. By training a deep generative model with only images from healthy subjects, the model will learn to reconstruct pseudo-healthy images. This pseudo-healthy reconstruction is then compared to the input to detect and localize anomalies. The evaluation of such methods often relies on a ground truth lesion mask that is available for test data, which may not exist depending on the application. We propose an evaluation procedure based on the simulation of realistic abnormal images to validate pseudo-healthy reconstruction methods when no ground truth is available. This allows us to extensively test generative models on different kinds of anomalies and measuring their performance using the pair of normal and abnormal images corresponding to the same subject. It can be used as a preliminary automatic step to validate the capacity of a generative model to reconstruct pseudo-healthy images, before a more advanced validation step that would require clinician's expertise. We apply this framework to the reconstruction of 3D brain FDG PET using a convolutional variational autoencoder with the aim to detect as early as possible the neurodegeneration markers that are specific to dementia such as Alzheimer's disease. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:003

Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

arXiv:2310.11470 [pdf, other]

Classic machine learning methods

Authors: Johann Faouzi, Olivier Colliot

Abstract: In this chapter, we present the main classic machine learning methods. A large part of the chapter is devoted to supervised learning techniques for classification and regression, including nearest-neighbor methods, linear and logistic regressions, support vector machines and tree-based algorithms. We also describe the problem of overfitting as well as strategies to overcome it. We finally provide… ▽ More In this chapter, we present the main classic machine learning methods. A large part of the chapter is devoted to supervised learning techniques for classification and regression, including nearest-neighbor methods, linear and logistic regressions, support vector machines and tree-based algorithms. We also describe the problem of overfitting as well as strategies to overcome it. We finally provide a brief overview of unsupervised learning methods, namely for clustering and dimensionality reduction. △ Less

Submitted 24 May, 2023; originally announced October 2023.

arXiv:2307.10926 [pdf, other]

Confidence intervals for performance estimates in 3D medical image segmentation

Authors: R. El Jurdi, G. Varoquaux, O. Colliot

Abstract: Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure… ▽ More Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard-deviation across of the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied, and it differs by the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in medical image segmentation. We carry experiments on 3D image segmentation using the standard nnU-net framework, two datasets from the Medical Decathlon challenge and two performance measures: the Dice accuracy and the Hausdorff distance. We show that the parametric confidence intervals are reasonable approximations of the bootstrap estimates for varying test set sizes and spread of the performance metric. Importantly, we show that the test size needed to achieve a given precision is often much lower than for classification tasks. Typically, a 1% wide confidence interval requires about 100-200 test samples when the spread is low (standard-deviation around 3%). More difficult segmentation tasks may lead to higher spreads and require over 1000 samples. △ Less

Submitted 21 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: 10 pages

arXiv:2302.12980 [pdf, other]

Frequency Disentangled Learning for Segmentation of Midbrain Structures from Quantitative Susceptibility Map** Data

Authors: Guanghui Fu, Gabriel Jimenez, Sophie Loizillon, Lydia Chougar, Didier Dormont, Romain Valabregue, Ninon Burgos, Stéphane Lehéricy, Daniel Racoceanu, Olivier Colliot, the ICEBERG Study Group

Abstract: One often lacks sufficient annotated samples for training deep segmentation models. This is in particular the case for less common imaging modalities such as Quantitative Susceptibility Map** (QSM). It has been shown that deep models tend to fit the target function from low to high frequencies. One may hypothesize that such property can be leveraged for better training of deep learning models. I… ▽ More One often lacks sufficient annotated samples for training deep segmentation models. This is in particular the case for less common imaging modalities such as Quantitative Susceptibility Map** (QSM). It has been shown that deep models tend to fit the target function from low to high frequencies. One may hypothesize that such property can be leveraged for better training of deep learning models. In this paper, we exploit this property to propose a new training method based on frequency-domain disentanglement. It consists of two main steps: i) disentangling the image into high- and low-frequency parts and feature learning; ii) frequency-domain fusion to complete the task. The approach can be used with any backbone segmentation network. We apply the approach to the segmentation of the red and dentate nuclei from QSM data which is particularly relevant for the study of parkinsonian syndromes. We demonstrate that the proposed method provides considerable performance improvements for these tasks. We further applied it to three public datasets from the Medical Segmentation Decathlon (MSD) challenge. For two MSD tasks, it provided smaller but still substantial improvements (up to 7 points of Dice), especially under small training set situations. △ Less

Submitted 24 February, 2023; originally announced February 2023.

Comments: 10 pages

arXiv:2211.01353 [pdf, other]

Fourier Disentangled Multimodal Prior Knowledge Fusion for Red Nucleus Segmentation in Brain MRI

Authors: Guanghui Fu, Gabriel Jimenez, Sophie Loizillon, Rosana El Jurdi, Lydia Chougar, Didier Dormont, Romain Valabregue, Ninon Burgos, Stéphane Lehéricy, Daniel Racoceanu, Olivier Colliot, the ICEBERG Study Group

Abstract: Early and accurate diagnosis of parkinsonian syndromes is critical to provide appropriate care to patients and for inclusion in therapeutic trials. The red nucleus is a structure of the midbrain that plays an important role in these disorders. It can be visualized using iron-sensitive magnetic resonance imaging (MRI) sequences. Different iron-sensitive contrasts can be produced with MRI. Combining… ▽ More Early and accurate diagnosis of parkinsonian syndromes is critical to provide appropriate care to patients and for inclusion in therapeutic trials. The red nucleus is a structure of the midbrain that plays an important role in these disorders. It can be visualized using iron-sensitive magnetic resonance imaging (MRI) sequences. Different iron-sensitive contrasts can be produced with MRI. Combining such multimodal data has the potential to improve segmentation of the red nucleus. Current multimodal segmentation algorithms are computationally consuming, cannot deal with missing modalities and need annotations for all modalities. In this paper, we propose a new model that integrates prior knowledge from different contrasts for red nucleus segmentation. The method consists of three main stages. First, it disentangles the image into high-level information representing the brain structure, and low-frequency information representing the contrast. The high-frequency information is then fed into a network to learn anatomical features, while the list of multimodal low-frequency information is processed by another module. Finally, feature fusion is performed to complete the segmentation task. The proposed method was used with several iron-sensitive contrasts (iMag, QSM, R2*, SWI). Experiments demonstrate that our proposed model substantially outperforms a baseline UNet model when the training set size is very small. △ Less

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.14677 [pdf, other]

How precise are performance estimates for typical medical image segmentation tasks?

Authors: Rosana El Jurdi, Olivier Colliot

Abstract: An important issue in medical image processing is to be able to estimate not only the performances of algorithms but also the precision of the estimation of these performances. Reporting precision typically amounts to reporting standard-error of the mean (SEM) or equivalently confidence intervals. However, this is rarely done in medical image segmentation studies. In this paper, we aim to estimate… ▽ More An important issue in medical image processing is to be able to estimate not only the performances of algorithms but also the precision of the estimation of these performances. Reporting precision typically amounts to reporting standard-error of the mean (SEM) or equivalently confidence intervals. However, this is rarely done in medical image segmentation studies. In this paper, we aim to estimate what is the typical confidence that can be expected in such studies. To that end, we first perform experiments for Dice metric estimation using a standard deep learning model (U-net) and a classical task from the Medical Segmentation Decathlon. We extensively study precision estimation using both Gaussian assumption and bootstrap** (which does not require any assumption on the distribution). We then perform simulations for other test set sizes and performance spreads. Overall, our work shows that small test sets lead to wide confidence intervals (e.g. $\sim$8 points of Dice for 20 samples with $σ\simeq 10$). △ Less

Submitted 24 May, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2209.05097 [pdf, ps, other]

Reproducibility in machine learning for medical imaging

Authors: Olivier Colliot, Elina Thibeau-Sutre, Ninon Burgos

Abstract: Reproducibility is a cornerstone of science, as the replication of findings is the process through which they become knowledge. It is widely considered that many fields of science are undergoing a reproducibility crisis. This has led to the publications of various guidelines in order to improve research reproducibility. This didactic chapter intends at being an introduction to reproducibility fo… ▽ More Reproducibility is a cornerstone of science, as the replication of findings is the process through which they become knowledge. It is widely considered that many fields of science are undergoing a reproducibility crisis. This has led to the publications of various guidelines in order to improve research reproducibility. This didactic chapter intends at being an introduction to reproducibility for researchers in the field of machine learning for medical imaging. We first distinguish between different types of reproducibility. For each of them, we aim at defining it, at describing the requirements to achieve it and at discussing its utility. The chapter ends with a discussion on the benefits of reproducibility and with a plea for a non-dogmatic approach to this concept and its implementation in research practice. △ Less

Submitted 12 September, 2022; originally announced September 2022.

arXiv:2204.07005 [pdf, other]

Interpretability of Machine Learning Methods Applied to Neuroimaging

Authors: Elina Thibeau-Sutre, Sasha Collin, Ninon Burgos, Olivier Colliot

Abstract: Deep learning methods have become very popular for the processing of natural images, and were then successfully adapted to the neuroimaging field. As these methods are non-transparent, interpretability methods are needed to validate them and ensure their reliability. Indeed, it has been shown that deep learning models may obtain high performance even when using irrelevant features, by exploiting b… ▽ More Deep learning methods have become very popular for the processing of natural images, and were then successfully adapted to the neuroimaging field. As these methods are non-transparent, interpretability methods are needed to validate them and ensure their reliability. Indeed, it has been shown that deep learning models may obtain high performance even when using irrelevant features, by exploiting biases in the training set. Such undesirable situations can potentially be detected by using interpretability methods. Recently, many methods have been proposed to interpret neural networks. However, this domain is not mature yet. Machine learning users face two major issues when aiming to interpret their models: which method to choose, and how to assess its reliability? Here, we aim at providing answers to these questions by presenting the most common interpretability methods and metrics developed to assess their reliability, as well as their applications and benchmarks in the neuroimaging context. Note that this is not an exhaustive survey: we aimed to focus on the studies which we found to be the most representative and relevant. △ Less

Submitted 14 April, 2022; originally announced April 2022.

Journal ref: Machine Learning for Brain Disorders, 2022

arXiv:2109.03778 [pdf, other]

doi 10.1117/12.2612912

Axial multi-layer perceptron architecture for automatic segmentation of choroid plexus in multiple sclerosis

Authors: Marius Schmidt-Mengin, Vito A. G. Ricigliano, Benedetta Bodini, Emanuele Morena, Annalisa Colombi, Mariem Hamzaoui, Arya Yazdan Panah, Bruno Stankoff, Olivier Colliot

Abstract: Choroid plexuses (CP) are structures of the ventricles of the brain which produce most of the cerebrospinal fluid (CSF). Several postmortem and in vivo studies have pointed towards their role in the inflammatory process in multiple sclerosis (MS). Automatic segmentation of CP from MRI thus has high value for studying their characteristics in large cohorts of patients. To the best of our knowledge,… ▽ More Choroid plexuses (CP) are structures of the ventricles of the brain which produce most of the cerebrospinal fluid (CSF). Several postmortem and in vivo studies have pointed towards their role in the inflammatory process in multiple sclerosis (MS). Automatic segmentation of CP from MRI thus has high value for studying their characteristics in large cohorts of patients. To the best of our knowledge, the only freely available tool for CP segmentation is FreeSurfer but its accuracy for this specific structure is poor. In this paper, we propose to automatically segment CP from non-contrast enhanced T1-weighted MRI. To that end, we introduce a new model called "Axial-MLP" based on an assembly of Axial multi-layer perceptrons (MLPs). This is inspired by recent works which showed that the self-attention layers of Transformers can be replaced with MLPs. This approach is systematically compared with a standard 3D U-Net, nnU-Net, Freesurfer and FastSurfer. For our experiments, we make use of a dataset of 141 subjects (44 controls and 97 patients with MS). We show that all the tested deep learning (DL) methods outperform FreeSurfer (Dice around 0.7 for DL vs 0.33 for FreeSurfer). Axial-MLP is competitive with U-Nets even though it is slightly less accurate. The conclusions of our paper are two-fold: 1) the studied deep learning methods could be useful tools to study CP in large cohorts of MS patients; 2)~Axial-MLP is a potentially viable alternative to convolutional neural networks for such tasks, although it could benefit from further improvements. △ Less

Submitted 21 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

ACM Class: I.2

Journal ref: Proc. SPIE Medical Imaging 2022

arXiv:2104.08131 [pdf, other]

Automatic quality control of brain T1-weighted magnetic resonance images for a clinical data warehouse

Authors: Simona Bottani, Ninon Burgos, Aurélien Maire, Adam Wild, Sebastian Ströer, Didier Dormont, Olivier Colliot

Abstract: Many studies on machine learning (ML) for computer-aided diagnosis have so far been mostly restricted to high-quality research data. Clinical data warehouses, gathering routine examinations from hospitals, offer great promises for training and validation of ML models in a realistic setting. However, the use of such clinical data warehouses requires quality control (QC) tools. Visual QC by experts… ▽ More Many studies on machine learning (ML) for computer-aided diagnosis have so far been mostly restricted to high-quality research data. Clinical data warehouses, gathering routine examinations from hospitals, offer great promises for training and validation of ML models in a realistic setting. However, the use of such clinical data warehouses requires quality control (QC) tools. Visual QC by experts is time-consuming and does not scale to large datasets. In this paper, we propose a convolutional neural network (CNN) for the automatic QC of 3D T1-weighted brain MRI for a large heterogeneous clinical data warehouse. To that purpose, we used the data warehouse of the hospitals of the Greater Paris area (Assistance Publique-Hôpitaux de Paris [AP-HP]). Specifically, the objectives were: 1) to identify images which are not proper T1-weighted brain MRIs; 2) to identify acquisitions for which gadolinium was injected; 3) to rate the overall image quality. We used 5000 images for training and validation and a separate set of 500 images for testing. In order to train/validate the CNN, the data were annotated by two trained raters according to a visual QC protocol that we specifically designed for application in the setting of a data warehouse. For objectives 1 and 2, our approach achieved excellent accuracy (balanced accuracy and F1-score \textgreater 90\%), similar to the human raters. For objective 3, the performance was good but substantially lower than that of human raters. Nevertheless, the automatic approach accurately identified (balanced accuracy and F1-score \textgreater 80\%) low quality images, which would typically need to be excluded. Overall, our approach shall be useful for exploiting hospital data warehouses in medical image computing. △ Less

Submitted 16 April, 2021; originally announced April 2021.

arXiv:2003.09260 [pdf]

doi 10.3233/JAD-190594

Accuracy of MRI Classification Algorithms in a Tertiary Memory Center Clinical Routine Cohort

Authors: Alexandre Morin, Jorge Samper-González, Anne Bertrand, Sebastian Stroer, Didier Dormont, Aline Mendes, Pierrick Coupé, Jamila Ahdidan, Marcel Lévy, Dalila Samri, Harald Hampel, Bruno Dubois, Marc Teichmann, Stéphane Epelbaum, Olivier Colliot

Abstract: BACKGROUND:Automated volumetry software (AVS) has recently become widely available to neuroradiologists. MRI volumetry with AVS may support the diagnosis of dementias by identifying regional atrophy. Moreover, automatic classifiers using machine learning techniques have recently emerged as promising approaches to assist diagnosis. However, the performance of both AVS and automatic classifiers has… ▽ More BACKGROUND:Automated volumetry software (AVS) has recently become widely available to neuroradiologists. MRI volumetry with AVS may support the diagnosis of dementias by identifying regional atrophy. Moreover, automatic classifiers using machine learning techniques have recently emerged as promising approaches to assist diagnosis. However, the performance of both AVS and automatic classifiers has been evaluated mostly in the artificial setting of research datasets.OBJECTIVE:Our aim was to evaluate the performance of two AVS and an automatic classifier in the clinical routine condition of a memory clinic.METHODS:We studied 239 patients with cognitive troubles from a single memory center cohort. Using clinical routine T1-weighted MRI, we evaluated the classification performance of: 1) univariate volumetry using two AVS (volBrain and Neuroreader$^{TM}$); 2) Support Vector Machine (SVM) automatic classifier, using either the AVS volumes (SVM-AVS), or whole gray matter (SVM-WGM); 3) reading by two neuroradiologists. The performance measure was the balanced diagnostic accuracy. The reference standard was consensus diagnosis by three neurologists using clinical, biological (cerebrospinal fluid) and imaging data and following international criteria.RESULTS:Univariate AVS volumetry provided only moderate accuracies (46% to 71% with hippocampal volume). The accuracy improved when using SVM-AVS classifier (52% to 85%), becoming close to that of SVM-WGM (52 to 90%). Visual classification by neuroradiologists ranged between SVM-AVS and SVM-WGM.CONCLUSION:In the routine practice of a memory clinic, the use of volumetric measures provided by AVS yields only moderate accuracy. Automatic classifiers can improve accuracy and could be a useful tool to assist diagnosis. △ Less

Submitted 19 March, 2020; originally announced March 2020.

Journal ref: Journal of Alzheimer's Disease, IOS Press, 2020, pp.1-10

arXiv:2003.05169 [pdf, other]

Gaussian Graphical Model exploration and selection in high dimension low sample size setting

Authors: Thomas Lartigue, Simona Bottani, Stephanie Baron, Olivier Colliot, Stanley Durrleman, Stéphanie Allassonnière

Abstract: Gaussian Graphical Models (GGM) are often used to describe the conditional correlations between the components of a random vector. In this article, we compare two families of GGM inference methods: nodewise edge selection and penalised likelihood maximisation. We demonstrate on synthetic data that, when the sample size is small, the two methods produce graphs with either too few or too many edges… ▽ More Gaussian Graphical Models (GGM) are often used to describe the conditional correlations between the components of a random vector. In this article, we compare two families of GGM inference methods: nodewise edge selection and penalised likelihood maximisation. We demonstrate on synthetic data that, when the sample size is small, the two methods produce graphs with either too few or too many edges when compared to the real one. As a result, we propose a composite procedure that explores a family of graphs with an nodewise numerical scheme and selects a candidate among them with an overall likelihood criterion. We demonstrate that, when the number of observations is small, this selection method yields graphs closer to the truth and corresponding to distributions with better KL divergence with regards to the real distribution than the other two. Finally, we show the interest of our algorithm on two concrete cases: first on brain imaging data, then on biological nephrology data. In both cases our results are more in line with current knowledge in each field. △ Less

Submitted 11 March, 2020; originally announced March 2020.

arXiv:1911.08264 [pdf]

Visualization approach to assess the robustness of neural networks for medical image classification

Authors: Elina Thibeau Sutre, Olivier Colliot, Didier Dormont, Ninon Burgos

Abstract: The use of neural networks for diagnosis classification is becoming more and more prevalent in the medical imaging community. However, deep learning method outputs remain hard to explain. Another difficulty is to choose among the large number of techniques developed to analyze how networks learn, as all present different limitations. In this paper, we extended the framework of Fong and Vedaldi [IE… ▽ More The use of neural networks for diagnosis classification is becoming more and more prevalent in the medical imaging community. However, deep learning method outputs remain hard to explain. Another difficulty is to choose among the large number of techniques developed to analyze how networks learn, as all present different limitations. In this paper, we extended the framework of Fong and Vedaldi [IEEE International Conference on Computer Vision (ICCV), 2017] to visualize the training of convolutional neural networks (CNNs) on 3D quantitative neuroimaging data. Our application focuses on the detection of Alzheimer's disease with gray matter probability maps extracted from structural MRI. We first assessed the robustness of the visualization method by studying the coherence of the longitudinal patterns and regions identified by the network. We then studied the stability of the CNN training by computing visualization-based similarity indexes between different re-runs of the CNN. We demonstrated that the areas identified by the CNN were consistent with what is known of Alzheimer's disease and that the visualization approach extract coherent longitudinal patterns. We also showed that the CNN training is not stable and that the areas identified mainly depend on the initialization and the training process. This issue may exist in many other medical studies using deep learning methods on datasets in which the number of samples is too small and the data dimension is high. This means that it may not be possible to rely on deep learning to detect stable regions of interest in this field yet. △ Less

Submitted 10 February, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

Journal ref: SPIE Medical Imaging 2020, Feb 2020, Houston, United States

arXiv:1904.07773 [pdf]

doi 10.1016/j.media.2020.101694

Convolutional Neural Networks for Classification of Alzheimer's Disease: Overview and Reproducible Evaluation

Authors: Junhao Wen, Elina Thibeau-Sutre, Mauricio Diaz-Melo, Jorge Samper-Gonzalez, Alexandre Routier, Simona Bottani, Didier Dormont, Stanley Durrleman, Ninon Burgos, Olivier Colliot

Abstract: Over 30 papers have proposed to use convolutional neural network (CNN) for AD classification from anatomical MRI. However, the classification performance is difficult to compare across studies due to variations in components such as participant selection, image preprocessing or validation procedure. Moreover, these studies are hardly reproducible because their frameworks are not publicly accessibl… ▽ More Over 30 papers have proposed to use convolutional neural network (CNN) for AD classification from anatomical MRI. However, the classification performance is difficult to compare across studies due to variations in components such as participant selection, image preprocessing or validation procedure. Moreover, these studies are hardly reproducible because their frameworks are not publicly accessible and because implementation details are lacking. Lastly, some of these papers may report a biased performance due to inadequate or unclear validation or model selection procedures. In the present work, we aim to address these limitations through three main contributions. First, we performed a systematic literature review and found that more than half of the surveyed papers may have suffered from data leakage. Our second contribution is the extension of our open-source framework for classification of AD using CNN and T1-weighted MRI. Finally, we used this framework to rigorously compare different CNN architectures. The data was split into training/validation/test sets at the very beginning and only the training/validation sets were used for model selection. To avoid any overfitting, the test sets were left untouched until the end of the peer-review process. Overall, the different 3D approaches (3D-subject, 3D-ROI, 3D-patch) achieved similar performances while that of the 2D slice approach was lower. Of note, the different CNN approaches did not perform better than a SVM with voxel-based features. The different approaches generalized well to similar populations but not to datasets with different inclusion criteria or demographical characteristics. △ Less

Submitted 31 May, 2020; v1 submitted 16 April, 2019; originally announced April 2019.

Journal ref: Medical Image Analysis, p.101694. 2020

arXiv:1812.11183 [pdf]

Reproducible evaluation of diffusion MRI features for automatic classification of patients with Alzheimers disease

Authors: Junhao Wen, Jorge Samper-Gonzalez, Simona Bottani, Alexandre Routier, Ninon Burgos, Thomas Jacquemont, Sabrina Fontanella, Stanley Durrleman, Stephane Epelbaum, Anne Bertrand, Olivier Colliot

Abstract: Diffusion MRI is the modality of choice to study alterations of white matter. In past years, various works have used diffusion MRI for automatic classification of AD. However, classification performance obtained with different approaches is difficult to compare and these studies are also difficult to reproduce. In the present paper, we first extend a previously proposed framework to diffusion MRI… ▽ More Diffusion MRI is the modality of choice to study alterations of white matter. In past years, various works have used diffusion MRI for automatic classification of AD. However, classification performance obtained with different approaches is difficult to compare and these studies are also difficult to reproduce. In the present paper, we first extend a previously proposed framework to diffusion MRI data for AD classification. Specifically, we add: conversion of diffusion MRI ADNI data into the BIDS standard and pipelines for diffusion MRI preprocessing and feature extraction. We then apply the framework to compare different components. First, FS has a positive impact on classification results: highest balanced accuracy (BA) improved from 0.76 to 0.82 for task CN vs AD. Secondly, voxel-wise features generally gives better performance than regional features. Fractional anisotropy (FA) and mean diffusivity (MD) provided comparable results for voxel-wise features. Moreover, we observe that the poor performance obtained in tasks involving MCI were potentially caused by the small data samples, rather than by the data imbalance. Furthermore, no extensive classification difference exists for different degree of smoothing and registration methods. Besides, we demonstrate that using non-nested validation of FS leads to unreliable and over-optimistic results: 0.05 up to 0.40 relative increase in BA. Lastly, with proper FR and FS, the performance of diffusion MRI features is comparable to that of T1w MRI. All the code of the framework and the experiments are publicly available: general-purpose tools have been integrated into the Clinica software package (www.clinica.run) and the paper-specific code is available at: https://github.com/aramis-lab/AD-ML. △ Less

Submitted 11 June, 2020; v1 submitted 28 December, 2018; originally announced December 2018.

Comments: 51 pages, 5 figure and 6 tables

arXiv:1808.06452 [pdf]

doi 10.1016/j.neuroimage.2018.08.042

Reproducible evaluation of classification methods in Alzheimer's disease: framework and application to MRI and PET data

Authors: Jorge Samper-González, Ninon Burgos, Simona Bottani, Sabrina Fontanella, Pascal Lu, Arnaud Marcoux, Alexandre Routier, Jérémy Guillon, Michael Bacci, Junhao Wen, Anne Bertrand, Hugo Bertin, Marie-Odile Habert, Stanley Durrleman, Theodoros Evgeniou, Olivier Colliot

Abstract: A large number of papers have introduced novel machine learning and feature extraction methods for automatic classification of AD. However, they are difficult to reproduce because key components of the validation are often not readily available. These components include selected participants and input data, image preprocessing and cross-validation procedures. The performance of the different appro… ▽ More A large number of papers have introduced novel machine learning and feature extraction methods for automatic classification of AD. However, they are difficult to reproduce because key components of the validation are often not readily available. These components include selected participants and input data, image preprocessing and cross-validation procedures. The performance of the different approaches is also difficult to compare objectively. In particular, it is often difficult to assess which part of the method provides a real improvement, if any. We propose a framework for reproducible and objective classification experiments in AD using three publicly available datasets (ADNI, AIBL and OASIS). The framework comprises: i) automatic conversion of the three datasets into BIDS format, ii) a modular set of preprocessing pipelines, feature extraction and classification methods, together with an evaluation framework, that provide a baseline for benchmarking the different components. We demonstrate the use of the framework for a large-scale evaluation on 1960 participants using T1 MRI and FDG PET data. In this evaluation, we assess the influence of different modalities, preprocessing, feature types, classifiers, training set sizes and datasets. Performances were in line with the state-of-the-art. FDG PET outperformed T1 MRI for all classification tasks. No difference in performance was found for the use of different atlases, image smoothing, partial volume correction of FDG PET images, or feature type. Linear SVM and L2-logistic regression resulted in similar performance and both outperformed random forests. The classification performance increased along with the number of subjects used for training. Classifiers trained on ADNI generalized well to AIBL and OASIS. All the code of the framework and the experiments is publicly available at: https://gitlab.icm-institute.org/aramislab/AD-ML. △ Less

Submitted 20 August, 2018; originally announced August 2018.

arXiv:1804.08039 [pdf, ps, other]

doi 10.1007/978-3-030-00931-1_59

Learning Myelin Content in Multiple Sclerosis from Multimodal MRI through Adversarial Training

Authors: Wen Wei, Emilie Poirion, Benedetta Bodini, Stanley Durrleman, Nicholas Ayache, Bruno Stankoff, Olivier Colliot

Abstract: Multiple sclerosis (MS) is a demyelinating disease of the central nervous system (CNS). A reliable measure of the tissue myelin content is therefore essential for the understanding of the physiopathology of MS, tracking progression and assessing treatment efficacy. Positron emission tomography (PET) with $[^{11} \mbox{C}] \mbox{PIB}$ has been proposed as a promising biomarker for measuring myelin… ▽ More Multiple sclerosis (MS) is a demyelinating disease of the central nervous system (CNS). A reliable measure of the tissue myelin content is therefore essential for the understanding of the physiopathology of MS, tracking progression and assessing treatment efficacy. Positron emission tomography (PET) with $[^{11} \mbox{C}] \mbox{PIB}$ has been proposed as a promising biomarker for measuring myelin content changes in-vivo in MS. However, PET imaging is expensive and invasive due to the injection of a radioactive tracer. On the contrary, magnetic resonance imaging (MRI) is a non-invasive, widely available technique, but existing MRI sequences do not provide, to date, a reliable, specific, or direct marker of either demyelination or remyelination. In this work, we therefore propose Sketcher-Refiner Generative Adversarial Networks (GANs) with specifically designed adversarial loss functions to predict the PET-derived myelin content map from a combination of MRI modalities. The prediction problem is solved by a sketch-refinement process in which the sketcher generates the preliminary anatomical and physiological information and the refiner refines and generates images reflecting the tissue myelin content in the human brain. We evaluated the ability of our method to predict myelin content at both global and voxel-wise levels. The evaluation results show that the demyelination in lesion regions and myelin content in normal-appearing white matter (NAWM) can be well predicted by our method. The method has the potential to become a useful tool for clinical management of patients with MS. △ Less

Submitted 8 June, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

Comments: Accepted by MICCAI2018

arXiv:1803.10119 [pdf, other]

Learning distributions of shape trajectories from longitudinal datasets: a hierarchical model on a manifold of diffeomorphisms

Authors: Alexandre Bône, Olivier Colliot, Stanley Durrleman

Abstract: We propose a method to learn a distribution of shape trajectories from longitudinal data, i.e. the collection of individual objects repeatedly observed at multiple time-points. The method allows to compute an average spatiotemporal trajectory of shape changes at the group level, and the individual variations of this trajectory both in terms of geometry and time dynamics. First, we formulate a non-… ▽ More We propose a method to learn a distribution of shape trajectories from longitudinal data, i.e. the collection of individual objects repeatedly observed at multiple time-points. The method allows to compute an average spatiotemporal trajectory of shape changes at the group level, and the individual variations of this trajectory both in terms of geometry and time dynamics. First, we formulate a non-linear mixed-effects statistical model as the combination of a generic statistical model for manifold-valued longitudinal data, a deformation model defining shape trajectories via the action of a finite-dimensional set of diffeomorphisms with a manifold structure, and an efficient numerical scheme to compute parallel transport on this manifold. Second, we introduce a MCMC-SAEM algorithm with a specific approach to shape sampling, an adaptive scheme for proposal variances, and a log-likelihood tempering strategy to estimate our model. Third, we validate our algorithm on 2D simulated data, and then estimate a scenario of alteration of the shape of the hippocampus 3D brain structure during the course of Alzheimer's disease. The method shows for instance that hippocampal atrophy progresses more quickly in female subjects, and occurs earlier in APOE4 mutation carriers. We finally illustrate the potential of our method for classifying pathological trajectories versus normal ageing. △ Less

Submitted 13 June, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

arXiv:1711.08716 [pdf, other]

doi 10.1007/978-3-319-67675-3_10

Prediction of the progression of subcortical brain structures in Alzheimer's disease from baseline

Authors: Alexandre Bône, Maxime Louis, Alexandre Routier, Jorge Samper, Michael Bacci, Benjamin Charlier, Olivier Colliot, Stanley Durrleman

Abstract: We propose a method to predict the subject-specific longitudinal progression of brain structures extracted from baseline MRI, and evaluate its performance on Alzheimer's disease data. The disease progression is modeled as a trajectory on a group of diffeomorphisms in the context of large deformation diffeomorphic metric map** (LDDMM). We first exhibit the limited predictive abilities of geodesic… ▽ More We propose a method to predict the subject-specific longitudinal progression of brain structures extracted from baseline MRI, and evaluate its performance on Alzheimer's disease data. The disease progression is modeled as a trajectory on a group of diffeomorphisms in the context of large deformation diffeomorphic metric map** (LDDMM). We first exhibit the limited predictive abilities of geodesic regression extrapolation on this group. Building on the recent concept of parallel curves in shape manifolds, we then introduce a second predictive protocol which personalizes previously learned trajectories to new subjects, and investigate the relative performances of two parallel shifting paradigms. This design only requires the baseline imaging data. Finally, coefficients encoding the disease dynamics are obtained from longitudinal cognitive measurements for each subject, and exploited to refine our methodology which is demonstrated to successfully predict the follow-up visits. △ Less

Submitted 23 November, 2017; originally announced November 2017.

arXiv:1709.08491 [pdf, other]

doi 10.1007/978-3-319-66182-7_52

Statistical learning of spatiotemporal patterns from longitudinal manifold-valued networks

Authors: Igor Koval, Jean-Baptiste Schiratti, Alexandre Routier, Michael Bacci, Olivier Colliot, Stéphanie Allassonnière, Stanley Durrleman

Abstract: We introduce a mixed-effects model to learn spatiotempo-ral patterns on a network by considering longitudinal measures distributed on a fixed graph. The data come from repeated observations of subjects at different time points which take the form of measurement maps distributed on a graph such as an image or a mesh. The model learns a typical group-average trajectory characterizing the propagation… ▽ More We introduce a mixed-effects model to learn spatiotempo-ral patterns on a network by considering longitudinal measures distributed on a fixed graph. The data come from repeated observations of subjects at different time points which take the form of measurement maps distributed on a graph such as an image or a mesh. The model learns a typical group-average trajectory characterizing the propagation of measurement changes across the graph nodes. The subject-specific trajectories are defined via spatial and temporal transformations of the group-average scenario, thus estimating the variability of spatiotemporal patterns within the group. To estimate population and individual model parameters, we adapted a stochastic version of the Expectation-Maximization algorithm, the MCMC-SAEM. The model is used to describe the propagation of cortical atrophy during the course of Alzheimer's Disease. Model parameters show the variability of this average pattern of atrophy in terms of trajectories across brain regions, age at disease onset and pace of propagation. We show that the personalization of this model yields accurate prediction of maps of cortical thickness in patients. △ Less

Submitted 25 September, 2017; originally announced September 2017.

Journal ref: Proc. Medical Image Computing and Computer-Assisted Intervention, MICCAI 2017, Lecture Notes in Computer Science, volume 10433, pp 451-459, Springer

arXiv:1709.07267 [pdf, other]

doi 10.1007/978-3-319-67389-9_7

Yet Another ADNI Machine Learning Paper? Paving The Way Towards Fully-reproducible Research on Classification of Alzheimer's Disease

Authors: Jorge Samper-González, Ninon Burgos, Sabrina Fontanella, Hugo Bertin, Marie-Odile Habert, Stanley Durrleman, Theodoros Evgeniou, Olivier Colliot

Abstract: In recent years, the number of papers on Alzheimer's disease classification has increased dramatically, generating interesting methodological ideas on the use machine learning and feature extraction methods. However, practical impact is much more limited and, eventually, one could not tell which of these approaches are the most efficient. While over 90\% of these works make use of ADNI an objectiv… ▽ More In recent years, the number of papers on Alzheimer's disease classification has increased dramatically, generating interesting methodological ideas on the use machine learning and feature extraction methods. However, practical impact is much more limited and, eventually, one could not tell which of these approaches are the most efficient. While over 90\% of these works make use of ADNI an objective comparison between approaches is impossible due to variations in the subjects included, image pre-processing, performance metrics and cross-validation procedures. In this paper, we propose a framework for reproducible classification experiments using multimodal MRI and PET data from ADNI. The core components are: 1) code to automatically convert the full ADNI database into BIDS format; 2) a modular architecture based on Nipype in order to easily plug-in different classification and feature extraction tools; 3) feature extraction pipelines for MRI and PET data; 4) baseline classification approaches for unimodal and multimodal features. This provides a flexible framework for benchmarking different feature extraction and classification tools in a reproducible manner. We demonstrate its use on all (1519) baseline T1 MR images and all (1102) baseline FDG PET images from ADNI 1, GO and 2 with SPM-based feature extraction pipelines and three different classification techniques (linear SVM, anatomically regularized SVM and multiple kernel learning SVM). The highest accuracies achieved were: 91% for AD vs CN, 83% for MCIc vs CN, 75% for MCIc vs MCInc, 94% for AD-A$β$+ vs CN-A$β$- and 72% for MCIc-A$β$+ vs MCInc-A$β$+. The code is publicly available at https://gitlab.icm-institute.org/aramislab/AD-ML (depends on the Clinica software platform, publicly available at http://www.clinica.run). △ Less

Submitted 21 September, 2017; originally announced September 2017.

Journal ref: Proc. Machine Learning in Medical Imaging MLMI 2017, MICCAI Worskhop, Lecture Notes in Computer Science, volume 10541, pp 53-60, Springer

arXiv:1709.06151 [pdf, other]

Multi-modal analysis of genetically-related subjects using SIFT descriptors in brain MRI

Authors: Kuldeep Kumar, Laurent Chauvin, Mathew Toews, Olivier Colliot, Christian Desrosiers

Abstract: So far, fingerprinting studies have focused on identifying features from single-modality MRI data, which capture individual characteristics in terms of brain structure, function, or white matter microstructure. However, due to the lack of a framework for comparing across multiple modalities, studies based on multi-modal data remain elusive. This paper presents a multi-modal analysis of genetically… ▽ More So far, fingerprinting studies have focused on identifying features from single-modality MRI data, which capture individual characteristics in terms of brain structure, function, or white matter microstructure. However, due to the lack of a framework for comparing across multiple modalities, studies based on multi-modal data remain elusive. This paper presents a multi-modal analysis of genetically-related subjects to compare and contrast the information provided by various MRI modalities. The proposed framework represents MRI scans as bags of SIFT features, and uses these features in a nearest-neighbor graph to measure subject similarity. Experiments using the T1/T2-weighted MRI and diffusion MRI data of 861 Human Connectome Project subjects demonstrate strong links between the proposed similarity measure and genetic proximity. △ Less

Submitted 18 September, 2017; originally announced September 2017.

Journal ref: Proc. Computational Diffusion MRI, MICCAI Workshop, Québec City, Canada, September 2017

arXiv:1709.06144 [pdf, other]

White Matter Fiber Segmentation Using Functional Varifolds

Authors: Kuldeep Kumar, Pietro Gori, Benjamin Charlier, Stanley Durrleman, Olivier Colliot, Christian Desrosiers

Abstract: The extraction of fibers from dMRI data typically produces a large number of fibers, it is common to group fibers into bundles. To this end, many specialized distance measures, such as MCP, have been used for fiber similarity. However, these distance based approaches require point-wise correspondence and focus only on the geometry of the fibers. Recent publications have highlighted that using micr… ▽ More The extraction of fibers from dMRI data typically produces a large number of fibers, it is common to group fibers into bundles. To this end, many specialized distance measures, such as MCP, have been used for fiber similarity. However, these distance based approaches require point-wise correspondence and focus only on the geometry of the fibers. Recent publications have highlighted that using microstructure measures along fibers improves tractography analysis. Also, many neurodegenerative diseases impacting white matter require the study of microstructure measures as well as the white matter geometry. Motivated by these, we propose to use a novel computational model for fibers, called functional varifolds, characterized by a metric that considers both the geometry and microstructure measure (e.g. GFA) along the fiber pathway. We use it to cluster fibers with a dictionary learning and sparse coding-based framework, and present a preliminary analysis using HCP data. △ Less

Submitted 18 September, 2017; originally announced September 2017.

Journal ref: Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics, pp 92-100, Lecture Notes in Computer Science, volume 10551, Springer, 2017

arXiv:1707.05961 [pdf]

doi 10.1016/j.neuroimage.2009.05.036

Multidimensional classification of hippocampal shape features discriminates Alzheimer's disease and mild cognitive impairment from normal aging

Authors: Emilie Gerardin, Gaël Chételat, Marie Chupin, Rémi Cuingnet, Béatrice Desgranges, Ho-Sung Kim, Marc Niethammer, Bruno Dubois, Stéphane Lehéricy, Line Garnero, Francis Eustache, Olivier Colliot

Abstract: We describe a new method to automatically discriminate between patients with Alzheimer's disease (AD) or mild cognitive impairment (MCI) and elderly controls, based on multidimensional classification of hippocampal shape features. This approach uses spherical harmonics (SPHARM) coefficients to model the shape of the hippocampi, which are segmented from magnetic resonance images (MRI) using a fully… ▽ More We describe a new method to automatically discriminate between patients with Alzheimer's disease (AD) or mild cognitive impairment (MCI) and elderly controls, based on multidimensional classification of hippocampal shape features. This approach uses spherical harmonics (SPHARM) coefficients to model the shape of the hippocampi, which are segmented from magnetic resonance images (MRI) using a fully automatic method that we previously developed. SPHARM coefficients are used as features in a classification procedure based on support vector machines (SVM). The most relevant features for classification are selected using a bagging strategy. We evaluate the accuracy of our method in a group of 23 patients with AD (10 males, 13 females, age $\pm$ standard-deviation (SD) = 73 $\pm$ 6 years, mini-mental score (MMS) = 24.4 $\pm$ 2.8), 23 patients with amnestic MCI (10 males, 13 females, age $\pm$ SD = 74 $\pm$ 8 years, MMS = 27.3 $\pm$ 1.4) and 25 elderly healthy controls (13 males, 12 females, age $\pm$ SD = 64 $\pm$ 8 years), using leave-one-out cross-validation. For AD vs controls, we obtain a correct classification rate of 94%, a sensitivity of 96%, and a specificity of 92%. For MCI vs controls, we obtain a classification rate of 83%, a sensitivity of 83%, and a specificity of 84%. This accuracy is superior to that of hippocampal volumetry and is comparable to recently published SVM-based whole-brain classification methods, which relied on a different strategy. This new method may become a useful tool to assist in the diagnosis of Alzheimer's disease. △ Less

Submitted 19 July, 2017; originally announced July 2017.

Comments: Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database

Journal ref: NeuroImage, 47 (4), pp.1476-86, 2009

arXiv:1605.02559 [pdf]

Robust imaging of hippocampal inner structure at 7T: in vivo acquisition protocol and methodological choices

Authors: Linda Marrakchi-Kacem, Alexandre Vignaud, Julien Sein, Johanne Germain, Thomas R Henry, Cyril Poupon, Lucie Hertz-Pannier, Stéphane Lehéricy, Olivier Colliot, Pierre-François Van de Moortele, Marie Chupin

Abstract: OBJECTIVE:Motion-robust multi-slab imaging of hippocampal inner structure in vivo at 7T.MATERIALS AND METHODS:Motion is a crucial issue for ultra-high resolution imaging, such as can be achieved with 7T MRI. An acquisition protocol was designed for imaging hippocampal inner structure at 7T. It relies on a compromise between anatomical details visibility and robustness to motion. In order to reduce… ▽ More OBJECTIVE:Motion-robust multi-slab imaging of hippocampal inner structure in vivo at 7T.MATERIALS AND METHODS:Motion is a crucial issue for ultra-high resolution imaging, such as can be achieved with 7T MRI. An acquisition protocol was designed for imaging hippocampal inner structure at 7T. It relies on a compromise between anatomical details visibility and robustness to motion. In order to reduce acquisition time and motion artifacts, the full slab covering the hippocampus was split into separate slabs with lower acquisition time. A robust registration approach was implemented to combine the acquired slabs within a final 3D-consistent high-resolution slab covering the whole hippocampus. Evaluation was performed on 50 subjects overall, made of three groups of subjects acquired using three acquisition settings; it focused on three issues: visibility of hippocampal inner structure, robustness to motion artifacts and registration procedure performance.RESULTS:Overall, T2-weighted acquisitions with interleaved slabs proved robust. Multi-slab registration yielded high quality datasets in 96 % of the subjects, thus compatible with further analyses of hippocampal inner structure.CONCLUSION:Multi-slab acquisition and registration setting is efficient for reducing acquisition time and consequently motion artifacts for ultra-high resolution imaging of the inner structure of the hippocampus. △ Less

Submitted 9 May, 2016; originally announced May 2016.

Journal ref: Magnetic Resonance Materials in Physics, Biology and Medicine, Springer Verlag, 2016

Showing 1–26 of 26 results for author: Colliot, O