-
Deep Learning Based Detection of Enlarged Perivascular Spaces on Brain MRI
Authors:
Tanweer Rashid,
Hangfan Liu,
Jeffrey B. Ware,
Karl Li,
Jose Rafael Romero,
Elyas Fadaee,
Ilya M. Nasrallah,
Saima Hilal,
R. Nick Bryan,
Timothy M. Hughes,
Christos Davatzikos,
Lenore Launer,
Sudha Seshadri,
Susan R. Heckbert,
Mohamad Habes
Abstract:
BACKGROUND AND PURPOSE: Deep learning has proven effective in many neuroimaging applications. However, in many scenarios, the number of imaging sequences capturing information related to small vessel disease lesions is insufficient to support data-driven techniques. Additionally, cohort-based studies may not always have the optimal or essential imaging sequences for accurate lesion detection. Therefore, it is necessary to determine which imaging sequences are crucial for precise detection. This study introduces a novel deep learning framework to detect enlarged perivascular spaces (ePVS) and aims to find the optimal combination of MRI sequences for deep learning-based quantification. MATERIALS AND METHODS: We implemented an effective lightweight U-Net adapted for ePVS detection and comprehensively investigated different combinations of information from SWI, FLAIR, T1-weighted (T1w), and T2-weighted (T2w) MRI sequences. The training data included 21 participants, who were randomly selected from the MESA cohort. Participants had 683 ePVS lesions on average. For T1w, T2w, and FLAIR images, the MESA study collected 3D isotropic MRI scans at six different sites with Siemens scanners. Our training data included participants from all these sites and all the scanner models, and the proposed model was applied to the whole brain instead of selective regions. RESULTS: The experimental results showed that T2w MRI is the most important for accurate ePVS detection, and incorporating SWI, FLAIR, and T1w MRI into the deep neural network yielded minor improvements in accuracy and resulted in the highest sensitivity and precision (sensitivity = 0.82, precision = 0.83). The proposed method achieved comparable accuracy at a minimal time cost compared to manual reading.
Submitted 14 October, 2022; v1 submitted 27 September, 2022;
originally announced September 2022.
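The sensitivity and precision reported in the abstract above follow the standard detection definitions: sensitivity = TP / (TP + FN) and precision = TP / (TP + FP). The following is a minimal sketch of those two formulas only, not the authors' evaluation code; the function names and the lesion counts in the usage lines are hypothetical and chosen purely for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true lesions that were detected: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no true lesions to evaluate against")
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of detections that are true lesions: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("no detections to evaluate")
    return tp / (tp + fp)


# Hypothetical counts for one scan (not taken from the paper):
# 8 lesions correctly detected, 2 missed, 2 false detections.
print(sensitivity(tp=8, fn=2))
print(precision(tp=8, fp=2))
```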
-
Deep neural network heatmaps capture Alzheimer's disease patterns reported in a large meta-analysis of neuroimaging studies
Authors:
Di Wang,
Nicolas Honnorat,
Peter T. Fox,
Kerstin Ritter,
Simon B. Eickhoff,
Sudha Seshadri,
Mohamad Habes
Abstract:
Deep neural networks currently provide the most advanced and accurate machine learning models to distinguish between structural MRI scans of subjects with Alzheimer's disease and healthy controls. Unfortunately, the subtle brain alterations captured by these models are difficult to interpret because of the complexity of these multi-layer and non-linear models. Several heatmap methods have been proposed to address this issue and analyze the imaging patterns extracted from the deep neural networks, but no quantitative comparison between these methods has been carried out so far. In this work, we explore these questions by deriving heatmaps from Convolutional Neural Networks (CNN) trained using T1 MRI scans of the ADNI data set, and by comparing these heatmaps with brain maps corresponding to Support Vector Machines (SVM) coefficients. Three prominent heatmap methods are studied: Layer-wise Relevance Propagation (LRP), Integrated Gradients (IG), and Guided Grad-CAM (GGC). Contrary to prior studies where the quality of heatmaps was visually or qualitatively assessed, we obtained precise quantitative measures by computing overlap with a ground-truth map from a large meta-analysis that combined 77 voxel-based morphometry (VBM) studies independently from ADNI. Our results indicate that all three heatmap methods were able to capture brain regions covering the meta-analysis map and achieved better results than SVM coefficients. Among them, IG produced the heatmaps with the best overlap with the independent meta-analysis.
Submitted 22 July, 2022;
originally announced July 2022.
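The abstract above does not say which overlap measure was used to compare heatmaps with the meta-analysis map. As an illustration only, the sketch below assumes a Dice coefficient between two binarized maps; the function name `dice_overlap`, the flattened 0/1 voxel lists, and the toy values are all hypothetical.

```python
from typing import Sequence


def dice_overlap(heatmap_mask: Sequence[int], reference_mask: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary voxel masks.

    Both masks are flattened sequences of 0/1 values of equal length.
    Using Dice here is an assumption, not the measure confirmed by the paper.
    """
    if len(heatmap_mask) != len(reference_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(heatmap_mask, reference_mask) if a and b)
    total = sum(1 for a in heatmap_mask if a) + sum(1 for b in reference_mask if b)
    if total == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * intersection / total


# Toy example: a thresholded heatmap versus a reference map over four voxels.
print(dice_overlap([1, 1, 0, 0], [1, 0, 1, 0]))
```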
-
Emphasis control for parallel neural TTS
Authors:
Shreyas Seshadri,
Tuomo Raitio,
Dan Castellani,
Jiangchuan Li
Abstract:
Recent parallel neural text-to-speech (TTS) synthesis methods are able to generate speech with high fidelity while maintaining high performance. However, these systems often lack control over the output prosody, thus restricting the semantic information conveyable for a given text. This paper proposes a hierarchical parallel neural TTS system for prosodic emphasis control by learning a latent space that directly corresponds to a change in emphasis. Three candidate features for the latent space are compared: 1) Variance of pitch and duration within words in a sentence, 2) Wavelet-based feature computed from pitch, energy, and duration, and 3) Learned combination of the two aforementioned approaches. At inference time, word-level prosodic emphasis is achieved by increasing the feature values of the latent space for the given words. Experiments show that all the proposed methods are able to achieve the perception of increased emphasis with little loss in overall quality. Moreover, emphasized utterances were preferred in a pairwise comparison test over the non-emphasized utterances, indicating promise for real-world applications.
Submitted 29 March, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
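The abstract above states that word-level emphasis is obtained at inference time by increasing the latent feature values of the chosen words. A minimal sketch of that single step follows, assuming one scalar feature per word and an additive offset; the function name `emphasize`, the `boost` parameter, and the example values are hypothetical, and the TTS model that would consume the features is not shown.

```python
from typing import List, Sequence


def emphasize(word_features: Sequence[float],
              word_indices: Sequence[int],
              boost: float = 1.0) -> List[float]:
    """Return a copy of the per-word features with `boost` added at the given words.

    The additive offset is an assumption for illustration; the paper only
    says the feature values are increased for the emphasized words.
    """
    boosted = list(word_features)
    for i in word_indices:
        boosted[i] += boost
    return boosted


# Emphasize the second word of a three-word sentence.
print(emphasize([0.0, 0.25, 0.5], word_indices=[1], boost=0.5))
```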
-
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
Authors:
Tuomo Raitio,
Jiangchuan Li,
Shreyas Seshadri
Abstract:
Neural text-to-speech (TTS) synthesis can generate speech that is indistinguishable from natural speech. However, the synthetic speech often represents the average prosodic style of the database instead of having more versatile prosodic variation. Moreover, many models lack the ability to control the output prosody, which does not allow for different styles for the same text input. In this work, we train a non-autoregressive parallel neural TTS front-end model hierarchically conditioned on both coarse and fine-grained acoustic speech features to learn a latent prosody space with intuitive and meaningful dimensions. Experiments show that a non-autoregressive TTS model hierarchically conditioned on utterance-wise pitch, pitch range, duration, energy, and spectral tilt can effectively control each prosodic dimension, generate a wide variety of speaking styles, and provide word-wise emphasis control, while maintaining quality equal to or better than that of the baseline model.
Submitted 22 March, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech
Authors:
Shreyas Seshadri,
Okko Räsänen
Abstract:
Automatic syllable count estimation (SCE) is used in a variety of applications ranging from speaking rate estimation to detecting social activity from wearable microphones or developmental research concerned with quantifying speech heard by language-learning children in different environments. The majority of previously utilized SCE methods have relied on heuristic DSP methods, and only a small number of bi-directional long short-term memory (BLSTM) approaches have applied modern machine learning to the SCE task. This paper presents a novel end-to-end method called SylNet for automatic syllable counting from speech, built on recent developments in neural network architectures. We describe how the entire model can be optimized directly to minimize SCE error on the training data without annotations aligned at the syllable level, and how it can be adapted to new languages using limited speech data with known syllable counts. Experiments on several different languages reveal that SylNet generalizes to languages beyond its training data and further improves with adaptation. It also outperforms several previously proposed methods for syllabification, including end-to-end BLSTMs.
Submitted 24 June, 2019;
originally announced June 2019.