-
Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning
Authors:
Shivam Ratnakant Mhaskar,
Nirmesh J. Shah,
Mohammadi Zaki,
Ashishkumar P. Gudmalwar,
Pankaj Wasnik,
Rajiv Ratn Shah
Abstract:
A traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules: Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS). Within AVD pipelines, isometric-NMT algorithms are employed to regulate the length of the synthesized output text, so that the audio stays synchronized with the video after dubbing. Previous approaches have focused on aligning the number of characters and words in the source and target language texts of Machine Translation models. Our approach instead aligns the number of phonemes, as they are closely associated with speech duration. In this paper, we present the development of an isometric NMT system using Reinforcement Learning (RL), with a focus on optimizing the alignment of phoneme counts in the source and target language sentence pairs. To evaluate our models, we propose the Phoneme Count Compliance (PCC) score, which is a measure of length compliance. Our approach demonstrates a substantial improvement of approximately 36% in the PCC score compared to the state-of-the-art models when applied to English-Hindi language pairs. Moreover, we propose a student-teacher architecture within the framework of our RL approach to maintain a trade-off between the phoneme count and translation quality.
Submitted 20 March, 2024;
originally announced March 2024.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko,
et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
MoCoNet: Motion Correction in 3D MPRAGE images using a Convolutional Neural Network approach
Authors:
Kamlesh Pawar,
Zhaolin Chen,
N. Jon Shah,
Gary F. Egan
Abstract:
Purpose: The suppression of motion artefacts from MR images is a challenging task. The purpose of this paper is to develop a standalone novel technique to suppress motion artefacts from MR images using a data-driven deep learning approach. Methods: A deep learning convolutional neural network (CNN) was developed to remove motion artefacts in brain MR images. The CNN was trained on simulated motion-corrupted images to identify and suppress artefacts due to motion. The network was an encoder-decoder CNN architecture in which the encoder decomposed the motion-corrupted images into a set of feature maps. The feature maps were then combined by the decoder network to generate a motion-corrected image. The network was tested on an unseen simulated dataset and an experimental, motion-corrupted in vivo brain dataset. Results: The trained network was able to suppress the motion artefacts in the simulated motion-corrupted images, and the mean percentage error in the motion-corrected images was 2.69 % with a standard deviation of 0.95 %. The network was able to effectively suppress the motion artefacts from the experimental dataset, demonstrating the generalisation capability of the trained network. Conclusion: A novel and generic motion correction technique has been developed that can suppress motion artefacts from motion-corrupted MR images. The proposed technique is a standalone post-processing method that does not interfere with data acquisition or reconstruction parameters, thus making it suitable for a multitude of MR sequences.
Submitted 29 July, 2018;
originally announced July 2018.
-
Quality assessment of voice converted speech using articulatory features
Authors:
Avni Rajpal,
Nirmesh J. Shah,
Mohammadi Zaki,
Hemant A. Patil
Abstract:
We propose a novel application based on acoustic-to-articulatory inversion towards quality assessment of voice-converted speech. The ability of humans to speak effortlessly requires coordinated movements of various articulators, muscles, etc. This effortless movement contributes towards naturalness, intelligibility and speaker identity, which are only partially present in voice-converted speech. Hence, during voice conversion, the information related to speech production is lost. In this paper, this loss is quantified for a male voice by showing an increase in RMSE for voice-converted speech, followed by showing a decrease in mutual information. Similar results are obtained in the case of a female voice. This observation is extended by showing that articulatory features can be used as an objective measure. The effectiveness of the proposed measure over MCD is illustrated by comparing their correlation with the Mean Opinion Score.
Submitted 23 November, 2015; v1 submitted 16 November, 2015;
originally announced November 2015.