-
Quantifying the effect of speech pathology on automatic and human speaker verification
Authors:
Bence Mark Halpern,
Thomas Tienkamp,
Wen-Chin Huang,
Lester Phillip Violeta,
Teja Rebernik,
Sebastiaan de Visscher,
Max Witjes,
Martijn Wieling,
Defne Abur,
Tomoki Toda
Abstract:
This study investigates how surgical intervention for speech pathology (specifically, as a result of oral cancer surgery) impacts the performance of an automatic speaker verification (ASV) system. Using two recently collected Dutch datasets with parallel pre and post-surgery audio from the same speaker, NKI-OC-VC and SPOKE, we assess the extent to which speech pathology influences ASV performance,…
▽ More
This study investigates how surgical intervention for speech pathology (specifically, as a result of oral cancer surgery) impacts the performance of an automatic speaker verification (ASV) system. Using two recently collected Dutch datasets with parallel pre and post-surgery audio from the same speaker, NKI-OC-VC and SPOKE, we assess the extent to which speech pathology influences ASV performance, and whether objective/subjective measures of speech severity are correlated with the performance. Finally, we carry out a perceptual study to compare judgements of ASV and human listeners. Our findings reveal that pathological speech negatively affects ASV performance, and the severity of the speech is negatively correlated with the performance. There is a moderate agreement in perceptual and objective scores of speaker similarity and severity, however, we could not clearly establish in the perceptual study, whether the same phenomenon also exists in human perception.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Manipulation of oral cancer speech using neural articulatory synthesis
Authors:
Bence Mark Halpern,
Teja Rebernik,
Thomas Tienkamp,
Rob van Son,
Michiel van den Brekel,
Martijn Wieling,
Max Witjes,
Odette Scharenborg
Abstract:
We present an articulatory synthesis framework for the synthesis and manipulation of oral cancer speech for clinical decision making and alleviation of patient stress. Objective and subjective evaluations demonstrate that the framework has acceptable naturalness and is worth further investigation. A subsequent subjective vowel and consonant identification experiment showed that the articulatory sy…
▽ More
We present an articulatory synthesis framework for the synthesis and manipulation of oral cancer speech for clinical decision making and alleviation of patient stress. Objective and subjective evaluations demonstrate that the framework has acceptable naturalness and is worth further investigation. A subsequent subjective vowel and consonant identification experiment showed that the articulatory synthesis system can manipulate the articulatory trajectories so that the synthesised speech reproduces problems present in the ground truth oral cancer speech.
△ Less
Submitted 31 March, 2022;
originally announced March 2022.
-
Recurrent convolutional neural networks for mandible segmentation from computed tomography
Authors:
Bingjiang Qiu,
Jiapan Guo,
Joep Kraeima,
Haye H. Glas,
Ronald J. H. Borra,
Max J. H. Witjes,
Peter M. A. van Ooijen
Abstract:
Recently, accurate mandible segmentation in CT scans based on deep learning methods has attracted much attention. However, there still exist two major challenges, namely, metal artifacts among mandibles and large variations in shape or size among individuals. To address these two challenges, we propose a recurrent segmentation convolutional neural network (RSegCNN) that embeds segmentation convolu…
▽ More
Recently, accurate mandible segmentation in CT scans based on deep learning methods has attracted much attention. However, there still exist two major challenges, namely, metal artifacts among mandibles and large variations in shape or size among individuals. To address these two challenges, we propose a recurrent segmentation convolutional neural network (RSegCNN) that embeds segmentation convolutional neural network (SegCNN) into the recurrent neural network (RNN) for robust and accurate segmentation of the mandible. Such a design of the system takes into account the similarity and continuity of the mandible shapes captured in adjacent image slices in CT scans. The RSegCNN infers the mandible information based on the recurrent structure with the embedded encoder-decoder segmentation (SegCNN) components. The recurrent structure guides the system to exploit relevant and important information from adjacent slices, while the SegCNN component focuses on the mandible shapes from a single CT slice. We conducted extensive experiments to evaluate the proposed RSegCNN on two head and neck CT datasets. The experimental results show that the RSegCNN is significantly better than the state-of-the-art models for accurate mandible segmentation.
△ Less
Submitted 13 March, 2020;
originally announced March 2020.