A Comprehensive Rubric for Annotating Pathological Speech
Authors:
Mario Corrales-Astorgano,
David Escudero-Mancebo,
Lourdes Aguilar,
Valle Flores-Lucas,
Valentín Cardeñoso-Payo,
Carlos Vivaracho-Pascual,
César González-Ferreras
Abstract:
Rubrics are a commonly used tool for labeling voice corpora in speech quality assessment, although their application in the context of pathological speech remains relatively limited. In this study, we introduce a comprehensive rubric based on various dimensions of speech quality, including phonetics, fluency, and prosody. The objective is to establish standardized criteria for identifying errors within the speech of individuals with Down syndrome, thereby enabling the development of automated assessment systems. To achieve this objective, we utilized the Prautocal corpus. To assess the quality of annotations using our rubric, two experiments were conducted, focusing on phonetics and fluency. For phonetic evaluation, we employed the Goodness of Pronunciation (GoP) metric, utilizing automatic segmentation systems and correlating the results with evaluations conducted by a specialized speech therapist. While the obtained correlation values were not notably high, a positive trend was observed. In terms of fluency assessment, deep learning models like wav2vec were used to extract audio features, and we employed an SVM classifier trained on a corpus focused on identifying fluency issues to categorize Prautocal corpus samples. The outcomes highlight the complexities of evaluating such phenomena, with variability depending on the specific type of disfluency detected.
Submitted 29 April, 2024;
originally announced April 2024.
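Illustrative sketch (not part of the listing): the phonetic experiment in the abstract above scores pronunciation with a Goodness of Pronunciation (GoP) metric and correlates the scores with a speech therapist's ratings. The Python below shows one way such a comparison could be set up. It assumes a posterior-based GoP variant (the mean log-ratio between the target phone's posterior and the best-scoring phone's posterior over the segment's frames) and a Spearman rank correlation. Neither choice is confirmed by the abstract, and the function names and toy values are hypothetical.

```python
import math


def gop(frame_posteriors, target_phone):
    """Assumed GoP variant: mean over frames of
    log P(target | frame) - log max_q P(q | frame).
    The score is 0 when the target phone is the most likely one in every
    frame, and negative otherwise."""
    scores = []
    for frame in frame_posteriors:
        best = max(frame.values())
        scores.append(math.log(frame[target_phone]) - math.log(best))
    return sum(scores) / len(scores)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: per-frame phone posteriors for three segments whose
# target phone is "a", plus made-up therapist ratings (higher = better).
segments = [
    [{"a": 0.9, "e": 0.1}, {"a": 0.8, "e": 0.2}],
    [{"a": 0.4, "e": 0.6}, {"a": 0.5, "e": 0.5}],
    [{"a": 0.1, "e": 0.9}, {"a": 0.2, "e": 0.8}],
]
therapist_ratings = [3, 2, 1]

gop_scores = [gop(segment, "a") for segment in segments]
print(spearman(gop_scores, therapist_ratings))
```

In a real setting the posteriors would come from an acoustic model after automatic segmentation, and a rank correlation is only one of several reasonable ways to compare the scores with expert judgments.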
BiosecurID: a multimodal biometric database
Authors:
Julian Fierrez,
Javier Galbally,
Javier Ortega-Garcia,
Manuel R Freire,
Fernando Alonso-Fernandez,
Daniel Ramos,
Doroteo Torre Toledano,
Joaquin Gonzalez-Rodriguez,
Juan A Siguenza,
Javier Garrido-Salas,
E Anguiano,
Guillermo Gonzalez-de-Rivera,
Ricardo Ribalda,
Marcos Faundez-Zanuy,
JA Ortega,
Valentín Cardeñoso-Payo,
A Viloria,
Carlos E Vivaracho,
Q Isaac Moro,
Juan J Igarza,
J Sanchez,
Inmaculada Hernaez,
Carlos Orrite-Urunuela,
Francisco Martinez-Contreras,
Juan José Gracia-Roche
Abstract:
A new multimodal biometric database, acquired in the framework of the BiosecurID project, is presented together with the description of the acquisition setup and protocol. The database includes eight unimodal biometric traits, namely: speech, iris, face (still images, videos of talking faces), handwritten signature and handwritten text (on-line dynamic signals, off-line scanned images), fingerprints (acquired with two different sensors), hand (palmprint, contour-geometry) and keystroking. The database comprises 400 subjects and presents features such as: realistic acquisition scenario, balanced gender and population distributions, availability of information about particular demographic groups (age, gender, handedness), acquisition of replay attacks for speech and keystroking, skilled forgeries for signatures, and compatibility with other existing databases. All these characteristics make it very useful in research and development of unimodal and multimodal biometric systems.
Submitted 2 November, 2021;
originally announced November 2021.