Showing 1–2 of 2 results for author: Herygers, A

Search v0.5.6 released 2020-02-24

arXiv:2312.15499

eess.AS

Exploring data augmentation in bias mitigation against non-native-accented speech

Authors: Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg

Abstract: Automatic speech recognition (ASR) should serve every speaker, not only the majority "standard" speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a "non-standard" or "diverse" way is crucial. We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system. Since this is a low-resource problem, we investigate the optimal type of data augmentation, i.e., speed/pitch perturbation, cross-lingual voice conversion-based methods, and SpecAugment, applied to both native Flemish and non-native-accented Flemish, for bias mitigation. The results showed that specific types of data augmentation applied to both native and non-native-accented speech improve non-native-accented ASR while applying data augmentation to the non-native-accented speech is more conducive to bias reduction. Combining both gave the largest bias reduction for human-machine interaction (HMI) as well as read-type speech.

Submitted 24 December, 2023; originally announced December 2023.

Comments: Accepted to ASRU 2023

arXiv:2306.04306

cs.CL cs.SD eess.AS

doi 10.21437/Interspeech.2023-772

Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes

Authors: Kevin Glocker, Aaricia Herygers, Munir Georges

Abstract: This paper proposes Allophant, a multilingual phoneme recognizer. It requires only a phoneme inventory for cross-lingual transfer to a target language, allowing for low-resource recognition. The architecture combines a compositional phone embedding approach with individually supervised phonetic attribute classifiers in a multi-task architecture. We also introduce Allophoible, an extension of the PHOIBLE database. When combined with a distance-based mapping approach for grapheme-to-phoneme outputs, it allows us to train on PHOIBLE inventories directly. By training and evaluating on 34 languages, we found that the addition of multi-task learning improves the model's capability of being applied to unseen phonemes and phoneme inventories. On supervised languages we achieve phoneme error rate improvements of 11 percentage points (pp.) compared to a baseline without multi-task learning. Evaluation of zero-shot transfer on 84 languages yielded a decrease in PER of 2.63 pp. over the baseline.

Submitted 16 August, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 5 pages, 2 figures, 2 tables, accepted to INTERSPEECH 2023; published version

ACM Class: I.2.7

Journal ref: Proc. INTERSPEECH 2023, 2258-2262
