Error-preserving Automatic Speech Recognition of Young English Learners' Language

Authors: Janick Michot, Manuela Hürlimann, Jan Deriu, Luzia Sauer, Katsiaryna Mlynchyk, Mark Cieliebak

Abstract: One of the central skills that language learners need to practice is speaking the language. Currently, students in school do not get enough speaking opportunities and lack conversational practice. Recent advances in speech technology and natural language processing allow for the creation of novel tools to practice their speaking skills. In this work, we tackle the first component of such a pipeline, namely, the automated speech recognition module (ASR), which faces a number of challenges: first, state-of-the-art ASR models are often trained on adult read-aloud data by native speakers and do not transfer well to young language learners' speech. Second, most ASR systems contain a powerful language model, which smooths out errors made by the speakers. To give corrective feedback, which is a crucial part of language learning, the ASR systems in our setting need to preserve the errors made by the language learners. In this work, we build an ASR system that satisfies these requirements: it works on spontaneous speech by young language learners and preserves their errors. For this, we collected a corpus containing around 85 hours of English audio spoken by learners in Switzerland from grades 4 to 6 on different language learning tasks, which we used to train an ASR model. Our experiments show that our model benefits from direct fine-tuning on children's voices and has a much higher error preservation rate than other models.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted at ACL 2024 Main Conference

arXiv:2310.09088

Dialect Transfer for Swiss German Speech Translation

Authors: Claudio Paonessa, Yanick Schraner, Jan Deriu, Manuela Hürlimann, Manfred Vogel, Mark Cieliebak

Abstract: This paper investigates the challenges in building Swiss German speech translation systems, specifically focusing on the impact of dialect diversity and differences between Swiss German and Standard German. Swiss German is a spoken language with no formal writing system, it comprises many diverse dialects and is a low-resource language with only around 5 million speakers. The study is guided by two key research questions: how does the inclusion and exclusion of dialects during the training of speech translation models for Swiss German impact the performance on specific dialects, and how do the differences between Swiss German and Standard German impact the performance of the systems? We show that dialect diversity and linguistic differences pose significant challenges to Swiss German speech translation, which is in line with linguistic hypotheses derived from empirical investigations.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2305.18855

STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions

Authors: Michel Plüss, Jan Deriu, Yanick Schraner, Claudio Paonessa, Julia Hartmann, Larissa Schmidt, Christian Scheller, Manuela Hürlimann, Tanja Samardžić, Manfred Vogel, Mark Cieliebak

Abstract: We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech, annotated with Standard German text at the sentence level. The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record. We make the corpus publicly available. It contains 343 hours of speech from all dialect regions and is the largest public speech corpus for Swiss German to date. Application areas include automatic speech recognition (ASR), text-to-speech, dialect identification, and speaker recognition. Dialect information, age group, and gender of the 316 speakers are provided. Genders are equally represented and the corpus includes speakers of all ages. Roughly the same amount of speech is provided per dialect region, which makes the corpus ideally suited for experiments with speech technology for different dialects. We provide training, validation, and test splits of the data. The test set consists of the same spoken sentences for each dialect region and allows a fair evaluation of the quality of speech technologies in different dialects. We train an ASR model on the training set and achieve an average BLEU score of 74.7 on the test set. The model beats the best published BLEU scores on 2 other Swiss German ASR test sets, demonstrating the quality of the corpus.

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.01633

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

Authors: Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, et al. (17 additional authors not shown)

Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.

Submitted 7 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023). Updated author list and acknowledgements

MSC Class: 68 ACM Class: I.2.7

arXiv:2205.09501

SDS-200: A Swiss German Speech to Standard German Text Corpus

Authors: Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel

Abstract: We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations, annotated with dialect, age, and gender information of the speakers. The dataset allows for training speech translation, dialect recognition, and speech synthesis systems, among others. The data was collected using a web recording tool that is open to the public. Each participant was given a text in Standard German and asked to translate it to their Swiss German dialect before recording it. To increase the corpus quality, recordings were validated by other participants. The data consists of 200 hours of speech by around 4000 different speakers and covers a large part of the Swiss-German dialect landscape. We release SDS-200 alongside a baseline speech translation model, which achieves a word error rate (WER) of 30.3 and a BLEU score of 53.1 on the SDS-200 test set. Furthermore, we use SDS-200 to fine-tune a pre-trained XLS-R model, achieving 21.6 WER and 64.0 BLEU.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2003.02171

doi 10.1103/PhysRevResearch.2.033352

Observation of adiabatic and non-adiabatic behavior for CPMG sequence in time-dependent magnetic fields

Authors: Martin D. Hürlimann, Shin Utsuzawa, Chang-Yu Hou

Abstract: We investigate experimentally the effect of time dependent magnetic fields on the spin dynamics of the Carr-Purcell-Meiboom-Gill (CPMG) sequence. Over a wide range of offset fields and ramp rates, the measured response is fully consistent with adiabatic behavior. The echo amplitudes exhibit characteristic modulations that are in excellent agreement with theoretical predictions. Non-adiabatic events are observed at distinct offsets. Abruptly after passing through these offsets, the experimental results deviate from the theoretical adiabatic expressions. These non-adiabatic events occur precisely at the field offsets predicted by theory. It is demonstrated that in the adiabatic regime the effects of field fluctuations are fully reversible, while the occurrence of non-adiabatic events leads to hysteresis. The adiabatic range of field offsets can be increased by modifying the refocusing pulses within the CPMG sequence.

Submitted 4 March, 2020; originally announced March 2020.

Comments: 29 pages, 7 figures

Journal ref: Phys. Rev. Research 2, 033352 (2020)

arXiv:1903.08006

doi 10.1103/PhysRevApplied.12.044061

Spin Dynamics of CPMG sequence in time-dependent magnetic fields

Authors: Martin D. Hürlimann, Shin Utsuzawa, Chang-Yu Hou

Abstract: We analyze the effects of time dependent magnetic and RF fields on the spin dynamics of the Carr-Purcell-Meiboom-Gill (CPMG) sequence. The analysis is based on the decomposition of the magnetization into the eigenmodes of the propagator of a single refocusing cycle. For sufficiently slow changes in the external fields, the magnetization follows the changing eigenmodes adiabatically. This results in echo amplitudes that show regular modulations with time. Faster field changes can induce transitions between the eigenmodes. Such non-adiabatic behavior occurs preferentially at particular offsets of the Larmor frequency from the RF frequency where the eigenmodes become nearly degenerate. We introduce the instantaneous adiabaticity parameter ${\cal A}(t)$ that accurately predicts the crossover from the adiabatic to the non-adiabatic regime and allows the classification of field fluctuations. ${\cal A}(t)$ is determined solely by the properties of a single refocusing cycle under static conditions and the instantaneous value of the field offset and its temporal derivative. The analytical results are compared with numerical simulations.

Submitted 19 March, 2019; originally announced March 2019.

Comments: 54 pages, 16 figures

Journal ref: Phys. Rev. Applied 12, 044061 (2019)

arXiv:1002.1702

doi 10.1016/j.jmr.2010.09.003

Application of Optimal Control to CPMG Refocusing Pulse Design

Authors: Troy W. Borneman, Martin D. Hurlimann, David G. Cory

Abstract: We apply optimal control theory (OCT) to the design of refocusing pulses suitable for the CPMG sequence that are robust over a wide range of B0 and B1 offsets. We also introduce a model, based on recent progress in the analysis of unitary dynamics in the field of quantum information processing (QIP), that describes the multiple refocusing dynamics of the CPMG sequence as a dephasing Pauli channel. This model provides a compact characterization of the consequences and severity of residual pulse errors. We illustrate the methods by considering a specific example of designing and analyzing broadband OCT refocusing pulses of length 10 t180 that are constrained by the maximum instantaneous pulse power. We show that with this refocusing pulse, the CPMG sequence can refocus over 98% of magnetization for resonance offsets up to 3.2 times the maximum RF amplitude, even in the presence of +/- 10% RF inhomogeneity.

Submitted 12 October, 2010; v1 submitted 8 February, 2010; originally announced February 2010.

Comments: 23 pages, 10 figures; Revised and reformatted version with new title and significant changes to Introduction and Conclusions sections

Journal ref: J. Magn. Reson. 207, 220-233 (2010)

arXiv:cond-mat/0211184

Tortuosity Measurement and the Effects of Finite Pulse Widths on Xenon Gas Diffusion NMR Studies of Porous Media

Authors: R. W. Mair, M. D. Hurlimann, P. N. Sen, L. M. Schwartz, S. Patz, R. L. Walsworth

Abstract: We have extended the utility of NMR as a technique to probe porous media structure over length scales of ~ 100 - 2000 micron by using the spin 1/2 noble gas 129Xe imbibed into the system's pore space. Such length scales are much greater than can be probed with NMR diffusion studies of water-saturated porous media. We utilized Pulsed Gradient Spin Echo NMR measurements of the time-dependent diffusion coefficient, D(t) of the xenon gas filling the pore space to study further the measurements of both the surface area-pore volume ratio, S/Vp, and the tortuosity (pore connectivity) of the medium. In uniform-size glass bead packs, we observed D(t) decreasing with increasing t, reaching an observed asymptote of ~ 0.62 - 0.65D0, that could be measured over diffusion distances extending over multiple bead diameters. Measurements of D(t)/D0 at differing gas pressures showed this tortuosity limit was not affected by changing the characteristic diffusion length of the spins during the diffusion encoding gradient pulse. This was not the case at the short time limit, where D(t)/D0 was noticeably affected by the gas pressure in the sample. Increasing the gas pressure, and hence reducing D0 and the diffusion during the gradient pulse served to reduce the previously observed deviation of D(t)/D0 from the S/Vp relation. The Pade approximation is used to interpolate between the long and short time limits in D(t). While the short time D(t) point lay above the interpolation line in the case of small beads, due to diffusion during the gradient pulse on the order of the pore size, it was also noted that the experimental D(t) data fell below the Pade line in the case of large beads, most likely due to finite size effects.

Submitted 9 November, 2002; originally announced November 2002.

Comments: single pdf file containing all figures

Journal ref: Magnetic Resonance Imaging 19, 345 (2001)

arXiv:cond-mat/0211182

doi 10.1006/jmre.2002.2540

The Narrow Pulse Approximation and long length scale determination in xenon gas diffusion NMR studies of model porous media

Authors: R. W. Mair, P. N. Sen, M. D. Hurlimann, S. Patz, D. G. Cory, R. L. Walsworth

Abstract: We report a systematic study of xenon gas diffusion NMR in simple model porous media: random packs of mono-sized glass beads, and focus on three specific areas peculiar to gas-phase diffusion. These topics are: (i) diffusion of spins on the order of the pore dimensions during the application of the diffusion encoding gradient pulses in a PGSE experiment (breakdown of the 'narrow pulse approximation' and imperfect background gradient cancellation), (ii) the ability to derive long-length scale structural information, and (iii) effects of finite sample size. We find that the time-dependent diffusion coefficient, D(t), of the imbibed xenon gas at short diffusion times in small beads is significantly affected by the gas pressure. In particular, as expected, we find smaller deviations between measured D(t) and theoretical predictions as the gas pressure is increased, resulting from reduced diffusion during the application of the gradient pulse. The deviations are then completely removed when water D(t) is observed in the same samples. The use of gas also allows us to probe D(t) over a wide range of length scales, and observe the long-time asymptotic limit which is proportional to the inverse tortuosity of the sample, as well as the diffusion distance where this limit takes effect (~ 1 - 1.5 bead diameters). The Pade approximation can be used as a reference for expected xenon D(t) data between the short and long time limits, allowing us to explore deviations from the expected behaviour at intermediate times as a result of finite sample size effects. Finally, the application of the Pade interpolation between the long and short time asymptotic limits yields a fitted length scale (the "Pade length"), which is found to be ~ 0.13b for all bead packs, where b is the bead diameter.

Submitted 9 November, 2002; originally announced November 2002.

Comments: single pdf file including figures

Journal ref: Journal of Magnetic Resonance 156, 202 (2002)

Showing 1–10 of 10 results for author: Hürlimann, M