The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation

He, Mutian; Garner, Philip N.

arXiv:2305.09652v1 (cs)

[Submitted on 16 May 2023 (this version), latest version 17 Oct 2023 (v2)]

Abstract: End-to-end spoken language understanding (SLU) remains elusive even with current large pretrained language models on text and speech, especially in multilingual cases. Machine translation is an established and powerful pretraining objective for text, as it enables the model to capture the high-level semantics of the input utterance and associations between languages, a capability that is desirable for speech models operating on lower-level acoustic frames. Motivated particularly by the task of cross-lingual SLU, we demonstrate that speech translation (ST) is a good means of pretraining speech models for end-to-end SLU in both monolingual and cross-lingual scenarios.
By introducing ST, our models outperform current baselines on monolingual and multilingual intent classification as well as spoken question answering on the SLURP, MINDS-14, and NMSQA benchmarks. To verify the effectiveness of our methods, we also release two new benchmark datasets, from both synthetic and real sources, for the tasks of abstractive summarization from speech and low-resource or zero-shot transfer from English to French. We further show the value of preserving knowledge from the pretraining task, and to that end explore Bayesian transfer learning on pretrained speech models based on continual learning regularizers.

Comments:	13 pages, 3 figures
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2305.09652 [cs.CL]
	(or arXiv:2305.09652v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.09652

Submission history

From: Mutian He
[v1] Tue, 16 May 2023 17:53:03 UTC (7,056 KB)
[v2] Tue, 17 Oct 2023 14:59:28 UTC (213 KB)
