
doi 10.21437/Interspeech.2020-1750

Do End-to-End Speech Recognition Models Care About Context?

Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel

Abstract: The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual information in the audio input. We find that the AED model is indeed more context sensitive, but that the gap can be closed by adding self-attention to the CTC model. Furthermore, the two models perform similarly when contextual information is constrained. Finally, in contrast to previous research, our results show that the CTC model is highly competitive on WSJ and LibriSpeech without the help of an external language model.

Submitted 17 February, 2021; originally announced February 2021.

Comments: Published in the proceedings of INTERSPEECH 2020, pp. 4352-4356

arXiv:2005.00812 [pdf, other]

MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech

Authors: Jakob D. Havtorn, Jan Latko, Joakim Edin, Lasse Borgholt, Lars Maaløe, Lorenzo Belgrano, Nicolai F. Jacobsen, Regitze Sdun, Željko Agić

Abstract: We address a challenging and practical task of labeling questions in speech in real time during telephone calls to emergency medical services in English, which embeds within a broader decision support system for emergency call-takers. We propose a novel multimodal approach to real-time sequence labeling in speech. Our model treats speech and its own textual representation as two separate modalities or views, as it jointly learns from streamed audio and its noisy transcription into text via automatic speech recognition. Our results show significant gains of jointly learning from the two modalities when compared to text or audio only, under adverse noise and limited volume of training data. The results generalize to medical symptoms detection where we observe a similar pattern of improvements with multimodal learning.

Submitted 12 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

Comments: Accepted at ACL 2020

arXiv:2004.07642 [pdf, other]

Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers

Authors: Robert Litschko, Ivan Vulić, Željko Agić, Goran Glavaš

Abstract: Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, "at treebank level". In this work, we propose and argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS), and present a proof-of-concept study focused on instance-level selection in the framework of delexicalized parser transfer. We start from an empirical observation that different source parsers are the best choice for different Universal POS sequences in the target language. We then propose to predict the best parser at the instance level. To this end, we train a supervised regression model, based on the Transformer architecture, to predict parser accuracies for individual POS-sequences. We compare ILPS against two strong single-best parser selection baselines (SBPS): (1) a model that compares POS n-gram distributions between the source and target languages (KL) and (2) a model that selects the source based on the similarity between manually created language vectors encoding syntactic properties of languages (L2V). The results from our extensive evaluation, coupling 42 source parsers and 20 diverse low-resource test languages, show that ILPS outperforms KL and L2V on 13/20 and 14/20 test languages, respectively. Further, we show that by predicting the best parser "at the treebank level" (SBPS), using the aggregation of predictions from our instance-level model, we outperform the same baselines on 17/20 and 16/20 test languages.

Submitted 16 April, 2020; originally announced April 2020.

arXiv:1811.08757 [pdf, other]

The Best of Both Worlds: Lexical Resources To Improve Low-Resource Part-of-Speech Tagging

Authors: Barbara Plank, Sigrid Klerke, Zeljko Agic

Abstract: In natural language processing, the deep learning revolution has shifted the focus from conventional hand-crafted symbolic representations to dense inputs, which are adequate representations learned automatically from corpora. However, particularly when working with low-resource languages, small amounts of symbolic lexical resources such as user-generated lexicons are often available even when gold-standard corpora are not. Such additional linguistic information is often neglected, though, and recent neural approaches to cross-lingual tagging typically rely only on word and subword embeddings. While these representations are effective, our recent work has shown clear benefits of combining the best of both worlds: integrating conventional lexical information improves neural cross-lingual part-of-speech (PoS) tagging. However, little is known about how complementary such additional information is, and to what extent improvements depend on the coverage and quality of these external resources. This paper seeks to fill this gap by providing the first thorough analysis of the contributions of lexical resources for cross-lingual PoS tagging in neural times.

Submitted 21 November, 2018; originally announced November 2018.

Comments: Under review for Natural Language Engineering

arXiv:1808.09733 [pdf, other]

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

Authors: Barbara Plank, Željko Agić

Abstract: We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. The approach is simple, yet surprisingly effective, resulting in a new state of the art without access to any gold annotated data.

Submitted 29 August, 2018; originally announced August 2018.

Comments: EMNLP 2018

arXiv:1704.05347 [pdf, other]

Baselines and test data for cross-lingual inference

Authors: Željko Agić, Natalie Schluter

Abstract: The recent years have seen a revival of interest in textual entailment, sparked by i) the emergence of powerful deep neural network learners for natural language processing and ii) the timely development of large-scale evaluation datasets such as SNLI. Recast as natural language inference, the problem now amounts to detecting the relation between pairs of statements: they either contradict or entail one another, or they are mutually neutral. Current research in natural language inference is effectively exclusive to English. In this paper, we propose to advance the research in SNLI-style natural language inference toward multilingual evaluation. To that end, we provide test data for four major languages: Arabic, French, Spanish, and Russian. We experiment with a set of baselines. Our systems are based on cross-lingual word embeddings and machine translation. While our best system scores an average accuracy of just over 75%, we focus largely on enabling further research in multilingual inference.

Submitted 2 March, 2018; v1 submitted 18 April, 2017; originally announced April 2017.

Comments: To appear at LREC 2018

arXiv:1701.03163 [pdf, ps, other]

Parsing Universal Dependencies without training

Authors: Héctor Martínez Alonso, Željko Agić, Barbara Plank, Anders Søgaard

Abstract: We propose UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of head attachment rules. It features two-step decoding to guarantee that function words are attached as leaf nodes. The parser requires no training, and it is competitive with a delexicalized transfer system. UDP offers a linguistically sound unsupervised alternative to cross-lingual parsing for UD, which can be used as a baseline for such systems. The parser has very few parameters and is distinctly robust to domain change across languages.

Submitted 11 January, 2017; originally announced January 2017.

Comments: EACL 2017, 8+2 pages

arXiv:cond-mat/0311381 [pdf]

doi 10.1051/jp4:2004114018

Effects of transverse electron dispersion on photo-emission spectra of quasi-one-dimensional systems

Authors: Zeljana Agic, Pasko Zupanovic, Aleksa Bjelis

Abstract: The random phase approximation (RPA) spectral function of the one-dimensional electron band with the three-dimensional long range Coulomb interaction shows a broad feature which is spread on the scale of the plasmon energy and vanishes at the chemical potential. The fact that there are no quasi-particle $\delta$-peaks is the direct consequence of the acoustic nature of the collective plasmon mode. This behaviour of the spectral function is in qualitative agreement with the angle resolved photo-emission spectra of some Bechgaard salts. In the present work we consider the modifications in the spectral function due to finite transverse electron dispersion. The transverse bandwidth is responsible for the appearance of an optical gap in the long wavelength plasmon mode. A plasmon dispersion of this kind introduces the quasi-particle $\delta$-peak into the spectral function at the chemical potential. The cross-over from the Fermi liquid to the non-Fermi liquid regime by decreasing the transverse bandwidth takes place through the decrease of the quasi-particle weight as the optical gap in the long wavelength plasmon mode is closing.

Submitted 17 November, 2003; originally announced November 2003.

Comments: 2 pages, 2 figures, ISCOM'03

arXiv:cond-mat/0211276 [pdf]

Photo-emission properties of quasi-one-dimensional conductors

Authors: Z. Agic, P. Zupanovic, A. Bjelis

Abstract: We calculate the self-energy of a one-dimensional electron band with the three-dimensional long range Coulomb interaction within the random phase approximation, paying particular attention to the contribution coming from the electron scatterings on the collective plasmon mode. It is shown that the spectral density has the form of a wide feature at the frequency scale of the plasmon frequency, without the presence of quasi-particle delta-peaks. The relevance of this result with respect to experimental findings and to the theory of Luttinger liquids is discussed.

Submitted 14 November, 2002; originally announced November 2002.

Comments: 4 pages, 2 figures

arXiv:cond-mat/0206265 [pdf, ps, other]

Discrete approach to incoherent excitations in conductors

Authors: P. Zupanovic, A. Bjelis, Z. Agic

Abstract: Keeping the discreteness of the reciprocal space, we calculate the spectrum of incoherent electron-hole excitations in conducting Fermi liquids. The method is illustrated on the well-known jellium model within the random phase approximation. It also leads to the formulation of a sum rule from which we get the details of the dispersion curve for the collective plasmon mode. The notion of time averaging in the discrete approach is briefly recalled.

Submitted 14 June, 2002; originally announced June 2002.

Journal ref: FIZIKA A (Zagreb) 10 (2001) 4, 203-214

Showing 1–10 of 10 results for author: Agić, Ž