Search | arXiv e-print repository

On the long-term learning ability of LSTM LMs

Authors: Wim Boes, Robbe Van Rompaey, Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq

Abstract: We inspect the long-term learning ability of Long Short-Term Memory language models (LSTM LMs) by evaluating a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs and by analyzing its performance. We evaluate on text and speech. Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-leve… ▽ More We inspect the long-term learning ability of Long Short-Term Memory language models (LSTM LMs) by evaluating a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs and by analyzing its performance. We evaluate on text and speech. Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-level LSTM LMs. On the other hand, the extension does not provide gains for discourse-level models. These findings indicate that discourse-level LSTM LMs already rely on contextual information to perform long-term learning. △ Less

Submitted 16 June, 2021; originally announced June 2021.

Journal ref: ESANN 2020 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2020) 625-630

arXiv:1809.08826 [pdf, ps, other]

Information-Weighted Neural Cache Language Models for ASR

Authors: Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq

Abstract: Neural cache language models (LMs) extend the idea of regular cache language models by making the cache probability dependent on the similarity between the current context and the context of the words in the cache. We make an extensive comparison of 'regular' cache models with neural cache models, both in terms of perplexity and WER after rescoring first-pass ASR results. Furthermore, we propose t… ▽ More Neural cache language models (LMs) extend the idea of regular cache language models by making the cache probability dependent on the similarity between the current context and the context of the words in the cache. We make an extensive comparison of 'regular' cache models with neural cache models, both in terms of perplexity and WER after rescoring first-pass ASR results. Furthermore, we propose two extensions to this neural cache model that make use of the content value/information weight of the word: firstly, combining the cache probability and LM probability with an information-weighted interpolation and secondly, selectively adding only content words to the cache. We obtain a 29.9%/32.1% (validation/test set) relative improvement in perplexity with respect to a baseline LSTM LM on the WikiText-2 dataset, outperforming previous work on neural cache LMs. Additionally, we observe significant WER reductions with respect to the baseline model on the WSJ ASR task. △ Less

Submitted 24 September, 2018; originally announced September 2018.

Comments: Accepted for publication at SLT 2018

arXiv:1805.04264 [pdf, ps, other]

State Gradients for RNN Memory Analysis

Authors: Lyan Verwimp, Hugo Van hamme, Vincent Renkens, Patrick Wambacq

Abstract: We present a framework for analyzing what the state in RNNs remembers from its input embeddings. Our approach is inspired by backpropagation, in the sense that we compute the gradients of the states with respect to the input embeddings. The gradient matrix is decomposed with Singular Value Decomposition to analyze which directions in the embedding space are best transferred to the hidden state spa… ▽ More We present a framework for analyzing what the state in RNNs remembers from its input embeddings. Our approach is inspired by backpropagation, in the sense that we compute the gradients of the states with respect to the input embeddings. The gradient matrix is decomposed with Singular Value Decomposition to analyze which directions in the embedding space are best transferred to the hidden state space, characterized by the largest singular values. We apply our approach to LSTM language models and investigate to what extent and for how long certain classes of words are remembered on average for a certain corpus. Additionally, the extent to which a specific property or relationship is remembered by the RNN can be tracked by comparing a vector characterizing that property with the direction(s) in embedding space that are best preserved in hidden state space. △ Less

Submitted 18 June, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

Comments: Accepted for Interspeech 2018

arXiv:1709.03759 [pdf, ps, other]

Language Models of Spoken Dutch

Authors: Lyan Verwimp, Joris Pelemans, Marieke Lycke, Hugo Van hamme, Patrick Wambacq

Abstract: In Flanders, all TV shows are subtitled. However, the process of subtitling is a very time-consuming one and can be sped up by providing the output of a speech recognizer run on the audio of the TV show, prior to the subtitling. Naturally, this speech recognition will perform much better if the employed language model is adapted to the register and the topic of the program. We present several lang… ▽ More In Flanders, all TV shows are subtitled. However, the process of subtitling is a very time-consuming one and can be sped up by providing the output of a speech recognizer run on the audio of the TV show, prior to the subtitling. Naturally, this speech recognition will perform much better if the employed language model is adapted to the register and the topic of the program. We present several language models trained on subtitles of television shows provided by the Flemish public-service broadcaster VRT. This data was gathered in the context of the project STON which has as purpose to facilitate the process of subtitling TV shows. One model is trained on all available data (46M word tokens), but we also trained models on a specific type of TV show or domain/topic. Language models of spoken language are quite rare due to the lack of training data. The size of this corpus is relatively large for a corpus of spoken language (compare with e.g. CGN which has 9M words), but still rather small for a language model. Thus, in practice it is advised to interpolate these models with a large background language model trained on written language. The models can be freely downloaded on http://www.esat.kuleuven.be/psi/spraak/downloads/. △ Less

Submitted 12 September, 2017; originally announced September 2017.

arXiv:1704.02813 [pdf, other]

Character-Word LSTM Language Models

Authors: Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq

Abstract: We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By c… ▽ More We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline model with a similar amount of parameters and 4.57% on Dutch. Moreover, we also outperform baseline word-level models with a larger number of parameters. △ Less

Submitted 10 April, 2017; originally announced April 2017.

Journal ref: European Chapter of the Association for Computational Linguistics (EACL) 2017, Valencia, Spain, pp. 417-427

arXiv:0710.4723 [pdf]

Simulation Methodology for Analysis of Substrate Noise Impact on Analog / RF Circuits Including Interconnect Resistance

Authors: C. Soens, G. Van Der Plas, P. Wambacq, S. Donnay

Abstract: This paper reports a novel simulation methodology for analysis and prediction of substrate noise impact on analog / RF circuits taking into account the role of the parasitic resistance of the on-chip interconnect in the impact mechanism. This methodology allows investigation of the role of the separate devices (also parasitic devices) in the analog / RF circuit in the overall impact. This way is… ▽ More This paper reports a novel simulation methodology for analysis and prediction of substrate noise impact on analog / RF circuits taking into account the role of the parasitic resistance of the on-chip interconnect in the impact mechanism. This methodology allows investigation of the role of the separate devices (also parasitic devices) in the analog / RF circuit in the overall impact. This way is revealed which devices have to be taken care of (shielding, topology change) to protect the circuit against substrate noise. The developed methodology is used to analyze impact of substrate noise on a 3 GHz LC-tank Voltage Controlled Oscillator (VCO) designed in a high-ohmic 0.18 $μ$m 1PM6 CMOS technology. For this VCO (in the investigated frequency range from DC to 15 MHz) impact is mainly caused by resistive coupling of noise from the substrate to the non-ideal on-chip ground interconnect, resulting in analog ground bounce and frequency modulation. Hence, the presented test-case reveals the important role of the on-chip interconnect in the phenomenon of substrate noise impact. △ Less

Submitted 25 October, 2007; originally announced October 2007.

Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

Journal ref: Dans Design, Automation and Test in Europe - DATE'05, Munich : Allemagne (2005)

Showing 1–6 of 6 results for author: Wambacq, P