-
On the long-term learning ability of LSTM LMs
Authors:
Wim Boes,
Robbe Van Rompaey,
Lyan Verwimp,
Joris Pelemans,
Hugo Van hamme,
Patrick Wambacq
Abstract:
We inspect the long-term learning ability of Long Short-Term Memory language models (LSTM LMs) by evaluating a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs and by analyzing its performance. We evaluate on text and speech. Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-leve…
▽ More
We inspect the long-term learning ability of Long Short-Term Memory language models (LSTM LMs) by evaluating a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs and by analyzing its performance. We evaluate on text and speech. Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-level LSTM LMs. On the other hand, the extension does not provide gains for discourse-level models. These findings indicate that discourse-level LSTM LMs already rely on contextual information to perform long-term learning.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
Information-Weighted Neural Cache Language Models for ASR
Authors:
Lyan Verwimp,
Joris Pelemans,
Hugo Van hamme,
Patrick Wambacq
Abstract:
Neural cache language models (LMs) extend the idea of regular cache language models by making the cache probability dependent on the similarity between the current context and the context of the words in the cache. We make an extensive comparison of 'regular' cache models with neural cache models, both in terms of perplexity and WER after rescoring first-pass ASR results. Furthermore, we propose t…
▽ More
Neural cache language models (LMs) extend the idea of regular cache language models by making the cache probability dependent on the similarity between the current context and the context of the words in the cache. We make an extensive comparison of 'regular' cache models with neural cache models, both in terms of perplexity and WER after rescoring first-pass ASR results. Furthermore, we propose two extensions to this neural cache model that make use of the content value/information weight of the word: firstly, combining the cache probability and LM probability with an information-weighted interpolation and secondly, selectively adding only content words to the cache. We obtain a 29.9%/32.1% (validation/test set) relative improvement in perplexity with respect to a baseline LSTM LM on the WikiText-2 dataset, outperforming previous work on neural cache LMs. Additionally, we observe significant WER reductions with respect to the baseline model on the WSJ ASR task.
△ Less
Submitted 24 September, 2018;
originally announced September 2018.
-
State Gradients for RNN Memory Analysis
Authors:
Lyan Verwimp,
Hugo Van hamme,
Vincent Renkens,
Patrick Wambacq
Abstract:
We present a framework for analyzing what the state in RNNs remembers from its input embeddings. Our approach is inspired by backpropagation, in the sense that we compute the gradients of the states with respect to the input embeddings. The gradient matrix is decomposed with Singular Value Decomposition to analyze which directions in the embedding space are best transferred to the hidden state spa…
▽ More
We present a framework for analyzing what the state in RNNs remembers from its input embeddings. Our approach is inspired by backpropagation, in the sense that we compute the gradients of the states with respect to the input embeddings. The gradient matrix is decomposed with Singular Value Decomposition to analyze which directions in the embedding space are best transferred to the hidden state space, characterized by the largest singular values. We apply our approach to LSTM language models and investigate to what extent and for how long certain classes of words are remembered on average for a certain corpus. Additionally, the extent to which a specific property or relationship is remembered by the RNN can be tracked by comparing a vector characterizing that property with the direction(s) in embedding space that are best preserved in hidden state space.
△ Less
Submitted 18 June, 2018; v1 submitted 11 May, 2018;
originally announced May 2018.
-
Language Models of Spoken Dutch
Authors:
Lyan Verwimp,
Joris Pelemans,
Marieke Lycke,
Hugo Van hamme,
Patrick Wambacq
Abstract:
In Flanders, all TV shows are subtitled. However, the process of subtitling is a very time-consuming one and can be sped up by providing the output of a speech recognizer run on the audio of the TV show, prior to the subtitling. Naturally, this speech recognition will perform much better if the employed language model is adapted to the register and the topic of the program. We present several lang…
▽ More
In Flanders, all TV shows are subtitled. However, the process of subtitling is a very time-consuming one and can be sped up by providing the output of a speech recognizer run on the audio of the TV show, prior to the subtitling. Naturally, this speech recognition will perform much better if the employed language model is adapted to the register and the topic of the program. We present several language models trained on subtitles of television shows provided by the Flemish public-service broadcaster VRT. This data was gathered in the context of the project STON which has as purpose to facilitate the process of subtitling TV shows. One model is trained on all available data (46M word tokens), but we also trained models on a specific type of TV show or domain/topic. Language models of spoken language are quite rare due to the lack of training data. The size of this corpus is relatively large for a corpus of spoken language (compare with e.g. CGN which has 9M words), but still rather small for a language model. Thus, in practice it is advised to interpolate these models with a large background language model trained on written language. The models can be freely downloaded on http://www.esat.kuleuven.be/psi/spraak/downloads/.
△ Less
Submitted 12 September, 2017;
originally announced September 2017.
-
Character-Word LSTM Language Models
Authors:
Lyan Verwimp,
Joris Pelemans,
Hugo Van hamme,
Patrick Wambacq
Abstract:
We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By c…
▽ More
We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline model with a similar amount of parameters and 4.57% on Dutch. Moreover, we also outperform baseline word-level models with a larger number of parameters.
△ Less
Submitted 10 April, 2017;
originally announced April 2017.
-
Simulation Methodology for Analysis of Substrate Noise Impact on Analog / RF Circuits Including Interconnect Resistance
Authors:
C. Soens,
G. Van Der Plas,
P. Wambacq,
S. Donnay
Abstract:
This paper reports a novel simulation methodology for analysis and prediction of substrate noise impact on analog / RF circuits taking into account the role of the parasitic resistance of the on-chip interconnect in the impact mechanism. This methodology allows investigation of the role of the separate devices (also parasitic devices) in the analog / RF circuit in the overall impact. This way is…
▽ More
This paper reports a novel simulation methodology for analysis and prediction of substrate noise impact on analog / RF circuits taking into account the role of the parasitic resistance of the on-chip interconnect in the impact mechanism. This methodology allows investigation of the role of the separate devices (also parasitic devices) in the analog / RF circuit in the overall impact. This way is revealed which devices have to be taken care of (shielding, topology change) to protect the circuit against substrate noise. The developed methodology is used to analyze impact of substrate noise on a 3 GHz LC-tank Voltage Controlled Oscillator (VCO) designed in a high-ohmic 0.18 $μ$m 1PM6 CMOS technology. For this VCO (in the investigated frequency range from DC to 15 MHz) impact is mainly caused by resistive coupling of noise from the substrate to the non-ideal on-chip ground interconnect, resulting in analog ground bounce and frequency modulation. Hence, the presented test-case reveals the important role of the on-chip interconnect in the phenomenon of substrate noise impact.
△ Less
Submitted 25 October, 2007;
originally announced October 2007.