-
Learning to Control Latent Representations for Few-Shot Learning of Named Entities
Authors:
Omar U. Florez,
Erik Mueller
Abstract:
Humans excel at learning continuously from small amounts of data without forgetting how to solve old problems. However, neural networks require large datasets to compute latent representations across different tasks while minimizing a loss function. For example, a natural language understanding (NLU) system will often encounter emerging entities during deployment, as interactions with users in realistic scenarios generate new and infrequent names, events, and locations. Here, we address this scenario by introducing an RL-trainable controller that disentangles the representation learning of a neural encoder from its memory management role.
Our proposed solution is straightforward: we train a controller to execute an optimal sequence of read and write operations on an external memory, with the goal of leveraging diverse activations from the past and providing accurate predictions. Our approach is named Learning to Control (LTC) and allows few-shot learning with two degrees of memory plasticity. We experimentally show that our system obtains accurate results for few-shot learning of entity recognition in the Stanford Task-Oriented Dialogue dataset.
Submitted 19 November, 2019;
originally announced November 2019.
-
Aging Memories Generate More Fluent Dialogue Responses with Memory Augmented Neural Networks
Authors:
Omar U. Florez,
Erik Mueller
Abstract:
Memory Networks have emerged as effective models for incorporating Knowledge Bases (KB) into neural networks. By storing KB embeddings in a memory component, these models can learn meaningful representations that are grounded in external knowledge. However, as the memory unit becomes full, the oldest memories are replaced by newer representations.
In this paper, we question this approach and provide experimental evidence that conventional Memory Networks store highly correlated vectors during training. While increasing the memory size mitigates this problem, it also leads to overfitting, as the memory stores a large number of training latent representations. To address these issues, we propose a novel regularization mechanism named memory dropout, which (1) samples a single latent vector from the distribution of redundant memories and (2) ages redundant memories, increasing the probability that they are overwritten during training. This fully differentiable technique allows us to achieve state-of-the-art response generation in the Stanford Multi-Turn Dialogue and Cambridge Restaurant datasets.
Submitted 26 September, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
On the Unintended Social Bias of Training Language Generation Models with Data from Local Media
Authors:
Omar U. Florez
Abstract:
There are concerns that neural language models may preserve some of the stereotypes of the underlying societies that generate the large corpora needed to train these models. For example, gender bias is a significant problem when generating text, and its unintended memorization could impact the user experience of many applications (e.g., the smart-compose feature in Gmail).
In this paper, we introduce a novel architecture that decouples the representation learning of a neural model from its memory management role. This architecture allows us to update a memory module with an equal ratio across gender types, addressing biased correlations directly in the latent space. We experimentally show that our approach can mitigate gender bias amplification in the automatic generation of news articles while providing similar perplexity values when extending the Sequence2Sequence architecture.
Submitted 1 November, 2019;
originally announced November 2019.