Search | arXiv e-print repository

arXiv:2009.01803 [pdf, other]

Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling

Abstract: Training a deep neural network requires a large amount of single-task data and involves a long time-consuming optimization phase. This is not scalable to complex, realistic environments with new unexpected changes. Humans can perform fast incremental learning on the fly and memory systems in the brain play a critical role. We introduce Sparse Meta Networks -- a meta-learning approach to learn onli… ▽ More Training a deep neural network requires a large amount of single-task data and involves a long time-consuming optimization phase. This is not scalable to complex, realistic environments with new unexpected changes. Humans can perform fast incremental learning on the fly and memory systems in the brain play a critical role. We introduce Sparse Meta Networks -- a meta-learning approach to learn online sequential adaptation algorithms for deep neural networks, by using deep neural networks. We augment a deep neural network with a layer-specific fast-weight memory. The fast-weights are generated sparsely at each time step and accumulated incrementally through time providing a useful inductive bias for online continual adaptation. We demonstrate strong performance on a variety of sequential adaptation scenarios, from a simple online reinforcement learning to a large scale adaptive language modelling. △ Less

Submitted 3 September, 2020; originally announced September 2020.

Comments: 9 pages, 4 figures, 2 tables

arXiv:2005.03350 [pdf]

A Locally Adaptive Interpretable Regression

Authors: Lkhagvadorj Munkhdalai, Tsendsuren Munkhdalai, Keun Ho Ryu

Abstract: Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity in a simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural n… ▽ More Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity in a simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts percentile of a Gaussian distribution for the regression coefficients for a rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than the other state-of-the-art baselines but also discovers some interesting relationships between input and target variables such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). Therefore, LoAIR is a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without depreciating its interpretability. △ Less

Submitted 28 April, 2022; v1 submitted 7 May, 2020; originally announced May 2020.

arXiv:1907.09720 [pdf, other]

Metalearned Neural Memory

Authors: Tsendsuren Munkhdalai, Alessandro Sordoni, Tong Wang, Adam Trischler

Abstract: We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning. We conceptualize this memory as a rapidly adaptable function that we parameterize as a deep neural network. Reading from the neural memory function amounts to pushing an input (the key vector) through the function to produce an output (the value vector). Writing to memory means… ▽ More We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning. We conceptualize this memory as a rapidly adaptable function that we parameterize as a deep neural network. Reading from the neural memory function amounts to pushing an input (the key vector) through the function to produce an output (the value vector). Writing to memory means changing the function; specifically, updating the parameters of the neural network to encode desired information. We leverage training and algorithmic techniques from metalearning to update the neural memory function in one shot. The proposed memory-augmented model achieves strong performance on a variety of learning problems, from supervised question answering to reinforcement learning. △ Less

Submitted 3 December, 2019; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: NeurIPS 2019

arXiv:1807.05076 [pdf, other]

Metalearning with Hebbian Fast Weights

Authors: Tsendsuren Munkhdalai, Adam Trischler

Abstract: We unify recent neural approaches to one-shot learning with older ideas of associative memory in a model for metalearning. Our model learns jointly to represent data and to bind class labels to representations in a single shot. It builds representations via slow weights, learned across tasks through SGD, while fast weights constructed by a Hebbian learning rule implement one-shot binding for each… ▽ More We unify recent neural approaches to one-shot learning with older ideas of associative memory in a model for metalearning. Our model learns jointly to represent data and to bind class labels to representations in a single shot. It builds representations via slow weights, learned across tasks through SGD, while fast weights constructed by a Hebbian learning rule implement one-shot binding for each new task. On the Omniglot, Mini-ImageNet, and Penn Treebank one-shot learning benchmarks, our model achieves state-of-the-art results. △ Less

Submitted 12 July, 2018; originally announced July 2018.

Comments: 8 pages, 3 figures, 4 tables. arXiv admin note: text overlap with arXiv:1712.09926

arXiv:1712.09926 [pdf, other]

Rapid Adaptation with Conditionally Shifted Neurons

Authors: Tsendsuren Munkhdalai, Xingdi Yuan, Soroush Mehri, Adam Trischler

Abstract: We describe a mechanism by which artificial neural networks can learn rapid adaptation - the ability to adapt on the fly, with little data, to new tasks - that we call conditionally shifted neurons. We apply this mechanism in the framework of metalearning, where the aim is to replicate some of the flexibility of human learning in machines. Conditionally shifted neurons modify their activation valu… ▽ More We describe a mechanism by which artificial neural networks can learn rapid adaptation - the ability to adapt on the fly, with little data, to new tasks - that we call conditionally shifted neurons. We apply this mechanism in the framework of metalearning, where the aim is to replicate some of the flexibility of human learning in machines. Conditionally shifted neurons modify their activation values with task-specific shifts retrieved from a memory module, which is populated rapidly based on limited task experience. On metalearning benchmarks from the vision and language domains, models augmented with conditionally shifted neurons achieve state-of-the-art results. △ Less

Submitted 3 July, 2018; v1 submitted 28 December, 2017; originally announced December 2017.

Comments: ICML 2018; Added: additional ablation and speed comparison with MetaNet

arXiv:1703.00837 [pdf, other]

Meta Networks

Authors: Tsendsuren Munkhdalai, Hong Yu

Abstract: Neural networks have been successfully applied in applications with a large amount of labeled data. However, the task of rapid generalization on new concepts with small training data while preserving performances on previously learned ones still presents a significant challenge to neural network models. In this work, we introduce a novel meta learning method, Meta Networks (MetaNet), that learns a… ▽ More Neural networks have been successfully applied in applications with a large amount of labeled data. However, the task of rapid generalization on new concepts with small training data while preserving performances on previously learned ones still presents a significant challenge to neural network models. In this work, we introduce a novel meta learning method, Meta Networks (MetaNet), that learns a meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization. When evaluated on Omniglot and Mini-ImageNet benchmarks, our MetaNet models achieve a near human-level performance and outperform the baseline approaches by up to 6% accuracy. We demonstrate several appealing properties of MetaNet relating to generalization and continual learning. △ Less

Submitted 8 June, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

Comments: Accepted at ICML 2017 - rewrote: the main section; added: MetaNet algorithmic procedure; performed: Mini-ImageNet evaluation

arXiv:1610.06454 [pdf, other]

Reasoning with Memory Augmented Neural Networks for Language Comprehension

Authors: Tsendsuren Munkhdalai, Hong Yu

Abstract: Hypothesis testing is an important cognitive process that supports human reasoning. In this paper, we introduce a computational hypothesis testing approach based on memory augmented neural networks. Our approach involves a hypothesis testing loop that reconsiders and progressively refines a previously formed hypothesis in order to generate new hypotheses to test. We apply the proposed approach to… ▽ More Hypothesis testing is an important cognitive process that supports human reasoning. In this paper, we introduce a computational hypothesis testing approach based on memory augmented neural networks. Our approach involves a hypothesis testing loop that reconsiders and progressively refines a previously formed hypothesis in order to generate new hypotheses to test. We apply the proposed approach to language comprehension task by using Neural Semantic Encoders (NSE). Our NSE models achieve the state-of-the-art results showing an absolute improvement of 1.2% to 2.6% accuracy over previous results obtained by single and ensemble systems on standard machine comprehension benchmarks such as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets. △ Less

Submitted 28 February, 2017; v1 submitted 20 October, 2016; originally announced October 2016.

Comments: Accepted at ICLR 2017

arXiv:1607.04492 [pdf, other]

Neural Tree Indexers for Text Understanding

Authors: Tsendsuren Munkhdalai, Hong Yu

Abstract: Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic tree. In this paper, we introduce a… ▽ More Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic tree. In this paper, we introduce a robust syntactic parsing-independent tree structured model, Neural Tree Indexers (NTI) that provides a middle ground between the sequential RNNs and the syntactic treebased recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. Attention mechanism can then be applied to both structure and node function. We implemented and evaluated a binarytree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks. △ Less

Submitted 28 February, 2017; v1 submitted 15 July, 2016; originally announced July 2016.

Comments: Accepted at EACL 2017

arXiv:1607.04315 [pdf, other]

Neural Semantic Encoders

Authors: Tsendsuren Munkhdalai, Hong Yu

Abstract: We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable sized encoding memory that evolves over time and maintains the understanding of input sequences through read}, compose and write operations. NSE can also access multiple and shared memories. In this paper, we demonstrated the… ▽ More We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable sized encoding memory that evolves over time and maintains the understanding of input sequences through read}, compose and write operations. NSE can also access multiple and shared memories. In this paper, we demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation where NSE achieved state-of-the-art performance when evaluated on publically available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU. △ Less

Submitted 5 January, 2017; v1 submitted 14 July, 2016; originally announced July 2016.

Comments: Accepted in EACL 2017, added: comparison with NTM, qualitative analysis and memory visualization

Showing 1–9 of 9 results for author: Munkhdalai, T