Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Gan, Zhe; Li, Chunyuan; Chen, Changyou; Pu, Yunchen; Su, Qinliang; Carin, Lawrence

arXiv:1611.08034 (cs)

[Submitted on 23 Nov 2016 (v1), last revised 24 Apr 2017 (this version, v2)]

Abstract: Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach over stochastic optimization.
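The two ingredients named in the abstract, gradient noise during training and model averaging at test time, can be illustrated with stochastic gradient Langevin dynamics (SGLD), one well-known stochastic gradient MCMC method. The sketch below is not the paper's algorithm or its RNN setup. It assumes a toy model (the unknown mean of a unit-variance Gaussian with a Gaussian prior), and every function name, parameter, and constant in it is hypothetical and chosen only for illustration.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=5000, burn_in=2000, batch_size=100,
                       step_size=1e-5, prior_std=10.0, seed=0):
    """Draw SGLD samples of theta for x_i ~ N(theta, 1), theta ~ N(0, prior_std^2).

    Toy stand-in for a network's weights; all defaults are illustrative.
    """
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for step in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Gradient of the log prior plus the minibatch log-likelihood
        # gradient, rescaled to estimate the full-data gradient.
        grad_prior = -theta / prior_std ** 2
        grad_lik = (n_total / batch_size) * sum(x - theta for x in batch)
        # Langevin step: half a step along the gradient plus injected
        # Gaussian noise with variance equal to the step size.
        noise = rng.gauss(0.0, math.sqrt(step_size))
        theta += 0.5 * step_size * (grad_prior + grad_lik) + noise
        if step >= burn_in:
            samples.append(theta)
    return samples


def predictive_density(x_new, samples):
    """Model averaging: mean of N(x_new; theta_s, 1) over collected samples."""
    dens = [math.exp(-0.5 * (x_new - t) ** 2) / math.sqrt(2.0 * math.pi)
            for t in samples]
    return sum(dens) / len(dens)


# Synthetic data with true mean 2.0 (an assumption of this toy example).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]

samples = sgld_gaussian_mean(data)
posterior_mean = sum(samples) / len(samples)
# Closed-form posterior mean for this conjugate model, for comparison.
exact_mean = sum(data) / (len(data) + 1.0 / 10.0 ** 2)
```

In this toy setting the average of the collected samples should land close to the closed-form posterior mean, and `predictive_density` averages the predictions of many sampled parameter values rather than relying on a single point estimate, which is the role model averaging plays at test time in the abstract's description.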

Comments:	Accepted to ACL 2017
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1611.08034 [cs.CL]
	(or arXiv:1611.08034v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1611.08034

Submission history

From: Zhe Gan
[v1] Wed, 23 Nov 2016 23:40:50 UTC (1,012 KB)
[v2] Mon, 24 Apr 2017 15:32:49 UTC (663 KB)
