Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

Liu, Bing; Yu, Tong; Lane, Ian; Mengshoel, Ole J.

arXiv:1711.08493 (cs)

[Submitted on 22 Nov 2017]

Abstract: Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model with a nonlinear reward function that uses distributed representations of text for online response selection. A bidirectional LSTM produces the distributed representations of the dialog context and the candidate responses, which serve as the input to a contextual bandit. To learn the bandit, we propose a customized Thompson sampling method that approximates the reward over a polynomial feature space. Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model, measured by the Recall@k metric, when only a small set of online training samples is available.
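The abstract describes Thompson sampling over a polynomial feature space of context and response representations. The block below is a minimal sketch of that general idea only, not the authors' customized method: it assumes a degree-2 polynomial expansion, a Gaussian posterior from Bayesian linear regression, and plain NumPy vectors standing in for the bidirectional LSTM encodings. The names `poly_features`, `PolyThompsonBandit`, `prior_precision`, and `noise_scale` are hypothetical and do not come from the paper.

```python
import numpy as np


def poly_features(context, response):
    """Degree-2 polynomial features of the concatenated vectors (an assumption)."""
    z = np.concatenate([context, response])
    iu = np.triu_indices(z.size)          # index pairs (i, j) with i <= j
    quad = np.outer(z, z)[iu]             # squares and pairwise products
    return np.concatenate([[1.0], z, quad])


class PolyThompsonBandit:
    """Thompson sampling with a Gaussian posterior over polynomial-feature weights."""

    def __init__(self, dim, prior_precision=1.0, noise_scale=0.5, seed=0):
        self.A = prior_precision * np.eye(dim)   # posterior precision matrix
        self.b = np.zeros(dim)                   # sum of reward-weighted features
        self.noise_scale = noise_scale           # hypothetical exploration scale
        self.rng = np.random.default_rng(seed)

    def select(self, context, candidates):
        """Sample one weight vector and return the index of the best-scoring response."""
        cov = np.linalg.inv(self.A)
        cov = (cov + cov.T) / 2.0                # keep the covariance symmetric
        mean = cov @ self.b
        theta = self.rng.multivariate_normal(mean, self.noise_scale ** 2 * cov)
        scores = [poly_features(context, r) @ theta for r in candidates]
        return int(np.argmax(scores))

    def update(self, context, response, reward):
        """Fold one observed (context, response, reward) triple into the posterior."""
        x = poly_features(context, response)
        self.A += np.outer(x, x)
        self.b += reward * x


if __name__ == "__main__":
    # Toy stand-ins for encoded text; a real system would use learned encodings.
    rng = np.random.default_rng(1)
    context = rng.normal(size=2)
    candidates = [rng.normal(size=2) for _ in range(3)]
    dim = poly_features(context, candidates[0]).size
    bandit = PolyThompsonBandit(dim)
    for _ in range(20):
        choice = bandit.select(context, candidates)
        reward = 1.0 if choice == 0 else 0.0     # simulated feedback: response 0 is correct
        bandit.update(context, candidates[choice], reward)
    print("final choice:", bandit.select(context, candidates))
```

In this sketch the only nonlinearity is the fixed polynomial expansion, so the posterior update stays in closed form. The feature dimension grows quadratically with the encoding size, so a real system would likely need lower-dimensional encodings or a sparser expansion.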

Comments:	Accepted at AAAI 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1711.08493 [cs.CL]
	(or arXiv:1711.08493v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1711.08493

Submission history

From: Bing Liu
[v1] Wed, 22 Nov 2017 20:15:01 UTC (484 KB)
