Showing 1–5 of 5 results for author: Budzianowski, P

Searching in archive stat.

  1. arXiv:1805.06966  [pdf, other]

    cs.CL cs.AI stat.ML

    Neural User Simulation for Corpus-based Policy Optimisation for Spoken Dialogue Systems

    Authors: Florian Kreyssig, Inigo Casanueva, Pawel Budzianowski, Milica Gasic

    Abstract: User Simulators are one of the major tools that enable offline training of task-oriented dialogue systems. For this task the Agenda-Based User Simulator (ABUS) is often used. The ABUS is based on hand-crafted rules and its output is in semantic form. Issues arise from both properties such as limited diversity and the inability to interface a text-level belief tracker. This paper introduces the Neu…

    Submitted 17 May, 2018; originally announced May 2018.

    Comments: Accepted to SIGDIAL 2018

  2. arXiv:1802.03753  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces

    Authors: Gellért Weisz, Paweł Budzianowski, Pei-Hao Su, Milica Gašić

    Abstract: In spoken dialogue systems, we aim to deploy artificial intelligence to build automated dialogue agents that can converse with humans. A part of this effort is the policy optimisation task, which attempts to find a policy describing how to respond to humans, in the form of a function taking the current state of the dialogue and returning the response of the system. In this paper, we investigate de…

    Submitted 11 February, 2018; originally announced February 2018.

  3. arXiv:1711.11486  [pdf, other]

    stat.ML cs.CL cs.LG cs.NE

    Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation

    Authors: Christopher Tegho, Paweł Budzianowski, Milica Gašić

    Abstract: In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on epsilon-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches s…

    Submitted 30 November, 2017; originally announced November 2017.

    Comments: Accepted at the Bayesian Deep Learning Workshop, 31st Conference on Neural Information Processing Systems (NIPS 2017)

  4. arXiv:1711.11023  [pdf, other]

    stat.ML cs.CL cs.NE

    A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management

    Authors: Iñigo Casanueva, Paweł Budzianowski, Pei-Hao Su, Nikola Mrkšić, Tsung-Hsien Wen, Stefan Ultes, Lina Rojas-Barahona, Steve Young, Milica Gašić

    Abstract: Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid the significant effort needed to hand-craft the required dialogue flow, the Dialogue Management (DM) module can be cast as a continuous Markov Decision Process (MDP) and trained through Reinforcement Learning (RL). Several RL models have been investigated over recent years. However, the lack of a common benchmarking fram…

    Submitted 6 April, 2018; v1 submitted 29 November, 2017; originally announced November 2017.

    Comments: Accepted at the Deep Reinforcement Learning Symposium, 31st Conference on Neural Information Processing Systems (NIPS 2017). Paper updated with minor changes.

  5. arXiv:1707.06299  [pdf, other]

    cs.CL stat.ML

    Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning

    Authors: Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Lina Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gašić, Steve Young

    Abstract: Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To render this search feasible, we use multi-objective…

    Submitted 19 July, 2017; originally announced July 2017.

    Comments: Accepted at SIGDial 2017