Showing 1–4 of 4 results for author: Cordier, T

  1. arXiv:2302.11199  [pdf, other]

    cs.CL

    Few-Shot Structured Policy Learning for Multi-Domain and Multi-Task Dialogues

    Authors: Thibault Cordier, Tanguy Urvoy, Fabrice Lefèvre, Lina M. Rojas-Barahona

    Abstract: Reinforcement learning has been widely adopted to model dialogue managers in task-oriented dialogues. However, the user simulator provided by state-of-the-art dialogue frameworks are only rough approximations of human behaviour. The ability to learn from a small number of human interactions is hence crucial, especially on multi-domain and multi-task environments where the action space is large. We…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: 8 pages, at the EACL2023 conference (Findings)

  2. arXiv:2210.05252  [pdf, other]

    cs.CL

    Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues

    Authors: Thibault Cordier, Tanguy Urvoy, Fabrice Lefèvre, Lina M. Rojas-Barahona

    Abstract: Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans. In practice, they may have to handle simultaneously several domains and tasks. The dialogue manager must therefore be able to take into account domain changes and plan over different domains/tasks in order to deal with multidomain dialogues. However, learning with reinforcement in such context becom…

    Submitted 11 October, 2022; originally announced October 2022.

    Journal ref: SIGDIAL 2022

  3. arXiv:2209.05779  [pdf, other]

    cs.LG cs.AI cs.CV

    Test-Time Adaptation with Principal Component Analysis

    Authors: Thomas Cordier, Victor Bouvier, Gilles Hénaff, Céline Hudelot

    Abstract: Machine Learning models are prone to fail when test data are different from training data, a situation often encountered in real applications known as distribution shift. While still valid, the training-time knowledge becomes less effective, requiring a test-time adaptation to maintain high performance. Following approaches that assume batch-norm layer and use their statistics for adaptation, we p…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 7 pages, 2 figures, 2 tables, accepted at Workshop on Trustworthy Artificial Intelligence in conjunction with ECML/PKDD 22

  4. arXiv:2012.04687  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation

    Authors: Thibault Cordier, Tanguy Urvoy, Lina M. Rojas-Barahona, Fabrice Lefèvre

    Abstract: A learning dialogue agent can infer its behaviour from interactions with the users. These interactions can be taken from either human-to-human or human-machine conversations. However, human interactions are scarce and costly, making learning from few interactions essential. One solution to speedup the learning process is to guide the agent's exploration with the help of an expert. We present in th…

    Submitted 25 November, 2020; originally announced December 2020.

    Comments: 8 pages, Accepted at Human in the Loop Dialogue Systems Workshop, NeurIPS 2020