Showing 1–5 of 5 results for author: Lavergne, T

  1. arXiv:2403.18336  [pdf, other]

    cs.CL cs.LG

    A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages

    Authors: Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Philippe Thomas, Tomohiro Nishiyama, Sebastian Möller, Eiji Aramaki, Yuji Matsumoto, Roland Roller, Pierre Zweigenbaum

    Abstract: User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social m…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  2. arXiv:2204.10360  [pdf, other]

    cs.CL

    Decorate the Examples: A Simple Method of Prompt Design for Biomedical Relation Extraction

    Authors: Hui-Syuan Yeh, Thomas Lavergne, Pierre Zweigenbaum

    Abstract: Relation extraction is a core problem for natural language processing in the biomedical domain. Recent research on relation extraction showed that prompt-based learning improves the performance on both fine-tuning on full training set and few-shot training. However, less effort has been made on domain-specific tasks where good prompt design can be even harder. In this paper, we investigate prompti…

    Submitted 21 April, 2022; originally announced April 2022.

  3. arXiv:2010.10392  [pdf, other]

    cs.CL

    CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

    Authors: Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum, Junichi Tsujii

    Abstract: Due to the compelling improvements brought by BERT, many recent representation models adopted the Transformer architecture as their main building block, consequently inheriting the wordpiece tokenization system despite it not being intrinsically linked to the notion of Transformers. While this system is thought to achieve a good balance between the flexibility of characters and the efficiency of f…

    Submitted 31 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: 13 pages, 8 figures and 3 tables. Accepted at COLING 2020

  4. arXiv:1905.13354  [pdf, other]

    cs.CL

    DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation

    Authors: Rachel Bawden, Sophie Rosset, Thomas Lavergne, Eric Bilinski

    Abstract: We present a new English-French test set for the evaluation of Machine Translation (MT) for informal, written bilingual dialogue. The test set contains 144 spontaneous dialogues (5,700+ sentences) between native English and French speakers, mediated by one of two neural MT systems in a range of role-play settings. The dialogues are accompanied by fine-grained sentence-level judgments of MT quality…

    Submitted 30 May, 2019; originally announced May 2019.

  5. Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling

    Authors: Nataliya Sokolovska, Thomas Lavergne, Olivier Cappé, François Yvon

    Abstract: Conditional Random Fields (CRFs) constitute a popular and efficient approach for supervised sequence labelling. CRFs can cope with large description spaces and can integrate some form of structural dependency between labels. In this contribution, we address the issue of efficient feature selection for CRFs based on imposing sparsity through an L1 penalty. We first show how sparsity of the parame…

    Submitted 3 January, 2010; v1 submitted 7 September, 2009; originally announced September 2009.