
Showing 1–1 of 1 results for author: Andres, S

Searching in archive cs.
  1. arXiv:2406.14312 [pdf, other]

    cs.CL cs.AI

    Infusing clinical knowledge into tokenisers for language models

    Authors: Abul Hasan, **ge Wu, Quang Ngoc Nguyen, Salomé Andres, Imane Guellil, Huayu Zhang, Arlene Casey, Beatrice Alex, Bruce Guthrie, Honghan Wu

    Abstract: This study introduces a novel knowledge-enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at the initialisation stage, K-Tokeniser populates global representations of tokens based on the semantic types of domain concepts (such as drugs or diseases), drawn either from a domain ontology such as the Unified Medical Language System (UMLS) or from the training data of the task-related corpus. At t…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures
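The abstract describes the initialisation step only at a high level. The sketch below is an illustrative assumption, not the authors' K-Tokeniser: it treats a token's "global representation" as a normalised distribution over the semantic types of the concepts whose names contain that token. The concept list, the whitespace `tokenise` helper, and `build_global_token_representations` are all hypothetical names introduced for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy concept list: (concept surface form, semantic type).
# The abstract says such concepts come from an ontology such as UMLS or from
# task training data; these entries are made up for illustration only.
CONCEPTS = [
    ("metformin", "Drug"),
    ("insulin glargine", "Drug"),
    ("type 2 diabetes", "Disease"),
    ("diabetes insipidus", "Disease"),
    ("insulin resistance", "Disease"),
]


def tokenise(text: str) -> list[str]:
    """Toy lowercase whitespace tokeniser (assumption; real systems use subwords)."""
    return text.lower().split()


def build_global_token_representations(concepts):
    """Map each token to a normalised distribution over semantic types.

    Hypothetical scheme: a token's weight for a type is the fraction of its
    occurrences (across all concept names) that belong to concepts of that type.
    """
    counts = defaultdict(Counter)
    for surface, sem_type in concepts:
        for tok in tokenise(surface):
            counts[tok][sem_type] += 1

    reps = {}
    for tok, type_counts in counts.items():
        total = sum(type_counts.values())
        # Sort by type name so the output order is deterministic.
        reps[tok] = {t: n / total for t, n in sorted(type_counts.items())}
    return reps


if __name__ == "__main__":
    reps = build_global_token_representations(CONCEPTS)
    print(reps["insulin"])   # {'Disease': 0.5, 'Drug': 0.5}
    print(reps["diabetes"])  # {'Disease': 1.0}
```

Under this toy scheme, a token such as "insulin" that occurs in both a drug name and a disease name is split evenly between the two types, while "diabetes" maps entirely to Disease. How K-Tokeniser actually builds and uses its representations is described in the paper itself, not by this sketch.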