Cross-lingual Candidate Search for Biomedical Concept Normalization

Roller, Roland; Kittner, Madeleine; Weissenborn, Dirk; Leser, Ulf

Computer Science > Computation and Language

arXiv:1805.01646 (cs)

[Submitted on 4 May 2018]

Title: Cross-lingual Candidate Search for Biomedical Concept Normalization

Authors: Roland Roller, Madeleine Kittner, Dirk Weissenborn, Ulf Leser

Abstract: Biomedical concept normalization links concept mentions in texts to a semantically equivalent concept in a biomedical knowledge base. This task is challenging because a concept can be expressed in many different ways in natural language, e.g., through paraphrases, and not all of these expressions are necessarily present in the knowledge base. Concept normalization of non-English biomedical text is even more challenging, as non-English resources tend to be much smaller and contain fewer synonyms. To overcome the limitations of non-English terminologies, we propose a cross-lingual candidate search for concept normalization that uses a character-based neural translation model trained on a multilingual biomedical terminology. Our model is trained on the Spanish, French, Dutch and German versions of UMLS. We evaluate it on the French Quaero corpus, where it outperforms most teams of CLEF eHealth 2015 and 2016. Additionally, we compare its performance with that of commercial translators on the Spanish, French, Dutch and German versions of Mantra. Our model performs comparably, but it is free of charge and can be run locally. This is particularly important for clinical NLP applications, as medical documents are subject to strict privacy restrictions.
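The abstract describes a two-step idea: translate a non-English mention into English, then search the larger English terminology for candidate concepts. The Python sketch below illustrates only the shape of that pipeline, under loudly stated assumptions. The character-based neural translation model is replaced by a hypothetical dictionary stub (`toy_translate`), the English terminology is a tiny invented table whose concept IDs (`CUI_A`, `CUI_B`, `CUI_C`) are placeholders and not real UMLS CUIs, and candidates are ranked by character-trigram Jaccard similarity. That ranking is a simple stand-in chosen for illustration and is not necessarily the method used in the paper.

```python
from collections.abc import Callable

# HYPOTHETICAL toy English terminology: concept ID -> English synonyms.
# The IDs are placeholders, not real UMLS CUIs.
ENGLISH_TERMS: dict[str, list[str]] = {
    "CUI_A": ["myocardial infarction", "heart attack"],
    "CUI_B": ["hypertension", "high blood pressure"],
    "CUI_C": ["diabetes mellitus", "diabetes"],
}

# HYPOTHETICAL stand-in for the character-based neural translation model:
# a fixed French -> English lookup table used only for illustration.
TOY_TRANSLATIONS: dict[str, str] = {
    "infarctus du myocarde": "myocardial infarction",
    "hypertension artérielle": "arterial hypertension",
}


def toy_translate(mention: str) -> str:
    """Translate a mention to English; unknown mentions pass through unchanged."""
    return TOY_TRANSLATIONS.get(mention.lower(), mention)


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams of the lowercased text, padded with '#' at both ends."""
    padded = f"#{text.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two n-gram sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_search(
    mention: str,
    translate: Callable[[str], str],
    terminology: dict[str, list[str]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Translate a mention, then return the top-k concept IDs by best synonym similarity."""
    query = char_ngrams(translate(mention))
    scored = []
    for concept_id, synonyms in terminology.items():
        # A concept's score is the similarity of its closest English synonym.
        best = max((jaccard(query, char_ngrams(s)) for s in synonyms), default=0.0)
        scored.append((concept_id, best))
    # Highest score first; ties broken by concept ID for deterministic output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [(cid, round(score, 3)) for cid, score in scored[:k]]


if __name__ == "__main__":
    print(candidate_search("infarctus du myocarde", toy_translate, ENGLISH_TERMS, k=1))
    # → [('CUI_A', 1.0)]
    print(candidate_search("Hypertension artérielle", toy_translate, ENGLISH_TERMS))
```

Because the search only needs some function from a mention to an English string, the stub could in principle be swapped for a locally run translation model without changing `candidate_search`. The fuzzy character-level ranking also tolerates translations that are close but not exact: in this toy data, "arterial hypertension" still ranks the concept whose synonym is "hypertension" first.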

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1805.01646 [cs.CL]
	(or arXiv:1805.01646v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1805.01646

Submission history

From: Roland Roller
[v1] Fri, 4 May 2018 08:11:09 UTC (141 KB)