A Survey of Spanish Clinical Language Models
Authors:
Guillem García Subies,
Álvaro Barbero Jiménez,
Paloma Martínez Fernández
Abstract:
This survey focuses on encoder Language Models for solving tasks in the clinical domain in the Spanish language. We review the contributions of 17 corpora focused mainly on clinical tasks, then list the most relevant Spanish Language Models and Spanish Clinical Language Models. We perform a thorough comparison of these models by benchmarking them over a curated subset of the available corpora, in order to find the best-performing ones; in total, more than 3000 models were fine-tuned for this study. All the tested corpora and the best models are made publicly available in an accessible way, so that the results can be reproduced by independent teams or challenged in the future when new Spanish Clinical Language Models are created.
Submitted 4 August, 2023;
originally announced August 2023.
RigoBERTa: A State-of-the-Art Language Model For Spanish
Authors:
Alejandro Vaca Serrano,
Guillem Garcia Subies,
Helena Montoro Zamorano,
Nuria Aldama Garcia,
Doaa Samy,
David Betancur Sanchez,
Antonio Moreno Sandoval,
Marta Guerrero Nieto,
Alvaro Barbero Jimenez
Abstract:
This paper presents RigoBERTa, a State-of-the-Art Language Model for Spanish. RigoBERTa is trained over a well-curated corpus formed from different subcorpora with key features. It follows the DeBERTa architecture, which has several advantages over other architectures of similar size, such as BERT or RoBERTa. RigoBERTa's performance is assessed over 13 NLU tasks in comparison with other available Spanish language models, namely MarIA, BERTIN and BETO. RigoBERTa outperformed the three models in 10 out of the 13 tasks, achieving new "State-of-the-Art" results.
Submitted 3 June, 2022; v1 submitted 27 April, 2022;
originally announced May 2022.