-
Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
Authors:
Anastasios Nentidis,
Georgios Katsimpras,
Anastasia Krithara,
Salvador Lima López,
Eulália Farré-Maduell,
Luis Gasco,
Martin Krallinger,
Georgios Paliouras
Abstract:
This is an overview of the eleventh edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ is a series of international challenges promoting advances in large-scale biomedical semantic indexing and question answering. This year, BioASQ consisted of new editions of the two established tasks b and Synergy, and a new task (MedProcNER) on…
▽ More
This is an overview of the eleventh edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ is a series of international challenges promoting advances in large-scale biomedical semantic indexing and question answering. This year, BioASQ consisted of new editions of the two established tasks b and Synergy, and a new task (MedProcNER) on semantic annotation of clinical content in Spanish with medical procedures, which have a critical role in medical practice. In this edition of BioASQ, 28 competing teams submitted the results of more than 150 distinct systems in total for the three different shared tasks of the challenge. Similarly to previous editions, most of the participating systems achieved competitive performance, suggesting the continuous advancement of the state-of-the-art in the field.
△ Less
Submitted 11 July, 2023;
originally announced July 2023.
-
Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature
Authors:
Anastasios Nentidis,
Thomas Chatzopoulos,
Anastasia Krithara,
Grigorios Tsoumakas,
Georgios Paliouras
Abstract:
Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on conc…
▽ More
Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, which is also enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices to tackle the particular challenges of this task. The new method is evaluated on a large-scale retrospective scenario, based on concepts that have been promoted to descriptors. Results: In our experiments concept occurrence was the strongest heuristic achieving a macro-F1 score of about 0.63 across several labels. The proposed method improved it further by more than 4pp. Conclusion: The results suggest that concept occurrence is a strong heuristic for refining the coarse-grained labels at the level of MeSH concepts and the proposed method improves it further.
△ Less
Submitted 5 October, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Overview of BioASQ 2022: The tenth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
Authors:
Anastasios Nentidis,
Georgios Katsimpras,
Eirini Vandorou,
Anastasia Krithara,
Antonio Miranda-Escalada,
Luis Gasco,
Martin Krallinger,
Georgios Paliouras
Abstract:
This paper presents an overview of the tenth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2022. BioASQ is an ongoing series of challenges that promotes advances in the domain of large-scale biomedical semantic indexing and question answering. In this edition, the challenge was composed of the three established tasks a, b, and Synergy, and…
▽ More
This paper presents an overview of the tenth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2022. BioASQ is an ongoing series of challenges that promotes advances in the domain of large-scale biomedical semantic indexing and question answering. In this edition, the challenge was composed of the three established tasks a, b, and Synergy, and a new task named DisTEMIST for automatic semantic annotation and grounding of diseases from clinical content in Spanish, a key concept for semantic indexing and search engines of literature and clinical records. This year, BioASQ received more than 170 distinct systems from 38 teams in total for the four different tasks of the challenge. As in previous years, the majority of the competing systems outperformed the strong baselines, indicating the continuous advancement of the state-of-the-art in this domain.
△ Less
Submitted 13 October, 2022;
originally announced October 2022.
-
Overview of BioASQ 2021: The ninth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
Authors:
Anastasios Nentidis,
Georgios Katsimpras,
Eirini Vandorou,
Anastasia Krithara,
Luis Gasco,
Martin Krallinger,
Georgios Paliouras
Abstract:
Advancing the state-of-the-art in large-scale biomedical semantic indexing and question answering is the main focus of the BioASQ challenge. BioASQ organizes respective tasks where different teams develop systems that are evaluated on the same benchmark datasets that represent the real information needs of experts in the biomedical domain. This paper presents an overview of the ninth edition of th…
▽ More
Advancing the state-of-the-art in large-scale biomedical semantic indexing and question answering is the main focus of the BioASQ challenge. BioASQ organizes respective tasks where different teams develop systems that are evaluated on the same benchmark datasets that represent the real information needs of experts in the biomedical domain. This paper presents an overview of the ninth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2021. In this year, a new question answering task, named Synergy, is introduced to support researchers studying the COVID-19 disease and measure the ability of the participating teams to discern information while the problem is still develo**. In total, 42 teams with more than 170 systems were registered to participate in the four tasks of the challenge. The evaluation results, similarly to previous years, show a performance gain against the baselines which indicates the continuous improvement of the state-of-the-art in this field.
△ Less
Submitted 28 June, 2021;
originally announced June 2021.
-
Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
Authors:
Anastasios Nentidis,
Anastasia Krithara,
Konstantinos Bougiatiotis,
Martin Krallinger,
Carlos Rodriguez-Penagos,
Marta Villegas,
Georgios Paliouras
Abstract:
In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks are organized yearly since 2012, where different te…
▽ More
In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks are organized yearly since 2012, where different teams develop systems that compete on the same demanding benchmark datasets that represent the real information needs of experts in the biomedical domain. This year, the challenge has been extended with the introduction of a new task on medical semantic indexing in Spanish. In total, 34 teams with more than 100 systems participated in the three tasks of the challenge. As in previous years, the results of the evaluation reveal that the top-performing systems managed to outperform the strong baselines, which suggests that state-of-the-art systems keep pushing the frontier of research through continuous improvements.
△ Less
Submitted 28 June, 2021;
originally announced June 2021.
-
Harvesting the Public MeSH Note field
Authors:
Anastasios Nentidis,
Anastasia Krithara,
Grigorios Tsoumakas,
Georgios Paliouras
Abstract:
In this document, we report an analysis of the Public MeSH Note field of the new descriptors introduced in the MeSH thesaurus between 2006 and 2020. The aim of this analysis was to extract information about the previous status of these new descriptors as Supplementary Concept Records. The Public MeSH Note field contains information in semi-structured text, meant to be read by humans. Therefore, we…
▽ More
In this document, we report an analysis of the Public MeSH Note field of the new descriptors introduced in the MeSH thesaurus between 2006 and 2020. The aim of this analysis was to extract information about the previous status of these new descriptors as Supplementary Concept Records. The Public MeSH Note field contains information in semi-structured text, meant to be read by humans. Therefore, we adopted a semi-automated approach, based on regular expressions, to extract information from it. In the large majority of cases, we managed to minimize the required manual effort for extracting the previous state of a new descriptor as a Supplementary Concept Record. The source code for this analysis is openly available on GitHub.
△ Less
Submitted 1 June, 2021;
originally announced June 2021.
-
What is all this new MeSH about? Exploring the semantic provenance of new descriptors in the MeSH thesaurus
Authors:
Anastasios Nentidis,
Anastasia Krithara,
Grigorios Tsoumakas,
Georgios Paliouras
Abstract:
The Medical Subject Headings (MeSH) thesaurus is a controlled vocabulary widely used in biomedical knowledge systems, particularly for semantic indexing of scientific literature. As the MeSH hierarchy evolves through annual version updates, some new descriptors are introduced that were not previously available. This paper explores the conceptual provenance of these new descriptors. In particular,…
▽ More
The Medical Subject Headings (MeSH) thesaurus is a controlled vocabulary widely used in biomedical knowledge systems, particularly for semantic indexing of scientific literature. As the MeSH hierarchy evolves through annual version updates, some new descriptors are introduced that were not previously available. This paper explores the conceptual provenance of these new descriptors. In particular, we investigate whether such new descriptors have been previously covered by older descriptors and what is their current relation to them. To this end, we propose a framework to categorize new descriptors based on their current relation to older descriptors. Based on the proposed classification scheme, we quantify, analyse and present the different types of new descriptors introduced in MeSH during the last fifteen years. The results show that only about 25% of new MeSH descriptors correspond to new emerging concepts, whereas the rest were previously covered by one or more existing descriptors, either implicitly or explicitly. Most of them were covered by a single existing descriptor and they usually end up as descendants of it in the current hierarchy, gradually leading towards a more fine-grained MeSH vocabulary. These insights about the dynamics of the thesaurus are useful for the retrospective study of scientific articles annotated with MeSH, but could also be used to inform the policy of updating the thesaurus in the future.
△ Less
Submitted 27 July, 2021; v1 submitted 20 January, 2021;
originally announced January 2021.
-
Results of the seventh edition of the BioASQ Challenge
Authors:
Anastasios Nentidis,
Konstantinos Bougiatiotis,
Anastasia Krithara,
Georgios Paliouras
Abstract:
The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is the promotion of systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the…
▽ More
The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is the promotion of systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research.
△ Less
Submitted 16 June, 2020;
originally announced June 2020.
-
Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision
Authors:
Anastasios Nentidis,
Anastasia Krithara,
Grigorios Tsoumakas,
Georgios Paliouras
Abstract:
In this work, we propose a method for the automated refinement of subject annotations in biomedical literature at the level of concepts. Semantic indexing and search of biomedical articles in MEDLINE/PubMed are based on semantic subject annotations with MeSH descriptors that may correspond to several related but distinct biomedical concepts. Such semantic annotations do not adhere to the level of…
▽ More
In this work, we propose a method for the automated refinement of subject annotations in biomedical literature at the level of concepts. Semantic indexing and search of biomedical articles in MEDLINE/PubMed are based on semantic subject annotations with MeSH descriptors that may correspond to several related but distinct biomedical concepts. Such semantic annotations do not adhere to the level of detail available in the domain knowledge and may not be sufficient to fulfil the information needs of experts in the domain. To this end, we propose a new method that uses weak supervision to train a concept annotator on the literature available for a particular disease. We test this method on the MeSH descriptors for two diseases: Alzheimer's Disease and Duchenne Muscular Dystrophy. The results indicate that concept-occurrence is a strong heuristic for automated subject annotation refinement and its use as weak supervision can lead to improved concept-level annotations. The fine-grained semantic annotations can enable more precise literature retrieval, sustain the semantic integration of subject annotations with other domain resources and ease the maintenance of consistent subject annotations, as new more detailed entries are added in the MeSH thesaurus over time.
△ Less
Submitted 18 May, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Guiding Graph Embeddings using Path-Ranking Methods for Error Detection innoisy Knowledge Graphs
Authors:
K. Bougiatiotis,
R. Fasoulis,
F. Aisopos,
A. Nentidis,
G. Paliouras
Abstract:
Nowadays Knowledge Graphs constitute a mainstream approach for the representation of relational information on big heterogeneous data, however, they may contain a big amount of imputed noise when constructed automatically. To address this problem, different error detection methodologies have been proposed, mainly focusing on path ranking and representation learning. This work presents various main…
▽ More
Nowadays Knowledge Graphs constitute a mainstream approach for the representation of relational information on big heterogeneous data, however, they may contain a big amount of imputed noise when constructed automatically. To address this problem, different error detection methodologies have been proposed, mainly focusing on path ranking and representation learning. This work presents various mainstream approaches and proposes a hybrid and modular methodology for the task. We compare different methods on two benchmarks and one real-world biomedical publications dataset, showcasing the potential of our approach and providing insights on graph embeddings when dealing with noisy Knowledge Graphs.
△ Less
Submitted 12 December, 2020; v1 submitted 19 February, 2020;
originally announced February 2020.
-
iASiS Open Data Graph: Automated Semantic Integration of Disease-Specific Knowledge
Authors:
Anastasios Nentidis,
Konstantinos Bougiatiotis,
Anastasia Krithara,
Georgios Paliouras
Abstract:
In biomedical research, unified access to up-to-date domain-specific knowledge is crucial, as such knowledge is continuously accumulated in scientific literature and structured resources. Identifying and extracting specific information is a challenging task and computational analysis of knowledge bases can be valuable in this direction. However, for disease-specific analyses researchers often need…
▽ More
In biomedical research, unified access to up-to-date domain-specific knowledge is crucial, as such knowledge is continuously accumulated in scientific literature and structured resources. Identifying and extracting specific information is a challenging task and computational analysis of knowledge bases can be valuable in this direction. However, for disease-specific analyses researchers often need to compile their own datasets, integrating knowledge from different resources, or reuse existing datasets, that can be out-of-date. In this study, we propose a framework to automatically retrieve and integrate disease-specific knowledge into an up-to-date semantic graph, the iASiS Open Data Graph. This disease-specific semantic graph provides access to knowledge relevant to specific concepts and their individual aspects, in the form of concept relations and attributes. The proposed approach is implemented as an open-source framework and applied to three diseases (Lung Cancer, Dementia, and Duchenne Muscular Dystrophy). Exemplary queries are presented, investigating the potential of this automatically generated semantic graph as a basis for retrieval and analysis of disease-specific knowledge.
△ Less
Submitted 2 June, 2020; v1 submitted 18 December, 2019;
originally announced December 2019.