
arXiv:2307.05131 [pdf, other]

Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

Authors: Anastasios Nentidis, Georgios Katsimpras, Anastasia Krithara, Salvador Lima López, Eulália Farré-Maduell, Luis Gasco, Martin Krallinger, Georgios Paliouras

Abstract: This is an overview of the eleventh edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ is a series of international challenges promoting advances in large-scale biomedical semantic indexing and question answering. This year, BioASQ consisted of new editions of the two established tasks b and Synergy, and a new task (MedProcNER) on semantic annotation of clinical content in Spanish with medical procedures, which have a critical role in medical practice. In this edition of BioASQ, 28 competing teams submitted the results of more than 150 distinct systems in total for the three different shared tasks of the challenge. Similarly to previous editions, most of the participating systems achieved competitive performance, suggesting the continuous advancement of the state-of-the-art in the field.

Submitted 11 July, 2023; originally announced July 2023.

Comments: 24 pages, 12 tables, 3 figures. CLEF2023. arXiv admin note: text overlap with arXiv:2210.06852

arXiv:2301.09350 [pdf, other]

doi 10.1016/j.jbi.2023.104499

Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature

Authors: Anastasios Nentidis, Thomas Chatzopoulos, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

Abstract: Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, which is also enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices to tackle the particular challenges of this task. The new method is evaluated on a large-scale retrospective scenario, based on concepts that have been promoted to descriptors. Results: In our experiments concept occurrence was the strongest heuristic achieving a macro-F1 score of about 0.63 across several labels. The proposed method improved it further by more than 4pp. Conclusion: The results suggest that concept occurrence is a strong heuristic for refining the coarse-grained labels at the level of MeSH concepts and the proposed method improves it further.

Submitted 5 October, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: 26 pages, 5 figures, 4 tables. A more concise version

Journal ref: Journal of Biomedical Informatics, Volume 146, 2023, 104499, ISSN 1532-0464

arXiv:2210.06852 [pdf, other]

doi 10.1007/978-3-031-13643-6_22

Overview of BioASQ 2022: The tenth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

Authors: Anastasios Nentidis, Georgios Katsimpras, Eirini Vandorou, Anastasia Krithara, Antonio Miranda-Escalada, Luis Gasco, Martin Krallinger, Georgios Paliouras

Abstract: This paper presents an overview of the tenth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2022. BioASQ is an ongoing series of challenges that promotes advances in the domain of large-scale biomedical semantic indexing and question answering. In this edition, the challenge was composed of the three established tasks a, b, and Synergy, and a new task named DisTEMIST for automatic semantic annotation and grounding of diseases from clinical content in Spanish, a key concept for semantic indexing and search engines of literature and clinical records. This year, BioASQ received more than 170 distinct systems from 38 teams in total for the four different tasks of the challenge. As in previous years, the majority of the competing systems outperformed the strong baselines, indicating the continuous advancement of the state-of-the-art in this domain.

Submitted 13 October, 2022; originally announced October 2022.

Comments: 25 pages, 14 tables, 4 figures. arXiv admin note: substantial text overlap with arXiv:2106.14885

Journal ref: Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2022. Lecture Notes in Computer Science, vol 13390. Springer, Cham

arXiv:2106.14885 [pdf, other]

doi 10.1007/978-3-030-85251-1_18

Overview of BioASQ 2021: The ninth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

Authors: Anastasios Nentidis, Georgios Katsimpras, Eirini Vandorou, Anastasia Krithara, Luis Gasco, Martin Krallinger, Georgios Paliouras

Abstract: Advancing the state-of-the-art in large-scale biomedical semantic indexing and question answering is the main focus of the BioASQ challenge. BioASQ organizes respective tasks where different teams develop systems that are evaluated on the same benchmark datasets that represent the real information needs of experts in the biomedical domain. This paper presents an overview of the ninth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2021. In this year, a new question answering task, named Synergy, is introduced to support researchers studying the COVID-19 disease and measure the ability of the participating teams to discern information while the problem is still developing. In total, 42 teams with more than 170 systems were registered to participate in the four tasks of the challenge. The evaluation results, similarly to previous years, show a performance gain against the baselines which indicates the continuous improvement of the state-of-the-art in this field.

Submitted 28 June, 2021; originally announced June 2021.

Comments: 25 pages, 15 tables, 3 figures. arXiv admin note: text overlap with arXiv:2106.14618

Journal ref: Candan K.S. et al. (eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham

arXiv:2106.14618 [pdf, other]

doi 10.1007/978-3-030-58219-7_16

Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

Authors: Anastasios Nentidis, Anastasia Krithara, Konstantinos Bougiatiotis, Martin Krallinger, Carlos Rodriguez-Penagos, Marta Villegas, Georgios Paliouras

Abstract: In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks are organized yearly since 2012, where different teams develop systems that compete on the same demanding benchmark datasets that represent the real information needs of experts in the biomedical domain. This year, the challenge has been extended with the introduction of a new task on medical semantic indexing in Spanish. In total, 34 teams with more than 100 systems participated in the three tasks of the challenge. As in previous years, the results of the evaluation reveal that the top-performing systems managed to outperform the strong baselines, which suggests that state-of-the-art systems keep pushing the frontier of research through continuous improvements.

Submitted 28 June, 2021; originally announced June 2021.

Comments: 21 pages, 10 tables, 3 figures

Journal ref: Arampatzis A. et al. (eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2020. Lecture Notes in Computer Science, vol 12260. Springer, Cham

arXiv:2106.00302 [pdf, other]

Harvesting the Public MeSH Note field

Authors: Anastasios Nentidis, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

Abstract: In this document, we report an analysis of the Public MeSH Note field of the new descriptors introduced in the MeSH thesaurus between 2006 and 2020. The aim of this analysis was to extract information about the previous status of these new descriptors as Supplementary Concept Records. The Public MeSH Note field contains information in semi-structured text, meant to be read by humans. Therefore, we adopted a semi-automated approach, based on regular expressions, to extract information from it. In the large majority of cases, we managed to minimize the required manual effort for extracting the previous state of a new descriptor as a Supplementary Concept Record. The source code for this analysis is openly available on GitHub.

Submitted 1 June, 2021; originally announced June 2021.

Comments: 3 pages, 1 figure, 1 table. Technical report

arXiv:2101.08293 [pdf, other]

doi 10.1007/s00799-021-00304-z

What is all this new MeSH about? Exploring the semantic provenance of new descriptors in the MeSH thesaurus

Authors: Anastasios Nentidis, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

Abstract: The Medical Subject Headings (MeSH) thesaurus is a controlled vocabulary widely used in biomedical knowledge systems, particularly for semantic indexing of scientific literature. As the MeSH hierarchy evolves through annual version updates, some new descriptors are introduced that were not previously available. This paper explores the conceptual provenance of these new descriptors. In particular, we investigate whether such new descriptors have been previously covered by older descriptors and what is their current relation to them. To this end, we propose a framework to categorize new descriptors based on their current relation to older descriptors. Based on the proposed classification scheme, we quantify, analyse and present the different types of new descriptors introduced in MeSH during the last fifteen years. The results show that only about 25% of new MeSH descriptors correspond to new emerging concepts, whereas the rest were previously covered by one or more existing descriptors, either implicitly or explicitly. Most of them were covered by a single existing descriptor and they usually end up as descendants of it in the current hierarchy, gradually leading towards a more fine-grained MeSH vocabulary. These insights about the dynamics of the thesaurus are useful for the retrospective study of scientific articles annotated with MeSH, but could also be used to inform the policy of updating the thesaurus in the future.

Submitted 27 July, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

Comments: 18 pages, 14 figures, 2 tables

arXiv:2006.09174 [pdf, other]

doi 10.1007/978-3-030-43887-6_51

Results of the seventh edition of the BioASQ Challenge

Authors: Anastasios Nentidis, Konstantinos Bougiatiotis, Anastasia Krithara, Georgios Paliouras

Abstract: The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is the promotion of systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research.

Submitted 16 June, 2020; originally announced June 2020.

Comments: 17 pages, 2 figures

Journal ref: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham

arXiv:2005.07638 [pdf, other]

doi 10.1016/j.ipm.2020.102282

Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision

Authors: Anastasios Nentidis, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

Abstract: In this work, we propose a method for the automated refinement of subject annotations in biomedical literature at the level of concepts. Semantic indexing and search of biomedical articles in MEDLINE/PubMed are based on semantic subject annotations with MeSH descriptors that may correspond to several related but distinct biomedical concepts. Such semantic annotations do not adhere to the level of detail available in the domain knowledge and may not be sufficient to fulfil the information needs of experts in the domain. To this end, we propose a new method that uses weak supervision to train a concept annotator on the literature available for a particular disease. We test this method on the MeSH descriptors for two diseases: Alzheimer's Disease and Duchenne Muscular Dystrophy. The results indicate that concept-occurrence is a strong heuristic for automated subject annotation refinement and its use as weak supervision can lead to improved concept-level annotations. The fine-grained semantic annotations can enable more precise literature retrieval, sustain the semantic integration of subject annotations with other domain resources and ease the maintenance of consistent subject annotations, as new more detailed entries are added in the MeSH thesaurus over time.

Submitted 18 May, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

Comments: 36 pages, 8 figures; Dictionary-based baselines added and conclusions updated

Journal ref: Information Processing and Management 57 (2020) 102282

arXiv:2002.08762 [pdf, other]

Guiding Graph Embeddings using Path-Ranking Methods for Error Detection in noisy Knowledge Graphs

Authors: K. Bougiatiotis, R. Fasoulis, F. Aisopos, A. Nentidis, G. Paliouras

Abstract: Nowadays Knowledge Graphs constitute a mainstream approach for the representation of relational information on big heterogeneous data, however, they may contain a big amount of imputed noise when constructed automatically. To address this problem, different error detection methodologies have been proposed, mainly focusing on path ranking and representation learning. This work presents various mainstream approaches and proposes a hybrid and modular methodology for the task. We compare different methods on two benchmarks and one real-world biomedical publications dataset, showcasing the potential of our approach and providing insights on graph embeddings when dealing with noisy Knowledge Graphs.

Submitted 12 December, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: 9 pages, 2 figures. To appear in GCLR 2021: AAAI 2021 Workshop on Graphs and more Complex structures for Learning and Reasoning

arXiv:1912.08633 [pdf, other]

iASiS Open Data Graph: Automated Semantic Integration of Disease-Specific Knowledge

Authors: Anastasios Nentidis, Konstantinos Bougiatiotis, Anastasia Krithara, Georgios Paliouras

Abstract: In biomedical research, unified access to up-to-date domain-specific knowledge is crucial, as such knowledge is continuously accumulated in scientific literature and structured resources. Identifying and extracting specific information is a challenging task and computational analysis of knowledge bases can be valuable in this direction. However, for disease-specific analyses researchers often need to compile their own datasets, integrating knowledge from different resources, or reuse existing datasets, that can be out-of-date. In this study, we propose a framework to automatically retrieve and integrate disease-specific knowledge into an up-to-date semantic graph, the iASiS Open Data Graph. This disease-specific semantic graph provides access to knowledge relevant to specific concepts and their individual aspects, in the form of concept relations and attributes. The proposed approach is implemented as an open-source framework and applied to three diseases (Lung Cancer, Dementia, and Duchenne Muscular Dystrophy). Exemplary queries are presented, investigating the potential of this automatically generated semantic graph as a basis for retrieval and analysis of disease-specific knowledge.

Submitted 2 June, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

Comments: 6 pages, 2 figures, accepted in IEEE 33rd International Symposium on Computer Based Medical Systems (CBMS2020)

Showing 1–11 of 11 results for author: Nentidis, A