-
Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI)
Authors:
Sabrina Toro,
Anna V Anagnostopoulos,
Sue Bello,
Kai Blumberg,
Rhiannon Cameron,
Leigh Carmody,
Alexander D Diehl,
Damion Dooley,
William Duncan,
Petra Fey,
Pascale Gaudet,
Nomi L Harris,
Marcin Joachimiak,
Leila Kiani,
Tiago Lubiana,
Monica C Munoz-Torres,
Shawn O'Neil,
David Osumi-Sutherland,
Aleix Puig,
Justin P Reese,
Leonore Reiser,
Sofia Robb,
Troy Ruemping,
James Seager,
Eric Sid
, et al. (5 additional authors not shown)
Abstract:
Background: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI), an ontology generation method employing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG). DRAGON-AI can generate textual and logical ontology components, drawing from existing knowledge in multiple ontologies and unstructured text sources.
Results: We assessed the performance of DRAGON-AI on de novo term construction across ten diverse ontologies, making use of extensive manual evaluation of results. Our method has high precision for relationship generation, though slightly lower than that of logic-based reasoning. Our method is also able to generate definitions deemed acceptable by expert evaluators, but these scored worse than human-authored definitions. Notably, evaluators with the highest level of confidence in a domain were better able to discern flaws in AI-generated definitions. We also demonstrated the ability of DRAGON-AI to incorporate natural language instructions in the form of GitHub issues.
Conclusions: These findings suggest DRAGON-AI's potential to substantially aid the manual ontology construction process. However, our results also underscore the importance of having expert curators and ontology editors drive the ontology generation process.
Submitted 12 June, 2024; v1 submitted 17 December, 2023;
originally announced December 2023.
-
KG-Hub -- Building and Exchanging Biological Knowledge Graphs
Authors:
J Harry Caufield,
Tim Putman,
Kevin Schaper,
Deepak R Unni,
Harshad Hegde,
Tiffany J Callahan,
Luca Cappelletti,
Sierra AT Moxon,
Vida Ravanmehr,
Seth Carbon,
Lauren E Chan,
Katherina Cortes,
Kent A Shefchek,
Glass Elsarboukh,
James P Balhoff,
Tommaso Fontana,
Nicolas Matentzoglu,
Richard M Bruskiewich,
Anne E Thessen,
Nomi L Harris,
Monica C Munoz-Torres,
Melissa A Haendel,
Peter N Robinson,
Marcin P Joachimiak,
Christopher J Mungall
, et al. (1 additional author not shown)
Abstract:
Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking. Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of knowledge graphs. Features include a simple, modular extract-transform-load (ETL) pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate knowledge graphs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph machine learning, including node embeddings and training of models for link prediction and node classification.
Submitted 31 January, 2023;
originally announced February 2023.
-
A Simple Standard for Sharing Ontological Mappings (SSSOM)
Authors:
Nicolas Matentzoglu,
James P. Balhoff,
Susan M. Bello,
Chris Bizon,
Matthew Brush,
Tiffany J. Callahan,
Christopher G Chute,
William D. Duncan,
Chris T. Evelo,
Davera Gabriel,
John Graybeal,
Alasdair Gray,
Benjamin M. Gyori,
Melissa Haendel,
Henriette Harmse,
Nomi L. Harris,
Ian Harrow,
Harshad Hegde,
Amelia L. Hoyt,
Charles T. Hoyt,
Dazhi Jiao,
Ernesto Jiménez-Ruiz,
Simon Jupp,
Hyeongsik Kim,
Sebastian Koehler
, et al. (19 additional authors not shown)
Abstract:
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Are they associated in some other way? Such relationships between the mapped terms are often not documented, leading to incorrect assumptions and making them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Also, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones.
The Simple Standard for Sharing Ontological Mappings (SSSOM) addresses these problems by: 1. Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. 2. Defining an easy-to-use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data standards. 3. Implementing open and community-driven collaborative workflows designed to evolve the standard continuously to address changing requirements and mapping practices. 4. Providing reference tools and software libraries for working with the standard.
In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). The SSSOM specification is at http://w3id.org/sssom/spec.
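As a rough illustration of the table-based format described above, the following minimal sketch reads a small SSSOM-style TSV using only the Python standard library and keeps the mappings that are explicit exact matches above a confidence threshold. The column names (subject_id, predicate_id, object_id, mapping_justification, confidence) follow the SSSOM vocabulary; the identifiers (EX:, OTHER:), the confidence values, and the helper functions load_mappings and select are hypothetical and are not part of the SSSOM reference tooling.

```python
import csv

# Hypothetical example data: three mappings in an SSSOM-style TSV.
# The EX: and OTHER: identifiers and the confidence values are made up.
SSSOM_TSV = (
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "EX:0001\tskos:exactMatch\tOTHER:1001\tsemapv:ManualMappingCuration\t0.95\n"
    "EX:0002\tskos:broadMatch\tOTHER:1002\tsemapv:LexicalMatching\t0.70\n"
    "EX:0003\tskos:exactMatch\tOTHER:1003\tsemapv:LexicalMatching\t0.60\n"
)


def load_mappings(text):
    """Parse SSSOM-style TSV text into a list of dicts, one per mapping.

    Lines starting with '#' are skipped, since SSSOM files may carry
    commented metadata lines before the table header.
    """
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def select(mappings, predicate, min_confidence):
    """Keep mappings with the given predicate and at least min_confidence."""
    return [
        m
        for m in mappings
        if m["predicate_id"] == predicate
        and float(m["confidence"]) >= min_confidence
    ]


mappings = load_mappings(SSSOM_TSV)
exact = select(mappings, "skos:exactMatch", 0.9)
for m in exact:
    print(m["subject_id"], m["predicate_id"], m["object_id"])
```

Because the predicate and justification are recorded per row, a pipeline that needs high precision can filter on them directly without parsing or querying the underlying ontologies.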
Submitted 13 December, 2021;
originally announced December 2021.