An Open-Source Knowledge Graph Ecosystem for the Life Sciences

Authors: Tiffany J. Callahan, Ignacio J. Tripodi, Adrianne L. Stefanski, Luca Cappelletti, Sanya B. Taneja, Jordan M. Wyrwa, Elena Casiraghi, Nicolas A. Matentzoglu, Justin Reese, Jonathan C. Silverstein, Charles Tapley Hoyt, Richard D. Boyce, Scott A. Malec, Deepak R. Unni, Marcin P. Joachimiak, Peter N. Robinson, Christopher J. Mungall, Emanuele Cavalleri, Tommaso Fontana, Giorgio Valentini, Marco Mesiti, Lucas A. Gillenwater, Brook Santangelo, Nicole A. Vasilevsky, Robert Hoehndorf, et al. (7 additional authors not shown)

Abstract: Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Submitted 30 January, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

arXiv:2209.11950 [pdf]

doi 10.1016/j.jbi.2023.104341

Developing a Knowledge Graph Framework for Pharmacokinetic Natural Product-Drug Interactions

Authors: Sanya B. Taneja, Tiffany J. Callahan, Mary F. Paine, Sandra L. Kane-Gill, Halil Kilicoglu, Marcin P. Joachimiak, Richard D. Boyce

Abstract: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical natural products are co-consumed with pharmaceutical drugs. Understanding mechanisms of NPDIs is key to preventing adverse events. We constructed a knowledge graph framework, NP-KG, as a step toward computational discovery of pharmacokinetic NPDIs. NP-KG is a heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature, constructed with the Phenotype Knowledge Translator framework and the semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through path searches and meta-path discovery to determine congruent and contradictory information compared to ground truth data. The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify pharmacokinetic interactions involving enzymes, transporters, and pharmaceutical drugs. We envision that NP-KG will facilitate improved human-machine collaboration to guide researchers in future studies of pharmacokinetic NPDIs. The NP-KG framework is publicly available at https://doi.org/10.5281/zenodo.6814507 and https://github.com/sanyabt/np-kg.

Submitted 24 September, 2022; originally announced September 2022.

Journal ref: Journal of Biomedical Informatics 140 (2023) 104341

arXiv:2209.04732 [pdf]

Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality

Authors: Tiffany J. Callahan, Adrianne L. Stefanski, Jordan M. Wyrwa, Chenjie Zeng, Anna Ostropolets, Juan M. Banda, William A. Baumgartner Jr., Richard D. Boyce, Elena Casiraghi, Ben D. Coleman, Janine H. Collins, Sara J. Deakyne-Davies, James A. Feinstein, Melissa A. Haendel, Asiyah Y. Lin, Blake Martin, Nicolas A. Matentzoglu, Daniella Meeker, Justin Reese, Jessica Sinclair, Sanya B. Taneja, Katy E. Trinkley, Nicole A. Vasilevsky, Andrew Williams, Xingman A. Zhang, et al. (7 additional authors not shown)

Abstract: Background: Common data models solve many challenges of standardizing electronic health record (EHR) data, but are unable to semantically integrate all the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. Objective: We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Results: Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. Conclusions: By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.

Submitted 30 January, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

Comments: Supplementary Material is included at the end of the manuscript

ACM Class: J.3

arXiv:2207.00414 [pdf, other]

Artificial Intelligence Techniques for Next-Generation Mega Satellite Networks

Authors: Bassel Al Homssi, Kosta Dakic, Ke Wang, Tansu Alpcan, Ben Allen, Russell Boyce, Sithamparanathan Kandeepan, Akram Al-Hourani, Walid Saad

Abstract: Space communications, particularly massive satellite networks, re-emerged as an appealing candidate for next generation networks due to major advances in space launching, electronics, processing power, and miniaturization. However, massive satellite networks rely on numerous underlying and intertwined processes that cannot be truly captured using conventionally used models, due to their dynamic and unique features such as orbital speed, inter-satellite links, short pass time, and satellite footprint, among others. Hence, new approaches are needed to enable the network to proactively adjust to the rapidly varying conditions associated within the link. Artificial intelligence (AI) provides a pathway to capture these processes, analyze their behavior, and model their effect on the network. This article introduces the application of AI techniques for integrated terrestrial satellite networks, particularly massive satellite network communications. It details the unique features of massive satellite networks, and the overarching challenges concomitant with their integration into the current communication infrastructure. Moreover, this article provides insights into state-of-the-art AI techniques across various layers of the communication link. This entails applying AI for forecasting the highly dynamic radio channel, spectrum sensing and classification, signal detection and demodulation, inter-satellite and satellite access network optimization, and network security. Moreover, future paradigms and the mapping of these mechanisms onto practical networks are outlined.

Submitted 16 September, 2023; v1 submitted 2 June, 2022; originally announced July 2022.

arXiv:2105.02746 [pdf, other]

Introducing Information Retrieval for Biomedical Informatics Students

Authors: Sanya B. Taneja, Richard D. Boyce, William T. Reynolds, Denis Newman-Griffis

Abstract: Introducing biomedical informatics (BMI) students to natural language processing (NLP) requires balancing technical depth with practical know-how to address application-focused needs. We developed a set of three activities introducing introductory BMI students to information retrieval with NLP, covering document representation strategies and language models from TF-IDF to BERT. These activities provide students with hands-on experience targeted towards common use cases, and introduce fundamental components of NLP workflows for a wide variety of applications.

Submitted 6 May, 2021; originally announced May 2021.

Comments: To appear in the Proceedings of the Fifth Workshop on Teaching NLP @ NAACL

arXiv:1912.12371 [pdf]

Open Source Software Sustainability Models: Initial White Paper from the Informatics Technology for Cancer Research Sustainability and Industry Partnership Work Group

Authors: Y. Ye, R. D. Boyce, M. K. Davis, K. Elliston, C. Davatzikos, A. Fedorov, J. C. Fillion-Robin, I. Foster, J. Gilbertson, M. Heiskanen, J. Klemm, A. Lasso, J. V. Miller, M. Morgan, S. Pieper, B. Raumann, B. Sarachan, G. Savova, J. C. Silverstein, D. Taylor, J. Zelnis, G. Q. Zhang, M. J. Becich

Abstract: The Sustainability and Industry Partnership Work Group (SIP-WG) is a part of the National Cancer Institute Informatics Technology for Cancer Research (ITCR) program. The charter of the SIP-WG is to investigate options of long-term sustainability of open source software (OSS) developed by the ITCR, in part by developing a collection of business model archetypes that can serve as sustainability plans for ITCR OSS development initiatives. The workgroup assembled models from the ITCR program, from other studies, and via engagement of its extensive network of relationships with other organizations (e.g., Chan Zuckerberg Initiative, Open Source Initiative and Software Sustainability Institute). This article reviews existing sustainability models and describes ten OSS use cases disseminated by the SIP-WG and others, and highlights five essential attributes (alignment with unmet scientific needs, dedicated development team, vibrant user community, feasible licensing model, and sustainable financial model) to assist academic software developers in achieving best practice in software sustainability.

Submitted 1 January, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

Comments: 21-page main manuscript, 43-page supplemental file

Showing 1–6 of 6 results for author: Boyce, R