Search | arXiv e-print repository

Biomedical knowledge graph-optimized prompt generation for large language models

Authors: Karthik Soman, Peter W Rose, John H Morris, Rabia E Akbas, Brett Smith, Braian Peetoom, Catalina Villouta-Reyes, Gabriel Cerono, Yongmei Shi, Angela Rizk-Jackson, Sharat Israni, Charlotte A Nelson, Sui Huang, Sergio E Baranzini

Abstract: Large Language Models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead and require further domain expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework that leverages a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4 to generate meaningful biomedical text rooted in established knowledge. Compared to the existing RAG technique for Knowledge Graphs, the proposed method uses a minimal graph schema for context extraction and embedding methods for context pruning. This optimization in context extraction results in more than 50% reduction in token consumption without compromising accuracy, making for a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human-curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines the explicit and implicit knowledge of KG and LLM in a token-optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion.

Submitted 13 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 29 pages, 5 figures, 1 table, 1 supplementary file

arXiv:2309.00700 [pdf, other]

Cross-temporal Detection of Novel Ransomware Campaigns: A Multi-Modal Alert Approach

Authors: Sathvik Murli, Dhruv Nandakumar, Prabhat Kumar Kushwaha, Cheng Wang, Christopher Redino, Abdul Rahman, Shalini Israni, Tarun Singh, Edward Bowen

Abstract: We present a novel approach to identify ransomware campaigns derived from attack timeline representations within victim networks. Malicious activity profiles developed from multiple alert sources support the construction of alert graphs. This approach enables an effective and scalable representation of the attack timelines, in which individual nodes represent malicious activity detections and connections describe the potential attack paths. This work demonstrates adaptability to different attack patterns by implementing a novel method for parsing and classifying alert graphs while maintaining efficacy despite potentially low-dimension node features.

Submitted 1 September, 2023; originally announced September 2023.

Comments: Preprint. Under Review

arXiv:2105.07238 [pdf]

Using Ethnographic Methods to Classify the Human Experience in Medicine: A Case Study of the Presence Ontology

Authors: Amrapali Maitra, Maulik R. Kamdar, Donna M. Zulman, Marie C. Haverfield, Cati Brown-Johnson, Rachel Schwartz, Sonoo Thadaney Israni, Abraham Verghese, Mark A. Musen

Abstract: Objective: Although social and environmental factors are central to provider-patient interactions, the data that reflect these factors can be incomplete, vague, and subjective. We sought to create a conceptual framework to describe and classify data about presence, the domain of interpersonal connection in medicine. Methods: Our top-down approach for ontology development, based on the concept of relationality, included 1) a broad survey of social sciences literature, 2) a systematic literature review of more than 20,000 articles around interpersonal connection in medicine, 3) relational ethnography of clinical encounters (5 pilot, 27 full), and 4) interviews about relational work with 40 medical and nonmedical professionals. We formalized the model using the Web Ontology Language in the Protege ontology editor. We iteratively evaluated and refined the Presence Ontology through manual expert review and automated annotation of literature. Results and Discussion: The Presence Ontology facilitates the naming and classification of concepts that would otherwise be vague. Our model categorizes contributors to healthcare encounters and factors such as Communication, Emotions, Tools, and Environment. Ontology evaluation indicated that Cognitive Models (both patients' explanatory models and providers' caregiving approaches) influenced encounters, and these were subsequently incorporated. We show how ethnographic methods based in relationality can aid the representation of experiential concepts (e.g., empathy, trust). Our ontology could support informatics applications to improve healthcare, such as annotation of videotaped encounters, clinical instruments to measure presence, or EHR-based reminders for providers. Conclusion: The Presence Ontology provides a model for using ethnographic approaches to classify interpersonal data.

Submitted 15 May, 2021; originally announced May 2021.

Comments: 15 pages, 4 figures, 57 references

Showing 1–3 of 3 results for author: Israni, S