Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations

Authors: Qingyu Chen, Jingcheng Du, Yan Hu, Vipina Kuttichi Keloth, Xueqing Peng, Kalpana Raja, Rui Zhang, Zhiyong Lu, Hua Xu

Abstract: Biomedical literature is growing rapidly, making it challenging to curate and extract knowledge manually. Biomedical natural language processing (BioNLP) techniques that can automatically extract information from biomedical literature help alleviate this burden. Recently, large language models (LLMs), such as GPT-3 and GPT-4, have gained significant attention for their impressive performance. However, their effectiveness in BioNLP tasks and impact on method development and downstream users remain understudied. This pilot study (1) establishes the baseline performance of GPT-3 and GPT-4 in both zero-shot and one-shot settings on eight BioNLP datasets across four applications: named entity recognition, relation extraction, multi-label document classification, and semantic similarity and reasoning; (2) examines the errors produced by the LLMs and categorizes them into three types: missingness, inconsistencies, and unwanted artificial content; and (3) provides suggestions for using LLMs in BioNLP applications. We make the datasets, baselines, and results publicly available to the community via https://github.com/qingyu-qc/gpt_bionlp_benchmark.

Submitted 20 January, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

arXiv:2304.01446 [pdf]

Integrating Commercial and Social Determinants of Health: A Unified Ontology for Non-Clinical Determinants of Health

Authors: Navya Martin Kollapally, Vipina Kuttichi Keloth, Julia Xu, James Geller

Abstract: The objectives of this research are 1) to develop an ontology for CDoH by utilizing PubMed articles and ChatGPT; 2) to foster ontology reuse by integrating CDoH with an existing SDoH ontology into a unified structure; 3) to devise an overarching conception for all non-clinical determinants of health and to create an initial ontology, called N-CODH, for them; and 4) to validate the degree of correspondence between concepts provided by ChatGPT and the existing SDoH ontology.

Submitted 3 April, 2023; originally announced April 2023.

Comments: Under review AMIA 2023

arXiv:2303.16416 [pdf, other]

Improving Large Language Models for Clinical Named Entity Recognition via Prompt Engineering

Authors: Yan Hu, Qingyu Chen, Jingcheng Du, Xueqing Peng, Vipina Kuttichi Keloth, Xu Zuo, Yujia Zhou, Zehan Li, Xiaoqian Jiang, Zhiyong Lu, Kirk Roberts, Hua Xu

Abstract: Objective: This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance. Materials and Methods: We evaluated these models on two clinical NER tasks: (1) extracting medical problems, treatments, and tests from clinical notes in the MTSamples corpus, following the 2010 i2b2 concept extraction shared task, and (2) identifying nervous system disorder-related adverse events from safety reports in the Vaccine Adverse Event Reporting System (VAERS). To improve the GPT models' performance, we developed a clinical task-specific prompt framework that includes (1) baseline prompts with task description and format specification, (2) annotation guideline-based prompts, (3) error analysis-based instructions, and (4) annotated samples for few-shot learning. We assessed each prompt's effectiveness and compared the models to BioClinicalBERT. Results: Using baseline prompts, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.634 and 0.804 for MTSamples, and 0.301 and 0.593 for VAERS. Additional prompt components consistently improved model performance. When all four components were used, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.794 and 0.861 for MTSamples, and 0.676 and 0.736 for VAERS, demonstrating the effectiveness of our prompt framework. Although these results trail BioClinicalBERT (F1 of 0.901 for the MTSamples dataset and 0.802 for VAERS), they are very promising considering that few training samples are needed. Conclusion: While direct application of GPT models to clinical NER tasks falls short of optimal performance, our task-specific prompt framework, incorporating medical knowledge and training samples, significantly enhances GPT models' feasibility for potential clinical applications.

Submitted 24 January, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: 17 pages, 5 tables, 6 figures

arXiv:2212.01941 [pdf]

doi 10.1093/jamia/ocad096

Systematic Design and Evaluation of Social Determinants of Health Ontology (SDoHO)

Authors: Yifang Dang, Fang Li, Xinyue Hu, Vipina K. Keloth, Meng Zhang, Sunyang Fu, Jingcheng Du, J. Wilfred Fan, Muhammad F. Amith, Evan Yu, Hongfang Liu, Xiaoqian Jiang, Hua Xu, Cui Tao

Abstract: Social determinants of health (SDoH) have a significant impact on health outcomes and well-being. Addressing SDoH is the key to reducing healthcare inequalities and transforming a "sick care" system into a "health promoting" system. To address the SDoH terminology gap and better embed relevant elements in advanced biomedical informatics, we propose an SDoH ontology (SDoHO), which represents fundamental SDoH factors and their relationships in a standardized and measurable way. The ontology formally models classes, relationships, and constraints based on multiple SDoH-related resources. Expert review and coverage evaluation, using clinical notes data and a national survey, showed satisfactory results. SDoHO could potentially play an essential role in providing a foundation for a comprehensive understanding of the associations between SDoH and health outcomes and providing a path toward health equity across populations.

Submitted 15 June, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

Comments: J Am Med Inform Assoc Published Online First: 10 June 2023

Showing 1–4 of 4 results for author: Keloth, V K