Assessing The Potential Of Mid-Sized Language Models For Clinical QA

Bolton, Elliot; Xiong, Betty; Muralidharan, Vijaytha; Schamroth, Joel; Muralidharan, Vivek; Manning, Christopher D.; Daneshjou, Roxana

Computer Science > Computation and Language

arXiv:2404.15894 (cs)

[Submitted on 24 Apr 2024]

Authors: Elliot Bolton, Betty Xiong, Vijaytha Muralidharan, Joel Schamroth, Vivek Muralidharan, Christopher D. Manning, Roxana Daneshjou

Abstract: Large language models, such as GPT-4 and Med-PaLM, have shown impressive performance on clinical tasks; however, they require access to substantial compute, are closed-source, and cannot be deployed on device. Mid-sized models such as BioGPT-large, BioMedLM, LLaMA 2, and Mistral 7B avoid these drawbacks, but their capacity for clinical tasks has been understudied. To help assess their potential for clinical use and to help researchers decide which model to use, we compare their performance on two clinical question-answering (QA) tasks: MedQA and consumer query answering. We find that Mistral 7B is the best-performing model, winning on all benchmarks and outperforming models trained specifically for the biomedical domain. While Mistral 7B's MedQA score of 63.0% approaches that of the original Med-PaLM, and it can often produce plausible responses to consumer health queries, room for improvement remains. This study provides the first head-to-head assessment of open-source mid-sized models on clinical tasks.
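MedQA is a multiple-choice benchmark built from USMLE-style exam questions, so a MedQA score such as 63.0% is an accuracy: the share of questions for which the model selects the correct option. The sketch below illustrates only that scoring arithmetic. It assumes each model output has already been reduced to a single option letter. `MCQItem` and `accuracy` are hypothetical names, and this is not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class MCQItem:
    """One multiple-choice question (hypothetical container, not from the paper)."""
    question: str
    options: dict[str, str]  # option letter -> option text
    gold: str                # letter of the correct option, e.g. "B"


def accuracy(items: list[MCQItem], predictions: list[str]) -> float:
    """Return the percentage of items whose predicted letter matches the gold letter.

    Assumes each model output has already been parsed down to a single option letter.
    """
    if not items:
        raise ValueError("no items to score")
    if len(items) != len(predictions):
        raise ValueError("expected exactly one prediction per item")
    correct = sum(
        1
        for item, pred in zip(items, predictions)
        # Normalise case and whitespace so " b" counts as "B".
        if pred.strip().upper() == item.gold.strip().upper()
    )
    return 100.0 * correct / len(items)


# Toy usage with placeholder questions (not real MedQA items).
opts = {"A": "...", "B": "...", "C": "...", "D": "..."}
items = [MCQItem(f"Question {i}", opts, gold) for i, gold in enumerate("ABBD", start=1)]
print(accuracy(items, ["A", "C", "b", "D"]))  # 75.0
```

Letter extraction is deliberately kept outside `accuracy`. How a free-text generation is mapped to an option letter is a separate design decision, and it can shift reported scores independently of the arithmetic shown here.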

Comments:	25 pages, 8 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.15894 [cs.CL]
	(or arXiv:2404.15894v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.15894

Submission history

From: Elliot Bolton
[v1] Wed, 24 Apr 2024 14:32:34 UTC (1,169 KB)
