Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data
Authors:
Sudeshna Das,
Yao Ge,
Yuting Guo,
Swati Rajwal,
JaMor Hairston,
Jeanne Powell,
Drew Walker,
Snigdha Peddireddy,
Sahithi Lakamana,
Selen Bozkurt,
Matthew Reyna,
Reza Sameni,
Yunyu Xiao,
Sangmi Kim,
Rasheeta Chandler,
Natalie Hernandez,
Danielle Mowery,
Rachel Wightman,
Jennifer Love,
Anthony Spadaro,
Jeanmarie Perrone,
Abeed Sarker
Abstract:
Retrieval augmented generation (RAG) provides the capability to constrain generative model outputs, and mitigate the possibility of hallucination, by providing relevant in-context text. The number of tokens a generative large language model (LLM) can incorporate as context is finite, thus limiting the volume of knowledge from which to generate an answer. We propose a two-layer RAG framework for query-focused answer generation and evaluate a proof-of-concept for this framework in the context of query-focused summary generation from social media forums, focusing on emerging drug-related information. The evaluations demonstrate the effectiveness of the two-layer framework in resource-constrained settings, enabling researchers to obtain near real-time data from users.
Submitted 29 May, 2024;
originally announced May 2024.
Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media
Authors:
Yuting Guo,
Swati Rajwal,
Sahithi Lakamana,
Chia-Chun Chiang,
Paul C. Menell,
Adnan H. Shahid,
Yi-Chieh Chen,
Nikita Chhabra,
Wan-Ju Chao,
Chieh-Ju Chao,
Todd J. Schwedt,
Imon Banerjee,
Abeed Sarker
Abstract:
Migraine is a high-prevalence and disabling neurological disorder. However, information on migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed the presence of a plethora of relevant information about migraine therapies and patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.
Submitted 23 December, 2022;
originally announced December 2022.