Search | arXiv e-print repository

Answering real-world clinical questions using large language model based systems

Authors: Yen Sia Low, Michael L. Jackson, Rebecca J. Hyde, Robert E. Brown, Neil M. Sanghavi, Julian D. Baldwin, C. William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V. Nene, Morgan Pike, Courtney J. Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R. Zipursky, Christina Dinh, Philip Ballentine, Dan C. Derieg, Vladimir Polony, Rehan N. Chawdry, Jordan Davies, Brigham B. Hyde , et al. (2 additional authors not shown)

Abstract: Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-bas… ▽ More Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% - 10%). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions compared to other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence working synergistically would improve availability of pertinent evidence for patient care. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: 28 pages (2 figures, 3 tables) inclusive of 8 pages of supplemental materials (4 supplemental figures and 4 supplemental tables)

arXiv:2212.14177 [pdf, other]

Current State of Community-Driven Radiological AI Deployment in Medical Imaging

Authors: Vikash Gupta, Barbaros Selnur Erdal, Carolina Ramirez, Ralf Floca, Laurence Jackson, Brad Genereaux, Sidney Bryson, Christopher P Bridge, Jens Kleesiek, Felix Nensa, Rickmer Braren, Khaled Younis, Tobias Penzkofer, Andreas Michael Bucher, Ming Melvin Qin, Gigon Bae, Hyeonhoon Lee, M. Jorge Cardoso, Sebastien Ourselin, Eric Kerfoot, Rahul Choudhury, Richard D. White, Tessa Cook, David Bericat, Matthew Lungren , et al. (2 additional authors not shown)

Abstract: Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introd… ▽ More Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and develo** tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions. △ Less

Submitted 8 May, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: 21 pages; 5 figures

MSC Class: eess.IV

arXiv:2208.03461 [pdf]

Data science in public health: building next generation capacity

Authors: Nicholas Mirin, Heather Mattie, Latifa Jackson, Zainab Samad, Rumi Chunara

Abstract: Rapidly evolving technology, data and analytic landscapes are permeating many fields and professions. In public health, the need for data science skills including data literacy is particularly prominent given both the potential of novel data types and analysis methods to fill gaps in existing public health research and intervention practices, as well as the potential of such data or methods to per… ▽ More Rapidly evolving technology, data and analytic landscapes are permeating many fields and professions. In public health, the need for data science skills including data literacy is particularly prominent given both the potential of novel data types and analysis methods to fill gaps in existing public health research and intervention practices, as well as the potential of such data or methods to perpetuate or augment health disparities. Through a review of public health courses and programs at the top 10 U.S. and globally ranked schools of public health, this article summarizes existing educational efforts in public health data science. These existing practices serve to inform efforts for broadening such curricula to further schools and populations. Data science ethics course offerings are also examined in context of assessing how population health principles can be blended into training across levels of data involvement to augment the traditional core of public health curricula. Parallel findings from domestic and international 'outside the classroom' training programs are also synthesized to advance approaches for increasing diversity in public health data science. Based on these program reviews and their synthesis, a four-point formula is distilled for furthering public health data science education efforts, toward development of a critical and inclusive mass of practitioners with fluency to leverage data to advance goals of public health and improve quality of life in the digital age. △ Less

Submitted 6 August, 2022; originally announced August 2022.

arXiv:2107.00123 [pdf, other]

If you Cheat, I Cheat: Cheating on a Collaborative Task with a Social Robot

Authors: Ali Ayub, Huiqing Hu, Guangwei Zhou, Carter Fendley, Crystal Ramsay, Kathy Lou Jackson, Alan R. Wagner

Abstract: Robots may soon play a role in higher education by augmenting learning environments and managing interactions between instructors and learners. Little, however, is known about how the presence of robots in the learning environment will influence academic integrity. This study therefore investigates if and how college students cheat while engaged in a collaborative sorting task with a robot. We emp… ▽ More Robots may soon play a role in higher education by augmenting learning environments and managing interactions between instructors and learners. Little, however, is known about how the presence of robots in the learning environment will influence academic integrity. This study therefore investigates if and how college students cheat while engaged in a collaborative sorting task with a robot. We employed a 2x2 factorial design to examine the effects of cheating exposure (exposure to cheating or no exposure) and task clarity (clear or vague rules) on college student cheating behaviors while interacting with a robot. Our study finds that prior exposure to cheating on the task significantly increases the likelihood of cheating. Yet, the tendency to cheat was not impacted by the clarity of the task rules. These results suggest that normative behavior by classmates may strongly influence the decision to cheat while engaged in an instructional experience with a robot. △ Less

Submitted 30 June, 2021; originally announced July 2021.

Comments: Accepted at IEEE International Conference on Robot and Human Interactive Communication (ROMAN), 2021

arXiv:2003.07681 [pdf, ps, other]

The Data Science Fire Next Time: Innovative strategies for mentoring in data science

Authors: Latifa Jackson, Heriberto Acosta Maestre

Abstract: As data mining research and applications continue to expand in to a variety of fields such as medicine, finance, security, etc., the need for talented and diverse individuals is clearly felt. This is particularly the case as Big Data initiatives have taken off in the federal, private and academic sectors, providing a wealth of opportunities, nationally and internationally. The Broadening Participa… ▽ More As data mining research and applications continue to expand in to a variety of fields such as medicine, finance, security, etc., the need for talented and diverse individuals is clearly felt. This is particularly the case as Big Data initiatives have taken off in the federal, private and academic sectors, providing a wealth of opportunities, nationally and internationally. The Broadening Participation in Data Mining (BPDM) workshop was created more than 7 years ago with the goal of fostering mentorship, guidance, and connections for minority and underrepresented groups in the data science and machine learning community, while also enriching technical aptitude and exposure for a group of talented students. To date it has impacted the lives of more than 330 underrepresented trainees in data science. We provide a venue to connect talented students with innovative researchers in industry, academia, professional societies, and government. Our mission is to facilitate meaningful, lasting relationships between BPDM participants to ultimately increase diversity in data mining. This most recent workshop took place at Howard University in Washington, DC in February 2019. Here we report on the mentoring strategies that we undertook at the 2019 BPDM and how those were received. △ Less

Submitted 29 February, 2020; originally announced March 2020.

arXiv:2002.11836 [pdf, other]

No computation without representation: Avoiding data and algorithm biases through diversity

Authors: Caitlin Kuhlman, Latifa Jackson, Rumi Chunara

Abstract: The emergence and growth of research on issues of ethics in AI, and in particular algorithmic fairness, has roots in an essential observation that structural inequalities in society are reflected in the data used to train predictive models and in the design of objective functions. While research aiming to mitigate these issues is inherently interdisciplinary, the design of unbiased algorithms and… ▽ More The emergence and growth of research on issues of ethics in AI, and in particular algorithmic fairness, has roots in an essential observation that structural inequalities in society are reflected in the data used to train predictive models and in the design of objective functions. While research aiming to mitigate these issues is inherently interdisciplinary, the design of unbiased algorithms and fair socio-technical systems are key desired outcomes which depend on practitioners from the fields of data science and computing. However, these computing fields broadly also suffer from the same under-representation issues that are found in the datasets we analyze. This disconnect affects the design of both the desired outcomes and metrics by which we measure success. If the ethical AI research community accepts this, we tacitly endorse the status quo and contradict the goals of non-discrimination and equity which work on algorithmic fairness, accountability, and transparency seeks to address. Therefore, we advocate in this work for diversifying computing as a core priority of the field and our efforts to achieve ethical AI practices. We draw connections between the lack of diversity within academic and professional computing fields and the type and breadth of the biases encountered in datasets, machine learning models, problem formulations, and interpretation of results. Examining the current fairness/ethics in AI literature, we highlight cases where this lack of diverse perspectives has been foundational to the inequity in treatment of underrepresented and protected group data. We also look to other professional communities, such as in law and health, where disparities have been reduced both in the educational diversity of trainees and among professional practices. We use these lessons to develop recommendations that provide concrete steps for the computing community to increase diversity. △ Less

Submitted 26 February, 2020; originally announced February 2020.

arXiv:1908.10842 [pdf, other]

Self-supervised Recurrent Neural Network for 4D Abdominal and In-utero MR Imaging

Authors: Tong Zhang, Laurence H. Jackson, Alena Uus, James R. Clough, Lisa Story, Mary A. Rutherford, Joseph V. Hajnal, Maria Deprez

Abstract: Accurately estimating and correcting the motion artifacts are crucial for 3D image reconstruction of the abdominal and in-utero magnetic resonance imaging (MRI). The state-of-art methods are based on slice-to-volume registration (SVR) where multiple 2D image stacks are acquired in three orthogonal orientations. In this work, we present a novel reconstruction pipeline that only needs one orientatio… ▽ More Accurately estimating and correcting the motion artifacts are crucial for 3D image reconstruction of the abdominal and in-utero magnetic resonance imaging (MRI). The state-of-art methods are based on slice-to-volume registration (SVR) where multiple 2D image stacks are acquired in three orthogonal orientations. In this work, we present a novel reconstruction pipeline that only needs one orientation of 2D MRI scans and can reconstruct the full high-resolution image without masking or registration steps. The framework consists of two main stages: the respiratory motion estimation using a self-supervised recurrent neural network, which learns the respiratory signals that are naturally embedded in the asymmetry relationship of the neighborhood slices and cluster them according to a respiratory state. Then, we train a 3D deconvolutional network for super-resolution (SR) reconstruction of the sparsely selected 2D images using integrated reconstruction and total variation loss. We evaluate the classification accuracy on 5 simulated images and compare our results with the SVR method in adult abdominal and in-utero MRI scans. The results show that the proposed pipeline can accurately estimate the respiratory state and reconstruct 4D SR volumes with better or similar performance to the 3D SVR pipeline with less than 20\% sparsely selected slices. The method has great potential to transform the 4D abdominal and in-utero MRI in clinical practice. △ Less

Submitted 28 August, 2019; originally announced August 2019.

Comments: Accepted by MICCAI 2019 workshop on Machine Learning for Medical Image Reconstruction

arXiv:1509.07302 [pdf, other]

doi 10.1109/TBCAS.2016.2539352

Map** Generative Models onto a Network of Digital Spiking Neurons

Authors: Bruno U. Pedroni, Srinjoy Das, John V. Arthur, Paul A. Merolla, Bryan L. Jackson, Dharmendra S. Modha, Kenneth Kreutz-Delgado, Gert Cauwenberghs

Abstract: Stochastic neural networks such as Restricted Boltzmann Machines (RBMs) have been successfully used in applications ranging from speech recognition to image classification. Inference and learning in these algorithms use a Markov Chain Monte Carlo procedure called Gibbs sampling, where a logistic function forms the kernel of this sampler. On the other side of the spectrum, neuromorphic systems have… ▽ More Stochastic neural networks such as Restricted Boltzmann Machines (RBMs) have been successfully used in applications ranging from speech recognition to image classification. Inference and learning in these algorithms use a Markov Chain Monte Carlo procedure called Gibbs sampling, where a logistic function forms the kernel of this sampler. On the other side of the spectrum, neuromorphic systems have shown great promise for low-power and parallelized cognitive computing, but lack well-suited applications and automation procedures. In this work, we propose a systematic method for bridging the RBM algorithm and digital neuromorphic systems, with a generative pattern completion task as proof of concept. For this, we first propose a method of producing the Gibbs sampler using bio-inspired digital noisy integrate-and-fire neurons. Next, we describe the process of map** generative RBMs trained offline onto the IBM TrueNorth neurosynaptic processor -- a low-power digital neuromorphic VLSI substrate. Map** these algorithms onto neuromorphic hardware presents unique challenges in network connectivity and weight and bias quantization, which, in turn, require architectural and design strategies for the physical realization. Generative performance metrics are analyzed to validate the neuromorphic requirements and to best select the neuron parameters for the model. Lastly, we describe a design automation procedure which achieves optimal resource usage, accounting for the novel hardware adaptations. This work represents the first implementation of generative RBM inference on a neuromorphic VLSI substrate. △ Less

Submitted 9 October, 2015; v1 submitted 24 September, 2015; originally announced September 2015.

Comments: A similar version of this manuscript has been submitted to IEEE TBioCAS for revision in October 2015

arXiv:1509.00670 [pdf, other]

doi 10.1145/2808797.2809311

Stay Awhile and Listen: User Interactions in a Crowdsourced Platform Offering Emotional Support

Authors: Derek Doran, Samir Yelne, Luisa Massari, Maria-Carla Calzarossa, LaTrelle Jackson, Glen Moriarty

Abstract: Internet and online-based social systems are rising as the dominant mode of communication in society. However, the public or semi-private environment under which most online communications operate under do not make them suitable channels for speaking with others about personal or emotional problems. This has led to the emergence of online platforms for emotional support offering free, anonymous, a… ▽ More Internet and online-based social systems are rising as the dominant mode of communication in society. However, the public or semi-private environment under which most online communications operate under do not make them suitable channels for speaking with others about personal or emotional problems. This has led to the emergence of online platforms for emotional support offering free, anonymous, and confidential conversations with live listeners. Yet very little is known about the way these platforms are utilized, and if their features and design foster strong user engagement. This paper explores the utilization and the interaction features of hundreds of thousands of users on 7 Cups of Tea, a leading online platform offering online emotional support. It dissects the level of activity of hundreds of thousands of users, the patterns by which they engage in conversation with each other, and uses machine learning methods to find factors promoting engagement. The study may be the first to measure activities and interactions in a large-scale online social system that fosters peer-to-peer emotional support. △ Less

Submitted 2 September, 2015; originally announced September 2015.

arXiv:1503.07793 [pdf, other]

Gibbs Sampling with Low-Power Spiking Digital Neurons

Authors: Srinjoy Das, Bruno Umbria Pedroni, Paul Merolla, John Arthur, Andrew S. Cassidy, Bryan L. Jackson, Dharmendra Modha, Gert Cauwenberghs, Ken Kreutz-Delgado

Abstract: Restricted Boltzmann Machines and Deep Belief Networks have been successfully used in a wide variety of applications including image classification and speech recognition. Inference and learning in these algorithms uses a Markov Chain Monte Carlo procedure called Gibbs sampling. A sigmoidal function forms the kernel of this sampler which can be realized from the firing statistics of noisy integrat… ▽ More Restricted Boltzmann Machines and Deep Belief Networks have been successfully used in a wide variety of applications including image classification and speech recognition. Inference and learning in these algorithms uses a Markov Chain Monte Carlo procedure called Gibbs sampling. A sigmoidal function forms the kernel of this sampler which can be realized from the firing statistics of noisy integrate-and-fire neurons on a neuromorphic VLSI substrate. This paper demonstrates such an implementation on an array of digital spiking neurons with stochastic leak and threshold properties for inference tasks and presents some key performance metrics for such a hardware-based sampler in both the generative and discriminative contexts. △ Less

Submitted 27 March, 2015; v1 submitted 26 March, 2015; originally announced March 2015.

Comments: Accepted at ISCAS 2015

Showing 1–10 of 10 results for author: Jackson, L