-
Human-AI collectives produce the most accurate differential diagnoses
Authors:
N. Zöller,
J. Berger,
I. Lin,
N. Fu,
J. Komarneni,
G. Barabucci,
K. Laskowski,
V. Shia,
B. Harack,
E. A. Chu,
V. Trianni,
R. H. J. M. Kurvers,
S. M. Herzog
Abstract:
Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased - shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 medical cases. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience, and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
Submitted 21 June, 2024;
originally announced June 2024.
-
Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy
Authors:
Gioele Barabucci,
Victor Shia,
Eugene Chu,
Benjamin Harack,
Nathan Fu
Abstract:
Background: Large language models (LLMs) such as OpenAI's GPT-4 or Google's PaLM 2 are proposed as viable diagnostic support tools or even spoken of as replacements for "curbside consults". However, even LLMs specifically trained on medical topics may lack sufficient diagnostic accuracy for real-life applications.
Methods: Using collective intelligence methods and a dataset of 200 clinical vignettes of real-life cases, we assessed and compared the accuracy of differential diagnoses obtained by asking individual commercial LLMs (OpenAI GPT-4, Google PaLM 2, Cohere Command, Meta Llama 2) against the accuracy of differential diagnoses synthesized by aggregating responses from combinations of the same LLMs.
Results: We find that aggregating responses from multiple, various LLMs leads to more accurate differential diagnoses (average accuracy for 3 LLMs: $75.3\% \pm 1.6$ pp) compared to the differential diagnoses produced by single LLMs (average accuracy for single LLMs: $59.0\% \pm 6.1$ pp).
Discussion: The use of collective intelligence methods to synthesize differential diagnoses combining the responses of different LLMs achieves two of the necessary steps towards advancing acceptance of LLMs as a diagnostic support tool: (1) demonstrate high diagnostic accuracy and (2) eliminate dependence on a single commercial vendor.
Submitted 13 February, 2024;
originally announced February 2024.
-
Convex Computation of the Basin of Stability to Measure the Likelihood of Falling: A Case Study on the Sit-to-Stand Task
Authors:
Victor Shia,
Talia Moore,
Ruzena Bajcsy,
Ram Vasudevan
Abstract:
Locomotion in the real world involves unexpected perturbations, and therefore requires strategies to maintain stability to successfully execute desired behaviours. Ensuring the safety of locomoting systems therefore necessitates a quantitative metric for stability. Due to the difficulty of determining the set of perturbations that induce failure, researchers have used a variety of features as a proxy to describe stability. This paper utilises recent advances in dynamical systems theory to develop a personalised, automated framework to compute the set of perturbations from which a system can avoid failure, which is known as the basin of stability. The approach tracks human motion to synthesise a control input that is analysed to measure the basin of stability. The utility of this analysis is verified on a Sit-to-Stand task performed by 15 individuals. The experiment illustrates that the computed basin of stability for each individual can successfully differentiate between less and more stable Sit-to-Stand strategies.
Submitted 3 April, 2016;
originally announced April 2016.
-
Convex Computation of the Reachable Set for Hybrid Systems with Parametric Uncertainty
Authors:
Shankar Mohan,
Victor Shia,
Ram Vasudevan
Abstract:
To verify the correct operation of systems, engineers need to determine the set of configurations of a dynamical model that are able to safely reach a specified configuration under a control law. Unfortunately, constructing models for systems interacting in highly dynamic environments is difficult. This paper addresses this challenge by presenting a convex optimization method to efficiently compute the set of configurations of a polynomial hybrid dynamical system that are able to safely reach a user-defined target set despite parametric uncertainty in the model. This class of models describes, for example, legged robots moving over uncertain terrains. The presented approach utilizes the notion of occupation measures to describe the evolution of trajectories of a nonlinear hybrid dynamical system with parametric uncertainty as a linear equation over measures whose supports coincide with the trajectories under investigation. This linear equation with user-defined support constraints is approximated with vanishing conservatism using a hierarchy of semidefinite programs that are each proven to compute an inner/outer approximation to the set of initial conditions that can reach the user-defined target set safely in spite of uncertainty. The efficacy of this method is illustrated on a collection of six representative examples.
Submitted 5 January, 2016;
originally announced January 2016.
-
Experimental Design for Human-in-the-Loop Driving Simulations
Authors:
Katherine Driggs-Campbell,
Guillaume Bellegarda,
Victor Shia,
S. Shankar Sastry,
Ruzena Bajcsy
Abstract:
This report describes a new experimental setup for human-in-the-loop simulations. A force-feedback simulator with four-axis motion has been set up for real-time driving experiments. The simulator moves to reproduce the forces a driver feels while driving, which gives the driver a realistic experience. This setup gives the researcher flexibility and control in a realistic simulation environment. Experiments concerning driver distraction can also be carried out safely in this test bed, in addition to multi-agent experiments. All necessary code to run the simulator, the additional sensors, and the basic processing is available for use.
Submitted 20 January, 2014;
originally announced January 2014.