Search | arXiv e-print repository

Human-AI collectives produce the most accurate differential diagnoses

Authors: N. Zöller, J. Berger, I. Lin, N. Fu, J. Komarneni, G. Barabucci, K. Laskowski, V. Shia, B. Harack, E. A. Chu, V. Trianni, R. H. J. M. Kurvers, S. M. Herzog

Abstract: Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased - shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied… ▽ More Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased - shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the art LLMs across 2,133 medical cases. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience, and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2402.08806 [pdf, other]

Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy

Authors: Gioele Barabucci, Victor Shia, Eugene Chu, Benjamin Harack, Nathan Fu

Abstract: Background: Large language models (LLMs) such as OpenAI's GPT-4 or Google's PaLM 2 are proposed as viable diagnostic support tools or even spoken of as replacements for "curbside consults". However, even LLMs specifically trained on medical topics may lack sufficient diagnostic accuracy for real-life applications. Methods: Using collective intelligence methods and a dataset of 200 clinical vigne… ▽ More Background: Large language models (LLMs) such as OpenAI's GPT-4 or Google's PaLM 2 are proposed as viable diagnostic support tools or even spoken of as replacements for "curbside consults". However, even LLMs specifically trained on medical topics may lack sufficient diagnostic accuracy for real-life applications. Methods: Using collective intelligence methods and a dataset of 200 clinical vignettes of real-life cases, we assessed and compared the accuracy of differential diagnoses obtained by asking individual commercial LLMs (OpenAI GPT-4, Google PaLM 2, Cohere Command, Meta Llama 2) against the accuracy of differential diagnoses synthesized by aggregating responses from combinations of the same LLMs. Results: We find that aggregating responses from multiple, various LLMs leads to more accurate differential diagnoses (average accuracy for 3 LLMs: $75.3\%\pm 1.6pp$) compared to the differential diagnoses produced by single LLMs (average accuracy for single LLMs: $59.0\%\pm 6.1pp$). Discussion: The use of collective intelligence methods to synthesize differential diagnoses combining the responses of different LLMs achieves two of the necessary steps towards advancing acceptance of LLMs as a diagnostic support tool: (1) demonstrate high diagnostic accuracy and (2) eliminate dependence on a single commercial vendor. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: 5 pages, 2 figures, 1 table

ACM Class: I.2.1; J.3

arXiv:2308.15514 [pdf, other]

International Governance of Civilian AI: A Jurisdictional Certification Approach

Authors: Robert Trager, Ben Harack, Anka Reuel, Allison Carnegie, Lennart Heim, Lewis Ho, Sarah Kreps, Ranjit Lall, Owen Larter, Seán Ó hÉigeartaigh, Simon Staffell, José Jaime Villalobos

Abstract: This report describes trade-offs in the design of international governance arrangements for civilian artificial intelligence (AI) and presents one approach in detail. This approach represents the extension of a standards, licensing, and liability regime to the global level. We propose that states establish an International AI Organization (IAIO) to certify state jurisdictions (not firms or AI proj… ▽ More This report describes trade-offs in the design of international governance arrangements for civilian artificial intelligence (AI) and presents one approach in detail. This approach represents the extension of a standards, licensing, and liability regime to the global level. We propose that states establish an International AI Organization (IAIO) to certify state jurisdictions (not firms or AI projects) for compliance with international oversight standards. States can give force to these international standards by adopting regulations prohibiting the import of goods whose supply chains embody AI from non-IAIO-certified jurisdictions. This borrows attributes from models of existing international organizations, such as the International Civilian Aviation Organization (ICAO), the International Maritime Organization (IMO), and the Financial Action Task Force (FATF). States can also adopt multilateral controls on the export of AI product inputs, such as specialized hardware, to non-certified jurisdictions. Indeed, both the import and export standards could be required for certification. As international actors reach consensus on risks of and minimum standards for advanced AI, a jurisdictional certification regime could mitigate a broad range of potential harms, including threats to public safety. △ Less

Submitted 11 September, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

arXiv:2108.09404 [pdf, other]

Safe Transformative AI via a Windfall Clause

Authors: Paolo Bova, Jonas Emanuel Müller, Benjamin Harack

Abstract: Society could soon see transformative artificial intelligence (TAI). Models of competition for TAI show firms face strong competitive pressure to deploy TAI systems before they are safe. This paper explores a proposed solution to this problem, a Windfall Clause, where developers commit to donating a significant portion of any eventual extremely large profits to good causes. However, a key challeng… ▽ More Society could soon see transformative artificial intelligence (TAI). Models of competition for TAI show firms face strong competitive pressure to deploy TAI systems before they are safe. This paper explores a proposed solution to this problem, a Windfall Clause, where developers commit to donating a significant portion of any eventual extremely large profits to good causes. However, a key challenge for a Windfall Clause is that firms must have reason to join one. Firms must also believe these commitments are credible. We extend a model of TAI competition with a Windfall Clause to show how firms and policymakers can design a Windfall Clause which overcomes these challenges. Encouragingly, firms benefit from joining a Windfall Clause under a wide range of scenarios. We also find that firms join the Windfall Clause more often when the competition is more dangerous. Even when firms learn each other's capabilities, firms rarely wish to withdraw their support for the Windfall Clause. These three findings strengthen the case for using a Windfall Clause to promote the safe development of TAI. △ Less

Submitted 28 August, 2021; v1 submitted 20 August, 2021; originally announced August 2021.

ACM Class: K.4.1; K.5.2

arXiv:1110.6557 [pdf, other]

Experimental review of graphene

Authors: Daniel R. Cooper, Benjamin D'Anjou, Nageswara Ghattamaneni, Benjamin Harack, Michael Hilke, Alexandre Horth, Norberto Majlis, Mathieu Massicotte, Leron Vandsburger, Eric Whiteway, Victor Yu

Abstract: This review examines the properties of graphene from an experimental perspective. The intent is to review the most important experimental results at a level of detail appropriate for new graduate students who are interested in a general overview of the fascinating properties of graphene. While some introductory theoretical concepts are provided, including a discussion of the electronic band struct… ▽ More This review examines the properties of graphene from an experimental perspective. The intent is to review the most important experimental results at a level of detail appropriate for new graduate students who are interested in a general overview of the fascinating properties of graphene. While some introductory theoretical concepts are provided, including a discussion of the electronic band structure and phonon dispersion, the main emphasis is on describing relevant experiments and important results as well as some of the novel applications of graphene. In particular, this review covers graphene synthesis and characterization, field-effect behavior, electronic transport properties, magneto-transport, integer and fractional quantum Hall effects, mechanical properties, transistors, optoelectronics, graphene-based sensors, and biosensors. This approach attempts to highlight both the means by which the current understanding of graphene has come about and some tools for future contributions. △ Less

Submitted 29 October, 2011; originally announced October 2011.

Comments: Equal contributions from all authors

Showing 1–5 of 5 results for author: Harack, B