-
Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology
Authors:
Dyke Ferber,
Omar S. M. El Nahhas,
Georg Wölflein,
Isabella C. Wiest,
Jan Clusmann,
Marie-Elisabeth Leßman,
Sebastian Foersch,
Jacqueline Lammert,
Maximilian Tschochohei,
Dirk Jäger,
Manuel Salto-Tellez,
Nikolaus Schultz,
Daniel Truhn,
Jakob Nikolas Kather
Abstract:
Multimodal artificial intelligence (AI) systems have the potential to enhance clinical decision-making by interpreting various types of medical data. However, the effectiveness of these models across all medical fields is uncertain. Each discipline presents unique challenges that need to be addressed for optimal performance. This complexity is further increased when attempting to integrate differe…
▽ More
Multimodal artificial intelligence (AI) systems have the potential to enhance clinical decision-making by interpreting various types of medical data. However, the effectiveness of these models across all medical fields is uncertain. Each discipline presents unique challenges that need to be addressed for optimal performance. This complexity is further increased when attempting to integrate different fields into a single model. Here, we introduce an alternative approach to multimodal medical AI that utilizes the generalist capabilities of a large language model (LLM) as a central reasoning engine. This engine autonomously coordinates and deploys a set of specialized medical AI tools. These tools include text, radiology and histopathology image interpretation, genomic data processing, web searches, and document retrieval from medical guidelines. We validate our system across a series of clinical oncology scenarios that closely resemble typical patient care workflows. We show that the system has a high capability in employing appropriate tools (97%), drawing correct conclusions (93.6%), and providing complete (94%), and helpful (89.2%) recommendations for individual patient cases while consistently referencing relevant literature (82.5%) upon instruction. This work provides evidence that LLMs can effectively plan and execute domain-specific models to retrieve or synthesize new information when used as autonomous agents. This enables them to function as specialist, patient-tailored clinical assistants. It also simplifies regulatory compliance by allowing each component tool to be individually validated and approved. We believe, that our work can serve as a proof-of-concept for more advanced LLM-agents in the medical domain.
△ Less
Submitted 6 April, 2024;
originally announced April 2024.
-
From Whole-slide Image to Biomarker Prediction: A Protocol for End-to-End Deep Learning in Computational Pathology
Authors:
Omar S. M. El Nahhas,
Marko van Treeck,
Georg Wölflein,
Michaela Unger,
Marta Ligero,
Tim Lenz,
Sophia J. Wagner,
Katherine J. Hewitt,
Firas Khader,
Sebastian Foersch,
Daniel Truhn,
Jakob Nikolas Kather
Abstract:
Hematoxylin- and eosin (H&E) stained whole-slide images (WSIs) are the foundation of diagnosis of cancer. In recent years, development of deep learning-based methods in computational pathology enabled the prediction of biomarkers directly from WSIs. However, accurately linking tissue phenotype to biomarkers at scale remains a crucial challenge for democratizing complex biomarkers in precision onco…
▽ More
Hematoxylin- and eosin (H&E) stained whole-slide images (WSIs) are the foundation of diagnosis of cancer. In recent years, development of deep learning-based methods in computational pathology enabled the prediction of biomarkers directly from WSIs. However, accurately linking tissue phenotype to biomarkers at scale remains a crucial challenge for democratizing complex biomarkers in precision oncology. This protocol describes a practical workflow for solid tumor associative modeling in pathology (STAMP), enabling prediction of biomarkers directly from WSIs using deep learning. The STAMP workflow is biomarker agnostic and allows for genetic- and clinicopathologic tabular data to be included as an additional input, together with histopathology images. The protocol consists of five main stages which have been successfully applied to various research problems: formal problem definition, data preprocessing, modeling, evaluation and clinical translation. The STAMP workflow differentiates itself through its focus on serving as a collaborative framework that can be used by clinicians and engineers alike for setting up research projects in the field of computational pathology. As an example task, we applied STAMP to the prediction of microsatellite instability (MSI) status in colorectal cancer, showing accurate performance for the identification of MSI-high tumors. Moreover, we provide an open-source codebase which has been deployed at several hospitals across the globe to set up computational pathology workflows. The STAMP workflow requires one workday of hands-on computational execution and basic command line knowledge.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Medical Foundation Models are Susceptible to Targeted Misinformation Attacks
Authors:
Tianyu Han,
Sven Nebelung,
Firas Khader,
Tianci Wang,
Gustav Mueller-Franzes,
Christiane Kuhl,
Sebastian Försch,
Jens Kleesiek,
Christoph Haarburger,
Keno K. Bressem,
Jakob Nikolas Kather,
Daniel Truhn
Abstract:
Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains, holding promising potential for diverse medical applications in the near future. In this study, we demonstrate a concerning vulnerability of LLMs in medicine. Through targeted manipulation of just 1.1% of the model's weights, we can deliberately inject an incorrect biomedical fac…
▽ More
Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains, holding promising potential for diverse medical applications in the near future. In this study, we demonstrate a concerning vulnerability of LLMs in medicine. Through targeted manipulation of just 1.1% of the model's weights, we can deliberately inject an incorrect biomedical fact. The erroneous information is then propagated in the model's output, whilst its performance on other biomedical tasks remains intact. We validate our findings in a set of 1,038 incorrect biomedical facts. This peculiar susceptibility raises serious security and trustworthiness concerns for the application of LLMs in healthcare settings. It accentuates the need for robust protective measures, thorough verification mechanisms, and stringent management of access to these models, ensuring their reliable and safe use in medical practice.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
Medical Diffusion: Denoising Diffusion Probabilistic Models for 3D Medical Image Generation
Authors:
Firas Khader,
Gustav Mueller-Franzes,
Soroosh Tayebi Arasteh,
Tianyu Han,
Christoph Haarburger,
Maximilian Schulze-Hagen,
Philipp Schad,
Sandy Engelhardt,
Bettina Baessler,
Sebastian Foersch,
Johannes Stegmaier,
Christiane Kuhl,
Sven Nebelung,
Jakob Nikolas Kather,
Daniel Truhn
Abstract:
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models in particular have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen and Stable Diffusion. However, their use in medicine, where image data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play…
▽ More
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models in particular have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen and Stable Diffusion. However, their use in medicine, where image data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play a crucial role in privacy preserving artificial intelligence and can also be used to augment small datasets. Here we show that diffusion probabilistic models can synthesize high quality medical imaging data, which we show for Magnetic Resonance Images (MRI) and Computed Tomography (CT) images. We provide quantitative measurements of their performance through a reader study with two medical experts who rated the quality of the synthesized images in three categories: Realistic image appearance, anatomical correctness and consistency between slices. Furthermore, we demonstrate that synthetic images can be used in a self-supervised pre-training and improve the performance of breast segmentation models when data is scarce (dice score 0.91 vs. 0.95 without vs. with synthetic data). The code is publicly available on GitHub: https://github.com/FirasGit/medicaldiffusion.
△ Less
Submitted 3 January, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.