-
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
Authors:
William James Bolton,
Rafael Poyiadzi,
Edward R. Morrell,
Gabriela van Bergen Gonzalez Bueno,
Lea Goetz
Abstract:
Large Language Models (LLMs) increasingly support applications in a wide range of domains, some with potential high societal impact such as biomedicine, yet their reliability in realistic use cases is under-researched. In this work we introduce the Reliability AssesMent for Biomedical LLM Assistants (RAmBLA) framework and evaluate whether four state-of-the-art foundation LLMs can serve as reliable assistants in the biomedical domain. We identify prompt robustness, high recall, and a lack of hallucinations as necessary criteria for this use case. We design shortform tasks and tasks requiring LLM freeform responses mimicking real-world user interactions. We evaluate LLM performance using semantic similarity with a ground truth response, through an evaluator LLM.
Submitted 21 March, 2024;
originally announced March 2024.
-
Segmentation of diagnostic tissue compartments on whole slide images with renal thrombotic microangiopathies (TMAs)
Authors:
Huy Q. Vo,
Pietro A. Cicalese,
Surya Seshan,
Syed A. Rizvi,
Aneesh Vathul,
Gloria Bueno,
Anibal Pedraza Dorado,
Niels Grabe,
Katharina Stolle,
Francesco Pesce,
Joris J. T. H. Roelofs,
Jesper Kers,
Vitoantonio Bevilacqua,
Nicola Altini,
Bernd Schröppel,
Dario Roccatello,
Antonella Barreca,
Savino Sciascia,
Chandra Mohan,
Hien V. Nguyen,
Jan U. Becker
Abstract:
The thrombotic microangiopathies (TMAs) manifest in renal biopsy histology with a broad spectrum of acute and chronic findings. Precise diagnostic criteria for a renal biopsy diagnosis of TMA are missing. As a first step towards a machine learning- and computer vision-based analysis of whole slide images from renal biopsies, we trained a segmentation model for the decisive diagnostic kidney tissue compartments (artery, arteriole, glomerulus) on a set of whole slide images from renal biopsies with TMAs and Mimickers (distinct diseases with a nephropathological appearance similar to TMA, such as severe benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy, and arteriolar light chain deposition disease). Our segmentation model combines a U-Net-based tissue detection with a Shifted windows-transformer architecture to reach excellent segmentation results for even the most severely altered glomeruli, arterioles and arteries, even on unseen staining domains from a different nephropathology lab. With accurate automatic segmentation of the decisive renal biopsy compartments in human renal vasculopathies, we have laid the foundation for large-scale compartment-specific machine learning and computer vision analysis of renal biopsy repositories with TMAs.
Submitted 28 November, 2023; v1 submitted 25 November, 2023;
originally announced November 2023.
-
Handgun detection using combined human pose and weapon appearance
Authors:
Jesus Ruiz-Santaquiteria,
Alberto Velasco-Mata,
Noelia Vallez,
Gloria Bueno,
Juan A. Álvarez-García,
Oscar Deniz
Abstract:
Closed-circuit television (CCTV) systems are essential nowadays to prevent security threats or dangerous situations, in which early detection is crucial. Novel deep learning-based methods have made it possible to develop automatic weapon detectors with promising results. However, these approaches are mainly based on visual weapon appearance only. For handguns, body pose may be a useful cue, especially in cases where the gun is barely visible. In this work, a novel method is proposed to combine, in a single architecture, both weapon appearance and human pose information. First, pose keypoints are estimated to extract hand regions and generate binary pose images, which are the model inputs. Then, each input is processed in a different subnetwork and the results are combined to produce the handgun bounding box. Results obtained show that the combined model improves the handgun detection state of the art, achieving between 4.23 and 18.9 AP points more than the best previous approach.
Submitted 23 July, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues
Authors:
Talha Qaiser,
Abhik Mukherjee,
Chaitanya Reddy Pb,
Sai Dileep Munugoti,
Vamsi Tallam,
Tomi Pitkäaho,
Taina Lehtimäki,
Thomas Naughton,
Matt Berseth,
Aníbal Pedraza,
Ramakrishnan Mukundan,
Matthew Smith,
Abhir Bhalerao,
Erik Rodner,
Marcel Simon,
Joachim Denzler,
Chao-Hui Huang,
Gloria Bueno,
David Snead,
Ian Ellis,
Mohammad Ilyas,
Nasir Rajpoot
Abstract:
Evaluating expression of the Human epidermal growth factor receptor 2 (Her2) by visual examination of immunohistochemistry (IHC) on invasive breast cancer (BCa) is a key part of the diagnostic assessment of BCa due to its recognised importance as a predictive and prognostic marker in clinical practice. However, visual scoring of Her2 is subjective and consequently prone to inter-observer variability. Given the prognostic and therapeutic implications of Her2 scoring, a more objective method is required. In this paper, we report on a recent automated Her2 scoring contest, held in conjunction with the annual PathSoc meeting in Nottingham in June 2016, aimed at systematically comparing and advancing the state-of-the-art Artificial Intelligence (AI) based automated methods for Her2 scoring. The contest dataset comprised digitised whole slide images (WSI) of sections from 86 cases of invasive breast carcinoma stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting algorithms automatically predicted scores of the IHC slides for an unseen subset of the dataset, and the predicted scores were compared with the 'ground truth' (a consensus score from at least two experts). We also report on a simple Man vs Machine contest for the scoring of Her2 and show that the automated methods could beat the pathology experts on this contest dataset. This paper presents a benchmark for comparing the performance of automated algorithms for scoring of Her2. It also demonstrates the enormous potential of automated algorithms in assisting the pathologist with objective IHC scoring.
Submitted 24 July, 2017; v1 submitted 23 May, 2017;
originally announced May 2017.