Skip to main content

Showing 1–15 of 15 results for author: Pérez-García, F

.
  1. arXiv:2406.04449  [pdf, other

    cs.CL cs.CV

    MAIRA-2: Grounded Radiology Report Generation

    Authors: Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Anton Schwaighofer, Sam Bond-Taylor, Maximilian Ilse, Fernando Pérez-García, Valentina Salvatelli, Harshita Sharma, Felix Meissen, Mercy Ranjit, Shaury Srivastav, Julia Gong, Fabian Falck, Ozan Oktay, Anja Thieme, Matthew P. Lungren, Maria Teodora Wetscherek, Javier Alvarez-Valle, Stephanie L. Hyland

    Abstract: Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. Here, we extend report generation to include the localisation of individual findings on the image - a task we call grounded report… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 44 pages, 20 figures

  2. arXiv:2405.05299  [pdf, other

    cs.HC cs.AI

    Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology

    Authors: Anja Thieme, Abhijith Rajamohan, Benjamin Cooper, Heather Groombridge, Robert Simister, Barney Wong, Nicholas Woznitza, Mark Ames Pinnock, Maria Teodora Wetscherek, Cecily Morrison, Hannah Richardson, Fernando Pérez-García, Stephanie L. Hyland, Shruthi Bannur, Daniel C. Castro, Kenza Bouzid, Anton Schwaighofer, Mercy Ranjit, Harshita Sharma, Matthew P. Lungren, Ozan Oktay, Javier Alvarez-Valle, Aditya Nori, Stephen Harris, Joseph Jacob

    Abstract: Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication. If not placed correctly, they can cause serious harm, even death to patients. Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from Chest X-ray images to reduce risks of sub-optimally or critically placed NGTs being missed or delay… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    ACM Class: H.5.m; I.2.m

  3. Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology

    Authors: Nur Yildirim, Hannah Richardson, Maria T. Wetscherek, Junaid Bajwa, Joseph Jacob, Mark A. Pinnock, Stephen Harris, Daniel Coelho de Castro, Shruthi Bannur, Stephanie L. Hyland, Pratik Ghosh, Mercy Ranjit, Kenza Bouzid, Anton Schwaighofer, Fernando Pérez-García, Harshita Sharma, Ozan Oktay, Matthew Lungren, Javier Alvarez-Valle, Aditya Nori, Anja Thieme

    Abstract: Recent advances in AI combine large language models (LLMs) with vision encoders that bring forward unprecedented technical capabilities to leverage for a wide range of healthcare applications. Focusing on the domain of radiology, vision-language models (VLMs) achieve good performance results for tasks such as generating radiology findings based on a patient's medical image, or answering visual que… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: to appear at CHI 2024

  4. arXiv:2401.10815  [pdf, other

    cs.CV

    RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision

    Authors: Fernando Pérez-García, Harshita Sharma, Sam Bond-Taylor, Kenza Bouzid, Valentina Salvatelli, Maximilian Ilse, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Matthew P. Lungren, Maria Wetscherek, Noel Codella, Stephanie L. Hyland, Javier Alvarez-Valle, Ozan Oktay

    Abstract: Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images, serving as a foundational element in multimodal systems within the computer vision and medical imaging domains. However, resulting features are limited by the information contained within the text. This is particularly problematic in medical imaging, where radiologists'… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

  5. arXiv:2312.12865  [pdf, other

    cs.CV cs.AI

    RadEdit: stress-testing biomedical vision models via diffusion image editing

    Authors: Fernando Pérez-García, Sam Bond-Taylor, Pedro P. Sanchez, Boris van Breugel, Daniel C. Castro, Harshita Sharma, Valentina Salvatelli, Maria T. A. Wetscherek, Hannah Richardson, Matthew P. Lungren, Aditya Nori, Javier Alvarez-Valle, Ozan Oktay, Maximilian Ilse

    Abstract: Biomedical imaging datasets are often small and biased, meaning that real-world performance of predictive models can be substantially lower than expected from internal testing. This work proposes using generative image editing to simulate dataset shifts and diagnose failure modes of biomedical vision models; this can be used in advance of deployment to assess readiness, potentially reducing cost a… ▽ More

    Submitted 3 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  6. arXiv:2311.13668  [pdf, other

    cs.CL cs.AI cs.CV

    MAIRA-1: A specialised large multimodal model for radiology report generation

    Authors: Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando Pérez-García, Valentina Salvatelli, Shaury Srivastav, Anja Thieme, Noel Codella, Matthew P. Lungren, Maria Teodora Wetscherek, Ozan Oktay, Javier Alvarez-Valle

    Abstract: We present a radiology-specific multimodal model for the task for generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language model(s) can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities… ▽ More

    Submitted 26 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 18 pages, 9 tables, 5 figures. v2 adds test IDs and image encoder citation. v3 fixes error in NPV/specificity

  7. arXiv:2310.14573  [pdf, other

    cs.CL

    Exploring the Boundaries of GPT-4 in Radiology

    Authors: Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn, Harshita Sharma, Fernando Pérez-García, Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, Aditya V. Nori, Matthew P. Lungren, Ozan Oktay, Javier Alvarez-Valle

    Abstract: The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM so far, on the text-based applications for radiology reports, comparing against state-of-the-art (SOTA) radiology-s… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 main

  8. arXiv:2305.05598  [pdf, other

    cs.CV

    Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query

    Authors: Ho Hin Lee, Alberto Santamaria-Pang, Jameson Merkow, Ozan Oktay, Fernando Pérez-García, Javier Alvarez-Valle, Ivan Tarapov

    Abstract: We introduce a novel Region-based contrastive pretraining for Medical Image Retrieval (RegionMIR) that demonstrates the feasibility of medical image retrieval with similar anatomical regions. RegionMIR addresses two major challenges for medical image retrieval i) standardization of clinically relevant searching criteria (e.g., anatomical, pathology-based), and ii) localization of anatomical area o… ▽ More

    Submitted 9 May, 2023; originally announced May 2023.

  9. arXiv:2303.13386  [pdf, other

    cs.CL cs.LG

    Compositional Zero-Shot Domain Transfer with Text-to-Text Models

    Authors: Fangyu Liu, Qianchu Liu, Shruthi Bannur, Fernando Pérez-García, Naoto Usuyama, Sheng Zhang, Tristan Naumann, Aditya Nori, Hoifung Poon, Javier Alvarez-Valle, Ozan Oktay, Stephanie L. Hyland

    Abstract: Label scarcity is a bottleneck for improving task performance in specialised domains. We propose a novel compositional transfer learning framework (DoT5 - domain compositional zero-shot T5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from MLM of unlabelled in-domain free text) and task knowledge (from task training on more readily availa… ▽ More

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted at TACL, pre-MIT Press publication version. 16 pages, 4 figures

  10. arXiv:2301.04558  [pdf, other

    cs.CV cs.CL

    Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

    Authors: Shruthi Bannur, Stephanie Hyland, Qianchu Liu, Fernando Pérez-García, Maximilian Ilse, Daniel C. Castro, Benedikt Boecking, Harshita Sharma, Kenza Bouzid, Anja Thieme, Anton Schwaighofer, Maria Wetscherek, Matthew P. Lungren, Aditya Nori, Javier Alvarez-Valle, Ozan Oktay

    Abstract: Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities. Prior work in biomedical VLP has mostly relied on the alignment of single image and report pairs even though clinical notes commonly refer to prior images. This does not only introduce poor alignment between the modalities but also a missed opportunity to exploit rich self-superv… ▽ More

    Submitted 16 March, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: To appear in CVPR 2023

  11. arXiv:2203.12362  [pdf, other

    cs.HC cs.CV cs.LG eess.IV

    MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images

    Authors: Andres Diaz-Pinto, Sachidanand Alle, Vishwesh Nath, Yucheng Tang, Alvin Ihsani, Muhammad Asad, Fernando Pérez-García, Pritesh Mehta, Wenqi Li, Mona Flores, Holger R. Roth, Tom Vercauteren, Daguang Xu, Prerna Dogra, Sebastien Ourselin, Andrew Feng, M. Jorge Cardoso

    Abstract: The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models, considering that manual annotation is extremely expensive and time-consuming. To address this problem, we present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models that aim at reducing the t… ▽ More

    Submitted 28 April, 2023; v1 submitted 23 March, 2022; originally announced March 2022.

  12. Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures

    Authors: Fernando Pérez-García, Catherine Scott, Rachel Sparks, Beate Diehl, Sébastien Ourselin

    Abstract: Detailed analysis of seizure semiology, the symptoms and signs which occur during a seizure, is critical for management of epilepsy patients. Inter-rater reliability using qualitative visual analysis is often poor for semiological features. Therefore, automatic and quantitative analysis of video-recorded seizures is needed for objective assessment. We present GESTURES, a novel architecture combi… ▽ More

    Submitted 5 August, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: Accepted at the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021)

    Journal ref: Medical Image Computing and Computer Assisted Intervention - MICCAI 2021. Lecture Notes in Computer Science. Springer, Cham

  13. A self-supervised learning strategy for postoperative brain cavity segmentation simulating resections

    Authors: Fernando Pérez-García, Reuben Dorent, Michele Rizzi, Francesco Cardinale, Valerio Frazzini, Vincent Navarro, Caroline Essert, Irène Ollivier, Tom Vercauteren, Rachel Sparks, John S. Duncan, Sébastien Ourselin

    Abstract: Accurate segmentation of brain resection cavities (RCs) aids in postoperative analysis and determining follow-up treatment. Convolutional neural networks (CNNs) are the state-of-the-art image segmentation technique, but require large annotated datasets for training. Annotation of 3D medical images is time-consuming, requires highly-trained raters, and may suffer from high inter-rater variability.… ▽ More

    Submitted 24 May, 2021; originally announced May 2021.

    Comments: To be published in the International Journal of Computer Assisted Radiology and Surgery (IJCARS) - Special issue MICCAI 2020

  14. Simulation of Brain Resection for Cavity Segmentation Using Self-Supervised and Semi-Supervised Learning

    Authors: Fernando Pérez-García, Roman Rodionov, Ali Alim-Marvasti, Rachel Sparks, John S. Duncan, Sébastien Ourselin

    Abstract: Resective surgery may be curative for drug-resistant focal epilepsy, but only 40% to 70% of patients achieve seizure freedom after surgery. Retrospective quantitative analysis could elucidate patterns in resected structures and patient outcomes to improve resective surgery. However, the resection cavity must first be segmented on the postoperative MR image. Convolutional neural networks (CNNs) are… ▽ More

    Submitted 28 June, 2020; originally announced June 2020.

    Comments: 13 pages, 6 figures, accepted at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2020

    Journal ref: Medical Image Computing and Computer Assisted Intervention - MICCAI 2020. Lecture Notes in Computer Science, vol 12263. Springer, Cham

  15. arXiv:2003.04696  [pdf, other

    eess.IV cs.AI cs.CV cs.LG stat.ML

    TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning

    Authors: Fernando Pérez-García, Rachel Sparks, Sébastien Ourselin

    Abstract: Processing of medical images such as MRI or CT presents unique challenges compared to RGB images typically used in computer vision. These include a lack of labels for large datasets, high computational costs, and metadata to describe the physical properties of voxels. Data augmentation is used to artificially increase the size of the training datasets. Training with image patches decreases the nee… ▽ More

    Submitted 5 August, 2021; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: Published in Computer Methods and Programs in Biomedicine

    Journal ref: Computer Methods and Programs in Biomedicine (June 2021), p. 106236. ISSN: 0169-2607