-
Predicting Generalization of AI Colonoscopy Models to Unseen Data
Authors:
Joel Shor,
Carson McNeil,
Yotam Intrator,
Joseph R Ledsam,
Hiro-o Yamano,
Daisuke Tsurumaru,
Hiroki Kayama,
Atsushi Hamabe,
Koji Ando,
Mitsuhiko Ota,
Haruei Ogino,
Hiroshi Nakase,
Kaho Kobayashi,
Masaaki Miyo,
Eiji Oki,
Ichiro Takemasa,
Ehud Rivlin,
Roman Goldenberg
Abstract:
$\textbf{Background}$: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels.
$\textbf{Methods}$: We use a "Masked Siamese Network" (MSN) to identify novel phenomena in unseen data and predict polyp detector performance. MSN is trained without any labels to predict masked-out regions of polyp images. We test whether MSN, trained only on data from Israel, can detect unseen imaging techniques, narrow-band imaging (NBI) and chromoendoscopy (CE), in colonoscopies from Japan (354 videos, 128 hours). We also test MSN's ability to predict the performance of Computer Aided Detection (CADe) of polyps on colonoscopies from both countries, even though MSN is not trained on data from Japan.
$\textbf{Results}$: Using the label-free Fréchet distance, MSN correctly identifies NBI and CE as less similar to Israel whitelight than Japan whitelight (bootstrapped z-test, |z| > 496, $p < 10^{-8}$ for both). MSN detects NBI with 99% accuracy, predicts CE better than our heuristic (90% vs 79% accuracy) despite being trained only on whitelight, and is the only method that is robust to noisy labels. MSN predicts CADe polyp detector performance on in-domain Israel and out-of-domain Japan colonoscopies (r=0.79 and r=0.37, respectively). When trained on a few examples of Japan detector performance, MSN's prediction of Japan performance improves (r=0.56).
$\textbf{Conclusion}$: Our technique can identify distribution shifts in clinical data and can predict CADe detector performance on unseen data, without labels. Our self-supervised approach can help detect when data in practice differs from the training data, for example between hospitals or when data has meaningfully shifted since training. MSN has potential for application to medical image domains beyond colonoscopy.
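The label-free Fréchet distance mentioned above compares the distributions of MSN embeddings from two data sources. Below is a minimal sketch. It assumes, which the abstract does not state, that each source's embeddings are summarized by a Gaussian with diagonal covariance. This simplifies the full-covariance, FID-style formula to $d^2 = \lVert\mu_a - \mu_b\rVert^2 + \sum_d (\sigma_{a,d} - \sigma_{b,d})^2$. The helper `diag_frechet_distance` is hypothetical and is not the authors' code.

```python
import math
from statistics import fmean, pvariance


def diag_frechet_distance(xs, ys):
    """Fréchet distance between diagonal-Gaussian fits of two embedding sets.

    xs, ys: lists of equal-length embedding vectors (lists of floats).
    Assumption: dimensions are treated as independent (diagonal covariance),
    a simplification of the full-covariance Fréchet distance.
    """
    dim = len(xs[0])
    total = 0.0
    for d in range(dim):
        a = [x[d] for x in xs]
        b = [y[d] for y in ys]
        mu_a, mu_b = fmean(a), fmean(b)
        var_a, var_b = pvariance(a, mu_a), pvariance(b, mu_b)
        # Squared mean difference plus the per-dimension trace term
        # var_a + var_b - 2*sqrt(var_a*var_b), i.e. (sd_a - sd_b)^2.
        total += (mu_a - mu_b) ** 2 + var_a + var_b - 2.0 * math.sqrt(var_a * var_b)
    return total


# Usage: two 1-D sets with equal spread whose means differ by 2.
print(diag_frechet_distance([[0.0], [2.0]], [[2.0], [4.0]]))  # 4.0
```

Under this sketch, a larger distance between Israel whitelight and Japan NBI embeddings than between Israel whitelight and Japan whitelight embeddings would match the ordering reported in the Results.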
Submitted 22 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
AVIDa-hIL6: A Large-Scale VHH Dataset Produced from an Immunized Alpaca for Predicting Antigen-Antibody Interactions
Authors:
Hirofumi Tsuruta,
Hiroyuki Yamazaki,
Ryota Maeda,
Ryotaro Tamura,
Jennifer N. Wei,
Zelda Mariet,
Poomarin Phloyphisut,
Hidetoshi Shimokawa,
Joseph R. Ledsam,
Lucy Colwell,
Akihiro Imura
Abstract:
Antibodies have become an important class of therapeutic agents to treat human diseases. To accelerate therapeutic antibody discovery, computational methods, especially machine learning, have attracted considerable interest for predicting specific interactions between antibody candidates and target antigens such as viruses and bacteria. However, the publicly available datasets in existing works have notable limitations, such as small sizes and the lack of non-binding samples and exact amino acid sequences. To overcome these limitations, we have developed AVIDa-hIL6, a large-scale dataset for predicting antigen-antibody interactions in the variable domain of heavy chain of heavy-chain antibodies (VHHs), produced from an alpaca immunized with the human interleukin-6 (IL-6) protein as the antigen. By leveraging the simple structure of VHHs, which facilitates identification of full-length amino acid sequences by DNA sequencing technology, AVIDa-hIL6 contains 573,891 antigen-VHH pairs with amino acid sequences. All the antigen-VHH pairs have reliable labels for binding or non-binding, as generated by a novel labeling method. Furthermore, via introduction of artificial mutations, AVIDa-hIL6 contains 30 different mutants in addition to wild-type IL-6 protein. This characteristic provides opportunities to develop machine learning models for predicting changes in antibody binding caused by antigen mutations. We report experimental benchmark results on AVIDa-hIL6 using machine learning models. The results indicate that existing models have potential, but further research is needed to generalize them to predict effective antibodies against unknown mutants. The dataset is available at https://avida-hil6.cognanous.com.
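Evaluating generalization to unknown mutants requires that no test antigen appears in training. Below is a minimal sketch of such an antigen-held-out split. The `Pair` record and its field names are hypothetical and do not reflect the published file schema. The split strategy illustrates the idea and is not the benchmark protocol used in the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    antigen: str       # hypothetical field: wild-type or mutant IL-6 identifier
    vhh_sequence: str  # VHH amino acid sequence
    binds: bool        # binding / non-binding label


def split_by_antigen(pairs, held_out_fraction=0.2, seed=0):
    """Hold out whole antigens so test antigens are never seen in training."""
    antigens = sorted({p.antigen for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(antigens)
    n_test = max(1, round(len(antigens) * held_out_fraction))
    test_antigens = set(antigens[:n_test])
    train = [p for p in pairs if p.antigen not in test_antigens]
    test = [p for p in pairs if p.antigen in test_antigens]
    return train, test


# Usage with toy records (sequences and labels are placeholders).
toy = [Pair(a, "QVQLQESGG", i % 2 == 0)
       for i, a in enumerate(["WT", "M1", "M2", "M3", "M4"] * 2)]
train, test = split_by_antigen(toy)
```

Splitting by antigen rather than by pair prevents a model from scoring well simply by memorizing antigen identity.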
Submitted 10 October, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Contrastive Training for Improved Out-of-Distribution Detection
Authors:
Jim Winkens,
Rudy Bunel,
Abhijit Guha Roy,
Robert Stanforth,
Vivek Natarajan,
Joseph R. Ledsam,
Patricia MacWilliams,
Pushmeet Kohli,
Alan Karthikesalingam,
Simon Kohl,
Taylan Cemgil,
S. M. Ali Eslami,
Olaf Ronneberger
Abstract:
Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection performance. Unlike leading methods for OOD detection, our approach does not require access to examples labeled explicitly as OOD, which can be difficult to collect in practice. We show in extensive experiments that contrastive training significantly improves OOD detection performance on a number of common benchmarks. By introducing and employing the Confusion Log Probability (CLP) score, which quantifies the difficulty of the OOD detection task by capturing the similarity of inlier and outlier datasets, we show that our method especially improves performance in the 'near OOD' classes, a particularly challenging setting for previous methods.
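The abstract does not specify the contrastive objective. As an assumption, the sketch below uses the widely known SimCLR-style NT-Xent loss. In this loss, two augmented views of the same input are pulled together and all other inputs in the batch act as negatives. `nt_xent` is a hypothetical helper. The step that turns the learned embeddings into an OOD score is not shown.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def nt_xent(z_a, z_b, temperature=0.5):
    """SimCLR-style NT-Xent loss for a batch of paired views.

    z_a[i] and z_b[i] embed two augmentations of the same input; every other
    embedding in the combined 2N batch serves as a negative for that pair.
    """
    z = [_normalize(v) for v in z_a + z_b]
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner view
        sims = [sum(p * q for p, q in zip(z[i], z[k])) / temperature
                for k in range(2 * n)]
        # Softmax denominator over all embeddings except the anchor itself.
        denom = sum(math.exp(s) for k, s in enumerate(sims) if k != i)
        total += -(sims[j] - math.log(denom))
    return total / (2 * n)


aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

When each positive pair is aligned, the loss is lower than when positives are mismatched. Lowering this loss is the signal that shapes the representation.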
Submitted 10 July, 2020;
originally announced July 2020.
-
Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy
Authors:
Stanislav Nikolov,
Sam Blackwell,
Alexei Zverovitch,
Ruheena Mendes,
Michelle Livne,
Jeffrey De Fauw,
Yojan Patel,
Clemens Meyer,
Harry Askham,
Bernardino Romera-Paredes,
Christopher Kelly,
Alan Karthikesalingam,
Carlton Chu,
Dawn Carnell,
Cheng Boon,
Derek D'Souza,
Syed Ali Moinuddin,
Bethany Garie,
Yasmin McQuinlan,
Sarah Ireland,
Kiarna Hampton,
Krystle Fuller,
Hugh Montgomery,
Geraint Rees,
Mustafa Suleyman
, et al. (4 additional authors not shown)
Abstract:
Over half a million individuals are diagnosed with head and neck cancer each year worldwide. Radiotherapy is an important curative treatment for this disease, but it requires time-consuming manual delineation of radiosensitive organs at risk (OARs). This planning process can delay treatment, while also introducing inter-operator variability with resulting downstream differences in radiation dose. While auto-segmentation algorithms offer a potentially time-saving solution, challenges remain in defining, quantifying and achieving expert performance. Adopting a deep learning approach, we demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck OARs commonly segmented in clinical practice. The model was trained on a dataset of 663 deidentified computed tomography (CT) scans acquired in routine clinical practice, with both segmentations taken from clinical practice and segmentations created by experienced radiographers as part of this research, all in accordance with consensus OAR definitions. We demonstrate the model's clinical applicability by assessing its performance on a test set of 21 CT scans from clinical practice, each with the 21 OARs segmented by two independent experts. We also introduce the surface Dice similarity coefficient (surface DSC), a new metric for comparing organ delineations that quantifies deviation between OAR surface contours rather than volumes, better reflecting the clinical task of correcting errors in automated organ segmentations. The model's generalisability is then demonstrated on two distinct open-source datasets, from centres and countries different from those used for model training. With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways.
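Surface DSC measures the fraction of two contours' surfaces that lie within a tolerance $\tau$ of each other: $(|S_a \cap B_b^{(\tau)}| + |S_b \cap B_a^{(\tau)}|) / (|S_a| + |S_b|)$, where $B^{(\tau)}$ is the border region of width $\tau$ around a surface. The sketch below is a simplified, point-sampled approximation. It assumes each surface is given as equally weighted sample points, whereas a full implementation weights surface elements by area. `surface_dice` is a hypothetical helper.

```python
import math


def surface_dice(surface_a, surface_b, tolerance):
    """Point-sampled approximation of the surface Dice similarity coefficient.

    surface_a, surface_b: lists of (x, y, z) points sampled on each surface.
    A point counts as agreeing if it lies within `tolerance` of the other surface.
    """
    def count_within(points, other):
        # Brute-force nearest-neighbour distance; fine for small illustrative inputs.
        return sum(1 for p in points
                   if min(math.dist(p, q) for q in other) <= tolerance)

    agree = count_within(surface_a, surface_b) + count_within(surface_b, surface_a)
    return agree / (len(surface_a) + len(surface_b))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # a shifted by 1 unit in z
print(surface_dice(a, b, tolerance=0.5))  # 0.0
print(surface_dice(a, b, tolerance=1.0))  # 1.0
```

The example shows the clinical intuition behind the metric. A uniform 1-unit offset is fully acceptable at a 1-unit tolerance, although a volumetric overlap score would still penalize it.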
Submitted 13 January, 2021; v1 submitted 12 September, 2018;
originally announced September 2018.
-
A Probabilistic U-Net for Segmentation of Ambiguous Images
Authors:
Simon A. A. Kohl,
Bernardino Romera-Paredes,
Clemens Meyer,
Jeffrey De Fauw,
Joseph R. Ledsam,
Klaus H. Maier-Hein,
S. M. Ali Eslami,
Danilo Jimenez Rezende,
Olaf Ronneberger
Abstract:
Many real-world vision problems suffer from inherent ambiguities. In clinical applications, for example, it might not be clear from a CT scan alone which particular region is cancer tissue. Therefore, a group of graders typically produces a set of diverse but plausible segmentations. We consider the task of learning a distribution over segmentations given an input. To this end, we propose a generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses. We show on a lung abnormalities segmentation task and on a Cityscapes segmentation task that our model reproduces the possible segmentation variants as well as the frequencies with which they occur, significantly better than published approaches. These models could have a high impact in real-world applications, such as clinical decision-making algorithms that account for multiple plausible semantic segmentation hypotheses to provide possible diagnoses and recommend further actions to resolve the present ambiguities.
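Two conditional-VAE ingredients make "an unlimited number of plausible hypotheses" possible. The first is a reparameterised draw of a latent code $z = \mu + \sigma\epsilon$ from a diagonal Gaussian. The second is a KL term, $\mathrm{KL}(q \,\|\, p)$, that keeps the posterior over latents close to the prior during training. The sketch below shows only these two pieces. The U-Net, the prior and posterior networks, and how $z$ is combined with image features are omitted, and the helper names are hypothetical.

```python
import math
import random


def sample_latent(mu, log_sigma, rng):
    """Reparameterised sample z = mu + sigma * eps from a diagonal Gaussian."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def kl_diag_gaussians(mu_q, log_sigma_q, mu_p, log_sigma_p):
    """KL(q || p) between diagonal Gaussians, e.g. posterior q versus prior p."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_sigma_q, mu_p, log_sigma_p):
        var_q, var_p = math.exp(2 * lq), math.exp(2 * lp)
        # Per-dimension: log(sigma_p / sigma_q) + (var_q + (mu_q - mu_p)^2) / (2 var_p) - 1/2
        kl += lp - lq + (var_q + (mq - mp) ** 2) / (2 * var_p) - 0.5
    return kl


# At inference, each prior sample would condition one segmentation hypothesis.
rng = random.Random(0)
hypothesis_latents = [sample_latent([0.0, 0.0], [0.0, 0.0], rng) for _ in range(3)]
print(kl_diag_gaussians([1.0], [0.0], [0.0], [0.0]))  # 0.5
```

Drawing several latents for the same image and decoding each one is what yields diverse segmentations rather than a single averaged mask.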
Submitted 29 January, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.