-
Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images
Authors:
Nohemi Sofia Leon Contreras,
Marina D'Amato,
Francesco Ciompi,
Clement Grisi,
Witali Aswolinskiy,
Simona Vatrano,
Filippo Fraggetta,
Iris Nagtegaal
Abstract:
Training neural networks with high-quality pixel-level annotation in histopathology whole-slide images (WSI) is an expensive process due to gigapixel resolution of WSIs. However, recent advances in self-supervised learning have shown that highly descriptive image representations can be learned without the need for annotations. We investigate the application of the recent Hierarchical Image Pyramid…
▽ More
Training neural networks with high-quality pixel-level annotation in histopathology whole-slide images (WSI) is an expensive process due to gigapixel resolution of WSIs. However, recent advances in self-supervised learning have shown that highly descriptive image representations can be learned without the need for annotations. We investigate the application of the recent Hierarchical Image Pyramid Transformer (HIPT) model for the specific task of classification of colorectal biopsies and polyps. After evaluating the effectiveness of TCGA-learned features in the original HIPT model, we incorporate colon biopsy image information into HIPT's pretraining using two distinct strategies: (1) fine-tuning HIPT from the existing TCGA weights and (2) pretraining HIPT from random weight initialization. We compare the performance of these pretraining regimes on two colorectal biopsy classification tasks: binary and multiclass classification.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Masked Attention as a Mechanism for Improving Interpretability of Vision Transformers
Authors:
Clément Grisi,
Geert Litjens,
Jeroen van der Laak
Abstract:
Vision Transformers are at the heart of the current surge of interest in foundation models for histopathology. They process images by breaking them into smaller patches following a regular grid, regardless of their content. Yet, not all parts of an image are equally relevant for its understanding. This is particularly true in computational pathology where background is completely non-informative a…
▽ More
Vision Transformers are at the heart of the current surge of interest in foundation models for histopathology. They process images by breaking them into smaller patches following a regular grid, regardless of their content. Yet, not all parts of an image are equally relevant for its understanding. This is particularly true in computational pathology where background is completely non-informative and may introduce artefacts that could mislead predictions. To address this issue, we propose a novel method that explicitly masks background in Vision Transformers' attention mechanism. This ensures tokens corresponding to background patches do not contribute to the final image representation, thereby improving model robustness and interpretability. We validate our approach using prostate cancer grading from whole-slide images as a case study. Our results demonstrate that it achieves comparable performance with plain self-attention while providing more accurate and clinically meaningful attention heatmaps.
△ Less
Submitted 28 April, 2024;
originally announced April 2024.
-
Hierarchical Vision Transformers for Context-Aware Prostate Cancer Grading in Whole Slide Images
Authors:
Clément Grisi,
Geert Litjens,
Jeroen van der Laak
Abstract:
Vision Transformers (ViTs) have ushered in a new era in computer vision, showcasing unparalleled performance in many challenging tasks. However, their practical deployment in computational pathology has largely been constrained by the sheer size of whole slide images (WSIs), which result in lengthy input sequences. Transformers faced a similar limitation when applied to long documents, and Hierarc…
▽ More
Vision Transformers (ViTs) have ushered in a new era in computer vision, showcasing unparalleled performance in many challenging tasks. However, their practical deployment in computational pathology has largely been constrained by the sheer size of whole slide images (WSIs), which result in lengthy input sequences. Transformers faced a similar limitation when applied to long documents, and Hierarchical Transformers were introduced to circumvent it. Given the analogous challenge with WSIs and their inherent hierarchical structure, Hierarchical Vision Transformers (H-ViTs) emerge as a promising solution in computational pathology. This work delves into the capabilities of H-ViTs, evaluating their efficiency for prostate cancer grading in WSIs. Our results show that they achieve competitive performance against existing state-of-the-art solutions.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.