SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology

Authors: Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R. Gupta, Prateek Prasanna

Abstract: Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the very outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. In addition, we thoroughly benchmark the local and global interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely, user-friendliness and faithfulness.

Submitted 18 May, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.07330 [pdf, other]

Learned representation-guided diffusion models for large-image generation

Authors: Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, Dimitris Samaras

Abstract: To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, it is impractical to procure the painstaking patch-level annotation effort required in specialized domains like histopathology and satellite imagery; it is often performed by domain experts and involves hundreds of millions of patches. Modern-day self-supervised learning (SSL) representations encode rich semantic and visual information. In this paper, we posit that such representations are expressive enough to act as proxies to fine-grained human labels. We introduce a novel approach that trains diffusion models conditioned on embeddings from SSL. Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. In addition, we construct larger images by assembling spatially consistent patches inferred from SSL embeddings, preserving long-range dependencies. Augmenting real data by generating variations of real images improves downstream classifier accuracy for patch-level and larger, image-scale classification tasks. Our models are effective even on datasets not encountered during training, demonstrating their robustness and generalizability. Generating images from learned embeddings is agnostic to the source of the embeddings. The SSL embeddings used to generate a large image can either be extracted from a reference image, or sampled from an auxiliary model conditioned on any related modality (e.g. class labels, text, genomic data). As proof of concept, we introduce the text-to-large image synthesis paradigm where we successfully synthesize large pathology and satellite images out of text descriptions.

Submitted 28 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2309.06439 [pdf, other]

Attention De-sparsification Matters: Inducing Diversity in Digital Pathology Representation Learning

Authors: Saarthak Kapse, Srijan Das, Jingwei Zhang, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras, Prateek Prasanna

Abstract: We propose DiRL, a Diversity-inducing Representation Learning technique for histopathology imaging. Self-supervised learning techniques, such as contrastive and non-contrastive approaches, have been shown to learn rich and effective representations of digitized tissue samples with limited pathologist supervision. Our analysis of vanilla SSL-pretrained models' attention distribution reveals an insightful observation: sparsity in attention, i.e., models tend to localize most of their attention to some prominent patterns in the image. Although attention sparsity can be beneficial in natural images due to these prominent patterns being the object of interest itself, this can be sub-optimal in digital pathology; this is because, unlike natural images, digital pathology scans are not object-centric, but rather a complex phenotype of various spatially intermixed biological components. Inadequate diversification of attention in these complex images could result in crucial information loss. To address this, we leverage cell segmentation to densely extract multiple histopathology-specific representations, and then propose a prior-guided dense pretext task for SSL, designed to match the multiple corresponding representations between the views. Through this, the model learns to attend to various components more closely and evenly, thus inducing adequate diversification in attention for capturing context-rich representations. Through quantitative and qualitative analysis on multiple tasks across cancer types, we demonstrate the efficacy of our method and observe that the attention is more globally distributed.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2307.09570 [pdf, other]

SAM-Path: A Segment Anything Model for Semantic Segmentation in Digital Pathology

Authors: Jingwei Zhang, Ke Ma, Saarthak Kapse, Joel Saltz, Maria Vakalopoulou, Prateek Prasanna, Dimitris Samaras

Abstract: Semantic segmentations of pathological entities have crucial clinical value in computational pathology workflows. Foundation models, such as the Segment Anything Model (SAM), have been recently proposed for universal use in segmentation tasks. SAM shows remarkable promise in instance segmentation on natural images. However, the applicability of SAM to computational pathology tasks is limited due to the following factors: (1) lack of comprehensive pathology datasets used in SAM training and (2) the design of SAM is not inherently optimized for semantic segmentation tasks. In this work, we adapt SAM for semantic segmentation by introducing trainable class prompts, followed by further enhancements through the incorporation of a pathology encoder, specifically a pathology foundation model. Our framework, SAM-Path, enhances SAM's ability to conduct semantic segmentation in digital pathology without human input prompts. Through experiments on two public pathology datasets, the BCSS and the CRAG datasets, we demonstrate that fine-tuning with trainable class prompts outperforms vanilla SAM with manual prompts and post-processing by 27.52% in Dice score and 71.63% in IOU. On these two datasets, the proposed additional pathology foundation model further achieves a relative improvement of 5.07% to 5.12% in Dice score and 4.50% to 8.48% in IOU.

Submitted 12 July, 2023; originally announced July 2023.

Comments: Submitted to MedAGI 2023
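
The SAM-Path abstract above reports results in Dice score and IOU. As a reminder of how these two overlap metrics are usually defined for a pair of binary masks, here is a minimal sketch in plain Python. The function name `dice_and_iou`, the nested-list mask format, and the toy masks are assumptions made for illustration; this is not the authors' evaluation code, and the paper may aggregate the metrics differently (for example per class or per image).

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1.

    Illustrative helper only; name and input format are assumptions.
    """
    # Flatten both masks to lists of booleans.
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must contain the same number of pixels")

    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    pred_area = sum(pred_flat)
    target_area = sum(target_flat)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 2x3 masks (hypothetical data).
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two metrics are tied by Dice = 2·IoU / (1 + IoU), so they rank predictions identically but scale differently. That is one reason a relative improvement stated in Dice and the same improvement stated in IOU need not match.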

arXiv:2304.01053 [pdf, other]

ViT-DAE: Transformer-driven Diffusion Autoencoder for Histopathology Image Analysis

Authors: Xuan Xu, Saarthak Kapse, Rajarsi Gupta, Prateek Prasanna

Abstract: Generative AI has received substantial attention in recent years due to its ability to synthesize data that closely resembles the original data source. While Generative Adversarial Networks (GANs) have provided innovative approaches for histopathological image analysis, they suffer from limitations such as mode collapse and overfitting in the discriminator. Recently, Denoising Diffusion models have demonstrated promising results in computer vision. These models exhibit superior stability during training, better distribution coverage, and produce high-quality diverse images. Additionally, they display a high degree of resilience to noise and perturbations, making them well-suited for use in digital pathology, where images commonly contain artifacts and exhibit significant variations in staining. In this paper, we present a novel approach, namely ViT-DAE, which integrates vision transformers (ViT) and diffusion autoencoders for high-quality histopathology image synthesis. This marks the first time that ViT has been introduced to diffusion autoencoders in computational pathology, allowing the model to better capture the complex and intricate details of histopathology images. We demonstrate the effectiveness of ViT-DAE on three publicly available datasets. Our approach outperforms recent GAN-based and vanilla DAE methods in generating realistic images.

Submitted 3 April, 2023; originally announced April 2023.

Comments: Submitted to MICCAI 2023

arXiv:2303.12214 [pdf, other]

Prompt-MIL: Boosting Multi-Instance Learning Schemes via Task-specific Prompt Tuning

Authors: Jingwei Zhang, Saarthak Kapse, Ke Ma, Prateek Prasanna, Joel Saltz, Maria Vakalopoulou, Dimitris Samaras

Abstract: Whole slide image (WSI) classification is a critical task in computational pathology, requiring the processing of gigapixel-sized images, which is challenging for current deep-learning methods. Current state-of-the-art methods are based on multi-instance learning schemes (MIL), which usually rely on pretrained features to represent the instances. Due to the lack of task-specific annotated data, these features are either obtained from well-established backbones on natural images, or, more recently, from self-supervised models pretrained on histopathology. However, both approaches yield task-agnostic features, resulting in performance loss compared to the appropriate task-related supervision, if available. In this paper, we show that when task-specific annotations are limited, we can inject such supervision into downstream task training, to reduce the gap between fully task-tuned and task-agnostic features. We propose Prompt-MIL, an MIL framework that integrates prompts into WSI classification. Prompt-MIL adopts a prompt tuning mechanism, where only a small fraction of parameters calibrates the pretrained features to encode task-specific information, rather than the conventional full fine-tuning approaches. Extensive experiments on three WSI datasets, TCGA-BRCA, TCGA-CRC, and BRIGHT, demonstrate the superiority of Prompt-MIL over conventional MIL methods, achieving a relative improvement of 1.49%-4.03% in accuracy and 0.25%-8.97% in AUROC while using fewer than 0.3% additional parameters. Compared to conventional full fine-tuning approaches, we fine-tune less than 1.3% of the parameters, yet achieve a relative improvement of 1.29%-13.61% in accuracy and 3.22%-27.18% in AUROC and reduce GPU memory consumption by 38%-45% while training 21%-27% faster. Our code is available at https://github.com/cvlab-stonybrook/PromptMIL.

Submitted 4 October, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted to MICCAI 2023 (Oral)

arXiv:2212.12105 [pdf, other]

Precise Location Matching Improves Dense Contrastive Learning in Digital Pathology

Authors: Jingwei Zhang, Saarthak Kapse, Ke Ma, Prateek Prasanna, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras

Abstract: Dense prediction tasks such as segmentation and detection of pathological entities hold crucial clinical value in computational pathology workflows. However, obtaining dense annotations on large cohorts is usually tedious and expensive. Contrastive learning (CL) is thus often employed to leverage large volumes of unlabeled data to pre-train the backbone network. To boost CL for dense prediction, some studies have proposed variations of dense matching objectives in pre-training. However, our analysis shows that employing existing dense matching strategies on histopathology images enforces invariance among incorrect pairs of dense features and, thus, is imprecise. To address this, we propose a precise location-based matching mechanism that utilizes the overlapping information between geometric transformations to precisely match regions in two augmentations. Extensive experiments on two pretraining datasets (TCGA-BRCA, NCT-CRC-HE) and three downstream datasets (GlaS, CRAG, BCSS) highlight the superiority of our method in semantic and instance segmentation tasks. Our method outperforms previous dense matching methods by up to 7.2% in average precision for detection and 5.6% in average precision for instance segmentation tasks. Additionally, by using our matching mechanism in the three popular contrastive learning frameworks, MoCo-v2, VICRegL, and ConCL, the average precision in detection is improved by 0.7% to 5.2%, and the average precision in segmentation is improved by 0.7% to 4.0%, demonstrating generalizability. Our code is available at https://github.com/cvlab-stonybrook/PLM_SSL.

Submitted 22 March, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: Accepted to IPMI 2023

arXiv:2208.13703 [pdf]

Conceptual design of an innovative UVC-LED air-cleaner to reduce airborne pathogen transmission

Authors: Saket Kapse, Dena Rahman, Eldad J Avital, Taylor Smith, Lidia Cantero-Garcia, Maham Sandhu, Rishav Raj, Fariborz Motallebi, Abdus Samad, Nithya Venkatesan, Clive B Beggs

Abstract: A conceptual design of a novel UVC-LED air-cleaner is presented as part of an international educational-research study. The main components are a dust-filter assembly, a UVC chamber and a fan. The dust-filter aims to suppress dust accumulation that will hamper the UVC chamber operation. The innovation is in the UVC chamber, which includes a novel turbulence-generating grid to enhance air mixing in the chamber and a novel LED layout to achieve sufficient kill of the SARS-CoV-2 virus and TB bacterium aerosols with a reasonable power consumption. Both diseases have hit low- to medium-income countries hard, and this study is part of an effort to offer non-pharmaceutical solutions to mitigate the air-transmission of such diseases. Low- to high-fidelity methods of computational fluid dynamics and a UVC ray method are used to show that the design can provide a kill above 97% for Covid and TB, and above 92% for influenza-A. This is at a flow rate of 100 l/s, power consumption of less than 300 W and a device size that is both portable and may also fit into ventilation ducts. Research and educational methodologies are discussed, along with analysis of the inexpensive dust-filter performance and the irradiation and flow fields.

Submitted 10 August, 2022; originally announced August 2022.

Comments: 22 pages, 7 figures

MSC Class: 76; 92; 70 ACM Class: J.2; J.3; J.6

arXiv:2203.15078 [pdf, other]

CD-Net: Histopathology Representation Learning using Pyramidal Context-Detail Network

Authors: Saarthak Kapse, Srijan Das, Prateek Prasanna

Abstract: Extracting rich phenotype information, such as cell density and arrangement, from whole slide histology images (WSIs) requires analysis of a large field of view, i.e., more contextual information. This can be achieved through analyzing the digital slides at lower resolution. A potential drawback is missing out on details present at a higher resolution. To jointly leverage complementary information from multiple resolutions, we present a novel transformer-based Pyramidal Context-Detail Network (CD-Net). CD-Net exploits the WSI pyramidal structure through co-training of proposed Context and Detail Modules, which operate on inputs from multiple resolutions. The residual connections between the modules enable the joint training paradigm while learning self-supervised representation for WSIs. The efficacy of CD-Net is demonstrated in classifying Lung Adenocarcinoma from Squamous cell carcinoma.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to MICCAI 2022

arXiv:2202.08022 [pdf, other]

doi 10.3847/2041-8213/ac551a

Decoding the bifurcated red-giant branch as a tracer of multiple stellar populations in the young Large Magellanic Cloud cluster NGC 2173

Authors: Shalmalee Kapse, Richard de Grijs, Devika Kamath, Daniel B. Zucker

Abstract: Multiple stellar populations (MPs) representing star-to-star light-element abundance variations are common in nearly all ancient Galactic globular clusters. Here we provide the strongest evidence yet that the populous, ~1.7 Gyr-old Large Magellanic Cloud cluster NGC 2173 also exhibits light-element abundance variations. Thus, our results suggest that NGC 2173 is the youngest cluster for which MPs have been confirmed to date. Our conclusion is based on the distinct bifurcation at the tip of its red-giant branch in high-quality color-magnitude diagrams generated from Hubble Space Telescope imaging observations. Our results are further supported by a detailed analysis of 'pseudo-UBI' maps, which reveal clear evidence of a bimodality in the cluster's red-giant-branch color distribution. Young clusters in the Magellanic Clouds can provide critical insights into galaxy evolution histories. Our discovery of MPs in NGC 2173 suggests that ancient Galactic globular clusters and young massive clusters might share a common formation process.

Submitted 16 February, 2022; originally announced February 2022.

Comments: 8 pages, 5 figures

arXiv:2105.06049 [pdf, other]

TopoTxR: A Topological Biomarker for Predicting Treatment Response in Breast Cancer

Authors: Fan Wang, Saarthak Kapse, Steven Liu, Prateek Prasanna, Chao Chen

Abstract: Characterization of breast parenchyma on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a challenging task owing to the complexity of underlying tissue structures. Current quantitative approaches, including radiomics and deep learning models, do not explicitly capture the complex and subtle parenchymal structures, such as fibroglandular tissue. In this paper, we propose a novel method to direct a neural network's attention to a dedicated set of voxels surrounding biologically relevant tissue structures. By extracting multi-dimensional topological structures with high saliency, we build a topology-derived biomarker, TopoTxR. We demonstrate the efficacy of TopoTxR in predicting response to neoadjuvant chemotherapy in breast cancer. Our qualitative and quantitative results suggest differential topological behavior of breast tissue on treatment-naïve imaging, in patients who respond favorably to therapy versus those who do not.

Submitted 12 May, 2021; originally announced May 2021.

Comments: 12 pages, 5 figures, 2 tables, accepted to International Conference on Information Processing in Medical Imaging (IPMI) 2021

arXiv:2103.10034 [pdf, other]

doi 10.1093/mnras/stab813

Searching for chemical abundance variations in young star clusters in the Magellanic Clouds: NGC 411, NGC 1718 and NGC 2213

Authors: Shalmalee Kapse, Richard de Grijs, Daniel B. Zucker

Abstract: The conventional picture of coeval, chemically homogeneous, populous star clusters -- known as 'simple stellar populations' (SSPs) -- is a view of the past. Photometric and spectroscopic studies reveal that almost all ancient globular clusters in the Milky Way and our neighbouring galaxies exhibit star-to-star light-element abundance variations, typically known as 'multiple populations' (MPs). Here, we analyse photometric Hubble Space Telescope observations of three young (<2 Gyr-old) Large and Small Magellanic Cloud clusters, NGC 411, NGC 1718 and NGC 2213. We measure the widths of their red-giant branches (RGBs). For NGC 411, we also use a pseudo-colour-magnitude diagram (pseudo-CMD) to assess its RGB for evidence of MPs. We compare the morphologies of the clusters' RGBs with artificially generated SSPs. We conclude that their RGBs do not show evidence of significant broadening beyond intrinsic photometric scatter, suggesting an absence of significant chemical abundance variations in our sample clusters. Specifically, for NGC 411, NGC 1718 and NGC 2213 we derive maximum helium-abundance variations of delta_Y = 0.003 ± 0.001 (Y = 0.300), 0.002 ± 0.001 (Y = 0.350) and 0.004 ± 0.002 (Y = 0.300), respectively. We determined an upper limit to the NGC 411 nitrogen-abundance variation of Δ[N/Fe] = 0.3 dex; the available data for our other clusters do not allow us to determine useful upper limits. It thus appears that the transition from SSPs to MPs occurs at an age of ~2 Gyr, implying that age might play an important role in this transition. This raises the question as to whether this is indeed a fundamental minimum-age limit for the formation of MPs.

Submitted 18 March, 2021; originally announced March 2021.

Comments: 11 pages, 5 Figures

arXiv:2007.08028 [pdf]

Predicting Clinical Outcomes in COVID-19 using Radiomics and Deep Learning on Chest Radiographs: A Multi-Institutional Study

Authors: Joseph Bae, Saarthak Kapse, Gagandeep Singh, Rishabh Gattu, Syed Ali, Neal Shah, Colin Marshall, Jonathan Pierce, Tej Phatak, Amit Gupta, Jeremy Green, Nikhil Madan, Prateek Prasanna

Abstract: We predict mechanical ventilation requirement and mortality using computational modeling of chest radiographs (CXRs) for coronavirus disease 2019 (COVID-19) patients. This two-center, retrospective study analyzed 530 deidentified CXRs from 515 COVID-19 patients treated at Stony Brook University Hospital and Newark Beth Israel Medical Center between March and August 2020. Deep learning (DL) and machine learning classifiers to predict mechanical ventilation requirement and mortality were trained and evaluated using patient CXRs. A novel radiomic embedding framework was also explored for outcome prediction. All results are compared against radiologist grading of CXRs (zone-wise expert severity scores). Radiomic and DL classification models had mAUCs of 0.78+/-0.02 and 0.81+/-0.04, compared with expert scores mAUCs of 0.75+/-0.02 and 0.79+/-0.05 for mechanical ventilation requirement and mortality prediction, respectively. Combined classifiers using both radiomics and expert severity scores resulted in mAUCs of 0.79+/-0.04 and 0.83+/-0.04 for each prediction task, demonstrating improvement over either artificial intelligence or radiologist interpretation alone. Our results also suggest instances where inclusion of radiomic features in DL improves model predictions, something that might be explored in other pathologies. The models proposed in this study and the prognostic information they provide might aid physician decision making and resource allocation during the COVID-19 pandemic.

Submitted 1 July, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

Comments: Joseph Bae and Saarthak Kapse have contributed equally to this work

ACM Class: J.3; I.2.6

Showing 1–13 of 13 results for author: Kapse, S