Search | arXiv e-print repository

Transformer-based segmentation of adnexal lesions and ovarian implants in CT images

Authors: Aneesh Rangnekar, Kevin M. Boehm, Emily A. Aherne, Ines Nikolovski, Natalie Gangai, Ying Liu, Dimitry Zamarin, Kara L. Roche, Sohrab P. Shah, Yulia Lakhman, Harini Veeraraghavan

Abstract: Two self-supervised pretrained transformer-based segmentation models (SMIT and Swin UNETR) fine-tuned on a dataset of ovarian cancer CT images provided reasonably accurate delineations of the tumors in an independent test dataset. Tumors in the adnexa were segmented more accurately by both transformers (SMIT and Swin UNETR) than the omental implants. AI-assisted labeling performed on 72 out of 245… ▽ More Two self-supervised pretrained transformer-based segmentation models (SMIT and Swin UNETR) fine-tuned on a dataset of ovarian cancer CT images provided reasonably accurate delineations of the tumors in an independent test dataset. Tumors in the adnexa were segmented more accurately by both transformers (SMIT and Swin UNETR) than the omental implants. AI-assisted labeling performed on 72 out of 245 omental implants resulted in smaller manual editing effort of 39.55 mm compared to full manual correction of partial labels of 106.49 mm and resulted in overall improved accuracy performance. Both SMIT and Swin UNETR did not generate any false detection of omental metastases in the urinary bladder and relatively few false detections in the small bowel, with 2.16 cc on average for SMIT and 7.37 cc for Swin UNETR respectively. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.08657 [pdf, other]

Self-supervised learning improves robustness of deep learning lung tumor segmentation to CT imaging differences

Authors: Jue Jiang, Aneesh Rangnekar, Harini Veeraraghavan

Abstract: Self-supervised learning (SSL) is an approach to extract useful feature representations from unlabeled data, and enable fine-tuning on downstream tasks with limited labeled examples. Self-pretraining is a SSL approach that uses the curated task dataset for both pretraining the networks and fine-tuning them. Availability of large, diverse, and uncurated public medical image sets provides the opport… ▽ More Self-supervised learning (SSL) is an approach to extract useful feature representations from unlabeled data, and enable fine-tuning on downstream tasks with limited labeled examples. Self-pretraining is a SSL approach that uses the curated task dataset for both pretraining the networks and fine-tuning them. Availability of large, diverse, and uncurated public medical image sets provides the opportunity to apply SSL in the "wild" and potentially extract features robust to imaging variations. However, the benefit of wild- vs self-pretraining has not been studied for medical image analysis. In this paper, we compare robustness of wild versus self-pretrained transformer (vision transformer [ViT] and hierarchical shifted window [Swin]) models to computed tomography (CT) imaging differences for non-small cell lung cancer (NSCLC) segmentation. Wild-pretrained Swin models outperformed self-pretrained Swin for the various imaging acquisitions. ViT resulted in similar accuracy for both wild- and self-pretrained models. Masked image prediction pretext task that forces networks to learn the local structure resulted in higher accuracy compared to contrastive task that models global image information. Wild-pretrained models resulted in higher feature reuse at the lower level layers and feature differentiation close to output layer after fine-tuning. Hence, we conclude: Wild-pretrained networks were more robust to analyzed CT imaging differences for lung tumor segmentation than self-pretrained methods. Swin architecture benefited from such pretraining more than ViT. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.03762 [pdf, other]

Deep learning classifier of locally advanced rectal cancer treatment response from endoscopy images

Authors: Jorge Tapias Gomez, Aneesh Rangnekar, Hannah Williams, Hannah Thompson, Julio Garcia-Aguilar, Joshua Jesse Smith, Harini Veeraraghavan

Abstract: We developed a deep learning classifier of rectal cancer response (tumor vs. no-tumor) to total neoadjuvant treatment (TNT) from endoscopic images acquired before, during, and following TNT. We further evaluated the network's ability in a near out-of-distribution (OOD) problem to identify local regrowth (LR) from follow-up endoscopy images acquired several months to years after completing TNT. We… ▽ More We developed a deep learning classifier of rectal cancer response (tumor vs. no-tumor) to total neoadjuvant treatment (TNT) from endoscopic images acquired before, during, and following TNT. We further evaluated the network's ability in a near out-of-distribution (OOD) problem to identify local regrowth (LR) from follow-up endoscopy images acquired several months to years after completing TNT. We addressed endoscopic image variability by using optimal mass transport-based image harmonization. We evaluated multiple training regularization schemes to study the ResNet-50 network's in-distribution and near-OOD generalization ability. Test time augmentation resulted in the most considerable accuracy improvement. Image harmonization resulted in slight accuracy improvement for the near-OOD cases. Our results suggest that off-the-shelf deep learning classifiers can detect rectal cancer from endoscopic images at various stages of therapy for surveillance. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2403.13113 [pdf, other]

Trustworthiness of Pretrained Transformers for Lung Cancer Segmentation

Authors: Aneesh Rangnekar, Nishant Nadkarni, Jue Jiang, Harini Veeraraghavan

Abstract: We assessed the trustworthiness of two self-supervision pretrained transformer models, Swin UNETR and SMIT, for fine-tuned lung (LC) tumor segmentation using 670 CT and MRI scans. We measured segmentation accuracy on two public 3D-CT datasets, robustness on CT scans of patients with COVID-19, CT scans of patients with ovarian cancer and T2-weighted MRI of men with prostate cancer, and zero-shot ge… ▽ More We assessed the trustworthiness of two self-supervision pretrained transformer models, Swin UNETR and SMIT, for fine-tuned lung (LC) tumor segmentation using 670 CT and MRI scans. We measured segmentation accuracy on two public 3D-CT datasets, robustness on CT scans of patients with COVID-19, CT scans of patients with ovarian cancer and T2-weighted MRI of men with prostate cancer, and zero-shot generalization of LC for T2-weighted MRIs. Both models demonstrated high accuracy on in-distribution data (Dice 0.80 for SMIT and 0.78 for Swin UNETR). SMIT showed similar near-out-of-distribution performance on CT scans (AUROC 89.85% vs. 89.19%) but significantly better far-out-of-distribution accuracy on CT (AUROC 97.2% vs. 87.1%) and MRI (92.15% vs. 73.8%). SMIT outperformed Swin UNETR in zero-shot segmentation on MRI (Dice 0.78 vs. 0.69). We expect these findings to guide the safe development and deployment of current and future pretrained models in routine clinical use. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2310.01209 [pdf, other]

Self-distilled Masked Attention guided masked image modeling with noise Regularized Teacher (SMART) for medical image analysis

Authors: Jue Jiang, Aneesh Rangnekar, Chloe Min Seo Choi, Harini Veeraraghavan

Abstract: Pretraining vision transformers (ViT) with attention guided masked image modeling (MIM) has shown to increase downstream accuracy for natural image analysis. Hierarchical shifted window (Swin) transformer, often used in medical image analysis cannot use attention guided masking as it lacks an explicit [CLS] token, needed for computing attention maps for selective masking. We thus enhanced Swin wit… ▽ More Pretraining vision transformers (ViT) with attention guided masked image modeling (MIM) has shown to increase downstream accuracy for natural image analysis. Hierarchical shifted window (Swin) transformer, often used in medical image analysis cannot use attention guided masking as it lacks an explicit [CLS] token, needed for computing attention maps for selective masking. We thus enhanced Swin with semantic class attention. We developed a co-distilled Swin transformer that combines a noisy momentum updated teacher to guide selective masking for MIM. Our approach called \textsc{s}e\textsc{m}antic \textsc{a}ttention guided co-distillation with noisy teacher \textsc{r}egularized Swin \textsc{T}rans\textsc{F}ormer (SMARTFormer) was applied for analyzing 3D computed tomography datasets with lung nodules and malignant lung cancers (LC). We also analyzed the impact of semantic attention and noisy teacher on pretraining and downstream accuracy. SMARTFormer classified lesions (malignant from benign) with a high accuracy of 0.895 of 1000 nodules, predicted LC treatment response with accuracy of 0.74, and achieved high accuracies even in limited data regimes. Pretraining with semantic attention and noisy teacher improved ability to distinguish semantically meaningful structures such as organs in a unsupervised clustering task and localize abnormal structures like tumors. Code, models will be made available through GitHub upon paper acceptance. △ Less

Submitted 3 July, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Paper is under review at TMI

arXiv:2210.08403 [pdf, other]

Semantic Segmentation with Active Semi-Supervised Representation Learning

Authors: Aneesh Rangnekar, Christopher Kanan, Matthew Hoffman

Abstract: Obtaining human per-pixel labels for semantic segmentation is incredibly laborious, often making labeled dataset construction prohibitively expensive. Here, we endeavor to overcome this problem with a novel algorithm that combines semi-supervised and active learning, resulting in the ability to train an effective semantic segmentation algorithm with significantly lesser labeled data. To do this, w… ▽ More Obtaining human per-pixel labels for semantic segmentation is incredibly laborious, often making labeled dataset construction prohibitively expensive. Here, we endeavor to overcome this problem with a novel algorithm that combines semi-supervised and active learning, resulting in the ability to train an effective semantic segmentation algorithm with significantly lesser labeled data. To do this, we extend the prior state-of-the-art S4AL algorithm by replacing its mean teacher approach for semi-supervised learning with a self-training approach that improves learning with noisy labels. We further boost the neural network's ability to query useful data by adding a contrastive learning head, which leads to better understanding of the objects in the scene, and hence, better queries for active learning. We evaluate our method on CamVid and CityScapes datasets, the de-facto standards for active learning for semantic segmentation. We achieve more than 95% of the network's performance on CamVid and CityScapes datasets, utilizing only 12.1% and 15.1% of the labeled data, respectively. We also benchmark our method across existing stand-alone semi-supervised learning methods on the CityScapes dataset and achieve superior performance without any bells or whistles. △ Less

Submitted 15 October, 2022; originally announced October 2022.

Comments: To appear in the British Machine Vision Conference (BMVC-2022)

arXiv:2203.10730 [pdf, other]

Semantic Segmentation with Active Semi-Supervised Learning

Authors: Aneesh Rangnekar, Christopher Kanan, Matthew Hoffman

Abstract: Using deep learning, we now have the ability to create exceptionally good semantic segmentation systems; however, collecting the prerequisite pixel-wise annotations for training images remains expensive and time-consuming. Therefore, it would be ideal to minimize the number of human annotations needed when creating a new dataset. Here, we address this problem by proposing a novel algorithm that co… ▽ More Using deep learning, we now have the ability to create exceptionally good semantic segmentation systems; however, collecting the prerequisite pixel-wise annotations for training images remains expensive and time-consuming. Therefore, it would be ideal to minimize the number of human annotations needed when creating a new dataset. Here, we address this problem by proposing a novel algorithm that combines active learning and semi-supervised learning. Active learning is an approach for identifying the best unlabeled samples to annotate. While there has been work on active learning for segmentation, most methods require annotating all pixel objects in each image, rather than only the most informative regions. We argue that this is inefficient. Instead, our active learning approach aims to minimize the number of annotations per-image. Our method is enriched with semi-supervised learning, where we use pseudo labels generated with a teacher-student framework to identify image regions that help disambiguate confused classes. We also integrate mechanisms that enable better performance on imbalanced label distributions, which have not been studied previously for active learning in semantic segmentation. In experiments on the CamVid and CityScapes datasets, our method obtains over 95% of the network's performance on the full-training set using less than 17% of the training data, whereas the previous state of the art required 40% of the training data. △ Less

Submitted 15 October, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: To appear in the Winter Conference on Applications of Computer Vision (WACV-2023)

arXiv:2004.08228 [pdf, other]

Calibrated Vehicle Paint Signatures for Simulating Hyperspectral Imagery

Authors: Zachary Mulhollan, Aneesh Rangnekar, Timothy Bauch, Matthew J. Hoffman, Anthony Vodacek

Abstract: We investigate a procedure for rapidly adding calibrated vehicle visible-near infrared (VNIR) paint signatures to an existing hyperspectral simulator - The Digital Imaging and Remote Sensing Image Generation (DIRSIG) model - to create more diversity in simulated urban scenes. The DIRSIG model can produce synthetic hyperspectral imagery with user-specified geometry, atmospheric conditions, and grou… ▽ More We investigate a procedure for rapidly adding calibrated vehicle visible-near infrared (VNIR) paint signatures to an existing hyperspectral simulator - The Digital Imaging and Remote Sensing Image Generation (DIRSIG) model - to create more diversity in simulated urban scenes. The DIRSIG model can produce synthetic hyperspectral imagery with user-specified geometry, atmospheric conditions, and ground target spectra. To render an object pixel's spectral signature, DIRSIG uses a large database of reflectance curves for the corresponding object material and a bidirectional reflectance model to introduce s due to orientation and surface structure. However, this database contains only a few spectral curves for vehicle paints and generates new paint signatures by combining these curves internally. In this paper we demonstrate a method to rapidly generate multiple paint spectra, flying a drone carrying a pushbroom hyperspectral camera to image a university parking lot. We then process the images to convert them from the digital count space to spectral reflectance without the need of calibration panels in the scene, and port the paint signatures into DIRSIG for successful integration into the newly rendered sets of synthetic VNIR hyperspectral scenes. △ Less

Submitted 16 April, 2020; originally announced April 2020.

Comments: 10 pages with supplementary material, 7 figures, submitted to CVPR PBVS 2020 Conference, Dataset uploaded to https://github.com/mulhollanz/Hyperspectral_Vehicle_Paint_Signatures_Dataset.git

arXiv:1912.08178 [pdf, other]

doi 10.1109/TGRS.2020.2987199

AeroRIT: A New Scene for Hyperspectral Image Analysis

Authors: Aneesh Rangnekar, Nilay Mokashi, Emmett Ientilucci, Christopher Kanan, Matthew J. Hoffman

Abstract: We investigate applying convolutional neural network (CNN) architecture to facilitate aerial hyperspectral scene understanding and present a new hyperspectral dataset-AeroRIT-that is large enough for CNN training. To date the majority of hyperspectral airborne have been confined to various sub-categories of vegetation and roads and this scene introduces two new categories: buildings and cars. To t… ▽ More We investigate applying convolutional neural network (CNN) architecture to facilitate aerial hyperspectral scene understanding and present a new hyperspectral dataset-AeroRIT-that is large enough for CNN training. To date the majority of hyperspectral airborne have been confined to various sub-categories of vegetation and roads and this scene introduces two new categories: buildings and cars. To the best of our knowledge, this is the first comprehensive large-scale hyperspectral scene with nearly seven million pixel annotations for identifying cars, roads, and buildings. We compare the performance of three popular architectures - SegNet, U-Net, and Res-U-Net, for scene understanding and object identification via the task of dense semantic segmentation to establish a benchmark for the scene. To further strengthen the network, we add squeeze and excitation blocks for better channel interactions and use self-supervised learning for better encoder initialization. Aerial hyperspectral image analysis has been restricted to small datasets with limited train/test splits capabilities and we believe that AeroRIT will help advance the research in the field with a more complex object distribution to perform well on. The full dataset, with flight lines in radiance and reflectance domain, is available for download at https://github.com/aneesh3108/AeroRIT. This dataset is the first step towards develo** robust algorithms for hyperspectral airborne sensing that can robustly perform advanced tasks like vehicle tracking and occlusion handling. △ Less

Submitted 7 April, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

Comments: To appear in IEEE TGRS

arXiv:1712.08690 [pdf, other]

Aerial Spectral Super-Resolution using Conditional Adversarial Networks

Authors: Aneesh Rangnekar, Nilay Mokashi, Emmett Ientilucci, Christopher Kanan, Matthew Hoffman

Abstract: Inferring spectral signatures from ground based natural images has acquired a lot of interest in applied deep learning. In contrast to the spectra of ground based images, aerial spectral images have low spatial resolution and suffer from higher noise interference. In this paper, we train a conditional adversarial network to learn an inverse map** from a trichromatic space to 31 spectral bands wi… ▽ More Inferring spectral signatures from ground based natural images has acquired a lot of interest in applied deep learning. In contrast to the spectra of ground based images, aerial spectral images have low spatial resolution and suffer from higher noise interference. In this paper, we train a conditional adversarial network to learn an inverse map** from a trichromatic space to 31 spectral bands within 400 to 700 nm. The network is trained on AeroCampus, a first of its kind aerial hyperspectral dataset. AeroCampus consists of high spatial resolution color images and low spatial resolution hyperspectral images (HSI). Color images synthesized from 31 spectral bands are used to train our network. With a baseline root mean square error of 2.48 on the synthesized RGB test data, we show that it is possible to generate spectral signatures in aerial imagery. △ Less

Submitted 22 December, 2017; originally announced December 2017.

arXiv:1711.07235 [pdf, other]

Tracking in Aerial Hyperspectral Videos using Deep Kernelized Correlation Filters

Authors: Burak Uzkent, Aneesh Rangnekar, Matthew J. Hoffman

Abstract: Hyperspectral imaging holds enormous potential to improve the state-of-the-art in aerial vehicle tracking with low spatial and temporal resolutions. Recently, adaptive multi-modal hyperspectral sensors have attracted growing interest due to their ability to record extended data quickly from aerial platforms. In this study, we apply popular concepts from traditional object tracking, namely (1) Kern… ▽ More Hyperspectral imaging holds enormous potential to improve the state-of-the-art in aerial vehicle tracking with low spatial and temporal resolutions. Recently, adaptive multi-modal hyperspectral sensors have attracted growing interest due to their ability to record extended data quickly from aerial platforms. In this study, we apply popular concepts from traditional object tracking, namely (1) Kernelized Correlation Filters (KCF) and (2) Deep Convolutional Neural Network (CNN) features to aerial tracking in hyperspectral domain. We propose the Deep Hyperspectral Kernelized Correlation Filter based tracker (DeepHKCF) to efficiently track aerial vehicles using an adaptive multi-modal hyperspectral sensor. We address low temporal resolution by designing a single KCF-in-multiple Regions-of-Interest (ROIs) approach to cover a reasonably large area. To increase the speed of deep convolutional features extraction from multiple ROIs, we design an effective ROI map** strategy. The proposed tracker also provides flexibility to couple with the more advanced correlation filter trackers. The DeepHKCF tracker performs exceptionally well with deep features set up in a synthetic hyperspectral video generated by the Digital Imaging and Remote Sensing Image Generation (DIRSIG) software. Additionally, we generate a large, synthetic, single-channel dataset using DIRSIG to perform vehicle classification in the Wide Area Motion Imagery (WAMI) platform. This way, the high-fidelity of the DIRSIG software is proved and a large scale aerial vehicle classification dataset is released to support studies on vehicle detection and tracking in the WAMI platform. △ Less

Submitted 6 May, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

arXiv:1707.03553 [pdf, other]

Aerial Vehicle Tracking by Adaptive Fusion of Hyperspectral Likelihood Maps

Authors: Burak Uzkent, Aneesh Rangnekar, M. J. Hoffman

Abstract: Hyperspectral cameras can provide unique spectral signatures for consistently distinguishing materials that can be used to solve surveillance tasks. In this paper, we propose a novel real-time hyperspectral likelihood maps-aided tracking method (HLT) inspired by an adaptive hyperspectral sensor. A moving object tracking system generally consists of registration, object detection, and tracking modu… ▽ More Hyperspectral cameras can provide unique spectral signatures for consistently distinguishing materials that can be used to solve surveillance tasks. In this paper, we propose a novel real-time hyperspectral likelihood maps-aided tracking method (HLT) inspired by an adaptive hyperspectral sensor. A moving object tracking system generally consists of registration, object detection, and tracking modules. We focus on the target detection part and remove the necessity to build any offline classifiers and tune a large amount of hyperparameters, instead learning a generative target model in an online manner for hyperspectral channels ranging from visible to infrared wavelengths. The key idea is that, our adaptive fusion method can combine likelihood maps from multiple bands of hyperspectral imagery into one single more distinctive representation increasing the margin between mean value of foreground and background pixels in the fused map. Experimental results show that the HLT not only outperforms all established fusion methods but is on par with the current state-of-the-art hyperspectral target tracking frameworks. △ Less

Submitted 12 July, 2017; originally announced July 2017.

Comments: Accepted at the International Conference on Computer Vision and Pattern Recognition Workshops, 2017

Showing 1–12 of 12 results for author: Rangnekar, A