A Simple Framework Uniting Visual In-context Learning with Masked Image Modeling to Improve Ultrasound Segmentation

Authors: Yuyue Zhou, Banafshe Felfeliyan, Shrimanti Ghosh, Jessica Knight, Fatima Alves-Pereira, Christopher Keen, Jessica Küpper, Abhilash Rakkunedeth Hareendranathan, Jacob L. Jaremko

Abstract: Conventional deep learning models deal with images one-by-one, requiring costly and time-consuming expert labeling in the field of medical imaging, and domain-specific restriction limits model generalizability. Visual in-context learning (ICL) is a new and exciting area of research in computer vision. Unlike conventional deep learning, ICL emphasizes the model's ability to adapt quickly to new tasks based on given examples. Inspired by MAE-VQGAN, we proposed a new simple visual ICL method called SimICL, combining visual ICL pairing images with masked image modeling (MIM) designed for self-supervised learning. We validated our method on bony structure segmentation in a wrist ultrasound (US) dataset with limited annotations, where the clinical objective was to segment bony structures to help with further fracture detection. We used a test set containing 3822 images from 18 patients for bony region segmentation. SimICL achieved a remarkably high Dice coefficient (DC) of 0.96 and Jaccard Index (IoU) of 0.92, surpassing state-of-the-art segmentation and visual ICL models (a maximum DC of 0.86 and IoU of 0.76), an improvement of up to 0.10 in DC and 0.16 in IoU. This remarkably high agreement with limited manual annotations indicates SimICL could be used for training AI models even on small US datasets. This could dramatically decrease the human expert time required for image labeling compared to conventional approaches, and enhance the real-world use of AI assistance in US image analysis.

Submitted 8 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.
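
The Dice coefficient and Jaccard Index (IoU) quoted in the abstract above are overlap measures between a predicted mask and a reference mask. As a minimal sketch, not taken from the paper, the two metrics can be computed for flat binary masks as follows; the function names and the empty-mask convention are assumptions made for illustration.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


def iou_from_dice(dice):
    """For a single pair of masks, IoU = Dice / (2 - Dice)."""
    return dice / (2 - dice)


if __name__ == "__main__":
    print(dice_and_iou([1, 1, 1, 0], [1, 1, 0, 0]))
    print(iou_from_dice(0.96))
```

For a single image the identity gives an IoU of roughly 0.923 for a Dice of 0.96, in line with the reported 0.92. Scores averaged over many images need not satisfy the identity exactly.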

arXiv:2401.06331 [pdf]

Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity

Authors: Banafshe Felfeliyan, Yuyue Zhou, Shrimanti Ghosh, Jessica Kupper, Shaobo Liu, Abhilash Hareendranathan, Jacob L. Jaremko

Abstract: Osteoarthritis (OA) poses a global health challenge, demanding precise diagnostic methods. Current radiographic assessments are time-consuming and prone to variability, prompting the need for automated solutions. The existing deep learning models for OA assessment are unimodal, single-task systems and they don't incorporate relevant text information such as patient demographics, disease history, or physician reports. This study investigates employing Vision Language Processing (VLP) models to predict OA severity using X-ray images and corresponding reports. Our method leverages X-ray images of the knee and diverse report templates generated from tabular OA scoring values to train a CLIP (Contrastive Language Image PreTraining) style VLP model. Furthermore, we incorporate additional contrasting captions to force the model to discriminate between positive and negative reports. Results demonstrate the efficacy of these models in learning text-image representations and their contextual relationships, showcase potential advancement in OA assessment, and establish a foundation for specialized vision language models in medical contexts.

Submitted 11 January, 2024; originally announced January 2024.
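
The abstract above describes a CLIP-style model trained on paired X-ray images and reports. The sketch below shows the general symmetric contrastive objective that CLIP-style training is usually built on, written in plain Python for small batches. It is an illustration under assumptions, not the authors' implementation: the function names, the temperature of 0.07, and the toy embeddings are hypothetical, and the additional contrasting captions mentioned in the abstract are not modeled.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities.

    Row i of image_embs is assumed to be paired with row i of text_embs.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: each image should select its own report.
    loss_i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: each report should select its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_style_loss(images, [[2.0, 0.0], [0.0, 3.0]]))  # aligned pairs
    print(clip_style_loss(images, [[0.0, 2.0], [3.0, 0.0]]))  # swapped pairs
```

Aligned pairs give a loss near zero and swapped pairs give a large loss, which is the behavior the objective is meant to reward.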

arXiv:2309.09490 [pdf, other]

Self-supervised TransUNet for Ultrasound regional segmentation of the distal radius in children

Authors: Yuyue Zhou, Jessica Knight, Banafshe Felfeliyan, Christopher Keen, Abhilash Rakkunedeth Hareendranathan, Jacob L. Jaremko

Abstract: Supervised deep learning offers great promise to automate analysis of medical images from segmentation to diagnosis. However, its performance relies heavily on the quality and quantity of the data annotation. Meanwhile, curating large annotated datasets for medical images requires a high level of expertise, which is time-consuming and expensive. Recently, to quench the thirst for large data sets with high-quality annotation, self-supervised learning (SSL) methods using unlabeled domain-specific data have attracted attention. Therefore, designing an SSL method that relies on minimal quantities of labeled data has far-reaching significance in medical images. This paper investigates the feasibility of deploying the Masked Autoencoder for SSL (SSL-MAE) of TransUNet, for segmenting bony regions from children's wrist ultrasound scans. We found that changing the embedding and loss function in SSL-MAE can produce better downstream results compared to the original SSL-MAE. In addition, we determined that only pretraining TransUNet embedding and encoder with SSL-MAE does not work as well as TransUNet without SSL-MAE pretraining on downstream segmentation tasks.

Submitted 18 September, 2023; originally announced September 2023.
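
Masked autoencoder pretraining, as referred to in the abstract above, hides a random subset of image patches and trains the network to reconstruct them. The following is a minimal sketch of only the patch-selection step. The function name is hypothetical, and the 196-patch grid and 0.75 mask ratio are illustrative assumptions (0.75 is the default in the original MAE work); the listing does not state which settings this paper used.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Return a list of booleans, True where a patch is hidden from the encoder."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    # Sample patch indices without replacement so exactly num_masked are hidden.
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


if __name__ == "__main__":
    # A 224x224 image split into 16x16 patches yields 14 * 14 = 196 patches.
    mask = random_patch_mask(196, mask_ratio=0.75, seed=0)
    print(sum(mask), "of", len(mask), "patches masked")
```

Only the visible patches would be passed to the encoder; the reconstruction loss is then computed on the hidden ones.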

arXiv:2209.08172 [pdf, other]

doi 10.1007/978-3-031-37742-6_47

Weakly Supervised Medical Image Segmentation With Soft Labels and Noise Robust Loss

Authors: Banafshe Felfeliyan, Abhilash Hareendranathan, Gregor Kuntze, Stephanie Wichuk, Nils D. Forkert, Jacob L. Jaremko, Janet L. Ronsky

Abstract: Recent advances in deep learning algorithms have led to significant benefits for solving many medical image analysis problems. Training deep learning models commonly requires large datasets with expert-labeled annotations. However, acquiring expert-labeled annotation is not only expensive but also subjective and error-prone, and inter-/intra-observer variability introduces noise to labels. This is particularly a problem when using deep learning models for segmenting medical images due to the ambiguous anatomical boundaries. Image-based medical diagnosis tools using deep learning models trained with incorrect segmentation labels can lead to false diagnoses and treatment suggestions. Multi-rater annotations might be better suited to train deep learning models with small training sets compared to single-rater annotations. The aim of this paper was to develop and evaluate a method to generate probabilistic labels based on multi-rater annotations and anatomical knowledge of the lesion features in MRI and a method to train segmentation models on probabilistic labels using normalized active-passive loss as a "noise-tolerant loss" function. The model was evaluated by comparing it to binary ground truth for 17 knee MRI scans for clinical segmentation and detection of bone marrow lesions (BML). The proposed method successfully improved precision by 14, recall by 22, and Dice score by 8 percent compared to a binary cross-entropy loss function. Overall, the results of this work suggest that the proposed normalized active-passive loss using soft labels successfully mitigated the effects of noisy labels.

Submitted 16 September, 2022; originally announced September 2022.
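
The abstract above builds probabilistic (soft) labels from multi-rater annotations. As a rough illustration of the idea only, the sketch below forms a soft label per pixel by averaging the raters' binary masks and evaluates a binary cross-entropy against those soft targets. This is a simplification under assumptions: the paper also uses anatomical knowledge and a normalized active-passive loss, neither of which is reproduced here, and both function names are hypothetical.

```python
import math


def soft_labels(rater_masks):
    """Per-pixel fraction of raters who marked the pixel as lesion."""
    n_raters = len(rater_masks)
    return [sum(votes) / n_raters for votes in zip(*rater_masks)]


def soft_bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and soft targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


if __name__ == "__main__":
    # Three raters, three pixels; the raters disagree on the middle pixel.
    labels = soft_labels([[1, 1, 0], [1, 0, 0], [1, 1, 0]])
    print(labels)
    print(soft_bce([0.9, 0.6, 0.1], labels))
```

A prediction close to the rater consensus scores a lower loss than one that contradicts it, while the disputed pixel contributes a non-zero loss either way.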

arXiv:2207.11191 [pdf]

Self-Supervised-RCNN for Medical Image Segmentation with Limited Data Annotation

Authors: Banafshe Felfeliyan, Abhilash Hareendranathan, Gregor Kuntze, David Cornell, Nils D. Forkert, Jacob L. Jaremko, Janet L. Ronsky

Abstract: Many successful methods developed for medical image analysis that are based on machine learning use supervised learning approaches, which often require large datasets annotated by experts to achieve high accuracy. However, medical data annotation is time-consuming and expensive, especially for segmentation tasks. To solve the problem of learning with limited labeled medical image data, an alternative deep learning training strategy based on self-supervised pretraining on unlabeled MRI scans is proposed in this work. Our pretraining approach first randomly applies different distortions to random areas of unlabeled images and then predicts the type of distortions and loss of information. To this aim, an improved version of Mask-RCNN architecture has been adapted to localize the distortion location and recover the original image pixels. The effectiveness of the proposed method for segmentation tasks in different pre-training and fine-tuning scenarios is evaluated based on the Osteoarthritis Initiative dataset. Using this self-supervised pretraining method improved the Dice score by 20% compared to training from scratch. The proposed self-supervised learning is simple, effective, and suitable for different ranges of medical image analysis tasks including anomaly detection, segmentation, and classification.

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2109.01309 [pdf]

Unsupervised multi-latent space reinforcement learning framework for video summarization in ultrasound imaging

Authors: Roshan P Mathews, Mahesh Raveendranatha Panicker, Abhilash R Hareendranathan, Yale Tung Chen, Jacob L Jaremko, Brian Buchanan, Kiran Vishnu Narayan, Kesavadas C, Greeta Mathews

Abstract: The COVID-19 pandemic has highlighted the need for a tool to speed up triage in ultrasound scans and provide clinicians with fast access to relevant information. The proposed video-summarization technique is a step in this direction that provides clinicians access to relevant key-frames from a given ultrasound scan (such as lung ultrasound) while reducing resource, storage and bandwidth requirements. We propose a new unsupervised reinforcement learning (RL) framework with novel rewards that facilitates unsupervised learning avoiding tedious and impractical manual labelling for summarizing ultrasound videos to enhance its utility as a triage tool in the emergency department (ED) and for use in telemedicine. Using an attention ensemble of encoders, the high dimensional image is projected into a low dimensional latent space in terms of: a) reduced distance with a normal or abnormal class (classifier encoder), b) following a topology of landmarks (segmentation encoder), and c) the distance or topology agnostic latent representation (convolutional autoencoders). The decoder is implemented using a bidirectional long short-term memory (Bi-LSTM) which utilizes the latent space representation from the encoder. Our new paradigm for video summarization is capable of delivering classification labels and segmentation of key landmarks for each of the summarized keyframes. Validation is performed on a lung ultrasound (LUS) dataset that represents potential use cases in telemedicine and ED triage, acquired from different medical centers across geographies (India, Spain and Canada).

Submitted 3 September, 2021; originally announced September 2021.

Comments: 24 pages, submitted to Elsevier Medical Image Analysis for review

arXiv:2107.12889 [pdf]

doi 10.1016/j.compmedimag.2022.102056

Improved-Mask R-CNN: Towards an Accurate Generic MSK MRI instance segmentation platform (Data from the Osteoarthritis Initiative)

Authors: Banafshe Felfeliyan, Abhilash Hareendranathan, Gregor Kuntze, Jacob L. Jaremko, Janet L. Ronsky

Abstract: Objective assessment of Magnetic Resonance Imaging (MRI) scans of osteoarthritis (OA) can address the limitation of the current OA assessment. Segmentation of bone, cartilage, and joint fluid is necessary for the OA objective assessment. Most of the proposed segmentation methods are not performing instance segmentation and suffer from class imbalance problems. This study deployed Mask R-CNN instance segmentation and improved it (improved-Mask R-CNN (iMaskRCNN)) to obtain a more accurate generalized segmentation for OA-associated tissues. Training and validation of the method were performed using 500 knee MRI scans from the Osteoarthritis Initiative (OAI) dataset and 97 MRI scans of patients with symptomatic hip OA. Three modifications to Mask R-CNN yielded the iMaskRCNN: adding a 2nd ROIAligned block, adding an extra decoder layer to the mask-header, and connecting them by a skip connection. The results were assessed using Hausdorff distance, Dice score, and coefficients of variation (CoV). The iMaskRCNN led to improved bone and cartilage segmentation compared to Mask R-CNN, as indicated by the increase in Dice score from 95% to 98% for the femur, 95% to 97% for the tibia, 71% to 80% for femoral cartilage, and 81% to 82% for tibial cartilage. For effusion detection, Dice improved to 72% with iMaskRCNN versus 71% with Mask R-CNN. The CoV values for effusion detection between Reader1 and Mask R-CNN (0.33), Reader1 and iMaskRCNN (0.34), Reader2 and Mask R-CNN (0.22), Reader2 and iMaskRCNN (0.29) are close to CoV between two readers (0.21), indicating a high agreement between the human readers and both Mask R-CNN and iMaskRCNN. Mask R-CNN and iMaskRCNN can reliably and simultaneously extract different scale articular tissues involved in OA, forming the foundation for automated assessment of OA. The iMaskRCNN results show that the modification improved the network performance around the edges.

Submitted 23 June, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

arXiv:2106.06987 [pdf]

Learning the Imaging Landmarks: Unsupervised Key point Detection in Lung Ultrasound Videos

Authors: Arpan Tripathi, Mahesh Raveendranatha Panicker, Abhilash R Hareendranathan, Yale Tung Chen, Jacob L Jaremko, Kiran Vishnu Narayan, Kesavadas C

Abstract: Lung ultrasound (LUS) is an increasingly popular diagnostic imaging modality for continuous and periodic monitoring of lung infection, given its advantages of non-invasiveness, non-ionizing nature, portability and easy disinfection. The major landmarks assessed by clinicians for triaging using LUS are pleura, A and B lines. There have been many efforts for the automatic detection of these landmarks. However, restricting to a few pre-defined landmarks may not reveal the actual imaging biomarkers, particularly in case of new pathologies like COVID-19. Rather, the identification of key landmarks should be driven by data given the availability of a plethora of neural network algorithms. This work is a first-of-its-kind attempt towards unsupervised detection of the key LUS landmarks in LUS videos of COVID-19 subjects during various stages of infection. We adapted the relatively new approach of transporter neural networks to automatically mark and track pleura, A and B lines based on their periodic motion and relatively stable appearance in the videos. Initial results on unsupervised pleura detection show an accuracy of 91.8% employing 1081 LUS video frames.

Submitted 13 June, 2021; originally announced June 2021.

Comments: 5 pages, 6 figures, submitted to IEEE EMBC 2021

arXiv:2102.06164 [pdf, other]

Sample Efficient Learning of Image-Based Diagnostic Classifiers Using Probabilistic Labels

Authors: Roberto Vega, Pouneh Gorji, Zichen Zhang, Xuebin Qin, Abhilash Rakkunedeth Hareendranathan, Jeevesh Kapur, Jacob L. Jaremko, Russell Greiner

Abstract: Deep learning approaches often require huge datasets to achieve good generalization. This complicates their use in tasks like image-based medical diagnosis, where the small training datasets are usually insufficient to learn appropriate data representations. For such sensitive tasks it is also important to provide the confidence in the predictions. Here, we propose a way to learn and use probabilistic labels to train accurate and calibrated deep networks from relatively small datasets. We observe gains of up to 22% in the accuracy of models trained with these labels, as compared with traditional approaches, in three classification tasks: diagnosis of hip dysplasia, fatty liver, and glaucoma. The outputs of models trained with probabilistic labels are calibrated, allowing the interpretation of their predictions as proper probabilities. We anticipate this approach will apply to other tasks where few training instances are available and expert knowledge can be encoded as probabilities.

Submitted 11 February, 2021; originally announced February 2021.

Comments: To appear in the Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130

arXiv:2006.05861 [pdf, other]

A systematic review on the role of artificial intelligence in sonographic diagnosis of thyroid cancer: Past, present and future

Authors: Fatemeh Abdolali, Atefeh Shahroudnejad, Abhilash Rakkunedeth Hareendranathan, Jacob L Jaremko, Michelle Noga, Kumaradevan Punithakumar

Abstract: Thyroid cancer is common worldwide, with a rapid increase in prevalence across North America in recent years. While most patients present with palpable nodules through physical examination, a large number of small and medium-sized nodules are detected by ultrasound examination. Suspicious nodules are then sent for biopsy through fine needle aspiration. Since biopsies are invasive and sometimes inconclusive, various research groups have tried to develop computer-aided diagnosis systems. Earlier approaches along these lines relied on clinically relevant features that were manually identified by radiologists. With the recent success of artificial intelligence (AI), various new methods are being developed to identify these features in thyroid ultrasound automatically. In this paper, we present a systematic review of the state of the art in AI applications for sonographic diagnosis of thyroid cancer. This review follows a methodology-based classification of the different techniques available for thyroid cancer diagnosis. With more than 50 papers included in this review, we reflect on the trends and challenges of the field of sonographic diagnosis of thyroid malignancies and the potential of computer-aided diagnosis to increase the impact of ultrasound applications on the future of thyroid cancer diagnosis. Machine learning will continue to play a fundamental role in the development of future thyroid cancer diagnosis frameworks.

Submitted 10 June, 2020; originally announced June 2020.

arXiv:1801.02722 [pdf, other]

End-to-end detection-segmentation network with ROI convolution

Authors: Zichen Zhang, Min Tang, Dana Cobzas, Dornoosh Zonoobi, Martin Jagersand, Jacob L. Jaremko

Abstract: We propose an end-to-end neural network that improves the segmentation accuracy of fully convolutional networks by incorporating a localization unit. This network performs object localization first, which is then used as a cue to guide the training of the segmentation network. We test the proposed method on a segmentation task of small objects on a clinical dataset of ultrasound images. We show that by jointly learning for detection and segmentation, the proposed network is able to improve the segmentation accuracy compared to only learning for segmentation. Code is publicly available at https://github.com/vincentzhang/roi-fcn.

Submitted 2 December, 2019; v1 submitted 8 January, 2018; originally announced January 2018.

Comments: ISBI 2018

arXiv:1711.00139 [pdf, other]

Segmentation-by-Detection: A Cascade Network for Volumetric Medical Image Segmentation

Authors: Min Tang, Zichen Zhang, Dana Cobzas, Martin Jagersand, Jacob L. Jaremko

Abstract: We propose an attention mechanism for 3D medical image segmentation. The method, named segmentation-by-detection, is a cascade of a detection module followed by a segmentation module. The detection module enables a region of interest to come to attention and produces a set of object region candidates which are further used as an attention model. Rather than dealing with the entire volume, the segmentation module distills the information from the potential region. This scheme is an efficient solution for volumetric data as it reduces the influence of the surrounding noise, which is especially important for medical data with low signal-to-noise ratio. Experimental results on 3D ultrasound data of the femoral head show the superiority of the proposed method when compared with a standard fully convolutional network like the U-Net.

Submitted 31 October, 2017; originally announced November 2017.

Showing 1–12 of 12 results for author: Jaremko, J L