VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks

Authors: Manish Dhakal, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal

Abstract: Foundation Vision-Language Models (VLMs) trained using large-scale open-domain images and text pairs have recently been adapted to develop Vision-Language Segmentation Models (VLSMs) that allow providing text prompts during inference to guide image segmentation. If robust and powerful VLSMs can be built for medical images, it could aid medical professionals in many clinical tasks where they must spend substantial time delineating the target structure of interest. VLSMs for medical images resort to fine-tuning base VLM or VLSM pretrained on open-domain natural image datasets due to fewer annotated medical image datasets; this fine-tuning is resource-consuming and expensive as it usually requires updating all or a significant fraction of the pretrained parameters. Recently, lightweight blocks called adapters have been proposed in VLMs that keep the pretrained model frozen and only train adapters during fine-tuning, substantially reducing the computing resources required. We introduce a novel adapter, VLSM-Adapter, that can fine-tune pretrained vision-language segmentation models using transformer encoders. Our experiments in widely used CLIP-based segmentation models show that with only 3 million trainable parameters, the VLSM-Adapter outperforms state-of-the-art and is comparable to the upper bound end-to-end fine-tuning. The source code is available at: https://github.com/naamiinepal/vlsm-adapter.

Submitted 27 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted at MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention

arXiv:2309.12829

doi 10.1007/978-3-031-44521-7_9

Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography

Authors: Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari, Bishesh Khanal

Abstract: Accurate segmentation is essential for echocardiography-based assessment of cardiovascular diseases (CVDs). However, the variability among sonographers and the inherent challenges of ultrasound images hinder precise segmentation. By leveraging the joint representation of image and text modalities, Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially aiding in accurate and explainable segmentation. However, the lack of readily available data in echocardiography hampers the training of VLSMs. In this study, we explore using synthetic datasets from Semantic Diffusion Models (SDMs) to enhance VLSMs for echocardiography segmentation. We evaluate results for two popular VLSMs (CLIPSeg and CRIS) using seven different kinds of language prompts derived from several attributes, automatically extracted from echocardiography images, segmentation masks, and their metadata. Our results show improved metrics and faster convergence when pretraining VLSMs on SDM-generated synthetic images before finetuning on real images. The code, configs, and prompts are available at https://github.com/naamiinepal/synthetic-boost.

Submitted 22 September, 2023; originally announced September 2023.

Comments: Accepted at the 4th International Workshop of Advances in Simplifying Medical UltraSound (ASMUS)

arXiv:2308.07706

Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models

Authors: Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal

Abstract: Medical image segmentation allows quantifying target structure size and shape, aiding in disease diagnosis, prognosis, surgery planning, and comprehension. Building upon recent advancements in foundation Vision-Language Models (VLMs) from natural image-text pairs, several studies have proposed adapting them to Vision-Language Segmentation Models (VLSMs) that allow using language text as an additional input to segmentation models. Introducing auxiliary information via text with human-in-the-loop prompting during inference opens up unique opportunities, such as open vocabulary segmentation and potentially more robust segmentation models against out-of-distribution data. Although transfer learning from natural to medical images has been explored for image-only segmentation models, the joint representation of vision-language in segmentation problems remains underexplored. This work presents the first systematic study on transferring VLSMs to 2D medical images, using 11 carefully curated datasets encompassing diverse modalities, along with insightful language prompts and experiments. Our findings demonstrate that although VLSMs show competitive performance compared to image-only models for segmentation after finetuning in limited medical image datasets, not all VLSMs utilize the additional information from language prompts, with image features playing a dominant role. While VLSMs exhibit enhanced performance in handling pooled datasets with diverse modalities and show potential robustness to domain shifts compared to conventional segmentation models, our results suggest that novel approaches are required to enable VLSMs to leverage the various auxiliary information available through language prompts. The code and datasets are available at https://github.com/naamiinepal/medvlsm.

Submitted 20 June, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: Medical Imaging with Deep Learning (MIDL) 2024 (Oral)

arXiv:2304.05339

Deep-learning assisted detection and quantification of (oo)cysts of Giardia and Cryptosporidium on smartphone microscopy images

Authors: Suprim Nakarmi, Sanam Pudasaini, Safal Thapaliya, Pratima Upretee, Retina Shrestha, Basant Giri, Bhanu Bhakta Neupane, Bishesh Khanal

Abstract: The consumption of microbial-contaminated food and water is responsible for the deaths of millions of people annually. Smartphone-based microscopy systems are portable, low-cost, and more accessible alternatives for the detection of Giardia and Cryptosporidium than traditional brightfield microscopes. However, the images from smartphone microscopes are noisier and require manual cyst identification by trained technicians, usually unavailable in resource-limited settings. Automatic detection of (oo)cysts using deep-learning-based object detection could offer a solution for this limitation. We evaluate the performance of three state-of-the-art object detectors to detect (oo)cysts of Giardia and Cryptosporidium on a custom dataset that includes both smartphone and brightfield microscopic images from vegetable samples. Faster RCNN, RetinaNet, and you only look once (YOLOv8s) deep-learning models were employed to explore their efficacy and limitations. Our results show that while the deep-learning models perform better with the brightfield microscopy image dataset than the smartphone microscopy image dataset, the smartphone microscopy predictions are still comparable to the prediction performance of non-experts.

Submitted 11 April, 2023; originally announced April 2023.

Comments: 18 pages (including supplementary information), 4 figures, 7 tables, submitting to Journal of Machine Learning for Biomedical Imaging

arXiv:2210.05425

COVID-19-related Nepali Tweets Classification in a Low Resource Setting

Authors: Rabin Adhikari, Safal Thapaliya, Nirajan Basnet, Samip Poudel, Aman Shakya, Bishesh Khanal

Abstract: Billions of people across the globe have been using social media platforms in their local languages to voice their opinions about various topics related to the COVID-19 pandemic. Several organizations, including the World Health Organization, have developed automated social media analysis tools that classify COVID-19-related tweets into various topics. However, these tools that help combat the pandemic are limited to very few languages, leaving several countries unable to benefit from them. While multi-lingual or low-resource language-specific tools are being developed, they still need to expand their coverage, such as for the Nepali language. In this paper, we identify the eight most common COVID-19 discussion topics among the Twitter community using the Nepali language, set up an online platform to automatically gather Nepali tweets containing COVID-19-related keywords, classify the tweets into the eight topics, and visualize the results over time in a web-based dashboard. We compare the performance of two state-of-the-art multi-lingual language models for Nepali tweet classification, one generic (mBERT) and the other a Nepali language family-specific model (MuRIL). Our results show that the models' relative performance depends on the data size, with MuRIL doing better for a larger dataset. The annotated data, models, and the web-based dashboard are open-sourced at https://github.com/naamiinepal/covid-tweet-classification.

Submitted 11 October, 2022; originally announced October 2022.

Comments: Accepted at the 7th Social Media Mining for Health (#SMM4H) Workshop, co-located at Coling 2022

arXiv:1907.01515

doi 10.4018/978-1-5225-7467-5.ch002

Electroencephalogram (EEG) for Delineating Objective Measure of Autism Spectrum Disorder (ASD) (Extended Version)

Authors: Yasith Jayawardana, Mark Jaime, Sashi Thapaliya, Sampath Jayarathna

Abstract: Autism Spectrum Disorder (ASD) is a developmental disorder that often impairs a child's normal development of the brain. According to the CDC, it is estimated that 1 in 6 children in the US suffer from developmental disorders, and 1 in 68 children in the US suffer from ASD. This condition has a negative impact on a person's ability to hear, socialize and communicate. Overall, ASD has a broad range of symptoms and severity; hence the term spectrum is used. One of the main contributors to ASD is known to be genetics. To date, no suitable cure for ASD has been found. Early diagnosis is crucial for the long-term treatment of ASD, but this is challenging due to the lack of proper objective measures. Subjective measures often take more time and resources, and have false positives or false negatives. There is a need for efficient objective measures that can help in diagnosing this disease as early as possible with less effort. EEG measures the electric signals of the brain via electrodes placed at various places on the scalp. These signals can be used to study complex neuropsychiatric issues. Studies have shown that EEG has the potential to be used as a biomarker for various neurological conditions including ASD. This chapter will outline the usage of EEG measurement for the classification of ASD using machine learning algorithms.

Submitted 25 June, 2019; originally announced July 2019.

Showing 1–6 of 6 results for author: Thapaliya, S