Showing 1–4 of 4 results for author: Dhakal, M

  1. Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet

    Authors: Manish Dhakal, Arman Chhetri, Aman Kumar Gupta, Prabin Lamichhane, Suraj Pandey, Subarna Shakya

    Abstract: This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text. The model was trained and tested on the OpenSLR (audio, text) dataset. The majority of the audio dataset has silent gaps at both ends, which are clipped during dataset preprocessing for a more uniform mapping of audio frames and their corresponding texts. Mel Frequen…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted at 2022 International Conference on Inventive Computation Technologies (ICICT), IEEE

    Journal ref: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 515-521

  2. arXiv:2405.06196  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks

    Authors: Manish Dhakal, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal

    Abstract: Foundation Vision-Language Models (VLMs) trained using large-scale open-domain images and text pairs have recently been adapted to develop Vision-Language Segmentation Models (VLSMs) that allow providing text prompts during inference to guide image segmentation. If robust and powerful VLSMs can be built for medical images, it could aid medical professionals in many clinical tasks where they must s…

    Submitted 27 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted at MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention

  3. arXiv:2309.12829  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography

    Authors: Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari, Bishesh Khanal

    Abstract: Accurate segmentation is essential for echocardiography-based assessment of cardiovascular diseases (CVDs). However, the variability among sonographers and the inherent challenges of ultrasound images hinder precise segmentation. By leveraging the joint representation of image and text modalities, Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted at the 4th International Workshop of Advances in Simplifying Medical UltraSound (ASMUS)

  4. arXiv:2308.07706  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models

    Authors: Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal

    Abstract: Medical image segmentation allows quantifying target structure size and shape, aiding in disease diagnosis, prognosis, surgery planning, and comprehension. Building upon recent advancements in foundation Vision-Language Models (VLMs) from natural image-text pairs, several studies have proposed adapting them to Vision-Language Segmentation Models (VLSMs) that allow using language text as an addition…

    Submitted 20 June, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Medical Imaging with Deep Learning (MIDL) 2024 (Oral)