Search | arXiv e-print repository

arXiv:2307.02783 [pdf, other]

UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering

Authors: Triet M. Thai, Anh T. Vo, Hao K. Tieu, Linh N. P. Bui, Thien T. B. Nguyen

Abstract: In recent years, artificial intelligence has played an important role in medicine and disease diagnosis, with many applications to be mentioned, one of which is Medical Visual Question Answering (MedVQA). By combining computer vision and natural language processing, MedVQA systems can assist experts in extracting relevant information from medical image based on a given question and providing preci… ▽ More In recent years, artificial intelligence has played an important role in medicine and disease diagnosis, with many applications to be mentioned, one of which is Medical Visual Question Answering (MedVQA). By combining computer vision and natural language processing, MedVQA systems can assist experts in extracting relevant information from medical image based on a given question and providing precise diagnostic answers. The ImageCLEFmed-MEDVQA-GI-2023 challenge carried out visual question answering task in the gastrointestinal domain, which includes gastroscopy and colonoscopy images. Our team approached Task 1 of the challenge by proposing a multimodal learning method with image enhancement to improve the VQA performance on gastrointestinal images. The multimodal architecture is set up with BERT encoder and different pre-trained vision models based on convolutional neural network (CNN) and Transformer architecture for features extraction from question and endoscopy image. The result of this study highlights the dominance of Transformer-based vision models over the CNNs and demonstrates the effectiveness of the image enhancement process, with six out of the eight vision models achieving better F1-Score. Our best method, which takes advantages of BERT+BEiT fusion and image enhancement, achieves up to 87.25% accuracy and 91.85% F1-Score on the development test set, while also producing good result on the private test set with accuracy of 82.01%. △ Less

Submitted 19 November, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: ImageCLEF2023 published version: https://ceur-ws.org/Vol-3497/paper-129.pdf

arXiv:2209.04794 [pdf, other]

doi 10.1371/journal.pone.0276545

Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese

Authors: Thao T. B. Nguyen, Tam M. Vo, Thang V. Nguyen, Hieu H. Pham, Ha Q. Nguyen

Abstract: We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country. To assess the efficacy of the proposed labeling technique, we… ▽ More We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country. To assess the efficacy of the proposed labeling technique, we built a CXR dataset containing 9,752 studies and evaluated our pipeline using a subset of this dataset. With an F1-score of at least 0.9923, the evaluation demonstrates that our labeling tool performs precisely and consistently across all classes. After building the dataset, we train deep learning models that leverage knowledge transferred from large public CXR datasets. We employ a variety of loss functions to overcome the curse of imbalanced multi-label datasets and conduct experiments with various model architectures to select the one that delivers the best performance. Our best model (CheXpert-pretrained EfficientNet-B2) yields an F1-score of 0.6989 (95% CI 0.6740, 0.7240), AUC of 0.7912, sensitivity of 0.7064 and specificity of 0.8760 for the abnormal diagnosis in general. Finally, we demonstrate that our coarse classification (based on five specific locations of abnormalities) yields comparable results to fine classification (twelve pathologies) on the benchmark CheXpert dataset for general anomaly detection while delivering better performance in terms of the average performance of all classes. △ Less

Submitted 11 September, 2022; originally announced September 2022.

Comments: This work has been provisionally accepted for publication by Plos One journal

arXiv:2208.03403 [pdf, other]

Slice-level Detection of Intracranial Hemorrhage on CT Using Deep Descriptors of Adjacent Slices

Authors: Dat T. Ngo, Thao T. B. Nguyen, Hieu T. Nguyen, Dung B. Nguyen, Ha Q. Nguyen, Hieu H. Pham

Abstract: The rapid development in representation learning techniques such as deep neural networks and the availability of large-scale, well-annotated medical imaging datasets have to a rapid increase in the use of supervised machine learning in the 3D medical image analysis and diagnosis. In particular, deep convolutional neural networks (D-CNNs) have been key players and were adopted by the medical imagin… ▽ More The rapid development in representation learning techniques such as deep neural networks and the availability of large-scale, well-annotated medical imaging datasets have to a rapid increase in the use of supervised machine learning in the 3D medical image analysis and diagnosis. In particular, deep convolutional neural networks (D-CNNs) have been key players and were adopted by the medical imaging community to assist clinicians and medical experts in disease diagnosis and treatment. However, training and inferencing deep neural networks such as D-CNN on high-resolution 3D volumes of Computed Tomography (CT) scans for diagnostic tasks pose formidable computational challenges. This challenge raises the need of develo** deep learning-based approaches that are robust in learning representations in 2D images, instead 3D scans. In this work, we propose for the first time a new strategy to train \emph{slice-level} classifiers on CT scans based on the descriptors of the adjacent slices along the axis. In particular, each of which is extracted through a convolutional neural network (CNN). This method is applicable to CT datasets with per-slice labels such as the RSNA Intracranial Hemorrhage (ICH) dataset, which aims to predict the presence of ICH and classify it into 5 different sub-types. We obtain a single model in the top 4% best-performing solutions of the RSNA ICH challenge, where model ensembles are allowed. Experiments also show that the proposed method significantly outperforms the baseline model on CQ500. The proposed method is general and can be applied to other 3D medical diagnosis tasks such as MRI imaging. To encourage new advances in the field, we will make our codes and pre-trained model available upon acceptance of the paper. △ Less

Submitted 17 April, 2023; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: Accepted for presentation at the 22nd IEEE Statistical Signal Processing (SSP) workshop

Showing 1–3 of 3 results for author: Nguyen, T T B