-
Policy Gradient-Driven Noise Mask
Authors:
Mehmet Can Yavuz,
Yang Yang
Abstract:
Deep learning classifiers face significant challenges when dealing with heterogeneous multi-modal and multi-organ biomedical datasets. The low-level feature distinguishability limited to imaging-modality hinders the classifiers' ability to learn high-level semantic relationships, resulting in sub-optimal performance. To address this issue, image augmentation strategies are employed as regularization techniques. While additive noise input during network training is a well-established augmentation-as-regularization method, modern pipelines often favor more robust techniques such as dropout and weight decay. This preference stems from the observation that combining these established techniques with noise input can adversely affect model performance.
In this study, we propose a novel pretraining pipeline that learns to generate conditional noise masks specifically tailored to improve performance on multi-modal and multi-organ datasets. Framed as a reinforcement learning problem, our approach employs a dual-component system: a very lightweight policy network that learns to sample conditional noise from a differentiable beta distribution, and a classifier network. The policy network is trained using the REINFORCE algorithm to generate image-specific noise masks that regularize the classifier during pretraining. A key aspect is that the policy network's role is limited to obtaining an intermediate (or heated) model before fine-tuning. During inference, the policy network is omitted, allowing direct comparison between the baseline and noise-regularized models.
We conducted experiments and related analyses on RadImageNet datasets. Results demonstrate that fine-tuning the intermediate models consistently outperforms conventional training algorithms on both classification and generalization to unseen concept tasks.
Submitted 29 April, 2024;
originally announced June 2024.
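The abstract above describes a policy that samples noise from a beta distribution and is trained with REINFORCE. The following is a minimal scalar sketch of that idea only, not the authors' implementation: the policy here is two free parameters rather than a network, the mask is a single number rather than an image-sized mask, and `toy_reward`, the moving-average baseline, the learning rate, and the parameter clamping are all assumptions introduced for illustration.

```python
import math
import random


def digamma(x, h=1e-5):
    """Approximate the digamma function as a central difference of log-gamma."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)


def beta_log_prob(m, a, b):
    """Log-density of Beta(a, b) evaluated at m in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(m) + (b - 1) * math.log(1 - m) - log_norm


def score(m, a, b):
    """Gradient of beta_log_prob with respect to (a, b)."""
    common = digamma(a + b)
    return (math.log(m) - digamma(a) + common,
            math.log(1 - m) - digamma(b) + common)


def toy_reward(m, target=0.8):
    """Hypothetical reward standing in for the classifier's feedback."""
    return -(m - target) ** 2


def reinforce(steps=2000, lr=0.05, seed=0):
    """Run REINFORCE updates on log(a), log(b) and return the final (a, b)."""
    rng = random.Random(seed)
    log_a, log_b = 0.0, 0.0   # start from Beta(1, 1), the uniform distribution
    baseline = 0.0            # moving-average baseline to reduce variance
    for _ in range(steps):
        a, b = math.exp(log_a), math.exp(log_b)
        # Sample a mask value and keep it strictly inside (0, 1) for the logs.
        m = min(max(rng.betavariate(a, b), 1e-6), 1 - 1e-6)
        reward = toy_reward(m)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        grad_a, grad_b = score(m, a, b)
        # Chain rule: d/d(log a) = a * d/da. Clamp to keep parameters bounded.
        log_a = min(max(log_a + lr * advantage * grad_a * a, -3.0), 3.0)
        log_b = min(max(log_b + lr * advantage * grad_b * b, -3.0), 3.0)
    return math.exp(log_a), math.exp(log_b)
```

Calling `reinforce()` returns the learned `(a, b)` pair; the mean of the resulting distribution is `a / (a + b)`. In the setting the abstract describes, the reward would come from the classifier rather than from a fixed target.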
-
Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography
Authors:
Jie Liu,
Yixiao Zhang,
Kang Wang,
Mehmet Can Yavuz,
Xiaoxi Chen,
Yixuan Yuan,
Haoliang Li,
Yang Yang,
Alan Yuille,
Yucheng Tang,
Zongwei Zhou
Abstract:
The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and learning scheme. To overcome these limitations, we propose a universal, extensible framework enabling a single model, termed Universal Model, to deal with multiple public datasets and adapt to new classes (e.g., organs/tumors). First, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models, enriching semantic encoding compared with one-hot encoding. Second, the conventional output layers are replaced with lightweight, class-specific heads, allowing Universal Model to simultaneously segment 25 organs and six types of tumors and easing the addition of new classes. We train our Universal Model on 3,410 CT volumes assembled from 14 publicly available datasets and then test it on 6,173 CT volumes from four external datasets. Universal Model achieves first place on six CT tasks in the Medical Segmentation Decathlon (MSD) public leaderboard and leading performance on the Beyond The Cranial Vault (BTCV) dataset. In summary, Universal Model exhibits remarkable computational efficiency (6x faster than other dataset-specific models), demonstrates strong generalization across different hospitals, transfers well to numerous downstream tasks, and, more importantly, facilitates extensibility to new classes while alleviating the catastrophic forgetting of previously learned classes. Code, models, and datasets are available at https://github.com/ljwztc/CLIP-Driven-Universal-Model
Submitted 28 May, 2024;
originally announced May 2024.
-
Variational Self-Supervised Contrastive Learning Using Beta Divergence
Authors:
Mehmet Can Yavuz,
Berrin Yanikoglu
Abstract:
Learning a discriminative semantic space using unlabelled and noisy data remains unaddressed in a multi-label setting. We present a contrastive self-supervised learning method which is robust to data noise, grounded in the domain of variational methods. The method (VCL) utilizes variational contrastive learning with beta-divergence to learn robustly from unlabelled datasets, including uncurated and noisy datasets. We demonstrate the effectiveness of the proposed method through rigorous experiments including linear evaluation and fine-tuning scenarios with multi-label datasets in the face understanding domain. In almost all tested scenarios, VCL surpasses the performance of state-of-the-art self-supervised methods, achieving a noteworthy increase in accuracy.
Submitted 8 May, 2024; v1 submitted 5 September, 2023;
originally announced December 2023.
-
COVID-19 Detection in Computed Tomography Images with 2D and 3D Approaches
Authors:
Sara Atito Ali Ahmed,
Mehmet Can Yavuz,
Mehmet Umut Sen,
Fatih Gulsen,
Onur Tutar,
Bora Korkmazer,
Cesur Samanci,
Sabri Sirolu,
Rauf Hamid,
Ali Ergun Eryurekli,
Toghrul Mammadov,
Berrin Yanikoglu
Abstract:
Detecting COVID-19 in computed tomography (CT) or radiography images has been proposed as a supplement to the definitive RT-PCR test. We present a deep learning ensemble for detecting COVID-19 infection, combining slice-based (2D) and volume-based (3D) approaches. The 2D system detects the infection on each CT slice independently, combining the slice-level results to obtain the patient-level decision via different methods (averaging and long short-term memory networks). The 3D system takes the whole CT volume to arrive at the patient-level decision in one step. A new high-resolution chest CT scan dataset, called the IST-C dataset, is also collected in this work. The proposed ensemble, called IST-CovNet, obtains 90.80% accuracy and 0.95 AUC score overall on the IST-C dataset in detecting COVID-19 among normal controls and other types of lung pathologies; and 93.69% accuracy and 0.99 AUC score on the publicly available MosMed dataset that consists of COVID-19 scans and normal controls only. The system is deployed at Istanbul University Cerrahpasa School of Medicine.
Submitted 20 May, 2021; v1 submitted 16 May, 2021;
originally announced May 2021.
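The averaging route from slice-level outputs to a patient-level decision, mentioned in the abstract above, can be sketched as follows. This is an illustrative assumption-laden example rather than the IST-CovNet code: the function names, the use of per-slice probabilities as input, and the 0.5 decision threshold are hypothetical, and the LSTM-based combination is not shown.

```python
from statistics import fmean


def patient_score(slice_probs):
    """Average per-slice infection probabilities into one patient-level score."""
    if not slice_probs:
        raise ValueError("at least one slice probability is required")
    return fmean(slice_probs)


def patient_decision(slice_probs, threshold=0.5):
    """Flag the patient as positive when the averaged score reaches the threshold.

    The threshold value is an assumption for illustration only.
    """
    return patient_score(slice_probs) >= threshold
```

For example, `patient_decision([0.9, 0.8, 0.7])` flags the scan as positive, while a scan whose slices all score low is not flagged. Averaging treats slices as exchangeable, whereas a sequence model such as an LSTM can use slice order.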
-
Multilingual, Temporal and Sentimental Distant-Reading of City Events
Authors:
Mehmet Can Yavuz
Abstract:
Leibniz's Monadology mentions perceptual and sentimental variations of the individual in the city, arising from the interaction of people with other people and with events. Film festivals are highly sentimental events of multicultural cities. Each movie has a different sentimental effect, and interactions with the movies leave reflections that can be observed on social media. This analysis applies distant reading to Berlinale tweets collected during the festival. In contrast to close reading, distant reading lets authors observe patterns in large collections of data. The analysis is temporal and sentimental in a multilingual domain, and strongly positive and negative time intervals are analysed. For this purpose, we trained a deep sentiment network with multilingual embeddings that are aligned in latent space. We trained the network on a multilingual dataset in three languages: English, German and Spanish. The trained model has a 0.78 test score and was applied to tweets with the Berlinale hashtag posted during the festival. Although the sentiment analysis does not reflect the award-winning films, we observe a weekly routine in sentimentality, which can mislead a close-reading analysis. We also remark on the popularity of the director or actors.
Submitted 4 January, 2021;
originally announced February 2021.