Search | arXiv e-print repository

Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs

Authors: Huaying Zhang, Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: This paper proposes a novel zero-shot composed image retrieval (CIR) method considering the query-target relationship by masked image-text pairs. The objective of CIR is to retrieve the target image using a query image and a query text. Existing methods use a textual inversion network to convert the query image into a pseudo word to compose the image and text and use a pre-trained visual-language model to realize the retrieval. However, they do not consider the query-target relationship to train the textual inversion network to acquire information for retrieval. In this paper, we propose a novel zero-shot CIR method that is trained end-to-end using masked image-text pairs. By exploiting the abundant image-text pairs that are convenient to obtain with a masking strategy for learning the query-target relationship, it is expected that accurate zero-shot CIR using a retrieval-focused textual inversion network can be realized. Experimental results show the effectiveness of the proposed method.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted as a conference paper in IEEE ICIP 2024

arXiv:2406.13316

Reinforcing Pre-trained Models Using Counterfactual Images

Authors: Xiang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Abstract: This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. Deep learning classification models are often trained using datasets that mirror real-world scenarios. In this training process, because learning is based solely on correlations with labels, there is a risk that models may learn spurious relationships, such as an overreliance on features not central to the subject, like background elements in images. However, due to the black-box nature of the decision-making process in deep learning models, identifying and addressing these vulnerabilities has been particularly challenging. We introduce a novel framework for reinforcing the classification models, which consists of a two-stage process. First, we identify model weaknesses by testing the model using the counterfactual image dataset, which is generated by perturbed image captions. Subsequently, we employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model. Through extensive experiments on several classification models across various datasets, we revealed that fine-tuning with a small set of counterfactual images effectively strengthens the model.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures

arXiv:2404.17732

Generative Dataset Distillation: Balancing Global Structure and Local Details

Authors: Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Abstract: In this paper, we propose a new dataset distillation method that considers balancing global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the required dataset when training models. The conventional dataset distillation methods face the problem of long redeployment time and poor cross-architecture performance. Moreover, previous methods focused too much on the high-level semantic attributes between the synthetic dataset and the original dataset while ignoring the local features such as texture and shape. Based on the above understanding, we propose a new method for distilling the original image dataset into a generative model. Our method involves using a conditional generative adversarial network to generate the distilled dataset. Subsequently, we ensure balancing global structure and local details in the distillation process, continuously optimizing the generator for more information-dense dataset generation.

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted by the 1st CVPR Workshop on Dataset Distillation

arXiv:2403.18258

Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach

Authors: Taro Togo, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Abstract: This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism, aimed at dynamically managing class information for better adaptation to streaming data. GCIL is one of the hot topics in the field of computer vision, and this is considered one of the crucial tasks in society, specifically the continual learning of generative models. The ability to forget is a crucial brain function that facilitates continual learning by selectively discarding less relevant information for humans. However, in the field of machine learning models, the concept of intentionally forgetting has not been extensively investigated. In this study, we aim to bridge this gap by incorporating the forgetting mechanisms into GCIL, thereby examining their impact on the models' ability to learn in continual learning. Through our experiments, we have found that integrating the forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge, underscoring the positive role that strategic forgetting plays in the process of continual learning.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2402.09677

Prompt-based Personalized Federated Learning for Medical Visual Question Answering

Authors: He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: We present a novel prompt-based personalized federated learning (pFL) method to address data heterogeneity and privacy concerns in traditional medical visual question answering (VQA) methods. Specifically, we regard medical datasets from different organs as clients and use pFL to train personalized transformer-based VQA models for each client. To address the high computational complexity of client-to-client communication in previous pFL methods, we propose a succinct information sharing system by introducing prompts that are small learnable parameters. In addition, the proposed method introduces a reliability parameter to prevent the negative effects of low performance and irrelevant clients. Finally, extensive evaluations on various heterogeneous medical datasets attest to the effectiveness of our proposed method.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.15863

Importance-Aware Adaptive Dataset Distillation

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Herein, we propose a novel dataset distillation method for constructing small informative datasets that preserve the information of the large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite unprecedented success, large-scale datasets considerably increase the storage and transmission costs, resulting in a cumbersome model training process. Moreover, using raw data for training raises privacy and copyright concerns. To address these issues, a new task named dataset distillation has been introduced, aiming to synthesize a compact dataset that retains the essential information from the large original dataset. State-of-the-art (SOTA) dataset distillation methods have been proposed by matching gradients or network parameters obtained during training on real and synthetic datasets. The contribution of different network parameters to the distillation process varies, and uniformly treating them leads to degraded distillation performance. Based on this observation, we propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance by automatically assigning importance weights to different network parameters during distillation, thereby synthesizing more robust distilled datasets. IADD demonstrates superior performance over other SOTA dataset distillation methods based on parameter matching on multiple benchmark datasets and outperforms them in terms of cross-architecture generalization. In addition, the analysis of self-adaptive weights demonstrates the effectiveness of IADD. Furthermore, the effectiveness of IADD is validated in a real-world medical application such as COVID-19 detection.

Submitted 28 January, 2024; originally announced January 2024.

Comments: Published as a journal paper in Elsevier Neural Networks

arXiv:2303.04388

Interpretable Visual Question Answering Referring to Outside Knowledge

Authors: He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: We present a novel multimodal interpretable VQA model that can answer the question more accurately and generate diverse explanations. Although researchers have proposed several methods that can generate human-readable and fine-grained natural language sentences to explain a model's decision, these methods have focused solely on the information in the image. Ideally, the model should refer to various information inside and outside the image to correctly generate explanations, just as we use background knowledge daily. The proposed method incorporates information from outside knowledge and multiple image captions to increase the diversity of information available to the model. The contribution of this paper is to construct an interpretable visual question answering model using multimodal inputs to improve the rationality of generated results. Experimental results show that our model can outperform state-of-the-art methods regarding answer accuracy and explanation rationality.

Submitted 8 March, 2023; originally announced March 2023.

Comments: Under review

arXiv:2212.09281

Boosting Automatic COVID-19 Detection Performance with Self-Supervised Learning and Batch Knowledge Ensembling

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Problem: Detecting COVID-19 from chest X-Ray (CXR) images has become one of the fastest and easiest methods for detecting COVID-19. However, the existing methods usually use supervised transfer learning from natural images as a pretraining process. These methods do not consider the unique features of COVID-19 and the similar features between COVID-19 and other pneumonia. Aim: In this paper, we want to design a novel high-accuracy COVID-19 detection method that uses CXR images, which can consider the unique features of COVID-19 and the similar features between COVID-19 and other pneumonia. Methods: Our method consists of two phases. One is self-supervised learning-based pretraining; the other is batch knowledge ensembling-based fine-tuning. Self-supervised learning-based pretraining can learn distinguished representations from CXR images without manually annotated labels. On the other hand, batch knowledge ensembling-based fine-tuning can utilize category knowledge of images in a batch according to their visual feature similarities to improve detection performance. Unlike our previous implementation, we introduce batch knowledge ensembling into the fine-tuning phase, reducing the memory used in self-supervised learning and improving COVID-19 detection accuracy. Results: On two public COVID-19 CXR datasets, namely, a large dataset and an unbalanced dataset, our method exhibited promising COVID-19 detection performance. Our method maintains high detection accuracy even when annotated CXR training images are reduced significantly (e.g., using only 10% of the original dataset). In addition, our method is insensitive to changes in hyperparameters.

Submitted 30 March, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Published as a journal paper at Elsevier CIBM

arXiv:2212.09276

COVID-19 Detection Based on Self-Supervised Transfer Learning Using Chest X-Ray Images

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Purpose: Considering several patients screened due to the COVID-19 pandemic, computer-aided detection has strong potential in assisting clinical workflow efficiency and reducing the incidence of infections among radiologists and healthcare providers. Since many confirmed COVID-19 cases present radiological findings of pneumonia, radiologic examinations can be useful for fast detection. Therefore, chest radiography can be used to quickly screen for COVID-19 during patient triage, thereby determining the priority of patients' care to help saturated medical facilities in a pandemic situation. Methods: In this paper, we propose a new learning scheme called self-supervised transfer learning for detecting COVID-19 from chest X-ray (CXR) images. We compared six self-supervised learning (SSL) methods (Cross, BYOL, SimSiam, SimCLR, PIRL-jigsaw, and PIRL-rotation) with the proposed method. Additionally, we compared six pretrained DCNNs (ResNet18, ResNet50, ResNet101, CheXNet, DenseNet201, and InceptionV3) with the proposed method. We provide quantitative evaluation on the largest open COVID-19 CXR dataset and qualitative results for visual inspection. Results: Our method achieved a harmonic mean (HM) score of 0.985, AUC of 0.999, and four-class accuracy of 0.953. We also used the visualization technique Grad-CAM++ to generate visual explanations of different classes of CXR images with the proposed method to increase the interpretability. Conclusions: Our method shows that the knowledge learned from natural images using transfer learning is beneficial for SSL of the CXR images and boosts the performance of representation learning for COVID-19 detection. Our method promises to reduce the incidence of infections among radiologists and healthcare providers.

Submitted 19 December, 2022; originally announced December 2022.

Comments: Published as a journal paper at Springer IJCARS

arXiv:2212.02785

doi 10.1007/978-3-031-19818-2_33

Union-set Multi-source Model Adaptation for Semantic Segmentation

Authors: Zongyao Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: This paper solves a generalized version of the problem of multi-source model adaptation for semantic segmentation. Model adaptation is proposed as a new domain adaptation problem which requires access to a pre-trained model instead of data for the source domain. A general multi-source setting of model adaptation assumes strictly that each source domain shares a common label space with the target domain. As a relaxation, we allow the label space of each source domain to be a subset of that of the target domain and require the union of the source-domain label spaces to be equal to the target-domain label space. For the new setting named union-set multi-source model adaptation, we propose a method with a novel learning strategy named model-invariant feature learning, which takes full advantage of the diverse characteristics of the source-domain models, thereby improving the generalization in the target domain. We conduct extensive experiments in various adaptation settings to show the superiority of our method. The code is available at https://github.com/lzy7976/union-set-model-adaptation.

Submitted 6 December, 2022; originally announced December 2022.

Comments: Accepted by ECCV2022
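
Note: the label-space condition stated in the abstract above (each source label space is a subset of the target label space, and their union equals it) can be written as a short check. This is only an illustrative sketch and not the authors' code: the function name `is_union_set_setting` and the class names are hypothetical, and label spaces are assumed to be plain sets of class names.

```python
def is_union_set_setting(source_label_spaces, target_label_space):
    """Check the union-set condition on label spaces.

    Returns True when every source label space is a subset of the target
    label space and the union of all source label spaces equals the target.
    """
    target = set(target_label_space)
    sources = [set(space) for space in source_label_spaces]
    all_subsets = all(space <= target for space in sources)
    union = set().union(*sources)
    return all_subsets and union == target


# Hypothetical class names, for illustration only.
target = {"road", "car", "person", "sky"}
sources = [{"road", "car"}, {"person", "sky", "road"}]
print(is_union_set_setting(sources, target))
```

Source label spaces may overlap (here both contain "road"); the condition only constrains coverage of the target label space.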

arXiv:2211.00313

RGMIM: Region-Guided Masked Image Modeling for Learning Meaningful Representation from X-Ray Images

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Purpose: Self-supervised learning has been gaining attention in the medical field for its potential to improve computer-aided diagnosis. One popular method of self-supervised learning is masked image modeling (MIM), which involves masking a subset of input pixels and predicting the masked pixels. However, traditional MIM methods typically use a random masking strategy, which may not be ideal for medical images that often have a small region of interest for disease detection. To address this issue, this work aims to improve MIM for medical images and evaluate its effectiveness in an open X-ray image dataset. Methods: In this paper, we present a novel method called region-guided masked image modeling (RGMIM) for learning meaningful representation from X-ray images. Our method adopts a new masking strategy that utilizes organ mask information to identify valid regions for learning more meaningful representations. The proposed method was contrasted with five self-supervised learning techniques (MAE, SKD, Cross, BYOL, and SimSiam). We conduct quantitative evaluations on an open lung X-ray image dataset as well as masking ratio hyperparameter studies. Results: When using the entire training set, RGMIM outperformed other comparable methods, achieving a 0.962 lung disease detection accuracy. Specifically, RGMIM significantly improved performance in small data volumes, such as 5% and 10% of the training set (846 and 1,693 images) compared to other methods, and achieved a 0.957 detection accuracy even when only 50% of the training set was used. Conclusions: RGMIM can mask more valid regions, facilitating the learning of discriminative representations and the subsequent high-accuracy lung disease detection. RGMIM outperforms other state-of-the-art self-supervised learning methods in experiments, particularly when limited training data is used.

Submitted 21 May, 2023; v1 submitted 1 November, 2022; originally announced November 2022.
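
Note: the abstract above describes a masking strategy that uses organ mask information to restrict masking to valid regions. The sketch below shows one possible reading of that idea and is not the RGMIM implementation: the function name `region_guided_patch_mask`, the rule that a patch counts as valid when any of its pixels lies inside the organ, and the parameter names are all assumptions made for illustration.

```python
import random


def region_guided_patch_mask(organ_mask, patch_size, mask_ratio, seed=0):
    """Pick patches to mask only among patches that overlap the organ region.

    organ_mask: 2D list of 0/1 values (1 = pixel inside the organ).
    Returns a sorted list of (patch_row, patch_col) coordinates to mask.
    """
    height, width = len(organ_mask), len(organ_mask[0])
    valid = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            pixels = [organ_mask[r][c]
                      for r in range(top, top + patch_size)
                      for c in range(left, left + patch_size)]
            # Assumption: a patch is "valid" if it contains any organ pixel.
            if any(pixels):
                valid.append((top // patch_size, left // patch_size))
    # Mask a fixed fraction of the valid patches, chosen at random.
    num_masked = int(round(mask_ratio * len(valid)))
    rng = random.Random(seed)
    return sorted(rng.sample(valid, num_masked))


# Toy 4x4 organ mask: the organ occupies the left half of the image.
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
masked = region_guided_patch_mask(toy_mask, patch_size=2, mask_ratio=0.5)
print(masked)
```

In contrast to purely random masking, patches in the right half of the toy image can never be selected, because they contain no organ pixels.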

arXiv:2209.14743

doi 10.1007/s11042-022-13027-3

Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Dataset complexity assessment aims to predict classification performance on a dataset with complexity calculation before training a classifier, which can also be used for classifier selection and dataset reduction. The training process of deep convolutional neural networks (DCNNs) is iterative and time-consuming because of hyperparameter uncertainty and the domain shift introduced by different datasets. Hence, it is meaningful to predict classification performance by assessing the complexity of datasets effectively before training DCNN models. This paper proposes a novel method called cumulative maximum scaled Area Under Laplacian Spectrum (cmsAULS), which can achieve state-of-the-art complexity assessment performance on six datasets.

Submitted 29 September, 2022; originally announced September 2022.

Comments: Published as a journal paper at Springer MTAP

arXiv:2209.14635

doi 10.1016/j.cmpb.2022.107189

Compressed Gastric Image Generation Based on Soft-Label Dataset Distillation for Medical Data Sharing

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Background and objective: Sharing of medical data is required to enable the cross-agency flow of healthcare information and construct high-accuracy computer-aided diagnosis systems. However, the large sizes of medical datasets, the massive amount of memory of saved deep convolutional neural network (DCNN) models, and patients' privacy protection are problems that can lead to inefficient medical data sharing. Therefore, this study proposes a novel soft-label dataset distillation method for medical data sharing. Methods: The proposed method distills valid information of medical image data and generates several compressed images with different data distributions for anonymous medical data sharing. Furthermore, our method can extract essential weights of DCNN models to reduce the memory required to save trained models for efficient medical data sharing. Results: The proposed method can compress tens of thousands of images into several soft-label images and reduce the size of a trained model to a few hundredths of its original size. The compressed images obtained after distillation have been visually anonymized; therefore, they do not contain the private information of the patients. Furthermore, we can realize high-detection performance with a small number of compressed images. Conclusions: The experimental results show that the proposed method can improve the efficiency and security of medical data sharing.

Submitted 1 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Published as a journal paper at Elsevier CMPB

arXiv:2209.14609

Dataset Distillation Using Parameter Pruning

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: In this study, we propose a novel dataset distillation method based on parameter pruning. The proposed method can synthesize more robust distilled datasets and improve distillation performance by pruning difficult-to-match parameters during the distillation process. Experimental results on two benchmark datasets show the superiority of the proposed method.

Submitted 20 August, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Published as a journal paper at IEICE Trans. Fund

arXiv:2209.14603

Dataset Distillation for Medical Dataset Sharing

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Sharing medical datasets between hospitals is challenging because of the privacy-protection problem and the massive cost of transmitting and storing many high-resolution medical images. However, dataset distillation can synthesize a small dataset such that models trained on it achieve comparable performance with the original large dataset, which shows potential for solving the existing medical sharing problems. Hence, this paper proposes a novel dataset distillation-based method for medical dataset sharing. Experimental results on a COVID-19 chest X-ray image dataset show that our method can achieve high detection performance even using scarce anonymized chest X-ray images.

Submitted 23 December, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted by AAAI-23 Workshop on Representation Learning for Responsible Human-Centric AI

arXiv:2209.07007

Gromov-Wasserstein Autoencoders

Authors: Nao Nakagawa, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Variational Autoencoder (VAE)-based generative models offer flexible representation learning by incorporating meta-priors, general premises considered beneficial for downstream tasks. However, the incorporated meta-priors often involve ad-hoc model deviations from the original likelihood architecture, causing undesirable changes in their training. In this paper, we propose a novel representation learning method, Gromov-Wasserstein Autoencoders (GWAE), which directly matches the latent and data distributions using the variational autoencoding scheme. Instead of likelihood-based objectives, GWAE models minimize the Gromov-Wasserstein (GW) metric between the trainable prior and given data distributions. The GW metric measures the distance structure-oriented discrepancy between distributions even with different dimensionalities, which provides a direct measure between the latent and data spaces. By restricting the prior family, we can introduce meta-priors into the latent space without changing their objective. The empirical comparisons with VAE-based models show that GWAE models work in two prominent meta-priors, disentanglement and clustering, with their GW objective unchanged.

Submitted 24 February, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: 38 pages, 9 tables, 13 figures; accepted at ICLR2023
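
Note: the abstract above states that the GW metric compares distance structures and therefore works across spaces of different dimensionality. The sketch below illustrates this with the standard discrete squared-loss GW cost for a fixed coupling; it is not the GWAE training objective or the authors' code. The helper names `pairwise_distances` and `gw_cost` are hypothetical, and the GW discrepancy itself is the minimum of this cost over all couplings with the prescribed marginals, which the sketch does not optimize.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix within one space."""
    return [[math.dist(p, q) for q in points] for p in points]


def gw_cost(dist_x, dist_y, coupling):
    """Squared-loss Gromov-Wasserstein cost for a fixed coupling:

    sum over i, j, k, l of (dist_x[i][k] - dist_y[j][l])**2
                           * coupling[i][j] * coupling[k][l]
    """
    n, m = len(dist_x), len(dist_y)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if coupling[i][j] == 0.0:
                continue  # zero-mass pairs contribute nothing
            for k in range(n):
                for l in range(m):
                    diff = dist_x[i][k] - dist_y[j][l]
                    total += diff * diff * coupling[i][j] * coupling[k][l]
    return total


# Two points in 2D ("data" space) and two points in 1D ("latent" space).
data_points = [(0.0, 0.0), (1.0, 0.0)]
latent_points = [(0.0,), (1.0,)]
coupling = [[0.5, 0.0], [0.0, 0.5]]  # match point i to point i

cost = gw_cost(pairwise_distances(data_points),
               pairwise_distances(latent_points),
               coupling)
print(cost)
```

Only within-space distances enter the cost, so the two point sets never need a common coordinate system; the cost vanishes when the coupled points have identical pairwise distances.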

arXiv:2206.03012

doi 10.1109/ICASSP43922.2022.9746967

TriBYOL: Triplet BYOL for Self-Supervised Representation Learning

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: This paper proposes a novel self-supervised learning method for learning better representations with small batch sizes. Many self-supervised learning methods based on certain forms of the siamese network have emerged and received significant attention. However, these methods need to use large batch sizes to learn good representations and require heavy computational resources. We present a new triplet network combined with a triple-view loss to improve the performance of self-supervised representation learning with small batch sizes. Experimental results show that our method can drastically outperform state-of-the-art self-supervised learning methods on several datasets in small-batch cases. Our method provides a feasible solution for self-supervised learning with real-world high-resolution images that uses small batch sizes.

Submitted 7 June, 2022; originally announced June 2022.

Comments: Published as a conference paper at ICASSP 2022

arXiv:2206.03009

doi 10.1109/ICASSP43922.2022.9746540

Self-Knowledge Distillation based Self-Supervised Learning for Covid-19 Detection from Chest X-Ray Images

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: The global outbreak of the Coronavirus 2019 (COVID-19) has overloaded worldwide healthcare systems. Computer-aided diagnosis for COVID-19 fast detection and patient triage is becoming critical. This paper proposes a novel self-knowledge distillation based self-supervised learning method for COVID-19 detection from chest X-ray images. Our method can use self-knowledge of images based on similarities of their visual features for self-supervised learning. Experimental results show that our method achieved an HM score of 0.988, an AUC of 0.999, and an accuracy of 0.957 on the largest open COVID-19 chest X-ray dataset.

Submitted 7 June, 2022; originally announced June 2022.

Comments: Published as a conference paper at ICASSP 2022

arXiv:2104.02864 [pdf, other]

Self-Supervised Learning for Gastritis Detection with Gastric X-ray Images

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: Purpose: Manual annotation of gastric X-ray images by doctors for gastritis detection is time-consuming and expensive. To solve this, a self-supervised learning method is developed in this study. The effectiveness of the proposed self-supervised learning method in gastritis detection is verified using a few annotated gastric X-ray images. Methods: In this study, we develop a novel method that can perform explicit self-supervised learning and learn discriminative representations from gastric X-ray images. Models trained based on the proposed method were fine-tuned on datasets comprising a few annotated gastric X-ray images. Five self-supervised learning methods, i.e., SimSiam, BYOL, PIRL-jigsaw, PIRL-rotation, and SimCLR, were compared with the proposed method. Furthermore, three previous methods, one pretrained on ImageNet, one trained from scratch, and one semi-supervised learning method, were compared with the proposed method. Results: The proposed method's harmonic mean score of sensitivity and specificity after fine-tuning with the annotated data of 10, 20, 30, and 40 patients were 0.875, 0.911, 0.915, and 0.931, respectively. The proposed method outperformed all comparative methods, including the five self-supervised learning and three previous methods. Experimental results showed the effectiveness of the proposed method in gastritis detection using a few annotated gastric X-ray images. Conclusions: This paper proposes a novel self-supervised learning method based on a teacher-student architecture for gastritis detection using gastric X-ray images. The proposed method can perform explicit self-supervised learning and learn discriminative representations from gastric X-ray images. The proposed method exhibits potential clinical use in gastritis detection using a few annotated gastric X-ray images.

Submitted 27 March, 2023; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: Published as a journal paper at Springer IJCARS
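
Note: the abstract above reports a harmonic mean score of sensitivity and specificity. As a minimal sketch of how such a score is conventionally computed (the function name and the input values below are illustrative assumptions, not taken from the paper):

```python
def harmonic_mean_score(sensitivity: float, specificity: float) -> float:
    """Harmonic mean of sensitivity and specificity.

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    total = sensitivity + specificity
    if total == 0.0:
        return 0.0
    return 2.0 * sensitivity * specificity / total


# Hypothetical values chosen only to illustrate the formula.
example_score = harmonic_mean_score(0.9, 0.8)
print(f"HM score: {example_score:.3f}")
```

The harmonic mean is pulled toward the smaller of the two values, so a high score requires both sensitivity and specificity to be high at once.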

arXiv:2104.02857 [pdf, other]

doi 10.1109/ICIP40778.2020.9191357

Soft-Label Anonymous Gastric X-ray Image Distillation

Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract: This paper presents a soft-label anonymous gastric X-ray image distillation method based on a gradient descent approach. The sharing of medical data is demanded to construct high-accuracy computer-aided diagnosis (CAD) systems. However, the large size of the medical dataset and privacy protection are remaining problems in medical data sharing, which hindered the research of CAD systems. The idea of our distillation method is to extract the valid information of the medical dataset and generate a tiny distilled dataset that has a different data distribution. Different from model distillation, our method aims to find the optimal distilled images, distilled labels and the optimized learning rate. Experimental results show that the proposed method can not only effectively compress the medical dataset but also anonymize medical images to protect the patient's private information. The proposed approach can improve the efficiency and security of medical data sharing.

Submitted 20 March, 2024; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: The first paper to explore real-world dataset distillation; Work was done in 2019 and published as a conference paper at ICIP 2020

Showing 1–20 of 20 results for author: Togo, R