Showing 1–50 of 55 results for author: Ogawa, T

Searching in archive cs.

  1. arXiv:2406.18836  [pdf, other]

    cs.CV cs.IR

    Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs

    Authors: Huaying Zhang, Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper proposes a novel zero-shot composed image retrieval (CIR) method considering the query-target relationship by masked image-text pairs. The objective of CIR is to retrieve the target image using a query image and a query text. Existing methods use a textual inversion network to convert the query image into a pseudo word to compose the image and text and use a pre-trained visual-language… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted as a conference paper in IEEE ICIP 2024

  2. arXiv:2406.13316  [pdf, other]

    cs.CV cs.MM

    Reinforcing Pre-trained Models Using Counterfactual Images

    Authors: Xiang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. Deep learning classification models are often trained using datasets that mirror real-world scenarios. In this training process, because learning is based solely on correlations with labels, there is a risk that models may learn spurious relationships, such as an overreli… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures

  3. arXiv:2406.01033  [pdf]

    cs.CV cs.LG cs.MM

    Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement

    Authors: Yung-Hui Lin, Yu-Wen Chang, Huang-Chia Shih, Takahiro Ogawa

    Abstract: Jersey number recognition (JNR) has always been an important task in sports analytics. Improving recognition accuracy remains an ongoing challenge because images are subject to blurring, occlusion, deformity, and low resolution. Recent research has addressed these problems using number localization and optical character recognition. Some approaches apply player identification schemes to image sequ… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 6 figures, 5 tables

  4. arXiv:2404.17732  [pdf, other]

    cs.CV cs.AI cs.LG

    Generative Dataset Distillation: Balancing Global Structure and Local Details

    Authors: Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: In this paper, we propose a new dataset distillation method that considers balancing global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the required dataset when training models. The conventional dataset distillation methods face the problem of long redeployment time and poor… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by the 1st CVPR Workshop on Dataset Distillation

  5. arXiv:2403.18258  [pdf, other]

    cs.CV cs.AI

    Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach

    Authors: Taro Togo, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism, aimed at dynamically managing class information for better adaptation to streaming data. GCIL is one of the hot topics in the field of computer vision, and this is considered one of the crucial tasks in society, specifically the continual learning of generative models. The… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  6. arXiv:2402.09677  [pdf, other]

    cs.CV

    Prompt-based Personalized Federated Learning for Medical Visual Question Answering

    Authors: He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: We present a novel prompt-based personalized federated learning (pFL) method to address data heterogeneity and privacy concerns in traditional medical visual question answering (VQA) methods. Specifically, we regard medical datasets from different organs as clients and use pFL to train personalized transformer-based VQA models for each client. To address the high computational complexity of client… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ICASSP2024

  7. arXiv:2401.15863  [pdf, other]

    cs.CV cs.AI cs.LG

    Importance-Aware Adaptive Dataset Distillation

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Herein, we propose a novel dataset distillation method for constructing small informative datasets that preserve the information of the large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite unprecedented success, large-scale datasets considerably increase the storage and transmission costs, resulting in a cumbersome model t… ▽ More

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: Published as a journal paper in Elsevier Neural Networks

  8. arXiv:2310.08277  [pdf, other]

    eess.AS cs.SD

    A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

    Authors: Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa

    Abstract: We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting. This is achieved by integrating two modules into an SE model: 1) an internal separation module that does both speaker counting and separation; and 2) a TSE module that extrac… ▽ More

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures, 2 tables, accepted by ASRU2023

  9. arXiv:2309.10524  [pdf, other]

    eess.AS cs.CL cs.SD

    Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition

    Authors: Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: We present a novel integration of an instruction-tuned large language model (LLM) and end-to-end automatic speech recognition (ASR). Modern LLMs can perform a wide range of linguistic tasks within zero-shot learning when provided with a precise instruction or a prompt to guide the text generation process towards the desired task. We explore using this zero-shot capability of LLMs to extract lingui… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP2024

  10. arXiv:2309.04654  [pdf, other]

    cs.SD eess.AS

    Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

    Authors: Huaibo Zhao, Yosuke Higuchi, Yusuke Kida, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: Achieving high accuracy with low latency has always been a challenge in streaming end-to-end automatic speech recognition (ASR) systems. By attending to more future contexts, a streaming ASR model achieves higher accuracy but results in larger latency, which hurts the streaming performance. In the Mask-CTC framework, an encoder network is trained to learn the feature representation that anticipate… ▽ More

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted to EUSIPCO 2023

  11. arXiv:2309.00376  [pdf, other]

    eess.AS cs.SD

    Remixing-based Unsupervised Source Separation from Scratch

    Authors: Kohei Saijo, Tetsuji Ogawa

    Abstract: We propose an unsupervised approach for training separation models from scratch using RemixIT and Self-Remixing, which are recently proposed self-supervised learning methods for refining pre-trained models. They first separate mixtures with a teacher model and create pseudo-mixtures by shuffling and remixing the separated signals. A student model is then trained to separate the pseudo-mixtures usi… ▽ More

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: Interspeech 2023, 5 pages, 2 figures, 2 tables

  12. arXiv:2307.02799  [pdf, other]

    eess.IV cs.LG

    Few-shot Personalized Saliency Prediction Based on Inter-personnel Gaze Patterns

    Authors: Yuya Moroto, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper presents few-shot personalized saliency prediction based on inter-personnel gaze patterns. In contrast to general saliency maps, personalized saliency maps (PSMs) have great potential since PSMs indicate the person-specific visual attention useful for obtaining individual visual preferences. The PSM prediction is needed for acquiring the PSMs for unseen images, but its prediction i… ▽ More

    Submitted 3 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: 5 pages, 3 figures

  13. arXiv:2303.06806  [pdf, other]

    eess.AS cs.CL cs.SD

    Neural Diarization with Non-autoregressive Intermediate Attractors

    Authors: Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

    Abstract: End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards output label dependency. In this work, we propose a novel EEND model that introduces the label dependency betw… ▽ More

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  14. arXiv:2303.04388  [pdf, other]

    cs.CV

    Interpretable Visual Question Answering Referring to Outside Knowledge

    Authors: He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: We present a novel multimodal interpretable VQA model that can answer the question more accurately and generate diverse explanations. Although researchers have proposed several methods that can generate human-readable and fine-grained natural language sentences to explain a model's decision, these methods have focused solely on the information in the image. Ideally, the model should refer to vario… ▽ More

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Under review

  15. arXiv:2302.08493  [pdf, other]

    cs.CV cs.HC eess.IV

    Deep Multi-stream Network for Video-based Calving Sign Detection

    Authors: Ryosuke Hyodo, Teppei Nakano, Tetsuji Ogawa

    Abstract: We have designed a deep multi-stream network for automatically detecting calving signs from video. Calving sign detection from a camera, which is a non-contact sensor, is expected to enable more efficient livestock management. As large-scale, well-developed data cannot generally be assumed when establishing calving detection systems, the basis for making the prediction needs to be presented to far… ▽ More

    Submitted 10 January, 2023; originally announced February 2023.

  16. arXiv:2301.03926  [pdf, other]

    cs.HC cs.CV eess.IV

    Video Surveillance System Incorporating Expert Decision-making Process: A Case Study on Detecting Calving Signs in Cattle

    Authors: Ryosuke Hyodo, Susumu Saito, Teppei Nakano, Makoto Akabane, Ryoichi Kasuga, Tetsuji Ogawa

    Abstract: Through a user study in the field of livestock farming, we verify the effectiveness of an XAI framework for video surveillance systems. The systems can be made interpretable by incorporating experts' decision-making processes. AI systems are becoming increasingly common in real-world applications, especially in fields related to human decision-making, and their interpretability is necessary. However… ▽ More

    Submitted 10 January, 2023; originally announced January 2023.

  17. arXiv:2212.09281  [pdf, other]

    eess.IV cs.CV

    Boosting Automatic COVID-19 Detection Performance with Self-Supervised Learning and Batch Knowledge Ensembling

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Problem: Detecting COVID-19 from chest X-Ray (CXR) images has become one of the fastest and easiest methods for detecting COVID-19. However, the existing methods usually use supervised transfer learning from natural images as a pretraining process. These methods do not consider the unique features of COVID-19 and the similar features between COVID-19 and other pneumonia. Aim: In this paper, we wan… ▽ More

    Submitted 30 March, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Published as a journal paper at Elsevier CIBM

  18. arXiv:2212.09276  [pdf, other]

    eess.IV cs.CV cs.LG

    COVID-19 Detection Based on Self-Supervised Transfer Learning Using Chest X-Ray Images

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Purpose: Considering several patients screened due to COVID-19 pandemic, computer-aided detection has strong potential in assisting clinical workflow efficiency and reducing the incidence of infections among radiologists and healthcare providers. Since many confirmed COVID-19 cases present radiological findings of pneumonia, radiologic examinations can be useful for fast detection. Therefore, ches… ▽ More

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: Published as a journal paper at Springer IJCARS

  19. Union-set Multi-source Model Adaptation for Semantic Segmentation

    Authors: Zongyao Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper solves a generalized version of the problem of multi-source model adaptation for semantic segmentation. Model adaptation is proposed as a new domain adaptation problem which requires access to a pre-trained model instead of data for the source domain. A general multi-source setting of model adaptation assumes strictly that each source domain shares a common label space with the target d… ▽ More

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted by ECCV2022

  20. arXiv:2211.10194  [pdf, other]

    eess.AS cs.SD

    Self-Remixing: Unsupervised Speech Separation via Separation and Remixing

    Authors: Kohei Saijo, Tetsuji Ogawa

    Abstract: We present Self-Remixing, a novel self-supervised speech separation method, which refines a pre-trained separation model in an unsupervised manner. The proposed method consists of a shuffler module and a solver module, and they grow together through separation and remixing processes. Specifically, the shuffler first separates observed mixtures and makes pseudo-mixtures by shuffling and remixing th… ▽ More

    Submitted 1 September, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP2023, 5 pages, 2 figures, 2 tables

  21. arXiv:2211.00858  [pdf, other]

    cs.SD eess.AS

    Conversation-oriented ASR with multi-look-ahead CBS architecture

    Authors: Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, ** Sakuma, Yusuke Kida, Tetsunori Kobayashi

    Abstract: During conversations, humans are capable of inferring the intention of the speaker at any point of the speech to prepare the following action promptly. Such ability is also the key for conversational systems to achieve rhythmic and natural conversation. To perform this, the automatic speech recognition (ASR) used for transcribing the speech in real-time must achieve high accuracy without delay. In… ▽ More

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP2023

  22. arXiv:2211.00795  [pdf, other]

    eess.AS cs.CL cs.SD

    InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

    Authors: Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    Abstract: This paper presents InterMPL, a semi-supervised learning method of end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision. Momentum PL (MPL) trains a connectionist temporal classification (CTC)-based model on unlabeled data by continuously generating pseudo-labels on the fly and improving their quality. In contrast to autoregressive formulati… ▽ More

    Submitted 16 March, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP2023

  23. arXiv:2211.00792  [pdf, other]

    eess.AS cs.CL cs.SD

    BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

    Authors: Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    Abstract: We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR has been actively studied, aiming to utilize versatile linguistic knowledge for generating accurate text. One crucial factor that makes this integration challenging… ▽ More

    Submitted 16 March, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP2023

  24. arXiv:2211.00313  [pdf, other]

    cs.CV cs.LG eess.IV

    RGMIM: Region-Guided Masked Image Modeling for Learning Meaningful Representation from X-Ray Images

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Purpose: Self-supervised learning has been gaining attention in the medical field for its potential to improve computer-aided diagnosis. One popular method of self-supervised learning is masked image modeling (MIM), which involves masking a subset of input pixels and predicting the masked pixels. However, traditional MIM methods typically use a random masking strategy, which may not be ideal for m… ▽ More

    Submitted 21 May, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

  25. arXiv:2210.16663  [pdf, other]

    eess.AS cs.CL

    BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

    Authors: Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    Abstract: This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC). Our formulation relaxes the conditional independence assumptions used in conventional CTC and incorporates linguistic knowledge through the explicit output dependency obtained by BERT contextual embedding. BERT-CTC attends to the full contexts of the… ▽ More

    Submitted 19 April, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: v1: Accepted to Findings of EMNLP2022, v2: Minor corrections and clearer derivation of Eq. (21)

  26. Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Dataset complexity assessment aims to predict classification performance on a dataset with complexity calculation before training a classifier, which can also be used for classifier selection and dataset reduction. The training process of deep convolutional neural networks (DCNNs) is iterative and time-consuming because of hyperparameter uncertainty and the domain shift introduced by different dat… ▽ More

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: Published as a journal paper at Springer MTAP

  27. Compressed Gastric Image Generation Based on Soft-Label Dataset Distillation for Medical Data Sharing

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Background and objective: Sharing of medical data is required to enable the cross-agency flow of healthcare information and construct high-accuracy computer-aided diagnosis systems. However, the large sizes of medical datasets, the massive amount of memory of saved deep convolutional neural network (DCNN) models, and patients' privacy protection are problems that can lead to inefficient medical da… ▽ More

    Submitted 1 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Published as a journal paper at Elsevier CMPB

  28. arXiv:2209.14609  [pdf, other]

    cs.CV cs.AI cs.LG

    Dataset Distillation Using Parameter Pruning

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: In this study, we propose a novel dataset distillation method based on parameter pruning. The proposed method can synthesize more robust distilled datasets and improve distillation performance by pruning difficult-to-match parameters during the distillation process. Experimental results on two benchmark datasets show the superiority of the proposed method.

    Submitted 20 August, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Published as a journal paper at IEICE Trans. Fund

  29. arXiv:2209.14603  [pdf, other]

    cs.CR cs.CV cs.LG eess.IV

    Dataset Distillation for Medical Dataset Sharing

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Sharing medical datasets between hospitals is challenging because of the privacy-protection problem and the massive cost of transmitting and storing many high-resolution medical images. However, dataset distillation can synthesize a small dataset such that models trained on it achieve comparable performance with the original large dataset, which shows potential for solving the existing medical sha… ▽ More

    Submitted 23 December, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Accepted by AAAI-23 Workshop on Representation Learning for Responsible Human-Centric AI

  30. arXiv:2209.07007  [pdf, other]

    cs.LG cs.CV

    Gromov-Wasserstein Autoencoders

    Authors: Nao Nakagawa, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Variational Autoencoder (VAE)-based generative models offer flexible representation learning by incorporating meta-priors, general premises considered beneficial for downstream tasks. However, the incorporated meta-priors often involve ad-hoc model deviations from the original likelihood architecture, causing undesirable changes in their training. In this paper, we propose a novel representation l… ▽ More

    Submitted 24 February, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: 38 pages, 9 tables, 13 figures; accepted at ICLR2023

  31. TriBYOL: Triplet BYOL for Self-Supervised Representation Learning

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper proposes a novel self-supervised learning method for learning better representations with small batch sizes. Many self-supervised learning methods based on certain forms of the siamese network have emerged and received significant attention. However, these methods need to use large batch sizes to learn good representations and require heavy computational resources. We present a new trip… ▽ More

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Published as a conference paper at ICASSP 2022

  32. Self-Knowledge Distillation based Self-Supervised Learning for Covid-19 Detection from Chest X-Ray Images

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: The global outbreak of the Coronavirus 2019 (COVID-19) has overloaded worldwide healthcare systems. Computer-aided diagnosis for COVID-19 fast detection and patient triage is becoming critical. This paper proposes a novel self-knowledge distillation based self-supervised learning method for COVID-19 detection from chest X-ray images. Our method can use self-knowledge of images based on similaritie… ▽ More

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Published as a conference paper at ICASSP 2022

  33. arXiv:2203.14080  [pdf, ps, other]

    eess.AS cs.SD

    Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation

    Authors: Kohei Saijo, Tetsuji Ogawa

    Abstract: A new learning algorithm for speech separation networks is designed to explicitly reduce residual noise and artifacts in the separated signal in an unsupervised manner. Generative adversarial networks are known to be effective in constructing separation networks when the ground truth for the observed signal is inaccessible. Still, weak objectives aimed at distribution-to-distribution mapping make… ▽ More

    Submitted 26 March, 2022; originally announced March 2022.

    Comments: Accepted by ICASSP2022

  34. arXiv:2110.10402  [pdf, other]

    cs.SD cs.LG eess.AS

    An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

    Authors: Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by the CTC spike, has been shown to be effective in streaming ASR. However, in order to maintai… ▽ More

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: Accepted to APSIPA 2021

  35. arXiv:2110.04109  [pdf, other]

    eess.AS cs.CL

    Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

    Authors: Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: In end-to-end automatic speech recognition (ASR), a model is expected to implicitly learn representations suitable for recognizing a word-level sequence. However, the huge abstraction gap between input acoustic signals and output linguistic tokens makes it challenging for a model to learn the representations. In this work, to promote the word-level representation learning in end-to-end ASR, we pro… ▽ More

    Submitted 8 February, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP2022

  36. arXiv:2104.02864  [pdf, other]

    cs.CV

    Self-Supervised Learning for Gastritis Detection with Gastric X-ray Images

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: Purpose: Manual annotation of gastric X-ray images by doctors for gastritis detection is time-consuming and expensive. To solve this, a self-supervised learning method is developed in this study. The effectiveness of the proposed self-supervised learning method in gastritis detection is verified using a few annotated gastric X-ray images. Methods: In this study, we develop a novel method that can… ▽ More

    Submitted 27 March, 2023; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: Published as a journal paper at Springer IJCARS

  37. Soft-Label Anonymous Gastric X-ray Image Distillation

    Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper presents a soft-label anonymous gastric X-ray image distillation method based on a gradient descent approach. The sharing of medical data is demanded to construct high-accuracy computer-aided diagnosis (CAD) systems. However, the large size of the medical dataset and privacy protection are remaining problems in medical data sharing, which hindered the research of CAD systems. The idea o… ▽ More

    Submitted 20 March, 2024; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: The first paper to explore real-world dataset distillation; Work was done in 2019 and published as a conference paper at ICIP 2020

  38. arXiv:2012.10999  [pdf, other]

    cs.HC

    Exploring Effectiveness of Inter-Microtask Qualification Tests in Crowdsourcing

    Authors: Masaya Morinaga, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

    Abstract: Qualification tests in crowdsourcing are often used to pre-filter workers by measuring their ability in executing microtasks. While creating qualification tests for each task type is considered a common and reasonable way, this study investigates its worker-filtering performance when the same qualification test is used across multiple types of tasks. On Amazon Mechanical Turk, we tested the… ▽ More

    Submitted 20 December, 2020; originally announced December 2020.

  39. arXiv:2010.13270  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Improved Mask-CTC for Non-Autoregressive End-to-End ASR

    Authors: Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: For real-world deployment of automatic speech recognition (ASR), the system is desired to be capable of fast inference while relieving the requirement of computational resources. The recently proposed end-to-end ASR system based on mask-predict with connectionist temporal classification (CTC), Mask-CTC, fulfills this demand by generating tokens in a non-autoregressive fashion. While Mask-CTC achie… ▽ More

    Submitted 16 February, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: Accepted to ICASSP2021

  40. arXiv:2005.08700  [pdf, other]

    eess.AS cs.SD

    Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict

    Authors: Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: We present Mask CTC, a novel non-autoregressive end-to-end automatic speech recognition (ASR) framework, which generates a sequence by refining outputs of the connectionist temporal classification (CTC). Neural sequence-to-sequence models are usually autoregressive: each output token is generated by conditioning on previously generated tokens, at the cost of requiring as many iterations a… ▽ More

    Submitted 17 August, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted to INTERSPEECH2020

  41. Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications

    Authors: Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, Hikaru Ikuta

    Abstract: Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the fra… ▽ More

    Submitted 12 May, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

    Comments: 10 pages, 8 figures

    ACM Class: I.4

    Journal ref: IEEE MultiMedia 2020

  42. arXiv:2001.07761  [pdf, other]

    cs.CV cs.LG eess.IV

    Block-wise Scrambled Image Recognition Using Adaptation Network

    Authors: Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

    Abstract: In this study, a perceptually hidden object-recognition method is investigated to generate secure images recognizable by humans but not machines. Hence, both the perceptual information hiding and the corresponding object recognition methods should be developed. Block-wise image scrambling is introduced to hide perceptual information from a third party. In addition, an adaptation network is propose… ▽ More

    Submitted 21 January, 2020; originally announced January 2020.

    Comments: 6 pages, Artificial Intelligence of Things (AAAI-2020 WS)

  43. arXiv:1910.11534  [pdf, other]

    cs.CV

    Team PFDet's Methods for Open Images Challenge 2019

    Authors: Yusuke Niitani, Toru Ogawa, Shuji Suzuki, Takuya Akiba, Tommi Kerola, Kohei Ozaki, Shotaro Sano

    Abstract: We present the instance segmentation and the object detection method used by team PFDet for Open Images Challenge 2019. We tackle a massive dataset size, huge class imbalance and federated annotations. Using this method, the team PFDet achieved 3rd and 4th place in the instance segmentation and the object detection track, respectively.

    Submitted 25 October, 2019; originally announced October 2019.

  44. arXiv:1908.00213  [pdf, other]

    cs.LG cs.CV cs.DC stat.ML

    Chainer: A Deep Learning Framework for Accelerating the Research Cycle

    Authors: Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, Hiroyuki Yamazaki Vincent

    Abstract: Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units…

    Submitted 1 August, 2019; originally announced August 2019.

    Comments: Accepted for Applied Data Science Track in KDD'19

  45. arXiv:1907.08915  [pdf, other]

    eess.IV cs.CV

    Automated Muscle Segmentation from Clinical CT using Bayesian U-Net for Personalized Musculoskeletal Modeling

    Authors: Yuta Hiasa, Yoshito Otake, Masaki Takao, Takeshi Ogawa, Nobuhiko Sugano, Yoshinobu Sato

    Abstract: We propose a method for automatic segmentation of individual muscles from a clinical CT. The method uses Bayesian convolutional neural networks with the U-Net architecture, using Monte Carlo dropout that infers an uncertainty metric in addition to the segmentation label. We evaluated the performance of the proposed method using two data sets: 20 fully annotated CTs of the hip and thigh regions and…

    Submitted 9 December, 2019; v1 submitted 21 July, 2019; originally announced July 2019.

    Comments: 11 pages, 10 figures, and supplementary materials

  46. Dynamic Manipulation of Flexible Objects with Torque Sequence Using a Deep Neural Network

    Authors: Kento Kawaharazuka, Toru Ogawa, Juntaro Tamura, Cota Nabeshima

    Abstract: For dynamic manipulation of flexible objects, we propose a method for acquiring a motion equation model of a flexible object using a deep neural network, and a control method that realizes a target state by calculating an optimized time-series joint torque command. With the proposed method, no physics model of the target object is needed, and the object can be controlled as intended. We applied th…

    Submitted 29 January, 2019; originally announced January 2019.

  47. arXiv:1811.10862  [pdf, other]

    cs.CV

    Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects

    Authors: Yusuke Niitani, Takuya Akiba, Tommi Kerola, Toru Ogawa, Shotaro Sano, Shuji Suzuki

    Abstract: Efficient and reliable methods for training of object detectors are in higher demand than ever, and more and more data relevant to the field is becoming available. However, large datasets like Open Images Dataset v4 (OID) are sparsely annotated, and some measure must be taken in order to ensure the training of a reliable detector. In order to take the incompleteness of these datasets into account,…

    Submitted 21 April, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: CVPR 2019 oral

  48. arXiv:1811.10599  [pdf, ps, other]

    quant-ph cs.IT math-ph

    Divergence radii and the strong converse exponent of classical-quantum channel coding with constant compositions

    Authors: Milán Mosonyi, Tomohiro Ogawa

    Abstract: There are different inequivalent ways to define the Rényi capacity of a channel for a fixed input distribution $P$. In a 1995 paper Csiszár has shown that for classical discrete memoryless channels there is a distinguished such quantity that has an operational interpretation as a generalized cutoff rate for constant composition channel coding. We show that the analogous notion of Rényi capacity, d…

    Submitted 8 June, 2020; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: 46 pages. V7: Added the strong converse exponent with cost constraint

    Journal ref: IEEE Transactions on Information Theory, 67(3):1668-1698, (2021)

  49. arXiv:1809.00778  [pdf, other]

    cs.CV

    PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track

    Authors: Takuya Akiba, Tommi Kerola, Yusuke Niitani, Toru Ogawa, Shotaro Sano, Shuji Suzuki

    Abstract: We present a large-scale object detection system by team PFDet. Our system enables training with huge datasets using 512 GPUs, handles sparsely verified classes, and massive class imbalance. Using our method, we achieved 2nd place in the Google AI Open Images Object Detection Track 2018 on Kaggle.

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: Technical report for Open Images Challenge 2018 Object Detection Track

  50. arXiv:1803.08670  [pdf, ps, other]

    cs.CV cs.MM

    Object Detection for Comics using Manga109 Annotations

    Authors: Toru Ogawa, Atsushi Otsubo, Rei Narita, Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: With the growth of digitized comics, image understanding techniques are becoming important. In this paper, we focus on object detection, which is a fundamental task of image understanding. Although convolutional neural network (CNN)-based methods have achieved good performance in object detection for naturalistic images, there are two problems in applying these methods to the comic object detection ta…

    Submitted 26 March, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: http://www.manga109.org/en/