
Showing 1–21 of 21 results for author: Niizumi, D

Searching in archive eess.
  1. arXiv:2406.07250  [pdf, other]

    eess.AS cs.LG cs.SD

    Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

    Authors: Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, Yohei Kawaguchi

    Abstract: We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge Task 2: First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring. Continuing from last year's DCASE 2023 Challenge Task 2, we organize the task as a first-shot problem under domain generalization required settings. The main goal of the first-shot…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: anomaly detection, acoustic condition monitoring, domain shift, first-shot problem, DCASE Challenge. arXiv admin note: text overlap with arXiv:2305.07828

  2. arXiv:2406.02032  [pdf, other]

    eess.AS cs.MM cs.SD

    M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Masahiro Yasuda, Shunsuke Tsubaki, Keisuke Imoto

    Abstract: Contrastive language-audio pre-training (CLAP) enables zero-shot (ZS) inference of audio and exhibits promising performance in several classification tasks. However, conventional audio representations are still crucial for many tasks where ZS is not applicable (e.g., regression problems). Here, we explore a new representation, a general-purpose audio-language representation, that performs well in…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure, 5 tables. Accepted by Interspeech 2024

    MSC Class: 68T07

  3. arXiv:2404.17107  [pdf, other]

    eess.AS cs.SD

    Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: To reduce the need for skilled clinicians in heart sound interpretation, recent studies on automating cardiac auscultation have explored deep learning approaches. However, although deep learning demands large amounts of data, heart sound datasets are limited in size, and no pre-trained model is available. In contrast, many pre-trained models for general audio tasks are available as gene…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 4 pages, 1 figure, and 4 tables. Accepted by IEEE EMBC 2024

    MSC Class: 68T07

  4. arXiv:2404.06095  [pdf, other]

    eess.AS cs.SD

    Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Self-supervised learning (SSL) using masked prediction has made great strides in general-purpose audio representation. This study proposes Masked Modeling Duo (M2D), an improved masked prediction SSL, which learns by predicting representations of masked input signals that serve as training signals. Unlike conventional methods, M2D obtains a training signal by encoding only the masked part, encoura…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 15 pages, 6 figures, 15 tables. Accepted by TASLP

    MSC Class: 68T07

  5. arXiv:2403.10756  [pdf, other]

    eess.AS cs.SD

    Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

    Authors: Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

    Abstract: The aim of this research is to refine knowledge transfer on audio-image temporal agreement for audio-text cross retrieval. To address the limited availability of paired non-speech audio-text data, learning methods for transferring the knowledge acquired from a large amount of paired audio-image data to shared audio-text representation have been investigated, suggesting the importance of how audio-…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO2024

  6. arXiv:2308.11923  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

    Authors: Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

    Abstract: We propose Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips. ADC addresses the problem that conventional audio captioning sometimes generates similar captions for similar audio clips, failing to describe the difference in content. We also propose a cross-attentio…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to DCASE2023 Workshop

  7. arXiv:2305.14079  [pdf, other]

    eess.AS cs.SD

    Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: General-purpose audio representations learned through self-supervised learning have demonstrated high performance in a variety of tasks. Although they can be optimized for an application by fine-tuning, even higher performance can be expected if they are specialized for that application during pre-training. This paper explores the challenges and solutions in specializing general-purpose audio representations for a specifi…

    Submitted 3 August, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023; 5+2 pages, 2 figures, 6+6 tables, Code: https://github.com/nttcslab/m2d/tree/master/speech

    MSC Class: 68T07

  8. arXiv:2305.07828  [pdf, other]

    cs.SD cs.LG eess.AS

    Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

    Authors: Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Yohei Kawaguchi

    Abstract: We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 2: "First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring". The main goal is to enable rapid deployment of ASD systems for new kinds of machines without the need for hyperparameter tuning. In past ASD tasks, developed methods tuned h…

    Submitted 2 November, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: anomaly detection, acoustic condition monitoring, domain shift, first-shot problem, DCASE Challenge, Accepted in DCASE2023 Workshop

  9. arXiv:2303.00455  [pdf, other]

    eess.AS cs.SD

    First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline

    Authors: Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda

    Abstract: This paper provides a baseline system for First-shot-compliant unsupervised anomalous sound detection (ASD) for machine condition monitoring. First-shot ASD does not allow systems to do machine-type-dependent hyperparameter tuning or tool ensembling based on the performance metric calculated with the ground truth. To show benchmark performance for First-shot ASD, this paper proposes an anomaly sound detect…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: 5 pages, 2 figures

  10. arXiv:2210.14648  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD

    Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Masked Autoencoders is a simple yet powerful self-supervised learning method. However, it learns representations indirectly by reconstructing masked input patches. Several methods learn representations directly by predicting representations of masked patches; however, we think using all patches to encode training signal representations is suboptimal. We propose a new method, Masked Modeling Duo (M…

    Submitted 2 March, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 6 pages, 3 figures, and 6 tables. To appear at ICASSP2023

    MSC Class: 68T07

  11. arXiv:2207.11964  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    ConceptBeam: Concept Driven Target Speech Extraction

    Authors: Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino

    Abstract: We propose a novel framework for target speech extraction based on semantic information, called ConceptBeam. Target speech extraction means extracting the speech of a target speaker in a mixture. Typical approaches have been exploiting properties of audio signals, such as harmonic structure and direction of arrival. In contrast, ConceptBeam tackles the problem with semantic clues. Specifically, we…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted to ACM Multimedia 2022

  12. arXiv:2207.09732  [pdf, other]

    eess.AS cs.CL cs.IR cs.LG cs.SD

    Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

    Authors: Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

    Abstract: The amount of audio data available on public websites is growing rapidly, and an efficient mechanism for accessing the desired data is necessary. We propose a content-based audio retrieval method that can retrieve a target audio that is similar to but slightly different from the query audio by introducing auxiliary textual information which describes the difference between the query and target aud…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to Interspeech 2022

  13. arXiv:2206.05876  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

    Authors: Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi

    Abstract: We present the task description and discussion on the results of the DCASE 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques". Domain shifts are a critical problem for the application of ASD systems. Because domain shifts can change the acoustic characteristics of data, a model trained in a source domai…

    Submitted 21 November, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2106.04492

  14. arXiv:2205.08138  [pdf, ps, other]

    eess.AS cs.SD

    Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Many application studies rely on audio DNN models pre-trained on a large-scale dataset as essential feature extractors, and they extract features from the last layers. In this study, we focus on our finding that the middle layer features of existing supervised pre-trained models are more effective than the late layer features for some tasks. We propose a simple approach to compose features effecti…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 5 pages, 4 figures and 4 tables. Accepted by EUSIPCO 2022

    MSC Class: 68T07

  15. arXiv:2204.12260  [pdf, other]

    eess.AS cs.SD

    Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Recent general-purpose audio representations show state-of-the-art performance on various audio tasks. These representations are pre-trained by self-supervised learning methods that create training signals from the input. For example, typical audio contrastive learning uses temporal relationships among input sounds to create training signals, whereas some methods use a difference among input views…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: 22 pages, 8 figures. Under the review process

    MSC Class: 68T07

    Journal ref: HEAR: Holistic Evaluation of Audio Representations (NeurIPS 2021 Competition) PMLR 166 (2022) 1-24

  16. BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Pre-trained models are essential as feature extractors in modern machine learning systems in various domains. In this study, we hypothesize that representations effective for general audio tasks should provide multiple aspects of robust features of the input sound. For recognizing sounds regardless of perturbations such as varying pitch or timbre, features should be robust to these perturbations.…

    Submitted 16 June, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: 15 pages, 6 figures, and 15 tables. Under the review process

    MSC Class: 68T07

    Journal ref: IEEE/ACM Trans. Audio, Speech, Language Process. 31 (2023) 137-151

  17. arXiv:2106.04492  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions

    Authors: Yohei Kawaguchi, Keisuke Imoto, Yuma Koizumi, Noboru Harada, Daisuke Niizumi, Kota Dohi, Ryo Tanabe, Harsh Purohit, Takashi Endo

    Abstract: We present the task description and discussion on the results of the DCASE 2021 Challenge Task 2. In 2020, we organized an unsupervised anomalous sound detection (ASD) task, identifying whether a given sound was normal or anomalous without anomalous training data. In 2021, we organized an advanced unsupervised ASD task under domain-shift conditions, which focuses on the inevitable problem of the p…

    Submitted 27 September, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to DCASE 2021 Workshop

  18. arXiv:2106.02369  [pdf, other]

    eess.AS

    ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions

    Authors: Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, Shoichiro Saito

    Abstract: This paper proposes a new large-scale dataset called "ToyADMOS2" for anomaly detection in machine operating sounds (ADMOS). As with our previous ToyADMOS dataset, we collected a large number of operating sounds of miniature machines (toys) under normal and anomalous conditions by deliberately damaging them, but this time we also controlled the depth of damage in the anomaly samples. Since typical a…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: 5 pages, 4 figures

  19. arXiv:2103.06695  [pdf, other]

    eess.AS cs.LG cs.SD

    BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning approach. We propose learning general-purpose audio representation from a single audio segment without expecting relationships between different time segments of audio samples. To implement this principle…

    Submitted 20 April, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: IJCNN 2021, 8 pages, 4 figures

    MSC Class: 68T07

  20. arXiv:2012.07331  [pdf, other]

    eess.AS cs.CL cs.SD

    Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval

    Authors: Yuma Koizumi, Yasunori Ohishi, Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda

    Abstract: The goal of audio captioning is to translate input audio into its description using natural language. One of the problems in audio captioning is the lack of training data due to the difficulty in collecting audio-caption pairs by crawling the web. In this study, to overcome this problem, we propose to use a pre-trained large-scale language model. Since an audio input cannot be directly inputted in…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: Submitted to ICASSP 2021

  21. arXiv:1808.02357  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD stat.ML

    Acoustic Scene Classification: A Competition Review

    Authors: Shayan Gharib, Honain Derrar, Daisuke Niizumi, Tuukka Senttula, Janne Tommola, Toni Heittola, Tuomas Virtanen, Heikki Huttunen

    Abstract: In this paper, we study the problem of acoustic scene classification, i.e., categorization of audio sequences into mutually exclusive classes based on their spectral content. We describe the methods and results discovered during a competition organized in the context of a graduate machine learning course, by both the students and external participants. We identify the most suitable methods and stud…

    Submitted 2 August, 2018; originally announced August 2018.

    Comments: This work has been accepted in IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2018). Copyright may be transferred without notice, after which this version may no longer be accessible