
Showing 1–39 of 39 results for author: Imoto, K

  1. arXiv:2406.07250  [pdf, other]

    eess.AS cs.LG cs.SD

    Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

    Authors: Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, Yohei Kawaguchi

    Abstract: We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge Task 2: First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring. Continuing from last year's DCASE 2023 Challenge Task 2, we organize the task as a first-shot problem under domain generalization required settings. The main goal of the first-shot…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: anomaly detection, acoustic condition monitoring, domain shift, first-shot problem, DCASE Challenge. arXiv admin note: text overlap with arXiv:2305.07828

  2. arXiv:2406.02032  [pdf, other]

    eess.AS cs.MM cs.SD

    M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation

    Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Masahiro Yasuda, Shunsuke Tsubaki, Keisuke Imoto

    Abstract: Contrastive language-audio pre-training (CLAP) enables zero-shot (ZS) inference of audio and exhibits promising performance in several classification tasks. However, conventional audio representations are still crucial for many tasks where ZS is not applicable (e.g., regression problems). Here, we explore a new representation, a general-purpose audio-language representation, that performs well in…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure, 5 tables. Accepted by Interspeech 2024

    MSC Class: 68T07

  3. arXiv:2403.17508  [pdf, other]

    cs.SD eess.AS

    Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant

    Authors: Modan Tailleur, Junwon Lee, Mathieu Lagrange, Keunwoo Choi, Laurie M. Heller, Keisuke Imoto, Yuki Okamoto

    Abstract: This paper explores whether considering alternative domain-specific embeddings to calculate the Fréchet Audio Distance (FAD) metric can help the FAD to correlate better with perceptual ratings of environmental sounds. We used embeddings from VGGish, PANNs, MS-CLAP, L-CLAP, and MERT, which are tailored for either music or environmental sound evaluation. The FAD scores were calculated for sounds fro…

    Submitted 26 March, 2024; originally announced March 2024.

  4. arXiv:2403.11508  [pdf, other]

    eess.AS

    Discriminative Neighborhood Smoothing for Generative Anomalous Sound Detection

    Authors: Takuya Fujimura, Keisuke Imoto, Tomoki Toda

    Abstract: We propose discriminative neighborhood smoothing of generative anomaly scores for anomalous sound detection. While the discriminative approach is known to often achieve better performance than generative approaches, we have found that it sometimes causes significant performance degradation due to the discrepancy between the training and test data, making it less robust than the generative approach…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO 2024

  5. arXiv:2403.10756  [pdf, other]

    eess.AS cs.SD

    Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

    Authors: Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

    Abstract: The aim of this research is to refine knowledge transfer on audio-image temporal agreement for audio-text cross retrieval. To address the limited availability of paired non-speech audio-text data, learning methods for transferring the knowledge acquired from a large amount of paired audio-image data to shared audio-text representation have been investigated, suggesting the importance of how audio-…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO 2024

  6. arXiv:2312.09143  [pdf, other]

    cs.SD eess.AS

    F1-EV Score: Measuring the Likelihood of Estimating a Good Decision Threshold for Semi-Supervised Anomaly Detection

    Authors: Kevin Wilkinghoff, Keisuke Imoto

    Abstract: Anomalous sound detection (ASD) systems are usually compared by using threshold-independent performance measures such as AUC-ROC. However, for practical applications a decision threshold is needed to decide whether a given test sample is normal or anomalous. Estimating such a threshold is highly non-trivial in a semi-supervised setting where only normal training samples are available. In this work…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted for presentation at IEEE ICASSP 2024

  7. arXiv:2305.17758  [pdf, ps, other]

    cs.SD eess.AS

    CAPTDURE: Captioned Sound Dataset of Single Sources

    Authors: Yuki Okamoto, Kanta Shimonishi, Keisuke Imoto, Kota Dohi, Shota Horiguchi, Yohei Kawaguchi

    Abstract: In conventional studies on environmental sound separation and synthesis using captions, datasets consisting of multiple-source sounds with their captions were used for model training. However, when we collect the captions for multiple-source sound, it is not easy to collect detailed captions for each sound source, such as the number of sound occurrences and timbre. Therefore, it is difficult to ex…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH2023

  8. arXiv:2305.07828  [pdf, other]

    cs.SD cs.LG eess.AS

    Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

    Authors: Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Yohei Kawaguchi

    Abstract: We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 2: "First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring". The main goal is to enable rapid deployment of ASD systems for new kinds of machines without the need for hyperparameter tuning. In the past ASD tasks, developed methods tuned h…

    Submitted 2 November, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: anomaly detection, acoustic condition monitoring, domain shift, first-shot problem, DCASE Challenge, Accepted in DCASE2023 Workshop

  9. arXiv:2305.00302  [pdf, ps, other]

    cs.SD eess.AS

    Environmental sound synthesis from vocal imitations and sound event labels

    Authors: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita

    Abstract: One way of expressing an environmental sound is using vocal imitations, which involve the process of replicating or mimicking the rhythm and pitch of sounds by voice. We can effectively express the features of environmental sounds, such as rhythm and pitch, using vocal imitations, which cannot be expressed by conventional input information, such as sound event labels, images, or texts, in an envir…

    Submitted 14 September, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

    Comments: Submitted to ICASSP2024

  10. arXiv:2304.12521  [pdf, other]

    cs.SD eess.AS

    Foley Sound Synthesis at the DCASE 2023 Challenge

    Authors: Keunwoo Choi, Jaekwon Im, Laurie Heller, Brian McFee, Keisuke Imoto, Yuki Okamoto, Mathieu Lagrange, Shinnosuke Takamichi

    Abstract: The addition of Foley sound effects during post-production is a common technique used to enhance the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, which involves manual recording and mixing of sound. However, recent advances in sound synthesis and generative models have generated interest in machine-assisted or automatic F…

    Submitted 28 September, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: DCASE 2023 Challenge - Task 7 - Technical Report (Submitted to DCASE 2023 Workshop)

  11. arXiv:2210.09173  [pdf, other]

    cs.SD eess.AS

    Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images

    Authors: Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, Hiroshi Saruwatari

    Abstract: We propose a method for synthesizing environmental sounds from visually represented onomatopoeias and sound sources. An onomatopoeia is a word that imitates a sound structure, i.e., the text representation of sound. From this perspective, onoma-to-wave has been proposed to synthesize environmental sounds from the desired onomatopoeia texts. Onomatopoeias have another representation: visual-text re…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  12. arXiv:2208.07679  [pdf, ps, other]

    cs.SD eess.AS

    How Should We Evaluate Synthesized Environmental Sounds

    Authors: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Takahiro Fukumori, Yoichi Yamashita

    Abstract: Although several methods of environmental sound synthesis have been proposed, there has been no discussion on how synthesized environmental sounds should be evaluated. Only either subjective or objective evaluations have been conducted in conventional evaluations, and it is not clear what type of evaluation should be carried out. In this paper, we investigate how to evaluate synthesized environmen…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: Submitted to APSIPA ASC 2022

  13. arXiv:2207.04357  [pdf, ps, other]

    cs.SD eess.AS

    Joint Analysis of Acoustic Scenes and Sound Events with Weakly labeled Data

    Authors: Shunsuke Tsubaki, Keisuke Imoto, Nobutaka Ono

    Abstract: Considering that acoustic scenes and sound events are closely related to each other, in some previous papers, a joint analysis of acoustic scenes and sound events utilizing multitask learning (MTL)-based neural networks was proposed. In conventional methods, a strongly supervised scheme is applied to sound event detection in MTL models, which requires strong labels of sound events in model trainin…

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: Accepted to IWAENC2022

  14. arXiv:2206.10349  [pdf, ps, other]

    cs.SD eess.AS

    Joint Analysis of Acoustic Scenes and Sound Events Based on Multitask Learning with Dynamic Weight Adaptation

    Authors: Kayo Nada, Keisuke Imoto, Takao Tsuchiya

    Abstract: Acoustic scene classification (ASC) and sound event detection (SED) are major topics in environmental sound analysis. Considering that acoustic scenes and sound events are closely related to each other, the joint analysis of acoustic scenes and sound events using multitask learning (MTL)-based neural networks was proposed in some previous works. Conventional methods train MTL-based models using a…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Submitted to Acoustical Science and Technology

  15. arXiv:2206.05876  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

    Authors: Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi

    Abstract: We present the task description and discussion on the results of the DCASE 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques". Domain shifts are a critical problem for the application of ASD systems. Because domain shifts can change the acoustic characteristics of data, a model trained in a source domai…

    Submitted 21 November, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2106.04492

  16. arXiv:2204.06402  [pdf, ps, other]

    cs.SD eess.AS

    Sound Event Triage: Detecting Sound Events Considering Priority of Classes

    Authors: Noriyuki Tonami, Keisuke Imoto

    Abstract: We propose a new task for sound event detection (SED): sound event triage (SET). The goal of SET is to detect an arbitrary number of high-priority event classes while allowing misdetections of low-priority event classes where the priority is given for each event class. In conventional methods of SED for targeting a specific sound event class, it is only possible to give priority to a single event…

    Submitted 11 January, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted to EURASIP Journal on Audio, Speech, and Music Processing

  17. arXiv:2204.02279  [pdf, ps, other]

    cs.SD eess.AS

    How Information on Acoustic Scenes and Sound Events Mutually Benefits Event Detection and Scene Classification Tasks

    Authors: Keisuke Imoto, Yuka Komatsu, Shunsuke Tsubaki, Tatsuya Komatsu

    Abstract: Acoustic scene classification (ASC) and sound event detection (SED) are fundamental tasks in environmental sound analysis, and many methods based on deep learning have been proposed. Considering that information on acoustic scenes and sound events helps SED and ASC mutually, some researchers have proposed a joint analysis of acoustic scenes and sound events by multitask learning (MTL). However, co…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  18. arXiv:2112.00209  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Environmental Sound Extraction Using Onomatopoeic Words

    Authors: Yuki Okamoto, Shota Horiguchi, Masaaki Yamamoto, Keisuke Imoto, Yohei Kawaguchi

    Abstract: An onomatopoeic word, which is a character sequence that phonetically imitates a sound, is effective in expressing characteristics of sound such as duration, pitch, and timbre. We propose an environmental-sound-extraction method using onomatopoeic words to specify the target sound to be extracted. By this method, we estimate a time-frequency mask from an input mixture spectrogram and an onomatopoe…

    Submitted 16 February, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: Accepted to ICASSP2022

  19. arXiv:2110.03243  [pdf, ps, other]

    cs.SD

    Sound Event Detection Guided by Semantic Contexts of Scenes

    Authors: Noriyuki Tonami, Keisuke Imoto, Ryotaro Nagase, Yuki Okamoto, Takahiro Fukumori, Yoichi Yamashita

    Abstract: Some studies have revealed that contexts of scenes (e.g., "home," "office," and "cooking") are advantageous for sound event detection (SED). Mobile devices and sensing technologies give useful information on scenes for SED without the use of acoustic signals. However, conventional methods can employ pre-defined contexts in inference stages but not undefined contexts. This is because one-hot repres…

    Submitted 17 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  20. arXiv:2106.04492  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions

    Authors: Yohei Kawaguchi, Keisuke Imoto, Yuma Koizumi, Noboru Harada, Daisuke Niizumi, Kota Dohi, Ryo Tanabe, Harsh Purohit, Takashi Endo

    Abstract: We present the task description and discussion on the results of the DCASE 2021 Challenge Task 2. In 2020, we organized an unsupervised anomalous sound detection (ASD) task, identifying whether a given sound was normal or anomalous without anomalous training data. In 2021, we organized an advanced unsupervised ASD task under domain-shift conditions, which focuses on the inevitable problem of the p…

    Submitted 27 September, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to DCASE 2021 Workshop

  21. arXiv:2105.01836  [pdf, ps, other]

    cs.SD eess.AS

    Acoustic Scene Classification Using Multichannel Observation with Partially Missing Channels

    Authors: Keisuke Imoto

    Abstract: Sounds recorded with smartphones or IoT devices often have partially unreliable observations caused by clipping, wind noise, and completely missing parts due to microphone failure and packet loss in data transmission over the network. In this paper, we investigate the impact of the partially missing channels on the performance of acoustic scene classification using multichannel audio recordings, e…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: Accepted to EUSIPCO 2021

  22. arXiv:2102.05872  [pdf, ps, other]

    cs.SD eess.AS

    Onoma-to-wave: Environmental sound synthesis from onomatopoeic words

    Authors: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, Yoichi Yamashita

    Abstract: In this paper, we propose a framework for environmental sound synthesis from onomatopoeic words. As one way of expressing an environmental sound, we can use an onomatopoeic word, which is a character sequence for phonetically imitating a sound. An onomatopoeic word is effective for describing diverse sound features. Therefore, using onomatopoeic words for environmental sound synthesis will enable…

    Submitted 7 February, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: Accepted to APSIPA Transactions on Signal and Information Processing

  23. arXiv:2102.05288  [pdf, ps, other]

    cs.SD

    Sound Event Detection Based on Curriculum Learning Considering Learning Difficulty of Events

    Authors: Noriyuki Tonami, Keisuke Imoto, Yuki Okamoto, Takahiro Fukumori, Yoichi Yamashita

    Abstract: In conventional sound event detection (SED) models, two types of events, namely, those that are present and those that do not occur in an acoustic scene, are regarded as the same type of events. The conventional SED methods cannot effectively exploit the difference between the two types of events. All time frames of sound events that do not occur in an acoustic scene are easily regarded as inactiv…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021

  24. arXiv:2102.01927  [pdf, ps, other]

    cs.SD eess.AS

    Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance

    Authors: Keisuke Imoto, Sakiko Mishima, Yumi Arai, Reishi Kondo

    Abstract: In many methods of sound event detection (SED), a segmented time frame is regarded as one data sample to model training. The durations of sound events greatly depend on the sound event class, e.g., the sound event "fan" has a long duration, whereas the sound event "mouse clicking" is instantaneous. Thus, the difference in the duration between sound event classes results in a serious data imbalance…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2006.15253

  25. arXiv:2012.11834  [pdf, other]

    cs.LG cs.CV

    Dual-encoder Bidirectional Generative Adversarial Networks for Anomaly Detection

    Authors: Teguh Budianto, Tomohiro Nakai, Kazunori Imoto, Takahiro Takimoto, Kosuke Haruki

    Abstract: Generative adversarial networks (GANs) have shown promise for various problems including anomaly detection. When anomaly detection is performed using GAN models that learn only the features of normal data samples, data that are not similar to normal data are detected as abnormal samples. The present approach is developed by employing a dual-encoder in a bidirectional GAN architecture that is train…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)

  26. Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning

    Authors: Noriyuki Tonami, Keisuke Imoto, Ryosuke Yamanishi, Yoichi Yamashita

    Abstract: Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN). The conventional methods address SED and ASC separatel…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: Accepted to IEICE Transactions on Information and Systems. arXiv admin note: text overlap with arXiv:1904.12146

  27. arXiv:2007.04719  [pdf, ps, other]

    cs.SD eess.AS

    RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis

    Authors: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, Yoichi Yamashita

    Abstract: Environmental sound synthesis is a technique for generating a natural environmental sound. Conventional work on environmental sound synthesis using sound event labels cannot finely control synthesized sounds, for example, the pitch and timbre. We consider that onomatopoeic words can be used for environmental sound synthesis. Onomatopoeic words are effective for explaining the feature of sounds. We…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: Submitted to DCASE2020 workshop

  28. arXiv:2006.15253  [pdf, ps, other]

    cs.SD eess.AS

    Sound Event Detection Using Duration Robust Loss Function

    Authors: Daichi Akiyama, Keisuke Imoto, Noriyuki Tonami, Yuki Okamoto, Ryosuke Yamanishi, Takahiro Fukumori, Yoichi Yamashita

    Abstract: Many methods of sound event detection (SED) based on machine learning regard a segmented time frame as one data sample to model training. However, the sound durations of sound events vary greatly depending on the sound event class, e.g., the sound event "fan" has a long time duration, while the sound event "mouse clicking" is instantaneous. The difference in the time duration between sound eve…

    Submitted 26 June, 2020; originally announced June 2020.

    Comments: Submitted to DCASE2020 Workshop

  29. arXiv:2006.05822  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

    Authors: Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Harsh Purohit, Kaori Suefusa, Takashi Endo, Masahiro Yasuda, Noboru Harada

    Abstract: In this paper, we present the task description and discuss the results of the DCASE 2020 Challenge Task 2: Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous. The main challenge of this task is to detect unknown anomalous sounds under the condi…

    Submitted 8 August, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Submitted to DCASE2020 Workshop

  30. Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-occurrence

    Authors: Keisuke Imoto, Seisuke Kyochi

    Abstract: A limited number of types of sound event occur in an acoustic scene and some sound events tend to co-occur in the scene; for example, the sound events "dishes" and "glass jingling" are likely to co-occur in the acoustic scene "cooking". In this paper, we propose a method of sound event detection using graph Laplacian regularization with sound event co-occurrence taken into account. In the proposed…

    Submitted 24 April, 2020; originally announced April 2020.

    Comments: Accepted to IEICE Transactions on Information and Systems

  31. arXiv:2002.05994  [pdf, ps, other]

    eess.AS cs.SD

    Sound Event Localization based on Sound Intensity Vector Refined By DNN-Based Denoising and Source Separation

    Authors: Masahiro Yasuda, Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Keisuke Imoto

    Abstract: We propose a direction-of-arrival (DOA) estimation method for Sound Event Localization and Detection (SELD). Direct estimation of DOA using a deep neural network (DNN), i.e., a completely data-driven approach, achieves high accuracy. However, there is a gap in the accuracy between DOA estimation for single and overlapping sources because they cannot incorporate physical knowledge. Meanwhile, although…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: 5 pages, 3 figures, to appear in IEEE ICASSP 2020

  32. arXiv:2002.05848  [pdf, ps, other]

    cs.SD eess.AS

    Sound Event Detection by Multitask Learning of Sound Events and Scenes with Soft Scene Labels

    Authors: Keisuke Imoto, Noriyuki Tonami, Yuma Koizumi, Masahiro Yasuda, Ryosuke Yamanishi, Yoichi Yamashita

    Abstract: Sound event detection (SED) and acoustic scene classification (ASC) are major tasks in environmental sound analysis. Considering that sound events and scenes are closely related to each other, some works have addressed joint analyses of sound events and acoustic scenes based on multitask learning (MTL), in which the knowledge of sound events and scenes can help in estimating them mutually. The con…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted to ICASSP 2020

  33. Graph Cepstrum: Spatial Feature Extracted from Partially Connected Microphones

    Authors: Keisuke Imoto

    Abstract: In this paper, we propose an effective and robust method of spatial feature extraction for acoustic scene analysis utilizing partially synchronized and/or closely located distributed microphones. In the proposed method, a new cepstrum feature utilizing a graph-based basis transformation to extract spatial information from distributed microphones, while taking into account whether any pairs of micr…

    Submitted 30 January, 2020; originally announced January 2020.

    Comments: Accepted to IEICE Transactions on Information and Systems. arXiv admin note: substantial text overlap with arXiv:1805.11782

  34. arXiv:1908.10055  [pdf, ps, other]

    cs.SD eess.AS

    Overview of Tasks and Investigation of Subjective Evaluation Methods in Environmental Sound Synthesis and Conversion

    Authors: Yuki Okamoto, Keisuke Imoto, Tatsuya Komatsu, Shinnosuke Takamichi, Takumi Yagyu, Ryosuke Yamanishi, Yoichi Yamashita

    Abstract: Synthesizing and converting environmental sounds have the potential for many applications such as supporting movie and game production, data augmentation for sound event detection and scene classification. Conventional works on synthesizing and converting environmental sounds are based on a physical modeling or concatenative approach. However, there are a limited number of works that have addresse…

    Submitted 27 August, 2019; originally announced August 2019.

  35. arXiv:1908.03299  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection

    Authors: Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Noboru Harada, Keisuke Imoto

    Abstract: This paper introduces a new dataset called "ToyADMOS" designed for anomaly detection in machine operating sounds (ADMOS). To the best of our knowledge, no large-scale datasets are available for ADMOS, although large-scale datasets have contributed to recent advancements in acoustic signal processing. This is because anomalous sound data are difficult to collect. To build a large-scale dataset for ADM…

    Submitted 8 August, 2019; originally announced August 2019.

    Comments: 5 pages, to appear in IEEE WASPAA 2019

  36. arXiv:1904.12146  [pdf, ps, other]

    cs.SD eess.AS

    Joint Analysis of Acoustic Events and Scenes Based on Multitask Learning

    Authors: Noriyuki Tonami, Keisuke Imoto, Masahiro Niitsuma, Ryosuke Yamanishi, Yoichi Yamashita

    Abstract: Acoustic event detection and scene classification are major research tasks in environmental sound analysis, and many methods based on neural networks have been proposed. Conventional methods have addressed these tasks separately; however, acoustic events and scenes are closely related to each other. For example, in the acoustic scene 'office', the acoustic events 'mouse clicking' and 'keyboard typ…

    Submitted 18 July, 2019; v1 submitted 27 April, 2019; originally announced April 2019.

    Comments: Accepted to WASPAA 2019

  37. arXiv:1902.00816  [pdf, ps, other]

    cs.SD eess.AS

    Sound Event Detection Using Graph Laplacian Regularization Based on Event Co-occurrence

    Authors: Keisuke Imoto, Seisuke Kyochi

    Abstract: The types of sound events that occur in a situation are limited, and some sound events are likely to co-occur; for instance, "dishes" and "glass jingling." In this paper, we propose a technique of sound event detection utilizing graph Laplacian regularization taking the sound event co-occurrence into account. In the proposed method, sound event occurrences are represented as a graph whose node…

    Submitted 18 February, 2019; v1 submitted 2 February, 2019; originally announced February 2019.

    Comments: Accepted to ICASSP 2019

  38. arXiv:1811.04048  [pdf, ps, other]

    eess.AS cs.SD

    Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection

    Authors: Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, Mounya Elhilali

    Abstract: Sound event detection is a challenging task, especially for scenes with multiple simultaneous events. While event classification methods tend to be fairly accurate, event localization presents additional challenges, especially when large amounts of labeled data are not available. Task4 of the 2018 DCASE challenge presents an event detection task that requires accuracy in both segmentation and reco…

    Submitted 9 November, 2018; originally announced November 2018.

    Comments: Submitted to ICASSP 2019

  39. arXiv:1805.11782  [pdf, ps, other]

    cs.SD eess.AS

    Acoustic Scene Analysis Using Partially Connected Microphones Based on Graph Cepstrum

    Authors: Keisuke Imoto

    Abstract: In this paper, we propose an effective and robust method for acoustic scene analysis based on spatial information extracted from partially synchronized and/or closely located distributed microphones. In the proposed method, to extract spatial information from distributed microphones while taking into account whether any pairs of microphones are synchronized and/or closely located, we derive a new…

    Submitted 8 July, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: Accepted to EUSIPCO 2018