
Showing 1–17 of 17 results for author: Koizumi, Y

Searching in archive stat.
  1. arXiv:2210.01029

    eess.AS cs.LG cs.SD stat.ML

    WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

    Authors: Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani

    Abstract: Denoising diffusion probabilistic models (DDPMs) and generative adversarial networks (GANs) are popular generative models for neural vocoders. The DDPMs and GANs can be characterized by the iterative denoising framework and adversarial training, respectively. This study proposes a fast and high-quality neural vocoder called WaveFit, which integrates the essence of GANs into a DDPM-like it…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022

  2. arXiv:2206.05876

    cs.SD cs.LG eess.AS stat.ML

    Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

    Authors: Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi

    Abstract: We present the task description and discussion on the results of the DCASE 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques". Domain shifts are a critical problem for the application of ASD systems. Because domain shifts can change the acoustic characteristics of data, a model trained in a source domai…

    Submitted 21 November, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2106.04492

  3. arXiv:2203.16749

    eess.AS cs.LG cs.SD stat.ML

    SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

    Authors: Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani

    Abstract: Neural vocoders using denoising diffusion probabilistic models (DDPMs) have been improved by adapting the diffusion noise distribution to the given acoustic features. In this study, we propose SpecGrad that adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves the sound quality es…

    Submitted 4 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to Interspeech 2022

  4. arXiv:2106.04492

    eess.AS cs.LG cs.SD stat.ML

    Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions

    Authors: Yohei Kawaguchi, Keisuke Imoto, Yuma Koizumi, Noboru Harada, Daisuke Niizumi, Kota Dohi, Ryo Tanabe, Harsh Purohit, Takashi Endo

    Abstract: We present the task description and discussion on the results of the DCASE 2021 Challenge Task 2. In 2020, we organized an unsupervised anomalous sound detection (ASD) task, identifying whether a given sound was normal or anomalous without anomalous training data. In 2021, we organized an advanced unsupervised ASD task under domain-shift conditions, which focuses on the inevitable problem of the p…

    Submitted 27 September, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to DCASE 2021 Workshop

  5. arXiv:2007.00225

    eess.AS cs.LG cs.SD stat.ML

    The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

    Authors: Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

    Abstract: This technical report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning. Our submission focuses on solving two indeterminacy problems in automated audio captioning: word selection indeterminacy and sentence length indeterminacy. We simultaneously solve the main caption generation and sub i…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: Technical Report of DCASE2020 Challenge Task 6

  6. arXiv:2007.00222

    eess.AS cs.LG cs.SD stat.ML

    A Transformer-based Audio Captioning Model with Keyword Estimation

    Authors: Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito

    Abstract: One of the problems with automated audio captioning (AAC) is the indeterminacy in word selection corresponding to the audio event/scene. Since one acoustic event/scene can be described with several words, it results in a combinatorial explosion of possible captions and difficulty in training. To solve this problem, we propose a Transformer-based audio-captioning model with keyword estimation calle…

    Submitted 8 August, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: Accepted to Interspeech 2020

  7. arXiv:2006.05822

    eess.AS cs.LG cs.SD stat.ML

    Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

    Authors: Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Harsh Purohit, Kaori Suefusa, Takashi Endo, Masahiro Yasuda, Noboru Harada

    Abstract: In this paper, we present the task description and discuss the results of the DCASE 2020 Challenge Task 2: Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous. The main challenge of this task is to detect unknown anomalous sounds under the condi…

    Submitted 8 August, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Submitted to DCASE2020 Workshop

  8. arXiv:2002.05879

    eess.AS cs.LG cs.SD stat.ML

    Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

    Authors: Masaki Kawanaka, Yuma Koizumi, Ryoichi Miyazaki, Kohei Yatabe

    Abstract: Improving subjective sound quality of enhanced signals is one of the most important missions in speech enhancement. For evaluating the subjective quality, several methods related to perceptually-motivated objective sound quality assessment (OSQA) have been proposed such as PESQ (perceptual evaluation of speech quality). However, direct use of such measures for training deep neural network (DNN) is…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: accepted to the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

  9. arXiv:2002.05873

    eess.AS cs.LG cs.SD stat.ML

    Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

    Authors: Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi

    Abstract: This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)--based speech enhancement mainly focus on building a speaker independent model. Meanwhile, in speech applications including speech recognition and s…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: 5 pages, to appear in IEEE ICASSP 2020

  10. arXiv:1910.04415

    eess.AS cs.LG cs.SD stat.ML

    DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector

    Authors: Masahiro Yasuda, Yuma Koizumi, Luca Mazzon, Shoichiro Saito, Hisashi Uematsu

    Abstract: We propose a direction of arrival (DOA) estimation method that combines sound-intensity vector (IV)-based DOA estimation and DNN-based denoising and dereverberation. Since the accuracy of IV-based DOA estimation degrades due to environmental noise and reverberation, two DNNs are used to remove such effects from the observed IVs. DOA is then estimated from the refined IVs based on the physics of wa…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: 4 pages

  11. arXiv:1910.04388

    eess.AS cs.LG cs.SD stat.ML

    First Order Ambisonics Domain Spatial Augmentation for DNN-based Direction of Arrival Estimation

    Authors: Luca Mazzon, Yuma Koizumi, Masahiro Yasuda, Noboru Harada

    Abstract: In this paper, we propose a novel data augmentation method for training neural networks for Direction of Arrival (DOA) estimation. This method focuses on expanding the representation of the DOA subspace of a dataset. Given some input data, it applies a transformation to it in order to change its DOA information and simulate a new, potentially unseen one. Such transformation, in general, is a combinat…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: 5 pages, to appear in DCASE 2019

  12. arXiv:1908.03299

    eess.AS cs.LG cs.SD stat.ML

    ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection

    Authors: Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Noboru Harada, Keisuke Imoto

    Abstract: This paper introduces a new dataset called "ToyADMOS" designed for anomaly detection in machine operating sounds (ADMOS). To the best of our knowledge, no large-scale datasets are available for ADMOS, although large-scale datasets have contributed to recent advancements in acoustic signal processing. This is because anomalous sound data are difficult to collect. To build a large-scale dataset for ADM…

    Submitted 8 August, 2019; originally announced August 2019.

    Comments: 5 pages, to appear in IEEE WASPAA 2019

  13. arXiv:1907.08338

    eess.AS cs.LG cs.SD stat.ML

    Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds

    Authors: Yuma Koizumi, Shoichiro Saito, Masataka Yamaguchi, Shin Murata, Noboru Harada

    Abstract: Use of an autoencoder (AE) as a normal model is a state-of-the-art technique for unsupervised-anomaly detection in sounds (ADS). The AE is trained to minimize the sample mean of the anomaly score of normal sounds in a mini-batch. One problem with this approach is that the anomaly score of rare-normal sounds becomes higher than that of frequent-normal sounds, because the sample mean is strongly aff…

    Submitted 18 July, 2019; originally announced July 2019.

    Comments: 5 pages, to appear in IEEE WASPAA 2019

  14. arXiv:1812.05796

    stat.ML cs.LG cs.SD eess.AS

    AdaFlow: Domain-Adaptive Density Estimator with Application to Anomaly Detection and Unpaired Cross-Domain Translation

    Authors: Masataka Yamaguchi, Yuma Koizumi, Noboru Harada

    Abstract: We tackle unsupervised anomaly detection (UAD), a problem of detecting data that significantly differ from normal data. UAD is typically solved by using density estimation. Recently, deep neural network (DNN)-based density estimators, such as Normalizing Flows, have been attracting attention. However, one of their drawbacks is the difficulty in adapting them to the change in the normal data's dist…

    Submitted 13 March, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

    Comments: Accepted to ICASSP2019

  15. arXiv:1811.02438

    eess.AS cs.LG cs.SD eess.SP stat.ML

    Trainable Adaptive Window Switching for Speech Enhancement

    Authors: Yuma Koizumi, Noboru Harada, Yoichi Haneda

    Abstract: This study proposes a trainable adaptive window switching (AWS) method and applies it to a deep-neural-network (DNN) for speech enhancement in the modified discrete cosine transform domain. Time-frequency (T-F) mask processing in the short-time Fourier transform (STFT)-domain is a typical speech enhancement method. To recover the target signal precisely, DNN-based short-time frequency transforms hav…

    Submitted 19 February, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: accepted to the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)

  16. arXiv:1810.09137

    stat.ML cs.LG cs.SD eess.AS

    DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score

    Authors: Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Yoichi Haneda

    Abstract: We propose a training method for deep neural network (DNN)-based source enhancement to increase objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squa…

    Submitted 22 October, 2018; originally announced October 2018.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, Issue 10, 2018

  17. arXiv:1810.09133

    stat.ML cs.LG cs.SD eess.AS

    Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma

    Authors: Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Yuta Kawachi, Noboru Harada

    Abstract: This paper proposes a novel optimization principle and its implementation for unsupervised anomaly detection in sound (ADS) using an autoencoder (AE). The goal of unsupervised-ADS is to detect unknown anomalous sound without training data of anomalous sound. Use of an AE as a normal model is a state-of-the-art technique for unsupervised-ADS. To decrease the false positive rate (FPR), the AE is tra…

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018