Showing 1–7 of 7 results for author: Nishizaki, H

Searching in archive eess.
  1. arXiv:2307.01546

    cs.SD eess.AS

    Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure

    Authors: Yikang Wang, Hiromitsu Nishizaki, Ming Li

    Abstract: Finding synthetic artifacts of spoofing data will help the anti-spoofing countermeasures (CMs) system discriminate between spoofed and real speech. The Conformer combines the best of convolutional neural network and the Transformer, allowing it to aggregate global and local information. This may benefit the CM system to capture the synthetic artifacts hidden both locally and globally. In this pape…

    Submitted 30 October, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: 7 pages, 2 figures

  2. arXiv:2211.06546

    cs.SD eess.AS

    Low Pass Filtering and Bandwidth Extension for Robust Anti-spoofing Countermeasure Against Codec Variabilities

    Authors: Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, Ming Li

    Abstract: A reliable voice anti-spoofing countermeasure system needs to robustly protect automatic speaker verification (ASV) systems in various kinds of spoofing scenarios. However, the performance of countermeasure systems could be degraded by channel effects and codecs. In this paper, we show that using the low-frequency subbands of signals as input can mitigate the negative impact introduced by codecs o…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures, accepted by ISCSLP 2022

  3. arXiv:2203.16085

    cs.SD eess.AS

    Combination of Time-domain, Frequency-domain, and Cepstral-domain Acoustic Features for Speech Commands Classification

    Authors: Yikang Wang, Hiromitsu Nishizaki

    Abstract: In speech-related classification tasks, frequency-domain acoustic features such as logarithmic Mel-filter bank coefficients (FBANK) and cepstral-domain acoustic features such as Mel-frequency cepstral coefficients (MFCC) are often used. However, time-domain features perform more effectively in some sound classification tasks which contain non-vocal or weakly speech-related sounds. We previously pr…

    Submitted 16 June, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: 5 pages, 4 figures

  4. arXiv:2203.15473

    eess.AS

    Frequency-Directional Attention Model for Multilingual Automatic Speech Recognition

    Authors: Akihiro Dobashi, Chee Siang Leow, Hiromitsu Nishizaki

    Abstract: This paper proposes a model for transforming speech features using the frequency-directional attention model for End-to-End (E2E) automatic speech recognition. The idea is based on the hypothesis that in the phoneme system of each language, the characteristics of the frequency bands of speech when uttering them are different. By transforming the input Mel filter bank features with an attention mod…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: submitted to INTERSPEECH2022

  5. arXiv:2110.03511

    eess.AS cs.LG cs.SD

    Peer Collaborative Learning for Polyphonic Sound Event Detection

    Authors: Hayato Endo, Hiromitsu Nishizaki

    Abstract: This paper describes that semi-supervised learning called peer collaborative learning (PCL) can be applied to the polyphonic sound event detection (PSED) task, which is one of the tasks in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. Many deep learning models have been studied to find out what kind of sound events occur where and for how long in a given audio c…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  6. arXiv:2104.01384

    eess.AS cs.CL

    ExKaldi-RT: A Real-Time Automatic Speech Recognition Extension Toolkit of Kaldi

    Authors: Yu Wang, Chee Siang Leow, Akio Kobayashi, Takehito Utsuro, Hiromitsu Nishizaki

    Abstract: This paper describes the ExKaldi-RT online automatic speech recognition (ASR) toolkit that is implemented based on the Kaldi ASR toolkit and Python language. ExKaldi-RT provides tools for building online recognition pipelines. While similar tools are available built on Kaldi, a key feature of ExKaldi-RT that it works on Python, which has an easy-to-use interface that allows online ASR system devel…

    Submitted 8 August, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

    Comments: Accepted at the IEEE 10th Global Conference on Consumer Electronics

  7. arXiv:1904.04364

    eess.AS cs.CL cs.LG cs.SD

    Audio Classification of Bit-Representation Waveform

    Authors: Masaki Okawa, Takuya Saito, Naoki Sawada, Hiromitsu Nishizaki

    Abstract: This study investigated the waveform representation for audio signal classification. Recently, many studies on audio waveform classification such as acoustic event detection and music genre classification have been published. Most studies on audio waveform classification have proposed the use of a deep learning (neural network) framework. Generally, a frequency analysis method such as Fourier tran…

    Submitted 18 September, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: Accepted at INTERSPEECH2019