Showing 1–2 of 2 results for author: Leow, C S

Search v0.5.6 released 2020-02-24

arXiv:2203.15473 [pdf, other]

eess.AS

Frequency-Directional Attention Model for Multilingual Automatic Speech Recognition

Authors: Akihiro Dobashi, Chee Siang Leow, Hiromitsu Nishizaki

Abstract: This paper proposes a model for transforming speech features using the frequency-directional attention model for End-to-End (E2E) automatic speech recognition. The idea is based on the hypothesis that in the phoneme system of each language, the characteristics of the frequency bands of speech when uttering them are different. By transforming the input Mel filter bank features with an attention model that characterizes the frequency direction, a feature transformation suitable for ASR in each language can be expected. This paper introduces a Transformer-encoder as a frequency-directional attention model. We evaluated the proposed method on a multilingual E2E ASR system for six different languages and found that the proposed method could achieve, on average, 5.3 points higher accuracy than the ASR model for each language by introducing the frequency-directional attention mechanism. Furthermore, visualization of the attention weights based on the proposed method suggested that it is possible to transform acoustic features considering the frequency characteristics of each language.

Submitted 29 March, 2022; originally announced March 2022.

Comments: submitted to INTERSPEECH2022

arXiv:2104.01384 [pdf, other]

eess.AS cs.CL

ExKaldi-RT: A Real-Time Automatic Speech Recognition Extension Toolkit of Kaldi

Authors: Yu Wang, Chee Siang Leow, Akio Kobayashi, Takehito Utsuro, Hiromitsu Nishizaki

Abstract: This paper describes the ExKaldi-RT online automatic speech recognition (ASR) toolkit, which is implemented based on the Kaldi ASR toolkit and the Python language. ExKaldi-RT provides tools for building online recognition pipelines. While similar tools built on Kaldi are available, a key feature of ExKaldi-RT is that it works on Python, which has an easy-to-use interface that allows online ASR system developers to develop original research, such as by applying neural network-based signal processing and by decoding with models trained with deep learning frameworks. We performed benchmark experiments on the minimum LibriSpeech corpus, which showed that ExKaldi-RT could achieve competitive ASR performance in real-time recognition.

Submitted 8 August, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: Accepted at the IEEE 10th Global Conference on Consumer Electronics
