Showing 1–2 of 2 results for author: Chhetri, A
-
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
Authors: Manish Dhakal, Arman Chhetri, Aman Kumar Gupta, Prabin Lamichhane, Suraj Pandey, Subarna Shakya
Abstract:
This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text. The model was trained and tested on the OpenSLR (audio, text) dataset. Most of the audio files in the dataset have silent gaps at both ends. These gaps are clipped during preprocessing to obtain a more uniform mapping between audio frames and their corresponding text. Mel Frequency Cepstral Coefficients (MFCCs) are used as the audio features fed into the model. Of all the models trained so far (neural networks with variations of LSTM, GRU, CNN, and ResNet), the one pairing a bidirectional LSTM with ResNet and a one-dimensional CNN produces the best results on this dataset. This novel model uses the Connectionist Temporal Classification (CTC) loss function during training and CTC beam search decoding to predict the most likely sequence of Nepali characters. On the test dataset, it achieves a character error rate (CER) of 17.06 percent. The source code is available at: https://github.com/manishdhakal/ASR-Nepali-using-CNN-BiLSTM-ResNet.
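The reported metric is the character error rate (CER). CER is conventionally defined as the character-level Levenshtein (edit) distance between the reference transcript and the model's hypothesis, divided by the number of reference characters. The sketch below computes it that way. It assumes that "characters" are Unicode code points, so a Devanagari consonant and its attached vowel sign count separately. This may not match the tokenization used in the linked repository, and the function names are illustrative rather than taken from that code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `ref` into `hyp` (standard dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses) -> float:
    """Corpus-level CER: total edit distance over total reference length.
    Characters are Unicode code points here (an assumption, see above)."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars
```

For example, `character_error_rate(["नेपाल"], ["नेपल"])` is 0.2. The hypothesis drops one vowel sign (ा), and the reference has five code points. Summing edits over the whole test set before dividing, rather than averaging per-utterance rates, keeps long utterances from being under-weighted.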
Submitted 25 June, 2024;
originally announced June 2024.
-
On Acoustic Modeling for Broadband Beamforming
Authors: Amit Chhetri, Mohamed Mansour, Wontak Kim, Guangdong Pan
Abstract:
In this work, we describe the limitations of the free-field propagation model for designing broadband beamformers for microphone arrays mounted on a rigid surface. To this end, we describe a general framework for quantifying microphone array performance in a general wave field by directly solving the acoustic wave equation. The model uses the Finite Element Method (FEM) to evaluate the response of the microphone array surface to background 3D planar and spherical waves. The effectiveness of the framework is established by designing and evaluating a representative broadband beamformer under realistic acoustic conditions.
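To make concrete what the free-field propagation model assumes, the sketch below computes the analytic free-field response (steering vector) of each microphone to a plane wave and to a spherical (point-source) wave. It then forms a delay-and-sum beamformer from that response. This illustrates the baseline model whose limitations the paper examines; it is not the paper's FEM framework. In the paper's framework, the array response is instead obtained by solving the wave equation, which can capture scattering by the rigid surface. The speed of sound (343 m/s), the exp(jωt) sign convention, the choice of delay-and-sum weights, and all function names are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def plane_wave_steering(mics, direction, freq):
    """Free-field response of each mic to a unit plane wave from `direction`
    (unit vector pointing from the array toward a far-field source).
    Assumes time dependence exp(j*omega*t): mics nearer the source lead in phase."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND  # wavenumber
    return [cmath.exp(1j * k * sum(u * r for u, r in zip(direction, m))) for m in mics]


def spherical_wave_steering(mics, source, freq):
    """Free-field response to a point source at `source`:
    1/distance amplitude decay plus a propagation delay of distance / c."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND
    response = []
    for m in mics:
        dist = math.dist(m, source)
        response.append(cmath.exp(-1j * k * dist) / dist)
    return response


def delay_and_sum_weights(steering):
    """Weights w = d / (d^H d), giving unit (distortionless) gain toward d."""
    energy = sum(abs(x) ** 2 for x in steering)
    return [x / energy for x in steering]


def beam_response(weights, steering):
    """Array output w^H d for a wave whose free-field steering vector is d."""
    return sum(w.conjugate() * d for w, d in zip(weights, steering))


if __name__ == "__main__":
    # Hypothetical 4-mic linear array with 2 cm spacing along x, steered to endfire.
    mics = [(0.02 * i, 0.0, 0.0) for i in range(4)]
    look = plane_wave_steering(mics, (1.0, 0.0, 0.0), 4000.0)
    w = delay_and_sum_weights(look)
    back = plane_wave_steering(mics, (-1.0, 0.0, 0.0), 4000.0)
    print(abs(beam_response(w, look)), abs(beam_response(w, back)))
```

By construction the beamformer has unit gain toward the look direction and attenuates the opposite direction. Both properties hold only as long as the real array actually responds like these analytic vectors. On a rigid surface, that is the assumption the paper questions.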
Submitted 18 April, 2019;
originally announced April 2019.