Showing 1–2 of 2 results for author: Chhetri, A

Search v0.5.6 released 2020-02-24

arXiv:2406.17825 [pdf]

cs.CL cs.SD eess.AS

doi 10.1109/ICICT54344.2022.9850832

Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet

Authors: Manish Dhakal, Arman Chhetri, Aman Kumar Gupta, Prabin Lamichhane, Suraj Pandey, Subarna Shakya

Abstract: This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text. The model was trained and tested on the OpenSLR (audio, text) dataset. The majority of the audio dataset have silent gaps at both ends which are clipped during dataset preprocessing for a more uniform mapping of audio frames and their corresponding texts. Mel Frequency Cepstral Coefficients (MFCCs) are used as audio features to feed into the model. The model having Bidirectional LSTM paired with ResNet and one-dimensional CNN produces the best results for this dataset out of all the models (neural networks with variations of LSTM, GRU, CNN, and ResNet) that have been trained so far. This novel model uses Connectionist Temporal Classification (CTC) function for loss calculation during training and CTC beam search decoding for predicting characters as the most likely sequence of Nepali text. On the test dataset, the character error rate (CER) of 17.06 percent has been achieved. The source code is available at: https://github.com/manishdhakal/ASR-Nepali-using-CNN-BiLSTM-ResNet.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Accepted at 2022 International Conference on Inventive Computation Technologies (ICICT), IEEE

Journal ref: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 515-521

arXiv:1904.08971 [pdf, ps, other]

cs.SD cs.MM eess.AS

On Acoustic Modeling for Broadband Beamforming

Authors: Amit Chhetri, Mohamed Mansour, Wontak Kim, Guangdong Pan

Abstract: In this work, we describe limitations of the free-field propagation model for designing broadband beamformers for microphone arrays on a rigid surface. Towards this goal, we describe a general framework for quantifying the microphone array performance in a general wave-field by directly solving the acoustic wave equation. The model utilizes Finite-Element-Method (FEM) for evaluating the response of the microphone array surface to background 3D planar and spherical waves. The effectiveness of the framework is established by designing and evaluating a representative broadband beamformer under realistic acoustic conditions.

Submitted 18 April, 2019; originally announced April 2019.

Comments: 5 pages, conference

MSC Class: 94A12; 94A40; 94A15

ACM Class: H.1.2; H.5.1

Journal ref: European Signal Processing Conference (EUSIPCO 2019)
