Search | arXiv e-print repository

Confinement induced three-dimensional trajectories of microswimmers in rectangular channels

Authors: Byjesh N. Radhakrishnan, Ahana Purushothaman, Ranabir Dey, Sumesh P Thampi

Abstract: We study the trajectories of a model microorganism inside three-dimensional channels with square and rectangular cross-sections. Using (i) numerical simulations based on the lattice-Boltzmann method, and (ii) analytical expressions based on far-field hydrodynamic approximations and the method of images, we systematically investigate the role of the strength and finite size of the squirmer, the confinement dimensions, and the initial conditions in determining the three-dimensional trajectories of microswimmers. Our results indicate that hydrodynamic interactions with the confining walls of the channel significantly affect the swimming speed and trajectory of the model microswimmer. Specifically, pullers always display sliding motion inside the channel: weak pullers slide along the channel centerline, while strong pullers slide along a path close to one of the walls. Pushers generally follow helical motion in a square channel. Unlike pullers and pushers, the trajectories of neutral swimmers are not easy to generalize and are sensitive to the initial conditions. Despite this diversity in the trajectories, the far-field expressions capture the essential features of channel-confined swimmers. Finally, we propose a method based on the principle of superposition to understand the origin of the three-dimensional trajectories of channel-confined swimmers. Such a construction allows us to predict and justify the origin of apparently complex 3D trajectories generated by different types of swimmers in channels with square and rectangular cross-sections.
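The abstract does not define the swimmer model beyond "squirmer", but such studies commonly use the two-mode squirmer, where the swimmer type is set by the ratio β = B2/B1. The sketch below assumes that convention (β > 0 puller, β < 0 pusher, β = 0 neutral, unconfined speed 2B1/3); the function names are hypothetical and the wall effects (method of images) are not modeled.

```python
import math


def squirmer_slip(theta, b1, beta):
    """Tangential surface slip of the two-mode squirmer (assumed model):
    u_theta = B1 sin(theta) * (1 + beta cos(theta)), theta measured from the swimming direction."""
    return b1 * math.sin(theta) * (1.0 + beta * math.cos(theta))


def free_space_speed(b1):
    """Swimming speed of an isolated, unconfined squirmer: U = 2 B1 / 3."""
    return 2.0 * b1 / 3.0


def swimmer_type(beta):
    """Classify by beta = B2 / B1, assuming beta > 0 puller and beta < 0 pusher."""
    if beta > 0:
        return "puller"
    if beta < 0:
        return "pusher"
    return "neutral"
```

For example, a squirmer with B1 = 1.5 swims at speed 1.0 in free space; confinement in the channel then modifies both this speed and the trajectory.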

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2309.13537

doi 10.1109/TASLP.2023.3317570

Speech enhancement with frequency domain auto-regressive modeling

Authors: Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy

Abstract: Speech applications in far-field, real-world settings often deal with signals that are corrupted by reverberation. Dereverberation is an important step to improve audible quality and to reduce error rates in applications like automatic speech recognition (ASR). We propose a unified framework of speech dereverberation for improving the speech quality and the ASR performance using the approach of envelope-carrier decomposition provided by an autoregressive (AR) model. The AR model is applied in the frequency domain of the sub-band speech signals to separate the envelope and carrier parts. A novel neural architecture based on a dual-path long short-term memory (DPLSTM) model is proposed, which jointly enhances the sub-band envelope and carrier components. The dereverberated envelope and carrier signals are modulated, and the sub-band signals are synthesized to reconstruct the audio signal. The DPLSTM model for dereverberation of envelope and carrier components also allows joint learning of the network weights for the downstream ASR task. In ASR tasks on the REVERB challenge dataset as well as on the VOiCES dataset, we illustrate that the joint learning of the speech dereverberation network and the E2E ASR model yields significant performance improvements over the baseline ASR system trained on log-mel spectrograms as well as other benchmarks for dereverberation (average relative improvements of 10-24% over the baseline system). The speech quality improvements, evaluated using subjective listening tests, further highlight the improved quality of the reconstructed audio.
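As a rough illustration of how an AR model applied in the frequency domain yields a temporal envelope, the sketch below fits a linear-prediction model to the DCT coefficients of a signal (the FDLP idea) and evaluates the resulting AR "spectrum" on a time grid, which approximates the squared Hilbert envelope. The carrier is then taken as the signal divided by the square root of that envelope. The function names, model order, and autocorrelation method are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_envelope(x, order=20):
    """Approximate squared Hilbert envelope of x via FDLP (sketch).

    An AR model is fitted to the DCT coefficients of x; its power
    "spectrum", sampled at n points of [0, pi), is read as a smooth
    estimate of the squared temporal envelope at the n time samples.
    """
    n = len(x)
    c = dct(x, type=2, norm="ortho")
    # Biased autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)]) / n
    # Yule-Walker equations R a = -r[1:], with R symmetric Toeplitz.
    a = solve_toeplitz(r[:order], -r[1 : order + 1])
    gain = r[0] + np.dot(a, r[1 : order + 1])  # prediction-error power
    # A(e^{jw}) = 1 + sum_k a_k e^{-jwk}, sampled at w = pi * m / n.
    A = np.fft.rfft(np.concatenate(([1.0], a)), 2 * n)[:n]
    return gain / np.abs(A) ** 2


def envelope_carrier(x, order=20):
    """Split x into (envelope, carrier) so that x = carrier * sqrt(envelope)."""
    env = fdlp_envelope(x, order)
    return env, x / np.sqrt(env)
```

In the paper this decomposition is applied per sub-band, and the neural model enhances both parts before they are recombined.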

Submitted 23 September, 2023; originally announced September 2023.

Comments: 10 pages

Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing 2023

arXiv:2108.05520

Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

Authors: Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy

Abstract: The task of speech recognition in far-field environments is adversely affected by reverberant artifacts that manifest as temporal smearing of the sub-band envelopes. In this paper, we develop a neural model for speech dereverberation using the long-term sub-band envelopes of speech. The sub-band envelopes are derived using frequency domain linear prediction (FDLP), which performs an autoregressive estimation of the Hilbert envelopes. The neural dereverberation model estimates an envelope gain which, when applied to reverberant signals, suppresses the late reflection components in the far-field signal. The dereverberated envelopes are used for feature extraction in speech recognition. Further, the sequence of steps involved in envelope dereverberation, feature extraction, and acoustic modeling for ASR can be implemented as a single neural processing pipeline, which allows joint learning of the dereverberation network and the acoustic model. Several experiments are performed on the REVERB challenge, CHiME-3, and VOiCES datasets. In these experiments, joint learning of envelope dereverberation and the acoustic model yields significant performance improvements over the baseline ASR system based on log-mel spectrograms as well as other past approaches for dereverberation (average relative improvements of 10-24% over the baseline system). A detailed analysis of the choice of hyper-parameters and the cost function involved in envelope dereverberation is also provided.
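To make the "envelope gain" idea concrete, the sketch below smears clean sub-band envelopes with an exponential decay (a hypothetical stand-in for late reverberation), computes an ideal log-gain target, and applies it to recover the clean envelopes. In the paper the gain is predicted by a neural network; the smearing model, the log-ratio target, and all function names here are assumptions.

```python
import numpy as np


def smear(env, tau=10.0, length=50):
    """Hypothetical reverberation model: convolve each sub-band envelope
    (rows of env, time along the last axis) with an exponential decay."""
    kernel = np.exp(-np.arange(length) / tau)
    return np.apply_along_axis(lambda e: np.convolve(e, kernel)[: e.shape[0]], -1, env)


def ideal_log_gain(clean_env, reverb_env, eps=1e-8):
    """Hypothetical training target: log-ratio of clean to reverberant envelope."""
    return np.log(clean_env + eps) - np.log(reverb_env + eps)


def apply_log_gain(reverb_env, log_gain):
    """Enhanced envelope = reverberant envelope scaled by the (predicted) gain."""
    return reverb_env * np.exp(log_gain)
```

Because the smeared envelope is never smaller than the clean one in this toy model, the ideal gain is at most 1, i.e. it only suppresses the smeared (late reflection) energy.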

Submitted 13 August, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: arXiv admin note: text overlap with arXiv:2008.03339

arXiv:2108.03975

End-to-End Speech Recognition With Joint Dereverberation Of Sub-Band Autoregressive Envelopes

Authors: Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy

Abstract: End-to-end (E2E) automatic speech recognition (ASR) systems are often required to operate in reverberant conditions, where the long-term sub-band envelopes of the speech are temporally smeared. In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes. The temporal envelopes are modeled using the framework of frequency domain linear prediction (FDLP). The proposed neural enhancement model performs an envelope-gain-based enhancement of temporal envelopes. The model architecture consists of a combination of convolutional and long short-term memory (LSTM) neural network layers. Further, envelope dereverberation, feature extraction, and acoustic modeling using a transformer-based E2E ASR system can all be jointly optimized for the speech recognition task. The joint optimization ensures that the dereverberation model targets the ASR cost function. We perform E2E speech recognition experiments on the REVERB challenge dataset as well as on the VOiCES dataset. In these experiments, the proposed joint modeling approach yields significant improvements over the baseline E2E ASR system (average relative improvements of 21% on the REVERB challenge dataset and about 10% on the VOiCES dataset).

Submitted 17 February, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: 5 pages with references, e2e asr

arXiv:2106.12763

SRIB-LEAP submission to Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing

Authors: R G Prithvi Raj, Rohit Kumar, M K Jayesh, Anurenjan Purushothaman, Sriram Ganapathy, M A Basha Shaik

Abstract: This paper presents the details of the SRIB-LEAP submission to the ConferencingSpeech challenge 2021. The challenge involved multi-channel speech enhancement to improve the quality of far-field speech from microphone arrays in a video conferencing room. We propose a two-stage method involving a beamformer followed by single-channel enhancement. For the beamformer, we incorporated a self-attention mechanism as an inter-channel processing layer in the filter-and-sum network (FaSNet), an end-to-end time-domain beamforming system. Single-channel speech enhancement is done in the log-spectral domain using a convolutional neural network (CNN)-long short-term memory (LSTM) based architecture. We achieved an improvement of 0.5 in the objective perceptual evaluation of speech quality (PESQ) metric on the noisy data. In subjective quality evaluation, the proposed approach improved the mean opinion score (MOS) by an absolute 0.9 over the noisy audio.
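The core operation of a filter-and-sum beamformer is to filter each microphone channel with its own FIR filter and add the results. In FaSNet the filters are estimated by a neural network from the multichannel input; the sketch below (hypothetical function name) takes the filters as given, so it shows only the filter-and-sum step itself.

```python
import numpy as np


def filter_and_sum(x, h):
    """Filter-and-sum beamformer: y[n] = sum_m (h_m * x_m)[n].

    x: (M, T) multichannel time-domain signal; h: (M, L) per-channel FIR
    filters. Here the filters are passed in directly rather than predicted.
    """
    num_ch, num_samples = x.shape
    y = np.zeros(num_samples)
    for m in range(num_ch):
        # Causal convolution, truncated to the input length.
        y += np.convolve(x[m], h[m])[:num_samples]
    return y
```

With pure-delay filters this reduces to delay-and-sum: if channel 1 lags channel 0 by three samples, delaying channel 0 by three samples and averaging the two channels reproduces the aligned signal.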

Submitted 24 June, 2021; originally announced June 2021.

arXiv:2008.03339

Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

Authors: Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy

Abstract: Automatic speech recognition in reverberant conditions is a challenging task, as the long-term envelopes of reverberant speech are temporally smeared. In this paper, we propose a neural model for enhancement of sub-band temporal envelopes for dereverberation of speech. The temporal envelopes are derived using the autoregressive modeling framework of frequency domain linear prediction (FDLP). The proposed neural enhancement model performs an envelope-gain-based enhancement of temporal envelopes and consists of a series of convolutional and recurrent neural network layers. The enhanced sub-band envelopes are used to generate features for automatic speech recognition (ASR). The ASR experiments are performed on the REVERB challenge dataset as well as the CHiME-3 dataset. In these experiments, the proposed neural enhancement approach provides significant improvements over a baseline ASR system with beamformed audio (average relative word error rate improvements of 21% on the development set and about 11% on the evaluation set of the REVERB challenge dataset).
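Results in these abstracts are reported as relative word error rate (WER) improvements. Assuming the usual definition (the abstracts do not state the formula), the relative improvement is the WER reduction divided by the baseline WER; the helper below is a hypothetical illustration.

```python
def relative_wer_improvement(baseline_wer, system_wer):
    """Relative WER improvement in percent: 100 * (WER_base - WER_sys) / WER_base."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer
```

For instance, a hypothetical drop from 10.0% to 7.9% WER is a 2.1-point absolute reduction but about a 21% relative improvement.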

Submitted 7 August, 2020; originally announced August 2020.

arXiv:2005.11258

LEAP Submission to CHiME-6 ASR Challenge

Authors: Anirudh Sreeram, Anurenjan Purushothaman, Rohit Kumar, Sriram Ganapathy

Abstract: This paper reports the LEAP submission to the CHiME-6 challenge. Track 1 of the CHiME-6 automatic speech recognition (ASR) challenge involved the recognition of speech in noisy and reverberant acoustic conditions in home environments with multi-party interactions. For the challenge submission, the LEAP system used extensive data augmentation and a factorized time-delay neural network (TDNN) architecture. We also explored a neural architecture that interleaved the TDNN layers with LSTM layers. The submitted system improved on the Kaldi recipe by 2% relative in word error rate.

Submitted 22 May, 2020; originally announced May 2020.

arXiv:1911.12617

Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR

Authors: Rohit Kumar, Anirudh Sreeram, Anurenjan Purushothaman, Sriram Ganapathy

Abstract: State-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that predicts the presence of speech and noise. These models are trained using a paired corpus of clean and noisy recordings (teacher model). In this paper, we attempt to move away from the requirement of supervised clean recordings for training the mask estimator. Models based on signal enhancement and beamforming using multi-channel linear prediction provide the required mask estimates. In this way, model training can also be carried out on real recordings of noisy speech, rather than only on the simulated recordings used in a typical teacher model. Several experiments performed on noisy and reverberant environments in the CHiME-3 corpus as well as the REVERB challenge corpus highlight the effectiveness of the proposed approach. The proposed approach yields ASR performance that is significantly better than a teacher model trained on an out-of-domain dataset and on par with oracle mask estimators trained on the in-domain dataset.
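Once speech and noise masks are available, generalized eigenvalue (GEV) beamforming estimates mask-weighted spatial covariance matrices for speech and noise at each frequency and takes the principal generalized eigenvector as the beamformer. The sketch below shows that standard step with masks supplied by the caller; how the masks are obtained (the subject of the paper) is not modeled, and the function names and diagonal-loading value are assumptions.

```python
import numpy as np
from scipy.linalg import eigh


def gev_beamformer(stft, speech_mask, noise_mask, loading=1e-6):
    """Mask-based GEV beamforming weights (sketch).

    stft: (M, F, T) complex multichannel STFT.
    speech_mask, noise_mask: (F, T) real masks in [0, 1].
    Returns w: (F, M), the principal generalized eigenvector of
    (Phi_speech, Phi_noise) at each frequency, which maximizes output SNR.
    """
    num_ch, num_freq, _ = stft.shape
    w = np.zeros((num_freq, num_ch), dtype=complex)
    for f in range(num_freq):
        X = stft[:, f, :]  # (M, T)
        # Mask-weighted spatial covariance matrices.
        phi_s = (speech_mask[f] * X) @ X.conj().T / max(speech_mask[f].sum(), 1e-8)
        phi_n = (noise_mask[f] * X) @ X.conj().T / max(noise_mask[f].sum(), 1e-8)
        phi_n += loading * np.eye(num_ch)  # diagonal loading keeps Phi_noise positive definite
        # eigh solves Phi_s v = lambda Phi_n v; eigenvalues are returned in ascending order.
        _, vecs = eigh(phi_s, phi_n)
        w[f] = vecs[:, -1]
    return w


def apply_beamformer(w, stft):
    """Beamformer output y[f, t] = w[f]^H x[f, t]."""
    return np.einsum("fm,mft->ft", w.conj(), stft)
```

For a single directional source in white noise, the GEV weights align (up to scale) with the source's steering vector, which is what the oracle-mask check below exercises.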

Submitted 28 November, 2019; originally announced November 2019.

arXiv:1911.05504

3-D Feature and Acoustic Modeling for Far-Field Speech Recognition

Authors: Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy

Abstract: Automatic speech recognition in multi-channel reverberant conditions is a challenging task. The conventional way of suppressing reverberation artifacts involves beamforming-based enhancement of the multi-channel speech signal, which is then used to extract spectrogram-based features for a neural network acoustic model. In this paper, we propose to extract features directly from the multi-channel speech signal using a multivariate autoregressive (MAR) modeling approach, where the correlations among all three dimensions of time, frequency, and channel are exploited. The MAR features are fed to a convolutional neural network (CNN) architecture which performs joint acoustic modeling on the three dimensions. The 3-D CNN architecture allows the combination of multi-channel features that optimize the speech recognition cost, in contrast to traditional beamforming models that focus on the enhancement task. Experiments are conducted on the CHiME-3 and REVERB Challenge datasets using multi-channel reverberant speech. In these experiments, the proposed 3-D feature and acoustic modeling approach provides significant improvements over an ASR system trained with beamformed audio (average relative improvements of 10% and 9% in word error rates for the CHiME-3 and REVERB Challenge datasets, respectively).
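The abstract applies MAR modeling jointly over time, frequency, and channel. The sketch below shows only the basic ingredient, a least-squares fit of a multivariate (vector) AR model across channels, in which each channel's current sample is predicted from the past samples of all channels. It is a simplification for illustration, not the paper's feature extractor, and `fit_mar` is a hypothetical name.

```python
import numpy as np


def fit_mar(x, order):
    """Least-squares MAR fit: x[:, t] = sum_k A_k x[:, t-k] + e[:, t].

    x: (C, T) multichannel signal. Returns A: (order, C, C), where A[k-1]
    couples channel samples at lag k into the current sample.
    """
    num_ch, num_samples = x.shape
    # Stack lagged copies: block k-1 of rows holds x[:, t-k] for t = order..T-1.
    Z = np.vstack([x[:, order - k : num_samples - k] for k in range(1, order + 1)])
    Y = x[:, order:]
    # Solve Z^T B = Y^T in the least-squares sense; B^T = [A_1, ..., A_p].
    B, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)
    return B.T.reshape(num_ch, order, num_ch).transpose(1, 0, 2)
```

The off-diagonal entries of each A_k are what carry the cross-channel correlation that a per-channel AR model would ignore.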

Submitted 26 January, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

Showing 1–9 of 9 results for author: Purushothaman, A