Showing 1–9 of 9 results for author: Aralikatti, R

  1. arXiv:2310.08746  [pdf, other]

    cs.LG

    Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning

    Authors: Aakriti Agrawal, Rohith Aralikatti, Yanchao Sun, Furong Huang

    Abstract: Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling real-world challenges. However, the seamless transition of trained policies from simulations to real-world requires it to be robust to various environmental uncertainties. Existing works focus on finding Nash Equilibrium or the optimal policy under uncertainty in one environment variable (i.e. action, state or reward). This…

    Submitted 12 October, 2023; originally announced October 2023.

  2. arXiv:2212.05360  [pdf, other]

    eess.AS cs.AI cs.LG

    Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation

    Authors: Rohith Aralikatti, Zhenyu Tang, Dinesh Manocha

    Abstract: We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets. Our approach is designed to recover the reverb-free signal from a reverberant speech signal. We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation. We use the GWA dataset that con…

    Submitted 10 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  3. arXiv:2211.08303  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD stat.ML

    Reverberation as Supervision for Speech Separation

    Authors: Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux

    Abstract: This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation. Prior methods for unsupervised separation required the synthesis of mixtures of mixtures or assumed the existence of a teacher model, making them difficult to consider as potential methods explaining the emergence of separation abilities in an animal's audito…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 5 pages, 2 figures, 4 tables. Submitted to ICASSP 2023

  4. arXiv:2205.09248  [pdf, other]

    cs.SD cs.CV cs.GR cs.LG cs.MM eess.AS

    MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes

    Authors: Anton Ratnarajah, Zhenyu Tang, Rohith Chandrashekar Aralikatti, Dinesh Manocha

    Abstract: We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh. The IRs are used to create a high-quality sound experience in interactive applications and audio processing. Our method can handle input triangular meshes with arbitrary topologies (2K - 3M triangles). We present a novel training technique to train MESH2IR us…

    Submitted 11 July, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: Accepted to ACM Multimedia 2022. More results and source code is available at https://anton-jeran.github.io/M2IR/

  5. GWA: A Large High-Quality Acoustic Dataset for Audio Processing

    Authors: Zhenyu Tang, Rohith Aralikatti, Anton Ratnarajah, Dinesh Manocha

    Abstract: We present the Geometric-Wave Acoustic (GWA) dataset, a large-scale audio dataset of about 2 million synthetic room impulse responses (IRs) and their corresponding detailed geometric and simulation configurations. Our dataset samples acoustic environments from over 6.8K high-quality diverse and professionally designed houses represented as semantically labeled 3D meshes. We also present a novel re…

    Submitted 20 June, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

  6. arXiv:2107.09177  [pdf, other]

    eess.AS cs.SD

    Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning

    Authors: Rohith Aralikatti, Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha

    Abstract: We present a novel approach that improves the performance of reverberant speech separation. Our approach is based on an accurate geometric acoustic simulator (GAS) which generates realistic room impulse responses (RIRs) by modeling both specular and diffuse reflections. We also propose three training methods - pre-training, multi-stage training and curriculum learning that significantly improve se…

    Submitted 19 July, 2021; originally announced July 2021.

  7. arXiv:2001.10832  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD eess.IV

    Audio-Visual Decision Fusion for WFST-based and seq2seq Models

    Authors: Rohith Aralikatti, Sharad Roy, Abhinav Thanda, Dilip Kumar Margam, Pujitha Appan Kandala, Tanay Sharma, Shankar M Venkatesan

    Abstract: Under noisy conditions, speech recognition systems suffer from high Word Error Rates (WER). In such cases, information from the visual modality comprising the speaker lip movements can help improve the performance. In this work, we propose novel methods to fuse information from audio and visual modalities at inference time. This enables us to train the acoustic and visual models independently. Fir…

    Submitted 29 January, 2020; originally announced January 2020.

    Comments: Submitted for review to ICASSP 2020 on October 21st, 2019

  8. arXiv:1906.12170  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS eess.IV

    LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

    Authors: Dilip Kumar Margam, Rohith Aralikatti, Tanay Sharma, Abhinav Thanda, Pujitha A K, Sharad Roy, Shankar M Venkatesan

    Abstract: In recent years, deep learning based machine lipreading has gained prominence. To this end, several architectures such as LipNet, LCANet and others have been proposed which perform extremely well compared to traditional lipreading DNN-HMM hybrid systems trained on DCT features. In this work, we propose a simpler architecture of 3D-2D-CNN-BLSTM network with a bottleneck layer. We also present analy…

    Submitted 25 June, 2019; originally announced June 2019.

    Comments: Submitted to Interspeech 2019

  9. arXiv:1804.04353  [pdf, other]

    eess.AS cs.AI eess.SP stat.ML

    Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks

    Authors: Rohith Aralikatti, Dilip Margam, Tanay Sharma, Thanda Abhinav, Shankar M Venkatesan

    Abstract: This paper demonstrates two novel methods to estimate the global SNR of speech signals. In both methods, Deep Neural Network-Hidden Markov Model (DNN-HMM) acoustic model used in speech recognition systems is leveraged for the additional task of SNR estimation. In the first method, the entropy of the DNN-HMM output is computed. Recent work on bayesian deep learning has shown that a DNN-HMM trained…

    Submitted 12 April, 2018; originally announced April 2018.