Showing 1–10 of 10 results for author: Ratnarajah, A

Searching in archive eess.
  1. arXiv:2309.07416

    cs.SD eess.AS

    M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec

    Authors: Anton Ratnarajah, Shi-Xiong Zhang, Dong Yu

    Abstract: We introduce M3-AUDIODEC, an innovative neural spatial audio codec designed for efficient compression of multi-channel (binaural) speech in both single and multi-speaker scenarios, while retaining the spatial location information of each speaker. This model boasts versatility, allowing configuration and training tailored to a predetermined set of multi-channel, multi-speaker, and multi-spatial ove…

    Submitted 22 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: More results and source code are available at https://anton-jeran.github.io/MAD/

  2. arXiv:2308.12370

    cs.CV cs.MM cs.SD eess.AS

    AdVerb: Visually Guided Audio Dereverberation

    Authors: Sanjoy Chowdhury, Sreyan Ghosh, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi, Dinesh Manocha

    Abstract: We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio. Although audio-only dereverberation is a well-studied problem, our approach incorporates the complementary visual modality to perform audio dereverberation. Given an image of the environment where the reverberated sound signal has been recorded, AdVe…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023. For project page, see https://gamma.umd.edu/researchdirections/speech/adverb

  3. arXiv:2302.02809

    eess.AS cs.CV cs.LG cs.MM cs.SD

    Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes

    Authors: Anton Ratnarajah, Dinesh Manocha

    Abstract: We present an end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications. We propose a novel neural-network-based binaural sound propagation method to generate acoustic effects for indoor 3D models of real environments. Any clean audio or dry audio can be convolved with the generated acoustic effects to render audio corresponding to…

    Submitted 1 February, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Accepted to IEEE VR 2024. Project page: https://anton-jeran.github.io/Listen2Scene/

  4. arXiv:2211.04473

    cs.SD cs.AI eess.AS

    Towards Improved Room Impulse Response Estimation for Speech Recognition

    Authors: Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia

    Abstract: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture tha…

    Submitted 19 March, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted at ICASSP 2023. More results are available at https://anton-jeran.github.io/S2IR/

  5. arXiv:2205.09248

    cs.SD cs.CV cs.GR cs.LG cs.MM eess.AS

    MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes

    Authors: Anton Ratnarajah, Zhenyu Tang, Rohith Chandrashekar Aralikatti, Dinesh Manocha

    Abstract: We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh. The IRs are used to create a high-quality sound experience in interactive applications and audio processing. Our method can handle input triangular meshes with arbitrary topologies (2K - 3M triangles). We present a novel training technique to train MESH2IR us…

    Submitted 11 July, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: Accepted to ACM Multimedia 2022. More results and source code are available at https://anton-jeran.github.io/M2IR/

  6. GWA: A Large High-Quality Acoustic Dataset for Audio Processing

    Authors: Zhenyu Tang, Rohith Aralikatti, Anton Ratnarajah, Dinesh Manocha

    Abstract: We present the Geometric-Wave Acoustic (GWA) dataset, a large-scale audio dataset of about 2 million synthetic room impulse responses (IRs) and their corresponding detailed geometric and simulation configurations. Our dataset samples acoustic environments from over 6.8K high-quality diverse and professionally designed houses represented as semantically labeled 3D meshes. We also present a novel re…

    Submitted 20 June, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

  7. arXiv:2110.04057

    cs.SD cs.AI cs.LG eess.AS

    FAST-RIR: Fast neural diffuse room impulse response generator

    Authors: Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

    Abstract: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. Our FAST-RIR takes rectangular room dimensions, listener and speaker positions, and reverberation time as inputs and generates specular and diffuse reflections for a given acoustic environment. Our FAST-RIR is capable of generating…

    Submitted 5 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022. More results and source code are available at https://anton-jeran.github.io/FRIR/

  8. arXiv:2107.09177

    eess.AS cs.SD

    Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning

    Authors: Rohith Aralikatti, Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha

    Abstract: We present a novel approach that improves the performance of reverberant speech separation. Our approach is based on an accurate geometric acoustic simulator (GAS) which generates realistic room impulse responses (RIRs) by modeling both specular and diffuse reflections. We also propose three training methods - pre-training, multi-stage training and curriculum learning that significantly improve se…

    Submitted 19 July, 2021; originally announced July 2021.

  9. arXiv:2103.16804

    cs.SD eess.AS

    TS-RIR: Translated synthetic room impulse responses for speech augmentation

    Authors: Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha

    Abstract: We present a method for improving the quality of synthetic room impulse responses for far-field speech recognition. We bridge the gap between the fidelity of synthetic room impulse responses (RIRs) and the real room impulse responses using our novel, TS-RIRGAN architecture. Given a synthetic RIR in the form of raw audio, we use TS-RIRGAN to translate it into a real RIR. We also perform real-world…

    Submitted 11 November, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: Accepted to IEEE ASRU 2021. Source code is available at https://github.com/GAMMA-UMD/TS-RIR

  10. arXiv:2010.13219

    cs.SD eess.AS

    IR-GAN: Room Impulse Response Generator for Far-field Speech Recognition

    Authors: Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha

    Abstract: We present a Generative Adversarial Network (GAN) based room impulse response generator (IR-GAN) for generating realistic synthetic room impulse responses (RIRs). IR-GAN extracts acoustic parameters from captured real-world RIRs and uses these parameters to generate new synthetic RIRs. We use these generated synthetic RIRs to improve far-field automatic speech recognition in new environments that…

    Submitted 6 April, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: conference revision