Showing 1–12 of 12 results for author: Calamia, P

  1. arXiv:2312.13707

    eess.AS cs.SD

    Blind Localization of Room Reflections with Application to Spatial Audio

    Authors: Yogev Hadadi, Vladimir Tourbabin, Paul Calamia, Boaz Rafaely

    Abstract: Blind estimation of early room reflections, without knowledge of the room impulse response, holds substantial value. The FF-PHALCOR (Frequency Focusing PHase ALigned CORrelation) method was recently developed for this objective, extending the original PHALCOR method from spherical to arbitrary arrays. However, previous studies only compared the two methods under limited conditions without present…

    Submitted 21 December, 2023; originally announced December 2023.

    Journal ref: in 2023 Immersive and 3D Audio: from Architecture to Automotive (I3DA 2023), Bologna, Italy, September 2023

  2. arXiv:2301.02184

    cs.CV cs.LG cs.SD eess.AS

    Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

    Authors: Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, Vamsi Krishna Ithapu

    Abstract: Can conversational videos captured from multiple egocentric viewpoints reveal the map of a scene in a cost-efficient way? We seek to answer this question by proposing a new problem: efficiently building the map of a previously unseen 3D environment by exploiting shared information in the egocentric audio-visual observations of participants in a natural conversation. Our hypothesis is that as multi…

    Submitted 20 April, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

    Comments: Accepted to CVPR 2023

  3. arXiv:2211.04473

    cs.SD cs.AI eess.AS

    Towards Improved Room Impulse Response Estimation for Speech Recognition

    Authors: Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia

    Abstract: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture tha…

    Submitted 19 March, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted at ICASSP 2023. More results are available at https://anton-jeran.github.io/S2IR/

  4. Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses

    Authors: Thomas Deppisch, Sebastià V. Amengual Garí, Paul Calamia, Jens Ahrens

    Abstract: Psychoacoustic experiments have shown that directional properties of the direct sound, salient reflections, and the late reverberation of an acoustic room response can have a distinct influence on the auditory perception of a given room. Spatial room impulse responses (SRIRs) capture those properties and thus are used for direction-dependent room acoustic analysis and virtual acoustic rendering. T…

    Submitted 31 January, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: This article has been accepted for publication in the IEEE/ACM Transactions on Audio, Speech, and Language Processing. (c) 2023 IEEE

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 927-942, 2023

  5. arXiv:2206.12297

    eess.AS cs.SD

    SAQAM: Spatial Audio Quality Assessment Metric

    Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

    Abstract: Audio quality assessment is critical for assessing the perceptual realism of sounds. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. For AR&VR, good perceived sound quality and localizability of sources are among the key elements to ensure complete immersion of the user. Our work introduces SAQAM which uses a multi-task learning fra…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  6. arXiv:2206.08312

    cs.SD cs.CV cs.MM eess.AS

    SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

    Authors: Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman

    Abstract: We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments. Given a 3D mesh of a real-world environment, SoundSpaces can generate highly realistic acoustics for arbitrary sounds captured from arbitrary microphone locations. Together with existing 3D visual assets, it supports an array of audio-visual research tasks, such as audio-visual navigation, m…

    Submitted 23 January, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Camera-ready version. Website: https://soundspaces.org. Project page: https://vision.cs.utexas.edu/projects/soundspaces2

  7. arXiv:2202.06875

    cs.CV cs.MM cs.SD eess.AS

    Visual Acoustic Matching

    Authors: Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman

    Abstract: We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment. Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials. To address this novel task, we propose a cross-modal tr…

    Submitted 13 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Project page: https://vision.cs.utexas.edu/projects/visual-acoustic-matching. Accepted at CVPR 2022

  8. arXiv:2110.13130

    cs.SD eess.AS

    Multichannel Speech Enhancement without Beamforming

    Authors: Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

    Abstract: Deep neural networks are often coupled with traditional spatial filters, such as MVDR beamformers, to exploit spatial information effectively. Even though single-stage end-to-end supervised models can obtain impressive enhancement, combining them with a traditional beamformer and a DNN-based post-filter in multistage processing provides additional improvements. In this work, we propose a two-…

    Submitted 6 April, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in ICASSP 2022

  9. arXiv:2110.11844

    cs.SD eess.AS

    Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network

    Authors: Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

    Abstract: Deep neural networks (DNNs) are very effective for multichannel speech enhancement with fixed array geometries. However, it is not trivial to use DNNs for ad-hoc arrays with unknown order and placement of microphones. We propose a novel triple-path network for ad-hoc array processing in the time domain. The key idea in the network design is to divide the overall processing into spatial processing…

    Submitted 4 July, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in INTERSPEECH 2022

  10. arXiv:2110.10757

    cs.SD eess.AS

    TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

    Authors: Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

    Abstract: In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), wh…

    Submitted 6 April, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in ICASSP 2022

  11. arXiv:2107.07503

    eess.AS cs.SD

    Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech

    Authors: Christian J. Steinmetz, Vamsi Krishna Ithapu, Paul Calamia

    Abstract: Deep learning approaches have emerged that aim to transform an audio signal so that it sounds as if it was recorded in the same room as a reference recording, with applications both in audio post-production and augmented reality. In this work, we propose FiNS, a Filtered Noise Shaping network that directly estimates the time domain room impulse response (RIR) from reverberant speech. Our domain-in…

    Submitted 15 July, 2021; originally announced July 2021.

    Comments: Accepted to WASPAA 2021. See details at https://facebookresearch.github.io/FiNS/

  12. DPLM: A Deep Perceptual Spatial-Audio Localization Metric

    Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

    Abstract: Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis driven technologies like augmented and virtual reality. However, they are challenging to set up, fatiguing for users, and expensive. In this work, we tackle the problem of capturing the perceptual characteristics of localizing sounds. Specifically, we propose a framework for building a general pur…

    Submitted 28 May, 2021; originally announced May 2021.