
Showing 1–14 of 14 results for author: Richard, A

  1. arXiv:2406.06185

    eess.AS cs.LG cs.SD

    EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

    Authors: Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann

    Abstract: We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various m…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2403.18821

    cs.SD cs.CV cs.MM eess.AS

    Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

    Authors: Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

    Abstract: We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms. We used this dataset to evaluate existing methods for novel-view acoustic synthes…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project site: https://facebookresearch.github.io/real-acoustic-fields/

  3. arXiv:2401.12160

    eess.AS

    ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter

    Authors: Yi-Chiao Wu, Dejan Marković, Steven Krenn, Israel D. Gebru, Alexander Richard

    Abstract: Although recent mainstream waveform-domain end-to-end (E2E) neural audio codecs achieve impressive coded audio quality at very low bitrates, the quality gap between the coded and natural audio is still significant. Generative adversarial network (GAN) training is usually required for these E2E neural codecs because of the difficulty of direct phase modeling. However, such adversarial learning…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures, 2 tables. Proc. ICASSP, 2024

  4. arXiv:2401.01206

    eess.AS

    Room impulse response reconstruction with physics-informed deep learning

    Authors: Xenofon Karakonstantis, Diego Caviedes-Nozal, Antoine Richard, Efren Fernandez-Grande

    Abstract: A method is presented for estimating and reconstructing the sound field within a room using physics-informed neural networks. By incorporating a limited set of experimental room impulse responses as training data, this approach combines neural network processing capabilities with the underlying physics of sound propagation, as articulated by the wave equation. The network's ability to estimate par…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Submitted to Journal of Acoustical Society of America (JASA)

    ACM Class: J.2; G.1.8; I.6.4

  5. arXiv:2311.06285

    cs.CV cs.LG cs.SD eess.AS

    Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio

    Authors: Xudong Xu, Dejan Markovic, Jacob Sandakly, Todd Keebler, Steven Krenn, Alexander Richard

    Abstract: While 3D human body modeling has received much attention in computer vision, modeling the acoustic equivalent, i.e. modeling 3D spatial audio produced by body motion and speech, has fallen short in the community. To close this gap, we present a model that can generate accurate 3D spatial audio for full human bodies. The system consumes, as input, audio signals from headset microphones and body pos…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  6. AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec

    Authors: Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, Alexander Richard

    Abstract: A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e. the bitrate that is required to transmit the signal should be as low as possible; (2) latency, i.e. encoding and decoding the signal needs to be fast enough to enable communication without or with only minimal noticeable delay; and (3) reconstruction quality of the s…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 5 pages, 1 figure, 5 tables. Proc. ICASSP, 2023

  7. arXiv:2301.08730

    cs.CV cs.SD eess.AS

    Novel-View Acoustic Synthesis

    Authors: Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

    Abstract: We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach, the Visually-Guided Acoustic Synthesis (ViGAS) network, which learns to synthesize the sound of an arbitrary point in space by analyzing the input audio-visual cues. To benc…

    Submitted 24 October, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: Accepted at CVPR 2023. Project page: https://vision.cs.utexas.edu/projects/nvas

  8. arXiv:2207.03697

    cs.SD cs.AI cs.LG eess.AS

    End-to-End Binaural Speech Synthesis

    Authors: Wen Chin Huang, Dejan Markovic, Alexander Richard, Israel Dejene Gebru, Anjali Menon

    Abstract: In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful binaural decoder that is capable of accurate speech binauralization while faithfully reconstructing environmental factors like ambient noise or reverb. The network is a modified vector-quantized variational autoencoder, trained with several carefully designed objectives,…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: Accepted to INTERSPEECH 2022. Demo link: https://unilight.github.io/Publication-Demos/publications/e2e-binaural-synthesis

  9. arXiv:2206.15423

    cs.SD cs.LG eess.AS

    Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain

    Authors: Dejan Markovic, Alexandre Defossez, Alexander Richard

    Abstract: We present a single-stage causal waveform-to-waveform multichannel model that can separate moving sound sources based on their broad spatial locations in a dynamic acoustic scene. We divide the scene into two spatial regions containing, respectively, the target and the interfering sound sources. The model is trained end-to-end and performs spatial processing implicitly, without any components base…

    Submitted 30 June, 2022; originally announced June 2022.

    Comments: Interspeech 2022

  10. arXiv:2203.17263

    cs.CV cs.LG eess.AS

    Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

    Authors: Karren Yang, Dejan Markovic, Steven Krenn, Vasu Agrawal, Alexander Richard

    Abstract: Since facial actions such as lip movements contain significant information about speech content, it is not surprising that audio-visual speech enhancement methods are more accurate than their audio-only counterparts. Yet, state-of-the-art approaches still struggle to generate clean, realistic speech without noise artifacts and unnatural distortions in challenging acoustic environments. In this pap…

    Submitted 31 March, 2022; originally announced March 2022.

  11. arXiv:2202.05256

    eess.AS cs.LG cs.SD

    Conditional Diffusion Probabilistic Model for Speech Enhancement

    Authors: Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

    Abstract: Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorit…

    Submitted 10 February, 2022; originally announced February 2022.

  12. arXiv:2202.03416

    cs.SD cs.LG eess.AS

    Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks

    Authors: Alexander Richard, Peter Dodds, Vamsi Krishna Ithapu

    Abstract: Impulse response estimation in high noise and in-the-wild settings, with minimal control of the underlying data distributions, is a challenging problem. We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning. Our framework is driven by a carefully designed neural network that jointly estimates the impulse response…

    Submitted 7 February, 2022; originally announced February 2022.

  13. arXiv:2010.03173

    cs.CV cs.LG eess.IV

    A Study on Trees's Knots Prediction from their Bark Outer-Shape

    Authors: Mejri Mohamed, Antoine Richard, Cedric Pradalier

    Abstract: In industry, the value of wood logs strongly depends on their internal structure, and more specifically on the distribution of knots inside the trees. As of today, CT scanners are the prevalent tool for acquiring accurate images of the trees' internal structure. However, CT scanners are expensive and slow, making their use impractical for most industrial applications. Knowing where the knots are wit…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:2002.04571

  14. arXiv:2002.04571

    cs.CV cs.LG eess.IV

    A Survey On 3D Inner Structure Prediction from its Outer Shape

    Authors: Mohamed Mejri, Antoine Richard, Cédric Pradalier

    Abstract: The analysis of the internal structure of trees is highly important for forest experts, biological scientists, and the wood industry. Traditionally, CT scanners are considered the most efficient way to get an accurate inner representation of the tree. However, this method requires a significant investment and reduces the cost-effectiveness of this operation. Our goal is to design neural-net…

    Submitted 11 February, 2020; originally announced February 2020.