Search | arXiv e-print repository

Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines

Authors: Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, Alberto Bernardini

Abstract: Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN) such that its outpu… ▽ More Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN) such that its output renders target attributes of a measured room impulse response. The proposed approach involves the implementation of a differentiable FDN with trainable delay lines, which, for the first time, allows us to simultaneously learn each and every delay-network parameter via backpropagation. The iterative optimization process seeks to minimize a perceptually-motivated time-domain loss function incorporating differentiable terms accounting for energy decay and echo density. Through experimental validation, we show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics, and outperforms existing methods based on genetic algorithms and analytical FDN design. △ Less

Submitted 17 May, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

Comments: The article has been submitted to EURASIP Journal on Audio, Speech, and Music Processing on Jan 02, 2024 and is currently under review

arXiv:2312.14658 [pdf, other]

Room Acoustic Rendering Networks with Control of Scattering and Early Reflections

Authors: Matteo Scerbo, Lauri Savioja, Enzo De Sena

Abstract: Room acoustic synthesis can be used in Virtual Reality (VR), Augmented Reality (AR) and gaming applications to enhance listeners' sense of immersion, realism and externalisation. A common approach is to use Geometrical Acoustics (GA) models to compute impulse responses at interactive speed, and fast convolution methods to apply said responses in real time. Alternatively, delay-network-based models… ▽ More Room acoustic synthesis can be used in Virtual Reality (VR), Augmented Reality (AR) and gaming applications to enhance listeners' sense of immersion, realism and externalisation. A common approach is to use Geometrical Acoustics (GA) models to compute impulse responses at interactive speed, and fast convolution methods to apply said responses in real time. Alternatively, delay-network-based models are capable of modeling certain aspects of room acoustics, but with a significantly lower computational cost. In order to bridge the gap between these classes of models, recent work introduced delay network designs that approximate Acoustic Radiance Transfer (ART), a GA model that simulates the transfer of acoustic energy between discrete surface patches in an environment. This paper presents two key extensions of such designs. The first extension involves a new physically-based and stability-preserving design of the feedback matrices, enabling more accurate control of scattering and, more in general, of late reverberation properties. The second extension allows an arbitrary number of early reflections to be modeled with high accuracy, meaning the network can be scaled at will between computational cost and early reverb precision. The proposed extensions are compared to the baseline ART-approximating delay network as well as two reference GA models. The evaluation is based on objective measures of perceptually-relevant features, including frequency-dependent reverberation times, echo density build-up, and early decay time. Results show how the proposed extensions result in a significant improvement over the baseline model, especially for the case of non-convex geometries or the case of unevenly distributed wall absorption, both scenarios of broad practical interest. △ Less

Submitted 22 December, 2023; originally announced December 2023.

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. 12 pages, 12 figures, 2 tables

MSC Class: 76Q05 (Primary) 93C43; 94A12 (Secondary)

arXiv:2306.08514 [pdf, other]

Low-Complexity Steered Response Power Map** based on Low-Rank and Sparse Interpolation

Authors: Thomas Dietzen, Enzo De Sena, Toon van Waterschoot

Abstract: For acoustic source localization, a map of the acoustic scene as obtained by the steered response power (SRP) approach can be employed. In SRP, the frequency-weighted output power of a beamformer steered towards a set of candidate locations is obtained from generalized cross-correlations (GCCs). Due to the dense grid of candidate locations, conventional SRP exhibits a high computational complexity… ▽ More For acoustic source localization, a map of the acoustic scene as obtained by the steered response power (SRP) approach can be employed. In SRP, the frequency-weighted output power of a beamformer steered towards a set of candidate locations is obtained from generalized cross-correlations (GCCs). Due to the dense grid of candidate locations, conventional SRP exhibits a high computational complexity. While a number of low-complexity SRP-based localization approaches using non-exhaustive spatial search have been proposed, few studies aim to construct a full SRP map at reduced computational cost. In this paper, we propose two scalable approaches to this problem. Expressing the SRP map as a matrix transform of frequency-domain GCCs, we decompose the SRP matrix into a sampling matrix and an interpolation matrix. While the sampling operation can be implemented efficiently by the inverse fast Fourier transform (iFFT), we propose to use optimal low-rank or sparse approximations of the interpolation matrix for further complexity reduction. The proposed approaches, refered to as sampling + low-rank interpolation-based SRP (SLRI-SRP) and sampling + sparse interpolation-based SRP (SSPI-SRP), are evaluated in a near-field (NF) and a far-field (FF) localization scenario and compared to a state-of-the-art low-rank-based SRP approach (LR-SRP). The results indicate that SSPI-SRP outperforms both SLRI-SRP and LR-SRP over a wide complexity range in terms of approximation error and localization accuracy, achieving a complexity reduction of two to three orders of magnitude as compared to conventional SRP. A MATLAB implementation is available online. △ Less

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2012.09499 [pdf, ps, other]

doi 10.1109/WASPAA52581.2021.9632774

Low-Complexity Steered Response Power Map** based on Nyquist-Shannon Sampling

Authors: Thomas Dietzen, Enzo De Sena, Toon van Waterschoot

Abstract: The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the… ▽ More The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the dense grid of candidate locations, each of which requires inverse Fourier transform (IFT) evaluations, conventional SRP exhibits a high computational complexity. In this paper, we propose a low-complexity SRP approach based on Nyquist-Shannon sampling. Noting that on the one hand the range of possible TDOAs is physically bounded, while on the other hand the GCCs are bandlimited, we critically sample the GCCs around their TDOA interval and approximate the SRP map by interpolation. In usual setups, the number of sample points can be orders of magnitude less than the number of candidate locations and frequency bins, yielding a significant reduction of IFT computations at a limited interpolation cost. Simulations comparing the proposed approximation with conventional SRP indicate low approximation errors and equal localization performance. MATLAB and Python implementations are available online. △ Less

Submitted 22 July, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

arXiv:2009.12143 [pdf, other]

On the Convergence of the Multipole Expansion Method

Authors: Brian Fitzpatrick, Enzo De Sena, Toon van Waterschoot

Abstract: The multipole expansion method (MEM) is a spatial discretization technique that is widely used in applications that feature scattering of waves from circular cylinders. Moreover, it also serves as a key component in several other numerical methods in which scattering computations involving arbitrarily shaped objects are accelerated by enclosing the objects in artificial cylinders. A fundamental qu… ▽ More The multipole expansion method (MEM) is a spatial discretization technique that is widely used in applications that feature scattering of waves from circular cylinders. Moreover, it also serves as a key component in several other numerical methods in which scattering computations involving arbitrarily shaped objects are accelerated by enclosing the objects in artificial cylinders. A fundamental question is that of how fast the approximation error of the MEM converges to zero as the truncation number goes to infinity. Despite the fact that the MEM was introduced in 1913, and has been in widespread usage as a numerical technique since as far back as 1955, to the best of the authors' knowledge, a precise characterization of the asymptotic rate of convergence of the MEM has not been obtained. In this work, we provide a resolution to this issue. While our focus in this paper is on the Dirichlet scattering problem, this is merely for convenience and our results actually establish convergence rates that hold for all MEM formulations irrespective of the specific boundary conditions or boundary integral equation solution representation chosen. △ Less

Submitted 3 June, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

Comments: 21 pages, 2 figures; Corrected a scaling error that occurred when plotting the third columns of Figs 1,2,3, some very minor grammatical edits to the intro/conclusion to improve clarity and conciseness, included funding info in first page; updated intro with historical info; reformatted several sections to reduce no. of pages; changed title, shortened abstract; fixed typo in proof of Thm 1.1

MSC Class: 31A10; 42B10; 65N12; 65N15; 65R20; 70F10; 78M15; 78M16

arXiv:1907.11425 [pdf, other]

doi 10.1109/TASLP.2020.2975419

Localization Uncertainty in Time-Amplitude Stereophonic Reproduction

Authors: Enzo De Sena, Zoran Cvetkovic, Huseyin Hacihabiboglu, Marc Moonen, Toon van Waterschoot

Abstract: This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and com… ▽ More This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and compares them to those associated to free-field point-like sources. The comparison is carried out using a particular distance functional that replicates the increased uncertainty observed experimentally with inconsistent inter-aural time and level difference cues. The model is validated by formal listening tests, achieving a Pearson correlation of 0.99. The model is then used to predict localization uncertainty for stereophonic setups and a listener in central and off-central positions. Results show that amplitude methods achieve a slightly lower localization uncertainty for a listener positioned exactly in the center of the sweet spot. As soon as the listener moves away from that position, the situation reverses, with time-amplitude methods achieving a lower localization uncertainty. △ Less

Submitted 6 September, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

Journal ref: IEEE/ACM Trans. Audio, Speech and Language Process. vol 28, pp. 1000 - 1015, Feb. 2020

arXiv:1502.05751 [pdf, other]

doi 10.1109/TASLP.2015.2438547

Efficient Synthesis of Room Acoustics via Scattering Delay Networks

Authors: Enzo De Sena, Huseyin Hacihabiboglu, Zoran Cvetkovic, Julius O. Smith III

Abstract: An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly… ▽ More An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of IM. However, its computational complexity is one to two orders of magnitude lower, comparable to the computational complexity of a feedback delay network (FDN), and its memory requirements are negligible. △ Less

Submitted 9 July, 2015; v1 submitted 19 February, 2015; originally announced February 2015.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 23, No. 9, September 2015

Showing 1–7 of 7 results for author: De Sena, E