Search | arXiv e-print repository

A Diffusion-Based Generative Equalizer for Music Restoration

Authors: Eloi Moliner, Maija Turunen, Filip Elvander, Vesa Välimäki

Abstract: This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of significant improvements. This research broadens the concept of bandwidth extension to generative equalization, a novel task that, to the best of our knowledge, has not been explicitly addressed in previous studies. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing a marked enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Submitted to DAFx24. Historical music restoration examples are available at: http://research.spa.aalto.fi/publications/papers/dafx-babe2/

arXiv:2403.10329 [pdf, ps, other]

Multi-Source Localization and Data Association for Time-Difference of Arrival Measurements

Authors: Gabrielle Flood, Filip Elvander

Abstract: In this work, we consider the problem of localizing multiple signal sources based on time-difference of arrival (TDOA) measurements. In the blind setting, in which the source signals are not known, the localization task is challenging due to the data association problem. That is, it is not known which of the TDOA measurements correspond to the same source. Herein, we propose to perform joint localization and data association by means of an optimal transport formulation. The method operates by finding optimal groupings of TDOA measurements and associating these with candidate source locations. To allow for computationally feasible localization in three-dimensional space, an efficient set of candidate locations is constructed using a minimal multilateration solver based on minimal sets of receiver pairs. In numerical simulations, we demonstrate that the proposed method is robust both to measurement noise and TDOA detection errors. Furthermore, it is shown that the data association provided by the proposed method allows for statistically efficient estimates of the source locations.

Submitted 15 March, 2024; originally announced March 2024.
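
To make the data association problem in the abstract above concrete, the following is a minimal sketch of how a TDOA value arises from geometry. It is not the authors' method: the function name `tdoa`, the receiver and source coordinates, and the speed of sound of 343 m/s are all assumptions chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed propagation speed (hypothetical setup)


def tdoa(source, receiver_a, receiver_b, c=SPEED_OF_SOUND):
    """Time-difference of arrival of one source between two receivers."""
    source = np.asarray(source, dtype=float)
    d_a = np.linalg.norm(source - np.asarray(receiver_a, dtype=float))
    d_b = np.linalg.norm(source - np.asarray(receiver_b, dtype=float))
    return (d_a - d_b) / c


# Hypothetical geometry: three receivers and two sources in a plane.
receivers = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
sources = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 0.0]])

# In the blind setting, each receiver pair yields an unordered set of TDOAs,
# one per source, without labels saying which source produced which value.
pair = (0, 1)
unlabeled = sorted(tdoa(s, receivers[pair[0]], receivers[pair[1]]) for s in sources)
```

Sorting the values discards the source labels, which mimics what a receiver pair observes; recovering those labels across all pairs is the association task the abstract refers to.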

arXiv:2403.03762 [pdf, ps, other]

Room Impulse Response Estimation using Optimal Transport: Simulation-Informed Inference

Authors: David Sundström, Anton Björkman, Andreas Jakobsson, Filip Elvander

Abstract: The ability to accurately estimate room impulse responses (RIRs) is integral to many applications of spatial audio processing. Regrettably, estimating the RIR using ambient signals, such as speech or music, remains a challenging problem due to, e.g., low signal-to-noise ratios, finite sample lengths, and poor spectral excitation. Commonly, in order to improve the conditioning of the estimation problem, priors are placed on the amplitudes of the RIR. Although serving as a regularizer, this type of prior is generally not useful when only approximate knowledge of the delay structure is available, which, for example, is the case when the prior is a simulated RIR from an approximation of the room geometry. In this work, we target the delay structure itself, constructing a prior based on the concept of optimal transport. As illustrated using both simulated and measured data, the resulting method is able to beneficially incorporate information even from simple simulation models, displaying considerable robustness to perturbations in the assumed room dimensions and its temperature.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.19345 [pdf, other]

Multi-frequency tracking via group-sparse optimal transport

Authors: Isabel Haasler, Filip Elvander

Abstract: In this work, we introduce an optimal transport framework for inferring power distributions over both spatial location and temporal frequency. Recently, it has been shown that optimal transport is a powerful tool for estimating spatial spectra that change smoothly over time. In this work, we consider the tracking of the spatio-temporal spectrum corresponding to a small number of moving broad-band signal sources. Typically, such tracking problems are addressed by treating the spatio-temporal power distribution in a frequency-by-frequency manner, allowing the use of well-understood models for narrow-band signals. This, however, leads to decreased target resolution due to inefficient use of the available information. We propose an extension of the optimal transport framework that exploits information from several frequencies simultaneously by estimating a spatio-temporal distribution penalized by a group-sparsity regularizer. This approach finds a spatial spectrum that changes smoothly over time, and at each time instance has a small support that is similar across frequencies. To the best of the authors' knowledge, this is the first formulation combining optimal transport and sparsity for solving inverse problems. As is shown on simulated and real data, our method can successfully track targets in scenarios where information from separate frequency bands alone is insufficient.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 6 pages, 9 figures

arXiv:2306.01433 [pdf, other]

Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach

Authors: Eloi Moliner, Filip Elvander, Vesa Välimäki

Abstract: Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During the inference process, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: (http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)

Submitted 30 January, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

arXiv:2303.00832 [pdf, ps, other]

Distributed Adaptive Norm Estimation for Blind System Identification in Wireless Sensor Networks

Authors: Matthias Blochberger, Filip Elvander, Randall Ali, Jan Østergaard, Jesper Jensen, Marc Moonen, Toon van Waterschoot

Abstract: Distributed signal-processing algorithms in (wireless) sensor networks often aim to decentralize processing tasks to reduce communication cost and computational complexity or avoid reliance on a single device (i.e., fusion center) for processing. In this contribution, we extend a distributed adaptive algorithm for blind system identification that relies on the estimation of a stacked network-wide consensus vector at each node, the computation of which requires either broadcasting or relaying of node-specific values (i.e., local vector norms) to all other nodes. The extended algorithm employs a distributed-averaging-based scheme to estimate the network-wide consensus norm value by only using the local vector norm provided by neighboring sensor nodes. We introduce an adaptive mixing factor between instantaneous and recursive estimates of these norms for adaptivity in a time-varying system. Simulation results show that the extension provides estimation results close to the optimal fully-connected-network or broadcasting case while reducing inter-node transmission significantly.

Submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted to ICASSP 2023

arXiv:2111.05579 [pdf, ps, other]

An efficient solver for designing optimal sampling schemes

Authors: Filip Elvander, Johan Swärd, Andreas Jakobsson

Abstract: In this short paper, we describe an efficient numerical solver for the optimal sampling problem considered in "Designing Sampling Schemes for Multi-Dimensional Data". An implementation may be found on https://www.maths.lu.se/staff/andreas-jakobsson/publications/.

Submitted 10 November, 2021; originally announced November 2021.

arXiv:2110.02728 [pdf, ps, other]

Quantifying and Computing Covariance Uncertainty

Authors: Filip Elvander, Johan Karlsson, Toon van Waterschoot

Abstract: In this work, we consider the problem of bounding the values of a covariance function corresponding to a continuous-time stationary stochastic process or signal. Specifically, for two signals whose covariance functions agree on a finite discrete set of time-lags, we consider the maximal possible discrepancy of the covariance functions for real-valued time-lags outside this discrete grid. Computing this uncertainty corresponds to solving an infinite dimensional non-convex problem. However, we herein prove that the maximal objective value may be bounded from above by a finite dimensional convex optimization problem, allowing for efficient computation by standard methods. Furthermore, we empirically observe that for the case of signals whose spectra are supported on an interval, this upper bound is sharp, i.e., provides an exact quantification of the covariance uncertainty.

Submitted 6 October, 2021; originally announced October 2021.

arXiv:2110.02357 [pdf, other]

Determining Joint Periodicities in Multi-time Data With Sampling Uncertainties

Authors: David Svedberg, Filip Elvander, Andreas Jakobsson

Abstract: In this work, we introduce a novel approach for determining a joint sparse spectrum from several non-uniformly sampled data sets, where each data set is assumed to have its own, possibly disjoint, and only partially known, sampling times. The potential of the proposed approach is illustrated using a spectral estimation problem in paleoclimatology. In this problem, each data point derives from a separate ice core measurement, so that, even though all measurements reflect the same periodicities, the sampling times and phases differ among the data sets. In addition, sampling times are only approximately known. The resulting joint estimate exploiting all available data is formulated using a sparse reconstruction framework allowing for a reliable and robust estimate of the underlying periodicities. The corresponding misspecified Cramér-Rao lower bound, accounting for the expected sampling uncertainties, is derived and the proposed method is shown to attain the resulting bound when the signal-to-noise ratio is sufficiently high. The performance of the proposed method is compared with that of other commonly used approaches using both simulated and measured ice core data sets.

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2106.14696 [pdf, ps, other]

Mixed-Spectrum Signals -- Discrete Approximations and Variance Expressions for Covariance Estimates

Authors: Filip Elvander, Johan Karlsson

Abstract: The estimation of the covariance function of a stochastic process, or signal, is of integral importance for a multitude of signal processing applications. In this work, we derive closed-form expressions for the variance of covariance estimates for mixed-spectrum signals, i.e., spectra containing both absolutely continuous and singular parts. The results cover both finite-sample and asymptotic regimes, allowing for assessing the exact speed of convergence of estimates to their expectations, as well as their limiting behavior. As is shown, such covariance estimates may converge even for non-ergodic processes. Furthermore, we consider approximating signals with arbitrary spectral densities by sequences of singular spectrum, i.e., sinusoidal, processes, and derive the limiting behavior of covariance estimates as both the sample size and the number of sinusoidal components tend to infinity. We show that the asymptotic regime variance can be described by a time-frequency resolution product, with dramatically different behavior depending on how the sinusoidal approximation is constructed. In a few numerical examples we illustrate the theory and the corresponding implications for direction of arrival estimation.

Submitted 4 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

arXiv:2003.10767 [pdf, ps, other]

doi 10.1109/TSP.2020.3035466

Defining Fundamental Frequency for Almost Harmonic Signals

Authors: Filip Elvander, Andreas Jakobsson

Abstract: In this work, we consider the modeling of signals that are almost, but not quite, harmonic, i.e., composed of sinusoids whose frequencies are close to being integer multiples of a common frequency. Typically, in applications, such signals are treated as perfectly harmonic, allowing for the estimation of their fundamental frequency, despite the signals not actually being periodic. Herein, we provide three different definitions of a concept of fundamental frequency for such inharmonic signals and study the implications of the different choices for modeling and estimation. We show that one of the definitions corresponds to a misspecified modeling scenario, and provides a theoretical benchmark for analyzing the behavior of estimators derived under a perfectly harmonic assumption. The second definition stems from optimal mass transport theory and yields a robust and easily interpretable concept of fundamental frequency based on the signals' spectral properties. The third definition interprets the inharmonic signal as an observation of a randomly perturbed harmonic signal. This allows for computing a hybrid information theoretical bound on estimation performance, as well as for finding an estimator attaining the bound. The theoretical findings are illustrated using numerical examples.

Submitted 30 October, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

Comments: Accepted for publication in IEEE Transactions on Signal Processing

Journal ref: IEEE Transactions on Signal Processing, 2020, volume 68, pages 6453 - 6466

arXiv:1910.07016 [pdf, other]

On Harmonic Approximations of Inharmonic Signals

Authors: Filip Elvander, Jie Ding, Andreas Jakobsson

Abstract: In this work, we present the misspecified Gaussian Cramér-Rao lower bound for the parameters of a harmonic signal, or pitch, when signal measurements are collected from an almost, but not quite, harmonic model. For the asymptotic case of large sample sizes, we present a closed-form expression for the bound corresponding to the pseudo-true fundamental frequency. Using simulation studies, it is shown that the bound is sharp and is attained by maximum likelihood estimators derived under the misspecified harmonic assumption. It is shown that misspecified harmonic models achieve a lower mean squared error than correctly specified unstructured models for moderately inharmonic signals. Examining voices from a speech database, we conclude that human speech belongs to this class of signals, verifying that the use of a harmonic model for voiced speech is preferable.

Submitted 28 October, 2019; v1 submitted 15 October, 2019; originally announced October 2019.

arXiv:1906.11113 [pdf, other]

Mismatched Estimation of Polynomially Damped Signals

Authors: Filip Elvander, Johan Swärd, Andreas Jakobsson

Abstract: In this work, we consider the problem of estimating the parameters of polynomially damped sinusoidal signals, commonly encountered in, for instance, spectroscopy. Generally, finding the parameter values of such signals constitutes a high-dimensional problem, often further complicated by not knowing the number of signal components or their specific signal structures. In order to alleviate the computational burden, we herein propose a mismatched estimation procedure using simplified, approximate signal models. Despite the approximation, we show that such a procedure is expected to yield predictable results, allowing for statistically and computationally efficient estimates of the signal parameters.

Submitted 26 June, 2019; originally announced June 2019.

arXiv:1905.03847 [pdf, other]

Multi-Marginal Optimal Mass Transport with Partial Information

Authors: Filip Elvander, Isabel Haasler, Andreas Jakobsson, Johan Karlsson

Abstract: During recent decades, there has been a substantial development in optimal mass transport theory and methods. In this work, we consider multi-marginal problems wherein only partial information of each marginal is available, which is a setup common in many inverse problems in, e.g., imaging and spectral estimation. By considering an entropy regularized approximation of the original transport problem, we propose an algorithm corresponding to a block-coordinate ascent of the dual problem, where Newton's algorithm is used to solve the sub-problems. In order to make this computationally tractable for large-scale settings, we utilize the tensor structure that arises in practical problems, allowing for computing projections of the multi-marginal transport plan using only matrix-vector operations of relatively small matrices. As illustrating examples, we apply the resulting method to tracking and barycenter problems in spatial spectral estimation. In particular, we show that the optimal mass transport framework allows for fusing information from different time steps, as well as from different sensor arrays, also when the sensor arrays are not jointly calibrated. Furthermore, we show that by incorporating knowledge of underlying dynamics in tracking scenarios, one may arrive at accurate spectral estimates, as well as faithful reconstructions of spectra corresponding to unobserved time points.

Submitted 9 May, 2019; originally announced May 2019.

arXiv:1810.10788 [pdf, other]

Non-Coherent Sensor Fusion via Entropy Regularized Optimal Mass Transport

Authors: Filip Elvander, Isabel Haasler, Andreas Jakobsson, Johan Karlsson

Abstract: This work presents a method for information fusion in source localization applications. The method utilizes the concept of optimal mass transport in order to construct estimates of the spatial spectrum using a convex barycenter formulation. We introduce an entropy regularization term to the convex objective, which allows for low-complexity iterations of the solution algorithm and thus makes the proposed method applicable also to higher-dimensional problems. We illustrate the proposed method's inherent robustness to misalignment and miscalibration of the sensor arrays using numerical examples of localization in two dimensions.

Submitted 19 November, 2018; v1 submitted 25 October, 2018; originally announced October 2018.
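
The abstract above mentions that entropy regularization allows for low-complexity iterations. As a rough illustration of why, here is a minimal sketch of the standard Sinkhorn iteration for a two-marginal entropy-regularized transport problem. This is a generic textbook building block, not the barycenter algorithm of the paper; the function name `sinkhorn`, the grid, the squared-distance cost, and the values of `eps` and `n_iter` are all assumptions.

```python
import numpy as np


def sinkhorn(mu, nu, cost, eps=0.05, n_iter=2000):
    """Entropy-regularized optimal transport plan between mu and nu."""
    kernel = np.exp(-cost / eps)      # Gibbs kernel of the cost matrix
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (kernel.T @ u)       # rescale to match the column marginal
        u = mu / (kernel @ v)         # rescale to match the row marginal
    return u[:, None] * kernel * v[None, :]


# Hypothetical example: two bump-shaped distributions on a 1-D grid.
grid = np.linspace(0.0, 1.0, 50)
cost = (grid[:, None] - grid[None, :]) ** 2
mu = np.exp(-((grid - 0.3) ** 2) / 0.02)
mu /= mu.sum()
nu = np.exp(-((grid - 0.7) ** 2) / 0.02)
nu /= nu.sum()

plan = sinkhorn(mu, nu, cost)
```

Each iteration needs only two matrix-vector products and elementwise divisions, which is the kind of low per-iteration cost that entropy regularization makes possible.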

arXiv:1711.03890 [pdf, other]

doi 10.1109/TSP.2018.2866432

Interpolation and Extrapolation of Toeplitz Matrices via Optimal Mass Transport

Authors: Filip Elvander, Andreas Jakobsson, Johan Karlsson

Abstract: In this work, we propose a novel method for quantifying distances between Toeplitz structured covariance matrices. By exploiting the spectral representation of Toeplitz matrices, the proposed distance measure is defined based on an optimal mass transport problem in the spectral domain. This may then be interpreted in the covariance domain, suggesting a natural way of interpolating and extrapolating Toeplitz matrices, such that the positive semi-definiteness and the Toeplitz structure of these matrices are preserved. The proposed distance measure is also shown to be contractive with respect to both additive and multiplicative noise, and thereby allows for a quantification of the decreased distance between signals when these are corrupted by noise. Finally, we illustrate how this approach can be used for several applications in signal processing. In particular, we consider interpolation and extrapolation of Toeplitz matrices, as well as clustering problems and tracking of slowly varying stochastic processes.

Submitted 29 May, 2019; v1 submitted 10 November, 2017; originally announced November 2017.

Journal ref: IEEE Transactions on Signal Processing, vol. 66, no. 20, (2018), pp. 5285 - 5298

arXiv:1707.03247 [pdf, ps, other]

Designing Sampling Schemes for Multi-Dimensional Data

Authors: Johan Swärd, Filip Elvander, Andreas Jakobsson

Abstract: In this work, we propose a method for determining a non-uniform sampling scheme for multi-dimensional signals by solving a convex optimization problem reminiscent of the sensor selection problem. The resulting sampling scheme minimizes the sum of the Cramér-Rao lower bound for the parameters of interest, given a desired number of sampling points. The proposed framework allows for selecting an arbitrary subset of the parameters detailing the model, as well as weighing the importance of the different parameters. Also presented is a scheme for incorporating any imprecise a priori knowledge of the locations of the parameters, as well as defining estimation performance bounds for the parameters of interest. Numerical examples illustrate the efficiency of the proposed scheme.

Submitted 11 July, 2017; originally announced July 2017.

Showing 1–17 of 17 results for author: Elvander, F