-
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders
Authors:
Manuel Pariente,
Antoine Deleforge,
Emmanuel Vincent
Abstract:
Recent studies have explored the use of deep generative models of speech spectra based of variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to i…
▽ More
Recent studies have explored the use of deep generative models of speech spectra based of variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps in which the en-coder of the pre-learned VAE can be used to estimate the varia-tional approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the afore-mentioned iterative methods using sampling, while decreasing the computational cost by a factor 36 to reach a given performance .
△ Less
Submitted 14 May, 2019; v1 submitted 3 May, 2019;
originally announced May 2019.
-
Blind Source Separation Using Mixtures of Alpha-Stable Distributions
Authors:
Nicolas Keriven,
Antoine Deleforge,
Antoine Liutkus
Abstract:
We propose a new blind source separation algorithm based on mixtures of alpha-stable distributions. Complex symmetric alpha-stable distributions have been recently showed to better model audio signals in the time-frequency domain than classical Gaussian distributions thanks to their larger dynamic range. However, inference of these models is notoriously hard to perform because their probability de…
▽ More
We propose a new blind source separation algorithm based on mixtures of alpha-stable distributions. Complex symmetric alpha-stable distributions have been recently showed to better model audio signals in the time-frequency domain than classical Gaussian distributions thanks to their larger dynamic range. However, inference of these models is notoriously hard to perform because their probability density functions do not have a closed-form expression in general. Here, we introduce a novel method for estimating mixture of alpha-stable distributions based on characteristic function matching. We apply this to the blind estimation of binary masks in individual frequency bands from multichannel convolutive audio mixes. We show that the proposed method yields better separation performance than Gaussian-based binary-masking methods.
△ Less
Submitted 12 February, 2018; v1 submitted 13 November, 2017;
originally announced November 2017.
-
Phase Unmixing : Multichannel Source Separation with Magnitude Constraints
Authors:
Antoine Deleforge,
Yann Traonmilin
Abstract:
We consider the problem of estimating the phases of K mixed complex signals from a multichannel observation, when the mixing matrix and signal magnitudes are known. This problem can be cast as a non-convex quadratically constrained quadratic program which is known to be NP-hard in general. We propose three approaches to tackle it: a heuristic method, an alternate minimization method, and a convex…
▽ More
We consider the problem of estimating the phases of K mixed complex signals from a multichannel observation, when the mixing matrix and signal magnitudes are known. This problem can be cast as a non-convex quadratically constrained quadratic program which is known to be NP-hard in general. We propose three approaches to tackle it: a heuristic method, an alternate minimization method, and a convex relaxation into a semi-definite program. The last two approaches are showed to outperform the oracle multichannel Wiener filter in under-determined informed source separation tasks, using simulated and speech signals. The convex relaxation approach yields best results, including the potential for exact source separation in under-determined settings.
△ Less
Submitted 20 March, 2017; v1 submitted 30 September, 2016;
originally announced September 2016.
-
Rectified binaural ratio: A complex T-distributed feature for robust sound localization
Authors:
Antoine Deleforge,
Florence Forbes
Abstract:
Most existing methods in binaural sound source localization rely on some kind of aggregation of phase-and level-difference cues in the time-frequency plane. While different ag-gregation schemes exist, they are often heuristic and suffer in adverse noise conditions. In this paper, we introduce the rectified binaural ratio as a new feature for sound source local-ization. We show that for Gaussian-pr…
▽ More
Most existing methods in binaural sound source localization rely on some kind of aggregation of phase-and level-difference cues in the time-frequency plane. While different ag-gregation schemes exist, they are often heuristic and suffer in adverse noise conditions. In this paper, we introduce the rectified binaural ratio as a new feature for sound source local-ization. We show that for Gaussian-process point source signals corrupted by stationary Gaussian noise, this ratio follows a complex t-distribution with explicit parameters. This new formulation provides a principled and statistically sound way to aggregate binaural features in the presence of noise. We subsequently derive two simple and efficient methods for robust relative transfer function and time-delay estimation. Experiments on heavily corrupted simulated and speech signals demonstrate the robustness of the proposed scheme.
△ Less
Submitted 30 September, 2016;
originally announced September 2016.
-
Hyper-Spectral Image Analysis with Partially-Latent Regression and Spatial Markov Dependencies
Authors:
Antoine Deleforge,
Florence Forbes,
Sileye Ba,
Radu Horaud
Abstract:
Hyper-spectral data can be analyzed to recover physical properties at large planetary scales. This involves resolving inverse problems which can be addressed within machine learning, with the advantage that, once a relationship between physical parameters and spectra has been established in a data-driven fashion, the learned relationship can be used to estimate physical parameters for new hyper-sp…
▽ More
Hyper-spectral data can be analyzed to recover physical properties at large planetary scales. This involves resolving inverse problems which can be addressed within machine learning, with the advantage that, once a relationship between physical parameters and spectra has been established in a data-driven fashion, the learned relationship can be used to estimate physical parameters for new hyper-spectral observations. Within this framework, we propose a spatially-constrained and partially-latent regression method which maps high-dimensional inputs (hyper-spectral images) onto low-dimensional responses (physical parameters such as the local chemical composition of the soil). The proposed regression model comprises two key features. Firstly, it combines a Gaussian mixture of locally-linear map**s (GLLiM) with a partially-latent response model. While the former makes high-dimensional regression tractable, the latter enables to deal with physical parameters that cannot be observed or, more generally, with data contaminated by experimental artifacts that cannot be explained with noise models. Secondly, spatial constraints are introduced in the model through a Markov random field (MRF) prior which provides a spatial structure to the Gaussian-mixture hidden variables. Experiments conducted on a database composed of remotely sensed observations collected from the Mars planet by the Mars Express orbiter demonstrate the effectiveness of the proposed model.
△ Less
Submitted 27 March, 2015; v1 submitted 30 September, 2014;
originally announced September 2014.
-
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression
Authors:
Antoine Deleforge,
Radu Horaud,
Yoav Schechner,
Laurent Girin
Abstract:
This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation, nor on monaural segregation. The method starts with a training stage that establishes a lo…
▽ More
This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation, nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus enabling to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allow quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.
△ Less
Submitted 15 April, 2016; v1 submitted 12 August, 2014;
originally announced August 2014.
-
High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables
Authors:
Antoine Deleforge,
Florence Forbes,
Radu Horaud
Abstract:
In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose an inverse regression method which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. We introduce a mixture of locally-linear probabilistic map** model that…
▽ More
In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose an inverse regression method which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. We introduce a mixture of locally-linear probabilistic map** model that starts with estimating the parameters of inverse regression, and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, we introduce a partially-latent paradigm, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. We devise expectation-maximization (EM) procedures based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. We propose two augmentation schemes and we describe in detail the associated EM inference procedures that may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. We provide experimental evidence that our method outperforms several existing regression techniques.
△ Less
Submitted 20 December, 2013; v1 submitted 10 August, 2013;
originally announced August 2013.