On Improving Error Resilience of Neural End-to-End Speech Coders

Authors: Kishan Gupta, Nicola Pia, Srikanth Korse, Andreas Brendel, Guillaume Fuchs, Markus Multrus

Abstract: Error resilient tools like Packet Loss Concealment (PLC) and Forward Error Correction (FEC) are essential to maintain a reliable speech communication for applications like Voice over Internet Protocol (VoIP), where packets are frequently delayed and lost. In recent times, end-to-end neural speech codecs have seen a significant rise, due to their ability to transmit speech signal at low bitrates but few considerations were made about their error resilience in a real system. Recently introduced Neural End-to-End Speech Codec (NESC) can reproduce high quality natural speech at low bitrates. We extend its robustness to packet losses by adding a low complexity network to predict the codebook indices in latent space. Furthermore, we propose a method to add an in-band FEC at an additional bitrate of 0.8 kbps. Both subjective and objective assessment indicate the effectiveness of proposed methods, and demonstrate that coupling PLC and FEC provide significant robustness against packet losses.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.08417 [pdf, other]

Simple and Efficient Quantization Techniques for Neural Speech Coding

Authors: Andreas Brendel, Nicola Pia, Kishan Gupta, Guillaume Fuchs, Markus Multrus

Abstract: Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder has to be learned that allows for efficient transmission of the input audio signal. This discrete representation is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ) and a lot of effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose simple alternatives to VQ, which are based on projected Scalar Quantization (SQ). These quantization techniques do not need any additional losses, scheduling parameters or codebook storage thereby simplifying the training of neural audio codecs. Furthermore, we propose a new causal network architecture for neural speech coding that shows good performance at very low computational complexity.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2306.02450 [pdf, other]

End-To-End Deep Learning-based Adaptation Control for Linear Acoustic Echo Cancellation

Authors: Thomas Haubner, Andreas Brendel, Walter Kellermann

Abstract: The attenuation of acoustic loudspeaker echoes remains to be one of the open challenges to achieve pleasant full-duplex hands free speech communication. In many modern signal enhancement interfaces, this problem is addressed by a linear acoustic echo canceler which subtracts a loudspeaker echo estimate from the recorded microphone signal. To obtain precise echo estimates, the parameters of the echo canceler, i.e., the filter coefficients, need to be estimated quickly and precisely from the observed loudspeaker and microphone signals. For this a sophisticated adaptation control is required to deal with high-power double-talk and rapidly track time-varying acoustic environments which are often faced with portable devices. In this paper, we address this problem by end-to-end deep learning. In particular, we suggest to infer the step-size for a least mean squares frequency-domain adaptive filter update by a Deep Neural Network (DNN). Two different step-size inference approaches are investigated. On the one hand broadband approaches, which use a single DNN to jointly infer step-sizes for all frequency bands, and on the other hand narrowband methods, which exploit individual DNNs per frequency band. The discussion of benefits and disadvantages of both approaches leads to a novel hybrid approach which shows improved echo cancellation while requiring only small DNN architectures. Furthermore, we investigate the effect of different loss functions, signal feature vectors, and DNN output layer architectures on the echo cancellation performance from which we obtain valuable insights into the general design and functionality of DNN-based adaptation control algorithms.

Submitted 4 June, 2023; originally announced June 2023.

Comments: This article has been submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

arXiv:2207.13934 [pdf, ps, other]

doi 10.1109/TSP.2023.3255552

A Unifying View on Blind Source Separation of Convolutive Mixtures based on Independent Component Analysis

Authors: Andreas Brendel, Thomas Haubner, Walter Kellermann

Abstract: In many daily-life scenarios, acoustic sources recorded in an enclosure can only be observed with other interfering sources. Hence, convolutive Blind Source Separation (BSS) is a central problem in audio signal processing. Methods based on Independent Component Analysis (ICA) are especially important in this field as they require only few and weak assumptions and allow for blindness regarding the original source signals and the acoustic propagation path. Most of the currently used algorithms belong to one of the following three families: Frequency Domain ICA (FD-ICA), Independent Vector Analysis (IVA), and TRIple-N Independent component analysis for CONvolutive mixtures (TRINICON). While the relation between ICA, FD-ICA and IVA becomes apparent due to their construction, the relation to TRINICON is not well established yet. This paper fills this gap by providing an in-depth treatment of the common building blocks of these algorithms and their differences, and thus provides a common framework for all considered algorithms.

Submitted 28 July, 2022; originally announced July 2022.

arXiv:2201.09946 [pdf, other]

Microphone Utility Estimation in Acoustic Sensor Networks using Single-Channel Signal Features

Authors: Michael Günther, Andreas Brendel, Walter Kellermann

Abstract: In multichannel signal processing with distributed sensors, choosing the optimal subset of observed sensor signals to be exploited is crucial in order to maximize algorithmic performance and reduce computational load, ideally both at the same time. In the acoustic domain, signal cross-correlation is a natural choice to quantify the usefulness of microphone signals, i.e., microphone utility, for array processing, but its estimation requires that the uncoded signals are synchronized and transmitted between nodes. In resource-constrained environments like acoustic sensor networks, low data transmission rates often make transmission of all observed signals to the centralized location infeasible, thus discouraging direct estimation of signal cross-correlation. Instead, we employ characteristic features of the recorded signals to estimate the usefulness of individual microphone signals. In this contribution, we provide a comprehensive analysis of model-based microphone utility estimation approaches that use signal features and, as an alternative, also propose machine learning-based estimation methods that identify optimal sensor signal utility features. The performance of both approaches is validated experimentally using both simulated and recorded acoustic data, comprising a variety of realistic and practically relevant acoustic scenarios including moving and static sources.

Submitted 14 January, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

Comments: submitted to EURASIP Journal on Audio, Speech, and Music Processing

arXiv:2110.02189 [pdf, ps, other]

Manifold learning-supported estimation of relative transfer functions for spatial filtering

Authors: Andreas Brendel, Johannes Zeitler, Walter Kellermann

Abstract: Many spatial filtering algorithms used for voice capture in, e.g., teleconferencing applications, can benefit from or even rely on knowledge of Relative Transfer Functions (RTFs). Accordingly, many RTF estimators have been proposed which, however, suffer from performance degradation under acoustically adverse conditions or need prior knowledge on the properties of the interfering sources. While state-of-the-art RTF estimators ignore prior knowledge about the acoustic enclosure, audio signal processing algorithms for teleconferencing equipment are often operating in the same or at least a similar acoustic enclosure, e.g., a car or an office, such that training data can be collected. In this contribution, we use such data to train Variational Autoencoders (VAEs) in an unsupervised manner and apply the trained VAEs to enhance imprecise RTF estimates. Furthermore, a hybrid between classic RTF estimation and the trained VAE is investigated. Comprehensive experiments with real-world data confirm the efficacy for the proposed method.

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2107.06253 [pdf, other]

Bottom-up Synthesis of Recursive Functional Programs using Angelic Execution

Authors: Anders Miltner, Adrian Trejo Nuñez, Ana Brendel, Swarat Chaudhuri, Isil Dillig

Abstract: We present a novel bottom-up method for the synthesis of functional recursive programs. While bottom-up synthesis techniques can work better than top-down methods in certain settings, there is no prior technique for synthesizing recursive programs from logical specifications in a purely bottom-up fashion. The main challenge is that effective bottom-up methods need to execute sub-expressions of the code being synthesized, but it is impossible to execute a recursive subexpression of a program that has not been fully constructed yet. In this paper, we address this challenge using the concept of angelic semantics. Specifically, our method finds a program that satisfies the specification under angelic semantics (we refer to this as angelic synthesis), analyzes the assumptions made during its angelic execution, uses this analysis to strengthen the specification, and finally reattempts synthesis with the strengthened specification. Our proposed angelic synthesis algorithm is based on version space learning and therefore deals effectively with many incremental synthesis calls made during the overall algorithm. We have implemented this approach in a prototype called Burst and evaluate it on synthesis problems from prior work. Our experiments show that Burst is able to synthesize a solution to 94% of the benchmarks in our benchmark suite, outperforming prior work.

Submitted 8 December, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

arXiv:2106.01262 [pdf, ps, other]

End-To-End Deep Learning-Based Adaptation Control for Frequency-Domain Adaptive System Identification

Authors: Thomas Haubner, Andreas Brendel, Walter Kellermann

Abstract: We present a novel end-to-end deep learning-based adaptation control algorithm for frequency-domain adaptive system identification. The proposed method exploits a deep neural network to map observed signal features to corresponding step-sizes which control the filter adaptation. The parameters of the network are optimized in an end-to-end fashion by minimizing the average normalized system distance of the adaptive filter. This avoids the need of explicit signal power spectral density estimation as required for model-based adaptation control and further auxiliary mechanisms to deal with model inaccuracies. The proposed algorithm achieves fast convergence and robust steady-state performance for scenarios characterized by high-level, non-white and non-stationary additive noise signals, abrupt environment changes and additional model inaccuracies.

Submitted 4 March, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted for IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022, Singapore, Singapore

arXiv:2105.03337 [pdf, other]

doi 10.13140/RG.2.2.23838.46408

Online Acoustic System Identification Exploiting Kalman Filtering and an Adaptive Impulse Response Subspace Model

Authors: Thomas Haubner, Andreas Brendel, Walter Kellermann

Abstract: We introduce a novel algorithm for online estimation of acoustic impulse responses (AIRs) which allows for fast convergence by exploiting prior knowledge about the fundamental structure of AIRs. The proposed method assumes that the variability of AIRs of an acoustic scene is confined to a low-dimensional manifold which is embedded in a high-dimensional space of possible AIR estimates. We discuss various approaches to locally approximate the AIR manifold by affine subspaces which are assumed to be tangential hyperplanes to the manifold. The validity of these model assumptions is verified for simulated data. Subsequently, we describe how the learned models can be used to improve online AIR estimates by projecting them onto an adaptively estimated subspace. The parameters determining the subspace are learned from training samples in a local neighbourhood to the current AIR estimate. This allows the system identification algorithm to benefit from preceding estimates in the acoustic scene. To assess the proximity of training data AIRs to the current AIR estimate, we introduce a probabilistic extension of the Euclidean distance which improves the performance for applications with correlated excitation signals. Furthermore, we describe how model imperfections can be tackled by a soft projection of the AIR estimates. The proposed algorithm exhibits significantly faster convergence properties in comparison to a high-performance state-of-the-art algorithm. Furthermore, we show an improved steady-state performance for speech-excited system identification scenarios suffering from high-level interfering noise and nonunique solutions.

Submitted 7 May, 2021; originally announced May 2021.

arXiv:2012.08867 [pdf, ps, other]

doi 10.23919/EUSIPCO54536.2021.9616295

A Synergistic Kalman- and Deep Postfiltering Approach to Acoustic Echo Cancellation

Authors: Thomas Haubner, Mhd. Modar Halimeh, Andreas Brendel, Walter Kellermann

Abstract: We introduce a synergistic approach to double-talk robust acoustic echo cancellation combining adaptive Kalman filtering with a deep neural network-based postfilter. The proposed algorithm overcomes the well-known limitations of Kalman filter-based adaptation control in scenarios characterized by abrupt echo path changes. As the key innovation, we suggest to exploit the different statistical properties of the interfering signal components for robustly estimating the adaptation step size. This is achieved by leveraging the postfilter near-end estimate and the estimation error of the Kalman filter. The proposed synergistic scheme allows for rapid reconvergence of the adaptive filter after abrupt echo path changes without compromising the steady state performance achieved by state-of-the-art approaches in static scenarios.

Submitted 4 March, 2022; v1 submitted 16 December, 2020; originally announced December 2020.

Comments: Accepted for European Signal Processing Conference (EUSIPCO), Dublin, Ireland, August 2021

arXiv:2011.03432 [pdf, ps, other]

Misalignment Recognition in Acoustic Sensor Networks using a Semi-supervised Source Estimation Method and Markov Random Fields

Authors: Gabriel F Miller, Andreas Brendel, Walter Kellermann, Sharon Gannot

Abstract: In this paper, we consider the problem of acoustic source localization by acoustic sensor networks (ASNs) using a promising, learning-based technique that adapts to the acoustic environment. In particular, we look at the scenario when a node in the ASN is displaced from its position during training. As the mismatch between the ASN used for learning the localization model and the one after a node displacement leads to erroneous position estimates, a displacement has to be detected and the displaced nodes need to be identified. We propose a method that considers the disparity in position estimates made by leave-one-node-out (LONO) sub-networks and uses a Markov random field (MRF) framework to infer the probability of each LONO position estimate being aligned, misaligned or unreliable while accounting for the noise inherent to the estimator. This probabilistic approach is advantageous over naive detection methods, as it outputs a normalized value that encapsulates conditional information provided by each LONO sub-network on whether the reading is in misalignment with the overall network. Experimental results confirm that the performance of the proposed method is consistent in identifying compromised nodes in various acoustic conditions.

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2009.09402 [pdf, ps, other]

Accelerating Auxiliary Function-based Independent Vector Analysis

Authors: Andreas Brendel, Walter Kellermann

Abstract: Independent Vector Analysis (IVA) is an effective approach for Blind Source Separation (BSS) of convolutive mixtures of audio signals. As a practical realization of an IVA-based BSS algorithm, the so-called AuxIVA update rules based on the Majorize-Minimize (MM) principle have been proposed which allow for fast and computationally efficient optimization of the IVA cost function. For many real-time applications, however, update rules for IVA exhibiting even faster convergence are highly desirable. To this end, we investigate techniques which accelerate the convergence of the AuxIVA update rules without extra computational cost. The efficacy of the proposed methods is verified in experiments representing real-world acoustic scenarios.

Submitted 20 September, 2020; originally announced September 2020.

arXiv:2007.01579 [pdf, ps, other]

Noise-Robust Adaptation Control for Supervised Acoustic System Identification Exploiting A Noise Dictionary

Authors: Thomas Haubner, Andreas Brendel, Mohamed Elminshawi, Walter Kellermann

Abstract: We present a noise-robust adaptation control strategy for block-online supervised acoustic system identification by exploiting a noise dictionary. The proposed algorithm takes advantage of the pronounced spectral structure which characterizes many types of interfering noise signals. We model the noisy observations by a linear Gaussian Discrete Fourier Transform-domain state space model whose parameters are estimated by an online generalized Expectation-Maximization algorithm. Unlike all other state-of-the-art approaches we suggest to model the covariance matrix of the observation probability density function by a dictionary model. We propose to learn the noise dictionary from training data, which can be gathered either offline or online whenever the system is not excited, while we infer the activations continuously. The proposed algorithm represents a novel machine-learning based approach to noise-robust adaptation control which allows for faster convergence in applications characterized by high-level and non-stationary interfering noise signals and abrupt system changes.

Submitted 3 February, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

arXiv:2007.01543 [pdf, other]

Online Supervised Acoustic System Identification exploiting Prelearned Local Affine Subspace Models

Authors: Thomas Haubner, Andreas Brendel, Walter Kellermann

Abstract: In this paper we present a novel algorithm for improved block-online supervised acoustic system identification in adverse noise scenarios by exploiting prior knowledge about the space of Room Impulse Responses (RIRs). The method is based on the assumption that the variability of the unknown RIRs is controlled by only few physical parameters, describing, e.g., source position movements, and thus is confined to a low-dimensional manifold which is modelled by a union of affine subspaces. The offsets and bases of the affine subspaces are learned in advance from training data by unsupervised clustering followed by Principal Component Analysis. We suggest to denoise the parameter update of any supervised adaptive filter by projecting it onto an optimal affine subspace which is selected based on a novel computationally efficient approximation of the associated evidence. The proposed method significantly improves the system identification performance of state-of-the-art algorithms in adverse noise scenarios.

Submitted 3 July, 2020; originally announced July 2020.

arXiv:2006.13769 [pdf, other]

Deep Neural Network based Distance Estimation for Geometry Calibration in Acoustic Sensor Networks

Authors: Tobias Gburrek, Joerg Schmalenstroeer, Andreas Brendel, Walter Kellermann, Reinhold Haeb-Umbach

Abstract: We present an approach to deep neural network based (DNN-based) distance estimation in reverberant rooms for supporting geometry calibration tasks in wireless acoustic sensor networks. Signal diffuseness information from acoustic signals is aggregated via the coherent-to-diffuse power ratio to obtain a distance-related feature, which is mapped to a source-to-microphone distance estimate by means of a DNN. This information is then combined with direction-of-arrival estimates from compact microphone arrays to infer the geometry of the sensor network. Unlike many other approaches to geometry calibration, the proposed scheme does only require that the sampling clocks of the sensor nodes are roughly synchronized. In simulations we show that the proposed DNN-based distance estimator generalizes to unseen acoustic environments and that precise estimates of the sensor node positions are obtained.

Submitted 24 June, 2020; originally announced June 2020.

Comments: Accepted for EUSIPCO 2020

arXiv:2003.09531 [pdf, ps, other]

Faster IVA: Update Rules for Independent Vector Analysis based on Negentropy and the Majorize-Minimize Principle

Authors: Andreas Brendel, Walter Kellermann

Abstract: Algorithms for Blind Source Separation (BSS) of acoustic signals require efficient and fast converging optimization strategies to adapt to nonstationary signal statistics and time-varying acoustic scenarios. In this paper, we derive fast converging update rules from a negentropy perspective, which are based on the Majorize-Minimize (MM) principle and eigenvalue decomposition. The presented update rules are shown to outperform competing state-of-the-art methods in terms of convergence speed at a comparable runtime due to the restriction to unitary demixing matrices. This is demonstrated by experiments with recorded real-world data.

Submitted 22 July, 2021; v1 submitted 20 March, 2020; originally announced March 2020.

arXiv:2001.05958 [pdf, ps, other]

doi 10.1109/TSP.2020.3000199

A Unified Bayesian View on Spatially Informed Source Separation and Extraction based on Independent Vector Analysis

Authors: Andreas Brendel, Thomas Haubner, Walter Kellermann

Abstract: Signal separation and extraction are important tasks for devices recording audio signals in real environments which, aside from the desired sources, often contain several interfering sources such as background noise or concurrent speakers. Blind Source Separation (BSS) provides a powerful approach to address such problems. However, BSS algorithms typically treat all sources equally and do not resolve uncertainty regarding the ordering of the separated signals at the output of the algorithm, i.e., the outer permutation problem. This paper addresses this problem by incorporating prior knowledge into the adaptation of the demixing filters, e.g., the position of the sources, in a Bayesian framework. We focus here on methods based on Independent Vector Analysis (IVA) as it elegantly and successfully deals with the internal permutation problem. By including a background model, i.e., a model for sources we are not interested to separate, we enable the algorithm to extract the sources of interest in overdetermined and underdetermined scenarios at a low computational complexity. The proposed framework allows to incorporate prior knowledge about the demixing filters in a generic way and unifies several known and newly proposed algorithms using a Bayesian view. For all algorithmic variants, we provide efficient update rules based on the iterative projection principle. The performance of a large variety of representative algorithmic variants, including very recent algorithms, is compared using measured room impulse responses.

Submitted 16 January, 2020; originally announced January 2020.

arXiv:1907.09972 [pdf, ps, other]

Spatially Informed Independent Vector Analysis

Authors: Andreas Brendel, Thomas Haubner, Walter Kellermann

Abstract: We present a Maximum A Posteriori (MAP) derivation of the Independent Vector Analysis (IVA) algorithm, a blind source separation algorithm, by incorporating a prior over the demixing matrices, relying on a free-field model. In this way, the outer permutation ambiguity of IVA is avoided. The resulting MAP optimization problem is solved by deriving majorize-minimize update rules to achieve convergence speed comparable to the well-known auxiliary function IVA algorithm. The performance of the proposed algorithm is investigated and compared to a benchmark algorithm using real measurements.

Submitted 16 January, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

arXiv:1511.04063 [pdf, ps, other]

Single-Channel Maximum-Likelihood T60 Estimation Exploiting Subband Information

Authors: Heinrich Loellmann, Andreas Brendel, Peter Vary, Walter Kellermann

Abstract: This contribution presents four algorithms developed by the authors for single-channel fullband and subband T60 estimation within the ACE challenge. The blind estimation of the fullband reverberation time (RT) by maximum-likelihood (ML) estimation based on [15] is considered as baseline approach. An improvement of this algorithm is devised where an energy-weighted averaging of the upper subband RT estimates is performed using either a DCT or 1/3-octave filter-bank. The evaluation results show that this approach leads to a lower variance for the estimation error in comparison to the baseline approach at the price of an increased computational complexity. Moreover, a new algorithm to estimate the subband RT is devised, where the RT estimates for the lower octave subbands are extrapolated from the RT estimates of the upper subbands by means of a simple model for the frequency-dependency of the subband RT. The evaluation results of the ACE challenge reveal that this approach allows to estimate the subband RT with an estimation error which is in a similar range as for the presented fullband RT estimators.

Submitted 12 November, 2015; originally announced November 2015.

Comments: In Proceedings of the ACE Challenge Workshop - a satellite event of IEEE-WASPAA 2015 (arXiv:1510.00383)

Report number: ACEChallenge/2015/05

Showing 1–19 of 19 results for author: Brendel, A