
Informed FastICA: Semi-Blind Minimum Variance Distortionless Beamformer

Authors: Zbyněk Koldovský, Jiří Málek, Jaroslav Čmejla, Stephen O'Regan

Abstract: Non-Gaussianity-based Independent Vector Extraction leads to the famous one-unit FastICA/FastIVA algorithm when the likelihood function is optimized using an approximate Newton-Raphson algorithm under the orthogonality constraint. In this paper, we replace the constraint with the analytic form of the minimum variance distortionless beamformer (MVDR), by which a semi-blind variant of FastICA/FastIVA is obtained. The side information here is provided by a weighted covariance matrix replacing the noise covariance matrix, the estimation of which is a frequent goal of neural beamformers. The algorithm thus provides an intuitive connection between model-based blind extraction and learning-based extraction. The algorithm is tested in simulations and speaker ID-guided speaker extraction, showing fast convergence and promising performance.
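For readers unfamiliar with the analytic MVDR form mentioned in the abstract, the textbook expression is w = C^{-1} a / (a^H C^{-1} a), where C is a Hermitian covariance matrix and a is the steering (mixing) vector. The sketch below illustrates only that generic formula, not the paper's algorithm; the function names `solve` and `mvdr_weights` and the 2-microphone values are hypothetical, chosen for illustration.

```python
def solve(C, b):
    """Solve C x = b by Gaussian elimination with partial pivoting (works for complex entries)."""
    n = len(b)
    # Build an augmented copy so the inputs are not modified.
    M = [list(C[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def mvdr_weights(C, a):
    """Textbook MVDR weights: w = C^{-1} a / (a^H C^{-1} a)."""
    Cinv_a = solve(C, a)
    denom = sum(ai.conjugate() * v for ai, v in zip(a, Cinv_a))
    return [v / denom for v in Cinv_a]


# Hypothetical 2-microphone example (values are assumptions, not from the paper).
C = [[2.0 + 0j, 0.5 + 0.1j],
     [0.5 - 0.1j, 1.0 + 0j]]      # Hermitian, positive definite
a = [1.0 + 0j, 0.0 + 1.0j]        # assumed steering vector

w = mvdr_weights(C, a)
# Distortionless constraint: w^H a should equal 1.
response = sum(wi.conjugate() * ai for wi, ai in zip(w, a))
```

In the semi-blind setting the abstract describes, C would be the weighted covariance matrix that replaces the noise covariance; how that matrix is weighted is not specified here and is left out of the sketch.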

Submitted 12 July, 2024; originally announced July 2024.

Comments: accepted for IWAENC 2024

arXiv:2304.01778

Independent Vector Extraction Constrained on Manifold of Half-Length Filters

Authors: Zbyněk Koldovský, Jaroslav Čmejla, Tülay Adalı, Stephen O'Regan

Abstract: Independent Vector Analysis (IVA) is a popular extension of Independent Component Analysis (ICA) for joint separation of a set of instantaneous linear mixtures, with a direct application in frequency-domain speaker separation or extraction. The mixtures are parameterized by mixing matrices, one matrix per mixture. This means that the IVA mixing model does not account for any relationships between parameters across the mixtures/frequencies. The separation proceeds jointly only through the source model, where statistical dependencies of sources across the mixtures are taken into account. In this paper, we propose a mixing model for joint blind source extraction where the mixing model parameters are linked across the frequencies. This is achieved by constraining the set of feasible parameters to the manifold of half-length separating filters, which has a clear interpretation and application in frequency-domain speaker extraction.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2212.01178

Dynamic Independent Component Extraction with Blending Mixing Vector: Lower Bound on Mean Interference-to-Signal Ratio

Authors: Jaroslav Čmejla, Zbyněk Koldovský, Václav Kautský, Tülay Adalı

Abstract: This paper deals with dynamic Blind Source Extraction (BSE), in which the mixing parameters characterizing the position of a source of interest (SOI) are allowed to vary over time. We present a new source extraction model called CvxCSV, which is a parameter-reduced modification of the recent Constant Separation Vector (CSV) mixing model. In CvxCSV, the mixing vector evolves as a convex combination of its initial and final values. We derive a lower bound on the achievable mean interference-to-signal ratio (ISR) based on the Cramér-Rao theory. The bound reveals advantageous properties of CvxCSV compared with CSV and compared with a sequential BSE based on independent component extraction (ICE). In particular, the ISR achievable by CvxCSV is lower than that of the previous approaches. Moreover, the model requires significantly weaker conditions for identifiability, even when the SOI is Gaussian.
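The convex-combination idea in the abstract can be illustrated with a small sketch: the mixing vector at each time step is (1 - lam) * a_init + lam * a_final with lam in [0, 1]. The function name `blend`, the linear schedule for lam, and the numeric values are assumptions made for illustration; the abstract does not state how the blending coefficient evolves.

```python
def blend(a_init, a_final, lam):
    """Convex combination of the initial and final mixing vectors (0 <= lam <= 1)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    return [(1.0 - lam) * x + lam * y for x, y in zip(a_init, a_final)]


# Hypothetical two-sensor mixing vectors (illustrative values only).
a_init = [1.0, 0.5]
a_final = [0.0, 1.5]

# Assumed linear schedule of the blending coefficient over T time blocks.
T = 5
trajectory = [blend(a_init, a_final, t / (T - 1)) for t in range(T)]
```

Under this parameterization the whole trajectory is described by two vectors plus the blending coefficients, which is one way to read the "parameter-reduced" description in the abstract.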

Submitted 2 December, 2022; originally announced December 2022.

Comments: submitted to a conference

arXiv:2111.03482

doi 10.1109/TASLP.2022.3190739

Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification

Authors: Jiri Malek, Jakub Jansky, Zbynek Koldovsky, Tomas Kounovsky, Jaroslav Cmejla, Jindrich Zdansky

Abstract: This manuscript proposes a novel robust procedure for the extraction of a speaker of interest (SOI) from a mixture of audio sources. The estimation of the SOI is performed via independent vector extraction (IVE). Since the blind IVE cannot distinguish the target source by itself, it is guided towards the SOI via frame-wise speaker identification based on deep learning. Still, an incorrect speaker can be extracted due to guidance failings, especially when processing challenging data. To identify such cases, we propose a criterion for non-intrusively assessing the estimated speaker. It utilizes the same model as the speaker identification, so no additional training is required. When incorrect extraction is detected, we propose a "deflation" step in which the incorrect source is subtracted from the mixture and, subsequently, another attempt to extract the SOI is performed. The process is repeated until successful extraction is achieved. The proposed procedure is experimentally tested on artificial and real-world datasets containing challenging phenomena: source movements, reverberation, transient noise, or microphone failures. The method is compared with state-of-the-art blind algorithms as well as with current fully supervised deep learning-based methods.

Submitted 8 July, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: Modified version of the article accepted for publication in IEEE/ACM Transactions on Audio Speech and Language Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusions

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2295-2309, 2022

arXiv:2002.12619

Auxiliary Function-Based Algorithm for Blind Extraction of a Moving Speaker

Authors: Jakub Janský, Zbyněk Koldovský, Jiří Málek, Tomáš Kounovský, Jaroslav Čmejla

Abstract: Recently, the Constant Separating Vector (CSV) mixing model has been proposed for the Blind Source Extraction (BSE) of moving sources. In this paper, we experimentally verify the applicability of CSV in the blind extraction of a moving speaker and propose a new BSE method derived by modifying the auxiliary function-based algorithm for Independent Vector Analysis. Also, a piloted variant is proposed for the method with partially controllable global convergence. The methods are verified under reverberant and noisy conditions using simulated as well as real-world acoustic conditions. They are also verified within the CHiME-4 speech separation and recognition challenge. The experiments corroborate the applicability of CSV as well as the improved convergence of the proposed algorithms.

Submitted 5 February, 2021; v1 submitted 28 February, 2020; originally announced February 2020.

arXiv:1910.11824

Adaptive blind audio source extraction supervised by dominant speaker identification using x-vectors

Authors: Jakub Janský, Jiří Málek, Jaroslav Čmejla, Tomáš Kounovský, Zbyněk Koldovský, Jindřich Žďánský

Abstract: We propose a novel algorithm for adaptive blind audio source extraction. The proposed method is based on independent vector analysis and utilizes the auxiliary function optimization to achieve high convergence speed. The algorithm is partially supervised by a pilot signal related to the source of interest (SOI), which ensures that the method correctly extracts the utterance of the desired speaker. The pilot is based on the identification of a dominant speaker in the mixture using x-vectors. The properties of the x-vectors computed in the presence of cross-talk are experimentally analyzed. The proposed approach is verified in a scenario with a moving SOI, static interfering speaker, and environmental noise.

Submitted 25 October, 2019; originally announced October 2019.

arXiv:1910.10242

Algorithm for Independent Vector Extraction Based on Semi-Time-Variant Mixing Model

Authors: Zbyněk Koldovský, Václav Kautský, Tomáš Kounovský, Jaroslav Čmejla

Abstract: A new algorithm for dynamic independent vector extraction is proposed. It is based on the mixing model where mixing parameters related to the source-of-interest (SOI) are time-variant while the separating parameters are time-invariant. A contrast function based on the quasi-likelihood approach is optimized using the Newton-Raphson approach. The update is computed without imposing the orthogonal constraint, and the orthogonality is enforced afterward. This yields an algorithm that is significantly faster than gradient-based algorithms while different from fixed-point methods, which are even faster. We show advantageous properties of the proposed algorithm compared to the fixed-point methods in an on-line processing regime where stable convergence to the SOI is the important issue. The effectiveness of the method is demonstrated in a speech extraction experiment with a dense microphone array.

Submitted 1 March, 2021; v1 submitted 22 October, 2019; originally announced October 2019.

arXiv:1907.12421

MIRaGe: Multichannel Database Of Room Impulse Responses Measured On High-Resolution Cube-Shaped Grid In Multiple Acoustic Conditions

Authors: Jaroslav Čmejla, Tomáš Kounovský, Sharon Gannot, Zbyněk Koldovský, Pinchas Tandeitnik

Abstract: We introduce a database of multi-channel recordings performed in an acoustic lab with adjustable reverberation time. The recordings provide information about room impulse responses (RIR) for various positions of a loudspeaker. In particular, the main positions correspond to 4104 vertices of a cube-shaped dense grid within a 46x36x32 cm volume. The database thus provides a tool for detailed analyses of beampatterns of spatial processing methods as well as for training and testing of mathematical models of the acoustic field.
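As a quick arithmetic check of the grid size quoted in the abstract, the sketch below counts vertices of a regular grid spanning 46x36x32 cm. The spacings of 2 cm, 2 cm, and 4 cm along the three axes are an assumption not stated in the abstract; they are used here because they reproduce the quoted count of 4104. The helper name `vertices_per_axis` is hypothetical.

```python
def vertices_per_axis(extent_cm, spacing_cm):
    """Number of grid vertices along one axis, counting both end points."""
    return extent_cm // spacing_cm + 1


extents_cm = (46, 36, 32)   # volume dimensions from the abstract
spacings_cm = (2, 2, 4)     # ASSUMED grid spacings, not given in the abstract

counts = [vertices_per_axis(e, s) for e, s in zip(extents_cm, spacings_cm)]
total_vertices = counts[0] * counts[1] * counts[2]
```

With these assumed spacings the per-axis counts are 24, 19, and 9, whose product matches the 4104 positions stated in the abstract.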

Submitted 29 July, 2019; originally announced July 2019.

Showing 1–8 of 8 results for author: Čmejla, J