Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction

Briegleb, Annika; Halimeh, Mhd Modar; Kellermann, Walter

doi:10.1109/ICASSP49357.2023.10095196

arXiv:2210.15512 (eess)

[Submitted on 27 Oct 2022 (v1), last revised 14 Mar 2023 (this version, v2)]

Abstract: In conventional multichannel audio signal enhancement, spatial and spectral filtering are often performed sequentially. In contrast, it has been shown that for neural spatial filtering a joint approach of spectro-spatial filtering is more beneficial. In this contribution, we investigate the spatial filtering performed by such a time-varying spectro-spatial filter. We extend the recently proposed complex-valued spatial autoencoder (COSPA) for the task of target speaker extraction by leveraging its interpretable structure and purposefully informing the network of the target speaker's position. We show that the resulting informed COSPA (iCOSPA) effectively and flexibly extracts a target speaker from a mixture of speakers. We also find that the proposed architecture is well capable of learning pronounced spatial selectivity patterns and show that the results depend significantly on the training target and the reference signal when computing various evaluation metrics.

Comments:	Accepted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece. 5 pages, 2 figures
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2210.15512 [eess.AS]
	(or arXiv:2210.15512v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2210.15512
Related DOI:	https://doi.org/10.1109/ICASSP49357.2023.10095196

Submission history

From: Annika Briegleb
[v1] Thu, 27 Oct 2022 14:47:51 UTC (275 KB)
[v2] Tue, 14 Mar 2023 15:17:57 UTC (152 KB)
