Neural Target Speech Extraction: An Overview

Zmolikova, Katerina; Delcroix, Marc; Ochiai, Tsubasa; Kinoshita, Keisuke; Černocký, Jan; Yu, Dong

doi:10.1109/MSP.2023.3240008

arXiv:2301.13341 (eess)

[Submitted on 31 Jan 2023]

Abstract: Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail-party effect. For decades, researchers have focused on approaching the listening ability of humans. One critical issue is handling interfering speakers because the target and non-target speech signals share similar characteristics, complicating their discrimination. Target speech/speaker extraction (TSE) isolates the speech signal of a target speaker from a mixture of several speakers with or without noises and reverberations using clues that identify the speaker in the mixture. Such clues might be a spatial clue indicating the direction of the target speaker, a video of the speaker's lips, or a pre-recorded enrollment utterance from which their voice characteristics can be derived. TSE is an emerging field of research that has received increased attention in recent years because it offers a practical approach to the cocktail-party problem and involves such aspects of signal processing as audio, visual, array processing, and deep learning. This paper focuses on recent neural-based approaches and presents an in-depth overview of TSE. We guide readers through the different major approaches, emphasizing the similarities among frameworks and discussing potential future directions.

Comments:	Submitted to IEEE Signal Processing Magazine on Apr. 25, 2022, and accepted on Jan. 12, 2023
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2301.13341 [eess.AS]
	(or arXiv:2301.13341v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2301.13341
Related DOI:	https://doi.org/10.1109/MSP.2023.3240008

Submission history

From: Marc Delcroix
[v1] Tue, 31 Jan 2023 00:26:52 UTC (1,743 KB)
