Seeing Through Noise: Visually Driven Speaker Separation and Enhancement

Gabbay, Aviv; Ephrat, Ariel; Halperin, Tavi; Peleg, Shmuel

arXiv:1708.06767 (cs)

[Submitted on 22 Aug 2017 (v1), last revised 9 Feb 2018 (this version, v3)]

Abstract: Isolating the voice of a specific person while filtering out other voices or background noises is challenging when video is shot in noisy environments. We propose audio-visual methods to isolate the voice of a single speaker and eliminate unrelated sounds. First, face motions captured in the video are used to estimate the speaker's voice, by passing the silent video frames through a video-to-speech neural network-based model. Then the speech predictions are applied as a filter on the noisy input audio. This approach avoids using mixtures of sounds in the learning process, as the number of such possible mixtures is huge, and would inevitably bias the trained model. We evaluate our method on two audio-visual datasets, GRID and TCD-TIMIT, and show that our method attains significant SDR and PESQ improvements over both the raw video-to-speech predictions and a well-known audio-only method.
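
The abstract says that the video-based speech predictions are "applied as a filter on the noisy input audio" but does not spell out how the filter is built. The following is a minimal sketch of one plausible reading, assuming spectrograms are stored as lists of rows of non-negative magnitudes. The function names, the threshold value, and the use of binary masks are illustrative assumptions, not details confirmed by the abstract.

```python
# Illustrative sketch only: binary time-frequency masks derived from
# predicted speech spectrograms and applied to a noisy spectrogram.
# All names and the thresholding rule are assumptions for this example.

def enhancement_mask(predicted, threshold):
    """Keep bins where the predicted speech magnitude exceeds a threshold."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in predicted]


def separation_masks(pred_a, pred_b):
    """Assign each bin to whichever speaker's prediction is stronger."""
    mask_a = [
        [1.0 if a >= b else 0.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(pred_a, pred_b)
    ]
    mask_b = [[1.0 - m for m in row] for row in mask_a]
    return mask_a, mask_b


def apply_mask(noisy, mask):
    """Element-wise product of the noisy spectrogram and a mask."""
    return [
        [n * m for n, m in zip(row_n, row_m)]
        for row_n, row_m in zip(noisy, mask)
    ]


# Toy 2x2 "spectrograms" (rows and columns are arbitrary bins).
noisy = [[1.0, 2.0], [3.0, 4.0]]
pred_a = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical prediction for speaker A
pred_b = [[0.3, 0.7], [0.6, 0.1]]  # hypothetical prediction for speaker B

# Enhancement: one speaker plus background noise.
enhanced = apply_mask(noisy, enhancement_mask(pred_a, threshold=0.5))

# Separation: two speakers, both visible in the video.
mask_a, mask_b = separation_masks(pred_a, pred_b)
voice_a = apply_mask(noisy, mask_a)
voice_b = apply_mask(noisy, mask_b)
```

In this sketch the masks are computed only from the predictions, so nothing is learned from sound mixtures, which is consistent with the motivation stated in the abstract. A real system would operate on spectrograms computed from audio and would need to invert the filtered spectrogram back to a waveform.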

Comments:	Supplementary video: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
Cite as:	arXiv:1708.06767 [cs.CV]
	(or arXiv:1708.06767v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1708.06767

Submission history

From: Aviv Gabbay
[v1] Tue, 22 Aug 2017 18:02:27 UTC (3,534 KB)
[v2] Tue, 31 Oct 2017 19:18:41 UTC (430 KB)
[v3] Fri, 9 Feb 2018 21:34:19 UTC (509 KB)
