UniFLG: Unified Facial Landmark Generator from Text or Speech

Mitsui, Kentaro; Hono, Yukiya; Sawada, Kei

arXiv:2302.14337 (cs)

[Submitted on 28 Feb 2023 (v1), last revised 19 May 2023 (this version, v2)]


Abstract: Talking face generation has been extensively investigated owing to its wide applicability. The two primary frameworks used for talking face generation comprise a text-driven framework, which generates synchronized speech and talking faces from text, and a speech-driven framework, which generates talking faces from speech. To integrate these frameworks, this paper proposes a unified facial landmark generator (UniFLG). The proposed system exploits end-to-end text-to-speech not only for synthesizing speech but also for extracting a series of latent representations that are common to text and speech, and feeds it to a landmark decoder to generate facial landmarks. We demonstrate that our system achieves higher naturalness in both speech synthesis and facial landmark generation compared to the state-of-the-art text-driven method. We further demonstrate that our system can generate facial landmarks from speech of speakers without facial video data or even speech data.
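
The data flow described in the abstract (text or speech mapped to a shared latent sequence, which one landmark decoder consumes) can be illustrated with a toy sketch. This is not the authors' model: the function names, the dimensions, the 68-point landmark count, the duration-based upsampling of phonemes, and the random linear weights are all assumptions chosen only to show that both input paths end in the same decoder.

```python
import numpy as np

# All sizes below are illustrative assumptions, not values from the paper.
LATENT_DIM = 16    # size of the shared latent space
N_LANDMARKS = 68   # landmark count (a common convention)
N_MELS = 80        # mel-spectrogram bins
VOCAB = 40         # phoneme vocabulary size

rng = np.random.default_rng(0)
W_text = rng.normal(size=(VOCAB, LATENT_DIM))            # phoneme embedding table
W_speech = rng.normal(size=(N_MELS, LATENT_DIM))         # mel-frame projection
W_dec = rng.normal(size=(LATENT_DIM, N_LANDMARKS * 2))   # shared landmark decoder


def encode_text(phoneme_ids, durations):
    """Embed phonemes and repeat each by its frame duration -> (T, LATENT_DIM)."""
    emb = W_text[np.asarray(phoneme_ids)]
    return np.repeat(emb, durations, axis=0)


def encode_speech(mel):
    """Project mel frames of shape (T, N_MELS) -> (T, LATENT_DIM)."""
    return np.tanh(mel @ W_speech)


def decode_landmarks(latent):
    """Map latent frames to (T, N_LANDMARKS, 2) landmark coordinates."""
    return (latent @ W_dec).reshape(len(latent), N_LANDMARKS, 2)


def generate_landmarks(text=None, speech=None):
    """Route exactly one modality through its encoder, then the shared decoder."""
    if (text is None) == (speech is None):
        raise ValueError("provide exactly one of text or speech")
    latent = encode_text(*text) if text is not None else encode_speech(speech)
    return decode_landmarks(latent)
```

In this sketch, `generate_landmarks(text=([1, 2, 3], [2, 3, 1]))` and `generate_landmarks(speech=mel)` with a six-frame `mel` array both return an array of shape `(6, 68, 2)`, since the decoder sees only the latent frames and not the modality they came from.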

Comments:	5 pages, 2 figures, 3 tables, accepted for INTERSPEECH 2023. Audio samples: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
Cite as:	arXiv:2302.14337 [cs.CV]
	(or arXiv:2302.14337v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2302.14337

Submission history

From: Kentaro Mitsui [view email]
[v1] Tue, 28 Feb 2023 06:05:43 UTC (361 KB)
[v2] Fri, 19 May 2023 02:43:32 UTC (363 KB)
