Showing 1–3 of 3 results for author: Kefalas, T

  1. arXiv:2307.16584  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Audio-visual video-to-speech synthesis with synthesized input audio

    Authors: Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic

    Abstract: Video-to-speech synthesis involves reconstructing the speech signal of a speaker from a silent video. The implicit assumption of this task is that the sound signal is either missing or contains a high amount of noise/corruption such that it is not useful for processing. Previous works in the literature either use video inputs only or employ both video and audio inputs during training, and discard…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  2. arXiv:2306.15464  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Large-scale unsupervised audio pre-training for video-to-speech synthesis

    Authors: Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic

    Abstract: Video-to-speech synthesis is the task of reconstructing the speech signal from a silent video of a speaker. Most established approaches to date involve a two-step process, whereby an intermediate representation from the video, such as a spectrogram, is extracted first and then passed to a vocoder to produce the raw audio. Some recent work has focused on end-to-end synthesis, whereby the generation…

    Submitted 31 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: Corrected typos. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:1912.05833  [pdf, other]

    cs.LG eess.AS stat.ML

    Speech-driven facial animation using polynomial fusion of features

    Authors: Triantafyllos Kefalas, Konstantinos Vougioukas, Yannis Panagakis, Stavros Petridis, Jean Kossaifi, Maja Pantic

    Abstract: Speech-driven facial animation involves using a speech signal to generate realistic videos of talking faces. Recent deep learning approaches to facial synthesis rely on extracting low-dimensional representations and concatenating them, followed by a decoding step of the concatenated vector. This accounts for only first-order interactions of the features and ignores higher-order interactions. In th…

    Submitted 19 February, 2020; v1 submitted 12 December, 2019; originally announced December 2019.