Showing 1–9 of 9 results for author: Sung-Bin, K

  1. arXiv:2407.01034  [pdf, other]

    cs.CV cs.GR

    Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert

    Authors: Han EunGi, Oh Hyun-Bin, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Joo, Tae-Hyun Oh

    Abstract: Speech-driven 3D facial animation has recently garnered attention due to its cost-effective usability in multimedia production. However, most current advances overlook the intelligibility of lip movements, limiting the realism of facial expressions. In this paper, we introduce a method for speech-driven 3D facial animation to generate accurate lip movements, proposing an audio-visual multimodal pe…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: INTERSPEECH 2024

  2. arXiv:2406.14272  [pdf, other]

    cs.CV cs.GR

    MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

    Authors: Kim Sung-Bin, Lee Chae-Yeon, Gihun Son, Oh Hyun-Bin, Janghoon Ju, Suekyeong Nam, Tae-Hyun Oh

    Abstract: Recent studies in speech-driven 3D talking head generation have achieved convincing results in verbal articulations. However, generating accurate lip-syncs degrades when applied to input speech in other languages, possibly due to the lack of datasets covering a broad spectrum of facial movements across languages. In this work, we introduce a novel task to generate 3D talking heads from speeches of…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  3. arXiv:2403.01898  [pdf, other]

    cs.CV eess.IV

    Revisiting Learning-based Video Motion Magnification for Real-time Processing

    Authors: Hyunwoo Ha, Oh Hyun-Bin, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin, Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh

    Abstract: Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based ones. However, it still lags behind real-time performance, which prevents it from being e…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 19 pages

  4. arXiv:2312.09818  [pdf, other]

    cs.CL cs.AI

    SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

    Authors: Lee Hyun, Kim Sung-Bin, Seungju Han, Youngjae Yu, Tae-Hyun Oh

    Abstract: Despite the recent advances of the artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occurs during social interactions between humans. In this work, we tackle a new challenge for machines to understand the rationale behind laughter in video, Video Laugh Reasoning. We introduce this new task to explai…

    Submitted 24 May, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 19 pages, 14 figures

  5. arXiv:2311.00994  [pdf, other]

    cs.CV cs.GR

    LaughTalk: Expressive 3D Talking Head Generation with Laughter

    Authors: Kim Sung-Bin, Lee Hyun, Da Hye Hong, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

    Abstract: Laughter is a unique expression, essential to affirmative social interactions of humans. Although current 3D talking head generation methods produce convincing verbal articulations, they often fail to capture the vitality and subtleties of laughter and smiles despite their importance in social context. In this paper, we introduce a novel task to generate 3D talking heads capable of both articulate…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV2024

  6. arXiv:2310.03205  [pdf, other]

    cs.CV cs.AI

    A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization

    Authors: Kim Youwang, Lee Hyun, Kim Sung-Bin, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

    Abstract: We propose NeuFace, a 3D face mesh pseudo annotation method on videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate the per-view/-frame accurate and consistent face meshes on large-scale face videos, called the NeuFa…

    Submitted 6 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 9 pages, 7 figures, and 3 tables for the main paper. 8 pages, 6 figures and 3 tables for the appendix

  7. arXiv:2308.07378  [pdf, other]

    cs.CV

    The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation

    Authors: Kwon Byung-Ki, Kim Sung-Bin, Tae-Hyun Oh

    Abstract: Recent work on dense optical flow has shown significant progress, primarily in a supervised learning manner requiring a large amount of labeled data. Due to the expensiveness of obtaining large scale real-world data, computer graphics are typically leveraged for constructing datasets. However, there is a common belief that synthetic-to-real domain gaps limit generalization to real scenes. In this…

    Submitted 14 August, 2023; originally announced August 2023.

  8. arXiv:2303.17490  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS eess.IV

    Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

    Authors: Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh

    Abstract: How does audio describe the world around us? In this paper, we propose a method for generating an image of a scene from sound. Our method addresses the challenges of dealing with the large gaps that often exist between sight and sound. We design a model that works by scheduling the learning procedure of each model component to associate audio-visual modalities despite their information gaps. The k…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  9. arXiv:2303.17489  [pdf, other]

    eess.AS cs.MM cs.SD

    Prefix tuning for automated audio captioning

    Authors: Minkyu Kim, Kim Sung-Bin, Tae-Hyun Oh

    Abstract: Audio captioning aims to generate text descriptions from environmental sounds. One challenge of audio captioning is the difficulty of the generalization due to the lack of audio-text paired training data. In this work, we propose a simple yet effective method of dealing with small-scaled datasets by leveraging a pre-trained language model. We keep the language model frozen to maintain the expressi…

    Submitted 4 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023