Showing 1–11 of 11 results for author: Yeo, H

  1. arXiv:2402.15151  [pdf, other]

    cs.CV cs.CL eess.AS eess.IV

    Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

    Authors: Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

    Abstract: In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM),…

    Submitted 13 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: An Erratum was added on the last page of this paper

  2. arXiv:2401.09802  [pdf, other]

    eess.AS cs.CV cs.SD

    Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

    Authors: Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro

    Abstract: This paper explores sentence-level Multilingual Visual Speech Recognition with a single model for the first time. As the massive multilingual modeling of visual data requires huge computational costs, we propose a novel strategy, processing with visual speech units. Motivated by the recent success of the audio speech unit, the proposed visual speech unit is obtained by discretizing the visual spee…

    Submitted 18 January, 2024; originally announced January 2024.

  3. arXiv:2309.08535  [pdf, other]

    cs.CV cs.AI eess.AS

    Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

    Authors: Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro

    Abstract: This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have a limited number of labeled data. Different from previous methods that tried to improve the VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the…

    Submitted 12 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  4. arXiv:2309.08531  [pdf, other]

    cs.CV cs.CL eess.AS eess.IV

    Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

    Authors: Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

    Abstract: In this paper, we propose methods to build a powerful and efficient Image-to-Speech captioning (Im2Sp) model. To this end, we start with importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp. We set the output of the proposed Im2Sp as discretized speech units, i.e., the quantized speech features of a self-s…

    Submitted 15 September, 2023; originally announced September 2023.

  5. arXiv:2308.09311  [pdf, other]

    cs.CV cs.CL cs.SD eess.AS eess.IV

    Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

    Authors: Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro

    Abstract: This paper proposes a novel lip reading framework, especially for low-resource languages, which has not been well addressed in the previous literature. Since low-resource languages do not have enough video-text paired data to train the model to have sufficient power to model lip movements and language, it is regarded as challenging to develop lip reading models for low-resource languages. In order…

    Submitted 12 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  6. arXiv:2308.07593  [pdf, other]

    cs.CV cs.MM eess.AS eess.IV

    AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

    Authors: Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro

    Abstract: Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information on lip movements. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of visual modality by using audio modality. Different fro…

    Submitted 11 January, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE Transactions on Multimedia

  7. arXiv:2305.04542  [pdf, other]

    cs.CV eess.AS

    Multi-Temporal Lip-Audio Memory for Visual Speech Recognition

    Authors: Jeong Hun Yeo, Minsu Kim, Yong Man Ro

    Abstract: Visual Speech Recognition (VSR) is a task to predict a sentence or word from lip movements. Some works have been recently presented which use audio signals to supplement visual information. However, existing methods utilize only limited information such as phoneme-level features and soft labels of Automatic Speech Recognition (ASR) networks. In this paper, we present a Multi-Temporal Lip-Audio Mem…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Presented at ICASSP 2023

  8. arXiv:2303.15826  [pdf, other]

    eess.IV cs.AI cs.CV

    MS-MT: Multi-Scale Mean Teacher with Contrastive Unpaired Translation for Cross-Modality Vestibular Schwannoma and Cochlea Segmentation

    Authors: Ziyuan Zhao, Kaixin Xu, Huai Zhe Yeo, Xulei Yang, Cuntai Guan

    Abstract: Domain shift has been a long-standing issue for medical image segmentation. Recently, unsupervised domain adaptation (UDA) methods have achieved promising cross-modality segmentation performance by distilling knowledge from a label-rich source domain to a target domain without labels. In this work, we propose a multi-scale self-ensembling based UDA framework for automatic segmentation of two key b…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted by BrainLes MICCAI proceedings (5th solution for MICCAI 2022 Cross-Modality Domain Adaptation (crossMoDA) Challenge)

  9. arXiv:2009.01382  [pdf]

    eess.SY

    Application of Transformer Impedance Correction Tables in Power Flow Studies

    Authors: Pooria Dehghanian, Ju Hee Yeo, Jessica Wert, Hanyue Li, Komal Shetye, Thomas J. Overbye

    Abstract: Phase Shifting Transformers (PST) are used to control or block certain flows of real power through phase angle regulation across the device. Its functionality is crucial to special situations such as eliminating loop flow through an area and balancing real power flow between parallel paths. Impedance correction tables are used to model that the impedance of phase shifting transformers often vary a…

    Submitted 2 September, 2020; originally announced September 2020.

  10. arXiv:2002.04406  [pdf, other]

    physics.soc-ph cs.LG eess.SP stat.ML

    Traffic Data Imputation using Deep Convolutional Neural Networks

    Authors: Ouafa Benkraouda, Bilal Thonnam Thodi, Hwasoo Yeo, Monica Menendez, Saif Eddin Jabari

    Abstract: We propose a statistical learning-based traffic speed estimation method that uses sparse vehicle trajectory information. Using a convolutional encoder-decoder based architecture, we show that a well trained neural network can learn spatio-temporal traffic speed dynamics from time-space diagrams. We demonstrate this for a homogeneous road section using simulated vehicle trajectories and then valida…

    Submitted 21 January, 2020; originally announced February 2020.

    Journal ref: IEEE Access, 8, 2020, pp. 104740-104752

  11. arXiv:1911.06934  [pdf, other]

    eess.SY

    The Creation and Validation of Load Time Series for Synthetic Electric Power Systems

    Authors: Hanyue Li, Ju Hee Yeo, Ashly L. Bornsheuer, Thomas J. Overbye

    Abstract: Synthetic power systems that imitate functional and statistical characteristics of the actual grid have been developed to promote researchers' access to public system models. Developing time series to represent different operating conditions of these synthetic systems will expand the potential of synthetic power systems applications. This paper proposes a methodology to create synthetic time serie…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: Submitted to IEEE Transactions on Power Systems