Showing 1–17 of 17 results for author: Yoo, D

Searching in archive eess.
  1. arXiv:2406.08380

    cs.CL cs.SD eess.AS

    Towards Unsupervised Speech Recognition Without Pronunciation Models

    Authors: Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by pro…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  2. arXiv:2403.11578

    eess.AS

    AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

    Authors: SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regulariz…

    Submitted 18 March, 2024; originally announced March 2024.

  3. arXiv:2312.09736

    cs.CL cs.SD eess.AS

    HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

    Authors: Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

    Abstract: Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts in developing VGD systems to improve the quality of their responses, existing systems are competent only to incorporate the information in the video and text and tend to struggle in extracting the necessary information f…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023, 14 pages, 13 figures

  4. arXiv:2312.05790

    cs.LG cs.AI eess.SP

    SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation

    Authors: Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo

    Abstract: Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency…

    Submitted 10 December, 2023; originally announced December 2023.

  5. arXiv:2311.18508

    eess.IV cs.CV

    DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution

    Authors: Axi Niu, Kang Zhang, Joshua Tian Jin Tee, Trung X. Pham, Jinqiu Sun, Chang D. Yoo, In So Kweon, Yanning Zhang

    Abstract: It is well known the adversarial optimization of GAN-based image super-resolution (SR) methods makes the preceding SR model generate unpleasant and undesirable artifacts, leading to large distortion. We attribute the cause of such distortions to the poor calibration of the discriminator, which hampers its ability to provide meaningful feedback to the generator for learning high-quality images. To…

    Submitted 30 November, 2023; originally announced November 2023.

  6. arXiv:2310.02382

    cs.CL cs.SD eess.AS

    Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching

    Authors: Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands. To tackle these challenges, we introduce a novel ASR system, ESPUM. This system harnesses the power of lower-order N-skipgrams (up to N=3) combined with positional unigram statistics gathered from a small batch of samples. Eva…

    Submitted 3 October, 2023; originally announced October 2023.

  7. arXiv:2308.08442

    cs.CL cs.SD eess.AS

    Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

    Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

    Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5 referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or parag…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023

  8. arXiv:2306.07926

    eess.AS cs.CL cs.LG cs.SD

    A Theory of Unsupervised Speech Recognition

    Authors: Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora. While various algorithms exist to solve this problem, a theoretical framework is missing from studying their properties and addressing such issues as sensitivity to hyperparameters and training instability. In this paper, we proposed a gener…

    Submitted 9 June, 2023; originally announced June 2023.

  9. arXiv:2305.16371

    cs.CL cs.SD eess.AS

    INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

    Authors: Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained based on self-supervised speech representation learning. However, these pre-trained speech models suffer from representational bias as they tend to better represent those prominent accents (i.e., native (L1) English accent) in the pre-training speech corpus than less represented…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: ACL2023

  10. arXiv:2303.13110

    eess.IV cs.CV

    OCELOT: Overlapped Cell on Tissue Dataset for Histopathology

    Authors: Jeongun Ryu, Aaron Valero Puche, JaeWoong Shin, Seonwook Park, Biagio Brattoli, Jinhee Lee, Wonkyung Jung, Soo Ick Cho, Kyunghyun Paeng, Chan-Young Ock, Donggeun Yoo, Sérgio Pereira

    Abstract: Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by…

    Submitted 23 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at CVPR'23

  11. arXiv:2207.06020

    cs.SD cs.AI cs.CV cs.MM eess.AS eess.IV

    Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition

    Authors: Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro

    Abstract: This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech Recognition (AVSR) system. To this end, we propose Visual Context-driven Audio Feature Enhancement module (V-CAFE) to enhance the input noisy audio speech with a help of audio-visual correspondence. The proposed V-CAFE is designed to capture the transition of lip movements, namely visual context and to generate a noise r…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: Accepted at Interspeech 2022

  12. arXiv:2111.05014

    eess.IV cs.AI cs.CV

    GDCA: GAN-based single image super resolution with Dual discriminators and Channel Attention

    Authors: Thanh Nguyen, Hieu Hoang, Chang D. Yoo

    Abstract: Single Image Super-Resolution (SISR) is a very active research field. This paper addresses SISR by using a GAN-based approach with dual discriminators and incorporating it with an attention mechanism. The experimental results show that GDCA can generate sharper and more pleasing images compared to other conventional methods.

    Submitted 9 November, 2021; originally announced November 2021.

    Journal ref: Korean Association of Artificial Intelligence 2019

  13. arXiv:2108.00475

    cs.CV eess.IV

    Self-supervised Learning with Local Attention-Aware Feature

    Authors: Trung X. Pham, Rusty John Lloyd Mina, Dias Issa, Chang D. Yoo

    Abstract: In this work, we propose a novel methodology for self-supervised learning for generating global and local attention-aware visual features. Our approach is based on training a model to differentiate between specific image transformations of an input sample and the patched images. Utilizing this approach, the proposed method is able to outperform the previous best competitor by 1.03% on the Tiny-Ima…

    Submitted 1 August, 2021; originally announced August 2021.

    Comments: 5 pages, 4 figures

  14. Robust MAML: Prioritization task buffer with adaptive learning process for model-agnostic meta-learning

    Authors: Thanh Nguyen, Tung Luu, Trung Pham, Sanzhar Rakhimkul, Chang D. Yoo

    Abstract: Model agnostic meta-learning (MAML) is a popular state-of-the-art meta-learning algorithm that provides good weight initialization of a model given a variety of learning tasks. The model initialized by provided weight can be fine-tuned to an unseen task despite only using a small amount of samples and within a few adaptation steps. MAML is simple and versatile but requires costly learning rate tun…

    Submitted 10 June, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  15. arXiv:2012.00348

    cs.LG eess.SP stat.ML

    Deep Learning-Based Arrhythmia Detection Using RR-Interval Framed Electrocardiograms

    Authors: Song-Kyoo Kim, Chan Yeob Yeun, Paul D. Yoo, Nai-Wei Lo, Ernesto Damiani

    Abstract: Deep learning applied to electrocardiogram (ECG) data can be used to achieve personal authentication in biometric security applications, but it has not been widely used to diagnose cardiovascular disorders. We developed a deep learning model for the detection of arrhythmia in which time-sliced ECG data representing the distance between successive R-peaks are used as the input for a convolutional n…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: This paper is considered to be submitted to an international journal

  16. arXiv:1907.13517

    cs.CR cs.LG eess.SP

    An Enhanced Machine Learning-based Biometric Authentication System Using RR-Interval Framed Electrocardiograms

    Authors: Amang Song-Kyoo Kim, Chan Yeob Yeun, Paul D. Yoo

    Abstract: This paper is targeted in the area of biometric data enabled security system based on the machine learning for the digital health. The disadvantages of traditional authentication systems include the risks of forgetfulness, loss, and theft. Biometric authentication is therefore rapidly replacing traditional authentication methods and is becoming an everyday part of life. The electrocardiogram (ECG)…

    Submitted 30 November, 2019; v1 submitted 27 July, 2019; originally announced July 2019.

    Comments: The paper has been accepted and published in the IEEE Access

    Journal ref: IEEE Access 7 (2019), pp. 168669-168674

  17. arXiv:1907.00366

    cs.CR cs.LG eess.SP stat.ML

    An Enhanced Electrocardiogram Biometric Authentication System Using Machine Learning

    Authors: Ebrahim Al Alkeem, Song-Kyoo Kim, Chan Yeob Yeun, M. Jamal Zemerly, Kin Poon, Paul D. Yoo

    Abstract: Traditional authentication systems use alphanumeric or graphical passwords, or token-based techniques that require "something you know and something you have". The disadvantages of these systems include the risks of forgetfulness, loss, and theft. To address these shortcomings, biometric authentication is rapidly replacing traditional authentication methods and is becoming a part of everyday life.…

    Submitted 24 September, 2019; v1 submitted 30 June, 2019; originally announced July 2019.

    Comments: This paper has been published in the IEEE Access

    Journal ref: IEEE Access 7 (2019), pp. 123069-123075