
Showing 1–8 of 8 results for author: Lee, G G

Searching in archive eess.
  1. arXiv:2406.15723  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment

    Authors: Hee** Do, Wonjun Lee, Gary Geunbae Lee

    Abstract: In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-linearly inte…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  2. arXiv:2404.02592  [pdf]

    cs.CL cs.SD eess.AS

    Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation

    Authors: Ye** Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Contemporary neural speech synthesis models have indeed demonstrated remarkable proficiency in synthetic speech generation as they have attained a level of quality comparable to that of human-produced speech. Nevertheless, it is important to note that these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, the Tacotron and Fas…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024

  3. arXiv:2403.04111  [pdf]

    cs.SD eess.AS

    Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication

    Authors: Ye** Jeon, Gary Geunbae Lee

    Abstract: This paper explores the task of language-agnostic speaker replication, a novel endeavor that seeks to replicate a speaker's voice irrespective of the language they are speaking. Towards this end, we introduce a multi-level attention aggregation approach that systematically probes and amplifies various speaker-specific attributes in a hierarchical manner. Through rigorous evaluations across a wide…

    Submitted 3 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to EACL Main 2024

  4. arXiv:2401.02014  [pdf, other]

    cs.SD eess.AS

    Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations

    Authors: Ye** Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Zero-shot multi-speaker TTS aims to synthesize speech with the voice of a chosen target speaker without any fine-tuning. Prevailing methods, however, encounter limitations at adapting to new speakers of out-of-domain settings, primarily due to inadequate speaker disentanglement and content leakage. To overcome these constraints, we propose an innovative negation feature learning paradigm that mode…

    Submitted 5 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  5. arXiv:2312.03312  [pdf, other]

    cs.CL cs.SD eess.AS

    Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation

    Authors: Wonjun Lee, Gary Geunbae Lee, Yunsu Kim

    Abstract: This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. Our approach optimizes these two stages to improve speech recognition across languages. We optimize phoneme vocabulary coverage by merging phonemes based on shared articulatory characteristics, thus improving recognition accuracy. A…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 8 pages; accepted at ASRU 2023

  6. arXiv:2312.01842  [pdf, other]

    cs.SD cs.AI eess.AS

    Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking

    Authors: Jihyun Lee, Ye** Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee

    Abstract: Dialogue state tracking plays a crucial role in extracting information in task-oriented dialogue systems. However, previous research has been limited to textual modalities, primarily due to the shortage of authentic human audio datasets. We address this by investigating synthetic audio data for audio-based DST. To this end, we develop cascading and end-to-end models, train them with our synthetic audi…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted at ASRU 2023

  7. Score-balanced Loss for Multi-aspect Pronunciation Assessment

    Authors: Hee** Do, Yunsu Kim, Gary Geunbae Lee

    Abstract: With rapid technological growth, automatic pronunciation assessment has transitioned toward systems that evaluate pronunciation in various aspects, such as fluency and stress. However, despite the highly imbalanced score labels within each aspect, existing studies have rarely tackled the data imbalance problem. In this paper, we suggest a novel loss function, score-balanced loss, to address the pr…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

  8. Hierarchical Pronunciation Assessment with Multi-Aspect Attention

    Authors: Hee** Do, Yunsu Kim, Gary Geunbae Lee

    Abstract: Automatic pronunciation assessment is a major component of a computer-assisted pronunciation training system. To provide in-depth feedback, scoring pronunciation at various levels of granularity such as phoneme, word, and utterance, with diverse aspects such as accuracy, fluency, and completeness, is essential. However, existing multi-aspect multi-granularity methods simultaneously predict all asp…

    Submitted 26 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted at ICASSP 2023
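Entry 1 proposes linear and non-linear Acoustic Feature Mixup to counter score-imbalanced pronunciation data, but its abstract is truncated before the strategies are defined. The sketch below shows only generic linear mixup: two samples and their multi-aspect score vectors are interpolated with a coefficient drawn from Beta(alpha, alpha). It is not the authors' method. The function name `linear_feature_mixup`, the default `alpha = 0.4`, and the assumption that both feature matrices are already aligned to the same (frames x dims) shape are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]  # frames x feature dims


def linear_feature_mixup(
    feat_a: Matrix,
    feat_b: Matrix,
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    alpha: float = 0.4,  # hypothetical Beta concentration parameter
    rng: Optional[random.Random] = None,
) -> Tuple[Matrix, List[float], float]:
    """Generic linear mixup of two feature matrices and their aspect scores.

    Assumption: both utterances were already padded or aligned to the same
    shape; this sketch does not perform that alignment.
    """
    if len(feat_a) != len(feat_b) or any(
        len(ra) != len(rb) for ra, rb in zip(feat_a, feat_b)
    ):
        raise ValueError("feature matrices must have the same shape")
    if len(scores_a) != len(scores_b):
        raise ValueError("score vectors must cover the same aspects")
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    # Interpolate features frame by frame, dimension by dimension.
    mixed_feat = [
        [lam * x + (1.0 - lam) * y for x, y in zip(ra, rb)]
        for ra, rb in zip(feat_a, feat_b)
    ]
    # Interpolate each aspect score with the same coefficient.
    mixed_scores = [lam * s + (1.0 - lam) * t for s, t in zip(scores_a, scores_b)]
    return mixed_feat, mixed_scores, lam


# Usage: two toy utterances (3 frames x 2 dims) with two aspect scores each.
rng = random.Random(0)
feat_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
feat_b = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
mixed, scores, lam = linear_feature_mixup(feat_a, feat_b, [8.0, 7.0], [2.0, 3.0], rng=rng)
```

Because the same coefficient mixes inputs and labels, a high-score and a low-score sample yield a synthetic sample with an intermediate score. This is one way such augmentation can populate sparsely labeled score ranges.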
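Entry 7's score-balanced loss is also truncated before it is defined. As a hedged illustration of reweighting a loss for imbalanced score labels, the sketch below substitutes a different, published technique: class-balanced weights based on the effective number of samples (Cui et al., 2019), w_c = (1 - beta) / (1 - beta^n_c), applied to a weighted squared error over discrete score levels. This is not the score-balanced loss. The function names, integer score levels, and `beta = 0.999` are hypothetical choices.

```python
from collections import Counter
from typing import Dict, Sequence


def effective_number_weights(labels: Sequence[int], beta: float = 0.999) -> Dict[int, float]:
    """Per-score-level weights from the effective number of samples.

    Rare score levels get larger weights; weights are normalized so they
    sum to the number of distinct levels.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    counts = Counter(labels)
    raw = {level: (1.0 - beta) / (1.0 - beta ** n) for level, n in counts.items()}
    norm = len(raw) / sum(raw.values())
    return {level: w * norm for level, w in raw.items()}


def weighted_mse(preds: Sequence[float], targets: Sequence[int], weights: Dict[int, float]) -> float:
    """Squared error where each sample is weighted by its target level's weight."""
    if not targets:
        raise ValueError("targets must be non-empty")
    total = sum(weights[t] * (p - t) ** 2 for p, t in zip(preds, targets))
    return total / sum(weights[t] for t in targets)


# Usage: a toy, heavily imbalanced set of integer score levels.
labels = [2] * 90 + [1] * 9 + [0] * 1
weights = effective_number_weights(labels)
loss = weighted_mse([1.5, 0.5, 2.0], [2, 0, 2], weights)
```

Errors on the rare level 0 dominate the loss here, which is the general effect any imbalance-aware loss aims for; how entry 7 achieves it may differ.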