Showing 1–13 of 13 results for author: Heo, J

Searching in archive eess.
  1. arXiv:2406.07103  [pdf, other]

    eess.AS cs.AI

    MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

    Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

    Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  2. arXiv:2405.05426  [pdf, other]

    eess.SY

    ATLS: Automated Trailer Loading for Surface Vessels

    Authors: Amer Abughaida, Meet Gandhi, Jun Heo, Vaishnav Tadiparthi, Yosuke Sakamoto, Joohyun Woo, Sangjae Bae

    Abstract: Automated docking technologies of marine boats have been enlightened by an increasing number of literature. This paper contributes to the literature by proposing a mathematical framework that automates "trailer loading" in the presence of wind disturbances, which is unexplored despite its importance to boat owners. The comprehensive pipeline of localization, system identification, and trajectory o…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: To be presented at IEEE Intelligent Vehicles Symposium (IV 2024)

  3. arXiv:2309.08320  [pdf, other]

    eess.AS cs.SD

    Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

    Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

    Abstract: Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV…

    Submitted 13 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures, accepted for ICASSP 2024

  4. arXiv:2309.08208  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

    Authors: Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu

    Abstract: Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Submitted to 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

  5. arXiv:2309.04549  [pdf, other]

    cs.CV cs.DC cs.MM eess.IV

    Poster: Making Edge-assisted LiDAR Perceptions Robust to Lossy Point Cloud Compression

    Authors: Jin Heo, Gregorie Phillips, Per-Erik Brodin, Ada Gavrilovska

    Abstract: Real-time light detection and ranging (LiDAR) perceptions, e.g., 3D object detection and simultaneous localization and mapping are computationally intensive to mobile devices of limited resources and often offloaded on the edge. Offloading LiDAR perceptions requires compressing the raw sensor data, and lossy compression is used for efficiently reducing the data volume. Lossy compression degrades t…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: extended abstract of 2 pages, 2 figures, 1 table

  6. arXiv:2307.10628  [pdf, other]

    eess.AS cs.SD

    PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

    Authors: Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-Jin Yu

    Abstract: Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation method has been commonly used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environ…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, 1 table, accepted to CKAIA2023 as a conference paper

  7. arXiv:2305.17394  [pdf, other]

    eess.AS cs.SD

    One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

    Authors: Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-Jin Yu

    Abstract: The application of speech self-supervised learning (SSL) models has achieved remarkable performance in speaker verification (SV). However, there is a computational cost hurdle in employing them, which makes development and deployment difficult. Several studies have simply compressed SSL models through knowledge distillation (KD) without considering the target task. Consequently, these methods coul…

    Submitted 7 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: ISCA INTERSPEECH 2023

  8. arXiv:2211.02227  [pdf, other]

    eess.AS cs.SD

    Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

    Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

    Abstract: The advent of hyper-scale and general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In the field of audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performances through fine-tuning for downstream tasks. Nevertheless, re-training all the parameters of these massive mod…

    Submitted 1 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures

  9. arXiv:2211.01599  [pdf, other]

    eess.AS cs.SD

    Convolution channel separation and frequency sub-bands aggregation for music genre classification

    Authors: Jungwoo Heo, Hyun-seo Shin, Ju-ho Kim, Chan-yeong Lim, Ha-Jin Yu

    Abstract: In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze these features. In this research, we propose a novel framework that can extract and aggregate both short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, where all the layers that…

    Submitted 3 November, 2022; originally announced November 2022.

  10. Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

    Authors: Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin

    Abstract: The use of deep neural networks (DNN) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. Therefore, the Spoofing-Aware Speaker Verification (SASV) challenge is designed and held to promote development of systems that can perform ASV considering spoofing attacks by integrating AS…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: 5 pages, 4 figures, 5 tables, accepted to 2022 Interspeech as a conference paper

    Journal ref: Proc. Interspeech 2022

  11. arXiv:2206.13044  [pdf, other]

    eess.AS cs.SD

    Extended U-Net for Speaker Verification in Noisy Environments

    Authors: Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu

    Abstract: Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by blurring speech intelligibility. Various studies have used separate pretrained enhancement models as the front-end module of the SV system in noisy environments, and these methods effectively remove noises. However, the denoising process of independent enhancement models n…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 5 pages, 2 figures, 4 tables, accepted to 2022 Interspeech as a conference paper

  12. arXiv:2112.12343  [pdf, other]

    cs.SD eess.AS

    Graph attentive feature aggregation for text-independent speaker verification

    Authors: Hye-jin Shim, Jungwoo Heo, Jae-han Park, Ga-hui Lee, Ha-Jin Yu

    Abstract: The objective of this paper is to combine multiple frame-level features into a single utterance-level representation considering pairwise relationship. For this purpose, we propose a novel graph attentive feature aggregation module by interpreting each frame-level feature as a node of a graph. The inter-relationship between all possible pairs of features, typically exploited indirectly, can be dir…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 5 pages, 1 figure, 6 tables, submitted to ICASSP 2022

  13. arXiv:2112.07935  [pdf, other]

    eess.AS

    RawNeXt: Speaker verification system for variable-duration utterances with deep layer aggregation and extended dynamic scaling policies

    Authors: Ju-ho Kim, Hye-jin Shim, Jungwoo Heo, Ha-Jin Yu

    Abstract: Despite achieving satisfactory performance in speaker verification using deep neural networks, variable-duration utterances remain a challenge that threatens the robustness of systems. To deal with this issue, we propose a speaker verification system called RawNeXt that can handle input raw waveforms of arbitrary length by employing the following two components: (1) A deep layer aggregation strate…

    Submitted 27 June, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: 5 pages, 2 figures, 4 tables, accepted to 2022 ICASSP as a conference paper