Showing 1–22 of 22 results for author: Jung, Y

Searching in archive eess.
  1. arXiv:2406.07923  [pdf, other]

    cs.SD cs.AI eess.AS

    CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting

    Authors: Sichen **, Youngmoon Jung, Seung** Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho

    Abstract: This paper introduces a novel approach for streaming open-vocabulary keyword spotting (KWS) with text-based keyword enrollment. For every input frame, the proposed method finds the optimal alignment ending at the frame using connectionist temporal classification (CTC) and aggregates the frame-level acoustic embedding (AE) to obtain higher-level (i.e., character, word, or phrase) AE that aligns with…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2406.05314  [pdf, other]

    eess.AS cs.AI eess.SP

    Relational Proxy Loss for Audio-Text based Keyword Spotting

    Authors: Youngmoon Jung, Seung** Lee, Joon-Young Yang, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho

    Abstract: In recent years, there has been an increasing focus on user convenience, leading to increased interest in text-based keyword enrollment systems for keyword spotting (KWS). Since the system utilizes text input during the enrollment phase and audio input during actual usage, we call this task audio-text based KWS. To enable this task, both acoustic and text encoders are typically trained using deep…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, Accepted by Interspeech 2024

  3. arXiv:2310.05538  [pdf, other]

    eess.IV cs.CV cs.LG

    M3FPolypSegNet: Segmentation Network with Multi-frequency Feature Fusion for Polyp Localization in Colonoscopy Images

    Authors: Ju-Hyeon Nam, Seo-Hyeong Park, Nur Suriza Syazwany, Yerim Jung, Yu-Han Im, Sang-Chul Lee

    Abstract: Polyp segmentation is crucial for preventing colorectal cancer, a common type of cancer. Deep learning has been used to segment polyps automatically, which reduces the risk of misdiagnosis. Localizing small polyps in colonoscopy images is challenging because of its complex characteristics, such as color, occlusion, and various shapes of polyps. To address this challenge, a novel frequency-based ful…

    Submitted 9 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: 5 pages. 2023 IEEE International Conference on Image Processing (ICIP). IEEE, 2023

    MSC Class: 92C55

  4. arXiv:2309.07152  [pdf]

    eess.SP physics.med-ph

    Novel Smart N95 Filtering Facepiece Respirator with Real-time Adaptive Fit Functionality and Wireless Humidity Monitoring for Enhanced Wearable Comfort

    Authors: Kangkyu Kwon, Yoon Jae Lee, Yeongju Jung, Ira Soltis, Chanyeong Choi, Yewon Na, Lissette Romero, Myung Chul Kim, Nathan Rodeheaver, Hodam Kim, Michael S. Lloyd, Ziqing Zhuang, William King, Susan Xu, Seung-Hwan Ko, **woo Lee, Woon-Hong Yeo

    Abstract: The widespread emergence of the COVID-19 pandemic has transformed our lifestyle, and facial respirators have become an essential part of daily life. Nevertheless, the current respirators possess several limitations such as poor respirator fit because they are incapable of covering diverse human facial sizes and shapes, potentially diminishing the effect of wearing respirators. In addition, the cur…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 20 pages, 5 figures, 1 table, submitted for possible publication

    MSC Class: 92C55

  5. arXiv:2211.15950  [pdf, other]

    eess.IV cs.CV

    Enhanced artificial intelligence-based diagnosis using CBCT with internal denoising: Clinical validation for discrimination of fungal ball, sinusitis, and normal cases in the maxillary sinus

    Authors: Kyungsu Kim, Chae Yeon Lim, Joong Bo Shin, Myung ** Chung, Yong Gi Jung

    Abstract: The cone-beam computed tomography (CBCT) provides 3D volumetric imaging of a target with low radiation dose and cost compared with conventional computed tomography, and it is widely used in the detection of paranasal sinus disease. However, it lacks the sensitivity to detect soft tissue lesions owing to reconstruction constraints. Consequently, only physicians with expertise in CBCT reading can di…

    Submitted 29 November, 2022; originally announced November 2022.

  6. arXiv:2210.11153  [pdf, other]

    eess.IV cs.CV

    Reversed Image Signal Processing and RAW Reconstruction. AIM 2022 Challenge Report

    Authors: Marcos V. Conde, Radu Timofte, Yibin Huang, **gyang Peng, Chang Chen, Cheng Li, Eduardo Pérez-Pellitero, Fenglong Song, Furui Bai, Shuai Liu, Chaoyu Feng, Xiaotao Wang, Lei Lei, Yu Zhu, Chenghua Li, Yingying Jiang, Yong A, Peisong Wang, Cong Leng, Jian Cheng, Xiaoyu Liu, Zhicun Yin, Zhilu Zhang, Junyi Li, Ming Liu, et al. (18 additional authors not shown)

    Abstract: Cameras capture sensor RAW images and transform them into pleasant RGB images, suitable for the human eyes, using their integrated Image Signal Processor (ISP). Numerous low-level vision tasks operate in the RAW domain (e.g. image denoising, white balance) due to its linear relationship with the scene irradiance, wide-range of information at 12bits, and sensor designs. Despite this, RAW image data…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: ECCV 2022 Advances in Image Manipulation (AIM) workshop

  7. arXiv:2207.05176  [pdf, other]

    cs.CV cs.LG eess.IV

    Denoising single images by feature ensemble revisited

    Authors: Masud An Nur Islam Fahim, Nazmus Saqib, Shafkat Khan Siam, Ho Yub Jung

    Abstract: Image denoising is still a challenging issue in many computer vision sub-domains. Recent studies show that significant improvements are made possible in a supervised setting. However, few challenges, such as spatial fidelity and cartoon-like smoothing remain unresolved or decisively overlooked. Our study proposes a simple yet efficient architecture for the denoising problem that addresses the afor…

    Submitted 11 July, 2022; originally announced July 2022.

  8. arXiv:2207.00555  [pdf, other]

    eess.AS cs.CL cs.LG

    FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

    Authors: Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoirin Kim

    Abstract: Large-scale speech self-supervised learning (SSL) has emerged to the main field of speech processing, however, the problem of computational cost arising from its vast size makes a high entry barrier to academia. In addition, existing distillation techniques of speech SSL models compress the model by reducing layers, which induces performance degradation in linguistic pattern recognition tasks such…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted to Interspeech 2022

  9. arXiv:2110.03165  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Offline RL With Resource Constrained Online Deployment

    Authors: Jayanth Reddy Regatti, Aniket Anand Deshmukh, Frank Cheng, Young Hun Jung, Abhishek Gupta, Urun Dogan

    Abstract: Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible. As a natural consequence of these harsh conditions, an agent may lack the resources to fully observe the online environment before taking an action. We dub this situation the resource-constrained setting. This leads to situations where the offline dataset (ava…

    Submitted 7 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: Added experiments on discrete control and real world datasets along with more analyses on continuous control tasks

  10. arXiv:2011.01174  [pdf, other]

    eess.AS cs.LG cs.SD

    Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech

    Authors: Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim

    Abstract: Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation. Therefore, we propose a novel method to improve speech quality by training a TTS model under the supervision of perceptual loss, which measures the…

    Submitted 25 May, 2022; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: 9 pages, 5 figures, 4 tables

    Journal ref: IEEE Access, vol. 10, pp. 52621 - 52629, 2022

  11. arXiv:2010.02477  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments

    Authors: Youngmoon Jung, Yeunju Choi, Hyungjun Lim, Hoirin Kim

    Abstract: Speaker verification (SV) has recently attracted considerable research interest due to the growing popularity of virtual assistants. At the same time, there is an increasing requirement for an SV system: it should be robust to short speech segments, especially in noisy and reverberant environments. In this paper, we consider one more important requirement for practical applications: the system sho…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 19 pages, 10 figures, 13 tables

    Journal ref: in IEEE Access, vol. 8, pp. 175448-175466, 2020

  12. arXiv:2008.11920  [pdf, other]

    eess.AS

    Dynamic Noise Embedding: Noise Aware Training and Adaptation for Speech Enhancement

    Authors: Joohyung Lee, Youngmoon Jung, Myunghun Jung, Hoirin Kim

    Abstract: Estimating noise information exactly is crucial for noise aware training in speech applications including speech enhancement (SE) which is our focus in this paper. To estimate noise-only frames, we employ voice activity detection (VAD) to detect non-speech frames by applying optimal threshold on speech posterior. Here, the non-speech frames can be regarded as noise-only frames in noisy signal. The…

    Submitted 3 December, 2020; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: Accepted to APSIPA ASC 2020

  13. arXiv:2008.03710  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling

    Authors: Yeunju Choi, Youngmoon Jung, Hoirin Kim

    Abstract: While deep learning has made impressive progress in speech synthesis and voice conversion, the assessment of the synthesized speech is still carried out by human participants. Several recent papers have proposed deep-learning-based assessment models and shown the potential to automate the speech quality assessment. To improve the previously proposed assessment model, MOSNet, we propose three model…

    Submitted 9 August, 2020; originally announced August 2020.

    Comments: 5 pages, 1 figure, accepted to Interspeech 2020

    Journal ref: Proc. Interspeech 2020, pp. 1743-1747

  14. arXiv:2007.08267  [pdf, other]

    eess.AS cs.LG cs.SD

    Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning With Spoofing Detection and Spoofing Type Classification

    Authors: Yeunju Choi, Youngmoon Jung, Hoirin Kim

    Abstract: Several studies have proposed deep-learning-based models to predict the mean opinion score (MOS) of synthesized speech, showing the possibility of replacing human raters. However, inter- and intra-rater variability in MOSs makes it hard to ensure the high performance of the models. In this paper, we propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model usi…

    Submitted 2 December, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: 8 pages, 5 figures, accepted to SLT 2021

  15. arXiv:2005.03867  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

    Authors: Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim

    Abstract: Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully utilize the interrelated domain information. The multi-task network tightly combines sub-networks aiming at performance improvement in challengin…

    Submitted 7 August, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

    Comments: Accepted to Interspeech 2020

  16. arXiv:2004.03194  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

    Authors: Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim

    Abstract: Currently, the most widely used approach for speaker verification is the deep speaker embedding learning. In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a speaker feature extractor. Multi-scale aggregation (MSA), which utilizes multi-scale features from different layers of the feature extractor, has recently been in…

    Submitted 6 August, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: Accepted to Interspeech 2020

    Journal ref: Proc. Interspeech 2020, pp. 1501-1505

  17. arXiv:2004.02863  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

    Authors: Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim

    Abstract: In practical settings, a speaker recognition system needs to identify a speaker given a short utterance, while the enrollment utterance may be relatively long. However, existing speaker recognition models perform poorly with such short utterances. To solve this problem, we introduce a meta-learning framework for imbalance length pairs. Specifically, we use a Prototypical Networks and train it with…

    Submitted 10 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to Interspeech 2020. The codes are available at https://github.com/seongmin-kye/meta-SR

  18. arXiv:2003.12266  [pdf, other]

    eess.AS

    Dual Attention in Time and Frequency Domain for Voice Activity Detection

    Authors: Joohyung Lee, Youngmoon Jung, Hoirin Kim

    Abstract: Voice activity detection (VAD) is a challenging task in low signal-to-noise ratio (SNR) environment, especially in non-stationary noise. To deal with this issue, we propose a novel attention module that can be integrated in Long Short-Term Memory (LSTM). Our proposed attention module refines each LSTM layer's hidden states so as to make it possible to adaptively focus on both time and frequency do…

    Submitted 25 August, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

    Comments: Accepted to Interspeech 2020

  19. arXiv:1910.00341  [pdf, other]

    eess.AS cs.IR cs.LG cs.SD stat.ML

    Additional Shared Decoder on Siamese Multi-view Encoders for Learning Acoustic Word Embeddings

    Authors: Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim

    Abstract: Acoustic word embeddings --- fixed-dimensional vector representations of arbitrary-length words --- have attracted increasing interest in query-by-example spoken term detection. Recently, on the fact that the orthography of text labels partly reflects the phonetic similarity between the words' pronunciation, a multi-view approach has been introduced that jointly learns acoustic and text embeddings…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted at 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)

  20. arXiv:1909.11886  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

    Authors: Youngmoon Jung, Yeunju Choi, Hoirin Kim

    Abstract: Voice activity detection (VAD), which classifies frames as speech or non-speech, is an important module in many speech applications including speaker verification. In this paper, we propose a novel method, called self-adaptive soft VAD, to incorporate a deep neural network (DNN)-based VAD into a deep speaker embedding system. The proposed method is a combination of the following two approaches. Th…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: Accepted at 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)

    Journal ref: Proc. of ASRU 2019, pp. 365-372

  21. arXiv:1906.08333  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification

    Authors: Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim

    Abstract: In this paper, we propose a new pooling method called spatial pyramid encoding (SPE) to generate speaker embeddings for text-independent speaker verification. We first partition the output feature maps from a deep residual network (ResNet) into increasingly fine sub-regions and extract speaker embeddings from each sub-region through a learnable dictionary encoding layer. These embeddings are conca…

    Submitted 19 June, 2019; originally announced June 2019.

    Comments: 5 pages, 2 figures, Interspeech 2019

    Journal ref: Proc. of Interspeech 2019, 2019, pp. 4030-4034

  22. arXiv:1811.02736  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD eess.SP

    Learning acoustic word embeddings with phonetically associated triplet network

    Authors: Hyungjun Lim, Younggwan Kim, Youngmoon Jung, Myunghun Jung, Hoirin Kim

    Abstract: Previous researches on acoustic word embeddings used in query-by-example spoken term detection have shown remarkable performance improvements when using a triplet network. However, the triplet network is trained using only a limited information about acoustic similarity between words. In this paper, we propose a novel architecture, phonetically associated triplet network (PATN), which aims at incr…

    Submitted 27 November, 2018; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: 5 pages, 4 figures, submitted to ICASSP 2019