
Showing 1–47 of 47 results for author: Yoon, S

  1. arXiv:2403.11578  [pdf, other]

    eess.AS

    AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

    Authors: SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regulariz…

    Submitted 18 March, 2024; originally announced March 2024.

  2. arXiv:2403.06940  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction

    Authors: Qing Xiao, Siyeop Yoon, Hui Ren, Matthew Tivnan, Lichao Sun, Quanzheng Li, Tianming Liu, Yu Zhang, Xiang Li

    Abstract: Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffe…

    Submitted 11 March, 2024; originally announced March 2024.

  3. arXiv:2403.06069  [pdf, other]

    eess.IV cs.CV cs.LG

    Implicit Image-to-Image Schrodinger Bridge for CT Super-Resolution and Denoising

    Authors: Yuang Wang, Siyeop Yoon, Pengfei Jin, Matthew Tivnan, Zhennong Chen, Rui Hu, Li Zhang, Zhiqiang Chen, Quanzheng Li, Dufan Wu

    Abstract: Conditional diffusion models have gained recognition for their effectiveness in image restoration tasks, yet their iterative denoising process, starting from Gaussian noise, often leads to slow inference speeds. As a promising alternative, the Image-to-Image Schrödinger Bridge (I2SB) initializes the generative process from corrupted images and integrates training techniques from conditional diffus…

    Submitted 9 March, 2024; originally announced March 2024.

  4. arXiv:2402.05706  [pdf, other]

    cs.CL cs.SD eess.AS

    Unified Speech-Text Pretraining for Spoken Dialog Modeling

    Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo

    Abstract: While recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation. This work proposes an extensive speech-text LLM framework, named the Unified Spoken Dialog Model (USDM), to generate coherent spoken responses with…

    Submitted 8 February, 2024; originally announced February 2024.

  5. arXiv:2312.09736  [pdf, other]

    cs.CL cs.SD eess.AS

    HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

    Authors: Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

    Abstract: Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts in developing VGD systems to improve the quality of their responses, existing systems are competent only to incorporate the information in the video and text and tend to struggle in extracting the necessary information f…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023, 14 pages, 13 figures

  6. arXiv:2312.05790  [pdf, other]

    cs.LG cs.AI eess.SP

    SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation

    Authors: Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo

    Abstract: Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency…

    Submitted 10 December, 2023; originally announced December 2023.

  7. arXiv:2310.10088  [pdf, other]

    eess.IV cs.CV cs.LG

    PUCA: Patch-Unshuffle and Channel Attention for Enhanced Self-Supervised Image Denoising

    Authors: Hyemi Jang, Junsung Park, Dahuin Jung, Jaihyun Lew, Ho Bae, Sungroh Yoon

    Abstract: Although supervised image denoising networks have shown remarkable performance on synthesized noisy images, they often fail in practice due to the difference between real and synthesized noise. Since clean-noisy image pairs from the real world are extremely costly to gather, self-supervised learning, which utilizes noisy input itself as a target, has been studied. To prevent a self-supervised deno…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  8. arXiv:2310.08598  [pdf, other]

    eess.IV cs.AI cs.CV

    Domain Generalization for Medical Image Analysis: A Survey

    Authors: Jee Seok Yoon, Kwanseok Oh, Yooseung Shin, Maciej A. Mazurowski, Heung-Il Suk

    Abstract: Medical image analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, deploying DL models for MedIA in real-world situations remains challenging due to their failure to generalize across the distributional gap bet…

    Submitted 15 February, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  9. Deep Video Inpainting Guided by Audio-Visual Self-Supervision

    Authors: Kyuyeon Kim, Junsik Jung, Woo Jae Kim, Sung-Eui Yoon

    Abstract: Humans can easily imagine a scene from auditory information based on their prior knowledge of audio-visual events. In this paper, we mimic this innate human ability in deep learning models to improve the quality of video inpainting. To implement the prior knowledge, we first train the audio-visual network, which learns the correspondence between auditory and visual information. Then, the audio-vis…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at ICASSP 2022

  10. arXiv:2308.08442  [pdf, other]

    cs.CL cs.SD eess.AS

    Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

    Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

    Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5 referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or parag…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023

  11. arXiv:2306.16083  [pdf, other]

    cs.SD eess.AS

    UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data

    Authors: Heeseung Kim, Sungwon Kim, Jiheum Yeom, Sungroh Yoon

    Abstract: We propose UnitSpeech, a speaker-adaptive speech synthesis method that fine-tunes a diffusion-based text-to-speech (TTS) model using minimal untranscribed data. To achieve this, we use the self-supervised unit representation as a pseudo transcript and integrate the unit encoder into the pre-trained TTS model. We train the unit encoder to provide speech content to the diffusion-based decoder and th…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: INTERSPEECH 2023, Oral

  12. arXiv:2305.16371  [pdf, other]

    cs.CL cs.SD eess.AS

    INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

    Authors: Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained based on self-supervised speech representation learning. However, these pre-trained speech models suffer from representational bias as they tend to better represent those prominent accents (i.e., native (L1) English accent) in the pre-training speech corpus than less represented…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: ACL2023

  13. arXiv:2302.04224  [pdf]

    eess.SP

    Data Poisoning Attacks on EEG Signal-based Risk Assessment Systems

    Authors: Zhibo Zhang, Sani Umar, Ahmed Y. Al Hammadi, Sangyoung Yoon, Ernesto Damiani, Chan Yeob Yeun

    Abstract: Industrial insider risk assessment using electroencephalogram (EEG) signals has consistently attracted a lot of research attention. However, EEG signal-based risk assessment systems, which could evaluate the emotional states of humans, have shown several vulnerabilities to data poison attacks. In this paper, from the attackers' perspective, data poison attacks involving label-flipping occurring in…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: 2nd International Conference on Business Analytics For Technology and Security (ICBATS)

  14. Explainable Data Poison Attacks on Human Emotion Evaluation Systems based on EEG Signals

    Authors: Zhibo Zhang, Sani Umar, Ahmed Y. Al Hammadi, Sangyoung Yoon, Ernesto Damiani, Claudio Agostino Ardagna, Nicola Bena, Chan Yeob Yeun

    Abstract: The major aim of this paper is to explain the data poisoning attacks using label-flipping during the training stage of the electroencephalogram (EEG) signal-based human emotion evaluation systems deploying Machine Learning models from the attackers' perspective. Human emotion evaluation using EEG signals has consistently attracted a lot of research attention. The identification of human emotional…

    Submitted 17 January, 2023; originally announced January 2023.

    Journal ref: IEEE Access 2023

  15. arXiv:2211.11381  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    LISA: Localized Image Stylization with Audio via Implicit Neural Representation

    Authors: Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

    Abstract: We present a novel framework, Localized Image Stylization with Audio (LISA) which performs audio-driven localized image stylization. Sound often provides information about the specific context of the scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a pa…

    Submitted 21 November, 2022; originally announced November 2022.

  16. arXiv:2210.11592  [pdf, other]

    cs.CR eess.SY

    New data poison attacks on machine learning classifiers for mobile exfiltration

    Authors: Miguel A. Ramirez, Sangyoung Yoon, Ernesto Damiani, Hussam Al Hamadi, Claudio Agostino Ardagna, Nicola Bena, Young-Ji Byon, Tae-Yeon Kim, Chung-Suk Cho, Chan Yeob Yeun

    Abstract: Most recent studies have shown several vulnerabilities to attacks with the potential to jeopardize the integrity of the model, opening in a few recent years a new window of opportunity in terms of cyber-security. The main interest of this paper is directed towards data poisoning attacks involving label-flipping, this kind of attacks occur during the training phase, being the aim of the attacker to…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2202.10276

  17. arXiv:2207.13223  [pdf, other]

    cs.LG eess.IV

    XADLiME: eXplainable Alzheimer's Disease Likelihood Map Estimation via Clinically-guided Prototype Learning

    Authors: Ahmad Wisnu Mulyadi, Wonsik Jung, Kwanseok Oh, Jee Seok Yoon, Heung-Il Suk

    Abstract: Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-le…

    Submitted 26 July, 2022; originally announced July 2022.

  18. Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text

    Authors: Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung

    Abstract: In this paper, we propose a novel speech emotion recognition model called Cross Attention Network (CAN) that uses aligned audio and text signals as inputs. It is inspired by the fact that humans recognize speech as a combination of simultaneously produced acoustic and textual signals. First, our method segments the audio and the underlying text signals into equal number of steps in an aligned way…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: 5 pages, accepted by INTERSPEECH 2020

    Journal ref: Proc. Interspeech 2020, 2717-2721

  19. arXiv:2206.07578   

    cs.CV cs.LG eess.IV

    E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations

    Authors: Jongwan Kim, Dongjin Lee, Byunggook Na, Seongsik Park, Jeonghee Jo, Sungroh Yoon

    Abstract: Event cameras respond to brightness changes in the scene asynchronously and independently for every pixel. Due to the properties, these cameras have distinct features: high dynamic range (HDR), high temporal resolution, and low power consumption. However, the results of event cameras should be processed into an alternative representation for computer vision tasks. Also, they are usually noisy and…

    Submitted 13 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: This submission has been withdrawn by arXiv administrators due to inappropriate text overlap with external sources. Additional information at https://doi.org/10.1109/CVPR52688.2022.01319

    Journal ref: The IEEE / CVF Computer Vision and Pattern Recognition Conference 2022

  20. arXiv:2206.04658  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    BigVGAN: A Universal Neural Vocoder with Large-Scale Training

    Authors: Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

    Abstract: Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments. In this work, we present BigVGAN, a universal vocoder that generalizes well for various out-of-distribution scenarios without fine-tun…

    Submitted 16 February, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: To appear at ICLR 2023. Listen to audio samples from BigVGAN at: https://bigvgan-demo.github.io/

  21. arXiv:2205.15370  [pdf, other]

    cs.SD cs.AI eess.AS

    Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data

    Authors: Sungwon Kim, Heeseung Kim, Sungroh Yoon

    Abstract: We propose Guided-TTS 2, a diffusion-based generative model for high-quality adaptive TTS using untranscribed data. Guided-TTS 2 combines a speaker-conditional diffusion model with a speaker-dependent phoneme classifier for adaptive text-to-speech. We train the speaker-conditional diffusion model on large-scale untranscribed datasets for a classifier-free guidance method and further fine-tune the…

    Submitted 30 May, 2022; originally announced May 2022.

  22. arXiv:2112.01535  [pdf, other]

    eess.IV cs.AI cs.LG

    Robust End-to-End Focal Liver Lesion Detection using Unregistered Multiphase Computed Tomography Images

    Authors: Sang-gil Lee, Eunji Kim, Jae Seok Bae, Jung Hoon Kim, Sungroh Yoon

    Abstract: The computer-aided diagnosis of focal liver lesions (FLLs) can help improve workflow and enable correct diagnoses; FLL detection is the first step in such a computer-aided diagnosis. Despite the recent success of deep-learning-based approaches in detecting FLLs, current methods are not sufficiently robust for assessing misaligned multiphase data. By introducing an attention-guided multiphase align…

    Submitted 16 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: IEEE TETCI. 14 pages, 8 figures, 5 tables

  23. arXiv:2112.00007  [pdf, other]

    cs.GR cs.CV cs.LG cs.SD eess.AS

    Sound-Guided Semantic Image Manipulation

    Authors: Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

    Abstract: The recent success of the generative model shows that leveraging the multi-modal embedding space can manipulate an image using text information. However, manipulating an image with other sources rather than text, such as sound, is not easy due to the dynamic characteristics of the sources. Especially, sound can convey vivid emotions and dynamic expressions of the real world. Here, we propose a fra…

    Submitted 30 November, 2021; originally announced December 2021.

  24. arXiv:2111.11755  [pdf, other]

    cs.SD cs.AI eess.AS

    Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance

    Authors: Heeseung Kim, Sungwon Kim, Sungroh Yoon

    Abstract: We propose Guided-TTS, a high-quality text-to-speech (TTS) model that does not require any transcript of target speaker using classifier guidance. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for classifier guidance. Our unconditional diffusion model learns to generate speech without any context from untranscribed speech data. For…

    Submitted 10 June, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: 15 pages, 5 figures, ICML'2022

  25. arXiv:2109.02342  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Automated Cardiac Resting Phase Detection Targeted on the Right Coronary Artery

    Authors: Seung Su Yoon, Elisabeth Preuhs, Michaela Schmidt, Christoph Forman, Teodora Chitiboi, Puneet Sharma, Juliano Lara Fernandes, Christoph Tillmanns, Jens Wetzl, Andreas Maier

    Abstract: Static cardiac imaging such as late gadolinium enhancement, mapping, or 3-D coronary angiography require prior information, e.g., the phase during a cardiac cycle with least motion, called resting phase (RP). The purpose of this work is to propose a fully automated framework that allows the detection of the right coronary artery (RCA) RP within CINE series. The proposed prototype system consists o…

    Submitted 31 January, 2023; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2023:001

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2023)

  26. arXiv:2108.02716  [pdf, other]

    cs.NI eess.SP

    Link Quality-Guaranteed Minimum-Cost Millimeter-Wave Base Station Deployment

    Authors: Miaomiao Dong, Taejoon Kim, Minsung Cho, Kangeun Lee, Sungrok Yoon

    Abstract: Today's growth in the volume of wireless devices coupled with the promise of supporting data-intensive 5G-&-beyond use cases is driving the industry to deploy more millimeter-wave (mmWave) base stations (BSs). Although mmWave cellular systems can carry a larger volume of traffic, dense deployment, in turn, increases the BS installation and maintenance cost, which has been largely ignored in their…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: 16 pages, submitted to IEEE Transactions on Wireless Communications

  27. arXiv:2106.06406  [pdf, other]

    stat.ML cs.LG cs.SD eess.AS

    PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior

    Authors: Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu

    Abstract: Denoising diffusion probabilistic models have been recently proposed to generate high-quality samples by estimating the gradient of the data density. The framework defines the prior noise as a standard Gaussian distribution, whereas the corresponding data distribution may be more complicated than the standard Gaussian distribution, which potentially introduces inefficiency in denoising the prior n…

    Submitted 20 February, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: ICLR 2022. 19 pages, 7 figures, 8 tables. Audio samples: https://speechresearch.github.io/priorgrad/

  28. arXiv:2105.03072  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement workshop (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  29. arXiv:2104.14730  [pdf, other]

    cs.CV eess.IV

    Perceptual Image Quality Assessment with Transformers

    Authors: Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, Junwoo Lee

    Abstract: In this paper, we propose an image quality transformer (IQT) that successfully applies a transformer architecture to a perceptual full-reference image quality assessment (IQA) task. Perceptual representation becomes more important in image quality assessment. In this context, we extract the perceptual feature representations from each of input images using a convolutional neural network (CNN) back…

    Submitted 4 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted to NTIRE workshop at CVPR 2021. 1st Place in NTIRE 2021 perceptual IQA challenge. https://github.com/manricheon/IQT

  30. arXiv:2101.04662  [pdf, other]

    eess.SY

    Output Regulation of Linear Aperiodic Sampled-Data Systems

    Authors: Himadri Basu, Francesco Ferrante, Se Young Yoon

    Abstract: This paper deals with the output regulation problem of a linear time-invariant system in the presence of sporadically available measurement streams. A regulator with a continuous intersample injection term is proposed, where the intersample injection is provided by a linear dynamical system and the state of which is reset with the arrival of every new measurement updates. The resulting system is a…

    Submitted 15 February, 2022; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: Accepted for presentation at the American Control Conference 2022

  31. arXiv:2010.14231  [pdf]

    eess.IV physics.med-ph

    Virtual Alignment Method and its application to the dental prostheses and diagnosis

    Authors: Kyungtaek Jun, Seokhwan Yoon, Jae-Hong Lim, SeungJoon Noh

    Abstract: The recent proposal of a new alignment solution for X-ray tomography, Virtual alignment method (VAM) allowed a more accurate method to remove the possible errors that limit the resolution and clarity of the reconstructed image. In the field of dentistry, the movement of patients during the scanning poses as one of the major factors hindering the final reconstructed image quality. Here, the patient…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: 21 Pages, 5 figures

  32. arXiv:2010.11457  [pdf, other]

    eess.AS cs.SD

    Momentum Contrast Speaker Representation Learning

    Authors: Jangho Lee, Jaihyun Koh, Sungroh Yoon

    Abstract: Unsupervised representation learning has shown remarkable achievement by reducing the performance gap with supervised feature learning, especially in the image domain. In this study, to extend the technique of unsupervised learning to the speech domain, we propose the Momentum Contrast for VoxCeleb (MoCoVox) as a form of learning mechanism. We pre-trained the MoCoVox on the VoxCeleb1 by implementi…

    Submitted 22 October, 2020; originally announced October 2020.

  33. arXiv:2005.11129  [pdf, other]

    eess.AS cs.SD

    Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

    Authors: Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon

    Abstract: Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been proposed to generate mel-spectrograms from text in parallel. Despite the advantage, the parallel TTS models cannot be trained without guidance from autoregressive TTS models as their external aligners. In this work, we propose Glow-TTS, a flow-based generative model for parallel TTS that does not require any external al…

    Submitted 22 October, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: Accepted by NeurIPS2020

  34. arXiv:2005.08374  [pdf, ps, other]

    eess.SP cs.LG

    Intelligent O-RAN for Beyond 5G and 6G Wireless Networks

    Authors: Solmaz Niknam, Abhishek Roy, Harpreet S. Dhillon, Sukhdeep Singh, Rahul Banerji, Jeffery H. Reed, Navrati Saxena, Seungil Yoon

    Abstract: Building on the principles of openness and intelligence, there has been a concerted global effort from the operators towards enhancing the radio access network (RAN) architecture. The objective is to build an operator-defined RAN architecture (and associated interfaces) on open hardware that provides intelligent radio control for beyond fifth generation (5G) as well as future sixth generation (6G)…

    Submitted 17 May, 2020; originally announced May 2020.

  35. arXiv:1910.11122  [pdf]

    cs.CV cs.LG eess.IV

    Peanut Maturity Classification using Hyperspectral Imagery

    Authors: Sheng Zou, Yu-Chien Tseng, Alina Zare, Diane Rowland, Barry Tillman, Seung-Chul Yoon

    Abstract: Seed maturity in peanut (Arachis hypogaea L.) determines economic return to a producer because of its impact on seed weight (yield), and critically influences seed vigor and other quality characteristics. During seed development, the inner mesocarp layer of the pericarp (hull) transitions in color from white to black as the seed matures. The maturity assessment process involves the removal of the…

    Submitted 24 October, 2019; v1 submitted 20 October, 2019; originally announced October 2019.

  36. arXiv:1910.04681  [pdf]

    physics.bio-ph eess.IV physics.optics

    Laser scanning reflection-matrix microscopy for label-free in vivo imaging of a mouse brain through an intact skull

    Authors: Seokchan Yoon, Hojun Lee, Jin Hee Hong, Yong-Sik Lim, Wonshik Choi

    Abstract: We present a laser scanning reflection-matrix microscopy combining the scanning of laser focus and the wide-field mapping of the electric field of the backscattered waves for eliminating higher-order aberrations even in the presence of strong multiple light scattering noise. Unlike conventional confocal laser scanning microscopy, we record the amplitude and phase maps of reflected waves from the s…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: 14 pages, 4 figures

  37. arXiv:1909.11915  [pdf]

    cs.CV cs.LG eess.IV

    Unsupervised Image Translation using Adversarial Networks for Improved Plant Disease Recognition

    Authors: Haseeb Nazki, Sook Yoon, Alvaro Fuentes, Dong Sun Park

    Abstract: Acquisition of data in task-specific applications of machine learning like plant disease recognition is a costly endeavor owing to the requirements of professional human diligence and time constraints. In this paper, we present a simple pipeline that uses GANs in an unsupervised image translation environment to improve learning with respect to the data distribution in a plant disease dataset, redu…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: 20 pages, 11 figures, 3 tables, article under review

  38. arXiv:1904.10788  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Speech Emotion Recognition Using Multi-hop Attention Mechanism

    Authors: Seunghyun Yoon, Seokhyun Byun, Subhadeep Dey, Kyomin Jung

    Abstract: In this paper, we are interested in exploiting textual and acoustic data of an utterance for the speech emotion classification task. The baseline approach models the information from audio and text independently using two deep neural networks (DNNs). The outputs from both the DNNs are then fused for classification. As opposed to using knowledge from both the modalities separately, we propose a fra…

    Submitted 9 May, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

    Comments: 5 pages, Accepted as a conference paper at ICASSP 2019 (oral presentation)

  39. arXiv:1902.09179  [pdf, other]

    cs.SD cs.RO eess.AS

    Robust Sound Source Localization considering Similarity of Back-Propagation Signals

    Authors: Inkyu An, Doheon Lee, Byeongho Jo, Jung-Woo Choi, Sung-Eui Yoon

    Abstract: We present a novel, robust sound source localization algorithm considering back-propagation signals. Sound propagation paths are estimated by generating direct and reflection acoustic rays based on ray tracing in a backward manner. We then compute the back-propagation signals by designing and using the impulse response of the backward sound propagation based on the acoustic ray paths. For identify…

    Submitted 25 February, 2019; originally announced February 2019.

  40. arXiv:1902.02455  [pdf, other]

    eess.AS cs.LG cs.SD

    End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification

    Authors: Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-jin Yu

    Abstract: In recent years, speaker verification has primarily performed using deep neural networks that are trained to output embeddings from input features such as spectrograms or Mel-filterbank energies. Studies that design various loss functions, including metric learning have been widely explored. In this study, we propose two end-to-end loss functions for speaker verification using the concept of speak…

    Submitted 17 July, 2019; v1 submitted 6 February, 2019; originally announced February 2019.

    Comments: 5 pages and 2 figures

  41. arXiv:1811.02155  [pdf, other]

    cs.SD eess.AS

    FloWaveNet : A Generative Flow for Raw Audio

    Authors: Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon

    Abstract: Most modern text-to-speech architectures use a WaveNet vocoder for synthesizing high-fidelity waveform audio, but there have been limitations, such as high inference time, in its practical application due to its ancestral sampling scheme. The recently suggested Parallel WaveNet and ClariNet have achieved real-time audio synthesis capability by incorporating inverse autoregressive flow for parallel…

    Submitted 20 May, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: 9 pages, ICML'2019

  42. arXiv:1809.07524  [pdf, other]

    cs.RO cs.SD eess.AS

    Diffraction-Aware Sound Localization for a Non-Line-of-Sight Source

    Authors: Inkyu An, Doheon Lee, Jung-woo Choi, Dinesh Manocha, Sung-eui Yoon

    Abstract: We present a novel sound localization algorithm for a non-line-of-sight (NLOS) sound source in indoor environments. Our approach exploits the diffraction properties of sound waves as they bend around a barrier or an obstacle in the scene. We combine a ray tracing based sound propagation algorithm with a Uniform Theory of Diffraction (UTD) model, which simulate bending effects by placing a virtual…

    Submitted 20 September, 2018; originally announced September 2018.

    Comments: Submitted to ICRA 2019. The working video is available at (https://www.youtube.com/watch?v=qqf7jM45bz4)

  43. arXiv:1808.09638  [pdf]

    eess.AS cs.LG cs.SD eess.SP stat.ML

    Replay spoofing detection system for automatic speaker verification using multi-task learning of noise classes

    Authors: Hye-jin Shim, Jee-weon Jung, Hee-Soo Heo, Sunghyun Yoon, Ha-jin Yu

    Abstract: In this paper, we propose a replay attack spoofing detection system for automatic speaker verification using multitask learning of noise classes. We define the noise that is caused by the replay attack as replay noise. We explore the effectiveness of training a deep neural network simultaneously for replay attack spoofing detection and replay noise classification. The multi-task learning includes…

    Submitted 25 October, 2018; v1 submitted 29 August, 2018; originally announced August 2018.

    Comments: 5 pages, accepted by Technologies and Applications of Artificial Intelligence (TAAI)

  44. arXiv:1711.07791  [pdf, other]

    cs.SD cs.RO eess.AS

    Reflection-Aware Sound Source Localization

    Authors: Inkyu An, Myungbae Son, Dinesh Manocha, Sung-eui Yoon

    Abstract: We present a novel, reflection-aware method for 3D sound localization in indoor environments. Unlike prior approaches, which are mainly based on continuous sound signals from a stationary source, our formulation is designed to localize the position instantaneously from signals within a single frame. We consider direct sound and indirect sound signals that reach the microphones after reflecting off…

    Submitted 21 November, 2017; originally announced November 2017.

    Comments: Submitted to ICRA 2018. The working video is available at (https://youtu.be/TkQ36lMEC-M)

  45. arXiv:1710.11418  [pdf, other]

    cs.SD eess.AS

    Polyphonic Music Generation with Sequence Generative Adversarial Networks

    Authors: Sang-gil Lee, Uiwon Hwang, Seonwoo Min, Sungroh Yoon

    Abstract: We propose an application of sequence generative adversarial networks (SeqGAN), which are generative adversarial networks for discrete sequence generation, for creating polyphonic musical sequences. Instead of a monophonic melody generation suggested in the original work, we present an efficient representation of a polyphony MIDI file that simultaneously captures chords and melodies with dynamic t…

    Submitted 2 July, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

    Comments: 8 pages, 3 figures, 3 tables

  46. arXiv:1710.02928  [pdf, ps, other]

    eess.SP

    Range-Spread Targets Detection in Unknown Doppler Shift via Semi-Definite Programming

    Authors: Mai. P. T. Nguyen, I. Song, S. Lee, S. Yoon

    Abstract: Based on the technique of generalized likelihood ratio test, we address detection schemes for Doppler-shifted range-spread targets in Gaussian noise. First, a detection scheme is derived by solving the maximization associated with the estimation of unknown Doppler frequency with semi-definite programming. To lower the computational complexity of the detector, we then consider a simplification of t…

    Submitted 8 October, 2017; originally announced October 2017.

    Comments: First author is Mai P. T. Nguyen

  47. arXiv:1510.03062  [pdf]

    eess.SY

    GPS Receiver with Enhanced User Positioning Time

    Authors: Seung-Hyun Yoon, Ji-Woon Jung, Su-Bong Kim, Hyun-Chang Shin, Jae-Hyang Lee, Kyu-Yun Lee, Hyo-Sun Shim

    Abstract: This paper introduces a Global Positioning System (GPS) Receiver that locates user's position instantly. Recently, many mobile devices require location information to add user position into their contents, and some applications require quick positioning when the device is initially switched on. In order to reduce the time to fix user's position, we propose the Instant-On GPS receiver system which…

    Submitted 11 October, 2015; originally announced October 2015.

    Comments: Proc. International Symposium on GPS/GNSS 2008