Showing 1–50 of 65 results for author: Kim, C

Searching in archive eess.
  1. arXiv:2406.06650  [pdf, other]

    eess.IV cs.CV

    Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images

    Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

    Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  2. arXiv:2405.18012  [pdf, other]

    cs.CV eess.IV

    Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

    Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim

    Abstract: Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2405.01591  [pdf, other]

    cs.CL cs.AI eess.IV

    Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model

    Authors: Seonhee Cho, Choonghan Kim, Jiho Lee, Chetan Chilkunda, Sujin Choi, Joo Heung Yoon

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: Under review

  4. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  5. arXiv:2403.19425  [pdf, ps, other]

    eess.IV cs.CV

    A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge

    Authors: Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel, et al. (33 additional authors not shown)

    Abstract: Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemi…

    Submitted 3 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  6. arXiv:2403.11578  [pdf, other]

    eess.AS

    AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

    Authors: SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regulariz…

    Submitted 18 March, 2024; originally announced March 2024.

  7. arXiv:2401.10465  [pdf, other]

    cs.CL cs.SD eess.AS

    Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech

    Authors: Abhinav Garg, Jiyeon Kim, Sushil Khyalia, Chanwoo Kim, Dhananjaya Gowda

    Abstract: Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality Text-to-Speech (TTS) system. Most of the current G2P systems rely on carefully hand-crafted lexicons developed by experts. This poses a two-fold problem. Firstly, the lexicons are generated using a fixed phoneme set, usually, ARPABET or IPA, which might not be the most optimal way to represent phonemes for all languag…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  8. arXiv:2312.09842  [pdf, ps, other]

    cs.SD eess.AS

    On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition

    Authors: Nagaraj Adiga, Jinhwan Park, Chintigari Shiva Kumar, Shatrughan Singh, Kyungmin Lee, Chanwoo Kim, Dhananjaya Gowda

    Abstract: Recently, the cascaded two-pass architecture has emerged as a strong contender for on-device automatic speech recognition (ASR). A cascade of causal and shallow non-causal encoders coupled with a shared decoder enables operation in both streaming and look-ahead modes. In this paper, we propose a shallow cascaded model by combining various model compression techniques such as knowledge distillation,…

    Submitted 15 December, 2023; originally announced December 2023.

  9. arXiv:2311.05878  [pdf]

    cs.CV eess.IV

    Central Angle Optimization for 360-degree Holographic 3D Content

    Authors: Hakdong Kim, Minsung Yoon, Cheongwon Kim

    Abstract: In this study, we propose a method to find an optimal central angle in deep learning-based depth map estimation used to produce realistic holographic content. The acquisition of RGB-depth map images as detailed as possible must be performed to generate holograms of high quality, despite the high computational cost. Therefore, we introduce a novel pipeline designed to analyze various values of cent…

    Submitted 10 November, 2023; originally announced November 2023.

  10. arXiv:2310.07981  [pdf, other]

    cs.LG cs.RO eess.SY

    Reinforcement Learning of Display Transfer Robots in Glass Flow Control Systems: A Physical Simulation-Based Approach

    Authors: Hwajong Lee, Chan Kim, Seong-Woo Kim

    Abstract: A flow control system is a critical concept for increasing the production capacity of manufacturing systems. To solve the scheduling optimization problem related to the flow control with the aim of improving productivity, existing methods depend on a heuristic design by domain human experts. Therefore, the methods require correction, monitoring, and verification by using real equipment. As system…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 10 pages, 17 figures

  11. arXiv:2310.03538  [pdf, other]

    eess.AS

    Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis

    Authors: Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim

    Abstract: Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems by enlarging the training data through crowd-sourcing or augmenting existing speech data. However, the use of low-quality data has led to a decline in the overall system performance. To avoid such degradation, instead of directly augmenting the input data, we propose a latent filling (LF) method that adopts s…

    Submitted 22 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to ICASSP 2024

  12. arXiv:2310.01413  [pdf]

    eess.IV cs.AI cs.CV

    A multi-institutional pediatric dataset of clinical radiology MRIs by the Children's Brain Tumor Network

    Authors: Ariana M. Familiar, Anahita Fathi Kazerooni, Hannah Anderson, Aliaksandr Lubneuski, Karthik Viswanathan, Rocky Breslow, Nastaran Khalili, Sina Bagheri, Debanjan Haldar, Meen Chul Kim, Sherjeel Arif, Rachel Madhogarhia, Thinh Q. Nguyen, Elizabeth A. Frenkel, Zeinab Helili, Jessica Harrison, Keyvan Farahani, Marius George Linguraru, Ulas Bagci, Yury Velichko, Jeffrey Stevens, Sarah Leary, Robert M. Lober, Stephani Campion, Amy A. Smith, et al. (15 additional authors not shown)

    Abstract: Pediatric brain and spinal cancers remain the leading cause of cancer-related death in children. Advancements in clinical decision-support in pediatric neuro-oncology utilizing the wealth of radiology imaging data collected through standard care, however, have significantly lagged other domains. Such data is ripe for use with predictive analytics such as artificial intelligence (AI) methods, which…

    Submitted 2 October, 2023; originally announced October 2023.

  13. arXiv:2309.14967  [pdf, other]

    cs.CV eess.IV

    A novel approach for holographic 3D content generation without depth map

    Authors: Hakdong Kim, Minkyu Jee, Yurim Lee, Kyudam Choi, MinSung Yoon, Cheongwon Kim

    Abstract: In preparation for observing holographic 3D content, acquiring a set of RGB color and depth map images per scene is necessary to generate computer-generated holograms (CGHs) when using the fast Fourier transform (FFT) algorithm. However, in real-world situations, these paired formats of RGB color and depth map images are not always fully available. We propose a deep learning-based method to synthe…

    Submitted 26 September, 2023; originally announced September 2023.

  14. arXiv:2309.07152  [pdf]

    eess.SP physics.med-ph

    Novel Smart N95 Filtering Facepiece Respirator with Real-time Adaptive Fit Functionality and Wireless Humidity Monitoring for Enhanced Wearable Comfort

    Authors: Kangkyu Kwon, Yoon Jae Lee, Yeongju Jung, Ira Soltis, Chanyeong Choi, Yewon Na, Lissette Romero, Myung Chul Kim, Nathan Rodeheaver, Hodam Kim, Michael S. Lloyd, Ziqing Zhuang, William King, Susan Xu, Seung-Hwan Ko, Jinwoo Lee, Woon-Hong Yeo

    Abstract: The widespread emergence of the COVID-19 pandemic has transformed our lifestyle, and facial respirators have become an essential part of daily life. Nevertheless, the current respirators possess several limitations such as poor respirator fit because they are incapable of covering diverse human facial sizes and shapes, potentially diminishing the effect of wearing respirators. In addition, the cur…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 20 pages, 5 figures, 1 table, submitted for possible publication

    MSC Class: 92C55

  15. arXiv:2309.01950  [pdf, other]

    cs.CV cs.AI cs.LG cs.SD eess.AS

    RADIO: Reference-Agnostic Dubbing Video Synthesis

    Authors: Dongyeun Lee, Chaewon Kim, Sangjoon Yu, Jaejun Yoo, Gyeong-Moon Park

    Abstract: One of the most challenging problems in audio-driven talking head generation is achieving high-fidelity detail while ensuring precise synchronization. Given only a single reference image, extracting meaningful identity attributes becomes even more challenging, often causing the network to mirror the facial and lip structures too closely. To address these issues, we introduce RADIO, a framework eng…

    Submitted 6 November, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted by WACV 2024

  16. Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

    Authors: Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

    Abstract: For personalized speech generation, a neural text-to-speech (TTS) model must be successfully implemented with limited data from a target speaker. To this end, the baseline TTS model needs to be amply generalized to out-of-domain data (i.e., target speaker's speech). However, approaches to address this out-of-domain generalization problem in TTS have yet to be thoroughly studied. In this work, we p…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023

    Journal ref: Proc. INTERSPEECH 2023, 4299-4303

  17. arXiv:2308.08442  [pdf, other]

    cs.CL cs.SD eess.AS

    Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

    Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

    Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5, referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or parag…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023
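As a rough illustration of the byte-level input representation this abstract mentions, the sketch below turns text into a list of UTF-8 byte values and back. It is not the ByT5 tokenizer: the function names and the `offset` parameter (a hypothetical stand-in for IDs a real vocabulary might reserve for special tokens) are assumptions made for this example.

```python
def to_byte_ids(text, offset=0):
    """Encode text as a list of UTF-8 byte values, shifted by a hypothetical offset."""
    return [b + offset for b in text.encode("utf-8")]


def from_byte_ids(ids, offset=0):
    """Invert to_byte_ids: undo the offset and decode the bytes as UTF-8."""
    return bytes(i - offset for i in ids).decode("utf-8")


# A non-ASCII character such as "é" occupies two bytes in UTF-8,
# so the ID sequence is longer than the character sequence.
ids = to_byte_ids("né")
round_trip = from_byte_ids(ids)
```

Because every string maps to bytes, such a representation needs no language-specific vocabulary, at the cost of longer input sequences for non-ASCII text.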

  18. arXiv:2307.07409  [pdf, other]

    cs.CL cs.AI eess.IV

    KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization

    Authors: Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang

    Abstract: In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Published at BioNLP workshop @ ACL 2023

  19. arXiv:2307.01722  [pdf]

    eess.SY

    Data Transmission and Communication via Electrolytic Flow Channel

    Authors: Khoi Ly, Chong-Chan Kim, Robert Shepherd

    Abstract: As an alternative approach to ionic data transmission with hydrogel as substrate, this work explores the possible applications of liquid electrolyte filling cavity of a stretchable, flexible elastomeric tubing, which is the primary ingredient used in redox flow battery systems. While hydrogel-based ionic impedance characterization and its data communication capability have been well studied, the m…

    Submitted 4 July, 2023; originally announced July 2023.

  20. arXiv:2306.01981  [pdf, other]

    eess.AS cs.AI cs.LG

    SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization

    Authors: Changhun Kim, Joonhyung Park, Hajin Shim, Eunho Yang

    Abstract: Automatic speech recognition (ASR) models are frequently exposed to data distribution shifts in many real-world scenarios, leading to erroneous predictions. To tackle this issue, an existing test-time adaptation (TTA) method has recently been proposed to adapt the pre-trained ASR model on unlabeled test instances without source data. Despite decent performance gain, this work relies solely on naiv…

    Submitted 21 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: INTERSPEECH 2023 Oral Presentation; Code is available at https://github.com/drumpt/SGEM

  21. arXiv:2305.11920  [pdf, other]

    eess.IV physics.optics

    Megahertz X-ray Multi-projection imaging

    Authors: Pablo Villanueva-Perez, Valerio Bellucci, Yuhe Zhang, Sarlota Birnsteinova, Rita Graceffa, Luigi Adriano, Eleni Myrto Asimakopoulou, Ilia Petrov, Zisheng Yao, Marco Romagnoni, Andrea Mazzolari, Romain Letrun, Chan Kim, Jayanath C. P. Koliyadu, Carsten Deiter, Richard Bean, Gabriele Giovanetti, Luca Gelisio, Tobias Ritschel, Adrian Mancuso, Henry N. Chapman, Alke Meents, Tokushi Sato, Patrik Vagovic

    Abstract: X-ray time-resolved tomography is one of the most popular X-ray techniques to probe dynamics in three dimensions (3D). Recent developments in time-resolved tomography opened the possibility of recording kilohertz-rate 3D movies. However, tomography requires rotating the sample with respect to the X-ray beam, which prevents characterization of faster structural dynamics. Here, we present megahertz…

    Submitted 19 May, 2023; originally announced May 2023.

  22. arXiv:2304.14496  [pdf, ps, other]

    physics.ins-det cs.LG eess.SP nucl-ex

    Restoring Original Signal From Pile-up Signal using Deep Learning

    Authors: C. H. Kim, S. Ahn, K. Y. Chae, J. Hooker, G. V. Rogachev

    Abstract: Pile-up signals are frequently produced in experimental physics. They create inaccurate physics data with high uncertainty and cause various problems. Therefore, the correction to pile-up signals is crucially required. In this study, we implemented a deep learning method to restore the original signals from the pile-up signals. We showed that a deep learning model could accurately reconstruct the…

    Submitted 24 April, 2023; originally announced April 2023.

  23. arXiv:2303.08670  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video

    Authors: Minsu Kim, Chae Won Kim, Yong Man Ro

    Abstract: Forced alignment refers to a technology that time-aligns a given transcription with a corresponding speech. However, as the forced alignment technologies have developed using speech audio, they might fail in alignment when the input speech audio is noise-corrupted or is not accessible. We focus on the fact that there is another component from which the speech can be inferred, the speech video (i.e., talking…

    Submitted 26 February, 2023; originally announced March 2023.

    Comments: Accepted in AAAI2023

  24. arXiv:2303.05715  [pdf, other]

    eess.IV cs.CV

    Context-Based Trit-Plane Coding for Progressive Image Compression

    Authors: Seungmin Jeon, Kwang Pyo Choi, Youngo Park, Chang-Su Kim

    Abstract: Trit-plane coding enables deep progressive image compression, but it cannot use autoregressive context models. In this paper, we propose the context-based trit-plane coding (CTC) algorithm to achieve progressive compression more compactly. First, we develop the context-based rate reduction module to estimate trit probabilities of latent elements accurately and thus encode the trit-planes compactly…

    Submitted 13 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  25. arXiv:2301.08078  [pdf, other]

    cs.RO eess.SY

    Stable Contact Guaranteeing Motion/Force Control for an Aerial Manipulator on an Arbitrarily Tilted Surface

    Authors: Jeonghyun Byun, Byeongjun Kim, Changhyeon Kim, Donggeon David Oh, H. Jin Kim

    Abstract: This study aims to design a motion/force controller for an aerial manipulator which guarantees the tracking of time-varying motion/force trajectories as well as the stability during the transition between free and contact motions. To this end, we model the force exerted on the end-effector as the Kelvin-Voigt linear model and estimate its parameters with a recursive least-squares estimator. Then, the…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: to be presented at the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, United Kingdom, 2023
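The parameter-estimation step this abstract describes can be sketched generically. The example below assumes the standard Kelvin-Voigt form F = k·x + b·ẋ (spring and damper in parallel) and applies a textbook recursive least-squares update to recover k and b from simulated, noise-free samples. The function name, the true values (200 and 5), the excitation signal, and the initial covariance are all assumptions for illustration; this is not the authors' estimator or controller.

```python
import math


def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step for y ~ phi . theta with two parameters.

    theta: current estimate [k, b]; P: 2x2 covariance; phi: regressor [x, x_dot];
    lam: forgetting factor (1.0 means no forgetting).
    """
    # P @ phi
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    gain = [Pphi[0] / denom, Pphi[1] / denom]
    # Prediction error with the current estimate
    err = y - (phi[0] * theta[0] + phi[1] * theta[1])
    theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
    # Covariance update; phi^T P equals (P phi)^T because P is symmetric
    P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P


# Hypothetical ground truth used only to generate test data
k_true, b_true = 200.0, 5.0
theta = [0.0, 0.0]
P = [[1e8, 0.0], [0.0, 1e8]]  # large initial covariance: little trust in theta
for i in range(200):
    t = 0.05 * i
    x, v = 0.01 * math.sin(t), 0.01 * math.cos(t)  # displacement and velocity
    force = k_true * x + b_true * v                # Kelvin-Voigt contact force
    theta, P = rls_update(theta, P, [x, v], force)
k_est, b_est = theta
```

With noise-free data the estimates settle close to the values used to generate the samples; with real sensor noise a forgetting factor below 1 is a common way to track slowly changing contact properties.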

  26. arXiv:2212.14149  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Macro-block dropout for improved regularization in training end-to-end speech recognition models

    Authors: Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung

    Abstract: This paper proposes a new regularization algorithm referred to as macro-block dropout. The overfitting issue has been a difficult problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization by preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the…

    Submitted 28 December, 2022; originally announced December 2022.

    Comments: Accepted for presentation at The 2022 IEEE Spoken Language Technology Workshop (SLT 2022)

  27. arXiv:2211.11381  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    LISA: Localized Image Stylization with Audio via Implicit Neural Representation

    Authors: Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

    Abstract: We present a novel framework, Localized Image Stylization with Audio (LISA) which performs audio-driven localized image stylization. Sound often provides information about the specific context of the scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a pa…

    Submitted 21 November, 2022; originally announced November 2022.

  28. arXiv:2211.03078  [pdf, other]

    eess.AS cs.SD

    An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

    Authors: Jihwan Lee, Jae-Sung Bae, Seongkyu Mun, Heejin Choi, Joun Yeop Lee, Hoon-Young Cho, Chanwoo Kim

    Abstract: With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. The vowel space analysis, which is often utilized to explore various aspects of language including L2 accents, is a great alternative analysis tool. In this study, we apply th…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  29. arXiv:2210.09655  [pdf, other]

    cs.CV cs.LG eess.IV

    WaGI : Wavelet-based GAN Inversion for Preserving High-frequency Image Details

    Authors: Seung-Jun Moon, Chaewon Kim, Gyeong-Moon Park

    Abstract: Recent GAN inversion models focus on preserving image-specific details through various methods, e.g., generator tuning or feature mixing. While those are helpful for preserving details compared to a naive low-rate latent inversion, they still fail to maintain high-frequency features precisely. In this paper, we point out that the existing GAN inversion models have inherent limitations in both str…

    Submitted 18 October, 2022; originally announced October 2022.

  30. arXiv:2207.06330  [pdf, other]

    eess.IV cs.CV

    Left Ventricle Contouring of Apical Three-Chamber Views on 2D Echocardiography

    Authors: Alberto Gomez, Mihaela Porumb, Angela Mumith, Thierry Judge, Shan Gao, Woo-Jin Cho Kim, Jorge Oliveira, Agis Chartsias

    Abstract: We propose a new method to automatically contour the left ventricle on 2D echocardiographic images. Unlike most existing segmentation methods, which are based on predicting segmentation masks, we focus on predicting the endocardial contour and the key landmark points within this contour (basal points and apex). This provides a representation that is closer to how experts perform manual annotations…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: Submitted to MICCAI-ASMUS 2022

  31. arXiv:2204.02405  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Zero-shot Blind Image Denoising via Implicit Neural Representations

    Authors: Chaewon Kim, Jaeho Lee, Jinwoo Shin

    Abstract: Recent denoising algorithms based on the "blind-spot" strategy show impressive blind image denoising performances, without utilizing any external dataset. While the methods excel in recovering highly contaminated images, we observe that such algorithms are often less effective under a low-noise or real noise regime. To address this gap, we propose an alternative denoising strategy that leverages t…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 8 pages, 3 figures

  32. Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech

    Authors: Hyungchan Yoon, Seyun Um, Changwhan Kim, Hong-Goo Kang

    Abstract: To simplify the generation process, several text-to-speech (TTS) systems implicitly learn intermediate latent representations instead of relying on predefined features (e.g., mel-spectrogram). However, their generation quality is unsatisfactory as these representations lack speech variances. In this paper, we improve TTS performance by adding prosody embeddings to the latent representations…

    Submitted 28 August, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: INTERSPEECH 2023

    MSC Class: 68T07 (Primary) 68T50; 68T99 (Secondary) ACM Class: I.2.7; I.2.6

  33. arXiv:2204.01271  [pdf, other]

    eess.AS cs.LG cs.SD

    Into-TTS : Intonation Template Based Prosody Control System

    Authors: Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim

    Abstract: Intonations play an important role in delivering the intention of a speaker. However, current end-to-end TTS systems often fail to model proper intonations. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different intonations using predefined intonation templates. Prior to TTS model training, speech data are grouped into intonation templates in an unsupervi…

    Submitted 6 November, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2023

  34. arXiv:2203.13467  [pdf, other]

    eess.IV cs.CV

    RD-Optimized Trit-Plane Coding of Deep Compressed Image Latent Tensors

    Authors: Seungmin Jeon, Jae-Han Lee, Chang-Su Kim

    Abstract: DPICT is the first learning-based image codec supporting fine granular scalability. In this paper, we describe how to implement two key components of DPICT efficiently: trit-plane slicing and rate-distortion-optimized (RD-optimized) coding. In DPICT, we transform an image into a latent tensor, represent the tensor in ternary digits (trits), and encode the trits in the decreasing order of significa…

    Submitted 8 May, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

  35. arXiv:2202.08900  [pdf, other]

    cs.SD cs.CR eess.AS

    Attributable-Watermarking of Speech Generative Models

    Authors: Yongbaek Cho, Changhoon Kim, Yezhou Yang, Yi Ren

    Abstract: Generative models are now capable of synthesizing images, speeches, and videos that are hardly distinguishable from authentic contents. Such capabilities cause concerns such as malicious impersonation and IP theft. This paper investigates a solution for model attribution, i.e., the classification of synthetic contents by their source models via watermarks embedded in the contents. Building on past…

    Submitted 15 March, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022

  36. arXiv:2201.02741  [pdf, other]

    eess.AS cs.SD

    Two-Pass End-to-End ASR Model Compression

    Authors: Nauman Dawalatabad, Tushar Vatsal, Ashutosh Gupta, Sungsoo Kim, Shatrughan Singh, Dhananjaya Gowda, Chanwoo Kim

    Abstract: Speech recognition on smart devices is challenging owing to the small memory footprint. Hence small size ASR models are desirable. With the use of popular transducer-based models, it has become possible to practically deploy streaming speech recognition models on small devices [1]. Recently, the two-pass model [2] combining RNN-T and LAS modules has shown exceptional performance for streaming on-d…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: IEEE ASRU 2021

  37. arXiv:2112.06334  [pdf, other]

    eess.IV cs.CV

    DPICT: Deep Progressive Image Compression Using Trit-Planes

    Authors: Jae-Han Lee, Seungmin Jeon, Kwang Pyo Choi, Youngo Park, Chang-Su Kim

    Abstract: We propose the deep progressive image compression using trit-planes (DPICT) algorithm, which is the first learning-based codec supporting fine granular scalability (FGS). First, we transform an image into a latent tensor using an analysis network. Then, we represent the latent tensor in ternary digits (trits) and encode it into a compressed bitstream trit-plane by trit-plane in the decreasing orde…

    Submitted 6 May, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Comments: Accepted to CVPR 2022 (Oral presentation)

    MSC Class: 94A08 (Primary) 68T07; 68P30; 68U10 (Secondary) ACM Class: I.4.2; I.4.9
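The idea of slicing values into trit-planes, as this abstract describes it, can be illustrated on plain integers. The sketch below assumes non-negative, already quantized values and a fixed number of planes; it splits each value into base-3 digits, most significant plane first, and shows that decoding only the leading planes yields a coarse approximation. The function names are hypothetical and this is a simplification, not the DPICT codec (which also involves learned transforms and entropy coding).

```python
def to_trit_planes(values, num_planes):
    """Split non-negative integers into trit-planes, most significant first."""
    planes = []
    for p in range(num_planes - 1, -1, -1):
        planes.append([(v // 3 ** p) % 3 for v in values])
    return planes


def from_trit_planes(planes, num_planes):
    """Rebuild values from however many leading planes have been received."""
    values = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        weight = 3 ** (num_planes - 1 - i)
        values = [v + t * weight for v, t in zip(values, plane)]
    return values


latent = [5, 17, 26, 0]                   # toy "latent" values, each below 3**3
planes = to_trit_planes(latent, 3)
coarse = from_trit_planes(planes[:1], 3)  # only the most significant plane
exact = from_trit_planes(planes, 3)       # all three planes
```

Sending planes in decreasing order of significance is what makes the representation progressive: every additional plane refines the values decoded so far.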

  38. arXiv:2112.00007  [pdf, other]

    cs.GR cs.CV cs.LG cs.SD eess.AS

    Sound-Guided Semantic Image Manipulation

    Authors: Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

    Abstract: The recent success of the generative model shows that leveraging the multi-modal embedding space can manipulate an image using text information. However, manipulating an image with other sources rather than text, such as sound, is not easy due to the dynamic characteristics of the sources. Especially, sound can convey vivid emotions and dynamic expressions of the real world. Here, we propose a fra…

    Submitted 30 November, 2021; originally announced December 2021.

  39. arXiv:2111.10067  [pdf]

    physics.med-ph eess.IV

    Noise-resistant reconstruction algorithm based on the sinogram pattern

    Authors: Byung Chun Kim, Hyunju Lee, Kyungtaek Jun

    Abstract: We introduce a new CT image reconstruction algorithm that is less affected by various artifacts. The new reconstruction algorithm is a method of minimizing the difference between synchrotron X-ray tomography data and sinograms generated using Radon transform of CT images. The CT image is iteratively updated to reduce the difference from the sinogram of the data. This method can obtain clean CT ima…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: 8 pages, 3 figures
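The general pattern in this abstract, iteratively updating an image so that its simulated sinogram approaches the measured one, can be sketched with a toy projector. The example below is a plain Landweber (gradient) iteration, swapped in for illustration and not the authors' algorithm. It assumes a tiny square image and only two projection angles (row sums and column sums) in place of a full Radon transform; all names, the step size, and the test image are hypothetical.

```python
def project(img):
    """Toy 'sinogram': ray sums at 0 degrees (rows) and 90 degrees (columns)."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows + cols


def back_project(res, n):
    """Adjoint of project: smear each ray residual back along its ray."""
    return [[res[i] + res[n + j] for j in range(n)] for i in range(n)]


def reconstruct(sino, n, steps=200, lr=None):
    """Landweber iteration: img <- img + lr * A^T (sino - A img)."""
    lr = lr if lr is not None else 1.0 / (2 * n)  # below the stability limit 1/n
    img = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        res = [s - p for s, p in zip(sino, project(img))]
        upd = back_project(res, n)
        img = [[img[i][j] + lr * upd[i][j] for j in range(n)] for i in range(n)]
    return img


truth = [[0, 1, 0], [2, 3, 1], [0, 1, 0]]  # hypothetical test image
sino = project(truth)
recon = reconstruct(sino, 3)
residual = max(abs(s - p) for s, p in zip(sino, project(recon)))
```

With only two angles the image is not uniquely determined, so the sketch checks data consistency (a small sinogram residual) rather than exact recovery of the test image.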

  40. arXiv:2111.10047  [pdf, other]

    eess.AS cs.CL cs.SD

    Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages

    Authors: Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

    Abstract: In this paper, we propose a three-stage training methodology to improve the speech recognition accuracy of low-resource languages. We explore and propose an effective combination of techniques such as transfer learning, encoder freezing, data augmentation using Text-To-Speech (TTS), and Semi-Supervised Learning (SSL). To improve the accuracy of a low-resource Italian ASR, we leverage a well-traine…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: Accepted as a conference paper at ASRU 2021

  41. arXiv:2111.10043  [pdf, other]

    eess.AS cs.SD

    A comparison of streaming models and data augmentation methods for robust speech recognition

    Authors: Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

    Abstract: In this paper, we present a comparative study on the robustness of two different online streaming speech recognition models: Monotonic Chunkwise Attention (MoChA) and Recurrent Neural Network-Transducer (RNN-T). We explore three recently proposed data augmentation techniques, namely, multi-conditioned training using an acoustic simulator, Vocal Tract Length Perturbation (VTLP) for speaker variabil…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: Accepted as a conference paper at ASRU 2021

  42. arXiv:2110.15729  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems

    Authors: Mohd Abbas Zaidi, Beomseok Lee, Sangha Kim, Chanwoo Kim

    Abstract: Simultaneous translation systems start producing the output while processing the partial source sentence in the incoming input stream. These systems need to decide when to read more input and when to write the output. These decisions depend on the structure of source/target language and the information contained in the partial input sequence. Hence, read/write decision policy remains the same acro…

    Submitted 17 June, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, 1 table

  43. arXiv:2109.09041  [pdf, other]

    cs.RO eess.SY

    Online Distributed Trajectory Planning for Quadrotor Swarm with Feasibility Guarantee using Linear Safe Corridor

    Authors: Jungwon Park, Dabin Kim, Gyeong Chan Kim, Dahyun Oh, H. ** Kim

    Abstract: This paper presents a new online multi-agent trajectory planning algorithm that guarantees to generate safe, dynamically feasible trajectories in a cluttered environment. The proposed algorithm utilizes a linear safe corridor (LSC) to formulate the distributed trajectory optimization problem with only feasible constraints, so it does not resort to slack variables or soft constraints to avoid optim…

    Submitted 3 January, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

    Comments: 8 pages, RA-L 2022 under review

  44. IceNet for Interactive Contrast Enhancement

    Authors: Keunsoo Ko, Chang-Su Kim

    Abstract: A CNN-based interactive contrast enhancement algorithm, called IceNet, is proposed in this work, which enables a user to adjust image contrast easily according to his or her preference. Specifically, a user provides a parameter for controlling the global brightness and two types of scribbles to darken or brighten local regions in an image. Then, given these annotations, IceNet estimates a gamma ma…

    Submitted 25 December, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: 11 pages, 9 figures, 3 tables. This paper has been accepted for publication in IEEE Access. Copyright may change without notice

  45. arXiv:2108.12947  [pdf, other]

    eess.IV cs.CV cs.LG cs.MM

    Learning JPEG Compression Artifacts for Image Manipulation Detection and Localization

    Authors: Myung-Joon Kwon, Seung-Hun Nam, In-Jae Yu, Heung-Kyu Lee, Changick Kim

    Abstract: Detecting and localizing image manipulation are necessary to counter malicious use of image editing techniques. Accordingly, it is essential to distinguish between authentic and tampered regions by analyzing intrinsic statistics in an image. We focus on JPEG compression artifacts left during image acquisition and editing. We propose a convolutional neural network (CNN) that uses discrete cosine tr…

    Submitted 25 May, 2022; v1 submitted 29 August, 2021; originally announced August 2021.

    Comments: The version of record of this article, published in the International Journal of Computer Vision (IJCV), is available online at Publisher's website: https://link.springer.com/article/10.1007/s11263-022-01617-5 ; Code is available at: https://github.com/mjkwon2021/CAT-Net

    Journal ref: International Journal of Computer Vision (IJCV), 2022

  46. arXiv:2105.01254  [pdf, other]

    cs.SD cs.LG eess.AS

    Streaming end-to-end speech recognition with jointly trained neural feature enhancement

    Authors: Chanwoo Kim, Abhinav Garg, Dhananjaya Gowda, Seongkyu Mun, Changwoo Han

    Abstract: In this paper, we present a streaming end-to-end speech recognition model based on Monotonic Chunkwise Attention (MoCha) jointly trained with enhancement layers. Even though the MoCha attention enables streaming speech recognition with recognition accuracy comparable to a full attention-based approach, training this model is sensitive to various factors such as the difficulty of training examples,…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: Accepted to ICASSP 2021

  47. arXiv:2103.05158  [pdf]

    cs.CV cs.AI eess.IV

    Deep Learning-based High-precision Depth Map Estimation from Missing Viewpoints for 360 Degree Digital Holography

    Authors: Hakdong Kim, Heonyeong Lim, Minkyu Jee, Yurim Lee, Jisoo Jeong, Kyudam Choi, MinSung Yoon, Cheongwon Kim

    Abstract: In this paper, we propose a novel, convolutional neural network model to extract highly precise depth maps from missing viewpoints, especially well applicable to generate holographic 3D contents. The depth map is an essential element for phase extraction which is required for synthesis of computer-generated hologram (CGH). The proposed model called the HDD Net uses MSE for the better performance o…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: 12 pages, 10 figures, 5 tables

  48. arXiv:2102.08567  [pdf]

    cs.CV cs.AI eess.IV

    Ensemble Transfer Learning of Elastography and B-mode Breast Ultrasound Images

    Authors: Sampa Misra, Seungwan Jeon, Ravi Managuli, Seiyon Lee, Gyuwon Kim, Seungchul Lee, Richard G Barr, Chulhong Kim

    Abstract: Computer-aided detection (CAD) of benign and malignant breast lesions becomes increasingly essential in breast ultrasound (US) imaging. The CAD systems rely on imaging features identified by the medical experts for their performance, whereas deep learning (DL) methods automatically extract features from the data. The challenge of the DL is the insufficiency of breast US images available to train t…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: 17 pages, 10 figures, 6 Tables

  49. arXiv:2010.13974  [pdf, other]

    cs.CV eess.IV

    Decentralized Attribution of Generative Models

    Authors: Changhoon Kim, Yi Ren, Yezhou Yang

    Abstract: Growing applications of generative models have led to new threats such as malicious personation and digital copyright infringement. One solution to these threats is model attribution, i.e., the identification of user-end models where the contents under question are generated from. Existing studies showed empirical feasibility of attribution through a centralized classifier trained on all user-end…

    Submitted 28 April, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: 16 pages, 7 figures

  50. arXiv:2008.10812  [pdf, other]

    eess.SP

    Multiview Variational Deep Learning with Application to Scalable Indoor Localization

    Authors: Minseuk Kim, Changjun Kim, Dongsoo Han, June-Koo Kevin Rhee

    Abstract: Radio channel state information (CSI) measured with many receivers is a good resource for localizing a transmit device with machine learning with a discriminative model. However, CSI localization is nontrivial when the radio map is complicated, such as in building corridors. This paper introduces a view-selective deep learning (VSDL) system for indoor localization using CSI of WiFi. The multiview…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: 10 pages, 6 figures