Showing 1–50 of 163 results for author: Kim, Y

Searching in archive eess.
  1. arXiv:2405.19380  [pdf, other]

    stat.ML cs.LG eess.SY

    Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

    Authors: Yeoneung Kim, Gihun Kim, Insoon Yang

    Abstract: We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby acc…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 61 pages, 6 figures

  2. arXiv:2405.13413  [pdf, other]

    cs.IT cs.LG eess.SP

    Boosted Neural Decoders: Achieving Extreme Reliability of LDPC Codes for 6G Networks

    Authors: Hee-Youl Kwak, Dae-Young Yun, Yongjune Kim, Sang-Hyo Kim, Jong-Seon No

    Abstract: Ensuring extremely high reliability is essential for channel coding in 6G networks. The next-generation of ultra-reliable and low-latency communications (xURLLC) scenario within 6G networks requires a frame error rate (FER) below $10^{-9}$. However, low-density parity-check (LDPC) codes, the standard in 5G new radio (NR), encounter a challenge known as the error floor phenomenon, which hinders to achie…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 12 pages, 11 figures

  3. arXiv:2405.09193  [pdf, other]

    eess.SY

    Autonomous Cooperative Levels of Multiple-Heterogeneous Unmanned Vehicle Systems

    Authors: Yoo-Bin Bae, Yeong-Ung Kim, Jun-Oh Park, Hyo-Sung Ahn

    Abstract: As multiple and heterogeneous unmanned vehicle systems continue to play an increasingly important role in addressing complex missions in the real world, the need for effective cooperation among unmanned vehicles becomes paramount. The concept of autonomous cooperation, wherein unmanned vehicles cooperate without human intervention or human control, offers promising avenues for enhancing the efficie…

    Submitted 15 May, 2024; originally announced May 2024.

  4. arXiv:2404.15333  [pdf, other]

    eess.SP cs.LG

    EB-GAME: A Game-Changer in ECG Heartbeat Anomaly Detection

    Authors: JuneYoung Park, Da Young Kim, Yunsoo Kim, Jisu Yoo, Tae Joon Kim

    Abstract: Cardiologists use electrocardiograms (ECG) for the detection of arrhythmias. However, continuous monitoring of ECG signals to detect cardiac abnormalities requires significant time and human resources. As a result, several deep learning studies have been conducted in advance for the automatic detection of arrhythmia. These models show relatively high performance in supervised learning, but are no…

    Submitted 8 April, 2024; originally announced April 2024.

  5. arXiv:2404.07217  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG

    Attention-aware Semantic Communications for Collaborative Inference

    Authors: Jiwoong Im, Nayoung Kwon, Taewoo Park, Jiheon Woo, Jaeho Lee, Yongjune Kim

    Abstract: We propose a communication-efficient collaborative inference framework in the domain of edge inference, focusing on the efficient use of vision transformer (ViT) models. The partitioning strategy of conventional collaborative inference fails to reduce communication cost because of the inherent architecture of ViTs maintaining consistent layer dimensions across the entire transformer encoder. There…

    Submitted 31 May, 2024; v1 submitted 23 February, 2024; originally announced April 2024.

  6. arXiv:2404.02592  [pdf]

    cs.CL cs.SD eess.AS

    Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation

    Authors: Ye** Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Contemporary neural speech synthesis models have indeed demonstrated remarkable proficiency in synthetic speech generation as they have attained a level of quality comparable to that of human-produced speech. Nevertheless, it is important to note that these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, the Tacotron and Fas…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024

  7. arXiv:2404.02477  [pdf, ps, other]

    eess.SP cs.AI

    Enhancing Sum-Rate Performance in Constrained Multicell Networks: A Low-Information Exchange Approach

    Authors: You** Kim, Jonggyu Jang, Hyun Jong Yang

    Abstract: Despite the extensive research on massive MIMO systems for 5G telecommunications and beyond, the reality is that many deployed base stations are equipped with a limited number of antennas rather than supporting massive MIMO configurations. Furthermore, while the cell-less network concept, which eliminates cell boundaries, is under investigation, practical deployments often grapple with significant…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 5 pages, 12 figures

  8. arXiv:2404.00559  [pdf, other]

    eess.SY

    Hierarchical Climate Control Strategy for Electric Vehicles with Door-Opening Consideration

    Authors: Sanghyeon Nam, Hye** Lee, Youngki Kim, Kyoung hyun Kwak, Kyoungseok Han

    Abstract: This study proposes a novel climate control strategy for electric vehicles (EVs) by addressing door-opening interruptions, an overlooked aspect in EV thermal management. We create and validate an EV simulation model that incorporates door-opening scenarios. Three controllers are compared using the simulation model: (i) a hierarchical non-linear model predictive control (NMPC) with a unique coolant…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: This paper, intended for presentation at the IEEE Intelligent Vehicles Symposium (IV) 2024, comprises six pages and includes eight figures

  9. arXiv:2403.05136  [pdf, other]

    cs.RO eess.SP

    DeRO: Dead Reckoning Based on Radar Odometry With Accelerometers Aided for Robot Localization

    Authors: Hoang Viet Do, Yong Hun Kim, Joo Han Lee, Min Ho Lee, ** Woo Song

    Abstract: In this paper, we propose a radar odometry structure that directly utilizes radar velocity measurements for dead reckoning while maintaining its ability to update estimations within the Kalman filter framework. Specifically, we employ the Doppler velocity obtained by a 4D Frequency Modulated Continuous Wave (FMCW) radar in conjunction with gyroscope data to calculate poses. This approach helps mit…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 9 pages, 5 figures, 1 table, conference

    ACM Class: I.2.9

  10. arXiv:2403.01256  [pdf]

    eess.SY

    Resilient Microgrid Formation Considering Communication Interruptions

    Authors: Jian Zhong, Chen Chen, Young-** Kim, Yuxiong Huang, Mengjie Teng, Yiheng Bian, Zhaohong Bie

    Abstract: Distribution system (DS) communication failures following extreme events often degrade monitoring and control functions, thus preventing the acquisition of complete global DS component state information, on which existing post-disaster DS restoration methods are based. This letter proposes methods of inferring the states of DS components in the case of incomplete component state information. By us…

    Submitted 2 March, 2024; originally announced March 2024.

  11. arXiv:2402.16998  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    What Do Language Models Hear? Probing for Auditory Representations in Language Models

    Authors: Jerry Ngo, Yoon Kim

    Abstract: This work explores whether language models encode meaningfully grounded representations of sounds of objects. We learn a linear probe that retrieves the correct text representation of an object given a snippet of audio related to that object, where the sound representation is given by a pretrained audio model. This probe is trained via a contrastive loss that pushes the language representations an…

    Submitted 26 February, 2024; originally announced February 2024.

  12. arXiv:2402.06463  [pdf, other]

    eess.IV cs.CV cs.LG

    Cardiac ultrasound simulation for autonomous ultrasound navigation

    Authors: Abdoul Aziz Amadou, Laura Peralta, Paul Dryburgh, Paul Klein, Kaloian Petkov, Richard James Housden, Vivek Singh, Rui Liao, Young-Ho Kim, Florin Christian Ghesu, Tommaso Mansi, Ronak Rajani, Alistair Young, Kawal Rhode

    Abstract: Ultrasound is well-established as an imaging modality for diagnostic and interventional purposes. However, the image quality varies with operator skills as acquiring and interpreting ultrasound images requires extensive training due to the imaging artefacts, the range of acquisition parameters and the variability of patient anatomies. Automating the image acquisition task could improve acquisition…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 24 pages, 10 figures, 5 tables

    ACM Class: I.6.0; I.5.4; J.3

  13. arXiv:2402.05402  [pdf, other]

    cs.NI eess.SP eess.SY

    A State-of-the-art Survey on Full-duplex Network Design

    Authors: Yonghwi Kim, Hyung-Joo Moon, Hanju Yoo, Byoungnam Kim, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: Full-duplex (FD) technology is gaining popularity for integration into a wide range of wireless networks due to its demonstrated potential in recent studies. In contrast to half-duplex (HD) technology, the implementation of FD in networks necessitates considering inter-node interference (INI) from various network perspectives. When deploying FD technology in networks, several critical factors must…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 23 pages, 10 figures, To appear in Proceedings of the IEEE

  14. arXiv:2402.05350  [pdf, other]

    cs.CV eess.IV

    Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

    Authors: Junghun Cha, Ali Haider, Seoyun Yang, Hoeyeong **, Subin Yang, A. F. M. Shahab Uddin, Jaehyoung Kim, Soo Ye Kim, Sung-Ho Bae

    Abstract: A significant volume of analog information, i.e., documents and images, has been digitized in the form of scanned copies for storing, sharing, and/or analyzing in the digital world. However, the quality of such content is severely degraded by various distortions caused by printing, storing, and scanning processes in the physical world. Although restoring high-quality content from scanned copies…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted to AAAI 2024

  15. arXiv:2401.15313  [pdf, other]

    cs.RO cs.CV eess.SY math.OC

    Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization

    Authors: Kihoon Shin, Hyunjae Sim, Seungwon Nam, Yonghee Kim, Jae Hu, Kwang-Ki K. Kim

    Abstract: In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose es…

    Submitted 4 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: 20 pages, 21 figures

    MSC Class: 93C85; 93E11; 93E24; 90C26; 93E10; 62M20

  16. arXiv:2401.11268  [pdf, other]

    cs.CL cs.SD eess.AS

    Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric

    Authors: Golara Javadi, Kamer Ali Yuksel, Yunsu Kim, Thiago Castro Ferreira, Mohamed Al-Badrashiny

    Abstract: In the realm of automatic speech recognition (ASR), the quest for models that not only perform with high accuracy but also offer transparency in their decision-making processes is crucial. The potential of quality estimation (QE) metrics is introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems. Through experiments and analyses, the capabilitie…

    Submitted 2 February, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Journal ref: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), Seoul, Korea

  17. arXiv:2401.02014  [pdf, other]

    cs.SD eess.AS

    Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations

    Authors: Ye** Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Zero-shot multi-speaker TTS aims to synthesize speech with the voice of a chosen target speaker without any fine-tuning. Prevailing methods, however, encounter limitations at adapting to new speakers of out-of-domain settings, primarily due to inadequate speaker disentanglement and content leakage. To overcome these constraints, we propose an innovative negation feature learning paradigm that mode…

    Submitted 5 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  18. arXiv:2312.03312  [pdf, other]

    cs.CL cs.SD eess.AS

    Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation

    Authors: Wonjun Lee, Gary Geunbae Lee, Yunsu Kim

    Abstract: This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. Our approach optimizes these two stages to improve speech recognition across languages. We optimize phoneme vocabulary coverage by merging phonemes based on shared articulatory characteristics, thus improving recognition accuracy. A…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 8 pages, ASRU 2023 Accepted

  19. arXiv:2312.01842  [pdf, other]

    cs.SD cs.AI eess.AS

    Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking

    Authors: Jihyun Lee, Ye** Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee

    Abstract: Dialogue state tracking plays a crucial role in extracting information in task-oriented dialogue systems. However, preceding research is limited to textual modalities, primarily due to the shortage of authentic human audio datasets. We address this by investigating synthetic audio data for audio-based DST. To this end, we develop cascading and end-to-end models, train them with our synthetic audi…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted in ASRU 2023

  20. arXiv:2312.01285  [pdf, other]

    eess.SY

    A Literature Review on the Smart Wheelchair Systems

    Authors: Yane Kim, Bharath Velamala, Youngseo Choi, Yu** Kim, Hyunkin Kim, Nishad Kulkarni, Eung-Joo Lee

    Abstract: This study offers an in-depth analysis of smart wheelchair (SW) systems, charting their progression from early developments to future innovations. It delves into various Brain-Computer Interface (BCI) systems, including mu rhythm, event-related potential, and steady-state visual evoked potential. The paper addresses challenges in signal categorization, proposing the sparse Bayesian extreme learnin…

    Submitted 3 December, 2023; originally announced December 2023.

  21. arXiv:2312.00919  [pdf, other]

    eess.SP

    Rethinking Skip Connections in Spiking Neural Networks with Time-To-First-Spike Coding

    Authors: Youngeun Kim, Adar Kahana, Ruokai Yin, Yuhang Li, Panos Stinis, George Em Karniadakis, Priyadarshini Panda

    Abstract: Time-To-First-Spike (TTFS) coding in Spiking Neural Networks (SNNs) offers significant advantages in terms of energy efficiency, closely mimicking the behavior of biological neurons. In this work, we delve into the role of skip connections, a widely used concept in Artificial Neural Networks (ANNs), within the domain of SNNs with TTFS coding. Our focus is on two distinct types of skip connection a…

    Submitted 1 December, 2023; originally announced December 2023.

  22. arXiv:2311.17396  [pdf, other]

    cs.CV eess.IV

    Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset

    Authors: Yu** Jeon, Eunsue Choi, Youngchan Kim, Yunseong Moon, Khalid Omer, Felix Heide, Seung-Hwan Baek

    Abstract: Image datasets are essential not only in validating existing methods in computer vision but also in developing new methods. Most existing image datasets focus on trichromatic intensity images to mimic human vision. However, polarization and spectrum, the wave properties of light that animals in harsh environments and with limited brain capacity often rely on, remain underrepresented in existing da…

    Submitted 30 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  23. arXiv:2311.07227  [pdf, other]

    cs.OS eess.SY

    CARTOS: A Charging-Aware Real-Time Operating System for Intermittent Batteryless Devices

    Authors: Mohsen Karimi, Yidi Wang, Youngbin Kim, Yoo** Lim, Hyoseung Kim

    Abstract: This paper presents CARTOS, a charging-aware real-time operating system designed to enhance the functionality of intermittently-powered batteryless devices (IPDs) for various Internet of Things (IoT) applications. While IPDs offer significant advantages such as extended lifespan and operability in extreme environments, they pose unique challenges, including the need to ensure forward progress of p…

    Submitted 13 November, 2023; originally announced November 2023.

  24. arXiv:2311.04753  [pdf, other]

    eess.AS

    1SPU: 1-step Speech Processing Unit

    Authors: Karan Singla, Shahab Jalalvand, Yeon-Jun Kim, Antonio Moreno Daniel, Srinivas Bangalore, Andrej Ljolje, Ben Stern

    Abstract: Recent studies have made some progress in refining end-to-end (E2E) speech recognition encoders by applying Connectionist Temporal Classification (CTC) loss to enhance named entity recognition within transcriptions. However, these methods have been constrained by their exclusive use of the ASCII character set, allowing only a limited array of semantic labels. We propose 1SPU, a 1-step Speech Proce…

    Submitted 10 December, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted at International Conference on Natural Language Processing 2023

  25. arXiv:2310.14506  [pdf, other]

    eess.SP cs.DB

    Label Space Partition Selection for Multi-Object Tracking Using Two-Layer Partitioning

    Authors: Ji Youn Lee, Changbeom Shim, Hoa Van Nguyen, Tran Thien Dat Nguyen, Hyun** Choi, Youngho Kim

    Abstract: Estimating the trajectories of multi-objects poses a significant challenge due to data association ambiguity, which leads to a substantial increase in computational requirements. To address such problems, a divide-and-conquer manner has been employed with parallel computation. In this strategy, distinguished objects that have unique labels are grouped based on their statistical dependencies, the i…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures

  26. arXiv:2310.07654  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Audio-Visual Neural Syntax Acquisition

    Authors: Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

    Abstract: We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without eve…

    Submitted 11 October, 2023; originally announced October 2023.

  27. arXiv:2310.06546  [pdf, other]

    cs.SD cs.CL eess.AS

    AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion

    Authors: Haeyun Choi, Jio Gim, Yuho Lee, Youngin Kim, Young-Joo Suh

    Abstract: This paper proposes a simple and robust zero-shot voice conversion system with a cycle structure and mel-spectrogram pre-processing. Previous works suffer from information loss and poor synthesis quality due to their reliance on a carefully designed bottleneck structure. Moreover, models relying solely on self-reconstruction loss struggled with reproducing different speakers' voices. To address th…

    Submitted 10 October, 2023; originally announced October 2023.

  28. arXiv:2309.14741  [pdf, other]

    eess.AS cs.SD

    Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

    Authors: Hee-Soo Heo, KiHyun Nam, Bong-** Lee, Youngki Kwon, Minjae Lee, You ** Kim, Joon Son Chung

    Abstract: In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remain…

    Submitted 26 September, 2023; originally announced September 2023.

  29. arXiv:2309.14668  [pdf]

    physics.optics cs.GR eess.IV physics.app-ph physics.comp-ph

    Depolarized Holography with Polarization-multiplexing Metasurface

    Authors: Seung-Woo Nam, Young** Kim, Dongyeon Kim, Yoonchan Jeong

    Abstract: The evolution of computer-generated holography (CGH) algorithms has prompted significant improvements in the performances of holographic displays. Nonetheless, they start to encounter a limited degree of freedom in CGH optimization and physical constraints stemming from the coherent nature of holograms. To surpass the physical limitations, we consider polarization as a new degree of freedom by uti…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 15 pages, 13 figures, to be published in SIGGRAPH Asia 2023

  30. arXiv:2309.12306  [pdf, other]

    cs.CV cs.SD eess.AS

    TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

    Authors: Chaeyoung Jung, Suyeon Lee, Kihyun Nam, Kyeongha Rho, You ** Kim, Youngjoon Jang, Joon Son Chung

    Abstract: The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is only applied to part of the full se…

    Submitted 21 September, 2023; originally announced September 2023.

  31. arXiv:2309.00372  [pdf, other]

    eess.IV cs.CV

    On the Localization of Ultrasound Image Slices within Point Distribution Models

    Authors: Lennart Bastian, Vincent Bürgin, Ha Young Kim, Alexander Baumann, Benjamin Busam, Mahdi Saleh, Nassir Navab

    Abstract: Thyroid disorders are most commonly diagnosed using high-resolution Ultrasound (US). Longitudinal nodule tracking is a pivotal diagnostic protocol for monitoring changes in pathological thyroid morphology. This task, however, imposes a substantial cognitive load on clinicians due to the inherent challenge of maintaining a mental 3D reconstruction of the organ. We thus present a framework for autom…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: ShapeMI Workshop @ MICCAI 2023; 12 pages, 2 figures

  32. arXiv:2308.15791  [pdf, other]

    cs.CV eess.IV

    Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

    Authors: Yeongwoong Kim, Suyong Bahk, Seungeon Kim, Won Hee Lee, Dokwan Oh, Hui Yong Kim

    Abstract: Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard Versatile Video Coding (VVC). In conventional video coding standards, the hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, had been well-studied and exploited. In N…

    Submitted 5 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  33. arXiv:2308.04009  [pdf, other]

    eess.SY

    Safe Control Synthesis for Multicopter via Control Barrier Function Backstepping

    Authors: **rae Kim, Youdan Kim

    Abstract: A safe controller for multicopter is proposed using control barrier function. Multicopter dynamics are reformulated to deal with mixed-relative-degree and non-strict-feedback-form dynamics, and a time-varying safe backstepping controller is designed. Despite the time-varying variation, it is proven that the control input can be obtained by solving quadratic programming with affine inequality const…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 6 pages, 2 figures, accepted for IEEE Conference on Decision and Control (CDC) 2023

  34. arXiv:2308.02416  [pdf, other]

    eess.SP cs.LG

    Local-Global Temporal Fusion Network with an Attention Mechanism for Multiple and Multiclass Arrhythmia Classification

    Authors: Yun Kwan Kim, Minji Lee, Kunwook Jo, Hee Seok Song, Seong-Whan Lee

    Abstract: Clinical decision support systems (CDSSs) have been widely utilized to support the decisions made by cardiologists when detecting and classifying arrhythmia from electrocardiograms (ECGs). However, forming a CDSS for the arrhythmia classification task is challenging due to the varying lengths of arrhythmias. Although the onset time of arrhythmia varies, previously developed methods have not consid…

    Submitted 13 October, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 14 pages, 6 figures

    MSC Class: 68T07; 92C55

  35. arXiv:2308.01025  [pdf]

    eess.SY

    Error Analysis of CORDIC Processor with FPGA Implementation

    Authors: Young-Man Kim

    Abstract: The coordinate rotation digital computer (CORDIC) is a shift-add based fast computing algorithm which has been found in many digital signal processing (DSP) applications. In this paper, a detailed error analysis based on mean square error criteria and its implementation on FPGA is presented. Two considered error sources are an angle approximation error and a quantization error due to finite word l…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 5 pages, 7 Figures

  36. arXiv:2307.13665  [pdf]

    eess.SY

    FPGA Implementation of Robust Residual Generator

    Authors: Y. M. Kim

    Abstract: In this paper, one can explicitly see the process of implementing the robust residual generator on digital domain, especially on FPGA. Firstly, the baseline model is developed in double precision floating point format. To develop the baseline model, key parameters such as SNR and detection window length are selected in the identification stage. (Please refer to the uploaded paper because this box…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: 6 pages, 3 figures

  37. End-to-End Learnable Multi-Scale Feature Compression for VCM

    Authors: Yeongwoong Kim, Hyewon Jeong, Janghyun Yu, Younhee Kim, Jooyoung Lee, Se Yoon Jeong, Hui Yong Kim

    Abstract: The proliferation of deep learning-based machine vision applications has given rise to a new type of compression, so-called video coding for machine (VCM). VCM differs from traditional video coding in that it is optimized for machine vision performance instead of human visual quality. In the feature compression track of MPEG-VCM, multi-scale features extracted from images are subject to compressio…

    Submitted 8 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 13 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technology

  38. arXiv:2306.13020  [pdf]

    eess.IV cs.AI cs.CV

    Toward Automated Detection of Microbleeds with Anatomical Scale Localization: A Complete Clinical Diagnosis Support Using Deep Learning

    Authors: Jun-Ho Kim, Young Noh, Haejoon Lee, Seul Lee, Woo-Ram Kim, Koung Mi Kang, Eung Yeop Kim, Mohammed A. Al-masni, Dong-Hyun Kim

    Abstract: Cerebral Microbleeds (CMBs) are chronic deposits of small blood products in the brain tissues, which have explicit relation to various cerebrovascular diseases depending on their anatomical location, including cognitive decline, intracerebral hemorrhage, and cerebral infarction. However, manual detection of CMBs is a time-consuming and error-prone process because of their sparse and tiny structura…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 16 pages, 10 figures, 3 tables

  39. arXiv:2306.12562  [pdf, other]

    cs.CV eess.IV

    Neural Spectro-polarimetric Fields

    Authors: Youngchan Kim, Wonjoon **, Sunghyun Cho, Seung-Hwan Baek

    Abstract: Modeling the spatial radiance distribution of light rays in a scene has been extensively explored for applications, including view synthesis. Spectrum and polarization, the wave properties of light, are often neglected due to their integration into three RGB spectral bands and their non-perceptibility to human vision. However, these properties are known to encompass substantial material and geomet…

    Submitted 10 December, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  40. arXiv:2306.05291  [pdf]

    eess.SP cs.AI cs.LG

    One shot learning based drivers head movement identification using a millimetre wave radar sensor

    Authors: Hong Nhung Nguyen, Seongwook Lee, Tien Tung Nguyen, Yong Hwa Kim

    Abstract: Concentration of drivers on traffic is a vital safety issue; thus, monitoring a driver on the road becomes an essential requirement. The key purpose of supervision is to detect abnormal behaviours of the driver and promptly send warnings to him or her for avoiding incidents related to traffic accidents. In this paper, to meet the requirement, based on radar sensors applications, the authors first u…

    Submitted 31 May, 2023; originally announced June 2023.

  41. arXiv:2306.00680  [pdf, other]

    cs.SD cs.AI eess.AS

    Encoder-decoder multimodal speaker change detection

    Authors: Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You ** Kim, Young-ki Kwon, Minjae Lee, Bong-** Lee

    Abstract: The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies solved the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise text modality in addition to audio, have shown improved performance. In this study, the proposed model are bui…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted for presentation at INTERSPEECH 2023

  42. Score-balanced Loss for Multi-aspect Pronunciation Assessment

    Authors: Hee** Do, Yunsu Kim, Gary Geunbae Lee

    Abstract: With rapid technological growth, automatic pronunciation assessment has transitioned toward systems that evaluate pronunciation in various aspects, such as fluency and stress. However, despite the highly imbalanced score labels within each aspect, existing studies have rarely tackled the data imbalance problem. In this paper, we suggest a novel loss function, score-balanced loss, to address the pr…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

  43. Sample-Efficient Learning for a Surrogate Model of Three-Phase Distribution System

    Authors: Hoang Tien Nguyen, Young-Jin Kim, Dae-Hyun Choi

    Abstract: A surrogate model that accurately predicts distribution system voltages is crucial for reliable smart grid planning and operation. This letter proposes a fixed-point data-driven surrogate modeling method that employs a limited dataset to learn the power-voltage relationship of an unbalanced three-phase distribution system. The proposed surrogate model is designed using a fixed-point load-flow equa…

    Submitted 18 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Journal ref: IEEE Transactions on Power Systems, vol. 39, no. 1, pp. 2361-2364, Jan. 2024

  44. arXiv:2305.14732  [pdf, other]

    eess.SY

    Increasing Electric Vehicles Utilization in Transit Fleets using Learning, Predictions, Optimization, and Automation

    Authors: Jacopo Guanetti, Yeojun Kim, Xu Shen, Joel Donham, Santosh Alexander, Bruce Wootton, Francesco Borrelli

    Abstract: This work presents a novel hierarchical approach to increase Battery Electric Buses (BEBs) utilization in transit fleets. The proposed approach relies on three key components. A learning-based BEB digital twin cloud platform is used to accurately predict BEB charge consumption on a per vehicle, per driver, and per route basis, and accurately predict the time-to-charge BEB batteries to any level. T…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at the 35th IEEE Intelligent Vehicles Symposium (IV 2023)

  45. arXiv:2305.08878  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning to Learn Unlearned Feature for Brain Tumor Segmentation

    Authors: Seungyub Han, Yeongmo Kim, Seokhyeon Ha, Jungwoo Lee, Seunghong Choi

    Abstract: We propose a fine-tuning algorithm for brain tumor segmentation that needs only a few data samples and helps networks not to forget the original tasks. Our approach is based on active learning and meta-learning. One of the difficulties in medical image segmentation is the lack of datasets with proper annotations, because it requires doctors to provide reliable annotations and there are many variants of…

    Submitted 13 May, 2023; originally announced May 2023.

    Comments: Medical Imaging Meets NeurIPS 2018

  46. arXiv:2303.16511  [pdf, other]

    eess.AS

    Joint unsupervised and supervised learning for context-aware language identification

    Authors: Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim

    Abstract: Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task only. However, we need additional text labels to train the model to recognize speech, and acquiring the text labels is costly. In order to overcome this probl…

    Submitted 14 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  47. arXiv:2303.16205  [pdf]

    eess.IV cs.LG physics.optics

    mHealth hyperspectral learning for instantaneous spatiospectral imaging of hemodynamics

    Authors: Yuhyun Ji, Sang Mok Park, Semin Kwon, Jung Woo Leem, Vidhya Vijayakrishnan Nair, Yunjie Tong, Young L. Kim

    Abstract: Hyperspectral imaging acquires data in both the spatial and frequency domains to offer abundant physical or biological information. However, conventional hyperspectral imaging has intrinsic limitations of bulky instruments, slow data acquisition rate, and spatiospectral tradeoff. Here we introduce hyperspectral learning for snapshot hyperspectral imaging in which sampled hyperspectral data in a sm…

    Submitted 5 April, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Journal ref: PNAS Nexus, pgad111, 2023

  48. arXiv:2303.07592  [pdf, other]

    eess.AS cs.SD

    Lightweight feature encoder for wake-up word detection based on self-supervised speech representation

    Authors: Hyungjun Lim, Younggwan Kim, Kiho Yeom, Eunjoo Seo, Hoodong Lee, Stanley Jungkyu Choi, Honglak Lee

    Abstract: Self-supervised learning methods that provide generalized speech representations have recently received increasing attention. Wav2vec 2.0 is the most famous example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is challenging to use it directly for wake-up word detection on mobile devices due to its expensive computational cost. In this work…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  49. arXiv:2302.10186  [pdf, other]

    eess.AS cs.CL cs.SD

    E2E Spoken Entity Extraction for Virtual Agents

    Authors: Karan Singla, Yeon-Jun Kim, Srinivas Bangalore

    Abstract: In human-computer conversations, extracting entities such as names, street addresses and email addresses from speech is a challenging task. In this paper, we study the impact of fine-tuning pre-trained speech encoders on extracting spoken entities in human-readable form directly from speech without the need for text transcription. We illustrate that such a direct approach optimizes the encoder to…

    Submitted 9 November, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted at EMNLP 2023 Industry Track

  50. arXiv:2301.09058  [pdf, other]

    eess.AS cs.LG

    Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

    Authors: Kwangje Baeg, Yeong-Gwan Kim, Young-Sub Han, Byoung-Ki Jeon

    Abstract: Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as age group well. In an embedding model that has been highly trained to capture speaker traits, the task of age group classification is closer to speech informatio…

    Submitted 22 January, 2023; originally announced January 2023.