Showing 1–50 of 1,868 results for author: Kim

Searching in archive eess. Search in all archives.
  1. arXiv:2407.00759  [pdf]

    eess.IV cs.CV cs.LG

    Analysis of Modern Computer Vision Models for Blood Cell Classification

    Authors: Alexander Kim, Ryan Kim

    Abstract: The accurate classification of white blood cells and related blood components is crucial for medical diagnoses. While traditional manual examinations and automated hematology analyzers have been widely used, they are often slow and prone to errors. Recent advancements in deep learning have shown promise for addressing these limitations. Earlier studies have demonstrated the viability of convolutio…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 18 pages, 8 figures

    ACM Class: I.4.9

  2. arXiv:2407.00218  [pdf, other]

    eess.SY cs.RO

    Resilient Estimator-based Control Barrier Functions for Dynamical Systems with Disturbances and Noise

    Authors: Chuyuan Tao, Wenbin Wan, Junjie Gao, Bihao Mo, Hunmin Kim, Naira Hovakimyan

    Abstract: Control Barrier Function (CBF) is an emerging method that guarantees safety in path planning problems by generating a control command to ensure the forward invariance of a safety set. Most of the developments up to date assume availability of correct state measurements and absence of disturbances on the system. However, if the system incurs disturbances and is subject to noise, the CBF cannot guar…

    Submitted 28 June, 2024; originally announced July 2024.

  3. arXiv:2406.19328  [pdf, other]

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.19135  [pdf, other]

    eess.AS cs.AI

    DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

    Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Preprint

  5. arXiv:2406.17597  [pdf, other]

    math.NA cs.LG eess.SP eess.SY math.ST

    Constructing structured tensor priors for Bayesian inverse problems

    Authors: Kim Batselier

    Abstract: Specifying a prior distribution is an essential part of solving Bayesian inverse problems. The prior encodes a belief on the nature of the solution and this regularizes the problem. In this article we completely characterize a Gaussian prior that encodes the belief that the solution is a structured tensor. We first define the notion of (A,b)-constrained tensors and show that they describe a large…

    Submitted 25 June, 2024; originally announced June 2024.

    MSC Class: 15A29; 15A69; 62F15

  6. arXiv:2406.17310  [pdf, other]

    eess.AS

    High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

    Authors: Joun Yeop Lee, Myeonghun Jeong, Minchan Kim, Ji-Hyun Lee, Hoon-Young Cho, Nam Soo Kim

    Abstract: We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic contents and alignment, and the Speaking module, which captures the timbre of the target v…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech2024

  7. arXiv:2406.16994  [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  8. arXiv:2406.16886  [pdf, other]

    eess.SP cs.CV cs.LG

    Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition

    Authors: Parham Zolfaghari, Vitor Fortes Rey, Lala Ray, Hyun Kim, Sungho Suh, Paul Lukowicz

    Abstract: The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from…

    Submitted 25 April, 2024; originally announced June 2024.

    Comments: Accepted in IEEE 6th International Conference on Activity and Behavior Computing (ABC 2024)

  9. arXiv:2406.16716  [pdf, other]

    eess.AS cs.CR cs.SD

    One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection

    Authors: Hyun Myung Kim, Kangwook Jang, Hoirin Kim

    Abstract: As speech synthesis systems continue to make remarkable advances in recent years, the importance of robust deepfake detection systems that perform well in unseen systems has grown. In this paper, we propose a novel adaptive centroid shift (ACS) method that updates the centroid representation by continually shifting as the weighted average of bonafide representations. Our approach uses only bonafid…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  10. arXiv:2406.15225  [pdf, other]

    cs.AI cs.RO eess.SP

    Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting

    Authors: Jiyong Oh, Syed M. Raza, Lusungu J. Mwasinga, Moonseong Kim, Hyunseung Choo

    Abstract: Unmanned Aerial Vehicle (UAV) services with 5G connectivity is an emerging field with numerous applications. Operator-controlled UAV flights and manual static flight configurations are major limitations for the wide adoption of scalability of UAV services. Several services depend on excellent UAV connectivity with a cellular network and maintaining it is challenging in predetermined flight paths. T…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, Published in the 2024 IEEE Network Operations and Management Symposium (NOMS 2024)

  11. arXiv:2406.14372  [pdf, ps, other]

    eess.SY

    Ring-LWE based encrypted controller with unlimited number of recursive multiplications and effect of error growth

    Authors: Yeongjun Jang, Joowon Lee, Seonhong Min, Hyesun Kwak, Junsoo Kim, Yongsoo Song

    Abstract: In this paper, we propose a method to encrypt linear dynamic controllers that enables an unlimited number of recursive homomorphic multiplications on a Ring Learning With Errors (Ring-LWE) based cryptosystem without bootstrapping. Unlike LWE based schemes, where a scalar error is injected during encryption for security, Ring-LWE based schemes are based on polynomial rings and inject error as a pol…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 3 figures

  12. arXiv:2406.13935  [pdf, other]

    eess.AS cs.AI cs.SD

    CONMOD: Controllable Neural Frame-based Modulation Effects

    Authors: Gyubin Lee, Hounsu Kim, Junwon Lee, Juhan Nam

    Abstract: Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single blac…

    Submitted 19 June, 2024; originally announced June 2024.

  13. arXiv:2406.13248  [pdf, other]

    cs.IT eess.SP

    Overlay Space-Air-Ground Integrated Networks with SWIPT-Empowered Aerial Communications

    Authors: Anuradha Verma, Pankaj Kumar Sharma, Pawan Kumar, Dong In Kim

    Abstract: In this article, we consider overlay space-air-ground integrated networks (OSAGINs) where a low earth orbit (LEO) satellite communicates with ground users (GUs) with the assistance of an energy-constrained coexisting air-to-air (A2A) network. Particularly, a non-linear energy harvester with a hybrid SWIPT utilizing both power-splitting and time-switching energy harvesting (EH) techniques is employ…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 36 pages, 14 figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  14. arXiv:2406.12721  [pdf]

    eess.AS cs.SD

    Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

    Authors: Sang Won Son, Jongyeon Park, Hong Kook Kim, Sulaiman Vesal, Jeong Eun Lim

    Abstract: In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de…

    Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 challenge Task4, 4 pages

  15. arXiv:2406.12688  [pdf, other]

    eess.AS eess.SP

    Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

    Authors: Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi

    Abstract: This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting the acoustic scene behind speech signals to desired environments. We propose AST-LDM for the AST task, which generates speech signals accompanied by…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  16. arXiv:2406.12632  [pdf, other]

    eess.IV cs.CV

    Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Image Synthesis: T1 MRI to Tau-PET

    Authors: Symac Kim, Junho Moon, Haejun Chung, Ikbeom Jang

    Abstract: Alzheimer's Disease (AD) is the most common form of dementia, characterised by cognitive decline and biomarkers such as tau-proteins. Tau-positron emission tomography (tau-PET), which employs a radiotracer to selectively bind, detect, and visualise tau protein aggregates within the brain, is valuable for early AD diagnosis but is less accessible due to high costs, limited availability, and its inv…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures

  17. arXiv:2406.12254  [pdf, other]

    eess.IV cs.CV

    Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

    Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael Kim, Shunxing Bao, Ann Xenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmenta…

    Submitted 18 June, 2024; originally announced June 2024.

  18. arXiv:2406.11427  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

    Authors: Keon Lee, Dong Won Kim, Jaehyeon Kim, Jaewoong Cho

    Abstract: Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models f…

    Submitted 17 June, 2024; originally announced June 2024.

  19. arXiv:2406.11248  [pdf]

    eess.AS cs.AI cs.SD

    Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9

    Authors: Do Hyun Lee, Yoonah Song, Hong Kook Kim

    Abstract: We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for c…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 9, 4 pages

  20. arXiv:2406.11220  [pdf, other]

    eess.SP

    No Analog Combiner TTD-based Hybrid Precoding for Multi-User Sub-THz Communications

    Authors: Dang Qua Nguyen, Alexei Ashikhmin, Hong Yang, Taejoon Kim

    Abstract: We address the design and optimization of real-world-suitable hybrid precoders for multi-user wideband sub-terahertz (sub-THz) communications. We note that the conventional fully connected true-time delay (TTD)-based architecture is impractical because there is no room for the required large number of analog signal combiners in the circuit board. Additionally, analog signal combiners incur signifi…

    Submitted 17 June, 2024; originally announced June 2024.

  21. arXiv:2406.10549  [pdf, other]

    eess.AS cs.CL cs.SD

    Lightweight Audio Segmentation for Long-form Speech Translation

    Authors: Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung

    Abstract: Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  22. arXiv:2406.09894  [pdf, other]

    eess.AS cs.SD

    Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis

    Authors: Taewoo Kim, Choongsang Cho, Young Han Lee

    Abstract: In this paper, we present Period Singer, a novel end-to-end singing voice synthesis (SVS) model that utilizes variational inference for periodic and aperiodic components, aimed at producing natural-sounding waveforms. Recent end-to-end SVS models have demonstrated the capability of synthesizing high-fidelity singing voices. However, owing to deterministic pitch conditioning, they do not fully addr…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  23. arXiv:2406.09819  [pdf, other]

    eess.AS

    Enhanced Deep Speech Separation in Clustered Ad Hoc Distributed Microphone Environments

    Authors: Jihyun Kim, Stijn Kindt, Nilesh Madhu, Hong-Goo Kang

    Abstract: Ad-hoc distributed microphone environments, where microphone locations and numbers are unpredictable, present a challenge to traditional deep learning models, which typically require fixed architectures. To tailor deep learning models to accommodate arbitrary array configurations, the Transform-Average-Concatenate (TAC) layer was previously introduced. In this work, we integrate TAC layers with du…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  24. arXiv:2406.09345  [pdf, other]

    cs.CL cs.SD eess.AS

    DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

    Authors: Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu

    Abstract: The integration of pre-trained text-based large language models (LLM) with speech input has enabled instruction-following capabilities for diverse speech tasks. This integration requires the use of a speech encoder, a speech adapter, and an LLM, trained on diverse tasks. We propose the use of discrete speech units (DSU), rather than continuous-valued speech encoder outputs, that are converted to t…

    Submitted 13 June, 2024; originally announced June 2024.

  25. arXiv:2406.09286  [pdf, other]

    eess.AS cs.SD

    FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching

    Authors: Chaeyoung Jung, Suyeon Lee, Ji-Hoon Kim, Joon Son Chung

    Abstract: This work proposes an efficient method to enhance the quality of corrupted speech signals by leveraging both acoustic and visual cues. While existing diffusion-based approaches have demonstrated remarkable quality, their applicability is limited by slow inference speeds and computational complexity. To address this issue, we present FlowAVSE which enhances the inference speed and reduces the numbe…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  26. arXiv:2406.09025  [pdf, other]

    eess.SP

    Site-Specific Radio Channel Representation -- Current State and Future Applications

    Authors: Thomas Zemen, Jorge Gomez-Ponce, Aniruddha Chandra, Michael Walter, Enes Aksoy, Ruisi He, David Matolak, Minseok Kim, Jun-ichi Takada, Sana Salous, Reinaldo Valenzuela, Andreas F. Molisch

    Abstract: A site-specific radio channel representation considers the surroundings of the communication system through the environment geometry, such as buildings, vegetation, and mobile objects including their material and surface properties. In this article, we focus on communication technologies for 5G and beyond that are increasingly able to exploit the specific environment geometry for both communicatio…

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, submitted to the IEEE Communication Magazine

  27. arXiv:2406.08714  [pdf, other]

    eess.SP

    Real-time Digital RF Emulation -- II: A Near Memory Custom Accelerator

    Authors: Mandovi Mukherjee, Xiangyu Mao, Nael Rahman, Coleman DeLude, Joe Driscoll, Sudarshan Sharma, Payman Behnam, Uday Kamal, Jongseok Woo, Daehyun Kim, Sharjeel Khan, Jianming Tong, Jamin Seo, Prachi Sinha, Madhavan Swaminathan, Tushar Krishna, Santosh Pande, Justin Romberg, Saibal Mukhopadhyay

    Abstract: A near memory hardware accelerator, based on a novel direct path computational model, for real-time emulation of radio frequency systems is demonstrated. Our evaluation of hardware performance uses both application-specific integrated circuits (ASIC) and field programmable gate arrays (FPGA) methodologies: 1). The ASIC testchip implementation, using TSMC 28nm CMOS, leverages distributed autonomous…

    Submitted 12 June, 2024; originally announced June 2024.

  28. arXiv:2406.08328  [pdf, other]

    eess.AS

    Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation

    Authors: Tsun-An Hsieh, Heeyoul Choi, Minje Kim

    Abstract: Recent studies highlight the potential of textual modalities in conditioning the speech separation model's inference process. However, regularization-based methods remain underexplored despite their advantages of not requiring auxiliary text data during the test time. To address this gap, we introduce a timed text-based regularization (TTR) method that uses language model-derived semantics to impr…

    Submitted 12 June, 2024; originally announced June 2024.

  29. arXiv:2406.07909  [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

    Authors: Eungbeom Kim, Hantae Kim, Kyogu Lee

    Abstract: Transformer encoder with connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR displays a problem of disagreement between teacher-student models in frame-level alignment which ultimately hinders it from improving the student model's performance. In order to resolve this problem, this paper introduce…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  30. arXiv:2406.07823  [pdf, other]

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no…

    Submitted 11 June, 2024; originally announced June 2024.

  31. arXiv:2406.07803  [pdf, other]

    cs.SD cs.AI eess.AS

    EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech

    Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  32. arXiv:2406.07728  [pdf, other]

    cs.RO eess.SY

    Visibility-Aware RRT* for Safety-Critical Navigation of Perception-Limited Robots in Unknown Environments

    Authors: Taekyung Kim, Dimitra Panagou

    Abstract: Safe autonomous navigation in unknown environments remains a critical challenge for robots with limited sensing capabilities. While safety-critical control techniques, such as Control Barrier Functions (CBFs), have been proposed to ensure safety, their effectiveness relies on the assumption that the robot has complete knowledge of its surroundings. In reality, robots often operate with restricted…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Our project page can be found at: https://www.taekyung.me/visibility-rrt

  33. arXiv:2406.07103  [pdf, other]

    eess.AS cs.AI

    MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

    Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

    Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  34. arXiv:2406.06786  [pdf, other]

    cs.SD cs.AI eess.AS

    BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

    Authors: June-Woo Kim, Miika Toikkanen, Yera Choi, Seoung-Eun Moon, Ho-Young Jung

    Abstract: Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model u…

    Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted INTERSPEECH 2024

  35. arXiv:2406.06650  [pdf, other]

    eess.IV cs.CV

    Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images

    Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

    Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  36. arXiv:2406.05983  [pdf, other]

    eess.AS

    Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation

    Authors: Ui-Hyeop Shin, Sangyoun Lee, Taehan Kim, Hyung-Min Park

    Abstract: Since the success of a time-domain speech separation, further improvements have been made by expanding the length and channel of a feature sequence to increase the amount of computation. When temporally expanded to a long sequence, the feature is segmented into chunks as a dual-path model in most studies of speech separation. In particular, it is common for the process of separating features corre…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Project Page https://fordemopage.github.io/SepReformer

  37. arXiv:2406.05965  [pdf, other]

    eess.AS cs.AI

    MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

    Authors: Semin Kim, Myeonghun Jeong, Hyeonseung Lee, Minchan Kim, Byoung Jin Choi, Nam Soo Kim

    Abstract: In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancin…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  38. arXiv:2406.05341  [pdf, other]

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit means to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patte…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  39. arXiv:2406.05270  [pdf]

    physics.med-ph cs.CV cs.LG eess.IV

    fastMRI Breast: A publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI

    Authors: Eddy Solomon, Patricia M. Johnson, Zhengguo Tan, Radhika Tibrewala, Yvonne W. Lui, Florian Knoll, Linda Moy, Sungheon Gene Kim, Laura Heacock

    Abstract: This data curation work introduces the first large-scale dataset of radial k-space and DICOM data for breast DCE-MRI acquired in diagnostic breast MRI exams. Our dataset includes case-level labels indicating patient age, menopause status, lesion status (negative, benign, and malignant), and lesion type for each case. The public availability of this dataset and accompanying reconstruction code will…

    Submitted 7 June, 2024; originally announced June 2024.

  40. arXiv:2406.02936  [pdf]

    eess.IV cs.CV

    Radiomics-guided Multimodal Self-attention Network for Predicting Pathological Complete Response in Breast MRI

    Authors: Jonghun Kim, Hyunjin Park

    Abstract: Breast cancer is the most prevalent cancer among women and predicting pathologic complete response (pCR) after anti-cancer treatment is crucial for patient prognosis and treatment customization. Deep learning has shown promise in medical imaging diagnosis, particularly when utilizing multiple imaging modalities to enhance accuracy. This study presents a model that predicts pCR in breast cancer pat…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 5 figures, IEEE ISBI 2024 proceedings

  41. arXiv:2406.02562  [pdf, other]

    eess.AS cs.AI cs.CL

    Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

    Authors: Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

    Abstract: In recent times, there has been a growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model on-device is inefficient, and sometimes limited due to computational cost. To tackle the problem, this paper presents the weights separation method to minimize on-device model weights using parameter…

    Submitted 23 April, 2024; originally announced June 2024.

    Comments: Table 2 is revised

    Journal ref: ICASSP 2024 Workshop(HSCMA 2024) paper

  42. arXiv:2406.02479  [pdf]

    cs.LG eess.SP eess.SY

    Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis

    Authors: Yi Hu, Hyeon** Kim, Kai Ye, Ning Lu

    Abstract: This paper presents a novel method for utilizing fine-tuned Large Language Models (LLMs) to minimize data requirements in load profile analysis, demonstrated through the restoration of missing data in power system load profiles. A two-stage fine-tuning strategy is proposed to adapt a pre-trained LLM, i.e., GPT-3.5, for missing data restoration tasks. Through empirical evaluation, we demonstrate t…

    Submitted 2 June, 2024; originally announced June 2024.

  43. arXiv:2406.02000  [pdf, other]

    cs.NI eess.SP

    Advancing Ultra-Reliable 6G: Transformer and Semantic Localization Empowered Robust Beamforming in Millimeter-Wave Communications

    Authors: Avi Deb Raha, Kitae Kim, Apurba Adhikary, Mrityunjoy Gain, Choong Seon Hong

    Abstract: Advancements in 6G wireless technology have elevated the importance of beamforming, especially for attaining ultra-high data rates via millimeter-wave (mmWave) frequency deployment. Although promising, mmWave bands require substantial beam training to achieve precise beamforming. While initial deep learning models that use RGB camera images demonstrated promise in reducing beam training overhead,…

    Submitted 21 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  44. arXiv:2406.00669  [pdf]

    eess.SY econ.GN

    Multi-technology co-optimization approach for sustainable hydrogen and electricity supply chains considering variability and demand scale

    Authors: Sunwoo Kim, Joungho Park, Jay H. Lee

    Abstract: In the pursuit of a carbon-neutral future, hydrogen emerges as a pivotal element, serving as a carbon-free energy carrier and feedstock. As efforts to decarbonize sectors such as heating and transportation intensify, understanding and navigating the dynamics of hydrogen demand expansion becomes critical. Transitioning to a hydrogen economy is complicated by varying regional scales and types…

    Submitted 2 June, 2024; originally announced June 2024.

  45. arXiv:2406.00665   

    econ.GN eess.SY

    Integrating solid direct air capture systems with green hydrogen production: Economic synergy of sector coupling

    Authors: Sunwoo Kim, Joungho Park, Jay H. Lee

    Abstract: In the global pursuit of sustainable energy solutions, mitigating carbon dioxide (CO2) emissions stands as a pivotal challenge. With escalating atmospheric CO2 levels, the imperative of direct air capture (DAC) systems becomes evident. Simultaneously, green hydrogen (GH) emerges as a pivotal medium for renewable energy. Nevertheless, the substantial expenses associated with these technologies impe…

    Submitted 28 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: Some of the results of our previous preprint paper are flawed, and we are withdrawing them to prevent the spread of incorrect knowledge

  46. arXiv:2406.00123  [pdf]

    eess.IV cs.CV

    Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration

    Authors: Mingyuan Meng, Dagan Feng, Lei Bi, **man Kim

    Abstract: Deformable image registration is a fundamental step for medical image analysis. Recently, transformers have been used for registration and outperformed Convolutional Neural Networks (CNNs). Transformers can capture long-range dependence among image features, which has been shown to be beneficial for registration. However, due to the high computation/memory loads of self-attention, transformers are typi…

    Submitted 12 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

    Comments: Accepted at CVPR 2024 as Oral Presentation and Best Paper Candidate

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9645-9654

  47. arXiv:2405.19771  [pdf, other]

    cs.NI eess.SP

    Data Service Maximization in Integrated Terrestrial-Non-Terrestrial 6G Networks: A Deep Reinforcement Learning Approach

    Authors: Nway Nway Ei, Kitae Kim, Yan Kyaw Tun, Choong Seon Hong

    Abstract: Integrating terrestrial and non-terrestrial networks has emerged as a promising paradigm to fulfill the constantly growing demand for connectivity, low transmission delay, and quality of service (QoS). This integration brings together the strengths of terrestrial and non-terrestrial networks, such as the reliability of terrestrial networks, broad coverage, and service continuity of non-terrestria…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  48. arXiv:2405.19380  [pdf, other]

    stat.ML cs.LG eess.SY

    Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

    Authors: Yeoneung Kim, Gihun Kim, Insoon Yang

    Abstract: We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby acc…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 61 pages, 6 figures

  49. arXiv:2405.19346  [pdf, other]

    eess.SP cs.AI cs.LG

    Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification

    Authors: Sion An, Myeongkyun Kang, Soopil Kim, Philip Chikontwe, Li Shen, Sang Hyun Park

    Abstract: Electroencephalography (EEG) motor imagery (MI) classification is a fundamental yet challenging task due to the variation of signals between individuals, i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted at MICCAI 2024

  50. arXiv:2405.18701  [pdf, other]

    eess.SP

    Near-Field Localization with RIS via Two-Dimensional Signal Path Classification

    Authors: Jeongwan Kang, Seung-Woo Ko, Sunwoo Kim

    Abstract: In this paper, we propose two-dimensional signal path classification (2D-SPC) for reconfigurable intelligent surface (RIS)-assisted near-field (NF) localization. In the NF regime, multiple RIS-driven signal paths (SPs) can contribute to precise localization if these are decomposable and the reflected locations on the RIS are known, referred to as SP decomposition (SPD) and SP labeling (SPL), respe…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 15 pages, 12 figures, submitted to IEEE Transactions on Wireless Communications