Showing 1–50 of 402 results for author: Kim, S

Searching in archive eess. Search in all archives.
  1. arXiv:2406.19328  [pdf, other]

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19135  [pdf, other]

    eess.AS cs.AI

    DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

    Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Preprint

  3. arXiv:2406.17310  [pdf, other]

    eess.AS

    High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

    Authors: Joun Yeop Lee, Myeonghun Jeong, Minchan Kim, Ji-Hyun Lee, Hoon-Young Cho, Nam Soo Kim

    Abstract: We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic contents and alignment, and the Speaking module, which captures the timbre of the target v…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech2024

  4. arXiv:2406.16994  [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  5. arXiv:2406.12632  [pdf, other]

    eess.IV cs.CV

    Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Image Synthesis: T1 MRI to Tau-PET

    Authors: Symac Kim, Junho Moon, Haejun Chung, Ikbeom Jang

    Abstract: Alzheimer's Disease (AD) is the most common form of dementia, characterised by cognitive decline and biomarkers such as tau-proteins. Tau-positron emission tomography (tau-PET), which employs a radiotracer to selectively bind, detect, and visualise tau protein aggregates within the brain, is valuable for early AD diagnosis but is less accessible due to high costs, limited availability, and its inv…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures

  6. arXiv:2406.10549  [pdf, other]

    eess.AS cs.CL cs.SD

    Lightweight Audio Segmentation for Long-form Speech Translation

    Authors: Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung

    Abstract: Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  7. arXiv:2406.07823  [pdf, other]

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no…

    Submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2406.07803  [pdf, other]

    cs.SD cs.AI eess.AS

    EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech

    Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  9. arXiv:2406.07103  [pdf, other]

    eess.AS cs.AI

    MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

    Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

    Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  10. arXiv:2406.06650  [pdf, other]

    eess.IV cs.CV

    Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images

    Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

    Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  11. arXiv:2406.05965  [pdf, other]

    eess.AS cs.AI

    MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

    Authors: Semin Kim, Myeonghun Jeong, Hyeonseung Lee, Minchan Kim, Byoung Jin Choi, Nam Soo Kim

    Abstract: In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancin…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  12. arXiv:2406.05341  [pdf, other]

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown the state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit mean to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patte…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  13. arXiv:2406.05270  [pdf]

    physics.med-ph cs.CV cs.LG eess.IV

    fastMRI Breast: A publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI

    Authors: Eddy Solomon, Patricia M. Johnson, Zhengguo Tan, Radhika Tibrewala, Yvonne W. Lui, Florian Knoll, Linda Moy, Sungheon Gene Kim, Laura Heacock

    Abstract: This data curation work introduces the first large-scale dataset of radial k-space and DICOM data for breast DCE-MRI acquired in diagnostic breast MRI exams. Our dataset includes case-level labels indicating patient age, menopause status, lesion status (negative, benign, and malignant), and lesion type for each case. The public availability of this dataset and accompanying reconstruction code will…

    Submitted 7 June, 2024; originally announced June 2024.

  14. arXiv:2406.00669  [pdf]

    eess.SY econ.GN

    Multi-technology co-optimization approach for sustainable hydrogen and electricity supply chains considering variability and demand scale

    Authors: Sunwoo Kim, Joungho Park, Jay H. Lee

    Abstract: In the pursuit of a carbon-neutral future, hydrogen emerges as a pivotal element, serving as a carbon-free energy carrier and feedstock. As efforts to decarbonize sectors such as heating and transportation intensify, understanding and navigating through the dynamics of hydrogen demand expansion becomes critical. Transitioning to hydrogen economy is complicated by varying regional scales and types…

    Submitted 2 June, 2024; originally announced June 2024.

  15. arXiv:2406.00665  [pdf]

    econ.GN eess.SY

    Integrating solid direct air capture systems with green hydrogen production: Economic synergy of sector coupling

    Authors: Sunwoo Kim, Joungho Park, Jay H. Lee

    Abstract: In the global pursuit of sustainable energy solutions, mitigating carbon dioxide (CO2) emissions stands as a pivotal challenge. With escalating atmospheric CO2 levels, the imperative of direct air capture (DAC) systems becomes evident. Simultaneously, green hydrogen (GH) emerges as a pivotal medium for renewable energy. Nevertheless, the substantial expenses associated with these technologies impe…

    Submitted 2 June, 2024; originally announced June 2024.

  16. arXiv:2405.19346  [pdf, other]

    eess.SP cs.AI cs.LG

    Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification

    Authors: Sion An, Myeongkyun Kang, Soopil Kim, Philip Chikontwe, Li Shen, Sang Hyun Park

    Abstract: Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted at MICCAI 2024

  17. arXiv:2405.18701  [pdf, other]

    eess.SP

    Near-Field Localization with RIS via Two-Dimensional Signal Path Classification

    Authors: Jeongwan Kang, Seung-Woo Ko, Sunwoo Kim

    Abstract: In this paper, we propose two-dimensional signal path classification (2D-SPC) for reconfigurable intelligent surface (RIS)-assisted near-field (NF) localization. In the NF regime, multiple RIS-driven signal paths (SPs) can contribute to precise localization if these are decomposable and the reflected locations on the RIS are known, referred to as SP decomposition (SPD) and SP labeling (SPL), respe…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 15 pages, 12 figures, Submitted to IEEE Transactions on Wireless Communications

  18. arXiv:2405.13413  [pdf, other]

    cs.IT cs.LG eess.SP

    Boosted Neural Decoders: Achieving Extreme Reliability of LDPC Codes for 6G Networks

    Authors: Hee-Youl Kwak, Dae-Young Yun, Yongjune Kim, Sang-Hyo Kim, Jong-Seon No

    Abstract: Ensuring extremely high reliability is essential for channel coding in 6G networks. The next-generation of ultra-reliable and low-latency communications (xURLLC) scenario within 6G networks requires a frame error rate (FER) below $10^{-9}$. However, low-density parity-check (LDPC) codes, the standard in 5G new radio (NR), encounter a challenge known as the error floor phenomenon, which hinders to achie…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 12 pages, 11 figures

  19. arXiv:2405.11807  [pdf, other]

    cs.HC cs.RO eess.SY

    Dual-sided Peltier Elements for Rapid Thermal Feedback in Wearables

    Authors: Seongjun Kang, Gwangbin Kim, Seokhyun Hwang, Jeongju Park, Ahmed Elsharkawy, SeungJun Kim

    Abstract: This paper introduces a motor-driven Peltier device designed to deliver immediate thermal sensations within extended reality (XR) environments. The system incorporates eight motor-driven Peltier elements, facilitating swift transitions between warm and cool sensations by rotating preheated or cooled elements to opposite sides. A multi-layer structure, comprising aluminum and silicone layers, ensur…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 3 pages, 4 figures, ICRA Wearable Workshop 2024 - 1st Workshop on Advancing Wearable Devices and Applications through Novel Design, Sensing, Actuation, and AI

  20. arXiv:2405.07255  [pdf, ps, other]

    eess.SP

    Deep Learning-aided Parametric Sparse Channel Estimation for Terahertz Massive MIMO Systems

    Authors: Jinhong Kim, Yongjun Ahn, Seungnyun Kim, Byonghyo Shim

    Abstract: Terahertz (THz) communications is considered as one of key solutions to support extremely high data demand in 6G. One main difficulty of the THz communication is the severe signal attenuation caused by the foliage loss, oxygen/atmospheric absorption, body and hand losses. To compensate for the severe path loss, multiple-input-multiple-output (MIMO) antenna array-based beamforming has been widely u…

    Submitted 12 May, 2024; originally announced May 2024.

  21. arXiv:2405.06284  [pdf, other]

    eess.IV cs.CV cs.LG

    Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention

    Authors: Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim, Sang-Chul Lee

    Abstract: Generalizability in deep neural networks plays a pivotal role in medical image segmentation. However, deep learning-based medical image analyses tend to overlook the importance of frequency variance, which is critical element for achieving a model that is both modality-agnostic and domain-generalizable. Additionally, various models fail to account for the potential information loss that can arise…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted in Computer Vision and Pattern Recognition (CVPR) 2024

  22. arXiv:2405.05787  [pdf, other]

    cs.RO cs.CV eess.SY

    Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study

    Authors: Tianpeng Zhang, Sekeun Kim, Jerome Charton, Haitong Ma, Kyungsang Kim, Na Li, Quanzheng Li

    Abstract: The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) ta…

    Submitted 9 May, 2024; originally announced May 2024.

  23. arXiv:2405.04752  [pdf, other]

    eess.AS cs.SD

    HILCodec: High Fidelity and Lightweight Neural Audio Codec

    Authors: Sunghwan Ahn, Beom Jun Woo, Min Hyun Han, Chanyeong Moon, Nam Soo Kim

    Abstract: The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consist…

    Submitted 7 May, 2024; originally announced May 2024.

  24. arXiv:2405.02066  [pdf, other]

    cs.CV eess.IV

    WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights

    Authors: Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim

    Abstract: The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representat…

    Submitted 27 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  25. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  26. arXiv:2404.16065  [pdf, other]

    cs.HC eess.SP

    mmWave Wearable Antenna for Interaction with VR Devices

    Authors: Haksun Son, Song Min Kim

    Abstract: The VR industry is one of the most promising industries for the near future, as it can provide a more immersive connection between people and the virtual world. Currently, VR devices interact with people using inconvenient controllers or cameras that perform poorly in dark environments. Interaction through millimeter-wave wearable devices has the potential to conveniently track human behavior rega…

    Submitted 19 April, 2024; originally announced April 2024.

  27. arXiv:2404.15302  [pdf, other]

    eess.SP math.OC math.ST

    Robust Phase Retrieval by Alternating Minimization

    Authors: Seonho Kim, Kiryung Lee

    Abstract: We consider a least absolute deviation (LAD) approach to the robust phase retrieval problem that aims to recover a signal from its absolute measurements corrupted with sparse noise. To solve the resulting non-convex optimization problem, we propose a robust alternating minimization (Robust-AM) derived as an unconstrained Gauss-Newton method. To solve the inner optimization arising in each step of…

    Submitted 28 March, 2024; originally announced April 2024.

  28. arXiv:2404.06902  [pdf, other]

    eess.SY

    Spatiotemporal Analysis of Shared Situation Awareness among Connected Vehicles

    Authors: Seungmo Kim

    Abstract: Shared situation awareness (SSA) has been garnering explosive interest in various applications for intelligent transportation systems (ITS). In addition, the delay-constrained nature of supporting vehicular networks makes it critical to precisely analyze the performance of a SSA procedure. Extending the relevant literature, this paper provides an analysis framework that evaluates the performance o…

    Submitted 10 April, 2024; originally announced April 2024.

  29. arXiv:2403.11433  [pdf, ps, other]

    quant-ph cs.CR cs.IT eess.SY

    Measuring Quantum Information Leakage Under Detection Threat

    Authors: Farhad Farokhi, Sejeong Kim

    Abstract: Gentle quantum leakage is proposed as a measure of information leakage to arbitrary eavesdroppers that aim to avoid detection. Gentle (also sometimes referred to as weak or non-demolition) measurements are used to encode the desire of the eavesdropper to evade detection. The gentle quantum leakage meets important axioms proposed for measures of information leakage including positivity, independenc…

    Submitted 17 March, 2024; originally announced March 2024.

  30. arXiv:2403.10186  [pdf, ps, other]

    cs.NI eess.SP

    Is Wireless Bad for Consensus in Blockchain?

    Authors: Seungmo Kim

    Abstract: This paper examines how wireless communication affects the performance of various blockchain consensus mechanisms, focusing on their scalability and decentralization. It introduces an analytical framework for quantifying these effects, backed by extensive simulations, underscoring its broad applicability to various consensus mechanisms despite wireless communication's unreliability.

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: This manuscript was accepted for publication at IEEE International Conference on Blockchain and Cryptocurrency 2024

  31. arXiv:2403.10009  [pdf]

    eess.IV cs.CV

    Cardiac Magnetic Resonance 2D+T Short- and Long-axis Segmentation via Spatio-temporal SAM Adaptation

    Authors: Zhennong Chen, Sekeun Kim, Hui Ren, Quanzheng Li, Xiang Li

    Abstract: Accurate 2D+T myocardium segmentation in cine cardiac magnetic resonance (CMR) scans is essential to analyze LV motion throughout the cardiac cycle comprehensively. The Segment Anything Model (SAM), known for its accurate segmentation and zero-shot generalization, has not yet been tailored for CMR 2D+T segmentation. We therefore introduce CMR2D+T-SAM, a novel approach to adapt SAM for CMR 2D+T seg…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures

  32. arXiv:2403.09967  [pdf, other]

    eess.SP

    NR-Surface: NextG-ready $μ$W-reconfigurable mmWave Metasurface

    Authors: Minseok Kim, Namjo Ahn, Song Min Kim

    Abstract: Metasurface has recently emerged as an economic solution to expand mmWave coverage. However, their pervasive deployment remains a challenge, mainly due to the difficulty in reaching the tight 260ns NR synchronization requirement and real-time wireless reconfiguration while maintaining multi-year battery life. This paper presents NR-Surface, the first real-time reconfigurable metasurface fully comp…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 17 pages, 28 figures, to be published in NSDI '24

  33. arXiv:2402.17790  [pdf, other]

    eess.SP cs.LG

    EEG classifier cross-task transfer to avoid training sessions in robot-assisted rehabilitation

    Authors: Niklas Kueper, Su Kyoung Kim, Elsa Andrea Kirchner

    Abstract: Background: For an individualized support of patients during rehabilitation, learning of individual machine learning models from the human electroencephalogram (EEG) is required. Our approach allows labeled training data to be recorded without the need for a specific training session. For this, the planned exoskeleton-assisted rehabilitation enables bilateral mirror therapy, in which movement inte…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 11 pages, 6 figures, 1 table

    MSC Class: 68

  34. arXiv:2402.16307  [pdf, ps, other]

    eess.SP

    Analyzing Downlink Coverage in Clustered Low Earth Orbit Satellite Constellations: A Stochastic Geometry Approach

    Authors: Miyeon Lee, Sucheol Kim, Minje Kim, Dong-Hyun Jung, Junil Choi

    Abstract: Satellite networks are emerging as vital solutions for global connectivity beyond 5G. As companies such as SpaceX, OneWeb, and Amazon are poised to launch a large number of satellites in low Earth orbit, the heightened inter-satellite interference caused by mega-constellations has become a significant concern. To address this challenge, recent works have introduced the concept of satellite cluster…

    Submitted 29 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: submitted to IEEE Transactions on Communications

  35. arXiv:2402.15539  [pdf, ps, other]

    eess.AS cs.CL

    Speech Corpus for Korean Children with Autism Spectrum Disorder: Towards Automatic Assessment Systems

    Authors: Seonwoo Lee, Jihyun Mun, Sunhee Kim, Minhwa Chung

    Abstract: Despite the growing demand for digital therapeutics for children with Autism Spectrum Disorder (ASD), there is currently no speech corpus available for Korean children with ASD. This paper introduces a speech corpus specifically designed for Korean children with ASD, aiming to advance speech technologies such as pronunciation and severity evaluation. Speech recordings from speech and language eval…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 11 pages, Accepted for LREC-COLING 2024

  36. arXiv:2402.13820  [pdf, other]

    cs.LG cs.AI cs.RO eess.SP eess.SY

    FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning

    Authors: Chenhao Li, Elijah Stanger-Jones, Steve Heim, Sangbae Kim

    Abstract: Motion trajectories offer reliable references for physics-based motion learning but suffer from sparsity, particularly in regions that lack sufficient data coverage. To address this challenge, we introduce a self-supervised, structured representation and generation method that extracts spatial-temporal relationships in periodic or quasi-periodic motions. The motion dynamics in a continuously param…

    Submitted 21 February, 2024; originally announced February 2024.

  37. arXiv:2402.06134  [pdf, other]

    eess.SP

    Can 5G Coexist with Satellite Uplink in 28 GHz Band?

    Authors: Bryce Jeffrey, Seungmo Kim

    Abstract: 5G standalone (SA) rollout is right around the corner. 28 GHz band is considered as one of the main spectrum bands for the 5G SA in many countries. However, the band has already been occupied by uplink of the fixed satellite service (FSS). Due to high equivalent isotropic radiated power (EIRP) adopted by the FSS, the interference that FSS may cause into 5G is garnering research interest. This rese…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: To appear in Proc. IEEE SoutheastCon 2024

  38. arXiv:2402.05350  [pdf, other]

    cs.CV eess.IV

    Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

    Authors: Junghun Cha, Ali Haider, Seoyun Yang, Hoeyeong Jin, Subin Yang, A. F. M. Shahab Uddin, Jaehyoung Kim, Soo Ye Kim, Sung-Ho Bae

    Abstract: A significant volume of analog information, i.e., documents and images, have been digitized in the form of scanned copies for storing, sharing, and/or analyzing in the digital world. However, the quality of such contents is severely degraded by various distortions caused by printing, storing, and scanning processes in the physical world. Although restoring high-quality content from scanned copies…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted to AAAI 2024

  39. arXiv:2401.15844  [pdf, other]

    eess.SY

    Cross-Layer Performance Evaluation of C-V2X

    Authors: Dhruba Sunuwar, Seungmo Kim

    Abstract: As self-driving cars increasingly penetrate our daily lives, vehicle-to-everything (V2X) communications are emerging as one of the key enabler technologies. However, the dynamicity of vehicles (one of whose causes is the mobility of vehicles) often complicates it even further to evaluate the performance of a V2X system. We have been building a system-level simulator dedicated to assessing the perf…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: This is an extended abstract that was accepted on 01/22/2024 for publication to IEEE SoutheastCon 2024

  40. arXiv:2401.13851  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

    Authors: Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro

    Abstract: In this paper, we describe the TTS models developed by NVIDIA for the MMITS-VC (Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024 Challenge. In Tracks 1 and 2, we utilize RAD-MMM to perform few-shot TTS by training additionally on 5 minutes of target speaker data. In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets.…

    Submitted 29 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Presentation accepted at ICASSP 2024

  41. TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data

    Authors: Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Although there has been significant advancement in the field of speech-to-speech translation, conventional models still require language-parallel speech data between the source and target languages for training. In this paper, we introduce TranSentence, a novel speech-to-speech translation without language-parallel speech data. To achieve this, we first adopt a language-agnostic sentence-level spe…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  42. arXiv:2401.10250  [pdf, other]

    cs.NI eess.SY

    Spectrum Sharing through Marketplaces for O-RAN based Non-Terrestrial and Terrestrial Networks

    Authors: Jinho Choi, Bohai Li, Bassel Al Homssi, Jihong Park, Seung-Lyun Kim

    Abstract: Non-terrestrial networks (NTNs), including low Earth orbit (LEO) satellites, are expected to play a pivotal role in achieving global coverage for Internet-of-Things (IoT) applications in sixth-generation (6G) systems. Although specific frequency bands have been identified for satellite use in NTNs, persistent challenges arise due to the limited availability of spectrum resources. The coexistence o…

    Submitted 16 December, 2023; originally announced January 2024.

    Comments: 7 pages, 5 figures

  43. arXiv:2401.06966  [pdf, other]

    eess.SP

    Near-Field Channel Estimation for XL-RIS Assisted Multi-User XL-MIMO Systems: Hybrid Beamforming Architectures

    Authors: Jeongjae Lee, Hyeongjin Chung, Yunseong Cho, Sunwoo Kim, Songnam Hong

    Abstract: Channel estimation is one of the key challenges for the deployment of extremely large-scale reconfigurable intelligent surface (XL-RIS) assisted multiple-input multiple-output (MIMO) systems. In this paper, we study the channel estimation problem for XL-RIS assisted multi-user XL-MIMO systems with hybrid beamforming structures. For this system, we propose a unified channel estimation method…

    Submitted 25 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: submitted to IEEE Transactions on Communications

  44. arXiv:2401.01498  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

    Authors: Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

    Abstract: We propose a novel text-to-speech (TTS) framework centered around a neural transducer. Our approach divides the whole TTS pipeline into semantic-level sequence-to-sequence (seq2seq) modeling and fine-grained acoustic modeling stages, utilizing discrete semantic tokens obtained from wav2vec2.0 embeddings. For robust and efficient alignment modeling, we employ a neural transducer named token trans…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  45. arXiv:2401.01099  [pdf, other]

    eess.AS cs.AI cs.LG

    Efficient Parallel Audio Generation using Group Masked Language Modeling

    Authors: Myeonghun Jeong, Minchan Kim, Joun Yeop Lee, Nam Soo Kim

    Abstract: We present a fast and high-quality codec language model for parallel audio generation. While SoundStorm, a state-of-the-art parallel audio generation model, accelerates inference speed compared to autoregressive models, it still suffers from slow inference due to iterative sampling. To resolve this problem, we propose Group-Masked Language Modeling (G-MLM) and Group Iterative Parallel Decoding (G-…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  46. arXiv:2312.16255  [pdf, other]

    eess.SP

    In-Lab Implementation of DSRC PHY Layer

    Authors: Leighton Thompson, Seungmo Kim

    Abstract: Connected and autonomous vehicles are already right around the corner of our everyday life. One of the key technologies actualizing the connected vehicles is vehicle-to-everything communications (V2X), which has been enhanced along the lines of two technologies--i.e., dedicated short-range communications (DSRC) and cellular V2X (C-V2X). While the United States (U.S.) federal government is on the m…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: text overlap with arXiv:2003.09724

  47. arXiv:2312.09446  [pdf, other]

    eess.SP cs.AI cs.CV

    A Distributed Inference System for Detecting Task-wise Single Trial Event-Related Potential in Stream of Satellite Images

    Authors: Sung-Jin Kim, Heon-Gyu Kwak, Hyeon-Taek Han, Dae-Hyeok Lee, Ji-Hoon Jeong, Seong-Whan Lee

    Abstract: Brain-computer interface (BCI) has garnered significant attention for its potential in various applications, with event-related potential (ERP) playing a considerable role in BCI systems. This paper introduces a novel Distributed Inference System tailored for detecting task-wise single-trial ERPs in a stream of satellite images. Unlike traditional methodologies that employ a single model…

    Submitted 10 November, 2023; originally announced December 2023.

  48. arXiv:2312.09423  [pdf, other]

    eess.SP cs.AI cs.LG

    Decoding EEG-based Workload Levels Using Spatio-temporal Features Under Flight Environment

    Authors: Dae-Hyeok Lee, Sung-Jin Kim, Si-Hyun Kim, Seong-Whan Lee

    Abstract: The detection of pilots' mental states is important due to the potential for their abnormal mental states to result in catastrophic accidents. This study introduces the feasibility of employing deep learning techniques to classify different workload levels, specifically normal state, low workload, and high workload. To the best of our knowledge, this study is the first attempt to classify workload…

    Submitted 10 November, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, 1 table, 1 algorithm

  49. arXiv:2312.09040  [pdf, other]

    cs.SD cs.CL eess.AS

    STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models

    Authors: Kangwook Jang, Sungnyun Kim, Hoirin Kim

    Abstract: Despite the great performance of Transformer-based speech self-supervised learning (SSL) models, their large parameter size and computational cost make them unfavorable to utilize. In this study, we propose to compress the speech SSL models by distilling speech temporal relation (STaR). Unlike previous works that directly match the representation for each speech frame, STaR distillation transfers tempor…

    Submitted 25 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024 Best Student Paper Award. Code URL: https://github.com/sungnyun/ARMHuBERT

  50. arXiv:2312.06065  [pdf, other]

    eess.AS cs.SD

    EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

    Authors: Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim

    Abstract: In recent years, there have been studies to further improve the end-to-end neural speaker diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel framework utilizing demultiplexed speaker embeddings. In this work, we focus on disentangling speaker-relevant information in the latent space and then transform each separated latent variable into its corresponding speech activity…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Signal Processing Letters