Search | arXiv e-print repository

Overlay Space-Air-Ground Integrated Networks with SWIPT-Empowered Aerial Communications

Authors: Anuradha Verma, Pankaj Kumar Sharma, Pawan Kumar, Dong In Kim

Abstract: In this article, we consider overlay space-air-ground integrated networks (OSAGINs) where a low earth orbit (LEO) satellite communicates with ground users (GUs) with the assistance of an energy-constrained coexisting air-to-air (A2A) network. Particularly, a non-linear energy harvester with a hybrid SWIPT utilizing both power-splitting and time-switching energy harvesting (EH) techniques is employ… ▽ More In this article, we consider overlay space-air-ground integrated networks (OSAGINs) where a low earth orbit (LEO) satellite communicates with ground users (GUs) with the assistance of an energy-constrained coexisting air-to-air (A2A) network. Particularly, a non-linear energy harvester with a hybrid SWIPT utilizing both power-splitting and time-switching energy harvesting (EH) techniques is employed at the aerial transmitter. Specifically, we take the random locations of the satellite, ground and aerial receivers to investigate the outage performance of both the satellite-to-ground and aerial networks leveraging the stochastic tools. By taking into account the Shadowed-Rician fading for satellite link, the Nakagami-\emph{m} for ground link, and the Rician fading for aerial link, we derive analytical expressions for the outage probability of these networks. For a comprehensive analysis of aerial network, we consider both the perfect and imperfect successive interference cancellation (SIC) scenarios. Through our analysis, we illustrate that, unlike linear EH, the implementation of non-linear EH provides accurate figures for any target rate, underscoring the significance of using non-linear EH models. Additionally, the influence of key parameters is emphasized, providing guidelines for the practical design of an energy-efficient as well as spectrum-efficient future non-terrestrial networks. Monte Carlo simulations validate the accuracy of our theoretical developments. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 36 pages, 14 figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2406.11427 [pdf, other]

DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

Authors: Keon Lee, Dong Won Kim, Jaehyeon Kim, Jaewoong Cho

Abstract: Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models f… ▽ More Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models for TTS. In this work, we present an efficient and scalable Diffusion Transformer (DiT) that utilizes off-the-shelf pre-trained text and speech encoders. Our approach addresses the challenge of text-speech alignment via cross-attention mechanisms with the prediction of the total length of speech representations. To achieve this, we enhance the DiT architecture to suit TTS and improve the alignment by incorporating semantic guidance into the latent space of speech. We scale the training dataset and the model size to 82K hours and 790M parameters, respectively. Our extensive experiments demonstrate that the large-scale diffusion model for TTS without domain-specific modeling not only simplifies the training pipeline but also yields superior or comparable zero-shot performance to state-of-the-art TTS models in terms of naturalness, intelligibility, and speaker similarity. Our speech samples are available at https://ditto-tts.github.io. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.08714 [pdf, other]

Real-time Digital RF Emulation -- II: A Near Memory Custom Accelerator

Authors: Mandovi Mukherjee, Xiangyu Mao, Nael Rahman, Coleman DeLude, Joe Driscoll, Sudarshan Sharma, Payman Behnam, Uday Kamal, Jongseok Woo, Daehyun Kim, Sharjeel Khan, Jianming Tong, Jamin Seo, Prachi Sinha, Madhavan Swaminathan, Tushar Krishna, Santosh Pande, Justin Romberg, Saibal Mukhopadhyay

Abstract: A near memory hardware accelerator, based on a novel direct path computational model, for real-time emulation of radio frequency systems is demonstrated. Our evaluation of hardware performance uses both application-specific integrated circuits (ASIC) and field programmable gate arrays (FPGA) methodologies: 1). The ASIC testchip implementation, using TSMC 28nm CMOS, leverages distributed autonomous… ▽ More A near memory hardware accelerator, based on a novel direct path computational model, for real-time emulation of radio frequency systems is demonstrated. Our evaluation of hardware performance uses both application-specific integrated circuits (ASIC) and field programmable gate arrays (FPGA) methodologies: 1). The ASIC testchip implementation, using TSMC 28nm CMOS, leverages distributed autonomous control to extract concurrency in compute as well as low latency. It achieves a $518$ MHz per channel bandwidth in a prototype $4$-node system. The maximum emulation range supported in this paradigm is $9.5$ km with $0.24$ $μ$s of per-sample emulation latency. 2). The FPGA-based implementation, evaluated on a Xilinx ZCU104 board, demonstrates a $9$-node test case (two Transmitters, one Receiver, and $6$ passive reflectors) with an emulation range of $1.13$ km to $27.3$ km at $215$ MHz bandwidth. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.02562 [pdf, other]

Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

Authors: Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

Abstract: In recent times, there has been a growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model in the on-device is inefficient, and sometimes limited due to computational cost. To tackle the problem, this paper presents the weights separation method to minimize on-device model weights using parameter… ▽ More In recent times, there has been a growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model in the on-device is inefficient, and sometimes limited due to computational cost. To tackle the problem, this paper presents the weights separation method to minimize on-device model weights using parameter-efficient fine-tuning methods. Moreover, some people speak multiple languages in an utterance, as known as code-switching, the personalized ASR model is necessary to address such cases. However, current multilingual speech recognition models are limited to recognizing a single language within each utterance. To tackle this problem, we propose code-switching speech recognition models that incorporate fine-tuned monolingual and multilingual speech recognition models. Additionally, we introduce a gated low-rank adaptation(GLoRA) for parameter-efficient fine-tuning with minimal performance degradation. Our experiments, conducted on Korean-English code-switching datasets, demonstrate that fine-tuning speech recognition models for code-switching surpasses the performance of traditional code-switching speech recognition models trained from scratch. Furthermore, GLoRA enhances parameter-efficient fine-tuning performance compared to conventional LoRA. △ Less

Submitted 23 April, 2024; originally announced June 2024.

Comments: Table 2 is revised

Journal ref: ICASSP 2024 Workshop(HSCMA 2024) paper

arXiv:2405.18503 [pdf, other]

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Authors: Koichi Saito, Dongjun Kim, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong, Yuhta Takida, Yuki Mitsufuji

Abstract: Sound content is an indispensable element for multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for the creators. However, despite producing high-quality sounds, these models often suffer from slow inference speeds. This drawback burdens creators, who typically refine their sounds through trial and error… ▽ More Sound content is an indispensable element for multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for the creators. However, despite producing high-quality sounds, these models often suffer from slow inference speeds. This drawback burdens creators, who typically refine their sounds through trial and error to align them with their artistic intentions. To address this issue, we introduce Sound Consistency Trajectory Models (SoundCTM). Our model enables flexible transitioning between high-quality 1-step sound generation and superior sound quality through multi-step generation. This allows creators to initially control sounds with 1-step samples before refining them through multi-step generation. While CTM fundamentally achieves flexible 1-step and multi-step generation, its impressive performance heavily depends on an additional pretrained feature extractor and an adversarial loss, which are expensive to train and not always available in other domains. Thus, we reframe CTM's training framework and introduce a novel feature distance by utilizing the teacher's network for a distillation loss. Additionally, while distilling classifier-free guided trajectories, we train conditional and unconditional student models simultaneously and interpolate between these models during inference. We also propose training-free controllable frameworks for SoundCTM, leveraging its flexible sampling capability. SoundCTM achieves both promising 1-step and multi-step real-time sound generation without using any extra off-the-shelf networks. Furthermore, we demonstrate SoundCTM's capability of controllable sound generation in a training-free manner. Our codes, pretrained models, and audio samples are available at https://github.com/sony/soundctm. △ Less

Submitted 10 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Audio samples: https://koichi-saito-sony.github.io/soundctm/. Codes: https://github.com/sony/soundctm. Checkpoints: https://huggingface.co/Sony/soundctm

arXiv:2405.18012 [pdf, other]

Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, **young Park, Yooseung Wang, Donguk Kim, Changick Kim

Abstract: Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction… ▽ More Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction among actors and their activity. Flaming-Net leverages an additional optical flow modality in the training stage to enhance its motion awareness when finding locally active actors. The first pathway of the relation module, the actor-centric path, initially captures the temporal dynamics of individual actors and then constructs inter-actor relationships. In parallel, the group-centric path starts by building spatial connections between actors within the same timeframe and then captures simultaneous spatio-temporal dynamics among them. We demonstrate that Flaming-Net achieves new state-of-the-art WSGAR results on two benchmarks, including a 2.8%p higher MPCA score on the NBA dataset. Importantly, we use the optical flow modality only for training and not for inference. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.11133 [pdf]

XCAT-3.0: A Comprehensive Library of Personalized Digital Twins Derived from CT Scans

Authors: Lavsen Dahal, Mobina Ghojoghnejad, Dhrubajyoti Ghosh, Yubraj Bhandari, David Kim, Fong Chi Ho, Fakrul Islam Tushar, Sheng Luoa, Kyle J. Lafata, Ehsan Abadi, Ehsan Samei, Joseph Y. Lo, W. Paul Segars

Abstract: Virtual Imaging Trials (VIT) offer a cost-effective and scalable approach for evaluating medical imaging technologies. Computational phantoms, which mimic real patient anatomy and physiology, play a central role in VIT. However, the current libraries of computational phantoms face limitations, particularly in terms of sample size and diversity. Insufficient representation of the population hampers… ▽ More Virtual Imaging Trials (VIT) offer a cost-effective and scalable approach for evaluating medical imaging technologies. Computational phantoms, which mimic real patient anatomy and physiology, play a central role in VIT. However, the current libraries of computational phantoms face limitations, particularly in terms of sample size and diversity. Insufficient representation of the population hampers accurate assessment of imaging technologies across different patient groups. Traditionally, phantoms were created by manual segmentation, which is a laborious and time-consuming task, impeding the expansion of phantom libraries. This study presents a framework for realistic computational phantom modeling using a suite of four deep learning segmentation models, followed by three forms of automated organ segmentation quality control. Over 2500 computational phantoms with up to 140 structures illustrating a sophisticated approach to detailed anatomical modeling are released. Phantoms are available in both voxelized and surface mesh formats. The framework is aggregated with an in-house CT scanner simulator to produce realistic CT images. The framework can potentially advance virtual imaging trials, facilitating comprehensive and reliable evaluations of medical imaging technologies. Phantoms may be requested at https://cvit.duke.edu/resources/, code, model weights, and sample CT images are available at https://xcat-3.github.io. △ Less

Submitted 1 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.05107 [pdf, other]

Leveraging AES Padding: dBs for Nothing and FEC for Free in IoT Systems

Authors: Jongchan Woo, Vipindev Adat Vasudevan, Benjamin D. Kim, Rafael G. L. D'Oliveira, Alejandro Cohen, Thomas Stahlbuhk, Ken R. Duffy, Muriel Médard

Abstract: The Internet of Things (IoT) represents a significant advancement in digital technology, with its rapidly growing network of interconnected devices. This expansion, however, brings forth critical challenges in data security and reliability, especially under the threat of increasing cyber vulnerabilities. Addressing the security concerns, the Advanced Encryption Standard (AES) is commonly employed… ▽ More The Internet of Things (IoT) represents a significant advancement in digital technology, with its rapidly growing network of interconnected devices. This expansion, however, brings forth critical challenges in data security and reliability, especially under the threat of increasing cyber vulnerabilities. Addressing the security concerns, the Advanced Encryption Standard (AES) is commonly employed for secure encryption in IoT systems. Our study explores an innovative use of AES, by repurposing AES padding bits for error correction and thus introducing a dual-functional method that seamlessly integrates error-correcting capabilities into the standard encryption process. The integration of the state-of-the-art Guessing Random Additive Noise Decoder (GRAND) in the receiver's architecture facilitates the joint decoding and decryption process. This strategic approach not only preserves the existing structure of the transmitter but also significantly enhances communication reliability in noisy environments, achieving a notable over 3 dB gain in Block Error Rate (BLER). Remarkably, this enhanced performance comes with a minimal power overhead at the receiver - less than 15% compared to the traditional decryption-only process, underscoring the efficiency of our hardware design for IoT applications. This paper discusses a comprehensive analysis of our approach, particularly in energy efficiency and system performance, presenting a novel and practical solution for reliable IoT communications. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.18705 [pdf, other]

Wireless Information and Energy Transfer in the Era of 6G Communications

Authors: Constantinos Psomas, Konstantinos Ntougias, Nikita Shanin, Dongfang Xu, Kenneth MacSporran Mayer, Nguyen Minh Tran, Laura Cottatellucci, Kae Won Choi, Dong In Kim, Robert Schober, Ioannis Krikidis

Abstract: Wireless information and energy transfer (WIET) represents an emerging paradigm which employs controllable transmission of radio-frequency signals for the dual purpose of data communication and wireless charging. As such, WIET is widely regarded as an enabler of envisioned 6G use cases that rely on energy-sustainable Internet-of-Things (IoT) networks, such as smart cities and smart grids. Meeting… ▽ More Wireless information and energy transfer (WIET) represents an emerging paradigm which employs controllable transmission of radio-frequency signals for the dual purpose of data communication and wireless charging. As such, WIET is widely regarded as an enabler of envisioned 6G use cases that rely on energy-sustainable Internet-of-Things (IoT) networks, such as smart cities and smart grids. Meeting the quality-of-service demands of WIET, in terms of both data transfer and power delivery, requires effective co-design of the information and energy signals. In this article, we present the main principles and design aspects of WIET, focusing on its integration in 6G networks. First, we discuss how conventional communication notions such as resource allocation and waveform design need to be revisited in the context of WIET. Next, we consider various candidate 6G technologies that can boost WIET efficiency, namely, holographic multiple-input multiple-output, near-field beamforming, terahertz communication, intelligent reflecting surfaces (IRSs), and reconfigurable (fluid) antenna arrays. We introduce respective WIET design methods, analyze the promising performance gains of these WIET systems, and discuss challenges, open issues, and future research directions. Finally, a near-field energy beamforming scheme and a power-based IRS beamforming algorithm are experimentally validated using a wireless energy transfer testbed. The vision of WIET in communication systems has been gaining momentum in recent years, with constant progress with respect to theoretical but also practical aspects. The comprehensive overview of the state of the art of WIET presented in this paper highlights the potentials of WIET systems as well as their overall benefits in 6G networks. △ Less

Submitted 16 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: Proceedings of the IEEE, 36 pages, 33 figures

arXiv:2404.17585 [pdf, other]

NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG

Authors: Cheol-Hui Lee, Hakseung Kim, Hyun-jee Han, Min-Kyung Jung, Byung C. Yoon, Dong-Joo Kim

Abstract: The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need fo… ▽ More The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need for large datasets with labels and the inherent biases in human-generated annotations. This paper introduces NeuroNet, a self-supervised learning (SSL) framework designed to effectively harness unlabeled single-channel sleep electroencephalogram (EEG) signals by integrating contrastive learning tasks and masked prediction tasks. NeuroNet demonstrates superior performance over existing SSL methodologies through extensive experimentation conducted across three polysomnography (PSG) datasets. Additionally, this study proposes a Mamba-based temporal context module to capture the relationships among diverse EEG epochs. Combining NeuroNet with the Mamba-based temporal context module has demonstrated the capability to achieve, or even surpass, the performance of the latest supervised learning methodologies, even with a limited amount of labeled data. This study is expected to establish a new benchmark in sleep stage classification, promising to guide future research and applications in the field of sleep analysis. △ Less

Submitted 13 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 14 pages, 4 figures

arXiv:2404.15333 [pdf, other]

EB-GAME: A Game-Changer in ECG Heartbeat Anomaly Detection

Authors: JuneYoung Park, Da Young Kim, Yunsoo Kim, Jisu Yoo, Tae Joon Kim

Abstract: Cardiologists use electrocardiograms (ECG) for the detection of arrhythmias. However, continuous monitoring of ECG signals to detect cardiac abnormal-ities requires significant time and human resources. As a result, several deep learning studies have been conducted in advance for the automatic detection of arrhythmia. These models show relatively high performance in supervised learning, but are no… ▽ More Cardiologists use electrocardiograms (ECG) for the detection of arrhythmias. However, continuous monitoring of ECG signals to detect cardiac abnormal-ities requires significant time and human resources. As a result, several deep learning studies have been conducted in advance for the automatic detection of arrhythmia. These models show relatively high performance in supervised learning, but are not applicable in cases with few training examples. This is because abnormal ECG data is scarce compared to normal data in most real-world clinical settings. Therefore, in this study, GAN-based anomaly detec-tion, i.e., unsupervised learning, was employed to address the issue of data imbalance. This paper focuses on detecting abnormal signals in electrocardi-ograms (ECGs) using only labels from normal signals as training data. In-spired by self-supervised vision transformers, which learn by dividing images into patches, and masked auto-encoders, known for their effectiveness in patch reconstruction and solving information redundancy, we introduce the ECG Heartbeat Anomaly Detection model, EB-GAME. EB-GAME was trained and validated on the MIT-BIH Arrhythmia Dataset, where it achieved state-of-the-art performance on this benchmark. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.14140 [pdf, other]

Generative Artificial Intelligence Assisted Wireless Sensing: Human Flow Detection in Practical Communication Environments

Authors: Jiacheng Wang, Hongyang Du, Dusit Niyato, Zehui Xiong, Jiawen Kang, Bo Ai, Zhu Han, Dong In Kim

Abstract: Groundbreaking applications such as ChatGPT have heightened research interest in generative artificial intelligence (GAI). Essentially, GAI excels not only in content generation but also in signal processing, offering support for wireless sensing. Hence, we introduce a novel GAI-assisted human flow detection system (G-HFD). Rigorously, G-HFD first uses channel state information (CSI) to estimate t… ▽ More Groundbreaking applications such as ChatGPT have heightened research interest in generative artificial intelligence (GAI). Essentially, GAI excels not only in content generation but also in signal processing, offering support for wireless sensing. Hence, we introduce a novel GAI-assisted human flow detection system (G-HFD). Rigorously, G-HFD first uses channel state information (CSI) to estimate the velocity and acceleration of propagation path length change of the human-induced reflection (HIR). Then, given the strong inference ability of the diffusion model, we propose a unified weighted conditional diffusion model (UW-CDM) to denoise the estimation results, enabling the detection of the number of targets. Next, we use the CSI obtained by a uniform linear array with wavelength spacing to estimate the HIR's time of flight and direction of arrival (DoA). In this process, UW-CDM solves the problem of ambiguous DoA spectrum, ensuring accurate DoA estimation. Finally, through clustering, G-HFD determines the number of subflows and the number of targets in each subflow, i.e., the subflow size. The evaluation based on practical downlink communication signals shows G-HFD's accuracy of subflow size detection can reach 91%. This validates its effectiveness and underscores the significant potential of GAI in the context of wireless sensing. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.01661 [pdf, other]

Interaction-Aware Vehicle Motion Planning with Collision Avoidance Constraints in Highway Traffic

Authors: Dongryul Kim, Hyeonjeong Kim, Kyoungseok Han

Abstract: This paper proposes collision-free optimal trajectory planning for autonomous vehicles in highway traffic, where vehicles need to deal with the interaction among each other. To address this issue, a novel optimal control framework is suggested, which couples the trajectory of surrounding vehicles with collision avoidance constraints. Additionally, we describe a trajectory optimization technique un… ▽ More This paper proposes collision-free optimal trajectory planning for autonomous vehicles in highway traffic, where vehicles need to deal with the interaction among each other. To address this issue, a novel optimal control framework is suggested, which couples the trajectory of surrounding vehicles with collision avoidance constraints. Additionally, we describe a trajectory optimization technique under state constraints, utilizing a planner based on Pontryagin's Minimum Principle, capable of numerically solving collision avoidance scenarios with surrounding vehicles. Simulation results demonstrate the effectiveness of the proposed approach regarding interaction-based motion planning for different scenarios. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.17420 [pdf, other]

Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

Authors: Dong** Kim, Sung ** Um, Sangmin Lee, Jung Uk Kim

Abstract: The goal of the multi-sound source localization task is to localize sound sources from the mixture individually. While recent multi-sound source localization methods have shown improved performance, they face challenges due to their reliance on prior information about the number of objects to be separated. In this paper, to overcome this limitation, we present a novel multi-sound source localizati… ▽ More The goal of the multi-sound source localization task is to localize sound sources from the mixture individually. While recent multi-sound source localization methods have shown improved performance, they face challenges due to their reliance on prior information about the number of objects to be separated. In this paper, to overcome this limitation, we present a novel multi-sound source localization method that can perform localization without prior knowledge of the number of sound sources. To achieve this goal, we propose an iterative object identification (IOI) module, which can recognize sound-making objects in an iterative manner. After finding the regions of sound-making objects, we devise object similarity-aware clustering (OSC) loss to guide the IOI module to effectively combine regions of the same object but also distinguish between different objects and backgrounds. It enables our method to perform accurate localization of sound-making objects without any prior knowledge. Extensive experimental results on the MUSIC and VGGSound benchmarks show the significant performance improvements of the proposed method over the existing methods for both single and multi-source. Our code is available at: https://github.com/VisualAIKHU/NoPrior_MultiSSL △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024

arXiv:2403.16477 [pdf, other]

Safeguarding Next Generation Multiple Access Using Physical Layer Security Techniques: A Tutorial

Authors: Lu Lv, Dongyang Xu, Rose Qingyang Hu, Yinghui Ye, Long Yang, Xianfu Lei, Xianbin Wang, Dong In Kim, Arumugam Nallanathan

Abstract: Driven by the ever-increasing requirements of ultra-high spectral efficiency, ultra-low latency, and massive connectivity, the forefront of wireless research calls for the design of advanced next generation multiple access schemes to facilitate provisioning of these stringent demands. This inspires the embrace of non-orthogonal multiple access (NOMA) in future wireless communication networks. Neve… ▽ More Driven by the ever-increasing requirements of ultra-high spectral efficiency, ultra-low latency, and massive connectivity, the forefront of wireless research calls for the design of advanced next generation multiple access schemes to facilitate provisioning of these stringent demands. This inspires the embrace of non-orthogonal multiple access (NOMA) in future wireless communication networks. Nevertheless, the support of massive access via NOMA leads to additional security threats, due to the open nature of the air interface, the broadcast characteristic of radio propagation as well as intertwined relationship among paired NOMA users. To address this specific challenge, the superimposed transmission of NOMA can be explored as new opportunities for security aware design, for example, multiuser interference inherent in NOMA can be constructively engineered to benefit communication secrecy and privacy. The purpose of this tutorial is to provide a comprehensive overview on the state-of-the-art physical layer security techniques that guarantee wireless security and privacy for NOMA networks, along with the opportunities, technical challenges, and future research trends. △ Less

Submitted 21 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Invited paper by Proceedings of the IEEE

arXiv:2403.11104 [pdf, other]

Deep Neural Network NMPC for Computationally Tractable Optimal Power Management of Hybrid Electric Vehicle

Authors: Suyong Park, Duc Giap Nguyen, **rak Park, Dohee Kim, Jeong Soo Eo, Kyoungseok Han

Abstract: This study presents a method for deep neural network nonlinear model predictive control (DNN-MPC) to reduce computational complexity, and we show its practical utility through its application in optimizing the energy management of hybrid electric vehicles (HEVs). For optimal power management of HEVs, we first design the online NMPC to collect the data set, and the deep neural network is trained to… ▽ More This study presents a method for deep neural network nonlinear model predictive control (DNN-MPC) to reduce computational complexity, and we show its practical utility through its application in optimizing the energy management of hybrid electric vehicles (HEVs). For optimal power management of HEVs, we first design the online NMPC to collect the data set, and the deep neural network is trained to approximate the NMPC solutions. We assess the effectiveness of our approach by conducting comparative simulations with rule and online NMPC-based power management strategies for HEV, evaluating both fuel consumption and computational complexity. Lastly, we verify the real-time feasibility of our approach through process-in-the-loop (PIL) testing. The test results demonstrate that the proposed method closely approximates the NMPC performance while substantially reducing the computational burden. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: 6 pages, 10 figures, 3 tables, 2024 ACC conference (accepted)

arXiv:2403.08187 [pdf, other]

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children wit… ▽ More This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned with a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 12 pages, 2 figures

ACM Class: I.2.7

arXiv:2403.04925 [pdf, ps, other]

Near Field Communications for DMA-NOMA Networks

Authors: Zheng Zhang, Yuanwei Liu, Zhaolin Wang, Jian Chen, Dong In Kim

Abstract: A novel near-field transmission framework is proposed for dynamic metasurface antenna (DMA)-enabled non-orthogonal multiple access (NOMA) networks. The base station (BS) exploits the hybrid beamforming to communicate with multiple near users (NUs) and far users (FUs) using the NOMA principle. Based on this framework, two novel beamforming schemes are proposed. 1) For the case of the grouped users… ▽ More A novel near-field transmission framework is proposed for dynamic metasurface antenna (DMA)-enabled non-orthogonal multiple access (NOMA) networks. The base station (BS) exploits the hybrid beamforming to communicate with multiple near users (NUs) and far users (FUs) using the NOMA principle. Based on this framework, two novel beamforming schemes are proposed. 1) For the case of the grouped users distributed in the same direction, a beam-steering scheme is developed. The metric of beam pattern error (BPE) is introduced for the characterization of the gap between the hybrid beamformers and the desired ideal beamformers, where a two-layer algorithm is proposed to minimize BPE by optimizing hybrid beamformers. Then, the optimal power allocation strategy is obtained to maximize the sum achievable rate of the network. 2) For the case of users randomly distributed, a beam-splitting scheme is proposed, where two sub-beamformers are extracted from the single beamformer to serve different users in the same group. An alternating optimization (AO) algorithm is proposed for hybrid beamformer optimization, and the optimal power allocation is also derived. Numerical results validate that: 1) the proposed beamforming schemes exhibit superior performance compared with the existing imperfect-resolution-based beamforming scheme; 2) the communication rate of the proposed transmission framework is sensitive to the imperfect distance knowledge of NUs but not to that of FUs. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 13 pages

arXiv:2402.09756 [pdf, other]

Mixture of Experts for Network Optimization: A Large Language Model-enabled Approach

Authors: Hongyang Du, Guangyuan Liu, Yi**g Lin, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim

Abstract: Optimizing various wireless user tasks poses a significant challenge for networking systems because of the expanding range of user requirements. Despite advancements in Deep Reinforcement Learning (DRL), the need for customized optimization tasks for individual users complicates develo** and applying numerous DRL models, leading to substantial computation resource and energy consumption and can… ▽ More Optimizing various wireless user tasks poses a significant challenge for networking systems because of the expanding range of user requirements. Despite advancements in Deep Reinforcement Learning (DRL), the need for customized optimization tasks for individual users complicates develo** and applying numerous DRL models, leading to substantial computation resource and energy consumption and can lead to inconsistent outcomes. To address this issue, we propose a novel approach utilizing a Mixture of Experts (MoE) framework, augmented with Large Language Models (LLMs), to analyze user objectives and constraints effectively, select specialized DRL experts, and weigh each decision from the participating experts. Specifically, we develop a gate network to oversee the expert models, allowing a collective of experts to tackle a wide array of new tasks. Furthermore, we innovatively substitute the traditional gate network with an LLM, leveraging its advanced reasoning capabilities to manage expert model selection for joint decisions. Our proposed method reduces the need to train new DRL models for each unique optimization problem, decreasing energy consumption and AI model implementation costs. The LLM-enabled MoE approach is validated through a general maze navigation task and a specific network service provider utility maximization task, demonstrating its effectiveness and practical applicability in optimizing complex networking systems. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2401.13936 [pdf, ps, other]

Learning-based sensing and computing decision for data freshness in edge computing-enabled networks

Authors: Sinwoong Yun, Dongsun Kim, Chanwon Park, Jemin Lee

Abstract: As the demand on artificial intelligence (AI)-based applications increases, the freshness of sensed data becomes crucial in the wireless sensor networks. Since those applications require a large amount of computation for processing the sensed data, it is essential to offload the computation load to the edge computing (EC) server. In this paper, we propose the sensing and computing decision (SCD) a… ▽ More As the demand on artificial intelligence (AI)-based applications increases, the freshness of sensed data becomes crucial in the wireless sensor networks. Since those applications require a large amount of computation for processing the sensed data, it is essential to offload the computation load to the edge computing (EC) server. In this paper, we propose the sensing and computing decision (SCD) algorithms for data freshness in the EC-enabled wireless sensor networks. We define the η-coverage probability to show the probability of maintaining fresh data for more than η ratio of the network, where the spatial-temporal correlation of information is considered. We then propose the probability-based SCD for the single pre-charged sensor case with providing the optimal point after deriving the η-coverage probability. We also propose the reinforcement learning (RL)- based SCD by training the SCD policy of sensors for both the single pre-charged and multiple energy harvesting (EH) sensor cases, to make a real-time decision based on its observation. Our simulation results verify the performance of the proposed algorithms under various environment settings, and show that the RL-based SCD algorithm achieves higher performance compared to baseline algorithms for both the single pre-charged sensor and multiple EH sensor cases. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: 15 pages

arXiv:2401.11429 [pdf, ps, other]

Joint Downlink and Uplink Optimization for RIS-Aided FDD MIMO Communication Systems

Authors: Gyoseung Lee, Hyeongtaek Lee, Donghwan Kim, Jaehoon Chung, A. Lee. Swindlehurst, Junil Choi

Abstract: This paper investigates reconfigurable intelligent surface (RIS)-aided frequency division duplexing (FDD) communication systems. Since the downlink and uplink signals are simultaneously transmitted in FDD, the phase shifts at the RIS should be designed to support both transmissions. Considering a single-user multiple-input multiple-output system, we formulate a weighted sum-rate maximization probl… ▽ More This paper investigates reconfigurable intelligent surface (RIS)-aided frequency division duplexing (FDD) communication systems. Since the downlink and uplink signals are simultaneously transmitted in FDD, the phase shifts at the RIS should be designed to support both transmissions. Considering a single-user multiple-input multiple-output system, we formulate a weighted sum-rate maximization problem to jointly maximize the downlink and uplink system performance. To tackle the non-convex optimization problem, we adopt an alternating optimization (AO) algorithm, in which two phase shift optimization techniques are developed to handle the unit-modulus constraints induced by the reflection coefficients at the RIS. The first technique exploits the manifold optimization-based algorithm, while the second uses a lower-complexity AO approach. Numerical results verify that the proposed techniques rapidly converge to local optima and significantly improve the overall system performance compared to existing benchmark schemes. △ Less

Submitted 21 January, 2024; originally announced January 2024.

Comments: Accepted to IEEE Transactions on Wireless Communications

arXiv:2401.06422 [pdf, other]

Joint Mechanical and Electrical Adjustment of IRS-aided LEO Satellite MIMO Communications

Authors: Doyoung Kim, Seongah Jeong

Abstract: In this letter, we propose a joint mechanical and electrical adjustment of intelligent reflecting surface (IRS) for the performance improvements of low-earth orbit (LEO) satellite multiple-input multiple-output (MIMO) communications. In particular, we construct a three-dimensional (3D) MIMO channel model for the mechanically-tilted IRS in general deployment, and consider two types of scenarios wit… ▽ More In this letter, we propose a joint mechanical and electrical adjustment of intelligent reflecting surface (IRS) for the performance improvements of low-earth orbit (LEO) satellite multiple-input multiple-output (MIMO) communications. In particular, we construct a three-dimensional (3D) MIMO channel model for the mechanically-tilted IRS in general deployment, and consider two types of scenarios with and without the direct path of LEO-ground user link due to the orbital flight. With the aim of maximizing the end-to-end performance, we jointly optimize tilting angle and phase shift of IRS along with the transceiver beamforming, whose performance superiority is verified via simulations with the Orbcomm LEO satellite using a real orbit data. △ Less

Submitted 20 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: 5 pages, 6 figures

arXiv:2401.02216 [pdf, ps, other]

Harnessing Membership Function Dynamics for Stability Analysis of T-S Fuzzy Systems

Authors: Donghwan Lee, Do-Wan Kim

Abstract: The main goal of this paper is to develop a new linear matrix inequality (LMI) condition for the asymptotic stability of continuous-time Takagi-Sugeno (T-S) fuzzy systems. A key advantage of this new condition is its independence from the bounds on the time-derivatives of the membership functions, a requirement present in the existing approaches. This is achieved by introducing a novel fuzzy Lyapu… ▽ More The main goal of this paper is to develop a new linear matrix inequality (LMI) condition for the asymptotic stability of continuous-time Takagi-Sugeno (T-S) fuzzy systems. A key advantage of this new condition is its independence from the bounds on the time-derivatives of the membership functions, a requirement present in the existing approaches. This is achieved by introducing a novel fuzzy Lyapunov function that incorporates an augmented state vector. Notably, this augmented state vector encompasses the membership functions, allowing the dynamics of these functions to be integrated into the proposed condition. This inclusion of additional information about the membership function serves to reduce the conservativeness of the suggested stability condition. To demonstrate the effectiveness of the proposed method, examples are provided. △ Less

Submitted 13 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2309.06841

arXiv:2312.09736 [pdf, other]

HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

Authors: Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chnag D. Yoo

Abstract: Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts in develo** VGD systems to improve the quality of their responses, existing systems are competent only to incorporate the information in the video and text and tend to struggle in extracting the necessary information f… ▽ More Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts in develo** VGD systems to improve the quality of their responses, existing systems are competent only to incorporate the information in the video and text and tend to struggle in extracting the necessary information from the audio when generating appropriate responses to the question. The VGD system seems to be deaf, and thus, we coin this symptom of current systems' ignoring audio data as a deaf response. To overcome the deaf response problem, Hearing Enhanced Audio Response (HEAR) framework is proposed to perform sensible listening by selectively attending to audio whenever the question requires it. The HEAR framework enhances the accuracy and audibility of VGD systems in a model-agnostic manner. HEAR is validated on VGD datasets (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows effectiveness with various VGD systems. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: EMNLP 2023, 14 pages, 13 figures

arXiv:2312.09461 [pdf, other]

Improving Generalization of Drowsiness State Classification by Domain-Specific Normalization

Authors: Dong-Young Kim, Dong-Kyun Han, Seo-Hyeon Park, Geun-Deok Jang, Seong-Whan Lee

Abstract: Abnormal driver states, particularly have been major concerns for road safety, emphasizing the importance of accurate drowsiness detection to prevent accidents. Electroencephalogram (EEG) signals are recognized for their effectiveness in monitoring a driver's mental state by monitoring brain activities. However, the challenge lies in the requirement for prior calibration due to the variation of EE… ▽ More Abnormal driver states, particularly have been major concerns for road safety, emphasizing the importance of accurate drowsiness detection to prevent accidents. Electroencephalogram (EEG) signals are recognized for their effectiveness in monitoring a driver's mental state by monitoring brain activities. However, the challenge lies in the requirement for prior calibration due to the variation of EEG signals among and within individuals. The necessity of calibration has made the brain-computer interface (BCI) less accessible. We propose a practical generalized framework for classifying driver drowsiness states to improve accessibility and convenience. We separate the normalization process for each driver, treating them as individual domains. The goal of develo** a general model is similar to that of domain generalization. The framework considers the statistics of each domain separately since they vary among domains. We experimented with various normalization methods to enhance the ability to generalize across subjects, i.e. the model's generalization performance of unseen domains. The experiments showed that applying individual domain-specific normalization yielded an outstanding improvement in generalizability. Furthermore, our framework demonstrates the potential and accessibility by removing the need for calibration in BCI applications. △ Less

Submitted 14 November, 2023; originally announced December 2023.

Comments: Submitted to 2024 12th IEEE International Winter Conference on Brain-Computer Interface

arXiv:2312.06985 [pdf, ps, other]

Ergodic Secrecy Rate Analysis for LEO Satellite Downlink Networks

Authors: Daeun Kim, Namyoon Lee

Abstract: Satellite networks are recognized as an effective solution to ensure seamless connectivity worldwide, catering to a diverse range of applications. However, the broad coverage and broadcasting nature of satellite networks also expose them to security challenges. Despite these challenges, there is a lack of analytical understanding addressing the secrecy performance of these networks. This paper pre… ▽ More Satellite networks are recognized as an effective solution to ensure seamless connectivity worldwide, catering to a diverse range of applications. However, the broad coverage and broadcasting nature of satellite networks also expose them to security challenges. Despite these challenges, there is a lack of analytical understanding addressing the secrecy performance of these networks. This paper presents a secrecy rate analysis for downlink low Earth orbit (LEO) satellite networks by modeling the spatial distribution of satellites, users, and potential eavesdroppers as homogeneous Poisson point processes on concentric spheres. Specifically, we provide an analytical expression for the ergodic secrecy rate of a typical downlink user in terms of the satellite network parameters, fading parameters, and path-loss exponent. Simulation results show the exactness of the provided expressions and we find that optimal satellite altitude increases with eavesdropper density. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.17923 [pdf, other]

Enhanced Generative Adversarial Networks for Unseen Word Generation from EEG Signals

Authors: Young-Eun Lee, Seo-Hyun Lee, Soowon Kim, Jung-Sun Lee, Deok-Seon Kim, Seong-Whan Lee

Abstract: Recent advances in brain-computer interface (BCI) technology, particularly based on generative adversarial networks (GAN), have shown great promise for improving decoding performance for BCI. Within the realm of Brain-Computer Interfaces (BCI), GANs find application in addressing many areas. They serve as a valuable tool for data augmentation, which can solve the challenge of limited data availabi… ▽ More Recent advances in brain-computer interface (BCI) technology, particularly based on generative adversarial networks (GAN), have shown great promise for improving decoding performance for BCI. Within the realm of Brain-Computer Interfaces (BCI), GANs find application in addressing many areas. They serve as a valuable tool for data augmentation, which can solve the challenge of limited data availability, and synthesis, effectively expanding the dataset and creating novel data formats, thus enhancing the robustness and adaptability of BCI systems. Research in speech-related paradigms has significantly expanded, with a critical impact on the advancement of assistive technologies and communication support for individuals with speech impairments. In this study, GANs were investigated, particularly for the BCI field, and applied to generate text from EEG signals. The GANs could generalize all subjects and decode unseen words, indicating its ability to capture underlying speech patterns consistent across different individuals. The method has practical applications in neural signal-based speech recognition systems and communication aids for individuals with speech difficulties. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: 5 pages, 2 figures

arXiv:2311.06523 [pdf, other]

Generative AI for Space-Air-Ground Integrated Networks (SAGIN)

Authors: Ruichen Zhang, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, ** Zhang, Dong In Kim

Abstract: Recently, generative AI technologies have emerged as a significant advancement in artificial intelligence field, renowned for their language and image generation capabilities. Meantime, space-air-ground integrated network (SAGIN) is an integral part of future B5G/6G for achieving ubiquitous connectivity. Inspired by this, this article explores an integration of generative AI in SAGIN, focusing on… ▽ More Recently, generative AI technologies have emerged as a significant advancement in artificial intelligence field, renowned for their language and image generation capabilities. Meantime, space-air-ground integrated network (SAGIN) is an integral part of future B5G/6G for achieving ubiquitous connectivity. Inspired by this, this article explores an integration of generative AI in SAGIN, focusing on potential applications and case study. We first provide a comprehensive review of SAGIN and generative AI models, highlighting their capabilities and opportunities of their integration. Benefiting from generative AI's ability to generate useful data and facilitate advanced decision-making processes, it can be applied to various scenarios of SAGIN. Accordingly, we present a concise survey on their integration, including channel modeling and channel state information (CSI) estimation, joint air-space-ground resource allocation, intelligent network deployment, semantic communications, image extraction and processing, security and privacy enhancement. Next, we propose a framework that utilizes a Generative Diffusion Model (GDM) to construct channel information map to enhance quality of service for SAGIN. Simulation results demonstrate the effectiveness of the proposed framework. Finally, we discuss potential research directions for generative AI-enabled SAGIN. △ Less

Submitted 11 November, 2023; originally announced November 2023.

Comments: 9page, 5 figures

arXiv:2309.17008 [pdf, other]

Energy-Efficient Secure Offloading System Designed via UAV-Mounted Intelligent Reflecting Surface for Resilience Enhancement

Authors: Doyoung Kim, Seongah Jeong, **kyu Kang

Abstract: With increasing interest in mmWave and THz communication systems, an unmanned aerial vehicle (UAV)-mounted intelligent reflecting surface (IRS) has been suggested as a key enabling technology to establish robust line-of-sight (LoS) connections with ground nodes owing to their free mobility and high altitude, especially for emergency and disaster response. This paper investigates a secure offloadin… ▽ More With increasing interest in mmWave and THz communication systems, an unmanned aerial vehicle (UAV)-mounted intelligent reflecting surface (IRS) has been suggested as a key enabling technology to establish robust line-of-sight (LoS) connections with ground nodes owing to their free mobility and high altitude, especially for emergency and disaster response. This paper investigates a secure offloading system, where the UAV-mounted IRS assists the offloading procedures between ground users and an access point (AP) acting as an edge cloud. In this system, the users except the intended recipients in the offloading process are considered as potential eavesdroppers. The system aims to achieve the minimum total energy consumption of battery-limited ground user devices under constraints for secure offloading accomplishment and operability of UAV-mounted IRS, which is done by optimizing the transmit power of ground user devices, the trajectory and phase shift matrix of UAV-mounted IRS, and the offloading ratio between local execution and edge computing based on the successive convex approximation (SCA) algorithms. Numerical results show that the proposed algorithm can provide the considerable energy savings compared with local execution and partial optimizations. △ Less

Submitted 29 September, 2023; originally announced September 2023.

Comments: 11 pages, 5 figures

arXiv:2309.14668 [pdf]

Depolarized Holography with Polarization-multiplexing Metasurface

Authors: Seung-Woo Nam, Young** Kim, Dongyeon Kim, Yoonchan Jeong

Abstract: The evolution of computer-generated holography (CGH) algorithms has prompted significant improvements in the performances of holographic displays. Nonetheless, they start to encounter a limited degree of freedom in CGH optimization and physical constraints stemming from the coherent nature of holograms. To surpass the physical limitations, we consider polarization as a new degree of freedom by uti… ▽ More The evolution of computer-generated holography (CGH) algorithms has prompted significant improvements in the performances of holographic displays. Nonetheless, they start to encounter a limited degree of freedom in CGH optimization and physical constraints stemming from the coherent nature of holograms. To surpass the physical limitations, we consider polarization as a new degree of freedom by utilizing a novel optical platform called metasurface. Polarization-multiplexing metasurfaces enable incoherent-like behavior in holographic displays due to the mutual incoherence of orthogonal polarization states. We leverage this unique characteristic of a metasurface by integrating it into a holographic display and exploiting polarization diversity to bring an additional degree of freedom for CGH algorithms. To minimize the speckle noise while maximizing the image quality, we devise a fully differentiable optimization pipeline by taking into account the metasurface proxy model, thereby jointly optimizing spatial light modulator phase patterns and geometric parameters of metasurface nanostructures. We evaluate the metasurface-enabled depolarized holography through simulations and experiments, demonstrating its ability to reduce speckle noise and enhance image quality. △ Less

Submitted 26 September, 2023; originally announced September 2023.

Comments: 15 pages, 13 figures, to be published in SIGGRAPH Asia 2023

arXiv:2309.11988 [pdf, ps, other]

Relaxed Conditions for Parameterized Linear Matrix Inequality in the Form of Nested Fuzzy Summations

Authors: Do Wan Kim, Donghwan Lee

Abstract: The aim of this study is to investigate less conservative conditions for parameterized linear matrix inequalities (PLMIs) that are formulated as nested fuzzy summations. Such PLMIs are commonly encountered in stability analysis and control design problems for Takagi-Sugeno (T-S) fuzzy systems. Utilizing the weighted inequality of arithmetic and geometric means (AM-GM inequality), we develop new, l… ▽ More The aim of this study is to investigate less conservative conditions for parameterized linear matrix inequalities (PLMIs) that are formulated as nested fuzzy summations. Such PLMIs are commonly encountered in stability analysis and control design problems for Takagi-Sugeno (T-S) fuzzy systems. Utilizing the weighted inequality of arithmetic and geometric means (AM-GM inequality), we develop new, less conservative linear matrix inequalities for the PLMIs. This methodology enables us to efficiently handle the product of membership functions that have intersecting indices. Through empirical case studies, we demonstrate that our proposed conditions produce less conservative results compared to existing approaches in the literature. △ Less

Submitted 18 December, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: This work has been submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems for possible publications

arXiv:2309.10460 [pdf, ps, other]

Coverage Analysis of Dynamic Coordinated Beamforming for LEO Satellite Downlink Networks

Authors: Daeun Kim, Jeonghun Park, Namyoon Lee

Abstract: In this paper, we investigate the coverage performance of downlink satellite networks employing dynamic coordinated beamforming. Our approach involves modeling the spatial arrangement of satellites and users using Poisson point processes situated on concentric spheres. We derive analytical expressions for the coverage probability, which take into account the in-cluster geometry of the coordinated… ▽ More In this paper, we investigate the coverage performance of downlink satellite networks employing dynamic coordinated beamforming. Our approach involves modeling the spatial arrangement of satellites and users using Poisson point processes situated on concentric spheres. We derive analytical expressions for the coverage probability, which take into account the in-cluster geometry of the coordinated satellite set. These expressions are formulated in terms of various parameters, including the number of antennas per satellite, satellite density, fading characteristics, and path-loss exponent. To offer a more intuitive understanding, we also develop an approximation for the coverage probability. Furthermore, by considering the distribution of normalized distances, we derive the spatially averaged coverage probability, thereby validating the advantages of coordinated beamforming from a spatial average perspective. Our primary finding is that dynamic coordinated beamforming significantly improves coverage compared to the absence of satellite coordination, in direct proportion to the number of antennas on each satellite. Moreover, we observe that the optimal cluster size, which maximizes the ergodic spectral efficiency, increases with higher satellite density, provided that the number of antennas on the satellites is sufficiently large. Our findings are corroborated by simulation results, confirming the accuracy of the derived expressions. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.06841 [pdf, ps, other]

On the Local Quadratic Stability of T-S Fuzzy Systems in the Vicinity of the Origin

Authors: Donghwan Lee, Do Wan Kim

Abstract: The main goal of this paper is to introduce new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems. These stability conditions are based on linear matrix inequalities (LMIs) in combination with quadratic Lyapunov functions. Moreover, they integrate information on the membership functions at the origin and effectively leverage the linear structure of the underlying non… ▽ More The main goal of this paper is to introduce new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems. These stability conditions are based on linear matrix inequalities (LMIs) in combination with quadratic Lyapunov functions. Moreover, they integrate information on the membership functions at the origin and effectively leverage the linear structure of the underlying nonlinear system in the vicinity of the origin. As a result, the proposed conditions are proved to be less conservative compared to existing methods using fuzzy Lyapunov functions in the literature. Moreover, we establish that the proposed methods offer necessary and sufficient conditions for the local exponential stability of T-S fuzzy systems. The paper also includes discussions on the inherent limitations associated with fuzzy Lyapunov approaches. To demonstrate the theoretical results, we provide comprehensive examples that elucidate the core concepts and validate the efficacy of the proposed conditions. △ Less

Submitted 13 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

arXiv:2309.02616 [pdf, other]

Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts

Authors: Hongyang Du, Guangyuan Liu, Dusit Niyato, Jiayi Zhang, Jiawen Kang, Zehui Xiong, Bo Ai, Dong In Kim

Abstract: Semantic communication (SemCom) holds promise for reducing network resource consumption while achieving the communications goal. However, the computational overheads in jointly training semantic encoders and decoders-and the subsequent deployment in network devices-are overlooked. Recent advances in Generative artificial intelligence (GAI) offer a potential solution. The robust learning abilities… ▽ More Semantic communication (SemCom) holds promise for reducing network resource consumption while achieving the communications goal. However, the computational overheads in jointly training semantic encoders and decoders-and the subsequent deployment in network devices-are overlooked. Recent advances in Generative artificial intelligence (GAI) offer a potential solution. The robust learning abilities of GAI models indicate that semantic decoders can reconstruct source messages using a limited amount of semantic information, e.g., prompts, without joint training with the semantic encoder. A notable challenge, however, is the instability introduced by GAI's diverse generation ability. This instability, evident in outputs like text-generated images, limits the direct application of GAI in scenarios demanding accurate message recovery, such as face image transmission. To solve the above problems, this paper proposes a GAI-aided SemCom system with multi-model prompts for accurate content decoding. Moreover, in response to security concerns, we introduce the application of covert communications aided by a friendly jammer. The system jointly optimizes the diffusion step, jamming, and transmitting power with the aid of the generative diffusion models, enabling successful and secure transmission of the source messages. △ Less

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.13202 [pdf, other]

Joint Band Assignment and Beam Management using Hierarchical Reinforcement Learning for Multi-Band Communication

Authors: Dohyun Kim, Miguel R. Castellanos, Robert W. Heath Jr

Abstract: Multi-band operation in wireless networks can improve data rates by leveraging the benefits of propagation in different frequency ranges. Distinctive beam management procedures in different bands complicate band assignment because they require considering not only the channel quality but also the associated beam management overhead. Reinforcement learning (RL) is a promising approach for multi-ban… ▽ More Multi-band operation in wireless networks can improve data rates by leveraging the benefits of propagation in different frequency ranges. Distinctive beam management procedures in different bands complicate band assignment because they require considering not only the channel quality but also the associated beam management overhead. Reinforcement learning (RL) is a promising approach for multi-band operation as it enables the system to learn and adjust its behavior through environmental feedback. In this paper, we formulate a sequential decision problem to jointly perform band assignment and beam management. We propose a method based on hierarchical RL (HRL) to handle the complexity of the problem by separating the policies for band selection and beam management. We evaluate the proposed HRL-based algorithm on a realistic channel generated based on ray-tracing simulators. Our results show that the proposed approach outperforms traditional RL approaches in terms of reduced beam training overhead and increased data rates under a realistic vehicular channel. △ Less

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.08442 [pdf, other]

Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5 referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or parag… ▽ More Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5 referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or paragraph-level G2P can improve usability in real-world applications as it is better suited to perform on heteronyms and linking sounds between words, we find that using ByT5 for these scenarios is nontrivial. Since ByT5 operates on the character level, it requires longer decoding steps, which deteriorates the performance due to the exposure bias commonly observed in auto-regressive generation models. This paper shows that the performance of sentence-level and paragraph-level G2P can be improved by mitigating such exposure bias using our proposed loss-based sampling method. △ Less

Submitted 16 August, 2023; originally announced August 2023.

Comments: INTERSPEECH 2023

arXiv:2308.07593 [pdf, other]

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

Authors: Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro

Abstract: Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information on lip movements. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of visual modality by using audio modality. Different fro… ▽ More Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information on lip movements. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of visual modality by using audio modality. Different from the previous methods, the proposed AKVSR 1) utilizes rich audio knowledge encoded by a large-scale pretrained audio model, 2) saves the linguistic information of audio knowledge in compact audio memory by discarding the non-linguistic information from the audio through quantization, and 3) includes Audio Bridging Module which can find the best-matched audio features from the compact audio memory, which makes our training possible without audio inputs, once after the compact audio memory is composed. We validate the effectiveness of the proposed method through extensive experiments, and achieve new state-of-the-art performances on the widely-used LRS3 dataset. △ Less

Submitted 11 January, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: Accepted by IEEE Transactions on Multimedia

arXiv:2308.05384 [pdf, other]

Enhancing Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization

Authors: Hongyang Du, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Yi**g Lin, Zonghang Li, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shuguang Cui, Bo Ai, Haibo Zhou, Dong In Kim

Abstract: Generative Diffusion Models (GDMs) have emerged as a transformative force in the realm of Generative Artificial Intelligence (GenAI), demonstrating their versatility and efficacy across various applications. The ability to model complex data distributions and generate high-quality samples has made GDMs particularly effective in tasks such as image generation and reinforcement learning. Furthermore… ▽ More Generative Diffusion Models (GDMs) have emerged as a transformative force in the realm of Generative Artificial Intelligence (GenAI), demonstrating their versatility and efficacy across various applications. The ability to model complex data distributions and generate high-quality samples has made GDMs particularly effective in tasks such as image generation and reinforcement learning. Furthermore, their iterative nature, which involves a series of noise addition and denoising steps, is a powerful and unique approach to learning and generating data. This paper serves as a comprehensive tutorial on applying GDMs in network optimization tasks. We delve into the strengths of GDMs, emphasizing their wide applicability across various domains, such as vision, text, and audio generation. We detail how GDMs can be effectively harnessed to solve complex optimization problems inherent in networks. The paper first provides a basic background of GDMs and their applications in network optimization. This is followed by a series of case studies, showcasing the integration of GDMs with Deep Reinforcement Learning (DRL), incentive mechanism design, Semantic Communications (SemCom), Internet of Vehicles (IoV) networks, etc. These case studies underscore the practicality and efficacy of GDMs in real-world scenarios, offering insights into network design. We conclude with a discussion on potential future directions for GDM research and applications, providing major insights into how they can continue to shape the future of network optimization. △ Less

Submitted 8 May, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: This paper has been accepted by IEEE Communications Surveys & Tutorials (COMST)

arXiv:2308.05063 [pdf, other]

CERMET: Coding for Energy Reduction with Multiple Encryption Techniques -- $It's\ easy\ being\ green$

Authors: Jongchan Woo, Vipindev Adat Vasudevan, Benjamin Kim, Alejandro Cohen, Rafael G. L. D'Oliveira, Thomas Stahlbuhk, Muriel Médard

Abstract: This paper presents CERMET, an energy-efficient hardware architecture designed for hardware-constrained cryptosystems. CERMET employs a base cryptosystem in conjunction with network coding to provide both information-theoretic and computational security while reducing energy consumption per bit. This paper introduces the hardware architecture for the system and explores various optimizations to en… ▽ More This paper presents CERMET, an energy-efficient hardware architecture designed for hardware-constrained cryptosystems. CERMET employs a base cryptosystem in conjunction with network coding to provide both information-theoretic and computational security while reducing energy consumption per bit. This paper introduces the hardware architecture for the system and explores various optimizations to enhance its performance. The universality of the approach is demonstrated by designing the architecture to accommodate both asymmetric and symmetric cryptosystems. The analysis reveals that the benefits of this proposed approach are multifold, reducing energy per bit and area without compromising security or throughput. The optimized hardware architectures can achieve below 1 pJ/bit operations for AES-256. Furthermore, for a public key cryptosystem based on Elliptic Curve Cryptography (ECC), a remarkable 14.6X reduction in energy per bit and a 9.3X reduction in area are observed, bringing it to less than 1 nJ/bit. △ Less

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.04717 [pdf, other]

Fully Decentralized Peer-to-Peer Community Grid with Dynamic and Congestion Pricing

Authors: Hien Thanh Doan, Truong Hoang Bao Huy, Daehee Kim, Hongseok Kim

Abstract: Peer-to-peer (P2P) electricity markets enable prosumers to minimize their costs, which has been extensively studied in recent research. However, there are several challenges with P2P trading when physical network constraints are also included. Moreover, most studies use fixed prices for grid power prices without considering dynamic grid pricing, and equity for all participants. This policy may neg… ▽ More Peer-to-peer (P2P) electricity markets enable prosumers to minimize their costs, which has been extensively studied in recent research. However, there are several challenges with P2P trading when physical network constraints are also included. Moreover, most studies use fixed prices for grid power prices without considering dynamic grid pricing, and equity for all participants. This policy may negatively affect the long-term development of the market if prosumers with low demand are not treated fairly. An initial step towards addressing these problems is the design of a new decentralized P2P electricity market with two dynamic grid pricing schemes that are determined by consumer demand. Futhermore, we consider a decentralized system with physical constraints for optimizing power flow in networks without compromising privacy. We propose a dynamic congestion price to effectively address congestion and then prove the convergence and global optimality of the proposed method. Our experiments show that P2P energy trade decreases generation cost of main grid by 56.9% compared with previous works. Consumers reduce grid trading by 57.3% while the social welfare of consumers is barely affected by the increase of grid price. △ Less

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.01831 [pdf, other]

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

Authors: Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro

Abstract: In this paper, we propose a method to learn unified representations of multilingual speech and text with a single model, especially focusing on the purpose of speech synthesis. We represent multilingual speech audio with speech units, the quantized representations of speech features encoded from a self-supervised speech model. Therefore, we can focus on their linguistic content by treating the aud… ▽ More In this paper, we propose a method to learn unified representations of multilingual speech and text with a single model, especially focusing on the purpose of speech synthesis. We represent multilingual speech audio with speech units, the quantized representations of speech features encoded from a self-supervised speech model. Therefore, we can focus on their linguistic content by treating the audio as pseudo text and can build a unified representation of speech and text. Then, we propose to train an encoder-decoder structured model with a Unit-to-Unit Translation (UTUT) objective on multilingual data. Specifically, by conditioning the encoder with the source language token and the decoder with the target language token, the model is optimized to translate the spoken language into that of the target language, in a many-to-many language translation setting. Therefore, the model can build the knowledge of how spoken languages are comprehended and how to relate them to different languages. A single pre-trained model with UTUT can be employed for diverse multilingual speech- and text-related tasks, such as Speech-to-Speech Translation (STS), multilingual Text-to-Speech Synthesis (TTS), and Text-to-Speech Translation (TTST). By conducting comprehensive experiments encompassing various languages, we validate the efficacy of the proposed method across diverse multilingual tasks. Moreover, we show UTUT can perform many-to-many language STS, which has not been previously explored in the literature. Samples are available on https://choijeongsoo.github.io/utut. △ Less

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2307.16706 [pdf, ps, other]

Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes

Authors: Donghwan Lee, Han-Dong Lim, Do Wan Kim

Abstract: The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where individual agents have access only to their own rewards, lacking insights into the rewards of other agents. Moreover, each agent has the ability to share its parame… ▽ More The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where individual agents have access only to their own rewards, lacking insights into the rewards of other agents. Moreover, each agent has the ability to share its parameters with neighboring agents through a communication network, represented by a graph. We first introduce a novel distributed DP, inspired by the distributed optimization method of Wang and Elia. Next, a new distributed DP is introduced through a decoupling process. The convergence of the DP algorithms is proved through systems and control perspectives. The study in this paper sets the stage for new distributed temporal different learning algorithms. △ Less

Submitted 13 June, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

arXiv:2307.12644 [pdf, other]

Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG

Authors: Dae-Yeol Kim, Eunsu Goh, KwangKee Lee, JongEui Chae, JongHyeon Mun, Junyeong Na, Chae-bong Sohn, Do-Yup Kim

Abstract: rPPG (Remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) by using the light absorption characteristics of hemoglobin captured through a camera. Analyzing the measured BVP can derive various physiological signals such as heart rate, stress level, and blood pressure, which can be applied to various applications such as telemedicine, remote patient monito… ▽ More rPPG (Remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) by using the light absorption characteristics of hemoglobin captured through a camera. Analyzing the measured BVP can derive various physiological signals such as heart rate, stress level, and blood pressure, which can be applied to various applications such as telemedicine, remote patient monitoring, and early prediction of cardiovascular disease. rPPG is rapidly evolving and attracting great attention from both academia and industry by providing great usability and convenience as it can measure biosignals using a camera-equipped device without medical or wearable devices. Despite extensive efforts and advances in this field, serious challenges remain, including issues related to skin color, camera characteristics, ambient lighting, and other sources of noise and artifacts, which degrade accuracy performance. We argue that fair and evaluable benchmarking is urgently required to overcome these challenges and make meaningful progress from both academic and commercial perspectives. In most existing work, models are trained, tested, and validated only on limited datasets. Even worse, some studies lack available code or reproducibility, making it difficult to fairly evaluate and compare performance. Therefore, the purpose of this study is to provide a benchmarking framework to evaluate various rPPG techniques across a wide range of datasets for fair evaluation and comparison, including both conventional non-deep neural network (non-DNN) and deep neural network (DNN) methods. GitHub URL: https://github.com/remotebiosensing/rppg △ Less

Submitted 18 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: 20 pages, 10 figures

MSC Class: 68T45; 68T07 ACM Class: I.4.9; I.5.4; I.2

arXiv:2307.12254 [pdf, other]

Semantic Communication-Empowered Vehicle Count Prediction for Traffic Management

Authors: Sachin Kadam, Dong In Kim

Abstract: Vehicle count prediction is an important aspect of smart city traffic management. Most major roads are monitored by cameras with computing and transmitting capabilities. These cameras provide data to the central traffic controller (CTC), which is in charge of traffic control management. In this paper, we propose a joint CNN-LSTM-based semantic communication (SemCom) model in which the semantic enc… ▽ More Vehicle count prediction is an important aspect of smart city traffic management. Most major roads are monitored by cameras with computing and transmitting capabilities. These cameras provide data to the central traffic controller (CTC), which is in charge of traffic control management. In this paper, we propose a joint CNN-LSTM-based semantic communication (SemCom) model in which the semantic encoder of a camera extracts the relevant semantics from raw images. The encoded semantics are then sent to the CTC by the transmitter in the form of symbols. The semantic decoder of the CTC predicts the vehicle count on each road based on the sequence of received symbols and develops a traffic management strategy accordingly. Using numerical results, we show that the proposed SemCom model reduces overhead by $54.42\%$ when compared to source encoder/decoder methods. Also, we demonstrate through simulations that the proposed model outperforms state-of-the-art models in terms of mean absolute error (MAE) and mean-squared error (MSE). △ Less

Submitted 2 January, 2024; v1 submitted 23 July, 2023; originally announced July 2023.

Comments: Accepted for publication in WCNC 2024 - IEEE Wireless Communications and Networking Conference, Dubai, United Arab Emirates (UAE), April 2024

arXiv:2307.10550 [pdf]

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

Authors: Daegyeom Kim, Seongho Hong, Yong-Hoon Choi

Abstract: Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generativ… ▽ More Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes input from text sentences and prompt audio and is designed to generate controllable speech by not simply mimicking the characteristics of the prompt audio but by controlling the attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics to assess the accuracy of generated sentences. For comparing the quality of synthesized speech, we measure comparative mean option score (CMOS) and similarity mean option score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and mel-spectrogram by modifying the trained tokens. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are located on GitHub. △ Less

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.10538 [pdf, other]

Power Allocation for Device-to-Device Interference Channel Using Truncated Graph Transformers

Authors: Dohoon Kim, Shenghui Song

Abstract: Power control for the device-to-device interference channel with single-antenna transceivers has been widely analyzed with both model-based methods and learning-based approaches. Although the learning-based approaches, i.e., datadriven and model-driven, offer performance improvement, the widely adopted graph neural network suffers from learning the heterophilous power distribution of the interfere… ▽ More Power control for the device-to-device interference channel with single-antenna transceivers has been widely analyzed with both model-based methods and learning-based approaches. Although the learning-based approaches, i.e., datadriven and model-driven, offer performance improvement, the widely adopted graph neural network suffers from learning the heterophilous power distribution of the interference channel. In this paper, we propose a deep learning architecture in the family of graph transformers to circumvent the issue. Experiment results show that the proposed methods achieve the state-of-theart performance across a wide range of untrained network configurations. Furthermore, we show there is a trade-off between model complexity and generality. △ Less

Submitted 23 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: 6 pages, 5 figures. Accepted in IEEE International Mediterranean Conference on Communications and Networking

arXiv:2307.10263 [pdf, ps, other]

Dynamic Joint Scheduling of Anycast Transmission and Modulation in Hybrid Unicast-Multicast SWIPT-Based IoT Sensor Networks

Authors: Do-Yup Kim, Chae-Bong Sohn, Hyun-Suk Lee

Abstract: The separate receiver architecture with a time- or power-splitting mode, widely used for simultaneous wireless information and power transfer (SWIPT), has a major drawback: Energy-intensive local oscillators and mixers need to be installed in the information decoding (ID) component to downconvert radio frequency (RF) signals to baseband signals, resulting in high energy consumption. As a solution… ▽ More The separate receiver architecture with a time- or power-splitting mode, widely used for simultaneous wireless information and power transfer (SWIPT), has a major drawback: Energy-intensive local oscillators and mixers need to be installed in the information decoding (ID) component to downconvert radio frequency (RF) signals to baseband signals, resulting in high energy consumption. As a solution to this challenge, an integrated receiver (IR) architecture has been proposed, and, in turn, various SWIPT modulation schemes compatible with the IR architecture have been developed. However, to the best of our knowledge, no research has been conducted on modulation scheduling in SWIPT-based IoT sensor networks while taking into account the IR architecture. Accordingly, in this paper, we address this research gap by studying the problem of joint scheduling for unicast/multicast, IoT sensor, and modulation (UMSM) in a time-slotted SWIPT-based IoT sensor network system. To this end, we leverage mathematical modeling and optimization techniques, such as the Lagrangian duality and stochastic optimization theory, to develop an UMSM scheduling algorithm that maximizes the weighted sum of average unicast service throughput and harvested energy of IoT sensors, while ensuring the minimum average throughput of both multicast and unicast, as well as the minimum average harvested energy of IoT sensors. Finally, we demonstrate through extensive simulations that our UMSM scheduling algorithm achieves superior energy harvesting (EH) and throughput performance while ensuring the satisfaction of specified constraints well. △ Less

Submitted 17 July, 2023; originally announced July 2023.

Comments: 29 pages, 13 figures (eps)

MSC Class: 90B36; 68M20; 90B15 ACM Class: C.2.3; I.2.8

arXiv:2307.07123 [pdf, other]

Improved Flood Insights: Diffusion-Based SAR to EO Image Translation

Authors: Minseok Seo, Youngtack Oh, Doyi Kim, Dongmin Kang, Yeji Choi

Abstract: Driven by rapid climate change, the frequency and intensity of flood events are increasing. Electro-Optical (EO) satellite imagery is commonly utilized for rapid response. However, its utilities in flood situations are hampered by issues such as cloud cover and limitations during nighttime, making accurate assessment of damage challenging. Several alternative flood detection techniques utilizing S… ▽ More Driven by rapid climate change, the frequency and intensity of flood events are increasing. Electro-Optical (EO) satellite imagery is commonly utilized for rapid response. However, its utilities in flood situations are hampered by issues such as cloud cover and limitations during nighttime, making accurate assessment of damage challenging. Several alternative flood detection techniques utilizing Synthetic Aperture Radar (SAR) data have been proposed. Despite the advantages of SAR over EO in the aforementioned situations, SAR presents a distinct drawback: human analysts often struggle with data interpretation. To tackle this issue, this paper introduces a novel framework, Diffusion-Based SAR to EO Image Translation (DSE). The DSE framework converts SAR images into EO images, thereby enhancing the interpretability of flood insights for humans. Experimental results on the Sen1Floods11 and SEN12-FLOOD datasets confirm that the DSE framework not only delivers enhanced visual information but also improves performance across all tested flood segmentation baselines. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: 10 pages, 6 figures

Report number: 10

arXiv:2307.00541 [pdf, ps, other]

Collaborative Policy Learning for Dynamic Scheduling Tasks in Cloud-Edge-Terminal IoT Networks Using Federated Reinforcement Learning

Authors: Do-Yup Kim, Da-Eun Lee, Ji-Wan Kim, Hyun-Suk Lee

Abstract: In this paper, we examine cloud-edge-terminal IoT networks, where edges undertake a range of typical dynamic scheduling tasks. In these IoT networks, a central policy for each task can be constructed at a cloud server. The central policy can be then used by the edges conducting the task, thereby mitigating the need for them to learn their own policy from scratch. Furthermore, this central policy c… ▽ More In this paper, we examine cloud-edge-terminal IoT networks, where edges undertake a range of typical dynamic scheduling tasks. In these IoT networks, a central policy for each task can be constructed at a cloud server. The central policy can be then used by the edges conducting the task, thereby mitigating the need for them to learn their own policy from scratch. Furthermore, this central policy can be collaboratively learned at the cloud server by aggregating local experiences from the edges, thanks to the hierarchical architecture of the IoT networks. To this end, we propose a novel collaborative policy learning framework for dynamic scheduling tasks using federated reinforcement learning. For effective learning, our framework adaptively selects the tasks for collaborative learning in each round, taking into account the need for fairness among tasks. In addition, as a key enabler of the framework, we propose an edge-agnostic policy structure that enables the aggregation of local policies from different edges. We then provide the convergence analysis of the framework. Through simulations, we demonstrate that our proposed framework significantly outperforms the approaches without collaborative policy learning. Notably, it accelerates the learning speed of the policies and allows newly arrived edges to adapt to their tasks more easily. △ Less

Submitted 2 July, 2023; originally announced July 2023.

Comments: 14 pages, 16 figures, IEEEtran.cls

MSC Class: 68M20; 68T05; 68T07 ACM Class: C.2.1; C.2.4; I.2.8; I.2.11

arXiv:2306.16739 [pdf, other]

Sparse RF Lens Antenna Array Design for AoA Estimation in Wideband Systems: Placement Optimization and Performance Analysis

Authors: Joo-Hyun Jo, Jae-Nam Shim, Chan-Byoung Chae, Dong Ku Kim, Robert W. Heath Jr

Abstract: In this paper, we propose a novel architecture for a lens antenna array (LAA) designed to work with a small number of antennas and enable angle-of-arrival (AoA) estimation for advanced 5G vehicle-to-everything (V2X) use cases that demand wider bandwidths and higher data rates. We derive a received signal in terms of optical analysis to consider the variability of the focal region for different car… ▽ More In this paper, we propose a novel architecture for a lens antenna array (LAA) designed to work with a small number of antennas and enable angle-of-arrival (AoA) estimation for advanced 5G vehicle-to-everything (V2X) use cases that demand wider bandwidths and higher data rates. We derive a received signal in terms of optical analysis to consider the variability of the focal region for different carrier frequencies in a wideband multi-carrier system. By taking full advantage of the beam squint effect for multiple pilot signals with different frequencies, we propose a novel reconfiguration of antenna array (RAA) for the sparse LAA and a max-energy antenna selection (MS) algorithm for the AoA estimation. In addition, this paper presents an analysis of the received power at the single antenna with the maximum energy and compares it to simulation results. In contrast to previous studies on LAA that assumed a large number of antennas, which can require high complexity and hardware costs, the proposed RAA with MS estimation algorithm is shown meets the requirements of 5G V2X in a vehicular environment while utilizing limited RF hardware and has low complexity. △ Less

Submitted 29 June, 2023; originally announced June 2023.

Comments: 15 pages, 10 figures

Showing 1–50 of 186 results for author: Kim, D