Overlay Space-Air-Ground Integrated Networks with SWIPT-Empowered Aerial Communications

Authors: Anuradha Verma, Pankaj Kumar Sharma, Pawan Kumar, Dong In Kim

Abstract: In this article, we consider overlay space-air-ground integrated networks (OSAGINs) where a low earth orbit (LEO) satellite communicates with ground users (GUs) with the assistance of an energy-constrained coexisting air-to-air (A2A) network. Particularly, a non-linear energy harvester with a hybrid SWIPT utilizing both power-splitting and time-switching energy harvesting (EH) techniques is employed at the aerial transmitter. Specifically, we take the random locations of the satellite, ground and aerial receivers to investigate the outage performance of both the satellite-to-ground and aerial networks leveraging the stochastic tools. By taking into account the Shadowed-Rician fading for satellite link, the Nakagami-$m$ for ground link, and the Rician fading for aerial link, we derive analytical expressions for the outage probability of these networks. For a comprehensive analysis of aerial network, we consider both the perfect and imperfect successive interference cancellation (SIC) scenarios. Through our analysis, we illustrate that, unlike linear EH, the implementation of non-linear EH provides accurate figures for any target rate, underscoring the significance of using non-linear EH models. Additionally, the influence of key parameters is emphasized, providing guidelines for the practical design of energy-efficient as well as spectrum-efficient future non-terrestrial networks. Monte Carlo simulations validate the accuracy of our theoretical developments.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 36 pages, 14 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

arXiv:2405.10272 [pdf, other]

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations in facial motion for the same identity. To tackle these issues, we introduce a motion sampler based on conditional flow matching, which is capable of high-quality motion code generation in an efficient way. Moreover, we introduce a novel conditioning method for the TTS system, which utilises motion-removed features from the TFG model to yield uniform speech outputs. Our extensive experiments demonstrate that our method effectively creates natural-looking talking faces and speech that accurately match the input text. To our knowledge, this is the first effort to build a multimodal synthesis system that can generalise to unseen identities.

Submitted 16 May, 2024; originally announced May 2024.

Comments: CVPR 2024

arXiv:2404.18705 [pdf, other]

Wireless Information and Energy Transfer in the Era of 6G Communications

Authors: Constantinos Psomas, Konstantinos Ntougias, Nikita Shanin, Dongfang Xu, Kenneth MacSporran Mayer, Nguyen Minh Tran, Laura Cottatellucci, Kae Won Choi, Dong In Kim, Robert Schober, Ioannis Krikidis

Abstract: Wireless information and energy transfer (WIET) represents an emerging paradigm which employs controllable transmission of radio-frequency signals for the dual purpose of data communication and wireless charging. As such, WIET is widely regarded as an enabler of envisioned 6G use cases that rely on energy-sustainable Internet-of-Things (IoT) networks, such as smart cities and smart grids. Meeting the quality-of-service demands of WIET, in terms of both data transfer and power delivery, requires effective co-design of the information and energy signals. In this article, we present the main principles and design aspects of WIET, focusing on its integration in 6G networks. First, we discuss how conventional communication notions such as resource allocation and waveform design need to be revisited in the context of WIET. Next, we consider various candidate 6G technologies that can boost WIET efficiency, namely, holographic multiple-input multiple-output, near-field beamforming, terahertz communication, intelligent reflecting surfaces (IRSs), and reconfigurable (fluid) antenna arrays. We introduce respective WIET design methods, analyze the promising performance gains of these WIET systems, and discuss challenges, open issues, and future research directions. Finally, a near-field energy beamforming scheme and a power-based IRS beamforming algorithm are experimentally validated using a wireless energy transfer testbed.
The vision of WIET in communication systems has been gaining momentum in recent years, with constant progress with respect to theoretical but also practical aspects. The comprehensive overview of the state of the art of WIET presented in this paper highlights the potentials of WIET systems as well as their overall benefits in 6G networks.

Submitted 16 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: Proceedings of the IEEE, 36 pages, 33 figures

arXiv:2404.14140 [pdf, other]

Generative Artificial Intelligence Assisted Wireless Sensing: Human Flow Detection in Practical Communication Environments

Authors: Jiacheng Wang, Hongyang Du, Dusit Niyato, Zehui Xiong, Jiawen Kang, Bo Ai, Zhu Han, Dong In Kim

Abstract: Groundbreaking applications such as ChatGPT have heightened research interest in generative artificial intelligence (GAI). Essentially, GAI excels not only in content generation but also in signal processing, offering support for wireless sensing. Hence, we introduce a novel GAI-assisted human flow detection system (G-HFD). Rigorously, G-HFD first uses channel state information (CSI) to estimate the velocity and acceleration of propagation path length change of the human-induced reflection (HIR). Then, given the strong inference ability of the diffusion model, we propose a unified weighted conditional diffusion model (UW-CDM) to denoise the estimation results, enabling the detection of the number of targets. Next, we use the CSI obtained by a uniform linear array with wavelength spacing to estimate the HIR's time of flight and direction of arrival (DoA). In this process, UW-CDM solves the problem of ambiguous DoA spectrum, ensuring accurate DoA estimation. Finally, through clustering, G-HFD determines the number of subflows and the number of targets in each subflow, i.e., the subflow size. The evaluation based on practical downlink communication signals shows G-HFD's accuracy of subflow size detection can reach 91%. This validates its effectiveness and underscores the significant potential of GAI in the context of wireless sensing.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2403.19132 [pdf, ps, other]

Meta-Heuristic Fronthaul Bit Allocation for Cell-free Massive MIMO Systems

Authors: Minje Kim, In-soo Kim, Junil Choi

Abstract: Limited capacity of fronthaul links in a cell-free massive multiple-input multiple-output (MIMO) system can cause quantization errors at a central processing unit (CPU) during data transmission, complicating the centralized rate optimization problem. Addressing this challenge, we propose a harmony search (HS)-based algorithm that renders the combinatorial non-convex problem tractable. One of the distinctive features of our algorithm is its hierarchical structure: it first allocates resources at the access point (AP) level and subsequently optimizes for user equipment (UE), ensuring a more efficient and structured approach to resource allocation. Our proposed algorithm deals with rigorous conditions, such as asymmetric fronthaul bit allocation and distinct quantization error levels at each AP, which were not considered in previous works. We derive a closed-form expression of signal-to-interference-plus-noise ratio (SINR), in which additive quantization noise model (AQNM) based distortion error is taken into account, to define the mathematical expression of spectral efficiency (SE) for each UE. Also, we provide analyses on computational complexity and convergence to investigate the practicality of the proposed algorithm. By leveraging various performance metrics such as total SE and max-min fairness, we demonstrate that the proposed algorithm can adaptively optimize the fronthaul bit allocation depending on system requirements.
Finally, simulation results show that the proposed algorithm can achieve satisfactory performance while maintaining low computational complexity, as compared to the exhaustive search method.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 16 pages, 13 figures, accepted to IEEE Transactions on Wireless Communications (TWC)

arXiv:2403.16477 [pdf, other]

Safeguarding Next Generation Multiple Access Using Physical Layer Security Techniques: A Tutorial

Authors: Lu Lv, Dongyang Xu, Rose Qingyang Hu, Yinghui Ye, Long Yang, Xianfu Lei, Xianbin Wang, Dong In Kim, Arumugam Nallanathan

Abstract: Driven by the ever-increasing requirements of ultra-high spectral efficiency, ultra-low latency, and massive connectivity, the forefront of wireless research calls for the design of advanced next generation multiple access schemes to facilitate provisioning of these stringent demands. This inspires the embrace of non-orthogonal multiple access (NOMA) in future wireless communication networks. Nevertheless, the support of massive access via NOMA leads to additional security threats, due to the open nature of the air interface, the broadcast characteristic of radio propagation as well as intertwined relationship among paired NOMA users. To address this specific challenge, the superimposed transmission of NOMA can be explored as new opportunities for security aware design, for example, multiuser interference inherent in NOMA can be constructively engineered to benefit communication secrecy and privacy. The purpose of this tutorial is to provide a comprehensive overview on the state-of-the-art physical layer security techniques that guarantee wireless security and privacy for NOMA networks, along with the opportunities, technical challenges, and future research trends.

Submitted 21 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Invited paper by Proceedings of the IEEE

arXiv:2403.04925 [pdf, ps, other]

Near Field Communications for DMA-NOMA Networks

Authors: Zheng Zhang, Yuanwei Liu, Zhaolin Wang, Jian Chen, Dong In Kim

Abstract: A novel near-field transmission framework is proposed for dynamic metasurface antenna (DMA)-enabled non-orthogonal multiple access (NOMA) networks. The base station (BS) exploits the hybrid beamforming to communicate with multiple near users (NUs) and far users (FUs) using the NOMA principle. Based on this framework, two novel beamforming schemes are proposed. 1) For the case of the grouped users distributed in the same direction, a beam-steering scheme is developed. The metric of beam pattern error (BPE) is introduced for the characterization of the gap between the hybrid beamformers and the desired ideal beamformers, where a two-layer algorithm is proposed to minimize BPE by optimizing hybrid beamformers. Then, the optimal power allocation strategy is obtained to maximize the sum achievable rate of the network. 2) For the case of users randomly distributed, a beam-splitting scheme is proposed, where two sub-beamformers are extracted from the single beamformer to serve different users in the same group. An alternating optimization (AO) algorithm is proposed for hybrid beamformer optimization, and the optimal power allocation is also derived. Numerical results validate that: 1) the proposed beamforming schemes exhibit superior performance compared with the existing imperfect-resolution-based beamforming scheme; 2) the communication rate of the proposed transmission framework is sensitive to the imperfect distance knowledge of NUs but not to that of FUs.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 13 pages

arXiv:2402.09756 [pdf, other]

Mixture of Experts for Network Optimization: A Large Language Model-enabled Approach

Authors: Hongyang Du, Guangyuan Liu, Yijing Lin, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim

Abstract: Optimizing various wireless user tasks poses a significant challenge for networking systems because of the expanding range of user requirements. Despite advancements in Deep Reinforcement Learning (DRL), the need for customized optimization tasks for individual users complicates developing and applying numerous DRL models, leading to substantial computation resource and energy consumption and potentially to inconsistent outcomes. To address this issue, we propose a novel approach utilizing a Mixture of Experts (MoE) framework, augmented with Large Language Models (LLMs), to analyze user objectives and constraints effectively, select specialized DRL experts, and weigh each decision from the participating experts. Specifically, we develop a gate network to oversee the expert models, allowing a collective of experts to tackle a wide array of new tasks. Furthermore, we innovatively substitute the traditional gate network with an LLM, leveraging its advanced reasoning capabilities to manage expert model selection for joint decisions. Our proposed method reduces the need to train new DRL models for each unique optimization problem, decreasing energy consumption and AI model implementation costs. The LLM-enabled MoE approach is validated through a general maze navigation task and a specific network service provider utility maximization task, demonstrating its effectiveness and practical applicability in optimizing complex networking systems.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2311.06523 [pdf, other]

Generative AI for Space-Air-Ground Integrated Networks (SAGIN)

Authors: Ruichen Zhang, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Ping Zhang, Dong In Kim

Abstract: Recently, generative AI technologies have emerged as a significant advancement in the artificial intelligence field, renowned for their language and image generation capabilities. Meanwhile, the space-air-ground integrated network (SAGIN) is an integral part of future B5G/6G for achieving ubiquitous connectivity. Inspired by this, this article explores an integration of generative AI in SAGIN, focusing on potential applications and a case study. We first provide a comprehensive review of SAGIN and generative AI models, highlighting their capabilities and opportunities of their integration. Benefiting from generative AI's ability to generate useful data and facilitate advanced decision-making processes, it can be applied to various scenarios of SAGIN. Accordingly, we present a concise survey on their integration, including channel modeling and channel state information (CSI) estimation, joint air-space-ground resource allocation, intelligent network deployment, semantic communications, image extraction and processing, security and privacy enhancement. Next, we propose a framework that utilizes a Generative Diffusion Model (GDM) to construct a channel information map to enhance quality of service for SAGIN. Simulation results demonstrate the effectiveness of the proposed framework. Finally, we discuss potential research directions for generative AI-enabled SAGIN.

Submitted 11 November, 2023; originally announced November 2023.

Comments: 9 pages, 5 figures

arXiv:2309.12047 [pdf, other]

doi 10.1145/3610548.3618140

Self-Calibrating, Fully Differentiable NLOS Inverse Rendering

Authors: Kiseok Choi, Inchul Kim, Dongyoung Choi, Julio Marco, Diego Gutierrez, Min H. Kim

Abstract: Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion ambiguities and capture noise, which are typically mitigated through the manual selection of filtering functions and parameters. We introduce a fully-differentiable end-to-end NLOS inverse rendering pipeline that self-calibrates the imaging parameters during the reconstruction of hidden scenes, using as input only the measured illumination while working both in the time and frequency domains. Our pipeline extracts a geometric representation of the hidden scene from NLOS volumetric intensities and estimates the time-resolved illumination at the relay wall produced by such geometric information using differentiable transient rendering. We then use gradient descent to optimize imaging parameters by minimizing the error between our simulated time-resolved illumination and the measured illumination. Our end-to-end differentiable pipeline couples diffraction-based volumetric NLOS reconstruction with path-space light transport and a simple ray marching technique to extract detailed, dense sets of surface points and normals of hidden scenes. We demonstrate the robustness of our method to consistently reconstruct geometry and albedo, even under significant noise levels.

Submitted 25 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Journal ref: Proceedings of ACM SIGGRAPH Asia 2023 (December 2023)

arXiv:2309.02616 [pdf, other]

Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts

Authors: Hongyang Du, Guangyuan Liu, Dusit Niyato, Jiayi Zhang, Jiawen Kang, Zehui Xiong, Bo Ai, Dong In Kim

Abstract: Semantic communication (SemCom) holds promise for reducing network resource consumption while achieving the communications goal. However, the computational overheads in jointly training semantic encoders and decoders, and the subsequent deployment in network devices, are overlooked. Recent advances in generative artificial intelligence (GAI) offer a potential solution. The robust learning abilities of GAI models indicate that semantic decoders can reconstruct source messages using a limited amount of semantic information, e.g., prompts, without joint training with the semantic encoder. A notable challenge, however, is the instability introduced by GAI's diverse generation ability. This instability, evident in outputs like text-generated images, limits the direct application of GAI in scenarios demanding accurate message recovery, such as face image transmission. To solve the above problems, this paper proposes a GAI-aided SemCom system with multi-modal prompts for accurate content decoding. Moreover, in response to security concerns, we introduce the application of covert communications aided by a friendly jammer. The system jointly optimizes the diffusion step, jamming, and transmitting power with the aid of the generative diffusion models, enabling successful and secure transmission of the source messages.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.05384 [pdf, other]

Enhancing Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization

Authors: Hongyang Du, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Yijing Lin, Zonghang Li, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shuguang Cui, Bo Ai, Haibo Zhou, Dong In Kim

Abstract: Generative Diffusion Models (GDMs) have emerged as a transformative force in the realm of Generative Artificial Intelligence (GenAI), demonstrating their versatility and efficacy across various applications. The ability to model complex data distributions and generate high-quality samples has made GDMs particularly effective in tasks such as image generation and reinforcement learning. Furthermore, their iterative nature, which involves a series of noise addition and denoising steps, is a powerful and unique approach to learning and generating data. This paper serves as a comprehensive tutorial on applying GDMs in network optimization tasks. We delve into the strengths of GDMs, emphasizing their wide applicability across various domains, such as vision, text, and audio generation. We detail how GDMs can be effectively harnessed to solve complex optimization problems inherent in networks. The paper first provides a basic background of GDMs and their applications in network optimization. This is followed by a series of case studies, showcasing the integration of GDMs with Deep Reinforcement Learning (DRL), incentive mechanism design, Semantic Communications (SemCom), Internet of Vehicles (IoV) networks, etc. These case studies underscore the practicality and efficacy of GDMs in real-world scenarios, offering insights into network design. We conclude with a discussion on potential future directions for GDM research and applications, providing major insights into how they can continue to shape the future of network optimization.

Submitted 8 May, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: This paper has been accepted by IEEE Communications Surveys & Tutorials (COMST)

arXiv:2307.12254 [pdf, other]

Semantic Communication-Empowered Vehicle Count Prediction for Traffic Management

Authors: Sachin Kadam, Dong In Kim

Abstract: Vehicle count prediction is an important aspect of smart city traffic management. Most major roads are monitored by cameras with computing and transmitting capabilities. These cameras provide data to the central traffic controller (CTC), which is in charge of traffic control management. In this paper, we propose a joint CNN-LSTM-based semantic communication (SemCom) model in which the semantic encoder of a camera extracts the relevant semantics from raw images. The encoded semantics are then sent to the CTC by the transmitter in the form of symbols. The semantic decoder of the CTC predicts the vehicle count on each road based on the sequence of received symbols and develops a traffic management strategy accordingly. Using numerical results, we show that the proposed SemCom model reduces overhead by $54.42\%$ when compared to source encoder/decoder methods. Also, we demonstrate through simulations that the proposed model outperforms state-of-the-art models in terms of mean absolute error (MAE) and mean-squared error (MSE).

Submitted 2 January, 2024; v1 submitted 23 July, 2023; originally announced July 2023.

Comments: Accepted for publication in WCNC 2024 - IEEE Wireless Communications and Networking Conference, Dubai, United Arab Emirates (UAE), April 2024

arXiv:2303.01896 [pdf, other]

AI-Generated Incentive Mechanism and Full-Duplex Semantic Communications for Information Sharing

Authors: Hongyang Du, Jiacheng Wang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim

Abstract: The next generation of Internet services, such as Metaverse, relies on mixed reality (MR) technology to provide immersive user experiences. However, the limited computation power of MR headset-mounted devices (HMDs) hinders the deployment of such services. Therefore, we propose an efficient information sharing scheme based on full-duplex device-to-device (D2D) semantic communications to address this issue. Our approach enables users to avoid heavy and repetitive computational tasks, such as artificial intelligence-generated content (AIGC) in the view images of all MR users. Specifically, a user can transmit the generated content and semantic information extracted from their view image to nearby users, who can then use this information to obtain the spatial matching of computation results under their view images. We analyze the performance of full-duplex D2D communications, including the achievable rate and bit error probability, by using generalized small-scale fading models. To facilitate semantic information sharing among users, we design a contract theoretic AI-generated incentive mechanism. The proposed diffusion model generates the optimal contract design, outperforming two deep reinforcement learning algorithms, i.e., proximal policy optimization and soft actor-critic algorithms. Our numerical analysis experiment proves the effectiveness of our proposed methods. The code for this paper is available at https://github.com/HongyangDu/SemSharing

Submitted 28 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE JSAC

arXiv:2302.14370 [pdf, other]

CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis

Authors: Ji-Hoon Kim, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim

Abstract: While recent text-to-speech (TTS) systems have made remarkable strides toward human-level quality, the performance of cross-lingual TTS lags behind that of intra-lingual TTS. This gap is mainly rooted in the speaker-language entanglement problem in cross-lingual TTS. In this paper, we propose CrossSpeech which improves the quality of cross-lingual speech by effectively disentangling speaker and language information at the level of acoustic feature space. Specifically, CrossSpeech decomposes the speech generation pipeline into the speaker-independent generator (SIG) and speaker-dependent generator (SDG). The SIG produces the speaker-independent acoustic representation which is not biased to specific speaker distributions. On the other hand, the SDG models speaker-dependent speech variation that characterizes speaker attributes. By handling each information separately, CrossSpeech can obtain disentangled speaker and language representations. From the experiments, we verify that CrossSpeech achieves significant improvements in cross-lingual TTS, especially in terms of speaker similarity to the target speaker.

Submitted 12 June, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: Accepted to ICASSP 2023

arXiv:2301.12161 [pdf, other]

doi 10.1109/ICC45041.2023.10278770

Knowledge-Aware Semantic Communication System Design

Authors: Sachin Kadam, Dong In Kim

Abstract: The recent emergence of 6G raises the challenge of increasing the transmission data rate even further in order to break the barrier set by the Shannon limit. Traditional communication methods fall short of the 6G goals, paving the way for Semantic Communication (SemCom) systems. These systems find applications in a wide range of fields such as economics, metaverse, autonomous transportation systems, healthcare, smart factories, etc. In SemCom systems, only the relevant information from the data, known as semantic data, is extracted to eliminate unwanted overheads in the raw data and then transmitted after encoding. In this paper, we first use the shared knowledge base to extract the keywords from the dataset. Then, we design an auto-encoder and auto-decoder that only transmit these keywords and, respectively, recover the data using the received keywords and the shared knowledge. We show analytically that the overall semantic distortion function has an upper bound, which is shown in the literature to converge. We numerically compute the accuracy of the reconstructed sentences at the receiver. Using simulations, we show that the proposed methods outperform a state-of-the-art method in terms of the average number of words per sentence.

Submitted 28 January, 2023; originally announced January 2023.

Comments: Accepted for publication in ICC 2023 - IEEE International Conference on Communications, Rome, Italy, May 2023. arXiv admin note: substantial text overlap with arXiv:2301.03468

arXiv:2301.03468 [pdf, other]

doi 10.1109/TVT.2023.3333350

Knowledge-Aware Semantic Communication System Design and Data Allocation

Authors: Sachin Kadam, Dong In Kim

Abstract: The recent emergence of 6G raises the challenge of increasing the transmission data rate even further in order to overcome the Shannon limit. Traditional communication methods fall short of the 6G goals, paving the way for Semantic Communication (SemCom) systems that have applications in the metaverse, healthcare, economics, etc. In SemCom systems, only the relevant keywords from the data are extracted and used for transmission. In this paper, we design an auto-encoder and auto-decoder that only transmit these keywords and, respectively, recover the data using the received keywords and the shared knowledge. This SemCom system is used in a setup in which the receiver allocates various categories of the same dataset collected from the transmitter, which differ in size and accuracy, to a number of users. This scenario is formulated using an optimization problem called the data allocation problem (DAP). We show that it is NP-complete and propose a greedy algorithm to solve it. Using simulations, we show that the proposed methods for SemCom system design outperform state-of-the-art methods in terms of average number of words per sentence for a given accuracy, and that the proposed greedy algorithm solution of the DAP performs significantly close to the optimal solution.

Submitted 13 November, 2023; v1 submitted 30 December, 2022; originally announced January 2023.

Comments: Accepted for publication at IEEE Transactions on Vehicular Technology. It is an expanded version of the conference paper, which was presented at the IEEE ICC 2023. DOI: 10.1109/ICC45041.2023.10278770

arXiv:2301.03220 [pdf, other]

Enabling AI-Generated Content (AIGC) Services in Wireless Edge Networks

Authors: Hongyang Du, Zonghang Li, Dusit Niyato, Jiawen Kang, Zehui Xiong, Xuemin Shen, Dong In Kim

Abstract: Artificial Intelligence-Generated Content (AIGC) refers to the use of AI to automate the information creation process while fulfilling the personalized requirements of users. However, due to the instability of AIGC models, e.g., the stochastic nature of diffusion models, the quality and accuracy of the generated content can vary significantly. In wireless edge networks, the transmission of incorrectly generated content may unnecessarily consume network resources. Thus, a dynamic AIGC service provider (ASP) selection scheme is required to enable users to connect to the most suited ASP, improving the users' satisfaction and quality of generated content. In this article, we first review the AIGC techniques and their applications in wireless networks. We then present the AIGC-as-a-service (AaaS) concept and discuss the challenges in deploying AaaS at the edge networks. Yet, it is essential to have performance metrics to evaluate the accuracy of AIGC services. Thus, we introduce several image-based perceived quality evaluation metrics. Then, we propose a general and effective model to illustrate the relationship between computational resources and user-perceived quality evaluation metrics. To achieve efficient AaaS and maximize the quality of generated content in wireless edge networks, we propose a deep reinforcement learning-enabled algorithm for optimal ASP selection. Simulation results show that the proposed algorithm can provide a higher quality of generated content to users and achieve fewer crashed tasks compared with four benchmarks, i.e., the overloading-avoidance, random, and round-robin policies, and the upper-bound scheme.

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2211.14771 [pdf, other]

Performance Analysis of Free-Space Information Sharing in Full-Duplex Semantic Communications

Authors: Hongyang Du, Jiacheng Wang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Boon Hee Soong

Abstract: In next-generation Internet services, such as Metaverse, the mixed reality (MR) technique plays a vital role. Yet the limited computing capacity of the user-side MR headset-mounted device (HMD) prevents its further application, especially in scenarios that require a lot of computation. One way out of this dilemma is to design an efficient information sharing scheme among users to replace the heavy and repetitive computation. In this paper, we propose a free-space information sharing mechanism based on full-duplex device-to-device (D2D) semantic communications. Specifically, the view images of MR users in the same real-world scenario may be analogous. Therefore, when one user (i.e., a device) completes some computation tasks, the user can send his own calculation results and the semantic features extracted from the user's own view image to nearby users (i.e., other devices). On this basis, other users can use the received semantic features to obtain the spatial matching of the computational results under their own view images without repeating the computation. Using generalized small-scale fading models, we analyze the key performance indicators of full-duplex D2D communications, including channel capacity and bit error probability, which directly affect the transmission of semantic information. Finally, the numerical analysis experiment proves the effectiveness of our proposed methods.

Submitted 27 November, 2022; originally announced November 2022.

arXiv:2210.04308 [pdf, other]

doi 10.1109/JSTSP.2022.3224591

Privacy-preserving Intelligent Resource Allocation for Federated Edge Learning in Quantum Internet

Authors: Minrui Xu, Dusit Niyato, Zhaohui Yang, Zehui Xiong, Jiawen Kang, Dong In Kim, Xuemin Shen

Abstract: Federated edge learning (FEL) is a promising paradigm of distributed machine learning that can preserve data privacy while training the global model collaboratively. However, FEL is still facing model confidentiality issues due to eavesdropping risks of exchanging cryptographic keys through traditional encryption schemes. Therefore, in this paper, we propose a hierarchical architecture for quantum-secured FEL systems with ideal security based on the quantum key distribution (QKD) to facilitate public key and model encryption against eavesdropping attacks. Specifically, we propose a stochastic resource allocation model for efficient QKD to encrypt FEL keys and models. In FEL systems, remote FEL workers are connected to cluster heads via quantum-secured channels to train an aggregated global model collaboratively. However, due to the unpredictable number of workers at each location, the demand for secret-key rates to support secure model transmission to the server is unpredictable. The proposed systems need to efficiently allocate limited QKD resources (i.e., wavelengths) such that the total cost is minimized in the presence of stochastic demand by formulating the optimization problem for the proposed architecture as a stochastic programming model. To this end, we propose a federated reinforcement learning-based resource allocation scheme to solve the proposed model without complete state information. The proposed scheme enables QKD managers and controllers to train a global QKD resource allocation policy while keeping their private experiences local. Numerical results demonstrate that the proposed schemes can successfully achieve the cost-minimizing objective under uncertain demand while improving the training efficiency by about 50% compared to state-of-the-art schemes.

Submitted 9 October, 2022; originally announced October 2022.

arXiv:2209.10332 [pdf, other]

Deep Learning for Multi-User MIMO Systems: Joint Design of Pilot, Limited Feedback, and Precoding

Authors: Jeonghyeon Jang, Hoon Lee, Il-Min Kim, Inkyu Lee

Abstract: In conventional multi-user multiple-input multiple-output (MU-MIMO) systems with frequency division duplexing (FDD), channel acquisition and precoder optimization processes have been designed separately although they are highly coupled. This paper studies an end-to-end design of downlink MU-MIMO systems which include pilot sequences, limited feedback, and precoding. To address this problem, we propose a novel deep learning (DL) framework which jointly optimizes the feedback information generation at users and the precoder design at a base station (BS). Each procedure in the MU-MIMO systems is replaced by intelligently designed multiple deep neural network (DNN) units. At the BS, a neural network generates pilot sequences and helps the users obtain accurate channel state information. At each user, the channel feedback operation is carried out in a distributed manner by an individual user DNN. Then, another BS DNN collects feedback information from the users and determines the MIMO precoding matrices. A joint training algorithm is proposed to optimize all DNN units in an end-to-end manner. In addition, a training strategy which can avoid retraining for different network sizes for a scalable design is proposed. Numerical results demonstrate the effectiveness of the proposed DL framework compared to classical optimization techniques and other conventional DNN schemes.

Submitted 21 September, 2022; originally announced September 2022.

Comments: to appear in IEEE Trans. Commun

arXiv:2209.03739 [pdf, other]

doi 10.1109/JPROC.2021.3132369

Foundations of Wireless Information and Power Transfer: Theory, Prototypes, and Experiments

Authors: Bruno Clerckx, Junghoon Kim, Kae Won Choi, Dong In Kim

Abstract: As wireless has disrupted communications, wireless will also disrupt the delivery of energy. Future wireless networks will be equipped with (radiative) wireless power transfer (WPT) capability and exploit radio waves to carry both energy and information through a unified wireless information and power transfer (WIPT). Such networks will make the best use of the RF spectrum and radiation as well as the network infrastructure for the dual purpose of communicating and energizing. Consequently, those networks will enable trillions of future low-power devices to sense, compute, connect, and energize anywhere, anytime, and on the move. In this paper, we review the foundations of such future systems. We first give an overview of the fundamental theoretical building blocks of WPT and WIPT. Then we discuss some state-of-the-art experimental setups and prototypes of both WPT and WIPT and contrast theoretical and experimental results. We draw special attention to how the integration of RF, signal and system designs in WPT and WIPT leads to new theoretical and experimental design challenges for both microwave and communication engineers and highlight some promising solutions. Topics and experimental testbeds discussed include closed-loop WPT and WIPT architectures with beamforming, waveform, channel acquisition, and single/multi-antenna energy harvester, centralized and distributed WPT, reconfigurable metasurfaces and intelligent surfaces for WPT, transmitter and receiver architecture for WIPT, modulation, rate-energy trade-off. Moreover, we highlight important theoretical and experimental research directions to be addressed for WPT and WIPT to become a foundational technology of future wireless networks.

Submitted 8 September, 2022; originally announced September 2022.

Journal ref: in Proceedings of the IEEE, vol. 110, no. 1, pp. 8-30, Jan. 2022, doi: 10.1109/JPROC.2021.3132369

arXiv:2207.06451 [pdf, ps, other]

doi 10.1109/SSP49050.2021.9513836

Gridless Channel Estimation for MmWave Hybrid Massive MIMO Systems with Low-Resolution ADCs

Authors: In-soo Kim, Junil Choi

Abstract: This paper proposes the Newtonized fully corrective forward greedy selection-cross validation-based (NFCFGS-CV-based) channel estimator for millimeter wave (mmWave) hybrid massive multiple-input multiple-output (MIMO) systems with low-resolution analog-to-digital converters (ADCs). The proposed NFCFGS algorithm is a gridless compressed sensing (CS) technique that combines the FCFGS and Newtonized orthogonal matching pursuit (NOMP) algorithms. In particular, NFCFGS performs single path estimation over the continuum at each iteration based on the previously estimated paths. The CV technique, a model validation technique that prevents overfitting, is adopted as an indicator of termination in the absence of prior knowledge of the number of paths.

Submitted 13 July, 2022; originally announced July 2022.

Comments: to appear in SSP 2021, Rio de Janeiro, Brazil

Journal ref: 2021 IEEE Statistical Signal Processing Workshop (SSP), 2021, pp. 351-355

arXiv:2207.04096 [pdf, other]

doi 10.23919/OECC/PSC53152.2022.9849911

On Optimum Enumerative Sphere Shaping Blocklength at Different Symbol Rates for the Nonlinear Fiber Channel

Authors: Yunus Can Gültekin, Olga Vassilieva, Inwoong Kim, Paparao Palacharla, Chigo Okonkwo, Alex Alvarado

Abstract: We show that a 0.9 dB SNR improvement can be obtained via short-blocklength enumerative sphere shaping for single-span transmission at 56 GBd. This gain vanishes for higher symbol rates and a larger number of spans.

Submitted 8 July, 2022; originally announced July 2022.

Comments: 3 pages, 3 figures, presented at the OECC/PSC 2022

arXiv:2206.06605 [pdf, ps, other]

doi 10.1109/TWC.2023.3273284

Bayesian Channel Estimation for Intelligent Reflecting Surface-Aided mmWave Massive MIMO Systems With Semi-Passive Elements

Authors: In-soo Kim, Mehdi Bennis, Jaeky Oh, Jaehoon Chung, Junil Choi

Abstract: In this paper, we propose a Bayesian channel estimator for intelligent reflecting surface-aided (IRS-aided) millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems with semi-passive elements that can receive the signal in the active sensing mode. Ultimately, our goal is to minimize the channel estimation error using the received signal at the base station and additional information acquired from a small number of active sensors at the IRS. Unlike recent works on channel estimation with semi-passive elements that require both uplink and downlink training signals to estimate the UE-IRS and IRS-BS links, we only use uplink training signals to estimate all the links. To compute the minimum mean squared error (MMSE) estimates of all the links, we propose a novel variational inference-sparse Bayesian learning (VI-SBL) channel estimator that performs approximate posterior inference on the channel using VI with the mean-field approximation under the SBL framework. The simulation results show that VI-SBL outperforms the state-of-the-art baselines for IRS with passive reflecting elements in terms of the channel estimation accuracy and training overhead. Furthermore, VI-SBL with semi-passive elements is shown to be more spectral- and energy-efficient than the baselines with passive reflecting elements.

Submitted 3 May, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: to appear in IEEE Transactions on Wireless Communications

Journal ref: IEEE Transactions on Wireless Communications, vol. 22, no. 12, pp. 9732-9745, Dec. 2023

arXiv:2206.03341 [pdf, other]

doi 10.1109/JLT.2022.3220402

Introducing 4D Geometric Shell Shaping for Mitigating Nonlinear Interference Noise

Authors: Sebastiaan Goossens, Yunus Can Gültekin, Olga Vassilieva, Inwoong Kim, Paparao Palacharla, Chigo Okonkwo, Alex Alvarado

Abstract: Four dimensional geometric shell shaping (4D-GSS) is introduced as an approach for closing the nonlinearity-caused shaping gap. This format is designed at the spectral efficiency of 8 b/4D-sym and is compared against polarization-multiplexed 16QAM (PM-16QAM) and probabilistically shaped PM-16QAM (PS-PM-16QAM) in a 400ZR-compatible transmission setup with a high amount of nonlinearities. Reach increase and nonlinearity tolerance are evaluated in terms of achievable information rates and post-FEC bit-error rate. Numerical simulations for a single-span, single-channel show that 4D-GSS achieves increased nonlinear tolerance and reach increase against PM-16QAM and PS-PM-16QAM when optimized for bit-metric decoding (RBMD). In terms of RBMD, gains are small with a reach increase of 1.7% compared to PM-16QAM. When optimizing for mutual information, a larger reach increase of 3% is achieved compared to PM-16QAM. Moreover, the introduced GSS scheme provides a scalable framework for designing well-structured 4D modulation formats with low complexity.

Submitted 5 January, 2024; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: 11 pages, 10 figures

arXiv:2206.03251 [pdf, other]

4D Geometric Shell Shaping with Applications to 400ZR

Authors: Sebastiaan Goossens, Yunus Can Gültekin, Olga Vassilieva, Inwoong Kim, Paparao Palacharla, Chigo Okonkwo, Alex Alvarado

Abstract: Geometric shell shaping is introduced and evaluated for reach increase and nonlinearity tolerance in terms of MI against PM-16QAM and PS-PM-16QAM in a 400ZR compatible transmission setup.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 2 pages, 2 figures

arXiv:2205.07598 [pdf, ps, other]

doi 10.1109/TVT.2022.3184172

Cell-Free MmWave Massive MIMO Systems with Low-Capacity Fronthaul Links and Low-Resolution ADC/DACs

Authors: In-soo Kim, Mehdi Bennis, Junil Choi

Abstract: In this paper, we consider the uplink channel estimation phase and downlink data transmission phase of cell-free millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems with low-capacity fronthaul links and low-resolution analog-to-digital converters/digital-to-analog converters (ADC/DACs). In cell-free massive MIMO, a control unit dictates the baseband processing at a geographical scale, while the base stations communicate with the control unit through fronthaul links. Unlike most previous works in cell-free massive MIMO with finite-capacity fronthaul links, we consider the general case where the fronthaul capacity and ADC/DAC resolution are not necessarily the same. In particular, the fronthaul compression and ADC/DAC quantization occur independently where each one is modeled based on the information theoretic argument and additive quantization noise model (AQNM). Then, we address the codebook design problem that aims to minimize the channel estimation error for the independent and identically distributed (i.i.d.) and colored compression noise cases. Also, we propose an alternating optimization (AO) method to tackle the max-min fairness problem. In essence, the AO method alternates between two subproblems that correspond to the power allocation and codebook design problems. The AO method proposed for the zero-forcing (ZF) precoder is guaranteed to converge, whereas the one for the maximum ratio transmission (MRT) precoder has no such guarantee. Finally, the performance of the proposed schemes is evaluated by the simulation results in terms of both energy and spectral efficiency. The numerical results show that the proposed scheme for the ZF precoder yields spectral and energy efficiency 28% and 15% higher, respectively, than those of the best baseline.

Submitted 15 June, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: to appear in IEEE Transactions on Vehicular Technology

Journal ref: IEEE Transactions on Vehicular Technology, vol. 71, no. 10, pp. 10512-10526, Oct. 2022

arXiv:2203.03969 [pdf, other]

A Dynamic Hierarchical Framework for IoT-assisted Metaverse Synchronization

Authors: Yue Han, Dusit Niyato, Cyril Leung, Dong In Kim, Kun Zhu, Shaohan Feng, Sherman Xuemin Shen, Chunyan Miao

Abstract: Metaverse has recently attracted much attention from both academia and industry. Virtual services, ranging from virtual driver training to online route optimization for smart goods delivery, are emerging in the Metaverse. To make the human experience of virtual life more real, digital twins (DTs), namely digital replicas of physical objects, are key enablers. However, DT status may not always accurately reflect that of its real-world twin because the latter may be subject to changes with time. As such, it is necessary to synchronize a DT with its physical counterpart to ensure that its status is accurate for virtual businesses in the Metaverse. In this paper, we propose a dynamic hierarchical framework in which a group of IoT devices is incentivized to sense and collect physical objects' status information collectively so as to assist virtual service providers (VSPs) in synchronizing DTs. Based on the collected sensing data and the value decay rate of the DTs, the VSPs can determine synchronization intensities to maximize their payoffs. In our proposed dynamic hierarchical framework, the lower-level evolutionary game captures the VSP selection by the IoT device population, and the upper-level differential game captures the VSPs' payoffs, which are affected by the synchronization strategy, the IoT devices' selections, and the DTs' value status, given that the VSPs are simultaneous decision makers. We further consider the case in which some VSPs are first movers and extend it as a Stackelberg differential game. We theoretically and experimentally show that the equilibrium to the lower-level game exists and is evolutionarily robust, and provide a sensitivity analysis with respect to various system parameters. Experiments show that the proposed dynamic hierarchical game outperforms the baseline.

Submitted 14 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

arXiv:2203.02704 [pdf, ps, other]

Reconfigurable Intelligent Surface-Aided Joint Radar and Covert Communications: Fundamentals, Optimization, and Challenges

Authors: Hongyang Du, Jiawen Kang, Dusit Niyato, Jiayi Zhang, Dong In Kim

Abstract: Future wireless communication systems will evolve toward multi-functional integrated systems to improve spectrum utilization and reduce equipment sizes. A joint radar and communication (JRC) system, which can support simultaneous information transmission and target detection, has been regarded as a promising solution for emerging applications such as autonomous vehicles. In JRC, data security and privacy protection are critical issues. Thus, we first apply covert communication to JRC and propose a joint radar and covert communication (JRCC) system to achieve high spectrum utilization and secure data transmission simultaneously. In the JRCC system, the existence of sensitive data transmission is hidden from a maliciously observant warden. However, the performance of JRCC is restricted by severe signal propagation environments and hardware devices. Fortunately, reconfigurable intelligent surfaces (RISs) can change the signal propagation smartly to improve the network performance at low cost. We first overview fundamental concepts of JRCC and RIS and then propose the RIS-aided JRCC system design. Furthermore, both covert communication and radar performance metrics are investigated and a game theory-based covert rate optimization scheme is designed to achieve secure communication. Finally, we present several promising applications and future directions of RIS-aided JRCC systems.

Submitted 7 March, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

arXiv:2203.00931 [pdf, other]

MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity

Authors: Sungjae Kim, Yewon Kim, Jewoo Jun, Injung Kim

Abstract: We propose a multi-singer emotional singing voice synthesizer, Muse-SVS, that expresses emotion at various intensity levels by controlling subtle changes in pitch, energy, and phoneme duration while accurately following the score. To control multiple style attributes while avoiding loss of fidelity and expressiveness due to interference between attributes, Muse-SVS represents all attributes and their relations together by a joint embedding in a unified embedding space. Muse-SVS can express emotional intensity levels not included in the training data through embedding interpolation and extrapolation. We also propose a statistical pitch predictor to express pitch variance according to emotional intensity, and a context-aware residual duration predictor to prevent the accumulation of variances in phoneme duration, which is crucial for synchronization with instrumental parts. In addition, we propose a novel ASPP-Transformer, which combines atrous spatial pyramid pooling (ASPP) and Transformer, to improve fidelity and expressiveness by referring to broad contexts. In experiments, Muse-SVS exhibited improved fidelity, expressiveness, and synchronization performance compared with baseline models. The visualization results show that Muse-SVS effectively expresses the variance in pitch, energy, and phoneme duration according to emotional intensity. To the best of our knowledge, Muse-SVS is the first neural SVS capable of controlling emotional intensity.

Submitted 20 March, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: 13 pages, 11 figures

arXiv:2202.13799 [pdf, other]

One-shot Ultra-high-Resolution Generative Adversarial Network That Synthesizes 16K Images On A Single GPU

Authors: Junseok Oh, Donghwee Yoon, Injung Kim

Abstract: We propose a one-shot ultra-high-resolution generative adversarial network (OUR-GAN) framework that generates non-repetitive 16K (16,384 x 8,640) images from a single training image and is trainable on a single consumer GPU. OUR-GAN generates an initial image that is visually plausible and varied in shape at low resolution, and then gradually increases the resolution by adding detail through super-resolution. Since OUR-GAN learns from a real ultra-high-resolution (UHR) image, it can synthesize large shapes with fine details and long-range coherence, which is difficult to achieve with conventional generative models that rely on the patch distribution learned from relatively small images. OUR-GAN can synthesize high-quality 16K images with 12.5 GB of GPU memory and 4K images with only 4.29 GB as it synthesizes a UHR image part by part through seamless subregion-wise super-resolution. Additionally, OUR-GAN improves visual coherence while maintaining diversity by applying vertical positional convolution. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherency, and diversity compared with the baseline one-shot synthesis models. To the best of our knowledge, OUR-GAN is the first one-shot image synthesizer that generates non-repetitive UHR images on a single consumer GPU. The synthesized image samples are presented at https://our-gan.github.io.

Submitted 28 August, 2023; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: 36 pages, 26 figures

arXiv:2201.00674 [pdf, other]

Mitigating Nonlinear Interference by Limiting Energy Variations in Sphere Shaping

Authors: Yunus Can Gültekin, Alex Alvarado, Olga Vassilieva, Inwoong Kim, Paparao Palacharla, Chigo Okonkwo, Frans M. J. Willems

Abstract: Band-trellis enumerative sphere shaping is proposed to decrease the energy variations in channel input sequences. Against sphere shaping, 0.74 dB SNR gain and up to 9% increase in data rates are demonstrated for single-span systems.

Submitted 10 January, 2022; v1 submitted 3 January, 2022; originally announced January 2022.

Comments: 3 pages, 4 figures, accepted to be presented at the OFC 2022, a few numeric typos are corrected (v2: a typo in the caption of Fig. 4 is fixed)

arXiv:2112.10471 [pdf, other]

High-Cardinality Hybrid Shaping for 4D Modulation Formats in Optical Communications Optimized via End-to-End Learning

Authors: Vinícius Oliari, Boris Karanov, Sebastiaan Goossens, Gabriele Liga, Olga Vassilieva, Inwoong Kim, Paparao Palacharla, Chigo Okonkwo, Alex Alvarado

Abstract: In this paper we carry out a joint optimization of probabilistic (PS) and geometric shaping (GS) for four-dimensional (4D) modulation formats in long-haul coherent wavelength division multiplexed (WDM) optical fiber communications using an auto-encoder framework. We propose a 4D 10 bits/symbol constellation which we obtained via end-to-end deep learning over the split-step Fourier model of the fiber channel. The constellation achieved 13.6% reach increase at a data rate of approximately 400 Gbits/second in comparison to the ubiquitously employed polarization multiplexed 32-QAM format at a forward error correction overhead of 20%.

Submitted 20 December, 2021; originally announced December 2021.

Comments: 5 pages, 3 figures

arXiv:2112.02954 [pdf]

Reinforcement Learning for Navigation of Mobile Robot with LiDAR

Authors: Inhwan Kim, Sarvar Hussain Nengroo, Dongsoo Har

Abstract: This paper presents a technique for navigation of a mobile robot with a Deep Q-Network (DQN) combined with a Gated Recurrent Unit (GRU). The DQN integrated with the GRU allows action skipping for improved navigation performance. This technique aims at efficient navigation of mobile robots such as an autonomous parking robot. A framework for reinforcement learning can be applied to the DQN combined with the GRU in a real environment, which can be modeled by the Partially Observable Markov Decision Process (POMDP). By allowing action skipping, the ability of the DQN combined with the GRU in learning key actions can be improved. The proposed algorithm is applied to explore the feasibility of the solution in a real environment by the ROS-Gazebo simulator, and the simulation results show that the proposed algorithm achieves improved performance in navigation and collision avoidance as compared to the results obtained by DQN alone and DQN combined with GRU without allowing action skipping.

Submitted 7 December, 2021; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: 7 pages, 7 figures, Accepted by "5th International Conference on Electronics, Communication and Aerospace Technology (ICECA 2021)"

arXiv:2110.02509 [pdf, other]

Design and Implementation of 5.8 GHz RF Wireless Power Transfer System

Authors: Je Hyeon Park, Nguyen Minh Tran, Sa Il Hwang, Dong In Kim, Kae Won Choi

Abstract: In this paper, we present a 5.8 GHz radio-frequency (RF) wireless power transfer (WPT) system that consists of 64 transmit antennas and 16 receive antennas. Unlike the inductive or resonant coupling-based near-field WPT, RF WPT has a great advantage in powering low-power internet of things (IoT) devices with its capability of long-range wireless power transfer. We also propose a beam scanning algorithm that can effectively transfer the power no matter whether the receiver is located in the radiative near-field zone or far-field zone. The proposed beam scanning algorithm is verified with a real-life WPT testbed implemented by ourselves. By experiments, we confirm that the implemented 5.8 GHz RF WPT system is able to transfer 3.67 mW at a distance of 25 meters with the proposed beam scanning algorithm. Moreover, the results show that the proposed algorithm can effectively cover radiative near-field region differently from the conventional scanning schemes which are designed under the assumption of the far-field WPT.

Submitted 6 October, 2021; originally announced October 2021.

arXiv:2108.10080 [pdf, other]

doi 10.1109/ECOC52684.2021.9605938

On Kurtosis-limited Enumerative Sphere Shaping for Reach Increase in Single-span Systems

Authors: Yunus Can Gültekin, Alex Alvarado, Olga Vassilieva, Inwoong Kim, Paparao Palacharla, Chigo M. Okonkwo, Frans M. J. Willems

Abstract: The effect of decreasing the kurtosis of channel inputs is investigated for the first time with an algorithmic shaping implementation. No significant gains in decoding performance are observed for multi-span systems, while an increase in reach is obtained for single-span transmission.

Submitted 23 August, 2021; originally announced August 2021.

Comments: 4 pages, 5 figures, accepted to be presented at the ECOC 2021

arXiv:2106.13937 [pdf, ps, other]

Unified Simultaneous Wireless Information and Power Transfer for IoT: Signaling and Architecture with Deep Learning Adaptive Control

Authors: Jong Jin Park, Jong Ho Moon, Hyeon Ho Jang, Dong In Kim

Abstract: In this paper, we propose a unified SWIPT signal and its architecture design in order to take advantage of both single tone and multi-tone signaling by adjusting only the power allocation ratio of a unified signal. For this, we design a novel unified and integrated receiver architecture for the proposed unified SWIPT signaling, which consumes low power with an envelope detection. To relieve the computational complexity of the receiver, we propose an adaptive control algorithm by which the transmitter adjusts the communication mode through temporal convolutional network (TCN) based asymmetric processing. To this end, the transmitter optimizes the modulation index and power allocation ratio in short-term scale while updating the mode switching threshold in long-term scale. We demonstrate that the proposed unified SWIPT system improves the achievable rate under the self-powering condition of low-power IoT devices. Consequently it is foreseen to effectively deploy low-power IoT networks that concurrently supply both information and energy wirelessly to the devices by using the proposed unified SWIPT and adaptive control algorithm in place at the transmitter side.

Submitted 25 June, 2021; originally announced June 2021.

Comments: 15 pages, 15 figures

arXiv:2106.11805 [pdf, other]

doi 10.1109/JIOT.2022.3179691

Reconfigurable Intelligent Surface-Aided Wireless Power Transfer Systems: Analysis and Implementation

Authors: Nguyen Minh Tran, Muhammad Miftahul Amri, Je Hyeon Park, Dong In Kim, Kae Won Choi

Abstract: Reconfigurable intelligent surface (RIS) is a promising technology for RF wireless power transfer (WPT) as it is capable of beamforming and beam focusing without using active and power-hungry components. In this paper, we propose a multi-tile RIS beam scanning (MTBS) algorithm for powering up internet-of-things (IoT) devices. Considering the hardware limitations of the IoT devices, the proposed algorithm requires only power information to enable the beam focusing capability of the RIS. Specifically, we first divide the RIS into smaller RIS tiles. Then, all RIS tiles and the phased array transmitter are iteratively scanned and optimized to maximize the receive power. We elaborately analyze the proposed algorithm and build a simulator to verify it. Furthermore, we have built a real-life testbed of RIS-aided WPT systems to validate the algorithm. The experimental results show that the proposed MTBS algorithm can properly control the transmission phase of the transmitter and the reflection phase of the RIS to focus the power at the receiver. Consequently, after executing the algorithm, about 20 dB improvement of the receive power is achieved compared to the case that all unit cells of the RIS are in OFF state. By experiments, we confirm that the RIS with the MTBS algorithm can greatly enhance the power transfer efficiency.

Submitted 13 March, 2022; v1 submitted 12 June, 2021; originally announced June 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2106.11171 [pdf, other]

UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control

Authors: Minsu Kang, Sungjae Kim, Injung Kim

Abstract: We propose a novel high-fidelity expressive speech synthesis model, UniTTS, that learns and controls overlapping style attributes avoiding interference. UniTTS represents multiple style attributes in a single unified embedding space by the residuals between the phoneme embeddings before and after applying the attributes. The proposed method is especially effective in controlling multiple attributes that are difficult to separate cleanly, such as speaker ID and emotion, because it minimizes redundancy when adding variance in speaker ID and emotion, and additionally, predicts duration, pitch, and energy based on the speaker ID and emotion. In experiments, the visualization results exhibit that the proposed methods learned multiple attributes harmoniously in a manner that can be easily separated again. As well, UniTTS synthesized high-fidelity speech signals controlling multiple style attributes. The synthesized speech samples are presented at https://anonymous-authors2022.github.io/paper_works/UniTTS/demos/.

Submitted 28 February, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: 20 pages, 11 figures

arXiv:2106.05735 [pdf, other]

doi 10.1038/s41467-022-30695-9

The Medical Segmentation Decathlon

Authors: Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Jennifer S. Goli Pernicka, Kawal Rhode, Catalina Tobon-Gomez, Eugene Vorontsov, et al. (34 additional authors not shown)

Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical problem. We hypothesized that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. To investigate the hypothesis, we organized the Medical Segmentation Decathlon (MSD) - a biomedical image analysis challenge, in which algorithms compete in a multitude of both tasks and modalities. The underlying data set was designed to explore the axis of difficulties typically encountered when dealing with medical images, such as small data sets, unbalanced labels, multi-site data and small objects. The MSD challenge confirmed that algorithms with a consistent good performance on a set of tasks preserved their good average performance on a different set of previously unseen tasks. Moreover, by monitoring the MSD winner for two years, we found that this algorithm continued generalizing well to a wide range of other clinical problems, further confirming our hypothesis.
Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms are mature, accurate, and generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to non AI experts.

Submitted 10 June, 2021; originally announced June 2021.

MSC Class: 68T07

arXiv:2106.05671 [pdf, ps, other]

doi 10.1109/TVT.2021.3089742

Outage Performance of $3$D Mobile UAV Caching for Hybrid Satellite-Terrestrial Networks

Authors: Pankaj K. Sharma, Deepika Gupta, Dong In Kim

Abstract: In this paper, we consider a hybrid satellite-terrestrial network (HSTN) where a multiantenna satellite communicates with a ground user equipment (UE) with the help of multiple cache-enabled amplify-and-forward (AF) three-dimensional ($3$D) mobile unmanned aerial vehicle (UAV) relays. Herein, we employ the two fundamental most popular content (MPC) and uniform content (UC) caching schemes for two types of mobile UAV relays, namely fully $3$D and fixed height. Taking into account the multiantenna satellite links and the random $3$D distances between UAV relays and UE, we analyze the outage probability (OP) of considered system with MPC and UC caching schemes. We further carry out the corresponding asymptotic OP analysis to present the insights on achievable performance gains of two schemes for both types of $3$D mobile UAV relaying. Specifically, we show the following: (a) MPC caching dominates the UC and no caching schemes; (b) fully $3$D mobile UAV relaying outperforms its fixed height counterpart. We finally corroborate the theoretic analysis by simulations.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 17 pages, 3 figures, Submitted to IEEE for possible publication

arXiv:2105.14794 [pdf, other]

doi 10.1109/JLT.2021.3120915

Kurtosis-limited Sphere Shaping for Nonlinear Interference Noise Reduction in Optical Channels

Authors: Yunus Can Gültekin, Alex Alvarado, Olga Vassilieva, Inwoong Kim, Paparao Palacharla, Chigo Okonkwo, Frans M. J. Willems

Abstract: Nonlinear interference (NLI) generated during the propagation of an optical waveform through the fiber depends on the fourth order standardized moment of the channel input distribution, also known as kurtosis. Probabilistically-shaped inputs optimized for the linear Gaussian channel have a Gaussian-like distribution with high kurtosis. For optical channels, this leads to an increase in NLI power and consequently, a decrease in effective signal-to-noise ratio (SNR). In this work, we propose kurtosis-limited enumerative sphere shaping (K-ESS) as an algorithm to generate low-kurtosis shaped inputs. Numerical simulations at a shaping blocklength of 108 amplitudes demonstrate that with K-ESS, it is possible to increase the effective SNRs by 0.4 dB in a single-span single-channel scenario at 400 Gbit/s. K-ESS offers also a twofold decrease in frame error rate with respect to Gaussian-channel-optimal sphere shaping.

Submitted 14 October, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: 11 pages, 16 figures

arXiv:2104.00624 [pdf, ps, other]

Fast DCTTS: Efficient Deep Convolutional Text-to-Speech

Authors: Minsu Kang, Jihyun Lee, Simin Kim, Injung Kim

Abstract: We propose an end-to-end speech synthesizer, Fast DCTTS, that synthesizes speech in real time on a single CPU thread. The proposed model is composed of a carefully-tuned lightweight network designed by applying multiple network reduction and fidelity improvement techniques. In addition, we propose a novel group highway activation that can compromise between computational efficiency and the regularization effect of the gating mechanism. As well, we introduce a new metric called Elastic mel-cepstral distortion (EMCD) to measure the fidelity of the output mel-spectrogram. In experiments, we analyze the effect of the acceleration techniques on speed and speech quality. Compared with the baseline model, the proposed model exhibits improved MOS from 2.62 to 2.74 with only 1.76% computation and 2.75% parameters. The speed on a single CPU thread was improved by 7.45 times, which is fast enough to produce mel-spectrogram in real time without GPU.

Submitted 1 April, 2021; originally announced April 2021.

Comments: 5 pages, 1 figure, to be published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021

arXiv:2101.10158 [pdf, ps, other]

doi 10.1109/TWC.2021.3054998

Spatial Wideband Channel Estimation for MmWave Massive MIMO Systems with Hybrid Architectures and Low-Resolution ADCs

Authors: In-soo Kim, Junil Choi

Abstract: In this paper, a channel estimator for wideband millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems with hybrid architectures and low-resolution analog-to-digital converters (ADCs) is proposed. To account for the propagation delay across the antenna array, which cannot be neglected in wideband mmWave massive MIMO systems, the discrete time channel that models the spatial wideband effect is developed. Also, the training signal design that addresses inter-frame, inter-user, and inter-symbol interferences is investigated when the spatial wideband effect is not negligible. To estimate the channel parameters over the continuum based on the maximum a posteriori (MAP) criterion, the Newtonized fully corrective forward greedy selection-cross validation-based (NFCFGS-CV-based) channel estimator is proposed. NFCFGS-CV is a gridless compressed sensing (CS) algorithm, whose termination condition is determined by the CV technique. The CV-based termination condition is proved to achieve the minimum squared error (SE). The simulation results show that NFCFGS-CV outperforms state-of-the-art on-grid CS-based channel estimators.

Submitted 25 January, 2021; originally announced January 2021.

Comments: to appear in IEEE Transactions on Wireless Communications

Journal ref: IEEE Transactions on Wireless Communications, vol. 20, no. 6, pp. 4016-4029, June 2021

arXiv:2101.10157 [pdf, ps, other]

doi 10.1109/WCNCW49093.2021.9420030

Performance of Cell-Free MmWave Massive MIMO Systems with Fronthaul Compression and DAC Quantization

Authors: In-soo Kim, Junil Choi

Abstract: In this paper, the zero-forcing (ZF) precoder with max-min power allocation is proposed for cell-free millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems using low-resolution digital-to-analog converters (DACs) with limited-capacity fronthaul links. The proposed power allocation aims to achieve max-min fairness on the achievable rate lower bounds of the users obtained by the additive quantization noise model (AQNM), which mimics the effect of low-resolution DACs. To solve the max-min power allocation problem, an alternating optimization (AO) method is proposed, which is guaranteed to converge because the global optima of the subproblems that constitute the original problem are attained at each AO iteration. The performance of cell-free and small-cell systems is explored in the simulation results, which suggest that not-too-small fronthaul capacity suffices for cell-free systems to outperform small-cell systems.

Submitted 25 January, 2021; originally announced January 2021.

Comments: to appear in WCNCW 2021, Nanjing, China

Journal ref: 2021 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), 2021, pp. 1-6

arXiv:2010.14742 [pdf, other]

ElderSim: A Synthetic Data Generation Platform for Human Action Recognition in Eldercare Applications

Authors: Hochul Hwang, Cheongjae Jang, Geonwoo Park, Junghyun Cho, Ig-Jae Kim

Abstract: To train deep learning models for vision-based action recognition of elders' daily activities, we need large-scale activity datasets acquired under various daily living environments and conditions. However, most public datasets used in human action recognition either differ from or have limited coverage of elders' activities in many aspects, making it challenging to recognize elders' daily activities well by only utilizing existing datasets. Recently, such limitations of available datasets have actively been compensated by generating synthetic data from realistic simulation environments and using those data to train deep learning models. In this paper, based on these ideas we develop ElderSim, an action simulation platform that can generate synthetic data on elders' daily activities. For 55 kinds of frequent daily activities of the elders, ElderSim generates realistic motions of synthetic characters with various adjustable data-generating options, and provides different output modalities including RGB videos, two- and three-dimensional skeleton trajectories. We then generate KIST SynADL, a large-scale synthetic dataset of elders' activities of daily living, from ElderSim and use the data in addition to real datasets to train three state-of-the-art human action recognition models. From the experiments following several newly proposed scenarios that assume different real and synthetic dataset configurations for training, we observe a noticeable performance improvement by augmenting our synthetic data.
We also offer guidance with insights for the effective utilization of synthetic data to help recognize elders' daily activities.

Submitted 28 October, 2020; originally announced October 2020.

Comments: 18 pages, 9 figures

arXiv:2008.12938 [pdf, other]

Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks

Authors: Shimin Gong, Jiaye Lin, Jinbei Zhang, Dusit Niyato, Dong In Kim, Mohsen Guizani

Abstract: Intelligent reflecting surface (IRS) has been recently employed to reshape the wireless channels by controlling individual scattering elements' phase shifts, namely, passive beamforming. Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity and inexact channel information. In this article, we focus on machine learning (ML) approaches for performance maximization in IRS-assisted wireless networks. In general, ML approaches provide enhanced flexibility and robustness against uncertain information and imprecise modeling. Practical challenges still remain mainly due to the demand for a large dataset in offline training and slow convergence in online learning. These observations motivate us to design a novel optimization-driven ML framework for IRS-assisted wireless networks, which takes both advantages of the efficiency in model-based optimization and the robustness in model-free ML approaches. By splitting the decision variables into two parts, one part is obtained by the outer-loop ML approach, while the other part is optimized efficiently by solving an approximate problem. Numerical results verify that the optimization-driven ML approach can improve both the convergence and the reward performance compared to conventional model-free learning approaches.

Submitted 29 August, 2020; originally announced August 2020.

Comments: submitted to IEEE Communications Magazine

arXiv:2007.15256 [pdf, other]

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Authors: Jinhyeok Yang, Junmo Lee, Youngik Kim, Hoonyoung Cho, Injung Kim

Abstract: We present a novel high-fidelity real-time neural vocoder called VocGAN. A recently developed GAN-based vocoder, MelGAN, produces speech waveforms in real-time. However, it often produces a waveform that is insufficient in quality or inconsistent with acoustic characteristics of the input mel spectrogram. VocGAN is nearly as fast as MelGAN, but it significantly improves the quality and consistency of the output waveform. VocGAN applies a multi-scale waveform generator and a hierarchically-nested discriminator to learn multiple levels of acoustic properties in a balanced way. It also applies the joint conditional and unconditional objective, which has shown successful results in high-resolution image synthesis. In experiments, VocGAN synthesizes speech waveforms 416.7x faster on a GTX 1080Ti GPU and 3.24x faster on a CPU than real-time. Compared with MelGAN, it also exhibits significantly improved quality in multiple evaluation metrics including mean opinion score (MOS) with minimal additional overhead. Additionally, compared with Parallel WaveGAN, another recently developed high-fidelity vocoder, VocGAN is 6.98x faster on a CPU and exhibits higher MOS.

Submitted 30 July, 2020; originally announced July 2020.

Comments: Accepted to INTERSPEECH 2020

arXiv:2007.13146 [pdf, other]

Radio Resource Management in Joint Radar and Communication: A Comprehensive Survey

Authors: Nguyen Cong Luong, Xiao Lu, Dinh Thai Hoang, Dusit Niyato, Dong In Kim

Abstract: Joint radar and communication (JRC) has recently attracted substantial attention. The first reason is that JRC allows individual radar and communication systems to share spectrum bands and thus improves the spectrum utilization. The second reason is that JRC enables a single hardware platform, e.g., an autonomous vehicle or a UAV, to simultaneously perform the communication function and the radar function. As a result, JRC is able to improve the efficiency of resources, i.e., spectrum and energy, reduce the system size, and minimize the system cost. However, there are several challenges to be solved for the JRC design. In particular, sharing the spectrum imposes the interference caused by the systems, and sharing the hardware platform and energy resource complicates the design of the JRC transmitter and compromises the performance of each function. To address the challenges, several resource management approaches have been recently proposed, and this paper presents a comprehensive literature review on resource management for JRC. First, we give fundamental concepts of JRC, important performance metrics used in JRC systems, and applications of the JRC systems. Then, we review and analyze resource management approaches, i.e., spectrum sharing, power allocation, and interference management, for JRC. In addition, we present security issues to JRC and provide a discussion of countermeasures to the security issues. Finally, we highlight important challenges in the JRC design and discuss future research directions related to JRC.

Submitted 28 January, 2021; v1 submitted 26 July, 2020; originally announced July 2020.

Showing 1–50 of 68 results for author: Kim, I