
arXiv:2406.19246 [pdf, other]

An Interpretable and Efficient Sleep Staging Algorithm: DetectsleepNet

Abstract: Sleep quality directly impacts human health and quality of life, so accurate sleep staging is essential for assessing sleep quality. However, most traditional methods are inefficient and time-consuming because different sleep cycles are segmented by manual labeling. In contrast, automated sleep staging technology not only directly assesses sleep quality but also helps sleep specialists analyze sleep status, significantly improving efficiency and reducing the cost of sleep monitoring, especially for continuous sleep monitoring. Most existing models, however, are deficient in computational efficiency, lightweight design, and model interpretability. In this paper, we propose a neural network architecture based on the prior knowledge of sleep experts. Specifically, 1) we propose an end-to-end model named DetectsleepNet that uses single-channel EEG signals without additional data processing and achieves 80.9% accuracy on the SHHS dataset and 88.0% accuracy on the Physio2018 dataset; 2) we construct an efficient lightweight sleep staging model named DetectsleepNet-tiny based on DetectsleepNet, which has just 6% of the parameters of existing models, but its accuracy exceeds 99% of state-of-the-art models; and 3) we introduce a specific inference header to assess the attention given to a specific EEG segment in each sleep frame, enhancing the transparency of the model's decisions.
Our model comprises fewer parameters than existing ones and further explores model interpretability to facilitate its application in healthcare. The code is available at https://github.com/komdec/DetectSleepNet.git.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 25 pages, 11 figures

arXiv:2406.16878 [pdf, ps, other]

Benchmarking Semantic Communications for Image Transmission Over MIMO Interference Channels

Authors: Yanhu Wang, Shuaishuai Guo, Anming Dong, Hui Zhao

Abstract: Semantic communications offer promising prospects for enhancing data transmission efficiency. However, existing schemes have predominantly concentrated on point-to-point transmissions. In this paper, we investigate whether this advantage holds in interference scenarios, compared with baseline approaches. Specifically, our focus is on general multiple-input multiple-output (MIMO) interference channels, for which we propose an interference-robust semantic communication (IRSC) scheme. This scheme involves the development of transceivers based on neural networks (NNs), which integrate channel state information (CSI) either solely at the receiver or at both the transmitter and receiver ends. Moreover, we establish a composite loss function for training IRSC transceivers, along with a dynamic mechanism for updating the weights of the various components in the loss function to enhance fairness among users. Experimental results demonstrate that the proposed IRSC scheme effectively learns to mitigate interference and outperforms baseline approaches, particularly in low signal-to-noise ratio (SNR) regimes.

Submitted 10 April, 2024; originally announced June 2024.

arXiv:2406.14067 [pdf]

A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz. The IF LFM signal is converted to the optical domain via an intensity modulator and then filtered by a fiber Bragg grating (FBG) so that only the two 2nd-order optical LFM sidebands remain. In radar detection, the two optical LFM sidebands beat with each other to generate a frequency-and-bandwidth-quadrupled LFM signal, which is used for ranging, radial velocity measurement, and imaging. By changing the center frequency of the IF LFM signal, the radar function can be operated within 8 to 40 GHz. In spectrum sensing, one 2nd-order optical LFM sideband is selected by another FBG, which then works in conjunction with the stimulated Brillouin scattering gain spectrum to map the frequency of the signal under test to time with an instantaneous measurement bandwidth of 2 GHz. By using a frequency shift module to adjust the pump frequency, the frequency measurement range can be adjusted from 0 to 40 GHz.
The prototype is comprehensively studied and tested. It achieves a range resolution of 3.75 cm, a range error of less than ±2 cm, and a radial velocity error within ±1 cm/s, delivers clear imaging of multiple small targets, and maintains a frequency measurement error of less than ±7 MHz and a frequency resolution better than 20 MHz.
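As a quick consistency check on the radar figures in this abstract, the sketch below applies the textbook ideal LFM range resolution ΔR = c/(2B) and assumes that quadrupling scales both the IF center frequency and the 1 GHz instantaneous bandwidth by a factor of four. The helper names are hypothetical and the code is not taken from the authors' prototype.

```python
# Worked check of the band edges and range resolution quoted above.
# Assumptions: ideal LFM range resolution c / (2B); windowing and hardware
# non-idealities are ignored; helper names are hypothetical.
C = 299_792_458.0  # speed of light in m/s


def quadrupled_band(if_center_ghz: float, if_bw_ghz: float = 1.0, factor: int = 4):
    """Return (low, high) band edges in GHz after frequency-and-bandwidth multiplication."""
    center = factor * if_center_ghz
    bw = factor * if_bw_ghz
    return center - bw / 2, center + bw / 2


def range_resolution_m(bandwidth_hz: float) -> float:
    """Ideal LFM range resolution delta_R = c / (2 B), in metres."""
    return C / (2 * bandwidth_hz)


low, _ = quadrupled_band(2.5)   # lowest tunable IF center frequency
_, high = quadrupled_band(9.5)  # highest tunable IF center frequency
print(f"operating band: {low:.1f}-{high:.1f} GHz")
print(f"range resolution: {range_resolution_m(4e9) * 100:.2f} cm")
```

Quadrupling the 2.5 to 9.5 GHz IF tuning range gives center frequencies of 10 to 38 GHz with 4 GHz of bandwidth, so the band edges fall at 8 and 40 GHz, and a 4 GHz sweep gives a resolution of about 3.75 cm, in line with the values reported in the abstract.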

Submitted 20 June, 2024; originally announced June 2024.

Comments: 18 pages, 12 figures, 1 table

arXiv:2406.06937 [pdf, other]

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

Authors: Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang

Abstract: Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization between the speaker and listener. To overcome these challenges, we propose a novel non-autoregressive generation framework for simultaneous speech translation (NAST-S2X), which integrates speech-to-text and speech-to-speech tasks into a unified end-to-end framework. We develop a non-autoregressive decoder capable of concurrently generating multiple text or acoustic unit tokens upon receiving fixed-length speech chunks. The decoder can generate blank or repeated tokens and employs CTC decoding to dynamically adjust its latency. Experimental results show that NAST-S2X outperforms state-of-the-art models in both speech-to-text and speech-to-speech tasks. It achieves high-quality simultaneous interpretation within a delay of less than 3 seconds and provides a 28× decoding speedup in offline generation.
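The abstract states that the decoder may emit blank or repeated tokens and relies on CTC decoding to adjust its latency. The sketch below shows only the generic CTC collapse rule (merge consecutive repeats, then drop blanks) applied chunk by chunk, carrying the last raw token across chunk boundaries. The `BLANK` symbol and the token strings are hypothetical, and this is not the NAST-S2X implementation.

```python
from typing import Iterable, Iterator, List, Optional

BLANK = "<blank>"  # hypothetical blank symbol


def ctc_collapse_stream(chunks: Iterable[List[str]], blank: str = BLANK) -> Iterator[List[str]]:
    """Yield the tokens committed after each chunk under the standard CTC collapse rule."""
    prev: Optional[str] = None  # last raw token seen, kept across chunk boundaries
    for chunk in chunks:
        emitted: List[str] = []
        for tok in chunk:
            # Emit only non-blank tokens that differ from the previous raw token;
            # a blank between two identical tokens therefore keeps both copies.
            if tok != blank and tok != prev:
                emitted.append(tok)
            prev = tok
        yield emitted  # an empty list means nothing new is committed yet (the model waits)


chunks = [["we", "we", BLANK], [BLANK, BLANK], ["go", BLANK, "go"]]
for i, out in enumerate(ctc_collapse_stream(chunks)):
    print(i, out)
```

Here the second chunk produces no output, which is how a stream of blanks lets a CTC-style decoder defer its decision until more speech has arrived.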

Submitted 11 June, 2024; originally announced June 2024.

Comments: ACL 2024; Codes and demos are at https://github.com/ictnlp/NAST-S2x

arXiv:2406.03049 [pdf, other]

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

Authors: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

Abstract: Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy that controls when the model generates the corresponding target speech within the speech input, thereby posing a double challenge of translation and policy. In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and simultaneous policy in a unified multi-task learning framework. Adhering to a multi-task learning approach, StreamSpeech can perform offline and simultaneous speech recognition, speech translation, and speech synthesis via an "All-in-One" seamless model. Experiments on the CVSS benchmark demonstrate that StreamSpeech achieves state-of-the-art performance in both offline S2ST and Simul-S2ST tasks. Besides, StreamSpeech is able to present high-quality intermediate results (i.e., ASR or translation results) during the simultaneous translation process, offering a more comprehensive real-time communication experience.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 main conference, Project Page: https://ictnlp.github.io/StreamSpeech-site/

arXiv:2405.16011 [pdf, ps, other]

Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models

Authors: Shuaishuai Guo, Yanhu Wang, Jia Ye, Anbang Zhang, Kun Xu

Abstract: Semantic communications, a promising approach for agent-human and agent-agent interactions, typically operate at a feature level, lacking true semantic understanding. This paper explores understanding-level semantic communications (ULSC), transforming visual data into human-intelligible semantic content. We employ an image caption neural network (ICNN) to derive semantic representations from visual data, expressed as natural language descriptions. These are further refined using a pre-trained large language model (LLM) for importance quantification and semantic error correction. The subsequent semantic importance-aware communications (SIAC) aim to minimize semantic loss while respecting transmission delay constraints, exemplified through adaptive modulation and coding strategies. At the receiving end, LLM-based semantic error correction is utilized. If visual data recreation is desired, a pre-trained generative artificial intelligence (AI) model can regenerate it using the corrected descriptions. We assess semantic similarities between transmitted and recovered content, demonstrating ULSC's superior ability to convey semantic understanding compared to feature-level semantic communications (FLSC). ULSC's conversion of visual data to natural language facilitates various cognitive tasks, leveraging human knowledge bases. Additionally, this method enhances privacy, as neither the original data nor the features are directly transmitted.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.09552 [pdf, other]

ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head Detection

Authors: Jiayi Wang, Yi-An Mao, Xiaoyu Ma, Sicen Guo, Yuting Shao, Xiao Lv, Wenting Han, Mark Christopher, Linda M. Zangwill, Yanlong Bi, Rui Fan

Abstract: Optic nerve head (ONH) detection has been a crucial area of study in ophthalmology for years. However, the significant discrepancy between fundus image datasets, each generated using a single type of fundus camera, poses challenges to the generalizability of ONH detection approaches developed based on semantic segmentation networks. Despite the numerous recent advancements in general-purpose semantic segmentation methods using convolutional neural networks (CNNs) and Transformers, there is currently a lack of benchmarks for these state-of-the-art (SoTA) networks specifically trained for ONH detection. Therefore, in this article, we make contributions from three key aspects: network design, the publication of a dataset, and the establishment of a comprehensive benchmark. Our newly developed ONH detection network, referred to as ODFormer, is based upon the Swin Transformer architecture and incorporates two novel components: a multi-scale context aggregator and a lightweight bidirectional feature recalibrator. Our published large-scale dataset, known as TongjiU-DROD, provides multi-resolution fundus images for each participant, captured using two distinct types of cameras. Our established benchmark involves three datasets: DRIONS-DB, DRISHTI-GS1, and TongjiU-DROD, created by researchers from different countries and containing fundus images captured from participants of diverse races and ages. Extensive experimental results demonstrate that our proposed ODFormer outperforms other SoTA networks in terms of performance and generalizability.
Our dataset and source code are publicly available at mias.group/ODFormer.

Submitted 2 June, 2024; v1 submitted 15 April, 2024; originally announced May 2024.

arXiv:2404.13905 [pdf, other]

SI-FID: Only One Objective Indicator for Evaluating Stitched Images

Authors: Xinrui Zhang, Shengwei Guo, Guobing Sun

Abstract: Accurate image quality evaluation is vital in developing image stitching algorithms, as it directly reflects the algorithms' progress. However, commonly used objective indicators always produce results that are inconsistent with, and even contradict, subjective indicators. To enhance the consistency between objective and subjective evaluations, this paper introduces a novel indicator, the Fréchet Distance for Stitched Images (SI-FID). Specifically, our training network adopts an overall contrastive learning architecture. We employ data augmentation approaches that serve as noise to distort images in the training set. Both the original and distorted training sets are then input into the pre-trained model for fine-tuning. We then evaluate the altered FID after introducing interference into the test set and examine whether the noise can improve the consistency between objective and subjective evaluation results. The rank correlation coefficient is used to measure this consistency. SI-FID is the altered FID that yields the highest rank correlation coefficient under the effect of a certain noise. The experimental results demonstrate that the rank correlation coefficient obtained by SI-FID is at least 25% higher than those of other objective indicators, meaning that its evaluation results are closer to human subjective evaluation.
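The abstract measures agreement between objective and subjective scores with a rank correlation coefficient but does not name which one. The sketch below assumes Spearman's ρ = 1 − 6Σd²/(n(n²−1)), which is valid only when there are no tied scores, and uses made-up scores for four hypothetical stitched images. Because FID is lower-is-better while a mean opinion score (MOS) is higher-is-better, the FID values are negated before ranking.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """1-based ascending ranks; assumes no ties (ties would need average ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n (n^2 - 1)), tie-free case."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical scores for four stitched images (not data from the paper).
mos = [4.5, 3.9, 3.1, 2.2]       # subjective: higher is better
fid = [12.0, 15.5, 14.0, 30.1]   # objective: lower is better
print(spearman_rho(mos, [-f for f in fid]))
```

A value of 1 would mean the objective indicator orders the images exactly as the human raters do; in this toy example one swapped pair lowers ρ to 0.8.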

Submitted 22 April, 2024; originally announced April 2024.

Comments: 17 pages, 9 figures

arXiv:2404.06393 [pdf, other]

MuPT: A Generative Symbolic Music Pretrained Transformer

Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2403.16468 [pdf, ps, other]

Unified Integrated Sensing and Communication Signal Design: A Sphere Packing Perspective

Authors: Shuaishuai Guo, Kaiqian Qu

Abstract: The design of communication signal sets is fundamentally a sphere packing problem. It aims to identify a set of M points in an N-dimensional space, with the objective of maximizing the separability of points that represent different bits. In contrast, signals used for sensing targets should ideally be as deterministic as possible. This paper explores the inherent conflict and trade-off between communication and sensing when these functions are combined within the same signal set. We present a unified approach to signal design in the time, frequency, and space domains for integrated sensing and communication (ISAC), framing it as a modified sphere packing problem. Through formula manipulation, this problem is transformed into a large-scale quadratically constrained quadratic programming (QCQP) problem. We propose an augmented Lagrangian and dual ascent (ALDA) algorithm for iterative problem-solving. The computational complexity of this approach is analyzed and found to be prohibitive for large, high-dimensional signal set designs. To address this, we introduce a bit-dimension-power splitting (BDPS) method. This method decomposes the large-scale QCQP into a series of smaller-scale problems that can be solved more efficiently and in parallel, significantly reducing the overall computational load. Extensive simulations have been conducted to validate the effectiveness of our proposed signal design methods in the context of ISAC.
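The sphere-packing view of the communication side can be made concrete with the minimum pairwise Euclidean distance of a signal set under a fixed average power. The sketch below compares two textbook M = 4 sets at unit average power in N = 2 real dimensions; it only illustrates that separability objective and does not implement the paper's ALDA or BDPS algorithms. The helper names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

Point = Sequence[float]


def min_distance(points: List[Point]) -> float:
    """Smallest Euclidean distance between any two distinct points of the set."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def avg_power(points: List[Point]) -> float:
    """Mean squared norm of the points (average symbol energy)."""
    return sum(sum(c * c for c in p) for p in points) / len(points)


# Two M = 4 signal sets in N = 2 real dimensions, both normalised to unit average power.
s = 1 / math.sqrt(2)
qpsk = [(s, s), (-s, s), (-s, -s), (s, -s)]               # spreads over both dimensions
pam4 = [(a / math.sqrt(5), 0.0) for a in (-3, -1, 1, 3)]  # uses only one dimension

for name, pts in (("QPSK", qpsk), ("4-PAM", pam4)):
    print(f"{name}: power={avg_power(pts):.3f}, d_min={min_distance(pts):.3f}")
```

Spreading the four points over both dimensions raises the minimum distance from about 0.894 to about 1.414 at the same power, which is the kind of separability the abstract seeks to maximize for communication, whereas sensing favors signals that are as deterministic as possible.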

Submitted 25 March, 2024; originally announced March 2024.

Comments: submitted to IEEE TCOM

arXiv:2401.05446 [pdf, other]

Self-supervised Learning for Electroencephalogram: A Systematic Survey

Authors: Weining Weng, Yang Gu, Shuai Guo, Yuan Ma, Zhaohua Yang, Yuchen Liu, Yiqiang Chen

Abstract: Electroencephalogram (EEG) is a non-invasive technique for recording bioelectrical signals. Integrating supervised deep learning techniques with EEG signals has recently facilitated automatic analysis across diverse EEG-based tasks. However, labeling issues have constrained the development of EEG-based deep models. Obtaining EEG annotations is difficult because it requires domain experts to guide collection and labeling, and the variability of EEG signals across subjects causes significant label shifts. To address these challenges, self-supervised learning (SSL) has been proposed to extract representations from unlabeled samples through well-designed pretext tasks. This paper concentrates on integrating SSL frameworks with temporal EEG signals to achieve efficient representation and presents a systematic review of SSL for EEG signals. In this paper, 1) we introduce the concept and theory of self-supervised learning and typical SSL frameworks; 2) we provide a comprehensive review of SSL for EEG analysis, including the taxonomy, methodology, and technical details of existing EEG-based SSL frameworks, and discuss the differences between these methods; 3) we investigate the adaptation of the SSL approach to various downstream tasks, including the task descriptions and related benchmark datasets; and 4) finally, we discuss potential directions for future SSL-EEG research.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 35 pages, 12 figures

MSC Class: 68-02 (Primarily); 68T01 (Secondary) ACM Class: I.2; J.3; I.5.4

arXiv:2312.16247 [pdf, other]

Toward Accurate and Temporally Consistent Video Restoration from Raw Data

Authors: Shi Guo, Jianqi Ma, Xi Yang, Zhengqiang Zhang, Lei Zhang

Abstract: Denoising and demosaicking are two fundamental steps in reconstructing a clean full-color video from raw data, and performing video denoising and demosaicking jointly, namely VJDD, could lead to better video restoration performance than performing them separately. In addition to restoration accuracy, another key challenge for VJDD lies in the temporal consistency of consecutive frames. This issue is exacerbated when perceptual regularization terms are introduced to enhance video perceptual quality. To address these challenges, we present a new VJDD framework based on consistent and accurate latent space propagation, which leverages the estimates of previous frames as prior knowledge to ensure consistent recovery of the current frame. A data temporal consistency (DTC) loss and a relational perception consistency (RPC) loss are designed accordingly. Compared with the commonly used flow-based losses, the proposed losses can circumvent the error accumulation problem caused by inaccurate flow estimation and effectively handle intensity changes in videos, greatly improving the temporal consistency of output videos while preserving texture details. Extensive experiments demonstrate the leading VJDD performance of our method in terms of restoration accuracy, perceptual quality, and temporal consistency. Code and dataset are available at https://github.com/GuoShi28/VJDD.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2311.01812 [pdf, ps, other]

Carrier Frequency Offset Estimation for OCDM with Null Subchirps

Authors: Sidong Guo, Yiyin Wang, Xiaoli Ma

Abstract: In this paper, we investigate the carrier frequency offset (CFO) identifiability problem in orthogonal chirp division multiplexing (OCDM) systems. We propose a transmission scheme that inserts consecutive null subchirps. A CFO estimator is accordingly developed to achieve a full acquisition range. We further demonstrate that the proposed transmission scheme not only helps to resolve CFO identifiability issues but also enables multipath diversity for OCDM systems. Simulation results corroborate our theoretical findings.

Submitted 8 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 2 fig

arXiv:2310.20242 [pdf, other]

Intelligent-Reflecting-Surface-Assisted UAV Communications for 6G Networks

Authors: Zhaolong Ning, Tengfeng Li, Yu Wu, Xiaojie Wang, Qingqing Wu, Fei Richard Yu, Song Guo

Abstract: In 6th-Generation (6G) mobile networks, Intelligent Reflecting Surfaces (IRSs) and Unmanned Aerial Vehicles (UAVs) have emerged as promising technologies to address the coverage difficulties and resource constraints faced by terrestrial networks. UAVs, with their mobility and low costs, offer diverse connectivity options for mobile users and a novel deployment paradigm for 6G networks. However, the limited battery capacity of UAVs, dynamic and unpredictable channel environments, and communication resource constraints result in poor performance of traditional UAV-based networks. IRSs can not only reconstruct the wireless environment in a unique way but also achieve wireless network relaying in a cost-effective manner. Hence, IRSs have received significant attention as a promising solution to the above challenges. In this article, we conduct a comprehensive survey on IRS-assisted UAV communications for 6G networks. First, primary issues, key technologies, and application scenarios of IRS-assisted UAV communications for 6G networks are introduced. Then, we put forward specific solutions to the issues of IRS-assisted UAV communications. Finally, we discuss some open issues and future research directions to guide researchers in related fields.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2309.12688 [pdf, ps, other]

Green Holographic MIMO Communications With A Few Transmit Radio Frequency Chains

Authors: Shuaishuai Guo, Jia Ye, Kaiqian Qu, Shuping Dang

Abstract: Holographic multiple-input multiple-output (MIMO) communications are widely recognized as a promising candidate for the next-generation air interface. With a holographic MIMO surface, the number of spatial degrees of freedom (DoFs) increases considerably and also varies significantly as the user moves. To fully exploit the large and varying number of spatial DoFs, the number of equipped radio frequency (RF) chains has to be larger than or equal to the largest number of spatial DoFs. However, this causes considerable waste, as RF chains (especially transmit RF chains) are costly and power-hungry. To avoid this heavy burden, this paper investigates green holographic MIMO communications with a few transmit RF chains under an electromagnetic-based communication model. We not only examine the fundamental capacity limits but also propose an effective transmission scheme, namely non-uniform holographic pattern modulation (NUHPM), to achieve the capacity limit in the high signal-to-noise ratio (SNR) regime. The analytical result sheds light on the green evaluation of MIMO communications, which can be realized by increasing the size of the antenna aperture without increasing the number of transmit RF chains. Numerical results are provided to verify our analysis and to show the substantial performance gain from employing the additional spatial DoFs as modulation resources.

Submitted 22 September, 2023; originally announced September 2023.

Comments: 10 figures; has been accepted by TGCN

arXiv:2308.10428 [pdf, other]

Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models

Authors: Heyang Xue, Shuai Guo, Pengcheng Zhu, Mengxiao Bi

Abstract: Despite imperfect score matching causing drift between the training and sampling distributions of diffusion models, recent advances in diffusion-based acoustic models have revolutionized data-sufficient single-speaker Text-to-Speech (TTS) approaches, with Grad-TTS being a prime example. However, in practice these approaches struggle in multi-speaker scenarios because of the sampling drift problem, since the target data distribution is more complex than in single-speaker scenarios. In this paper, we present Multi-GradSpeech, a multi-speaker diffusion-based acoustic model that introduces the Consistent Diffusion Model (CDM) as a generative modeling approach. We enforce the consistency property of CDM during training to alleviate the sampling drift problem at inference, resulting in significant improvements in multi-speaker TTS performance. Our experimental results show that the proposed approach improves performance for the different speakers involved in multi-speaker TTS compared to Grad-TTS, even outperforming the fine-tuning approach. Audio samples are available at https://welkinyang.github.io/multi-gradspeech/

Submitted 31 August, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

arXiv:2308.10420 [pdf, other]

doi 10.1109/TVT.2023.3305330

Reconfigurable Intelligent Surface Enabled Joint Backscattering and Communication

Authors: **qiu Zhao, Jia Ye, Shuaishuai Guo, Zhiquan Bai, Di Zhou, Abeer Mohamed

Abstract: Reconfigurable intelligent surface (RIS), an essential topic in sixth-generation (6G) communications, aims to enhance communication performance or mitigate undesired transmission. However, the controllability of each reflecting element on an RIS also enables it to act as a passive backscatter device (BD) and transmit its own information to reader devices. In this paper, we propose an RIS-enabled joint backscattering and communication (JBAC) system, in which backscatter communication coexists with the primary communication and occupies no extra spectrum. Specifically, the RIS modifies its reflecting pattern to act as a passive BD and reflect its own information back to the base station (BS) in the backscatter communication, while simultaneously helping the primary communication from the BS to the users. We further present an iterative active beamforming and reflecting pattern design that maximizes the average user transmission rate of the primary communication and the goodput of the backscatter communication by solving the formulated multi-objective optimization problem (MOOP). Numerical results reveal the impact of the number of reflecting elements and the reflecting patterns on system performance, and demonstrate the effectiveness of the proposed scheme. Important practical implementation remarks are also discussed.

Submitted 20 August, 2023; originally announced August 2023.

Comments: 11 pages, 8 figures, published to IEEE TVT

Journal ref: IEEE Transactions on Vehicular Technology, 2023

arXiv:2308.07342 [pdf, other]

Emergent communication for AR

Authors: Ruxiao Chen, Shuaishuai Guo

Abstract: Mobile augmented reality (MAR) is widely acknowledged as one of the ubiquitous interfaces to the digital twin and the Metaverse, and it demands unprecedentedly low latency, high computational power, and high energy efficiency. Existing solutions for realizing MAR combine multiple technologies such as edge computing, cloud computing, and fifth-generation (5G) networks. However, the inherent communication latency of visual data clearly limits the quality of experience (QoE). To address this challenge, we propose an emergent semantic communication framework that learns the communication protocols in MAR. Specifically, we train two agents through a modified Lewis signaling game so that a discrete communication protocol emerges spontaneously. Based on this protocol, the two agents can communicate the abstract idea of visual data through messages of extremely small size over a noisy channel that introduces message errors. To better simulate real-world scenarios, we incorporate channel uncertainty into the training process. Experiments show that the proposed scheme generalizes better to unseen objects than the traditional object recognition used in MAR and can effectively enhance communication efficiency by using small messages.

Submitted 12 August, 2023; originally announced August 2023.
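The Lewis signaling game mentioned in the emergent-communication abstract above can be illustrated with its classic tabular form. This is a minimal sketch under stated assumptions, not the authors' method: it replaces their modified game and learned agents with Roth-Erev urn reinforcement, and `train_lewis_game`, `expected_success`, and the `flip_prob` channel-noise model are hypothetical names introduced here.

```python
import random

# Hypothetical sketch: classic Lewis signaling game with Roth-Erev urn learning.
# Not the paper's modified game; the noise model below is an assumption.

def train_lewis_game(n_states=2, n_msgs=2, rounds=5000, flip_prob=0.0, seed=0):
    """Train a sender/receiver pair and return their urn weight tables."""
    rng = random.Random(seed)
    # sender[state][msg] and receiver[msg][act] start with one "ball" each
    sender = [[1.0] * n_msgs for _ in range(n_states)]
    receiver = [[1.0] * n_states for _ in range(n_msgs)]
    for _ in range(rounds):
        state = rng.randrange(n_states)
        msg = rng.choices(range(n_msgs), weights=sender[state])[0]
        # Assumed noisy channel: with probability flip_prob the message is
        # replaced by a uniformly random one before the receiver hears it.
        heard = rng.randrange(n_msgs) if rng.random() < flip_prob else msg
        act = rng.choices(range(n_states), weights=receiver[heard])[0]
        if act == state:  # success: reinforce the choices that were made
            sender[state][msg] += 1.0
            receiver[heard][act] += 1.0
    return sender, receiver


def expected_success(sender, receiver):
    """Noise-free probability that the receiver recovers a uniformly drawn state."""
    n_states = len(sender)
    total = 0.0
    for s in range(n_states):
        s_sum = sum(sender[s])
        for m, w in enumerate(sender[s]):
            total += (w / s_sum) * (receiver[m][s] / sum(receiver[m]))
    return total / n_states


if __name__ == "__main__":
    snd, rcv = train_lewis_game(rounds=5000, flip_prob=0.05)
    print(f"expected success after training: {expected_success(snd, rcv):.3f}")
```

Before training, every urn is uniform and the success probability is exactly 0.5 for two states; reinforcing only successful rounds is what lets a shared state-to-message convention emerge without either agent being told the mapping.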

arXiv:2308.06455 [pdf, ps, other]

Near-Field Integrated Sensing and Communication: Performance Analysis and Beamforming Design

Authors: Kaiqian Qu, Shuaishuai Guo, Nasir Saeed

Abstract: This paper explores the potential of near-field beamforming (NFBF) in integrated sensing and communication (ISAC) systems with extremely large-scale arrays (XL-arrays). Such large-scale antenna arrays increase the likelihood that communication users and targets of interest lie in the near field of the base station (BS). The paper first establishes models of electromagnetic (EM) near-field spherical waves and far-field plane waves. Using these models, we analyze the near-field beam-focusing ability and the far-field beam-steering ability by deriving a mathematical expression for the gain loss caused by the far-field steering vector mismatch in the near-field case. We formulate the NFBF design problem as minimizing the weighted sum of the radar and communication beamforming errors under a total power constraint, and solve this quadratically constrained quadratic programming (QCQP) problem using the least squares (LS) method. Moreover, the Cramér-Rao bound (CRB) for target parameter estimation is derived to verify the performance of NFBF. Furthermore, we perform power minimization using convex optimization while ensuring the required communication and sensing quality of service (QoS). The simulation results show the influence of model mismatch on near-field ISAC and the performance gain that transmit beamforming obtains from the additional distance dimension of the near field.

Submitted 11 August, 2023; originally announced August 2023.

Comments: under review
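The far-field steering vector mismatch analyzed in the near-field ISAC abstract above can be checked numerically. This is a minimal sketch, assuming a uniform linear array with half-wavelength spacing, an assumed 100 GHz carrier, and the exact spherical-wave distance model; the function names are hypothetical, and the normalised gain |a_far^H a_near|^2 is used here as the gain-loss measure rather than the paper's derived expression.

```python
import numpy as np

# Sketch under assumptions: ULA centred at the origin along the x-axis,
# one user at range r and angle theta from broadside, isotropic elements.

def steering_vectors(n_ant, spacing, wavelength, r, theta):
    """Return unit-norm near-field (spherical) and far-field (planar) steering vectors."""
    delta = (np.arange(n_ant) - (n_ant - 1) / 2) * spacing  # element positions (m)
    k = 2 * np.pi / wavelength
    # exact element-to-user distances for a spherical wavefront
    r_n = np.sqrt(r**2 + delta**2 - 2 * r * delta * np.sin(theta))
    near = np.exp(-1j * k * (r_n - r)) / np.sqrt(n_ant)
    # plane-wave approximation: r_n ~ r - delta * sin(theta)
    far = np.exp(1j * k * delta * np.sin(theta)) / np.sqrt(n_ant)
    return near, far


def mismatch_gain(n_ant, spacing, wavelength, r, theta):
    """Normalised gain |a_far^H a_near|^2 in [0, 1] of a far-field beam at a near-field user."""
    near, far = steering_vectors(n_ant, spacing, wavelength, r, theta)
    return float(np.abs(np.vdot(far, near)) ** 2)  # vdot conjugates its first argument


if __name__ == "__main__":
    lam = 3e-3                 # assumed 100 GHz carrier
    n, d = 256, lam / 2        # assumed 256 half-wavelength-spaced elements
    aperture = (n - 1) * d
    print(f"Rayleigh distance ~ {2 * aperture**2 / lam:.1f} m")
    for r in (5.0, 20.0, 100.0, 1e4):
        print(f"r = {r:>8.1f} m: gain = {mismatch_gain(n, d, lam, r, 0.0):.3f}")
```

Far beyond the Rayleigh distance the two vectors coincide and the gain approaches 1; a user a few metres from this array loses most of the array gain under a plane-wave beam, which is the motivation for near-field beam focusing.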

arXiv:2308.00253 [pdf, ps, other]

Privacy and Security in Ubiquitous Integrated Sensing and Communication: Threats, Challenges and Future Directions

Authors: Kaiqian Qu, Jia Ye, Xuran Li, Shuaishuai Guo

Abstract: Integrated sensing and communication (ISAC) is one of the defining technologies of next-generation communication systems. When sensing capability becomes ubiquitous, more information can be collected, which can facilitate many applications in intelligent transportation, unmanned aerial vehicle (UAV) surveillance, and healthcare. However, ISAC also faces many information privacy leakage and security issues. This article highlights the potential threats to privacy and security and the technical challenges of realizing private and secure ISAC. Three promising countermeasures, namely artificial intelligence (AI)-enabled schemes, friendly jamming, and reconfigurable intelligent surface (RIS)-assisted design, are presented to maintain user privacy and ensure information security. Case studies demonstrate their effectiveness.

Submitted 13 May, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

Comments: to appear in IOTMAG

arXiv:2308.00252 [pdf, ps, other]

Near-Field Integrated Sensing and Communications: Unlocking Potentials and Shaping the Future

Authors: Kaiqian Qu, Shuaishuai Guo, Jia Ye, Nasir Saeed

Abstract: Sixth-generation (6G) communication networks are characterized by integrated sensing and communications (ISAC), which is revolutionizing base stations (BSs) and terminals. In the unfolding 6G landscape, a pivotal physical-layer technology, the extremely large-scale antenna array (ELAA), also takes center stage. Because an ELAA covers a large near-field region, its electromagnetic (EM) waves exhibit distinctive spherical-wave properties. By exploiting these features, communication and sensing capabilities can reach unprecedented levels. Therefore, we systematically explore the potential of near-field ISAC technology. In particular, the fundamental principles of the near field are presented to reveal its benefits for both communication and sensing. Then, we delve into the technologies underpinning near-field communication and sensing and discuss the possibilities raised in recent works. We then investigate the advantages of near-field ISAC through case-study simulations, showcasing its benefits and reinforcing its status as a transformative paradigm. Finally, we identify open problems and chart future directions for near-field ISAC.

Submitted 5 August, 2023; v1 submitted 31 July, 2023; originally announced August 2023.

Comments: under review

arXiv:2307.01396 [pdf, other]

doi 10.1109/CCET59170.2023.10335130

Precheck Sequence Based False Base Station Detection During Handover: A Physical Layer Security Scheme

Authors: Xiangyu Li, Kaiwen Zheng, Sidong Guo, Xiaoli Ma

Abstract: False base station (FBS) attacks have been a severe security problem for cellular networks since the 2G era. During handover, the user equipment (UE) periodically receives state information from surrounding base stations (BSs) and uploads it to the source BS. The source BS compares the reported signal powers and hands the UE over to the BS that provides the strongest signal. An FBS can therefore transmit with suitable power to attract the UE to connect to it. In this paper, based on the 3GPP standard, a Precheck Sequence-based Detection (PSD) scheme is proposed to secure the UE's handover to a legal base station (LBS). The scheme first analyzes the structure of the received signals in blocks and symbols. Several additional symbols are then appended to the current signal sequence for verification. Using a long table of symbol sequences, each UE that needs a handover is allocated a specific sequence from this table. Simulation results show that the PSD scheme outperforms existing schemes, even when a specific transmit power is designed for the FBS.

Submitted 3 November, 2023; v1 submitted 3 July, 2023; originally announced July 2023.
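The precheck idea in the PSD abstract above (allocate each handover UE a sequence from a table, then verify the extra symbols) can be sketched as follows. Everything here is a hypothetical illustration, not the paper's 3GPP-based design: the `PrecheckTable` class and `verify` function are invented names, and Hadamard rows are used as the sequence table only because they are exactly orthogonal.

```python
import numpy as np

# Hypothetical sketch: the source BS hands each UE a distinct precheck
# sequence; the UE accepts a target BS only if the target transmits it.

def hadamard(order):
    """Sylvester-construction Hadamard matrix; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.block([[h, h], [h, -h]])
    return h


class PrecheckTable:
    def __init__(self, seq_len=16):
        self.table = hadamard(seq_len)
        self.assigned = {}  # ue_id -> row index in the table

    def allocate(self, ue_id):
        """Return this UE's sequence, assigning the next unused row if needed."""
        if ue_id not in self.assigned:
            if len(self.assigned) >= len(self.table):
                raise RuntimeError("sequence table exhausted")
            self.assigned[ue_id] = len(self.assigned)
        return self.table[self.assigned[ue_id]]


def verify(received, expected, threshold=0.9):
    """Accept only if the normalised correlation with the allocated sequence is high."""
    received = np.asarray(received, dtype=complex)
    expected = np.asarray(expected, dtype=complex)
    denom = np.linalg.norm(received) * np.linalg.norm(expected)
    if denom == 0:
        return False
    return bool(np.abs(np.vdot(expected, received)) / denom >= threshold)


if __name__ == "__main__":
    table = PrecheckTable(16)
    seq = table.allocate("ue-1")
    guess = table.allocate("ue-2")      # an FBS that does not know ue-1's row
    print(verify(seq, seq), verify(guess, seq))
```

With orthogonal rows, a false base station that transmits any other table entry has zero correlation with the allocated sequence; a real receiver would also need noise margins, which is what the `threshold` parameter crudely stands in for.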

arXiv:2306.06137 [pdf]

doi 10.1007/s41870-024-01768-3

Sensing-Aided Peer-to-Peer Millimeter-Wave Communication

Authors: Xiangyu Li, Sidong Guo, Shez Malik

Abstract: One of the bottlenecks of modern communications is enabling sensing and mutual communication simultaneously without causing scheduling conflicts, and determining how sensing can be leveraged to improve directional communication accuracy. To this end, we propose and implement a novel peer-to-peer (P2P) millimeter-wave communication system that jointly achieves beamforming and sensing. A radar- and IMU-assisted tracking and beamforming algorithm is designed and tested. The results show that robust tracking with higher overall throughput can be obtained. We also expect that our design and implementation can be deployed in a more scalable manner in future extensions.

Submitted 26 January, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.00714 [pdf, other]

Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

Authors: Ruibin Li, Qihua Zhou, Song Guo, Jie Zhang, Jingcai Guo, Xinyang Jiang, Yifei Shen, Zhenhua Han

Abstract: Diffusion-based generative models (DGMs) have achieved unparalleled performance in synthesizing high-quality visual content, opening up opportunities to improve image super-resolution (SR). Recent solutions either train architecture-specific DGMs from scratch or require iterative fine-tuning and distillation of pre-trained DGMs, both of which take considerable time and hardware investment. More importantly, since DGMs are built for a discrete, predefined upsampling scale, they cannot meet the emerging requirements of arbitrary-scale super-resolution (ASSR), where a single unified model adapts to arbitrary upsampling scales instead of requiring a distinct model for each case. These limitations raise an intriguing question: can we identify the ASSR capability of existing pre-trained DGMs without distillation or fine-tuning? In this paper, we take a step toward answering it by proposing Diff-SR, a first ASSR attempt based solely on pre-trained DGMs, without additional training. It is motivated by the finding that a simple methodology, which first injects a specific amount of noise into the low-resolution images before invoking a DGM's backward diffusion process, outperforms current leading solutions. The key is determining a suitable amount of noise to inject: too little leads to poor low-level fidelity, while too much degrades the high-level signature.
Through a fine-grained theoretical analysis, we propose the Perceptual Recoverable Field (PRF), a metric for achieving the optimal trade-off between these two factors. Extensive experiments verify the effectiveness, flexibility, and adaptability of Diff-SR, demonstrating superior performance to state-of-the-art solutions under diverse ASSR settings.

Submitted 1 June, 2023; originally announced June 2023.
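The noise-injection step described in the Diff-SR abstract above corresponds to sampling the standard DDPM forward process q(x_t | x_0) at some step t. This is a minimal sketch, assuming the common linear beta schedule (1e-4 to 0.02 over T = 1000 steps); the function names are hypothetical, choosing t with the paper's PRF metric is not implemented, and the backward diffusion call is omitted.

```python
import numpy as np

# Sketch of the DDPM forward-noising step. The linear schedule is an assumed
# common default; Diff-SR's PRF-based choice of t is not shown.

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for t = 0 .. T-1."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def inject_noise(x0, t, alpha_bar, eps=None, rng=None):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    x0 = np.asarray(x0, dtype=np.float64)
    if eps is None:
        rng = np.random.default_rng() if rng is None else rng
        eps = rng.standard_normal(x0.shape)
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps


if __name__ == "__main__":
    ab = linear_alpha_bar()
    lr_upsampled = np.zeros((8, 8))   # placeholder for an upsampled LR image in [-1, 1]
    for t in (50, 300, 900):
        x_t = inject_noise(lr_upsampled, t, ab, rng=np.random.default_rng(0))
        print(t, round(float(ab[t]), 4), round(float(x_t.std()), 3))
```

A small t keeps most of the low-resolution signal (good low-level fidelity), while a large t leaves mostly noise for the generative model to fill in, which is the trade-off the abstract describes.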

arXiv:2305.19558 [pdf, other]

Look-Ahead Task Offloading for Multi-User Mobile Augmented Reality in Edge-Cloud Computing

Authors: Ruxiao Chen, Shuaishuai Guo

Abstract: Mobile augmented reality (MAR) blends a real scene with overlaid virtual content and has been envisioned as one of the ubiquitous interfaces to the Metaverse. Because of the limited computing power and battery life of MAR devices, computation tasks are commonly offloaded to nearby edge or cloud servers. However, existing offloading solutions developed for MAR tasks suffer from high migration overhead, poor scalability, and short-sightedness when applied to multi-user MAR services. To address these issues, a MAR service-oriented task offloading scheme is designed and evaluated in edge-cloud computing networks. Specifically, the task interdependency of MAR applications is first analyzed and modeled using directed acyclic graphs. Then, we propose a look-ahead offloading scheme based on a modified Monte Carlo tree (MMCT) search, which runs several multi-step executions in advance to estimate the long-term effect of an immediate action. Experimental results show that, compared with four benchmark schemes, the proposed offloading scheme effectively improves the quality of service (QoS) in provisioning multi-user MAR services. Furthermore, the proposed solution is shown to be stable and suitable for highly volatile environments.

Submitted 31 May, 2023; originally announced May 2023.

Comments: Accepted by IEEE Network
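The look-ahead offloading abstract above models MAR task interdependency as a directed acyclic graph. The sketch below covers only that modelling step, assuming hypothetical task names: it orders sub-tasks with Kahn's algorithm so that no task runs before its inputs exist. The paper's MMCT search over offloading decisions is not shown.

```python
from collections import deque

# Sketch of the DAG-modelling step only. Task names are hypothetical.

def topological_order(deps):
    """deps maps each task to the list of tasks it depends on (Kahn's algorithm)."""
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for task, parents in deps.items():
        for p in parents:
            indegree[task] += 1
            children[p].append(task)
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in sorted(children[t]):   # sorted for a deterministic order
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return order


mar_tasks = {
    "capture": [],
    "feature_extraction": ["capture"],
    "object_detection": ["feature_extraction"],
    "tracking": ["feature_extraction"],
    "rendering": ["object_detection", "tracking"],
}
print(topological_order(mar_tasks))
# ['capture', 'feature_extraction', 'object_detection', 'tracking', 'rendering']
```

Any offloading planner, look-ahead or greedy, has to respect this order; independent branches such as detection and tracking are the ones that can be placed on different servers in parallel.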

arXiv:2305.09891 [pdf, ps, other]

doi 10.1109/LWC.2023.3277666

Beamspace Modulation for Near Field Capacity Improvement in XL-MIMO Communications

Authors: Shuaishuai Guo, Kaiqian Qu

Abstract: The spatial degrees of freedom (DoFs) increase greatly in the near-field region of millimeter-wave or terahertz multiple-input multiple-output communications with extremely large antenna arrays (XL-MIMO). To exploit the increased spatial DoFs, a beamspace modulation (BM) strategy is introduced to the near field of XL-MIMO. BM works with a small, fixed number of RF chains and uses the increased spatial DoFs as modulation resources to improve capacity. The achievable spectral efficiency and its asymptotic capacity are analyzed. Both theoretical and simulation results show that the proposed BM strategy considerably outperforms, in terms of spectral efficiency, the existing benchmark that selects only the best beamspace for data transmission.

Submitted 16 May, 2023; originally announced May 2023.

Comments: 5 pages, 4 figures, accepted by IEEE Wireless Communications Letters
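Why using beams as modulation resources helps with few RF chains, as the beamspace modulation abstract above argues, can be seen with simple bit counting. This back-of-the-envelope sketch assumes a generalized-spatial-modulation-style mapping (k of N candidate beams active, the active-set index carrying extra bits, each active beam carrying an M-ary symbol); it is an illustration, not the paper's capacity analysis, and the function names are hypothetical.

```python
from math import comb, floor, log2

# Assumed GSM-style mapping: k active beams out of N candidates, one per RF chain.

def bits_per_channel_use(n_beams, n_active, mod_order):
    """Index bits from choosing the active-beam set plus M-ary symbol bits."""
    index_bits = floor(log2(comb(n_beams, n_active)))
    symbol_bits = n_active * int(log2(mod_order))
    return index_bits + symbol_bits


def best_beam_only(n_active, mod_order):
    """Benchmark: always transmit on the same best beams, so no index bits."""
    return n_active * int(log2(mod_order))


print(bits_per_channel_use(8, 2, 4), best_beam_only(2, 4))   # 8 4
print(bits_per_channel_use(64, 2, 4))                        # 14
```

The number of RF chains (here 2) stays fixed while the index bits grow with the number of resolvable beams, which is why more spatial DoFs in the near field translate into higher spectral efficiency under this kind of scheme.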

arXiv:2303.09694 [pdf, other]

Drone Formation for Efficient Swarm Energy Consumption

Authors: Shilong Guo, Balsam Alkouz, Babar Shahzaad, Abdallah Lakhdari, Athman Bouguettaya

Abstract: We demonstrate formation flying for drone swarm services. A set of drones flies in four different swarm formations, and a dataset is collected to study the effect of formation flying on energy consumption. We conduct a set of experiments to study the effect of wind on formation flying and examine the forces the drones exert on each other when flying in formation. Finally, we identify and classify the formations that conserve the most energy under varying wind conditions. The collected dataset is intended to give researchers data for further research on swarm-based drone service delivery. Demo: https://youtu.be/NnucUWhUwLs

Submitted 16 March, 2023; originally announced March 2023.

Comments: 3 pages, 7 figures. This is an accepted demo paper and it will appear in The 21st International Conference on Pervasive Computing and Communications (PerCom 2023)

arXiv:2302.14763 [pdf, other]

Vehicular Behavior-Aware Beamforming Design for Integrated Sensing and Communication Systems

Authors: Dingyan Cong, Shuaishuai Guo, Shuping Dang, Haixia Zhang

Abstract: Communication and sensing are two important features of connected and autonomous vehicles (CAVs). In traditional vehicle-mounted devices, communication and sensing modules coexist but operate in isolation, wasting hardware resources and wireless spectrum. In this paper, to address this inefficiency, we propose a vehicular behavior-aware integrated sensing and communication (VBA-ISAC) beamforming design for a vehicle-mounted transmitter with multiple antennas. In this work, beams are steered according to vehicular behaviors to assist driving while simultaneously providing spectrally efficient uplink data services with the help of a roadside unit (RSU). Specifically, we first predict the area of interest (AoI) to be sensed based on the vehicles' trajectories. Then, we formulate a VBA-ISAC beamforming design problem that senses the AoI while maximizing the spectral efficiency of uplink communications, where a trade-off factor balances communication and sensing performance. A semi-definite relaxation-based beampattern mismatch minimization (SDR-BMM) algorithm is proposed to solve the formulated problem. To reduce hardware cost and power consumption, we further improve the proposed VBA-ISAC beamforming design by introducing a hybrid analog-digital (HAD) structure. Numerical results verify the effectiveness of the VBA-ISAC scheme and show that the proposed beamforming design outperforms the benchmarks in both spectral efficiency and radar beampattern.

Submitted 26 February, 2023; originally announced February 2023.

arXiv:2302.07142 [pdf, ps, other]

Semantic Importance-Aware Communications Using Pre-trained Language Models

Authors: Shuaishuai Guo, Yanhu Wang, Shu**g Li, Nasir Saeed

Abstract: This letter proposes a semantic importance-aware communication (SIAC) scheme using pre-trained language models (e.g., ChatGPT and BERT). Specifically, we propose a cross-layer design in which a pre-trained language model is embedded in, or connected to, the cross-layer manager. The pre-trained language model is used to quantify the semantic importance of data frames. Based on the quantified semantic importance, we investigate semantic importance-aware power allocation. Unlike existing deep joint source-channel coding (Deep-JSCC)-based semantic communication schemes, SIAC can be embedded directly into current communication systems by introducing only a cross-layer manager. Our experimental results show that the proposed SIAC scheme achieves lower semantic loss than existing equal-priority communications.

Submitted 7 July, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

Comments: Accepted by IEEE Communications Letters, Semantic communications, pre-trained language model, ChatGPT, BERT, data importance
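Semantic importance-aware power allocation, as in the SIAC abstract above, can be illustrated with weighted water-filling. This is a hypothetical sketch, not necessarily the letter's formulation: it assumes the objective is to maximise sum_i w_i log(1 + g_i p_i) subject to sum_i p_i = P, where w_i are per-frame importance weights (how a language model produces them is not shown) and g_i are channel gains. The KKT conditions give p_i = max(0, w_i mu - 1/g_i), and the water level mu is found by bisection.

```python
import numpy as np

# Hypothetical weighted water-filling; the objective is an assumption.

def importance_waterfilling(importance, gains, p_total, iters=100):
    """Split p_total across frames, giving more power to more important frames."""
    w = np.asarray(importance, dtype=float)   # must contain at least one positive weight
    g = np.asarray(gains, dtype=float)

    def allocate(mu):
        return np.maximum(0.0, w * mu - 1.0 / g)

    lo, hi = 0.0, 1.0
    while allocate(hi).sum() < p_total:       # grow the bracket until the budget is exceeded
        hi *= 2.0
    for _ in range(iters):                    # bisection on the water level mu
        mid = 0.5 * (lo + hi)
        if allocate(mid).sum() < p_total:
            lo = mid
        else:
            hi = mid
    return allocate(hi)


powers = importance_waterfilling([0.9, 0.5, 0.1], [1.0, 1.0, 1.0], p_total=3.0)
print(np.round(powers, 3))
```

With equal channel gains, the most important frame receives most of the budget and the least important frame receives none, in contrast to equal-priority allocation, which would give each frame 1.0.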

arXiv:2302.01972 [pdf, other]

doi 10.1109/TITS.2023.3287792

DCA: Delayed Charging Attack on the Electric Shared Mobility System

Authors: Shuocheng Guo, Hanlin Chen, Mizanur Rahman, Xinwu Qian

Abstract: Efficient operation of the electric shared mobility system (ESMS) relies heavily on seamless interconnections among shared electric vehicles (SEVs), electric vehicle supply equipment (EVSE), and the grid. Nevertheless, this interconnectivity also makes the ESMS vulnerable to cyberattacks that may cause short-term breakdowns or long-term degradation of the ESMS. This study focuses on one such attack with long-lasting effects, the Delayed Charge Attack (DCA), which stealthily delays the charging service by exploiting physical and communication vulnerabilities. To begin, we present the ESMS threat model by highlighting the assets, information flow, and access points. We next identify a linked sequence of vulnerabilities as a viable attack vector for launching the DCA. Then, we detail the implementation of the DCA, which can effectively bypass detection in the SEV's battery management system and cross-verification in the cloud environment. We test the DCA model against various anomaly detection (AD) algorithms by simulating the DCA dynamics as a Susceptible-Infectious-Removed-Susceptible process, in which an EVSE can be compromised by the DCA or detected for repair. Using real-world taxi trip data and EVSE locations in New York City, the DCA model allows us to explore the long-term impacts and validate the system consequences. The results show that a 10-min delay leads to 12-min longer queuing times and 8% more unfulfilled requests, resulting in a 10.7% ($311.7) weekly revenue loss per driver.
With the AD algorithms, the weekly revenue loss remains at least 3.8% ($111.8), with increased repair costs of $36,000, suggesting the DCA's robustness against AD.

Submitted 13 June, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2023
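The Susceptible-Infectious-Removed-Susceptible dynamics in the DCA abstract above can be sketched with a discrete-time compartment model over EVSE fractions. This is a hypothetical illustration: S is the susceptible fraction, I the fraction compromised by the DCA, and R the fraction detected and under repair; the rates beta (compromise), gamma (detection), and xi (repair completion) are invented values, not parameters from the study, and the study's simulation may be structured differently.

```python
# Hypothetical discrete-time SIRS model for EVSE; rates are illustrative.
# Keep rate * dt <= 1 so that the fractions stay non-negative.

def simulate_sirs(beta, gamma, xi, i0, steps, dt=1.0):
    """Return the (S, I, R) trajectory, starting from an initial compromised fraction i0."""
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_compromised = beta * s * i * dt   # S -> I: chargers compromised by the DCA
        new_detected = gamma * i * dt         # I -> R: AD flags a charger for repair
        new_repaired = xi * r * dt            # R -> S: repaired chargers are vulnerable again
        s += new_repaired - new_compromised
        i += new_compromised - new_detected
        r += new_detected - new_repaired
        history.append((s, i, r))
    return history


traj = simulate_sirs(beta=0.3, gamma=0.1, xi=0.05, i0=0.01, steps=200)
s, i, r = traj[-1]
print(f"after 200 steps: S={s:.3f}, I={i:.3f}, R={r:.3f}")
```

Because repaired chargers return to the susceptible pool, the attack can persist at a non-zero level instead of dying out, which is consistent with the abstract's point that revenue losses remain even with anomaly detection in place.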

arXiv:2301.03133 [pdf, ps, other]

Transceiver Cooperative Learning-aided Semantic Communications Against Mismatched Background Knowledge Bases

Authors: Yanhu Wang, Shuaishuai Guo

Abstract: Semantic communications learned on background knowledge bases (KBs) have been identified as a promising technology for communications between intelligent agents. Existing works assume that the transceivers of semantic communications share the same KB. However, intelligent transceivers may be reluctant to exchange KB data because of the communication burden or the risk of privacy leakage. Besides, the transceivers may independently learn from the environment and dynamically update their KBs, making timely sharing of the KBs infeasible. All of these lead to mismatches between the KBs, which may result in semantic-level misunderstanding at the receiver. To address this issue, we propose a transceiver cooperative learning-assisted semantic communication (TCL-SC) scheme against mismatched KBs. In TCL-SC, the transceivers cooperatively train semantic encoder and decoder neural networks (NNs) of the same structure based on their own KBs, and they periodically share the NN parameters. To reduce the communication overhead of parameter sharing, parameter quantization is adopted. Moreover, we discuss the impact of the number of communication rounds on the performance of semantic communication systems. Experiments on real-world data demonstrate that the proposed TCL-SC reduces the semantic-level misunderstanding at the receiver caused by KB mismatch, especially in the low signal-to-noise ratio (SNR) regime.

Submitted 8 January, 2023; originally announced January 2023.
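The parameter quantization used in TCL-SC to cut sharing overhead can be illustrated with uniform min-max quantization. This is a minimal sketch, assuming b-bit unsigned codes plus two float32 side values (offset and step) per shared tensor; the paper's exact quantizer may differ, and the function names are hypothetical.

```python
import numpy as np

# Assumed uniform min-max quantiser for sharing NN parameters.

def quantize(params, n_bits=8):
    """Map parameters to integer codes in [0, 2**n_bits - 1]."""
    p = np.asarray(params, dtype=np.float64)
    lo, hi = float(p.min()), float(p.max())
    levels = 2 ** n_bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0   # avoid zero step for constant tensors
    codes = np.round((p - lo) / step).astype(np.uint32)
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct parameters; the error is at most step / 2 per entry."""
    return lo + codes.astype(np.float64) * step


def payload_bits(n_params, n_bits=8):
    """Bits sent per tensor: the codes plus two float32 side values."""
    return n_params * n_bits + 2 * 32


if __name__ == "__main__":
    w = np.random.default_rng(0).standard_normal(10_000)
    codes, lo, step = quantize(w, 8)
    err = np.max(np.abs(dequantize(codes, lo, step) - w))
    print(f"max error {err:.4f}, payload {payload_bits(w.size)} bits vs {32 * w.size} for float32")
```

Each sharing round then costs roughly a quarter of the float32 payload at 8 bits, at the price of a bounded reconstruction error that the receiving NN must tolerate.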

arXiv:2212.01756 [pdf, other]

Connected Cruise and Traffic Control for Pairs of Connected Automated Vehicles

Authors: Sicong Guo, Gabor Orosz, Tamas G. Molnar

Abstract: This paper considers mixed traffic consisting of connected automated vehicles equipped with vehicle-to-everything (V2X) connectivity and human-driven vehicles. A control strategy is proposed for communicating pairs of connected automated vehicles, where the two vehicles regulate their longitudinal motion by responding to each other, and, at the same time, stabilize the human-driven traffic between them. Stability analysis is conducted to find stabilizing controllers, and simulations are used to show the efficacy of the proposed approach. The impact of the penetration of connectivity and automation on the string stability of traffic is quantified. It is shown that, even with moderate penetration, connected automated vehicle pairs executing the proposed controllers achieve significant benefits compared to when these vehicles are disconnected and controlled independently.

Submitted 12 June, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

Comments: Accepted to the IEEE Transactions on Intelligent Transportation Systems. 11 pages, 10 figures

arXiv:2211.08428 [pdf, other]

CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming

Authors: Qihua Zhou, Ruibin Li, Song Guo, Peiran Dong, Yi Liu, Jingcai Guo, Zhenda Xu

Abstract: Recent years have witnessed dramatic growth in Internet video traffic, where video bitstreams are often compressed and delivered in low quality to fit the streamer's uplink bandwidth. To alleviate this quality degradation, Neural-enhanced Video Streaming (NVS) has emerged, showing great promise for recovering low-quality videos, mostly by deploying neural super-resolution (SR) on the media server. Despite its benefits, we reveal that current mainstream works with SR enhancement have not achieved the desired rate-distortion trade-off between bitrate saving and quality restoration, due to: (1) overemphasizing enhancement on the decoder side while neglecting co-design of the encoder, (2) limited generative capacity to recover high-fidelity perceptual details, and (3) optimizing the compression-and-restoration pipeline solely from the resolution perspective, without considering color bit-depth. To overcome these limitations, we are the first to exploit encoder-decoder (i.e., codec) synergy by leveraging the inherent visual-generative property of diffusion models. Specifically, we present Codec-aware Diffusion Modeling (CaDM), a novel NVS paradigm that significantly reduces streaming delivery bitrates while offering much higher restoration capacity than existing methods. First, CaDM improves the encoder's compression efficiency by simultaneously reducing the resolution and color bit-depth of video frames.
Second, CaDM empowers the decoder with high-quality enhancement by making the denoising diffusion restoration aware of the encoder's resolution-color conditions. Evaluation on public cloud services with OpenMMLab benchmarks shows that CaDM saves up to 5.12 to 21.44 times in bitrate based on common video standards and achieves much better recovery quality (e.g., FID of 0.61) than state-of-the-art neural-enhancing methods.

Submitted 8 March, 2023; v1 submitted 15 November, 2022; originally announced November 2022.
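The abstract's first encoder-side step, reducing both resolution and color bit-depth before compression, can be illustrated with a minimal sketch. The helper names below are hypothetical, the 2x averaging and bit truncation are generic stand-ins rather than CaDM's actual operators, and the naive bin-centre reconstruction is exactly the kind of lossy inverse that the diffusion-based decoder is meant to improve on.

```python
def reduce_bit_depth(pixels, bits):
    """Quantize 8-bit values (0-255) to `bits` bits by dropping low-order bits."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    shift = 8 - bits
    return [p >> shift for p in pixels]


def expand_bit_depth(codes, bits):
    """Naive inverse: map each code to the centre of its 8-bit quantization bin."""
    shift = 8 - bits
    half_step = (1 << shift) >> 1  # 0 when bits == 8, so the round trip is exact
    return [(c << shift) + half_step for c in codes]


def downsample_2x(row):
    """Average adjacent pixel pairs (a 1-D stand-in for halving spatial resolution)."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row) - 1, 2)]


row = [0, 37, 128, 200, 255, 90]
small = downsample_2x(row)          # half the samples
codes = reduce_bit_depth(small, 4)  # half the bits per sample
print(small, codes, expand_bit_depth(codes, 4))
```

In this 1-D toy, halving the samples and halving the bits per sample leaves a quarter of the raw bits; the decoder then has to recover both the lost samples and the lost low-order bits.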

arXiv:2209.15156 [pdf, ps, other]

Cooperative Beamforming Design for Multiple RIS-Assisted Communication Systems

Authors: Xiaoyan Ma, Yuguang Fang, Haixia Zhang, Shuaishuai Guo, Dongfeng Yuan

Abstract: Reconfigurable intelligent surface (RIS) provides a promising way to build programmable wireless transmission environments. Owing to the massive number of controllable reflecting elements on the surface, RIS is capable of providing considerable passive beamforming gains. At present, most related works mainly consider the modeling, design, performance analysis and optimization of single-RIS-assisted systems. Although a few works investigate multiple RISs individually serving their associated users, the cooperation among multiple RISs has not yet been well studied. To fill the gap, this paper studies a cooperative beamforming design for multi-RIS-assisted communication systems, where multiple RISs are deployed to assist the downlink communications from a base station to its users. To do so, we first model the general channel from the base station to the users for an arbitrary number of reflection links. Then, we formulate an optimization problem to maximize the sum rate of all users. Analysis shows that the formulated problem is difficult to solve due to its non-convexity and the interactions among the decision variables. To solve it effectively, we first decouple the problem into three disjoint subproblems. Then, by introducing appropriate auxiliary variables, we derive closed-form expressions for the decision variables and propose a low-complexity cooperative beamforming algorithm. Simulation results verify the effectiveness of the proposed algorithm through comparison with various baseline methods.
Furthermore, these results also reveal that, for sum rate maximization, distributing the reflecting elements among multiple RISs is superior to deploying them in a single RIS.

Submitted 29 September, 2022; originally announced September 2022.
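To make the "channel from the base station to the users" concrete, here is a minimal sketch of the simplest special case: a single-antenna base station, one single-antenna user, and only single-bounce reflection links, with the elements of all RISs flattened into one list. It is an assumption-laden toy, not the paper's general multi-reflection model or its sum-rate algorithm; the function names are hypothetical. It shows why phase control yields passive beamforming gain: choosing each element's phase so its path adds coherently with the direct path makes the effective gain equal to the sum of the path magnitudes.

```python
import cmath
import math


def effective_channel(h_direct, links, phases):
    """Direct path plus single-bounce RIS paths: h_d + sum_n g_n * exp(j*theta_n) * h_n.

    `links` is a flat list of (g_n, h_n) pairs covering every element of every RIS.
    """
    total = h_direct
    for (g, h), theta in zip(links, phases):
        total += g * cmath.exp(1j * theta) * h
    return total


def aligned_phases(h_direct, links):
    """Rotate each reflected path so that it adds in phase with the direct path."""
    ref = cmath.phase(h_direct)
    return [ref - cmath.phase(g * h) for g, h in links]


def rate_bits(h, snr):
    """Achievable rate log2(1 + SNR * |h|^2) for one single-antenna user."""
    return math.log2(1 + snr * abs(h) ** 2)
```

With multiple users the phases can no longer be aligned for everyone at once, which is where the coupled, non-convex sum-rate problem described above arises.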

arXiv:2209.09138 [pdf, ps, other]

Robust Beamforming and Rate-Splitting Design for Next Generation Ultra-Reliable and Low-Latency Communications

Authors: Tiantian Li, Haixia Zhang, Shuaishuai Guo, Dongfeng Yuan

Abstract: The next generation ultra-reliable and low-latency communications (xURLLC) require novel designs to provide satisfactory services to emerging mission-critical applications. To improve the spectrum efficiency and enhance the robustness of xURLLC, this paper proposes a robust beamforming and rate-splitting design in the finite blocklength (FBL) regime for downlink multi-user multi-antenna xURLLC systems. In the design, adaptive rate-splitting is introduced to flexibly handle the complex inter-user interference and thus improve the spectrum efficiency. Taking the imperfection of the channel state information at the transmitter (CSIT) into consideration, a max-min user rate problem is formulated to optimize the common and private beamforming vectors and the rate-splitting vector while ensuring the transmission latency and reliability requirements of all users. The optimization problem is intractable due to the non-convexity of the constraint set and the infinite constraints caused by CSIT uncertainties. To solve it, we convert the infinite constraints into finite ones by the S-Procedure method and transform the original problem into a difference of convex (DC) program. An iterative algorithm based on the constrained concave convex procedure (CCCP) and Gaussian randomization is proposed to obtain a local minimum. Simulation results confirm the convergence, robustness and effectiveness of the proposed robust beamforming and rate-splitting design in the FBL regime.
It is also shown that the proposed robust design achieves a considerable performance gain in the worst user rate compared with existing transmission schemes under various blocklength and block error rate requirements.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 12 pages, 9 figures
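The abstract does not state which FBL rate expression the paper uses. A common choice in URLLC work is the normal approximation for an AWGN channel, R ≈ log2(1 + γ) − sqrt(V/n) · Q⁻¹(ε) · log2(e) with dispersion V = 1 − (1 + γ)⁻². The minimal sketch below assumes that formula (dropping higher-order terms) and shows how blocklength n and block error rate ε trade off against rate, which is the constraint behind the latency and reliability requirements above.

```python
import math
from statistics import NormalDist


def fbl_rate(snr, blocklength, error_prob):
    """Normal approximation to the achievable rate (bits per channel use) at finite blocklength.

    R ~= log2(1 + snr) - sqrt(V / n) * Qinv(eps) * log2(e), with dispersion V = 1 - (1 + snr)^-2.
    Higher-order O(log(n) / n) terms are ignored.
    """
    capacity = math.log2(1 + snr)
    dispersion = 1 - (1 + snr) ** -2
    q_inv = NormalDist().inv_cdf(1 - error_prob)  # inverse of the Gaussian Q-function
    return capacity - math.sqrt(dispersion / blocklength) * q_inv * math.log2(math.e)


for n in (100, 500, 2000):
    print(n, round(fbl_rate(1.0, n, 1e-5), 3))  # approaches capacity (1 bit) as n grows
```

Shorter blocks (lower latency) or stricter error targets both push the rate further below the Shannon capacity, which is why these requirements enter the max-min problem as constraints.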

arXiv:2205.13412 [pdf, other]

Physical-World Optical Adversarial Attacks on 3D Face Recognition

Authors: Yanjie Li, Yiquan Li, Xuelong Dai, Songtao Guo, Bin Xiao

Abstract: 2D face recognition has been proven vulnerable to physical adversarial attacks. However, few studies have investigated the possibility of attacking real-world 3D face recognition systems. Recently proposed 3D-printed attacks cannot generate adversarial points in the air. In this paper, we attack 3D face recognition systems through elaborate optical noises. We take structured-light 3D scanners as our attack target. End-to-end attack algorithms are designed to generate adversarial illumination for 3D faces through the inherent or an additional projector to produce adversarial points at arbitrary positions. Nevertheless, face reflectance is a complex process because the skin is translucent. To involve this projection-and-capture procedure in optimization loops, we model it with the Lambertian rendering model and use SfSNet to estimate the albedo. Moreover, to improve the resistance to distance and angle changes while keeping the perturbation unnoticeable, a 3D transform invariant loss and two kinds of sensitivity maps are introduced. Experiments are conducted in both simulated and physical worlds. We successfully attacked point-cloud-based and depth-image-based 3D face recognition algorithms while needing fewer perturbations than previous state-of-the-art physical-world 3D adversarial attacks.

Submitted 13 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Submitted to CVPR 2023
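The Lambertian rendering model mentioned in the abstract reduces, per surface point, to intensity = albedo × light intensity × max(0, n · l), where n is the unit surface normal and l the unit direction towards the light. The minimal sketch below shows only that shading equation. The albedo is a plain number here (the paper estimates it with SfSNet), and projector geometry and skin translucency are deliberately left out; the function names are hypothetical.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("zero-length vector")
    return [c / norm for c in v]


def lambertian_intensity(albedo, normal, light_dir, light_intensity=1.0):
    """Lambertian shading: albedo * intensity * max(0, n . l) with unit normal n and light direction l."""
    n = _normalize(normal)
    l = _normalize(light_dir)
    cos_theta = sum(a * b for a, b in zip(n, l))
    return albedo * light_intensity * max(0.0, cos_theta)


# A surface facing the light reflects albedo * intensity; one facing away receives nothing.
print(lambertian_intensity(0.5, (0, 0, 1), (0, 0, 1)))
print(lambertian_intensity(0.5, (0, 0, 1), (0, 0, -1)))
```

Because this expression is differentiable in the light intensity wherever n · l > 0, it can sit inside an optimization loop over projected illumination patterns, which is the role it plays in the attack pipeline described above.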

arXiv:2205.04029 [pdf, other]

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

Authors: Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin

Abstract: This paper introduces a new open-source platform named Muskits for end-to-end music processing, which mainly focuses on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, transformer SVS, and XiaoiceSing. The design of Muskits follows the style of widely used speech processing toolkits, ESPnet and Kaldi, for data preprocessing, training, and recipe pipelines. To the best of our knowledge, this toolkit is the first platform that allows a fair and highly reproducible comparison between several published works in SVS. In addition, we demonstrate several advanced usages based on the toolkit functionalities, including multilingual training and transfer learning. This paper describes the major framework of Muskits, its functionalities, and experimental results in single-singer, multi-singer, multilingual, and transfer learning scenarios. The toolkit is publicly available at https://github.com/SJTMusicTeam/Muskits.

Submitted 2 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: Accepted by Interspeech

arXiv:2203.17001 [pdf, other]

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

Authors: Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin

Abstract: Deep learning based singing voice synthesis (SVS) systems have been demonstrated to flexibly generate singing with better quality than conventional statistical parametric methods. However, neural systems are generally data-hungry and have difficulty reaching reasonable singing quality with the limited publicly available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize the training, we introduce a cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that our proposed augmentation methods and the stabilizing training strategy can significantly improve the performance on both objective and subjective evaluations.

Submitted 6 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: Accepted by INTERSPEECH 2022
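The abstract names pitch augmentation and mix-up augmentation without giving their SVS-specific details. The minimal sketch below shows only the generic building blocks, under stated assumptions: standard mixup (a convex combination with weight drawn from Beta(α, α)) applied frame by frame to two equal-length feature sequences, and pitch transposition that shifts score notes by k semitones while scaling an F0 contour by 2^(k/12). The function names and the equal-length requirement are assumptions for illustration, not SingAug's actual recipes.

```python
import random


def mixup_frames(feats_a, feats_b, lam):
    """Frame-wise convex combination lam * a + (1 - lam) * b of two equal-length feature sequences."""
    if len(feats_a) != len(feats_b):
        raise ValueError("sequences must have the same number of frames")
    return [[lam * x + (1 - lam) * y for x, y in zip(fa, fb)] for fa, fb in zip(feats_a, feats_b)]


def sample_mix_weight(alpha, rng):
    """Draw the mixing weight from Beta(alpha, alpha), as in the generic mixup recipe."""
    return rng.betavariate(alpha, alpha)


def pitch_shift(midi_notes, f0_hz, semitones):
    """Transpose score notes by `semitones` and scale an F0 contour by 2 ** (semitones / 12)."""
    ratio = 2 ** (semitones / 12)
    return [n + semitones for n in midi_notes], [f * ratio for f in f0_hz]


rng = random.Random(0)
lam = sample_mix_weight(0.4, rng)
mixed = mixup_frames([[1.0, 2.0], [0.5, 0.5]], [[3.0, 6.0], [1.5, 0.0]], lam)
notes, f0 = pitch_shift([60, 62], [261.63, 293.66], 12)  # one octave up doubles F0
```

Keeping the score notes and the F0 contour consistent under transposition matters for SVS, since the model is conditioned on the score while the target audio carries the pitch.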

arXiv:2203.09294 [pdf, other]

A Differentiable Two-stage Alignment Scheme for Burst Image Reconstruction with Large Shift

Authors: Shi Guo, Xi Yang, Jianqi Ma, Gaofeng Ren, Lei Zhang

Abstract: Denoising and demosaicking are two essential steps to reconstruct a clean full-color image from the raw data. Recently, joint denoising and demosaicking (JDD) for burst images, namely JDD-B, has attracted much attention by using multiple raw images captured in a short time to reconstruct a single high-quality image. One key challenge of JDD-B lies in the robust alignment of image frames. State-of-the-art alignment methods in the feature domain cannot effectively utilize the temporal information of burst images, where large shifts commonly exist due to camera and object motion. In addition, the higher resolution (e.g., 4K) of modern imaging devices results in larger displacement between frames. To address these challenges, we design a differentiable two-stage alignment scheme, operating sequentially at the patch and pixel levels, for effective JDD-B. The input burst images are first aligned at the patch level by using a differentiable progressive block matching method, which can estimate the offset between distant frames with small computational cost. Then we perform implicit pixel-wise alignment in the full-resolution feature domain to refine the alignment results. The two stages are jointly trained in an end-to-end manner. Extensive experiments demonstrate the significant improvement of our method over existing JDD-B methods. Code is available at https://github.com/GuoShi28/2StageAlign.

Submitted 17 March, 2022; originally announced March 2022.

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition 2022
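Patch-level alignment builds on block matching: for a block in the reference frame, search a window of candidate displacements in another frame and keep the one with the lowest matching cost. The minimal sketch below is the classic, non-differentiable, single-scale exhaustive search with a sum-of-absolute-differences (SAD) cost. It is shown only to make the idea concrete and is not the paper's differentiable progressive method; the function names are hypothetical.

```python
def sad(ref, tgt, y, x, dy, dx, size):
    """Sum of absolute differences between the size x size block of `ref` at (y, x)
    and the block of `tgt` displaced by (dy, dx)."""
    return sum(
        abs(ref[y + i][x + j] - tgt[y + dy + i][x + dx + j])
        for i in range(size)
        for j in range(size)
    )


def match_block(ref, tgt, y, x, size, radius):
    """Exhaustive search for the (dy, dx) within +/- radius that minimizes SAD."""
    h, w = len(tgt), len(tgt[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            top, left = y + dy, x + dx
            if top < 0 or left < 0 or top + size > h or left + size > w:
                continue  # candidate block would fall outside the target frame
            cost = sad(ref, tgt, y, x, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    if best is None:
        raise ValueError("no valid candidate inside the search window")
    return best[1], best[2]
```

The cost of this search grows with the square of the search radius, which is why large inter-frame shifts (for example at 4K resolution) motivate coarse-to-fine or progressive strategies instead of one wide exhaustive window.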

arXiv:2202.11703 [pdf, other]

U-Attention to Textures: Hierarchical Hourglass Vision Transformer for Universal Texture Synthesis

Authors: Shouchang Guo, Valentin Deschaintre, Douglas Noll, Arthur Roullier

Abstract: We present a novel U-Attention vision Transformer for universal texture synthesis. We exploit the natural long-range dependencies enabled by the attention mechanism to allow our approach to synthesize diverse textures while preserving their structures in a single inference. We propose a hierarchical hourglass backbone that attends to the global structure and performs patch mapping at varying scales in a coarse-to-fine-to-coarse stream. Complemented by skip connection and convolution designs that propagate and fuse information at different scales, our hierarchical U-Attention architecture unifies attention to features from macro structures to micro details, and progressively refines synthesis results at successive stages. Our method achieves stronger 2× synthesis than previous work on both stochastic and structured textures while generalizing to unseen textures without fine-tuning. Ablation studies demonstrate the effectiveness of each component of our architecture.

Submitted 30 June, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

arXiv:2202.02072 [pdf, ps, other]

Signal Shaping for Semantic Communication Systems with A Few Message Candidates

Authors: Shuaishuai Guo, Yanhu Wang, Peng Zhang

Abstract: Semantic communications aim to reliably convey the semantic meaning of messages, unlike existing communication systems, which focus on reliable bit transmission. To achieve the goal of semantic communications, we propose a signal shaping method that minimizes the semantic loss, which is measured by the pretrained bidirectional encoder representations from transformers (BERT) model. The signal set optimization problem for semantic communication systems with a few message candidates is investigated. We propose an efficient projected gradient descent method to solve the problem and prove its convergence. Simulation results show that the proposed method outperforms existing signal shaping methods in minimizing the semantic loss.

Submitted 18 August, 2022; v1 submitted 4 February, 2022; originally announced February 2022.
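The projected gradient descent idea can be sketched on a toy version of the signal-set problem. Everything specific below is an assumption: the loss is a hypothetical proxy in which a fixed symmetric weight w_ij stands in for the BERT-measured semantic dissimilarity between messages i and j, and exp(−|s_i − s_j|² / (4σ²)) stands in for how often symbol i is confused with symbol j. The projection onto an average-power ball is the standard Euclidean projection (rescale only when the constraint is violated). It illustrates the PGD mechanics, not the paper's loss or convergence proof.

```python
import math


def semantic_proxy_loss(points, weights, sigma2):
    """Hypothetical proxy loss: sum_{i<j} w_ij * exp(-|s_i - s_j|^2 / (4 * sigma2))."""
    m = len(points)
    return sum(
        weights[i][j] * math.exp(-abs(points[i] - points[j]) ** 2 / (4 * sigma2))
        for i in range(m)
        for j in range(i + 1, m)
    )


def gradient(points, weights, sigma2):
    """Gradient w.r.t. each point as a complex number (d/dRe + j d/dIm); assumes symmetric weights."""
    m = len(points)
    grads = [0j] * m
    for i in range(m):
        for j in range(m):
            if i != j:
                d = points[i] - points[j]
                e = math.exp(-abs(d) ** 2 / (4 * sigma2))
                grads[i] += -weights[i][j] * e * d / (2 * sigma2)
    return grads


def project_power(points, max_avg_power=1.0):
    """Euclidean projection onto the convex set {s : mean |s_i|^2 <= max_avg_power}."""
    avg = sum(abs(p) ** 2 for p in points) / len(points)
    if avg <= max_avg_power:
        return list(points)
    scale = math.sqrt(max_avg_power / avg)
    return [p * scale for p in points]


def shape_signal_set(points, weights, sigma2=0.25, step=0.05, iters=200):
    """Projected gradient descent: gradient step on the proxy loss, then project back onto the power ball."""
    for _ in range(iters):
        g = gradient(points, weights, sigma2)
        points = project_power([p - step * gi for p, gi in zip(points, g)])
    return points
```

Under this proxy, pairs of messages with larger weights (more semantically different) are pushed further apart within the power budget, while semantically close messages may share nearby constellation points.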

arXiv:2111.14916 [pdf]

doi 10.1364/OE.469238

High-Speed Light Focusing through Scattering Medium by Cooperatively Accelerated Genetic Algorithm

Authors: Shu Guo, Lin Pang

Abstract: We develop an accelerated Genetic Algorithm (GA) system constructed by the cooperation of a field-programmable gate array (FPGA) and optimized GA parameters. We found that an enhanced decay of the mutation rate makes the GA converge much faster, enabling parameter-induced acceleration of the GA. Furthermore, the accelerated configuration of the GA is programmed in the FPGA to boost processing speed at the hardware level without external computation devices. This system is able to focus light through a scattering medium within 4 seconds with robust noise resistance and stable repetition performance, and this time could be further reduced to the millisecond level with an advanced board configuration. This study addresses a long-standing limitation of the GA and promotes the applications of the GA in dynamic scattering media, with the capability to tackle wavefront shaping in biological material.

Submitted 29 November, 2021; originally announced November 2021.

Comments: 17 pages, 10 figures
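As a software-only illustration of how a decaying mutation rate fits into GA-based wavefront shaping, here is a minimal sketch. Assumptions, not taken from the paper: the scattering medium is a random complex transmission row t, the modulator applies binary 0/π phases, fitness is the target intensity |Σ_k t_k e^{jπ m_k}|², and the mutation rate decays exponentially from r0 towards r_end with placeholder constants. The function names are hypothetical, and the FPGA implementation is out of scope.

```python
import cmath
import math
import random


def mutation_rate(gen, r0=0.1, r_end=0.01, decay=50.0):
    """Mutation probability that decays exponentially from r0 towards r_end (placeholder values)."""
    return (r0 - r_end) * math.exp(-gen / decay) + r_end


def focus_intensity(mask, tm):
    """|sum_k t_k * exp(j*pi*m_k)|^2: target intensity for a binary (0 or pi) phase mask."""
    field = sum(t * cmath.exp(1j * math.pi * m) for t, m in zip(tm, mask))
    return abs(field) ** 2


def run_ga(tm, pop_size=30, generations=200, seed=0):
    """Elitist GA over binary masks; returns (best mask, its intensity, best initial intensity)."""
    rng = random.Random(seed)
    n = len(tm)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    initial_best = max(focus_intensity(m, tm) for m in pop)
    for gen in range(generations):
        pop.sort(key=lambda m: focus_intensity(m, tm), reverse=True)
        parents = pop[: pop_size // 2]  # keep the fitter half unchanged (elitism)
        rate = mutation_rate(gen)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < rate else g for g in child]    # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda m: focus_intensity(m, tm))
    return best, focus_intensity(best, tm), initial_best


rng = random.Random(1)
tm = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)]  # toy transmission row
best, final, initial = run_ga(tm, generations=100)
```

A high early mutation rate keeps exploring while the population is poor, and letting it fall quickly stops later mutations from destroying good masks. This is one plausible reading of why a faster decay speeds up convergence, and the constants here would need tuning rather than being copied.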

arXiv:2109.12543 [pdf, ps, other]

SDN-based Resource Allocation in Edge and Cloud Computing Systems: An Evolutionary Stackelberg Differential Game Approach

Authors: Jun Du, Chunxiao Jiang, Abderrahim Benslimane, Song Guo, Yong Ren

Abstract: Recently, the rapid growth of computation-heavy applications has raised great challenges for the Fifth Generation (5G) and future wireless networks. In response, the hybrid edge and cloud computing (ECC) system is expected to be a promising solution for handling the increasing computational applications with low-latency and on-demand computation offloading services, which requires new paradigms for computing resource sharing and access control. This work establishes a software-defined networking (SDN) based architecture for edge/cloud computing services in 5G heterogeneous networks (HetNets), which can support efficient and on-demand computing resource management to optimize resource utilization and satisfy the time-varying computational tasks uploaded by user devices. In addition, owing to incomplete information, we design an evolutionary game based service selection for users, which can model the replicator dynamics of service subscription. Based on this dynamic access model, a Stackelberg differential game based cloud computing resource sharing mechanism is proposed to facilitate resource trading between the cloud computing service provider (CCP) and different edge computing service providers (ECPs). Then we derive the optimal pricing and allocation strategies of cloud computing resources based on the replicator dynamics of users' service selection.
These strategies can guarantee the maximum integral utilities to all computing service providers (CPs), while the user distribution reaches the evolutionary stable state at this Stackelberg equilibrium. Furthermore, simulation results validate the performance of the designed resource sharing mechanism, and reveal the convergence and equilibrium states of user selection, and computing resource pricing and allocation.

Submitted 26 September, 2021; originally announced September 2021.
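The replicator dynamics of service selection can be illustrated with a small simulation. The utility model is a hypothetical assumption, not the paper's: a user of service i gets value_i − price_i − congestion × x_i, where x_i is the share of users subscribed to service i, and prices are held fixed, so the Stackelberg pricing layer is omitted. The Euler-integrated dynamics dx_i/dt = rate · x_i · (u_i − ū), with ū the population-average utility, drive the shares to a state where every used service yields the same utility, which is the evolutionary stable state the abstract refers to.

```python
def utilities(shares, base_value, prices, congestion):
    """Hypothetical utility of service i: value_i - price_i - congestion * share_i."""
    return [v - p - congestion * x for v, p, x in zip(base_value, prices, shares)]


def replicator_step(shares, utils, rate, dt):
    """Euler step of dx_i/dt = rate * x_i * (u_i - u_avg), where u_avg = sum_j x_j * u_j."""
    avg = sum(x * u for x, u in zip(shares, utils))
    return [x + dt * rate * x * (u - avg) for x, u in zip(shares, utils)]


def simulate(shares, base_value, prices, congestion, rate=1.0, dt=0.01, steps=5000):
    """Iterate the replicator dynamics from an initial distribution of users over services."""
    for _ in range(steps):
        shares = replicator_step(shares, utilities(shares, base_value, prices, congestion), rate, dt)
    return shares


shares = simulate([1 / 3, 1 / 3, 1 / 3], [10, 8, 6], [3, 2, 1], 5.0)
print([round(x, 4) for x in shares])  # utilities are equalized across the three services
```

For these toy numbers, equal utilities require 7 − 5x₁ = 6 − 5x₂ = 5 − 5x₃ with x₁ + x₂ + x₃ = 1, giving shares (8/15, 1/3, 2/15). In the paper's setting the providers would then choose prices anticipating this response.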

arXiv:2107.05464 [pdf, other]

iGrow: A Smart Agriculture Solution to Autonomous Greenhouse Control

Authors: Xiaoyan Cao, Yao Yao, Lanqing Li, Wanpeng Zhang, Zhicheng An, Zhong Zhang, Li Xiao, Shihui Guo, Xiaoyu Cao, Meihong Wu, Dijun Luo

Abstract: Agriculture is the foundation of human civilization. However, the rapid increase of the global population poses a challenge to this cornerstone by demanding more food. Modern autonomous greenhouses, equipped with sensors and actuators, provide a promising solution to the problem by empowering precise control for highly efficient food production. However, the optimal control of autonomous greenhouses is challenging, requiring decision-making based on high-dimensional sensory data, and the scaling of production is limited by the scarcity of labor capable of handling this task. With the advances of artificial intelligence (AI), the internet of things (IoT), and cloud computing technologies, we aim to provide a solution that automates and smartens greenhouse control to address the above challenges. In this paper, we propose a smart agriculture solution named iGrow for autonomous greenhouse control (AGC): (1) for the first time, we formulate the AGC problem as a Markov decision process (MDP) optimization problem; (2) we design a neural network-based simulator incorporated with the incremental mechanism to simulate the complete planting process of an autonomous greenhouse, which provides a testbed for the optimization of control strategies; (3) we propose a closed-loop bi-level optimization algorithm, which can dynamically re-optimize the greenhouse control strategy with newly observed data during real-world production.
We not only conduct simulation experiments but also deploy iGrow in real scenarios, and experimental results demonstrate the effectiveness and superiority of iGrow in autonomous greenhouse simulation and optimal control. In particular, compelling results from the tomato pilot project in real autonomous greenhouses show that our solution significantly increases crop yield (+10.15%) and net profit (+92.70%) with statistical significance compared to planting experts.

Submitted 14 March, 2022; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: 9 pages, 5 figures, 2 tables, accepted by AAAI 2022

arXiv:2106.05458 [pdf, other]

Joint Landmark and Structure Learning for Automatic Evaluation of Developmental Dysplasia of the Hip

Authors: Xindi Hu, Limin Wang, Xin Yang, Xu Zhou, Wufeng Xue, Yan Cao, Shengfeng Liu, Yuhao Huang, Shuang** Guo, Ning Shang, Dong Ni, Ning Gu

Abstract: The ultrasound (US) screening of the infant hip is vital for the early diagnosis of developmental dysplasia of the hip (DDH). The US diagnosis of DDH involves measuring the alpha and beta angles that quantify hip joint development. These two angles are calculated from key anatomical landmarks and structures of the hip. However, this measurement process is not trivial for sonographers and usually requires a thorough understanding of complex anatomical structures. In this study, we propose a multi-task framework to learn the relationships among landmarks and structures jointly and automatically evaluate DDH. Our multi-task networks are equipped with three novel modules. Firstly, we adopt Mask R-CNN as the basic framework to detect and segment key anatomical structures and add one landmark detection branch to form a new multi-task framework. Secondly, we propose a novel shape similarity loss to refine the incomplete anatomical structure prediction robustly and accurately. Thirdly, we further incorporate a landmark-structure consistency prior to ensure the consistency of the bony rim estimated from the segmented structure and the detected landmark. In our experiments, 1,231 US images of the infant hip from 632 patients are collected, of which 247 images from 126 patients are tested. The average errors in the alpha and beta angles are 2.221 degrees and 2.899 degrees, respectively. About 93% of the alpha-angle estimates and 85% of the beta-angle estimates have errors of less than 5 degrees.
Experimental results demonstrate that the proposed method can accurately and robustly realize the automatic evaluation of DDH, showing great potential for clinical application.

Submitted 9 June, 2021; originally announced June 2021.

Comments: Accepted by IEEE Journal of Biomedical and Health Informatics. 14 pages, 10 figures and 10 tables
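Once landmarks are detected, the angles themselves are simple geometry. In Graf's method, the alpha angle lies between the baseline and the bony roof line, and the beta angle lies between the baseline and the cartilage roof line. The minimal sketch below assumes each line is given by two detected points and reports the acute angle between two lines. It also computes the "fraction of estimates with error below 5 degrees" statistic quoted in the abstract. The function names and the acute-angle convention are assumptions for illustration.

```python
import math


def line_angle_deg(p1, p2, q1, q2):
    """Acute angle in degrees between the line through p1, p2 and the line through q1, q2."""
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    norms = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norms == 0:
        raise ValueError("each line needs two distinct points")
    cos_t = abs(ux * vx + uy * vy) / norms
    return math.degrees(math.acos(min(1.0, cos_t)))  # clamp guards against rounding above 1


def fraction_within(pred, truth, tol=5.0):
    """Share of predicted angles whose absolute error is below `tol` degrees."""
    return sum(abs(p - t) < tol for p, t in zip(pred, truth)) / len(truth)


# Vertical baseline and a roof line sloping at 45 degrees.
print(line_angle_deg((0, 0), (0, 10), (0, 5), (5, 0)))
```

Because each angle depends on only two or three landmarks, a small landmark error can shift an angle by several degrees, which is one reason the paper couples landmark detection with structure segmentation.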

arXiv:2105.08350 [pdf, other]

doi 10.1109/TIP.2021.3134466

Generic Reversible Visible Watermarking Via Regularized Graph Fourier Transform Coding

Authors: Wenfa Qi, Sirui Guo, Wei Hu

Abstract: Reversible visible watermarking (RVW) is an active copyright protection mechanism. It not only transparently superimposes copyright patterns on specific positions of digital images or video frames to declare the copyright ownership information, but also completely erases the visible watermark image and thus enables restoring the original host image without any distortion. However, existing RVW algorithms mostly construct the reversible mapping mechanism for a specific visible watermarking scheme, which is not versatile. Hence, we propose a generic RVW framework to accommodate various visible watermarking schemes. In particular, we obtain a reconstruction data packet (the compressed difference image between the watermarked image and the original host image), which is embedded into the watermarked image via any conventional reversible data hiding method to facilitate the blind recovery of the host image. The key is to achieve compact compression of the difference image for efficient embedding of the reconstruction data packet. To this end, we propose regularized Graph Fourier Transform (GFT) coding, where the difference image is smoothed via the graph Laplacian regularizer for more efficient compression and then encoded by multi-resolution GFTs in an approximately optimal manner. Experimental results show that the proposed framework has much better versatility than state-of-the-art methods. Due to the small amount of auxiliary information to be embedded, the visual quality of the watermarked image is also higher.

Submitted 26 November, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: This manuscript was accepted to IEEE Transactions on Image Processing on November 21, 2021. It has 15 pages, 12 figures and 4 tables
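Smoothing with a graph Laplacian regularizer usually means solving min_x ||x − y||² + λ xᵀLx, whose closed-form solution satisfies (I + λL)x = y. The minimal sketch below shows that step on the simplest graph, a 1-D path with unit edge weights, where I + λL is tridiagonal and can be solved with the Thomas algorithm. The path graph, unit weights, and function names are assumptions for illustration; the paper works on image graphs and follows smoothing with multi-resolution GFT coding, which is not shown.

```python
def smooth_path_signal(y, lam):
    """Solve (I + lam * L) x = y for the unit-weight path-graph Laplacian L (Thomas algorithm)."""
    n = len(y)
    if n == 1 or lam == 0:
        return list(y)
    diag = [1 + lam * (1 if i in (0, n - 1) else 2) for i in range(n)]  # 1 + lam * degree
    off = -lam  # every off-diagonal entry of I + lam * L on a path graph
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom if i < n - 1 else 0.0
        d[i] = (y[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def laplacian_quadratic(x):
    """x^T L x for the unit-weight path graph: the sum of squared neighbour differences."""
    return sum((x[i] - x[i + 1]) ** 2 for i in range(len(x) - 1))
```

Because the rows of L sum to zero, the smoothed signal keeps the same total as the input, while its Laplacian quadratic (a measure of high-frequency energy) drops. That lower high-frequency energy is what makes the smoothed signal cheaper to encode in a graph Fourier basis.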

arXiv:2104.08395 [pdf, other]

Manifold Model for High-Resolution fMRI Joint Reconstruction and Dynamic Quantification

Authors: Shouchang Guo, Jeffrey A. Fessler, Douglas C. Noll

Abstract: Oscillating Steady-State Imaging (OSSI) is a recent fMRI acquisition method that exploits a large and oscillating signal, and can provide high-SNR fMRI. However, the oscillatory nature of the signal leads to an increased number of acquisitions. To improve temporal resolution and accurately model the nonlinearity of OSSI signals, we build the MR physics for OSSI signal generation as a regularizer for the undersampled reconstruction rather than using subspace models that are not well suited for the data. Our proposed physics-based manifold model turns the disadvantages of OSSI acquisition into advantages and enables joint reconstruction and quantification. The OSSI manifold model (OSSIMM) outperforms subspace models and reconstructs high-resolution fMRI images with a factor of 12 acceleration and without spatial or temporal resolution smoothing. Furthermore, OSSIMM can dynamically quantify important physics parameters, including $R_2^*$ maps, with a temporal resolution of 150 ms.

Submitted 16 April, 2021; originally announced April 2021.

arXiv:2103.10651 [pdf, other]

SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems

Authors: Yuxuan Chen, Jiangshan Zhang, Xuejing Yuan, Shengzhi Zhang, Kai Chen, Xiaofeng Wang, Shanqing Guo

Abstract: With the wide use of Automatic Speech Recognition (ASR) in applications such as human-machine interaction, simultaneous interpretation, and audio transcription, its security protection becomes increasingly important. Although recent studies have brought to light the weaknesses of popular ASR systems that enable out-of-band signal attacks, adversarial attacks, etc., and further proposed various remedies (signal smoothing, adversarial training, etc.), a systematic understanding of ASR security (both attacks and defenses) is still missing, especially regarding how realistic such threats are and how general existing protection could be. In this paper, we present our systematization of knowledge for ASR security and provide a comprehensive taxonomy for existing work based on a modularized workflow. More importantly, we align the research in this domain with that on security in Image Recognition Systems (IRS), which has been extensively studied, using the domain knowledge in the latter to help understand where we stand in the former. Generally, both IRS and ASR are perceptual systems. Their similarities allow us to systematically study existing literature in ASR security based on the spectrum of attacks and defense solutions proposed for IRS, and pinpoint the directions of more advanced attacks and the directions potentially leading to more effective protection in ASR. In contrast, their differences, especially the complexity of ASR compared with IRS, help us learn unique challenges and opportunities in ASR security.
In particular, our experimental study shows that transfer learning across ASR models is feasible, even in the absence of knowledge about models (even their types) and training data.

Submitted 30 July, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: 17 pages

arXiv:2103.02813 [pdf, other]

PET Image Reconstruction with Multiple Kernels and Multiple Kernel Space Regularizers

Authors: Shiyao Guo, Yuxia Sheng, Shenpeng Li, Li Chai, **gxin Zhang

Abstract: Kernelized maximum-likelihood (ML) expectation maximization (EM) methods have recently gained prominence in PET image reconstruction, outperforming many previous state-of-the-art methods. But they are not immune to the problems of non-kernelized MLEM methods in potentially large reconstruction error and high sensitivity to iteration number. This paper demonstrates these problems by theoretical reasoning and experimental results, and provides a novel solution to solve these problems. The solution is a regularized kernelized MLEM with multiple kernel matrices and multiple kernel space regularizers that can be tailored for different applications. To reduce the reconstruction error and the sensitivity to iteration number, we present a general class of multi-kernel matrices and two regularizers consisting of kernel image dictionary and kernel image Laplacian quadratic, and use them to derive the single-kernel regularized EM and multi-kernel regularized EM algorithms for PET image reconstruction. These new algorithms are derived using the technical tools of multi-kernel combination in machine learning, image dictionary learning in sparse coding, and graph Laplacian quadratic in graph signal processing. Extensive tests and comparisons on the simulated and in vivo data are presented to validate and evaluate the new algorithms, and demonstrate their superior performance and advantages over the kernelized MLEM and other conventional methods.

Submitted 3 March, 2021; originally announced March 2021.

Comments: 21 pages, 9 figures
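The abstract above builds on the kernelized MLEM baseline, in which the image is represented as x = Kα for a kernel matrix K and the EM update runs on the coefficients α. As a rough illustration only, here is a pure-Python sketch of that single-kernel, unregularized baseline. It assumes the usual measurement model ȳ = PKα + r, with system matrix P and a known additive background r, and the elementwise update α ← α / (KᵀPᵀ1) · KᵀPᵀ(y / (PKα + r)). It does not implement the paper's multi-kernel matrices or its dictionary and Laplacian regularizers, and the helper names (`kernel_mlem`, `matvec`, `rmatvec`, `matmul`) are hypothetical.

```python
def matvec(M, v):
    """Return M @ v for a list-of-rows matrix M."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rmatvec(M, v):
    """Return M^T @ v (a back-projection when M is a system matrix)."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def matmul(A, B):
    """Return A @ B for list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kernel_mlem(P, K, y, r, n_iter=50):
    """Hypothetical helper: single-kernel EM on alpha, returning (x = K alpha, alpha).

    Assumes the measurement model y_bar = P K alpha + r (no regularizer).
    """
    A = matmul(P, K)                       # effective system matrix P K
    sens = rmatvec(A, [1.0] * len(y))      # sensitivity K^T P^T 1
    alpha = [1.0] * len(K[0])              # uniform positive initial coefficients
    for _ in range(n_iter):
        ybar = [f + b for f, b in zip(matvec(A, alpha), r)]   # expected counts
        ratio = [yi / yb for yi, yb in zip(y, ybar)]          # measured / expected
        back = rmatvec(A, ratio)                              # back-project the ratio
        alpha = [a * b / s for a, b, s in zip(alpha, back, sens)]
    return matvec(K, alpha), alpha


P = [[1.0, 0.0], [0.0, 1.0]]
K = [[0.8, 0.2], [0.2, 0.8]]
x, alpha = kernel_mlem(P, K, y=[4.0, 2.0], r=[0.0, 0.0], n_iter=500)
print([round(v, 4) for v in x])
```

With consistent, noise-free data as in the toy call, x should approach the measured values. With r = 0, every iteration also preserves total counts (the sum of PKα equals the sum of y). Because `n_iter` is an explicit argument, the sketch makes the sensitivity to iteration number that the abstract describes easy to probe.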

arXiv:2101.09870 [pdf, other]

doi 10.1109/TIP.2021.3100312

Joint Denoising and Demosaicking with Green Channel Prior for Real-world Burst Images

Authors: Shi Guo, Zhetong Liang, Lei Zhang

Abstract: Denoising and demosaicking are essential yet correlated steps to reconstruct a full color image from the raw color filter array (CFA) data. By learning a deep convolutional neural network (CNN), significant progress has been achieved to perform denoising and demosaicking jointly. However, most existing CNN-based joint denoising and demosaicking (JDD) methods work on a single image while assuming additive white Gaussian noise, which limits their performance on real-world applications. In this work, we study the JDD problem for real-world burst images, namely JDD-B. Considering the fact that the green channel has twice the sampling rate and better quality than the red and blue channels in CFA raw data, we propose to use this green channel prior (GCP) to build a GCP-Net for the JDD-B task. In GCP-Net, the GCP features extracted from green channels are utilized to guide the feature extraction and feature upsampling of the whole image. To compensate for the shift between frames, the offset is also estimated from GCP features to reduce the impact of noise. Our GCP-Net can preserve more image structures and details than other JDD methods while removing noise. Experiments on synthetic and real-world noisy images demonstrate the effectiveness of GCP-Net quantitatively and qualitatively.

Submitted 24 January, 2021; originally announced January 2021.
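The green channel prior relies on a property of Bayer-type CFA data: in each 2x2 tile, two sites sample green and one site each samples red and blue. The pure-Python sketch below assumes an RGGB layout, which the abstract does not specify (GRBG, GBRG and BGGR are also common). It counts the samples per colour and splits a raw frame into its two half-resolution green planes. It illustrates only the sampling argument. GCP-Net itself extracts GCP features with a learned CNN, and the function names here are hypothetical.

```python
def bayer_channel(row, col, pattern="RGGB"):
    """Colour sampled at (row, col); pattern lists the 2x2 tile in raster order."""
    return pattern[(row % 2) * 2 + (col % 2)]


def count_samples(height, width, pattern="RGGB"):
    """Count how many CFA sites sample each colour on a height x width sensor."""
    counts = {"R": 0, "G": 0, "B": 0}
    for r in range(height):
        for c in range(width):
            counts[bayer_channel(r, c, pattern)] += 1
    return counts


def split_green(raw, pattern="RGGB"):
    """Return the two half-resolution green planes (one per G site in the tile)."""
    offsets = [(i // 2, i % 2) for i, ch in enumerate(pattern) if ch == "G"]
    return [[row[dc::2] for row in raw[dr::2]] for dr, dc in offsets]


raw = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 raw frame
print(count_samples(4, 4))   # {'R': 4, 'G': 8, 'B': 4}
print(split_green(raw))      # [[[1, 3], [9, 11]], [[4, 6], [12, 14]]]
```

On the toy 4x4 frame, green gets 8 of the 16 sites while red and blue get 4 each. This is the "twice the sampling rate" the abstract refers to.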

Showing 1–50 of 69 results for author: Guo, S