-
Multi-Functional Beamforming Design for Integrated Sensing, Communication, and Computation
Authors:
Yapeng Zhao,
Qingqing Wu,
Wen Chen,
Yong Zeng,
Ruiqi Liu,
Weidong Mei,
Fen Hou,
Shaodan Ma
Abstract:
Integrated sensing and communication (ISAC) systems may face a heavy computation burden since the sensory data needs to be further processed. This paper studies a novel system that integrates sensing, communication, and computation, aiming to provide services for different objectives efficiently. This system consists of a multi-antenna multi-functional base station (BS), an edge server, a target, and multiple single-antenna communication users. The BS needs to allocate the available resources to efficiently provide sensing, communication, and computation services. Due to the heavy service burden and limited power budget, the BS can partially offload the tasks to the nearby edge server instead of computing them locally. We consider the estimation of the target response matrix, a general problem in radar sensing, and utilize the Cramér-Rao bound (CRB) as the corresponding performance metric. To tackle the non-convex optimization problem, we propose both semidefinite relaxation (SDR)-based alternating optimization and SDR-based successive convex approximation (SCA) algorithms to minimize the CRB of radar sensing while meeting the requirements of communication users and the need for task computing. Furthermore, we demonstrate that the optimal rank-one solutions of both the alternating and SCA algorithms can be directly obtained via the solver or further constructed even when dealing with multiple functionalities. Simulation results show that the proposed algorithms can provide higher target estimation performance than state-of-the-art benchmarks while satisfying the communication and computation constraints.
Submitted 1 July, 2024;
originally announced July 2024.
-
MIMO-OFDM ISAC Waveform Design for Range-Doppler Sidelobe Suppression
Authors:
Peishi Li,
Ming Li,
Rang Liu,
Qian Liu,
A. Lee Swindlehurst
Abstract:
Integrated sensing and communication (ISAC) is a key enabling technique for future wireless networks owing to its efficient hardware and spectrum utilization. In this paper, we focus on dual-functional waveform design for a multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) ISAC system, which is considered to be a promising solution for practical deployment. Since the dual-functional waveform carries communication information, its random nature leads to high range-Doppler sidelobes in the ambiguity function, which in turn degrades radar sensing performance. To suppress range-Doppler sidelobes, we propose a novel symbol-level precoding (SLP) based waveform design for MIMO-OFDM ISAC systems by fully exploiting the temporal degrees of freedom (DoFs). Our goal is to minimize the range-Doppler integrated sidelobe level (ISL) while satisfying the constraints of target illumination power, multi-user communication quality of service (QoS), and constant-modulus transmission. To solve the resulting non-convex waveform design problem, we develop an efficient algorithm using the majorization-minimization (MM) and alternating direction method of multipliers (ADMM) methods. Simulation results show that the proposed waveform has significantly reduced range-Doppler sidelobes compared with signals designed only for communications and other baselines. In addition, the proposed waveform design achieves target detection and estimation performance close to that achievable by waveforms designed only for radar, which demonstrates the superiority of the proposed SLP-based ISAC approach.
Submitted 24 June, 2024;
originally announced June 2024.
-
Joint Power Allocation and Beamforming Design for Active IRS-Aided Directional Modulation Secure Systems
Authors:
Yifan Zhao,
Xiaoyu Wang,
Kaibo Zhou,
Xuehui Wang,
Yan Wang,
Wei Gao,
Ruiqi Liu,
Feng Shu
Abstract:
Since the secrecy rate (SR) performance improvement obtained by a secure directional modulation (DM) network is limited, an active intelligent reflective surface (IRS)-assisted DM network is considered to attain a high SR. To address the SR maximization problem, a novel method based on the Lagrangian dual transform and a closed-form fractional programming algorithm (LDT-CFFP) is proposed, which yields the base station (BS) beamforming vectors and the IRS reflection coefficient matrix. However, the computational complexity of the LDT-CFFP method is high. To reduce its complexity, a blocked IRS-assisted DM network is designed. To meet the requirements of the network performance, a power allocation (PA) strategy is proposed and adopted in the system. Specifically, the system power between the BS and the IRS, as well as the transmission power for confidential messages (CM) and artificial noise (AN) from the BS, are allocated separately. We then put forward a scheme combining the null-space projection (NSP) method, the maximum-ratio-reflecting (MRR) algorithm, and the PA strategy (NSP-MRR-PA) to solve the SR maximization problem. The closed-form (CF) solutions to the BS beamforming vectors and the IRS reflection coefficient matrix are attained via the NSP and MRR algorithms, respectively. For the PA factors, we use the exhaustive search (ES), particle swarm optimization (PSO), and simulated annealing (SA) algorithms to search for the solutions. Simulation results verify that the LDT-CFFP method achieves a higher SR gain than the NSP-MRR-PA method. For the NSP-MRR-PA method, the number of IRS units in each block has a significant impact on SR performance. In addition, the applied PA strategies, namely the ES, PSO, and SA methods, outperform PA strategies with fixed PA factors.
Submitted 25 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Multi-Static ISAC based on Network-Assisted Full-Duplex Cell-Free Networks: Performance Analysis and Duplex Mode Optimization
Authors:
Fan Zeng,
Ruoyun Liu,
Xiaoyu Sun,
Jingxuan Yu,
Jiamin Li,
Pengchen Zhu,
Dongming Wang,
Xiaohu You
Abstract:
Multi-static integrated sensing and communication (ISAC) technology, which can achieve a wider coverage range and avoid self-interference, is an important trend for the future development of ISAC. Existing multi-static ISAC designs are unable to support asymmetric uplink (UL)/downlink (DL) communication requirements while simultaneously achieving optimal sensing performance. This paper proposes a multi-static ISAC design based on network-assisted full-duplex (NAFD) cell-free networks that addresses these problems. Under this design, closed-form expressions for the individual communication rate and localization error rate are derived under imperfect channel state information, which are respectively utilized to assess the communication and sensing performances. Then, we propose a deep Q-network-based access point (AP) duplex mode optimization algorithm to obtain the trade-off between communication and sensing from the UL and DL perspectives of the APs. Simulation results demonstrate that the NAFD-based ISAC system proposed in this paper can achieve significantly better communication performance than other ISAC systems while ensuring minimal impact on sensing performance. We also validate the accuracy of the derived closed-form expressions. Furthermore, the proposed optimization algorithm achieves performance comparable to that of the exhaustive search method with low complexity.
Submitted 12 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
Authors:
Rui Liu,
Zening Ma
Abstract:
Speech Self-Supervised Learning (SSL) has demonstrated considerable efficacy in various downstream tasks. Nevertheless, prevailing self-supervised models often overlook emotion-related prior information, thereby neglecting the potential of emotion prior knowledge in speech to improve performance on emotion tasks. In this paper, we propose an emotion-aware speech representation learning method with intensity knowledge. Specifically, we extract frame-level emotion intensities using an established speech-emotion understanding model. Subsequently, we propose a novel emotional masking strategy (EMS) to incorporate emotion intensities into the masking process. We select two representative models based on Transformer and CNN, namely MockingJay and Non-autoregressive Predictive Coding (NPC), and conduct experiments on the IEMOCAP dataset. Experiments demonstrate that the representations derived from our proposed method outperform those of the original models in the SER task.
Submitted 9 June, 2024;
originally announced June 2024.
-
Sustainable Wireless Networks via Reconfigurable Intelligent Surfaces (RISs): Overview of the ETSI ISG RIS
Authors:
Ruiqi Liu,
Shuang Zheng,
Qingqing Wu,
Yifan Jiang,
Nan Zhang,
Yuanwei Liu,
Marco Di Renzo,
George C. Alexandropoulos
Abstract:
Reconfigurable Intelligent Surfaces (RISs) are a novel form of ultra-low power devices that are capable of increasing the communication data rates as well as the cell coverage in a cost- and energy-efficient way. This is attributed to their programmable operation that enables them to dynamically manipulate the wireless propagation environment, a feature that has lately inspired numerous research investigations and applications. To pave the way to the formal standardization of RISs, the European Telecommunications Standards Institute (ETSI) launched the Industry Specification Group (ISG) on the RIS technology in September 2021. This article provides a comprehensive overview of the status of the work conducted by the ETSI ISG RIS, covering typical deployment scenarios of reconfigurable metasurfaces, use cases and operating applications, requirements, emerging hardware architectures and operating modes, as well as the latest insights regarding future directions of RISs and the resulting smart wireless environments.
Submitted 9 June, 2024;
originally announced June 2024.
-
Multipath Exploitation for Fluctuating Target Detection in RIS-Assisted ISAC Systems
Authors:
Shoushuo Zhang,
Zichao Xiao,
Rang Liu,
Ming Li,
Wei Wang,
Qian Liu
Abstract:
Integrated sensing and communication (ISAC) systems are typically deployed in multipath environments, which is usually deemed as a challenging issue for wireless communications. However, the multipath propagation can also provide extra illumination and observation perspectives for radar sensing, which offers spatial diversity gain for detecting targets with spatial radar cross-section (RCS) fluctuations. In this letter, we propose to utilize reconfigurable intelligent surfaces (RIS) in ISAC systems to provide high-quality and controllable multipath propagation for improving the performance of fluctuating target detection and simultaneously enhancing the quality of communication services. To effectively exploit the spatial diversity offered by RIS-empowered multipath, the dual-functional transmit beamforming and the RIS reflection beamforming are jointly designed to maximize the expectation of radar signal-to-noise ratio (SNR). To solve the resulting complex non-convex optimization problem, we develop an efficient alternating optimization algorithm that utilizes majorization-minimization (MM) and alternating direction method of multipliers (ADMM) algorithms. Simulation results illustrate the advantages of multipath exploitation and the proposed beamforming design algorithm for fluctuating target detection in RIS-assisted ISAC systems.
Submitted 1 June, 2024;
originally announced June 2024.
-
Deep Learning-Based CSI Feedback for XL-MIMO Systems in the Near-Field Domain
Authors:
Zhangjie Peng,
Ruijing Liu,
Zhaotian Li,
Cunhua Pan,
Jiangzhou Wang
Abstract:
In this paper, we consider an extremely large-scale massive multiple-input-multiple-output (XL-MIMO) system. As the scale of antenna arrays increases, the range of near-field communications also expands. In this case, the signals no longer exhibit planar wave characteristics but spherical wave characteristics in the near-field channel, which makes the channel state information (CSI) highly complex. Additionally, the larger antenna array scale significantly increases the size of the CSI matrix. Therefore, CSI feedback in the near-field channel becomes highly challenging. To solve this issue, we propose a deep-learning (DL)-based ExtendNLNet that can compress the CSI and further reduce the overhead of CSI feedback. In addition, we introduce the Non-Local block to obtain a larger area of CSI features. Simulation results show that the proposed ExtendNLNet can significantly improve the CSI recovery quality compared to other DL-based methods.
Submitted 22 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces
Authors:
Yi Ding,
Yong Li,
Hao Sun,
Rui Liu,
Chengxuan Tong,
Cuntai Guan
Abstract:
Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks consistently verify the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms existing state-of-the-art methods or is comparable to them. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks. The source code can be found at https://github.com/yi-ding-cs/EEG-Deformer.
Submitted 25 April, 2024;
originally announced May 2024.
-
Robust Proximity Detection using On-Device Gait Monitoring
Authors:
Yuqian Hu,
Guozhen Zhu,
Beibei Wang,
K. J. Ray Liu
Abstract:
Proximity detection in indoor environments based on WiFi signals has gained significant attention in recent years. Existing works rely on the dynamic signal reflections and their extracted features are dependent on motion strength. To address this issue, we design a robust WiFi-based proximity detector by considering gait monitoring. Specifically, we propose a gait score that accurately evaluates gait presence by leveraging the speed estimated from the autocorrelation function (ACF) of channel state information (CSI). By combining this gait score with a proximity feature, our approach effectively distinguishes different transition patterns, enabling more reliable proximity detection. In addition, to enhance the stability of the detection process, we employ a state machine and extract temporal information, ensuring continuous proximity detection even during subtle movements. Extensive experiments conducted in different environments demonstrate an overall detection rate of 92.5% and a low false alarm rate of 1.12% with a delay of 0.825s.
Submitted 29 April, 2024;
originally announced April 2024.
-
Learning and Optimization for Price-based Demand Response of Electric Vehicle Charging
Authors:
Chengyang Gu,
Yuxin Pan,
Ruohong Liu,
Yize Chen
Abstract:
In the context of charging electric vehicles (EVs), price-based demand response (PBDR) is becoming increasingly significant for charging load management. Such response usually encourages cost-sensitive customers to adjust their energy demand in response to changes in price for financial incentives. Thus, to model and optimize EV charging, it is important for the charging station operator to model the PBDR patterns of EV customers by precisely predicting charging demands given price signals. The operator then uses these demands to optimize the charging station power allocation policy. The standard pipeline involves offline fitting of a PBDR function based on historical EV charging records, followed by applying estimated EV demands in downstream charging station operation optimization. In this work, we propose a new decision-focused end-to-end framework for PBDR modeling that combines prediction errors and downstream optimization cost errors in the model learning stage. We evaluate the effectiveness of our method on a simulation of charging station operation with synthetic PBDR patterns of EV customers, and experimental results demonstrate that this framework can provide a more reliable prediction model for the ultimate optimization process, leading to more effective optimization solutions in terms of cost savings and charging station operation objectives with only a few training samples.
Submitted 16 April, 2024;
originally announced April 2024.
-
STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario
Authors:
Renyang Liu,
Kwok-Yan Lam,
Wei Zhou,
Sixing Wu,
Jun Zhao,
Dongting Hu,
Mingming Gong
Abstract:
Many attack techniques have been proposed to explore the vulnerability of DNNs and further help to improve their robustness. Despite the significant progress made recently, existing black-box attack methods still suffer from unsatisfactory performance due to the vast number of queries needed to optimize desired perturbations. Besides, the other critical challenge is that adversarial examples built in a noise-adding manner are abnormal and struggle to successfully attack robust models, whose robustness is enhanced by adversarial training against small perturbations. There is no doubt that these two issues mentioned above will significantly increase the risk of exposure and result in a failure to dig deeply into the vulnerability of DNNs. Hence, it is necessary to evaluate DNNs' fragility sufficiently under query-limited settings in a non-additional way. In this paper, we propose the Spatial Transform Black-box Attack (STBA), a novel framework to craft formidable adversarial examples in the query-limited scenario. Specifically, STBA introduces a flow field to the high-frequency part of clean images to generate adversarial examples and adopts the following two processes to enhance their naturalness and significantly improve the query efficiency: a) we apply an estimated flow field to the high-frequency part of clean images to generate adversarial examples instead of introducing external noise to the benign image, and b) we leverage an efficient gradient estimation method based on a batch of samples to optimize such an ideal flow field under query-limited settings. Compared to existing score-based black-box baselines, extensive experiments indicated that STBA could effectively improve the imperceptibility of the adversarial examples and remarkably boost the attack success rate under query-limited settings.
Submitted 30 March, 2024;
originally announced April 2024.
-
Real-time Safety Index Adaptation for Parameter-varying Systems via Determinant Gradient Ascend
Authors:
Rui Chen,
Weiye Zhao,
Ruixuan Liu,
Weiyang Zhang,
Changliu Liu
Abstract:
Safety Index Synthesis (SIS) is critical for deriving safe control laws. Recent works propose to synthesize a safety index (SI) via nonlinear programming and derive a safe control law such that the system 1) achieves forward invariance (FI) with respect to some safe set and 2) guarantees finite time convergence (FTC) to that safe set. However, real-world system dynamics can vary during run-time, making the control law infeasible and invalidating the initial SI. Since the full SIS nonlinear programming is computationally expensive, it is infeasible to re-synthesize the SI each time the dynamics are perturbed. To address that, this paper proposes an efficient approach to adapting the SI to varying system dynamics and maintaining the feasibility of the safe control law. The proposed method leverages determinant gradient ascend and derives a closed-form update to safety index parameters, enabling real-time adaptation performance. A numerical study validates the effectiveness of our approach.
Submitted 22 March, 2024;
originally announced March 2024.
-
Large-Scale RIS Enabled Air-Ground Channels: Near-Field Modeling and Analysis
Authors:
Hao Jiang,
Wangqi Shi,
Zaichen Zhang,
Cunhua Pan,
Qingqing Wu,
Feng Shu,
Ruiqi Liu,
Jiangzhou Wang
Abstract:
Existing works mainly rely on the far-field planar-wave-based channel model to assess the performance of reconfigurable intelligent surface (RIS)-enabled wireless communication systems. However, when the transmitter and receiver are in near-field ranges, this will result in relatively low computing accuracy. To tackle this challenge, we first develop an analytical framework for sub-array partitioning. This framework divides the large-scale RIS array into multiple sub-arrays, effectively reducing modeling complexity while maintaining acceptable accuracy. Then, we develop a beam domain channel model based on the proposed sub-array partition framework for large-scale RIS-enabled UAV-to-vehicle communication systems, which can be used to efficiently capture the sparse features in RIS-enabled UAV-to-vehicle channels in both near-field and far-field ranges. Furthermore, some important propagation characteristics of the proposed channel model, including the spatial cross-correlation functions (CCFs), temporal auto-correlation functions (ACFs), frequency correlation functions (CFs), and channel capacities with respect to the different physical features of the RIS and non-stationary properties of the channel model, are derived and analyzed. Finally, simulation results are provided to demonstrate that the proposed framework helps achieve a good tradeoff between model complexity and accuracy for investigating the channel propagation characteristics, thereby supporting highly efficient communications in RIS-enabled UAV-to-vehicle wireless networks.
Submitted 19 March, 2024;
originally announced March 2024.
-
Deep Learning-Based CSI Feedback for RIS-Aided Massive MIMO Systems with Time Correlation
Authors:
Zhangjie Peng,
Zhaotian Li,
Ruijing Liu,
Cunhua Pan,
Feiniu Yuan,
Jiangzhou Wang
Abstract:
In this paper, we consider a reconfigurable intelligent surface (RIS)-aided frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) downlink system. In FDD systems, the downlink channel state information (CSI) should be sent to the base station through the feedback link. However, the overhead of CSI feedback occupies substantial uplink bandwidth resources in RIS-aided communication systems. In this work, we propose a deep learning (DL)-based scheme to reduce the overhead of CSI feedback by compressing the cascaded CSI. In practical RIS-aided communication systems, the cascaded channels in adjacent slots are inevitably time-correlated. We use long short-term memory to learn this time correlation, which can help the neural network to improve the recovery quality of the compressed CSI. Moreover, the attention mechanism is introduced to further improve the CSI recovery quality. Simulation results demonstrate that our proposed DL-based scheme can significantly outperform other DL-based methods in terms of the CSI recovery quality.
Submitted 19 March, 2024;
originally announced March 2024.
-
Skeleton-Based Human Action Recognition with Noisy Labels
Authors:
Yi Xu,
Kunyu Peng,
Di Wen,
Ruiping Liu,
Junwei Zheng,
Yufan Chen,
Jiaming Zhang,
Alina Roitberg,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences is time-consuming and the resulting labels are often noisy. If not effectively addressed, label noise negatively affects the model's training, resulting in lower recognition quality. Despite its importance, addressing label noise for skeleton-based action recognition has been overlooked so far. In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas to serve as the initial benchmark. Observations reveal that these baselines yield only marginal performance when dealing with sparse skeleton data. Consequently, we introduce a novel methodology, NoiseEraSAR, which integrates global sample selection, co-teaching, and Cross-Modal Mixture-of-Experts (CM-MOE) strategies, aimed at mitigating the adverse impacts of label noise. Our proposed approach demonstrates better performance on the established benchmark, setting new state-of-the-art standards. The source code for this study will be made accessible at https://github.com/xuyizdby/NoiseEraSAR.
Submitted 14 March, 2024;
originally announced March 2024.
-
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Authors:
Ruibin Yuan,
Hanfeng Lin,
Yi Wang,
Zeyue Tian,
Shangda Wu,
Tianhao Shen,
Ge Zhang,
Yuhang Wu,
Cong Liu,
Ziya Zhou,
Ziyang Ma,
Liumeng Xue,
Ziyu Wang,
Qin Liu,
Tianyu Zheng,
Yizhi Li,
Yinghao Ma,
Yiming Liang,
Xiaowei Chi,
Ruibo Liu,
Zili Wang,
Pengfei Li,
Jingcheng Wu,
Chenghua Lin,
Qifeng Liu
, et al. (10 additional authors not shown)
Abstract:
While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc, surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub.
Submitted 25 February, 2024;
originally announced February 2024.
-
Pre-Chirp-Domain Index Modulation for Affine Frequency Division Multiplexing
Authors:
Guangyao Liu,
Tianqi Mao,
Ruiqi Liu,
Zhenyu Xiao
Abstract:
Affine frequency division multiplexing (AFDM), tailored as a novel multicarrier technique utilizing chirp signals for high-mobility communications, exhibits marked advantages compared to traditional orthogonal frequency division multiplexing (OFDM). AFDM is based on the discrete affine Fourier transform (DAFT) with two modifiable parameters of the chirp signals, termed the pre-chirp parameter and post-chirp parameter, respectively. These parameters can be fine-tuned to avoid overlapping channel paths with different delays or Doppler shifts, leading to performance enhancement, especially for doubly dispersive channels. In this paper, we propose a novel AFDM structure with the pre-chirp index modulation (PIM) philosophy (AFDM-PIM), which can embed additional information bits into the pre-chirp parameter design for both spectral and energy efficiency enhancement. Specifically, we first demonstrate that the application of distinct pre-chirp parameters to various subcarriers in the AFDM modulation process maintains the orthogonality among these subcarriers. Then, different pre-chirp parameters are flexibly assigned to each AFDM subcarrier according to the incoming bits. By such arrangement, aside from classical phase/amplitude modulation, extra binary bits can be implicitly conveyed by the indices of selected pre-chirping parameter realizations without additional energy consumption. At the receiver, both a maximum likelihood (ML) detector and a reduced-complexity ML-minimum mean square error (ML-MMSE) detector are employed to recover the information bits. It has been shown via simulations that the proposed AFDM-PIM exhibits superior bit error rate (BER) performance compared to classical AFDM, OFDM and IM-aided OFDM algorithms.
Submitted 23 February, 2024;
originally announced February 2024.
-
mmID: High-Resolution mmWave Imaging for Human Identification
Authors:
Sakila S. Jayaweera,
Sai Deepika Regani,
Yuqian Hu,
Beibei Wang,
K. J. Ray Liu
Abstract:
Achieving accurate human identification through RF imaging has been a persistent challenge, primarily attributed to the limited aperture size and its consequent impact on imaging resolution. The existing imaging solution enables tasks such as pose estimation, activity recognition, and human tracking based on deep neural networks by estimating skeleton joints. In contrast to estimating joints, this paper proposes to improve imaging resolution by estimating the human figure as a whole using conditional generative adversarial networks (cGAN). In order to reduce training complexity, we use an estimated spatial spectrum using the MUltiple SIgnal Classification (MUSIC) algorithm as input to the cGAN. Our system generates environmentally independent, high-resolution images that can extract unique physical features useful for human identification. We use a simple convolution layers-based classification network to obtain the final identification result. From the experimental results, we show that resolution of the image produced by our trained generator is high enough to enable human identification. Our finding indicates high-resolution accuracy with 5% mean silhouette difference to the Kinect device. Extensive experiments in different environments on multiple testers demonstrate that our system can achieve 93% overall test accuracy in unseen environments for static human target identification.
Submitted 1 February, 2024;
originally announced February 2024.
-
Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation
Authors:
Ruiping Liu,
Jiaming Zhang,
Kunyu Peng,
Yufan Chen,
Ke Cao,
Junwei Zheng,
M. Saquib Sarfraz,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid the predominant modality reliance in multi-modal fusion, we introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training. Utilizing bit-level batch-wise sampling enhances the model's performance in both complete and incomplete testing scenarios. Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate representative spectral information into a limited number of learnable prompts that maintain robustness against all MISS scenarios, akin to fine-tuning effects but with fewer tunable parameters (1.1%). Extensive experiments prove the efficacy of our proposed approach, showcasing an improvement of 5.84% mIoU over the prior state-of-the-art parameter-efficient methods in modality missing. The source code is publicly available at https://github.com/RuipingL/MISS.
Submitted 10 April, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
A Unified NOMA Framework in Beam-Hopping Satellite Communication Systems
Authors:
Xuyang Zhang,
Xinwei Yue,
Tian Li,
Zhihao Han,
Yafei Wang,
Yong Ding,
Rongke Liu
Abstract:
This paper investigates the application of a unified non-orthogonal multiple access framework in beam hopping (U-NOMA-BH) based satellite communication systems. More specifically, the proposed U-NOMA-BH framework can be applied to code-domain NOMA based BH (CD-NOMA-BH) and power-domain NOMA based BH (PD-NOMA-BH) systems. To satisfy dynamic-uneven traffic demands, we formulate the optimization problem to minimize the square of discrete difference by jointly optimizing power allocation, carrier assignment and beam scheduling. The non-convexity of the objective function and the constraint condition is handled through Dinkelbach's transform and variable relaxation. As a further development, the closed-form and asymptotic expressions of outage probability are derived for CD/PD-NOMA-BH systems. Based on approximated results, the diversity orders of a pair of users are obtained in detail. In addition, the system throughput of U-NOMA-BH is discussed in delay-limited transmission mode. Numerical results verify that: i) The gap between traffic requests of CD/PD-NOMA-BH systems is narrower than that of orthogonal multiple access based BH (OMA-BH); ii) The CD-NOMA-BH system is capable of providing the enhanced traffic request and capacity provision; and iii) The outage behaviors of CD/PD-NOMA-BH are better than that of OMA-BH.
Submitted 16 January, 2024;
originally announced January 2024.
-
End-to-End Learning for SLP-Based ISAC Systems
Authors:
Yixian Zheng,
Rang Liu,
Ming Li,
Qian Liu
Abstract:
Integrated sensing and communication (ISAC) is an encouraging wireless technology which can simultaneously perform both radar and communication functionalities by sharing the same transmit waveform, spectral resource, and hardware platform. The recently emerged symbol-level precoding (SLP) technique exhibits advancement in ISAC systems by leveraging the waveform design degrees of freedom (DoFs) in both temporal and spatial domains. However, traditional SLP-based ISAC systems are designed in a modular paradigm, which potentially limits the overall performance of communication and radar sensing. The high complexity of existing SLP design algorithms is another issue that hinders practical deployment. To break through the bottleneck of these approaches, in this paper we propose an end-to-end approach to jointly design the SLP-based dual-functional transmitter and receivers of communication and radar sensing. In particular, we aim to utilize deep learning-based methods to minimize the symbol error rate (SER) of communication users, maximize the detection probability, and minimize the root mean square error (RMSE) of the target angle estimation. Multi-layer perceptron (MLP) networks and a long short-term memory (LSTM) network are respectively applied to the transmitter, communication users and radar receiver. Simulation results verify the feasibility of the proposed deep-learning-based end-to-end optimization for ISAC systems and reveal the effectiveness of the proposed neural networks for the end-to-end design.
Submitted 10 January, 2024;
originally announced January 2024.
-
A Practical Beamforming Design for Active RIS-assisted MU-MISO Systems
Authors:
Yun Yang,
Zhiping Lu,
Ming Li,
Rang Liu,
Qian Liu
Abstract:
Reconfigurable Intelligent Surfaces (RIS) have been proposed as a revolutionary technology with the potential to address several critical requirements of 6G communication systems. Despite its powerful ability for radio environment reconfiguration, the "double fading" effect constricts the practical system performance enhancements due to the significant path loss. A new active RIS architecture has been recently proposed to overcome this challenge. However, existing active RIS studies rely on an ideal amplification model without considering the practical hardware limitation of amplifiers, and such inaccurate active RIS modeling may cause performance degradation. Motivated by this fact, in this paper we first investigate the amplification principle of typical active RIS and propose a more accurate amplification model based on amplifier hardware characteristics. Then, based on the new amplification model, we propose a novel joint transmit beamforming and RIS reflection beamforming design considering the incident signal power on practical active RIS for multiuser multi-input single-output (MU-MISO) communication systems. Fractional programming (FP), majorization minimization (MM) and block coordinate descent (BCD) methods are used to solve the complex problem. Simulation results indicate the importance of the consideration of practical amplifier hardware characteristics in the joint beamforming designs and demonstrate the effectiveness of the proposed algorithm compared to other benchmarks.
Submitted 8 January, 2024;
originally announced January 2024.
-
Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle
Authors:
Hongshan Liu,
Tong Qin,
Zhen Gao,
Tianqi Mao,
Keke Ying,
Ziwei Wan,
Li Qiao,
Rui Na,
Zhongxiang Li,
Chun Hu,
Yikun Mei,
Tuan Li,
Guanghui Wen,
Lei Chen,
Zhonghuai Wu,
Ruiqi Liu,
Gaojie Chen,
Shuo Wang,
Dezhi Zheng
Abstract:
This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we firstly explore the recent technical developments of NS-COM, followed by the discussions about motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis between the NS-COM network and other counterparts in SAGSIN is conducted, covering aspects of deployment, coverage, channel characteristics and unique problems of NS-COM network. Afterwards, the technical aspects of NS-COM, including channel modeling, random access, channel estimation, array-based beam management and joint network optimization, are examined in detail. Furthermore, we explore the potential applications of NS-COM, such as structural expansion in SAGSIN communication, civil aviation communication, remote and urgent communication, weather monitoring and carbon neutrality. Finally, some promising research avenues are identified, including stratospheric satellite (StratoSat) -to-ground direct links for mobile terminals, reconfigurable multiple-input multiple-output (MIMO) and holographic MIMO, federated learning in NS-COM networks, maritime communication, electromagnetic spectrum sensing and adversarial game, integrated sensing and communications, StratoSat-based radar detection and imaging, NS-COM assisted enhanced global navigation system, NS-COM assisted intelligent unmanned system and free space optical (FSO) communication. Overall, this paper highlights that the NS-COM plays an indispensable role in the SAGSIN puzzle, providing substantial performance and coverage enhancement to the traditional SAGSIN architecture.
Submitted 4 March, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
Sample Robust Scheduling of Electricity-Gas Systems Under Wind Power Uncertainty
Authors:
Rong-Peng Liu,
Yunhe Hou,
Yujia Li,
Shunbo Lei,
Wei Wei,
Xiaozhe Wang
Abstract:
This paper adopts a two-stage sample robust optimization (SRO) model to address the wind power penetrated unit commitment optimal energy flow (UC-OEF) problem for IEGSs. The two-stage SRO model can be approximately transformed into a computationally efficient form. Specifically, we employ linear decision rules to simplify the proposed UC-OEF model. Moreover, we further enhance the tractability of the simplified model by exploring its structural features and, accordingly, develop a solution method.
Submitted 30 December, 2023;
originally announced January 2024.
-
Sparsity Exploitation via Joint Receive Processing and Transmit Beamforming Design for MIMO-OFDM ISAC Systems
Authors:
Zichao Xiao,
Rang Liu,
Ming Li,
Wei Wang,
Qian Liu
Abstract:
Integrated sensing and communication (ISAC) is widely recognized as a pivotal enabling technique for the advancement of future wireless networks. This paper aims to efficiently exploit the inherent sparsity of echo signals for the multi-input-multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) based ISAC system. A novel joint receive echo processing and transmit beamforming design is presented to achieve this goal. Specifically, we first propose a compressive sensing (CS)-assisted estimation approach to facilitate ISAC receive echo processing, which can not only enable accurate recovery of target information, but also allow substantial reduction in the number of sensing subcarriers to be sampled and processed. Then, based on the proposed CS-assisted processing method, the associated transmit beamforming design is formulated with the objective of maximizing the sum-rate of multiuser communications while satisfying the transmit power budget and ensuring the received signal-to-noise ratio (SNR) for the designated sensing subcarriers. In order to address the formulated non-convex problem involving high-dimensional variables, an effective iterative algorithm employing majorization minimization (MM), fractional programming (FP), and the nonlinear equality alternating direction method of multipliers (neADMM) with closed-form solutions has been developed. Finally, extensive numerical simulations are conducted to verify the effectiveness of the proposed algorithm and the superior performance of the introduced sparsity exploitation strategy.
Submitted 28 December, 2023;
originally announced December 2023.
-
Modeling Load Redistribution Attacks in Integrated Electricity-Gas Systems
Authors:
Rong-Peng Liu,
Xiaozhe Wang,
Bo Zeng,
Rawad Zgheib
Abstract:
We investigate load redistribution (LR) attacks on integrated electricity-gas systems (IEGSs) and propose a bilevel mixed-integer model to identify the most severe LR attack from an economic perspective. Under a mild assumption, we prove that the proposed model does not exclude any possible upper-level attack. A modified reformulation and decomposition (R&D) algorithm is developed to solve this model in a master-subproblem framework. In particular, we design a subproblem to address infeasibility issues in the master problem. Accordingly, two types of cuts are added to the master problem to ensure algorithm feasibility and solution optimality.
Submitted 27 December, 2023;
originally announced December 2023.
-
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Authors:
Rui Liu,
Yifan Hu,
Yi Ren,
Xiang Yin,
Haizhou Li
Abstract:
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of the CSS task, prior studies have not thoroughly investigated the emotional expressiveness problems due to the scarcity of emotional conversational datasets and the difficulty of stateful emotion modeling. In this paper, we propose a novel emotional CSS model, termed ECSS, that includes two main components: 1) to enhance emotion understanding, we introduce a heterogeneous graph-based emotional context modeling mechanism, which takes the multi-source dialogue history as input to model the dialogue context and learn the emotion cues from the context; 2) to achieve emotion rendering, we employ a contrastive learning-based emotion renderer module to infer the accurate emotion style for the target utterance. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity, and annotate additional emotional information on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions. These evaluations also underscore the importance of comprehensive emotional annotations. Code and audio samples can be found at: https://github.com/walker-hyf/ECSS.
Submitted 19 December, 2023;
originally announced December 2023.
-
A Novel RFID Authentication Protocol Based on A Block-Order-Modulus Variable Matrix Encryption Algorithm
Authors:
Yan Wang,
Ruiqi Liu,
Tong Gao,
Feng Shu,
Xuemei Lei,
Guan Gui,
Jiangzhou Wang
Abstract:
In this paper, authentication for mobile radio frequency identification (RFID) systems with low-cost tags is studied. Firstly, an adaptive modulus (AM) encryption algorithm is proposed. Subsequently, in order to enhance the security without additional storage of new key matrices, a self-updating encryption order (SUEO) algorithm is designed. Furthermore, a diagonal block local transpose key matrix (DBLTKM) encryption algorithm is presented, which effectively expands the feasible domain of the key space. Based on the above three algorithms, a novel joint AM-SUEO-DBLTKM encryption algorithm is constructed. Making full use of the advantages of the proposed joint algorithm, a two-way RFID authentication protocol, named AM-SUEO-DBLTKM-RFID, is proposed for mobile RFID systems. In addition, the Burrows-Abadi-Needham (BAN) logic and security analysis indicate that the proposed AM-SUEO-DBLTKM-RFID protocol can effectively combat various typical attacks. Numerical results demonstrate that the proposed AM-SUEO-DBLTKM algorithm can save 99.59% of tag storage over traditional algorithms. Finally, the low computational complexity as well as the low storage cost of the proposed AM-SUEO-DBLTKM-RFID protocol facilitates deployment within low-cost RFID tags.
Submitted 9 May, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
Individualized Deepfake Detection Exploiting Traces Due to Double Neural-Network Operations
Authors:
Mushfiqur Rahman,
Runze Liu,
Chau-Wai Wong,
Huaiyu Dai
Abstract:
In today's digital landscape, journalists urgently require tools to verify the authenticity of facial images and videos depicting specific public figures before incorporating them into news stories. Existing deepfake detectors are not optimized for this detection task when an image is associated with a specific and identifiable individual. This study focuses on the deepfake detection of facial images of individual public figures. We propose to condition the proposed detector on the identity of the identified individual given the advantages revealed by our theory-driven simulations. While most detectors in the literature rely on perceptible or imperceptible artifacts present in deepfake facial images, we demonstrate that the detection performance can be improved by exploiting the idempotency property of neural networks. In our approach, the training process involves double neural-network operations where we pass an authentic image through a deepfake simulating network twice. Experimental results show that the proposed method improves the area under the curve (AUC) from 0.92 to 0.94 and reduces its standard deviation by 17%. For evaluating the detection performance of individual public figures, a facial image dataset with individuals' names is required, a criterion not met by the current deepfake datasets. To address this, we curated a dataset comprising 32k images featuring 45 public figures, which we intend to release to the public after the paper is published.
Submitted 13 December, 2023;
originally announced December 2023.
-
SSTA: Salient Spatially Transformed Attack
Authors:
Renyang Liu,
Wei Zhou,
Sixin Wu,
Jun Zhao,
Kwok-Yan Lam
Abstract:
Extensive studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial attacks, which brings a huge security risk to the further application of DNNs, especially for the AI models developed in the real world. Despite the significant progress that has been made recently, existing attack methods still perform unsatisfactorily at escaping detection by the naked human eye, because the formulation of adversarial examples (AEs) relies heavily on adding noise. These challenges significantly increase the risk of exposure and can cause the attack to fail. Therefore, in this paper, we propose the Salient Spatially Transformed Attack (SSTA), a novel framework to craft imperceptible AEs, which enhances the stealthiness of AEs by estimating a smooth spatial transform metric on the most critical area to generate AEs instead of adding external noise to the whole image. Compared to state-of-the-art baselines, extensive experiments indicated that SSTA could effectively improve the imperceptibility of the AEs while maintaining a 100% attack success rate.
Submitted 12 December, 2023;
originally announced December 2023.
-
Large Foundation Models for Power Systems
Authors:
Chenghao Huang,
Siyang Li,
Ruohong Liu,
Hao Wang,
Yize Chen
Abstract:
Foundation models, such as Large Language Models (LLMs), can respond to a wide range of format-free queries without any task-specific data collection or model training, creating various research and application opportunities for the modeling and operation of large-scale power systems. In this paper, we outline how large foundation models such as GPT-4 are developed, and discuss how they can be leveraged in challenging power and energy system tasks. We first investigate the potential of existing foundation models by validating their performance on four representative tasks across power system domains, including the optimal power flow (OPF), electric vehicle (EV) scheduling, knowledge retrieval for power engineering technical reports, and situation awareness. Our results indicate strong capabilities of such foundation models in boosting the efficiency and reliability of power system operational pipelines. We also provide suggestions and projections on future deployment of foundation models in power system applications.
Submitted 12 December, 2023;
originally announced December 2023.
-
Navigating Open Set Scenarios for Skeleton-based Action Recognition
Authors:
Kunyu Peng,
Cheng Yin,
Junwei Zheng,
Ruiping Liu,
David Schneider,
Jiaming Zhang,
Kailun Yang,
M. Saquib Sarfraz,
Rainer Stiefelhagen,
Alina Roitberg
Abstract:
In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Set Skeleton-based Action Recognition (OS-SAR) task and formalize the benchmark on three skeleton-based datasets. We assess the performance of seven established open-set approaches on our task and identify their limits and critical generalization issues when dealing with skeleton information. To address these challenges, we propose a distance-based cross-modality ensemble method that leverages the cross-modal alignment of skeleton joints, bones, and velocities to achieve superior open-set recognition performance. We refer to the key idea as CrossMax - an approach that utilizes a novel cross-modality mean max discrepancy suppression mechanism to align latent spaces during training and a cross-modality distance-based logits refinement method during testing. CrossMax outperforms existing approaches and consistently yields state-of-the-art results across all datasets and backbones. The benchmark, code, and models will be released at https://github.com/KPeng9510/OS-SAR.
Submitted 11 December, 2023;
originally announced December 2023.
-
ASWT-SGNN: Adaptive Spectral Wavelet Transform-based Self-Supervised Graph Neural Network
Authors:
Ruyue Liu,
Rong Yin,
Yong Liu,
Weiping Wang
Abstract:
Graph Comparative Learning (GCL) is a self-supervised method that combines the advantages of Graph Convolutional Networks (GCNs) and comparative learning, making it promising for learning node representations. However, the GCN encoders used in these methods rely on the Fourier transform to learn fixed graph representations, which is inherently limited by the uncertainty principle involving spatial and spectral localization trade-offs. To overcome the inflexibility of existing methods and the computationally expensive eigen-decomposition and dense matrix multiplication, this paper proposes an Adaptive Spectral Wavelet Transform-based Self-Supervised Graph Neural Network (ASWT-SGNN). The proposed method employs spectral adaptive polynomials to approximate the filter function and optimize the wavelet using contrast loss. This design enables the creation of local filters in both spectral and spatial domains, allowing flexible aggregation of neighborhood information at various scales and facilitating controlled transformation between local and global information. Compared to existing methods, the proposed approach reduces computational complexity and addresses the limitation of graph convolutional neural networks, which are constrained by graph size and lack flexible control over the neighborhood aspect. Extensive experiments on eight benchmark datasets demonstrate that ASWT-SGNN accurately approximates the filter function in high-density spectral regions, avoiding costly eigen-decomposition. Furthermore, ASWT-SGNN achieves comparable performance to state-of-the-art models in node classification tasks.
Submitted 9 December, 2023;
originally announced December 2023.
-
Modeling False Data Injection Attacks on Integrated Electricity-Gas Systems
Authors:
Rong-Peng Liu,
Xiaozhe Wang,
Zuyi Li,
Rawad Zgheib
Abstract:
This work studies the modeling of false data injection attacks (FDIAs) on integrated electricity-gas systems (IEGSs). First, we introduce a static state estimation model and bad data detection method for IEGSs. Then, we develop FDIAs on IEGSs with complete network topology and parameter information. Next, we develop FDIAs on IEGSs when intruders have only local network topology and parameter information of an IEGS. Lastly, we explore FDIAs on IEGSs when intruders have only local network topology information of an IEGS.
Submitted 1 December, 2023;
originally announced December 2023.
-
Joint Sensing and Communication Optimization in Target-Mounted STARS-Assisted Vehicular Networks: A MADRL Approach
Authors:
Haocheng Zhang,
Rang Liu,
Ming Li,
Wei Wang,
Qian Liu
Abstract:
The utilization of integrated sensing and communication (ISAC) technology has the potential to enhance the communication performance of road side units (RSUs) through the active sensing of target vehicles. Furthermore, installing a simultaneous transmitting and reflecting surface (STARS) on the target vehicle can provide an extra boost to the reflection of the echo signal, thereby improving the communication quality for in-vehicle users. However, the design of this target-mounted STARS system poses significant challenges, such as limited information sharing and distributed STARS control. In this paper, we propose an end-to-end multi-agent deep reinforcement learning (MADRL) framework to tackle the challenges of joint sensing and communication optimization in the considered target-mounted STARS assisted vehicle networks. By deploying agents on both RSU and vehicle, the MADRL framework enables RSU and vehicle to perform beam prediction and STARS pre-configuration using their respective local information. To ensure efficient and stable learning for continuous decision-making, we employ the multi-agent soft actor critic (MASAC) algorithm and the multi-agent proximal policy optimization (MAPPO) algorithm on the proposed MADRL framework. Extensive experimental results confirm the effectiveness of our proposed MADRL framework in improving both sensing and communication performance through the utilization of target-mounted STARS. Finally, we conduct a comparative analysis of the two proposed algorithms under various environmental conditions.
Submitted 17 November, 2023;
originally announced November 2023.
-
Cross-Domain Dual-Functional OFDM Waveform Design for Accurate Sensing/Positioning
Authors:
Fan Zhang,
Tianqi Mao,
Ruiqi Liu,
Zhu Han,
Sheng Chen,
Zhaocheng Wang
Abstract:
Orthogonal frequency division multiplexing (OFDM) has been widely recognized as the representative waveform for 5G wireless networks, which can directly support sensing/positioning with existing infrastructure. To guarantee superior sensing/positioning accuracy while supporting high-speed communications simultaneously, the dual functions tend to be assigned with different resource elements (REs) due to their diverse design requirements. This motivates optimization of resource allocation/waveform design across time, frequency, power and delay-Doppler domains. Therefore, this article proposes two cross-domain waveform optimization strategies for effective convergence of OFDM-based communications and sensing/positioning, following communication- and sensing-centric criteria, respectively. For the communication-centric design, to maximize the achievable data rate, a fraction of REs are optimally allocated for communications according to prior knowledge of the communication channel. The remaining REs are then employed for sensing/positioning, where the sidelobe level and peak-to-average power ratio are suppressed by optimizing its power-frequency and phase-frequency characteristics for sensing performance improvement. For the sensing-centric design, a `locally' perfect auto-correlation property is ensured for accurate sensing and positioning by adjusting the unit cells of the ambiguity function within its region of interest (RoI). Afterwards, the irrelevant cells beyond RoI, which can readily determine the sensing power allocation, are optimized with the communication power allocation to enhance the achievable data rate. Numerical results demonstrate the superiority of the proposed waveform designs.
Submitted 19 March, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Detecting Abrupt Change of Channel Covariance Matrix in IRS-Assisted Communication
Authors:
Runnan Liu,
Liang Liu,
Yin Xu,
Dazhi He,
Wenjun Zhang,
Chang Wen Chen
Abstract:
The knowledge of channel covariance matrices is crucial to the design of intelligent reflecting surface (IRS) assisted communication. However, channel covariance matrices may change suddenly in practice. This letter focuses on the detection of the above change in IRS-assisted communication. Specifically, we consider the uplink communication system consisting of a single-antenna user (UE), an IRS, and a multi-antenna base station (BS). We first categorize two types of channel covariance matrix changes based on their impact on system design: Type I change, which denotes the change in the BS receive covariance matrix, and Type II change, which denotes the change in the IRS transmit/receive covariance matrix. Secondly, a powerful method is proposed to detect whether a Type I change occurs, a Type II change occurs, or no change occurs. The effectiveness of our proposed scheme is verified by numerical results.
Submitted 26 October, 2023;
originally announced October 2023.
-
RIS-Aided Receive Generalized Spatial Modulation Design with Reflecting Modulation
Authors:
Xinghao Guo,
Yin Xu,
Hanjiang Hong,
De Mi,
Ruiqi Liu,
Dazhi He,
Wenjun Zhang,
Yi-yan Wu
Abstract:
Spatial modulation (SM) transmits additional information bits by the selection of antennas. Generalized spatial modulation (GSM), as an advanced type of SM, can be divided into diversity and multiplexing (MUX) schemes according to whether the symbols carried on the selected antennas are identical or different. Recently, reconfigurable intelligent surface (RIS) assisted SM has exhibited better reception performance than conventional SM. To overcome the limitations of SM, this paper combines GSM with RIS and proposes the RIS-aided receive generalized spatial modulation (RIS-RGSM) scheme. The RIS-RGSM diversity scheme is realized via a simple improvement based on the state-of-the-art scheme. To further increase the transmission rate, a novel RIS-RGSM MUX scheme is proposed, where the reflection phase shifts and on/off states of RIS elements are configured to achieve bit mapping. The theoretical bit error rate (BER) of the proposed scheme is derived and agrees well with the simulation results. Numerical simulations show that the RIS-RGSM MUX scheme has better BER performance than the diversity scheme. The proposed scheme can significantly increase the transmission rate and maintain good performance compared to the existing scheme under a limited number of antennas.
Submitted 15 April, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
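The difference between the diversity and MUX variants of GSM described in the abstract above can be illustrated with the textbook rate formula for conventional GSM. This is a minimal sketch, not the RIS-RGSM rate derived in the paper: the function name and parameters are hypothetical, and the formula assumes a power-of-two modulation order.

```python
import math

def gsm_bits_per_channel_use(n_antennas, n_active, mod_order, multiplexing):
    """Bits per channel use of conventional GSM (textbook formula, assumed here)."""
    # Spatial bits: only a power-of-two number of antenna combinations is usable,
    # so take floor(log2(C(n_antennas, n_active))).
    spatial_bits = math.comb(n_antennas, n_active).bit_length() - 1
    # Bits carried by one M-ary symbol; mod_order is assumed to be a power of two.
    symbol_bits = mod_order.bit_length() - 1
    # Diversity: all active antennas carry one shared symbol.
    # Multiplexing: each active antenna carries its own symbol.
    n_symbols = n_active if multiplexing else 1
    return spatial_bits + n_symbols * symbol_bits
```

For example, with 4 antennas, 2 of them active, and 4-ary symbols, the diversity variant carries 4 bits per channel use and the MUX variant carries 6 under this formula.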
-
Deep Learning based Spatially Dependent Acoustical Properties Recovery
Authors:
Ruixian Liu,
Peter Gerstoft
Abstract:
The physics-informed neural network (PINN) is capable of recovering partial differential equation (PDE) coefficients that remain constant throughout the spatial domain directly from physical measurements. In this work, we propose a spatially dependent physics-informed neural network (SD-PINN), which enables the recovery of coefficients in spatially-dependent PDEs using a single neural network, eliminating the requirement for domain-specific physical expertise. We apply the SD-PINN to spatially-dependent wave equation coefficients recovery to reveal the spatial distribution of acoustical properties in the inhomogeneous medium. The proposed method exhibits robustness to noise owing to the incorporation of a loss function for the physical constraint that the assumed PDE must be satisfied. For the coefficients recovery of spatially two-dimensional PDEs, we store the PDE coefficients at all locations in the 2D region of interest into a matrix and incorporate the low-rank assumption for such a matrix to recover the coefficients at locations without available measurements.
Submitted 22 November, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Two Enhanced-rate Power Allocation Strategies for Active IRS-assisted Wireless Network
Authors:
Qiankun Cheng,
Rongen Dong,
Wenlong Cai,
Ruiqi Liu,
Feng Shu,
Jiangzhou Wang
Abstract:
Owing to its ability to overcome the double-fading effect, the active intelligent reflecting surface (IRS) has attracted a lot of attention. Unlike a passive IRS, an active IRS requires a power supply, so the allocation of power between the base station (BS) and the IRS has a direct impact on the system rate performance. In this paper, the active IRS-aided network under a total power constraint is modeled with the ability to adjust power between the BS and the IRS. Given the transmit beamforming at the BS and the reflecting beamforming at the IRS, the SNR expression is derived as a function of the power allocation (PA) factor, and the SNR maximization problem is formulated. Subsequently, two high-performance PA strategies, enhanced multiple random initialization Newton's (EMRIN) and Taylor polynomial approximation (TPA), are proposed. The former improves the rate performance of the classic Newton's method by using multiple random initializations to avoid being trapped at a local optimum. To reduce its high computational complexity, the latter provides a closed-form solution by applying a first-order Taylor polynomial approximation to the original SNR function. Using TPA, the original optimization problem is transformed into the problem of finding a root of a third-order polynomial. Simulation results show that the first-order TPA of the SNR fits its exact expression well, and that the two proposed PA methods achieve a much higher rate than fixed PA and approach exhaustive search as the number of IRS reflecting elements grows large.
Submitted 23 January, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
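The idea of running Newton's method from multiple random initializations, mentioned in the abstract above, can be sketched on a toy problem. This is a generic multi-start Newton illustration under stated assumptions, not the authors' EMRIN algorithm: the objective `toy_f` is a hypothetical multimodal function on [0, 1] standing in for the SNR as a function of the PA factor, and all names are invented for the example.

```python
import math
import random

def newton_stationary_point(grad, hess, x0, lo=0.0, hi=1.0, iters=50, tol=1e-12):
    """Newton iterations on grad(x) = 0, with iterates clamped to [lo, hi]."""
    x = x0
    for _ in range(iters):
        h = hess(x)
        if h == 0.0:
            break  # Newton step undefined; keep the current point
        x_next = min(hi, max(lo, x - grad(x) / h))
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

def multi_start_newton(f, grad, hess, n_starts=20, seed=0):
    """Run Newton from several random starts and keep the point with the largest f."""
    rng = random.Random(seed)
    candidates = [newton_stationary_point(grad, hess, rng.random())
                  for _ in range(n_starts)]
    return max(candidates, key=f)

# Toy multimodal objective on [0, 1] (hypothetical, NOT the paper's SNR expression).
def toy_f(x):
    return x * math.sin(10 * x)

def toy_grad(x):
    return math.sin(10 * x) + 10 * x * math.cos(10 * x)

def toy_hess(x):
    return 20 * math.cos(10 * x) - 100 * x * math.sin(10 * x)

best = multi_start_newton(toy_f, toy_grad, toy_hess)
```

A single Newton run can stop at a local maximum, a minimum, or the boundary depending on where it starts; comparing the objective across all runs is what lets the multi-start variant pick the best stationary point found.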
-
NNgTL: Neural Network Guided Optimal Temporal Logic Task Planning for Mobile Robots
Authors:
Ruijia Liu,
Shaoyuan Li,
Xiang Yin
Abstract:
In this work, we investigate task planning for mobile robots under linear temporal logic (LTL) specifications. This problem is particularly challenging when robots navigate in continuous workspaces due to the high computational complexity involved. Sampling-based methods have emerged as a promising avenue for addressing this challenge by incrementally constructing random trees, thereby sidestepping the need to explicitly explore the entire state-space. However, the performance of this sampling-based approach hinges crucially on the chosen sampling strategy, and a well-informed heuristic can notably enhance sample efficiency. In this work, we propose a novel neural-network guided (NN-guided) sampling strategy tailored for LTL planning. Specifically, we employ a multi-modal neural network capable of extracting features concurrently from both the workspace and the Büchi automaton. This neural network generates predictions that serve as guidance for random tree construction, directing the sampling process toward more optimal directions. Through numerical experiments, we compare our approach with existing methods and demonstrate its superior efficiency, requiring less than 15% of the time of the existing methods to find a feasible solution.
Submitted 25 September, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Unveiling the Hidden Realm: Self-supervised Skeleton-based Action Recognition in Occluded Environments
Authors:
Yifei Chen,
Kunyu Peng,
Alina Roitberg,
David Schneider,
Jiaming Zhang,
Junwei Zheng,
Ruiping Liu,
Yufan Chen,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
To integrate action recognition methods into autonomous robotic systems, it is crucial to consider adverse situations involving target occlusions. Such a scenario, despite its practical relevance, is rarely addressed in existing self-supervised skeleton-based action recognition methods. To empower robots with the capacity to address occlusion, we propose a simple and effective method. We first pre-train using occluded skeleton sequences, then use k-means clustering (KMeans) on sequence embeddings to group semantically similar samples. Next, we employ K-nearest-neighbor (KNN) to fill in missing skeleton data based on the closest sample neighbors. Imputing incomplete skeleton sequences to create relatively complete sequences as input provides significant benefits to existing skeleton-based self-supervised models. Meanwhile, building on the state-of-the-art Partial Spatio-Temporal Learning (PSTL), we introduce an Occluded Partial Spatio-Temporal Learning (OPSTL) framework. This enhancement utilizes Adaptive Spatial Masking (ASM) for better use of high-quality, intact skeletons. The effectiveness of our imputation methods is verified on the challenging occluded versions of the NTURGB+D 60 and NTURGB+D 120. The source code will be made publicly available at https://github.com/cyfml/OPSTL.
Submitted 21 September, 2023;
originally announced September 2023.
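The nearest-neighbor fill-in step described in the abstract above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the OPSTL code: it works on flat coordinate lists rather than learned sequence embeddings, it skips the pre-training and KMeans grouping stages, and the function name and the use of `None` for occluded values are hypothetical.

```python
import math

def knn_impute(sample, references, k=3):
    """Fill None entries of `sample` with the mean of its k nearest reference rows.

    Distances are computed only over the coordinates observed in `sample`.
    """
    observed = [i for i, v in enumerate(sample) if v is not None]

    def dist(ref):
        return math.sqrt(sum((ref[i] - sample[i]) ** 2 for i in observed))

    nearest = sorted(references, key=dist)[:k]
    return [
        v if v is not None else sum(ref[i] for ref in nearest) / len(nearest)
        for i, v in enumerate(sample)
    ]
```

For instance, given complete references `[0, 0, 0]`, `[1, 1, 1]`, `[10, 10, 10]`, and `[11, 11, 11]`, the occluded sample `[10.2, None, 10.4]` with `k=2` is matched to the last two rows and its missing value is filled with their mean.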
-
Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision
Authors:
Yiping Wei,
Kunyu Peng,
Alina Roitberg,
Jiaming Zhang,
Junwei Zheng,
Ruiping Liu,
Yufan Chen,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup. These works overlook the differences in performance among modalities, which leads to the propagation of erroneous knowledge between modalities. Moreover, only three fundamental modalities (joints, bones, and motions) are used, and no additional modalities are explored.
In this work, we first propose an Implicit Knowledge Exchange Module (IKEM) which alleviates the propagation of erroneous knowledge between low-performance modalities. Then, we further propose three new modalities to enrich the complementary information between modalities. Finally, to maintain efficiency when introducing new modalities, we propose a novel teacher-student framework to distill the knowledge from the secondary modalities into the mandatory modalities considering the relationship constrained by anchors, positives, and negatives, named relational cross-modality knowledge distillation. The experimental results demonstrate the effectiveness of our approach, unlocking the efficient use of skeleton-based multi-modality data. Source code will be made publicly available at https://github.com/desehuileng0o0/IKEM.
Submitted 10 January, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
Authors:
Rui Liu,
Jiatian Xi,
Ziyue Jiang,
Haizhou Li
Abstract:
Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE techniques, the current techniques have focused on reducing the difference between the generated speech segment and the reference target in the editing region, ignoring its local and global fluency in the context and original utterance. To maintain the speech fluency, we propose a fluency speech editing model, termed FluentEditor, by considering fluency-aware training criterion in the TSE training. Specifically, the acoustic consistency constraint aims to smooth the transition between the edited region and its neighboring acoustic segments consistent with the ground truth, while the prosody consistency constraint seeks to ensure that the prosody attributes within the edited regions remain consistent with the overall style of the original utterance. The subjective and objective experimental results on VCTK demonstrate that our FluentEditor outperforms all advanced baselines in terms of naturalness and fluency. The audio samples and code are available at https://github.com/Ai-S2-Lab/FluentEditor.
Submitted 21 September, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Instant Photorealistic Style Transfer: A Lightweight and Adaptive Approach
Authors:
Rong Liu,
Enyu Zhao,
Zhiyuan Liu,
Andrew Feng,
Scott John Easley
Abstract:
In this paper, we propose an Instant Photorealistic Style Transfer (IPST) approach, designed to achieve instant photorealistic style transfer on super-resolution inputs without the need for pre-training on pair-wise datasets or imposing extra constraints. Our method utilizes a lightweight StyleNet to enable style transfer from a style image to a content image while preserving non-color information. To further enhance the style transfer process, we introduce an instance-adaptive optimization to prioritize the photorealism of outputs and accelerate the convergence of the style network, leading to a rapid training completion within seconds. Moreover, IPST is well-suited for multi-frame style transfer tasks, as it retains temporal and multi-view consistency of the multi-frame inputs such as video and Neural Radiance Field (NeRF). Experimental results demonstrate that IPST requires less GPU memory usage, offers faster multi-frame transfer speed, and generates photorealistic outputs, making it a promising solution for various photorealistic transfer applications.
Submitted 20 October, 2023; v1 submitted 18 September, 2023;
originally announced September 2023.
-
Cramer-Rao Bound Optimization for Active RIS-Empowered ISAC Systems
Authors:
Qi Zhu,
Ming Li,
Rang Liu,
Qian Liu
Abstract:
Integrated sensing and communication (ISAC), which simultaneously performs sensing and communication functions within a shared frequency band and hardware platform, has emerged as a promising technology for future wireless systems. Nevertheless, the weak echo signal received by the low-sensitivity ISAC receiver significantly constrains sensing performance in scenarios involving obstructed targets. Active reconfigurable intelligent surface (RIS) has become a prospective solution by situationally manipulating the wireless propagations and amplifying the signals. In this paper, we investigate active RIS-empowered ISAC systems to enhance radar echo signal quality as well as communication performance. In particular, we focus on the joint design of the base station (BS) transmit precoding and the active RIS reflection beamforming to optimize the parameter estimation performance in terms of Cramer-Rao bound (CRB) subject to the communication users' signal-to-interference-plus-noise ratio (SINR) requirements. An efficient algorithm based on alternating optimization, semidefinite relaxation (SDR), and majorization-minimization (MM) is proposed to solve the formulated challenging non-convex problem. Finally, simulation results validate the effectiveness of the developed algorithm and the potential of employing active RIS in ISAC systems to enhance direction-of-arrival (DoA) estimation performance.
Submitted 1 April, 2024; v1 submitted 17 September, 2023;
originally announced September 2023.
-
Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals
Authors:
Ran Liu,
Ellen L. Zippi,
Hadi Pouransari,
Chris Sandino,
Jingping Nie,
Hanlin Goh,
Erdrin Azemi,
Ali Moin
Abstract:
Leveraging multimodal information from biosignals is vital for building a comprehensive representation of people's physical and mental states. However, multimodal biosignals often exhibit substantial distributional shifts between pretraining and inference datasets, stemming from changes in task specification or variations in modality compositions. To achieve effective pretraining in the presence of potential distributional shifts, we propose a frequency-aware masked autoencoder ($\texttt{bio}$FAME) that learns to parameterize the representation of biosignals in the frequency space. $\texttt{bio}$FAME incorporates a frequency-aware transformer, which leverages a fixed-size Fourier-based operator for global token mixing, independent of the length and sampling rate of inputs. To maintain the frequency components within each input channel, we further employ a frequency-maintain pretraining strategy that performs masked autoencoding in the latent space. The resulting architecture effectively utilizes multimodal information during pretraining, and can be seamlessly adapted to diverse tasks and modalities at test time, regardless of input size and order. We evaluated our approach on a diverse set of transfer experiments on unimodal time series, achieving an average of $\uparrow$5.5% improvement in classification accuracy over the previous state-of-the-art. Furthermore, we demonstrated that our architecture is robust in modality mismatch scenarios, including unpredicted modality dropout or substitution, proving its practical utility in real-world applications. Code is available at https://github.com/apple/ml-famae .
Submitted 18 April, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Aggregating Intrinsic Information to Enhance BCI Performance through Federated Learning
Authors:
Rui Liu,
Yuanyuan Chen,
Anran Li,
Yi Ding,
Han Yu,
Cuntai Guan
Abstract:
Insufficient data is a long-standing challenge for Brain-Computer Interface (BCI) to build a high-performance deep learning model. Though numerous research groups and institutes collect a multitude of EEG datasets for the same BCI task, sharing EEG data from multiple sites is still challenging due to the heterogeneity of devices. The significance of this challenge cannot be overstated, given the critical role of data diversity in fostering model robustness. However, existing works rarely discuss this issue, predominantly centering their attention on model training within a single dataset, often in the context of inter-subject or inter-session settings. In this work, we propose a hierarchical personalized Federated Learning EEG decoding (FLEEG) framework to surmount this challenge. This innovative framework heralds a new learning paradigm for BCI, enabling datasets with disparate data formats to collaborate in the model training process. Each client is assigned a specific dataset and trains a hierarchical personalized model to manage diverse data formats and facilitate information exchange. Meanwhile, the server coordinates the training procedure to harness knowledge gleaned from all datasets, thus elevating overall performance. The framework has been evaluated in Motor Imagery (MI) classification with nine EEG datasets collected by different devices but implementing the same MI task. Results demonstrate that the proposed framework can boost classification performance by up to 16.7% by enabling knowledge sharing between multiple datasets, especially for smaller datasets. Visualization results also indicate that the proposed framework can empower the local models to put a stable focus on task-related areas, yielding better performance. To the best of our knowledge, this is the first end-to-end solution to address this important challenge.
Submitted 14 August, 2023;
originally announced August 2023.
-
Intelligent Reflecting Surface Aided Multi-Tier Hybrid Computing
Authors:
Yapeng Zhao,
Qingqing Wu,
Guangji Chen,
Wen Chen,
Ruiqi Liu,
Ming-Min Zhao,
Yuan Wu,
Shaodan Ma
Abstract:
The digital twin edge network (DITEN) aims to integrate mobile edge computing (MEC) and digital twin (DT) to provide real-time system configuration and flexible resource allocation for the sixth-generation network. This paper investigates an intelligent reflecting surface (IRS)-aided multi-tier hybrid computing system that can achieve mutual benefits for DT and MEC in the DITEN. For the first time, this paper presents the opportunity to realize the network-wide convergence of DT and MEC. Specifically, in the considered system, over-the-air computation (AirComp) is employed to monitor the status of the DT system, while MEC is performed with the assistance of DT to provide low-latency computing services. In addition, the IRS is utilized to enhance signal transmission and mitigate interference among heterogeneous nodes. We propose a framework for designing the hybrid computing system, aiming to maximize the sum computation rate under communication and computation resource constraints. To tackle the non-convex optimization problem, alternating optimization and successive convex approximation techniques are leveraged to decouple the variables and then transform the problem into a more tractable form. Simulation results verify the effectiveness of the proposed algorithm and demonstrate that the IRS can significantly improve system performance with appropriate phase shift configurations. Moreover, the results indicate that the DT-assisted MEC system can precisely balance local computing and task offloading, since real-time system status can be obtained with the help of DT.
Submitted 25 October, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.