Search | arXiv e-print repository

Prioritized experience replay-based DDQN for Unmanned Vehicle Path Planning

Authors: Liu Lipeng, Letian Xu, Jiabei Liu, Haopeng Zhao, Tongzhou Jiang, Tianyao Zheng

Abstract: Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it with the prioritized experience replay method to solve the problem that traditional path planning algorithms often fall into dead zones. A series of simulation experiment results prove that the path planning algorithm based on DDQN is significantly better than other methods in terms of speed and accuracy, especially the ability to break through dead zones in extreme environments. Research shows that the path planning algorithm based on DDQN performs well in terms of path quality and safety. These research results provide an important reference for the research on automatic navigation of autonomous vehicles.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 4 pages, 6 figures, 2024 5th International Conference on Information Science, Parallel and Distributed Systems
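
The abstract above combines DDQN with prioritized experience replay but does not give the update rules. As a minimal sketch, assuming the standard Double DQN target and proportional prioritization (the function names and the values of `gamma`, `alpha` and `eps` are illustrative choices, not taken from the paper), the two ingredients might look like this:

```python
import random


def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def sampling_probs(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i = (|delta_i| + eps) ** alpha, normalised."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, rng=random):
    """Draw replay-buffer indices (with replacement) according to priority."""
    probs = sampling_probs(td_errors)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

Transitions with a large temporal-difference error are replayed more often, which is the usual motivation for pairing prioritized replay with a value-based learner; importance-sampling corrections are omitted here for brevity.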

arXiv:2406.00690 [pdf, other]

Electromagnetic Wave Property Inspired Radio Environment Knowledge Construction and AI-based Verification for 6G Digital Twin Channel

Authors: Jialin Wang, Jianhua Zhang, Yutong Sun, Yuxiang Zhang, Tao Jiang, Liang Xia

Abstract: As the underlying foundation of a digital twin network (DTN), a digital twin channel (DTC) can accurately depict the process of radio propagation in the air interface to support the DTN-based 6G wireless network. Since radio propagation is affected by the environment, constructing the relationship between the environment and radio wave propagation is the key to improving the accuracy of DTC, and the construction method based on artificial intelligence (AI) is the most concentrated. However, in the existing methods, the environment information input into the neural network (NN) has many dimensions, and the correlation between the environment and the channel relationship is unclear, resulting in a highly complex relationship construction process. To solve this issue, in this paper, we propose a construction method of radio environment knowledge (REK) inspired by the electromagnetic wave property to quantify the contribution of radio propagation. Specifically, a range selection scheme for effective environment information based on random geometry is proposed to reduce the redundancy of environment information. We quantify the contribution of radio propagation reflection, diffraction and scatterer blockage using environment information and propose a flow chart of REK construction to replace the feature extraction process partially based on NN. To validate REK's effectiveness, we conduct a path loss prediction task based on a lightweight convolutional neural network (CNN) employing a simple two-layer convolutional structure. The results show that the accuracy of the range selection method reaches 90%; the constructed REK maintains the prediction error of 0.3 and only needs 0.04 seconds of testing time, effectively reducing the network complexity.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.03129 [pdf, other]

Active Sensing for Multiuser Beam Tracking with Reconfigurable Intelligent Surface

Authors: Han Han, Tao Jiang, Wei Yu

Abstract: This paper studies a beam tracking problem in which an access point (AP), in collaboration with a reconfigurable intelligent surface (RIS), dynamically adjusts its downlink beamformers and the reflection pattern at the RIS in order to maintain reliable communications with multiple mobile user equipments (UEs). Specifically, the mobile UEs send uplink pilots to the AP periodically during the channel sensing intervals, the AP then adaptively configures the beamformers and the RIS reflection coefficients for subsequent data transmission based on the received pilots. This is an active sensing problem, because channel sensing involves configuring the RIS coefficients during the pilot stage and the optimal sensing strategy should exploit the trajectory of channel state information (CSI) from previously received pilots. Analytical solution to such an active sensing problem is very challenging. In this paper, we propose a deep learning framework utilizing a recurrent neural network (RNN) to automatically summarize the time-varying CSI obtained from the periodically received pilots into state vectors. These state vectors are then mapped to the AP beamformers and RIS reflection coefficients for subsequent downlink data transmissions, as well as the RIS reflection coefficients for the next round of uplink channel sensing. The mappings from the state vectors to the downlink beamformers and the RIS reflection coefficients for both channel sensing and downlink data transmission are performed using graph neural networks (GNNs) to account for the interference among the UEs. Simulations demonstrate significant and interpretable performance improvement of the proposed approach over the existing data-driven methods with nonadaptive channel sensing schemes.

Submitted 31 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.13277 [pdf, other]

Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives

Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang

Abstract: Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent researches highlight the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecting the crucial aspect of inter-score correlation relationships within an entire image set. Meanwhile, it is important to note that the correlation, like ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and score changes on individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics with different categories. Our method provides a new perspective on the robustness of NR-IQA models.

Submitted 24 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: Submitted to a conference
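
The abstract above evaluates attacks with two metrics, SROCC and MSE. For readers unfamiliar with them, the following is a minimal pure-Python sketch of both, assuming average ranks for ties (the helper names are illustrative and not taken from the paper):

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean rank of their group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def mse(x, y):
    """Mean squared error between two equally long score lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
```

SROCC depends only on the ordering of the scores, so an attack can drive it negative by reordering predictions across an image set even when per-image changes (captured by MSE) stay moderate, which is the distinction the abstract draws.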

arXiv:2403.11397 [pdf, other]

Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization

Authors: Yujia Liu, Chenxi Yang, Dingquan Li, Jianhao Ding, Tingting Jiang

Abstract: The task of No-Reference Image Quality Assessment (NR-IQA) is to estimate the quality score of an input image without additional information. NR-IQA models play a crucial role in the media industry, aiding in performance evaluation and optimization guidance. However, these models are found to be vulnerable to adversarial attacks, which introduce imperceptible perturbations to input images, resulting in significant changes in predicted scores. In this paper, we propose a defense method to improve the stability in predicted scores when attacked by small perturbations, thus enhancing the adversarial robustness of NR-IQA models. To be specific, we present theoretical evidence showing that the magnitude of score changes is related to the $\ell_1$ norm of the model's gradient with respect to the input image. Building upon this theoretical foundation, we propose a norm regularization training strategy aimed at reducing the $\ell_1$ norm of the gradient, thereby boosting the robustness of NR-IQA models. Experiments conducted on four NR-IQA baseline models demonstrate the effectiveness of our strategy in reducing score changes in the presence of adversarial attacks. To the best of our knowledge, this work marks the first attempt to defend against adversarial attacks on NR-IQA models. Our study offers valuable insights into the adversarial robustness of NR-IQA models and provides a foundation for future research in this area.

Submitted 17 March, 2024; originally announced March 2024.

Comments: accepted by CVPR 2024
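
The link between score changes and the $\ell_1$ gradient norm stated in the abstract above can be illustrated with a standard first-order argument. This is a sketch under assumptions (a differentiable score function $f$, a perturbation $\delta$ bounded in $\ell_\infty$ by a budget $\epsilon$, and a hypothetical weight $\lambda$); the paper's exact statement may differ.

```latex
% First-order expansion of the predicted score around the input x
f(x + \delta) - f(x) \approx \nabla_x f(x)^{\top} \delta

% Hoelder's inequality with the dual pair (\ell_1, \ell_\infty)
\left| \nabla_x f(x)^{\top} \delta \right|
  \le \| \nabla_x f(x) \|_1 \, \| \delta \|_\infty
  \le \epsilon \, \| \nabla_x f(x) \|_1

% A regularized training objective of the kind the abstract describes
\mathcal{L}(x) = \mathcal{L}_{\mathrm{task}}(x) + \lambda \, \| \nabla_x f(x) \|_1
```

Under these assumptions, shrinking $\| \nabla_x f(x) \|_1$ tightens the bound on how far a perturbation of size $\epsilon$ can move the predicted score, to first order.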

arXiv:2403.09004 [pdf, ps, other]

Meta-Learning-Based Fronthaul Compression for Cloud Radio Access Networks

Authors: Ruihua Qiao, Tao Jiang, Wei Yu

Abstract: This paper investigates the fronthaul compression problem in a user-centric cloud radio access network, in which single-antenna users are served by a central processor (CP) cooperatively via a cluster of remote radio heads (RRHs). To satisfy the fronthaul capacity constraint, this paper proposes a transform-compress-forward scheme, which consists of well-designed transformation matrices and uniform quantizers. The transformation matrices perform dimension reduction in the uplink and dimension expansion in the downlink. To reduce the communication overhead for designing the transformation matrices, this paper further proposes a deep learning framework to first learn a suboptimal transformation matrix at each RRH based on the local channel state information (CSI), and then to refine it iteratively. To facilitate the refinement process, we propose an efficient signaling scheme that only requires the transmission of low-dimensional effective CSI and its gradient between the CP and RRH, and further, a meta-learning based gated recurrent unit network to reduce the number of signaling transmission rounds. For the sum-rate maximization problem, simulation results show that the proposed two-stage neural network can perform close to the fully cooperative global CSI based benchmark with significantly reduced communication overhead for both the uplink and the downlink. Moreover, using the first stage alone can already outperform the existing local CSI based benchmark.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 15 Pages, 13 Figures; accepted in IEEE Transactions on Wireless Communications

arXiv:2403.00134 [pdf, other]

Active Sensing for Reciprocal MIMO Channels

Authors: Tao Jiang, Wei Yu

Abstract: This paper addresses the design of transmit precoder and receive combiner matrices to support $N_{\rm s}$ independent data streams over a time-division duplex (TDD) point-to-point massive multiple-input multiple-output (MIMO) channel with either a fully digital or a hybrid structure. The optimal precoder and combiner design amounts to finding the top-$N_{\rm s}$ singular vectors of the channel matrix, but the explicit estimation of the entire high-dimensional channel would require significant pilot overhead. Alternatively, prior works suggest to find the precoding and combining matrices directly by exploiting channel reciprocity and by using the power iteration method, but its performance degrades in the low SNR regime. To tackle this challenging problem, this paper proposes a learning-based active sensing framework, where the transmitter and the receiver send pilots alternately using sensing beamformers that are actively designed as functions of previously received pilots. This is accomplished by using recurrent neural networks to summarize information from the historical observations into hidden state vectors, then using fully connected neural networks to learn the appropriate sensing beamformers in the next pilot stage and finally the transmit precoding and receive combiner matrices for data communications. Simulations demonstrate that the learning-based method outperforms existing approaches significantly and maintains superior performance even in the low SNR regime for both the fully digital and hybrid MIMO scenarios.

Submitted 6 June, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: This paper is accepted in IEEE Transactions on Signal Processing
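
The abstract above names the power iteration method as the baseline that exploits channel reciprocity. As a rough illustration only, here is a noise-free, real-valued, single-stream sketch of that baseline (not of the learning-based method the paper proposes); the function names are hypothetical, and a real channel would be complex-valued and observed through noisy pilots.

```python
import math


def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def normalize(x):
    """Scale to unit Euclidean norm (fails on the zero vector)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def top_singular_pair(H, iters=200):
    """Alternate between the two link ends: send with v, observe H v,
    normalise, send back through the reciprocal channel H^T.
    Assumes the all-ones start is not orthogonal to the top right
    singular vector."""
    Ht = transpose(H)
    v = normalize([1.0] * len(H[0]))
    for _ in range(iters):
        u = normalize(matvec(H, v))   # receive-side combiner estimate
        v = normalize(matvec(Ht, u))  # transmit-side precoder estimate
    sigma = math.sqrt(sum(c * c for c in matvec(H, v)))
    return sigma, u, v
```

Each round trip multiplies the component along the dominant singular direction by the largest gain, so the iterates align with the top singular vectors without estimating the full matrix. With noisy observations the normalised iterates are perturbed at every step, which is consistent with the low-SNR degradation the abstract mentions.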

arXiv:2402.16153 [pdf, other]

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc., surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo on GitHub.

Submitted 25 February, 2024; originally announced February 2024.

Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

arXiv:2402.11164 [pdf]

TinyLIC-High efficiency lossy image compression method

Authors: Gaocheng Ma, Yinfeng Chai, Tianhao Jiang, Ming Lu, Tong Chen

Abstract: Image compression has been the subject of extensive research for several decades, resulting in the development of well-known standards such as JPEG, JPEG2000, and H.264/AVC. However, recent advancements in deep learning have led to the emergence of learned image compression methods that offer significant improvements in coding efficiency compared to traditional codecs. These learned compression techniques have shown noticeable gains and even outperformed traditional schemes.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2401.15321 [pdf]

doi 10.1016/j.apenergy.2024.122736

Localization of Dummy Data Injection Attacks in Power Systems Considering Incomplete Topological Information: A Spatio-Temporal Graph Wavelet Convolutional Neural Network Approach

Authors: Zhaoyang Qu, Yunchang Dong, Yang Li, Siqi Song, Tao Jiang, Min Li, Qiming Wang, Lei Wang, Xiaoyong Bo, Jiye Zang, Qi Xu

Abstract: The emergence of the novel dummy data injection attack (DDIA) poses a severe threat to the secure and stable operation of power systems. These attacks are particularly perilous due to the minimal Euclidean spatial separation between the injected malicious data and legitimate data, rendering their precise detection challenging using conventional distance-based methods. Furthermore, existing research predominantly focuses on various machine learning techniques, often analyzing the temporal data sequences post-attack or relying solely on Euclidean spatial characteristics. Unfortunately, this approach tends to overlook the inherent topological correlations within the non-Euclidean spatial attributes of power grid data, consequently leading to diminished accuracy in attack localization. To address this issue, this study takes a comprehensive approach. Initially, it examines the underlying principles of these new DDIAs on power systems. Here, an intricate mathematical model of the DDIA is designed, accounting for incomplete topological knowledge and alternating current (AC) state estimation from an attacker's perspective. Subsequently, by integrating a priori knowledge of grid topology and considering the temporal correlations within measurement data and the topology-dependent attributes of the power grid, this study introduces temporal and spatial attention matrices. These matrices adaptively capture the spatio-temporal correlations within the attacks. Leveraging gated stacked causal convolution and graph wavelet sparse convolution, the study jointly extracts spatio-temporal DDIA features. Finally, the research proposes a DDIA localization method based on spatio-temporal graph neural networks. The accuracy and effectiveness of the DDIA model are rigorously demonstrated through comprehensive analytical cases.

Submitted 27 January, 2024; originally announced January 2024.

Comments: Accepted by Applied Energy

Journal ref: Applied Energy 360 (2024) 122736

arXiv:2401.13276 [pdf, other]

SCNet: Sparse Compression Network for Music Source Separation

Authors: Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng

Abstract: Deep learning-based methods have made significant achievements in music source separation. However, obtaining good results while maintaining a low model complexity remains challenging in super wide-band music source separation. Previous works either overlook the differences in subbands or inadequately address the problem of information loss when generating subband features. In this paper, we propose SCNet, a novel frequency-domain network to explicitly split the spectrogram of the mixture into several subbands and introduce a sparsity-based encoder to model different frequency bands. We use a higher compression ratio on subbands with less information to improve the information density and focus on modeling subbands with more information. In this way, the separation performance can be significantly improved using lower computational consumption. Experiment results show that the proposed model achieves a signal to distortion ratio (SDR) of 9.0 dB on the MUSDB18-HQ dataset without using extra data, which outperforms state-of-the-art methods. Specifically, SCNet's CPU inference time is only 48% of HT Demucs, one of the previous state-of-the-art models.

Submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.05217 [pdf, other]

Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method

Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Tingting Jiang

Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to predict image quality scores consistent with human perception without relying on pristine reference images, serving as a crucial component in various visual tasks. Ensuring the robustness of NR-IQA methods is vital for reliable comparisons of different image processing techniques and consistent user experiences in recommendations. The attack methods for NR-IQA provide a powerful instrument to test the robustness of NR-IQA. However, current attack methods of NR-IQA heavily rely on the gradient of the NR-IQA model, leading to limitations when the gradient information is unavailable. In this paper, we present a pioneering query-based black box attack against NR-IQA methods. We propose the concept of score boundary and leverage an adaptive iterative approach with multiple score boundaries. Meanwhile, the initial attack directions are also designed to leverage the characteristics of the Human Visual System (HVS). Experiments show our method outperforms all compared state-of-the-art attack methods and is far ahead of previous black-box methods. The effective NR-IQA model DBCNN suffers a Spearman's rank-order correlation coefficient (SROCC) decline of 0.6381 attacked by our method, revealing the vulnerability of NR-IQA models to black-box attacks. The proposed attack method also provides a potent tool for further exploration into NR-IQA robustness.

Submitted 25 April, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2312.09002 [pdf, other]

Localization with Reconfigurable Intelligent Surface: An Active Sensing Approach

Authors: Zhongze Zhang, Tao Jiang, Wei Yu

Abstract: This paper addresses an uplink localization problem in which a base station (BS) aims to locate a remote user with the help of reconfigurable intelligent surfaces (RISs). We propose a strategy in which the user transmits pilots sequentially and the BS adaptively adjusts the sensing vectors, including the BS beamforming vector and multiple RIS reflection coefficients based on the observations already made, to eventually produce an estimated user position. This is a challenging active sensing problem for which finding an optimal solution involves searching through a complicated functional space whose dimension increases with the number of measurements. We show that the long short-term memory (LSTM) network can be used to exploit the latent temporal correlation between measurements to automatically construct scalable state vectors. Subsequently, the state vector is mapped to the sensing vectors for the next time frame via a deep neural network (DNN). A final DNN is used to map the state vector to the estimated user position. Numerical result illustrates the advantage of the active sensing design as compared to non-active sensing methods. The proposed solution produces interpretable results and is generalizable in the number of sensing stages. Remarkably, we show that a network with one BS and multiple RISs can outperform a comparable setting with multiple BSs.

Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted in IEEE Transactions on Wireless Communications. This is an extended version of the previous arXiv paper arXiv:2310.13160

arXiv:2311.12273 [pdf, other]

How AI-driven Digital Twins Can Empower Mobile Networks

Authors: Tong Li, Fenyu Jiang, Qiaohong Yu, Wenzhen Huang, Tao Jiang, Depeng Jin

Abstract: The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which serves as validation for the optimizer's decision outcomes, is used explicitly to train artificial intelligence (AI) empowered optimizers iteratively. In practice, we develop a network digital twin prototype system leveraging data-driven technology to accurately model the behaviors of mobile network elements (e.g., mobile users and base stations), wireless environments, and network performance. An AI-powered network optimizer has been developed based on the deployed MNDT prototype system for providing reliable and optimized network configurations. The results of the experiments demonstrate that the proposed MNDT infrastructure can provide practical network optimization solutions while adapting to the more complex environment.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2310.16765 [pdf, other]

How to Extend 3D GBSM to Integrated Sensing and Communication Channel with Sharing Feature?

Authors: Yameng Liu, Jianhua Zhang, Yuxiang Zhang, Huiwen Gong, Tao Jiang, Guangyi Liu

Abstract: Integrated Sensing and Communication (ISAC) is a promising technology in 6G systems. The existing 3D Geometry-Based Stochastic Model (GBSM), as standardized for 5G systems, addresses solely communication channels and lacks consideration of the integration with sensing channel. Therefore, this letter extends 3D GBSM to support ISAC research, with a particular focus on capturing the sharing feature of both channels, including shared scatterers, clusters, paths, and similar propagation parameters, which have been experimentally verified in the literature. The proposed approach can be summarized as follows: Firstly, an ISAC channel model is proposed, where shared and non-shared components are superimposed for both communication and sensing. Secondly, sensing channel is characterized as a cascade of TX-target, radar cross section, and target-RX, with the introduction of a novel parameter S for shared target extraction. Finally, an ISAC channel implementation framework is proposed, allowing flexible configuration of sharing feature and the joint generation of communication and sensing channels. The proposed ISAC channel model can be compatible with the 3GPP standards and offers promising support for ISAC technology evaluation.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.13160 [pdf, other]

Active Sensing for Localization with Reconfigurable Intelligent Surface

Authors: Zhongze Zhang, Tao Jiang, Wei Yu

Abstract: This paper addresses an uplink localization problem in which the base station (BS) aims to locate a remote user with the aid of a reconfigurable intelligent surface (RIS). This paper proposes a strategy in which the user transmits pilots over multiple time frames, and the BS adaptively adjusts the RIS reflection coefficients based on the observations already received so far in order to produce an accurate estimate of the user location at the end. This is a challenging active sensing problem for which finding an optimal solution involves a search through a complicated functional space whose dimension increases with the number of measurements. In this paper, we show that the long short-term memory (LSTM) network can be used to exploit the latent temporal correlation between measurements to automatically construct scalable information vectors (called hidden state) based on the measurements. Subsequently, the state vector can be mapped to the RIS configuration for the next time frame in a codebook-free fashion via a deep neural network (DNN). After all the measurements have been received, a final DNN can be used to map the LSTM cell state to the estimated user equipment (UE) position. Numerical results show that the proposed active RIS design results in lower localization error as compared to existing active and nonactive methods. The proposed solution produces interpretable results and is generalizable to early stopping in the sequence of sensing stages.

Submitted 19 October, 2023; originally announced October 2023.

Comments: Accepted in IEEE International Conference on Communications (ICC) 2023

arXiv:2310.11044 [pdf, ps, other]

A Tutorial on Near-Field XL-MIMO Communications Towards 6G

Authors: Haiquan Lu, Yong Zeng, Changsheng You, Yu Han, Jiayi Zhang, Zhe Wang, Zhenjun Dong, Shi Jin, Cheng-Xiang Wang, Tao Jiang, Xiaohu You, Rui Zhang

Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is a promising technology for the sixth-generation (6G) mobile communication networks. By significantly boosting the antenna number or size to at least an order of magnitude beyond current massive MIMO systems, XL-MIMO is expected to unprecedentedly enhance the spectral efficiency and spatial resolution for wireless communication. The evolution from massive MIMO to XL-MIMO is not simply an increase in the array size, but faces new design challenges, in terms of near-field channel modelling, performance analysis, channel estimation, and practical implementation. In this article, we give a comprehensive tutorial overview on near-field XL-MIMO communications, aiming to provide useful guidance for tackling the above challenges. First, the basic near-field modelling for XL-MIMO is established, by considering the new characteristics of non-uniform spherical wave (NUSW) and spatial non-stationarity. Next, based on the near-field modelling, the performance analysis of XL-MIMO is presented, including the near-field signal-to-noise ratio (SNR) scaling laws, beam focusing pattern, achievable rate, and degrees-of-freedom (DoF). Furthermore, various XL-MIMO design issues such as near-field beam codebook, beam training, channel estimation, and delay alignment modulation (DAM) transmission are elaborated. Finally, we point out promising directions to inspire future research on near-field XL-MIMO communications.

Submitted 3 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 42 pages

arXiv:2309.11977 [pdf, other]

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng

Abstract: Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation parameters. By quantizing speech waveform into discrete acoustic tokens and modeling these tokens with the language model, recent language model-based TTS models show zero-shot speaker adaptation capabilities with only a 3-second acoustic prompt of an unseen speaker. However, they are limited by the length of the acoustic prompt, which makes it difficult to clone personal speaking style. In this paper, we propose a novel zero-shot TTS model with the multi-scale acoustic prompts based on a neural codec language model VALL-E. A speaker-aware text encoder is proposed to learn the personal speaking style at the phoneme-level from the style prompt consisting of multiple sentences. Following that, a VALL-E based acoustic decoder is utilized to model the timbre from the timbre prompt at the frame-level and generate speech. The experimental results show that our proposed method outperforms baselines in terms of naturalness and speaker similarity, and can achieve better performance by scaling out to a longer style prompt.

Submitted 9 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Accepted by ICASSP 2024

arXiv:2308.13575 [pdf]

FrFT based estimation of linear and nonlinear impairments using Vision Transformer

Authors: Ting Jiang, Zheng Gao, Yizhao Chen, Zihe Hu, Ming Tang

Abstract: To comprehensively assess optical fiber communication system conditions, it is essential to implement joint estimation of the following four critical impairments: nonlinear signal-to-noise ratio (SNRNL), optical signal-to-noise ratio (OSNR), chromatic dispersion (CD) and differential group delay (DGD). However, current studies only achieve identifying a limited number of impairments within a narrow range, due to limitations in network capabilities and lack of unified representation of impairments. To address these challenges, we adopt time-frequency signal processing based on fractional Fourier transform (FrFT) to achieve the unified representation of impairments, while employing a Transformer-based neural network (NN) to break through network performance limitations. To verify the effectiveness of the proposed estimation method, a numerical simulation is carried out on a 5-channel polarization-division-multiplexed quadrature phase shift keying (PDM-QPSK) long haul optical transmission system with a symbol rate of 50 GBaud per channel. The mean absolute error (MAE) for SNRNL, OSNR, CD, and DGD estimation is 0.091 dB, 0.058 dB, 117 ps/nm, and 0.38 ps, and the monitoring window ranges from 0~20 dB, 10~30 dB, 0~51000 ps/nm, and 0~100 ps, respectively. Our proposed method achieves accurate estimation of linear and nonlinear impairments over a broad range, representing a significant advancement in the field of optical performance monitoring (OPM).

Submitted 25 August, 2023; originally announced August 2023.

Comments: 15 pages, 10 figures

arXiv:2307.04455 [pdf, other]

SAM-IQA: Can Segment Anything Boost Image Quality Assessment?

Authors: Xinpeng Li, Ting Jiang, Haoqiang Fan, Shuaicheng Liu

Abstract: Image Quality Assessment (IQA) is a challenging task that requires training on massive datasets to achieve accurate predictions. However, due to the lack of IQA data, deep learning-based IQA methods typically rely on pre-trained networks trained on massive datasets as feature extractors to enhance their generalization ability, such as the ResNet network trained on ImageNet. In this paper, we utilize the encoder of Segment Anything, a recently proposed segmentation model trained on a massive dataset, for high-level semantic feature extraction. Most IQA methods are limited to extracting spatial-domain features, while frequency-domain features have been shown to better represent noise and blur. Therefore, we leverage both spatial-domain and frequency-domain features by applying Fourier and standard convolutions on the extracted features, respectively. Extensive experiments are conducted to demonstrate the effectiveness of all the proposed components, and results show that our approach outperforms the state-of-the-art (SOTA) in four representative datasets, both qualitatively and quantitatively. Our experiments confirm the powerful feature extraction capabilities of Segment Anything and highlight the value of combining spatial-domain and frequency-domain features in IQA tasks. Code: https://github.com/Hedlen/SAM-IQA

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2306.08337 [pdf, other]

Carbon emissions and sustainability of launching 5G mobile networks in China

Authors: Tong Li, Li Yu, Yibo Ma, Tong Duan, Wenzhen Huang, Yan Zhou, Depeng Jin, Yong Li, Tao Jiang

Abstract: Since 2021, China has deployed more than 2.1 million 5G base stations to increase the network capacity and provide ubiquitous digital connectivity for mobile terminals. However, the launch of 5G networks also exacerbates the misalignment between cellular traffic and energy consumption, which reduces carbon efficiency - the amount of network traffic that can be delivered for each unit of carbon emission. In this study, we develop a large-scale data-driven framework to estimate the carbon emissions induced by mobile networks. We show that the decline in carbon efficiency leads to a carbon efficiency trap, estimated to cause additional carbon emissions of 23.82 ± 1.07 megatons in China. To mitigate the misalignment and improve energy efficiency, we propose DeepEnergy, an energy-saving method leveraging collaborative deep reinforcement learning and graph neural networks. DeepEnergy models complex collaboration among cells, making it possible to effectively coordinate the working state of tens of thousands of cells, which could help over 71% of Chinese provinces avoid carbon efficiency traps. In addition, applying DeepEnergy is estimated to reduce 20.90 ± 0.98 megatons of carbon emissions at the national level in 2023. We further assess the effects of adopting renewable energy and discover that the mobile network could accomplish more than 50% of its net-zero goal by integrating DeepEnergy and solar energy systems.
Our study provides insight into carbon emission mitigation in 5G network infrastructure launching in China and around the world, paving the way towards achieving sustainable development goals and future net-zero mobile networks.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2305.14022 [pdf, other]

Realistic Noise Synthesis with Diffusion Models

Authors: Qi Wu, Mingyan Han, Ting Jiang, Haoqiang Fan, Bing Zeng, Shuaicheng Liu

Abstract: Deep image denoising models often rely on large amounts of training data for high-quality performance. However, it is challenging to obtain a sufficient amount of data under real-world scenarios for supervised training. As such, synthesizing realistic noise becomes an important solution. However, existing techniques have limitations in modeling complex noise distributions, resulting in residual noise and edge artifacts in denoising methods relying on synthetic data. To overcome these challenges, we propose a novel method that synthesizes realistic noise using diffusion models, namely Realistic Noise Synthesize Diffusor (RNSD). In particular, the proposed time-aware controlling module can simulate various environmental conditions under given camera settings. RNSD can incorporate guided multiscale content, such that more realistic noise with spatial correlations can be generated at multiple frequencies. In addition, we construct an inversion mechanism to predict the unknown camera setting, which enables the extension of RNSD to datasets without setting information. Extensive experiments demonstrate that our RNSD method significantly outperforms the existing methods not only in the synthesized noise under multiple realism metrics, but also in single image denoising performance.

Submitted 3 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.07130 [pdf, other]

Active Sensing for Two-Sided Beam Alignment and Reflection Design Using Ping-Pong Pilots

Authors: Tao Jiang, Foad Sohrabi, Wei Yu

Abstract: Beam alignment is an important task for millimeter-wave (mmWave) communication, because constructing aligned narrow beams both at the transmitter (Tx) and the receiver (Rx) is crucial in terms of compensating the significant path loss in very high-frequency bands. However, beam alignment is also a highly nontrivial task because large antenna arrays typically have a limited number of radio-frequency chains, allowing only low-dimensional measurements of the high-dimensional channel. This paper considers a two-sided beam alignment problem based on an alternating ping-pong pilot scheme between Tx and Rx over multiple rounds without explicit feedback. We propose a deep active sensing framework in which two long short-term memory (LSTM) based neural networks are employed to learn the adaptive sensing strategies (i.e., measurement vectors) and to produce the final aligned beamformers at both sides. In the proposed ping-pong protocol, the Tx and the Rx alternately send pilots so that both sides can leverage local observations to sequentially design their respective sensing and data transmission beamformers. The proposed strategy can be extended to scenarios with a reconfigurable intelligent surface (RIS) for designing, in addition, the reflection coefficients at the RIS for both sensing and communications. Numerical experiments demonstrate significant and interpretable performance improvement. The proposed strategy works well even for the challenging multipath channel environments.

Submitted 11 May, 2023; originally announced May 2023.

Comments: This paper is accepted in IEEE Journal on Selected Areas in Information Theory

arXiv:2305.05899 [pdf, other]

Mobile Image Restoration via Prior Quantization

Authors: Shiqi Chen, Jinwen Zhou, Menghao Li, Yueting Chen, Tingting Jiang

Abstract: In digital images, the performance of optical aberration is a multivariate degradation, where the spectral of the scene, the lens imperfections, and the field of view together contribute to the results. Besides eliminating it at the hardware level, the post-processing system, which utilizes various prior information, is significant for correction. However, due to the content differences among priors, the pipeline that aligns these factors shows limited efficiency and unoptimized restoration. Here, we propose a prior quantization model to correct the optical aberrations in image processing systems. To integrate these messages, we encode various priors into a latent space and quantify them by the learnable codebooks. After quantization, the prior codes are fused with the image restoration branch to realize targeted optical aberration correction. Comprehensive experiments demonstrate the flexibility of the proposed method and validate its potential to accomplish targeted restoration for a specific camera. Furthermore, our model promises to analyze the correlation between the various priors and the optical aberration of devices, which is helpful for joint soft-hardware design.

Submitted 10 May, 2023; originally announced May 2023.

Comments: Submitted to Elsevier PRL. 5 pages, 5 figures

arXiv:2304.07018 [pdf, other]

DIPNet: Efficiency Distillation and Iterative Pruning for Image Super-Resolution

Authors: Lei Yu, Xinpeng Li, Youwei Li, Ting Jiang, Qi Wu, Haoqiang Fan, Shuaicheng Liu

Abstract: Efficient deep learning-based approaches have achieved remarkable performance in single image super-resolution. However, recent studies on efficient super-resolution have mainly focused on reducing the number of parameters and floating-point operations through various network designs. Although these methods can decrease the number of parameters and floating-point operations, they may not necessarily reduce actual running time. To address this issue, we propose a novel multi-stage lightweight network boosting method, which can enable lightweight networks to achieve outstanding performance. Specifically, we leverage enhanced high-resolution output as additional supervision to improve the learning ability of lightweight student networks. Upon convergence of the student network, we further simplify our network structure to a more lightweight level using reparameterization techniques and iterative network pruning. Meanwhile, we adopt an effective lightweight network training strategy that combines multi-anchor distillation and progressive learning, enabling the lightweight network to achieve outstanding performance. Ultimately, our proposed method achieves the fastest inference time among all participants in the NTIRE 2023 efficient super-resolution challenge while maintaining competitive super-resolution performance. Additionally, extensive experiments are conducted to demonstrate the effectiveness of the proposed components.
The results show that our approach achieves comparable performance on the representative dataset DIV2K, both qualitatively and quantitatively, with faster inference and fewer network parameters.

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2303.17959 [pdf, other]

Diffusion Action Segmentation

Authors: Daochang Liu, Qiyue Li, AnhDung Dinh, Tingting Jiang, Mubarak Shah, Chang Xu

Abstract: Temporal action segmentation is crucial for understanding long-form videos. Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models. We propose a novel framework via denoising diffusion models, which nonetheless shares the same inherent spirit of such iterative refinement. In this framework, action predictions are iteratively generated from random noise with input video features as conditions. To enhance the modeling of three striking characteristics of human actions, including the position prior, the boundary ambiguity, and the relational dependency, we devise a unified masking strategy for the conditioning inputs in our framework. Extensive experiments on three benchmark datasets, i.e., GTEA, 50Salads, and Breakfast, are performed and the proposed method achieves superior or comparable results to state-of-the-art methods, showing the effectiveness of a generative approach for action segmentation.

Submitted 11 August, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

Comments: ICCV 2023

arXiv:2211.07201 [pdf, other]

Towards A Unified Conformer Structure: from ASR to ASV Task

Authors: Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong

Abstract: Transformer has achieved extraordinary performance in Natural Language Processing and Computer Vision tasks thanks to its powerful self-attention mechanism, and its variant Conformer has become a state-of-the-art architecture in the field of Automatic Speech Recognition (ASR). However, the mainstream architecture for Automatic Speaker Verification (ASV) is the convolutional neural network, and there is still much room for research on Conformer-based ASV. In this paper, firstly, we modify the Conformer architecture from ASR to ASV with very minor changes. The Length-Scaled Attention (LSA) method and Sharpness-Aware Minimization (SAM) are adopted to improve model generalization. Experiments conducted on VoxCeleb and CN-Celeb show that our Conformer-based ASV achieves competitive performance compared with the popular ECAPA-TDNN. Secondly, inspired by the transfer learning strategy, the ASV Conformer is naturally initialized from the pretrained ASR model. Via parameter transferring, the self-attention mechanism can better focus on the relationship between sequence features, bringing about 11% relative improvement in EER on the test sets of VoxCeleb and CN-Celeb, which reveals the potential of Conformer to unify the ASV and ASR tasks. Finally, we provide a runtime in ASV-Subtools to evaluate its inference speed in production scenarios. Our code is released at https://github.com/Snowdar/asv-subtools/tree/master/doc/papers/conformer.md.

Submitted 15 January, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2210.02596 [pdf, other]

Role of Deep Learning in Wireless Communications

Authors: Wei Yu, Foad Sohrabi, Tao Jiang

Abstract: Traditional communication system design has always been based on the paradigm of first establishing a mathematical model of the communication channel, then designing and optimizing the system according to the model. The advent of modern machine learning techniques, specifically deep neural networks, has opened up opportunities for data-driven system design and optimization. This article draws examples from the optimization of reconfigurable intelligent surface, distributed channel estimation and feedback for multiuser beamforming, and active sensing for millimeter wave (mmWave) initial alignment to illustrate that a data-driven design that bypasses explicit channel modelling can often discover excellent solutions to communication system design and optimization problems that are otherwise computationally difficult to solve. We show that by performing an end-to-end training of a deep neural network using a large number of channel samples, a machine learning based approach can potentially provide significant system-level improvements as compared to the traditional model-based approach for solving optimization problems. The key to the successful applications of machine learning techniques is in choosing the appropriate neural network architecture to match the underlying problem structure.

Submitted 5 October, 2022; originally announced October 2022.

Comments: 13 pages, 12 figures, To appear in IEEE BITS the Information Theory Magazine

arXiv:2205.12633 [pdf, other]

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

Authors: Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, et al. (68 additional authors not shown)

Abstract: This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e. solutions cannot exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e. solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: Fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).

Submitted 25 May, 2022; originally announced May 2022.

Comments: CVPR Workshops 2022. 15 pages, 21 figures, 2 tables

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022

arXiv:2205.06396 [pdf, other]

doi 10.1109/JSTSP.2022.3178213

Learning Based User Scheduling in Reconfigurable Intelligent Surface Assisted Multiuser Downlink

Authors: Zhongze Zhang, Tao Jiang, Wei Yu

Abstract: Reconfigurable intelligent surface (RIS) is capable of intelligently manipulating the phases of the incident electromagnetic wave to improve the wireless propagation environment between the base-station (BS) and the users. This paper addresses the joint user scheduling, RIS configuration, and BS beamforming problem in an RIS-assisted downlink network with limited pilot overhead. We show that graph neural networks (GNN) with permutation invariant and equivariant properties can be used to appropriately schedule users and to design RIS configurations to achieve high overall throughput while accounting for fairness among the users. As compared to the conventional methodology of first estimating the channels then optimizing the user schedule, RIS configuration and the beamformers, this paper shows that an optimized user schedule can be obtained directly from a very short set of pilots using a GNN, then the RIS configuration can be optimized using a second GNN, and finally the BS beamformers can be designed based on the overall effective channel. Numerical results show that the proposed approach can utilize the received pilots more efficiently than the conventional channel estimation based approach, and can generalize to systems with an arbitrary number of users.

Submitted 12 May, 2022; originally announced May 2022.

Comments: Accepted in IEEE Journal of Selected Topics in Signal Processing

arXiv:2202.09020 [pdf, other]

A Comprehensive Survey with Quantitative Comparison of Image Analysis Methods for Microorganism Biovolume Measurements

Authors: Jiawei Zhang, Chen Li, Md Mamunur Rahaman, Yudong Yao, Pingli Ma, Jinghua Zhang, Xin Zhao, Tao Jiang, Marcin Grzegorzek

Abstract: With the acceleration of urbanization and living standards, microorganisms play increasingly important roles in industrial production, bio-technique, and food safety testing. Microorganism biovolume measurements are one of the essential parts of microbial analysis. However, traditional manual measurement methods are time-consuming and make it challenging to measure the characteristics precisely. With the development of digital image processing techniques, the characteristics of the microbial population can be detected and quantified. The changing trend can be adjusted in time, providing a basis for improvement. The applications of the microorganism biovolume measurement method have developed since the 1980s. More than 62 articles are reviewed in this study, and the articles are grouped by digital image segmentation methods with periods. This study has high research significance and application value, and can help microbial researchers gain a comprehensive understanding of microorganism biovolume measurements using digital image analysis methods and potential applications.

Submitted 2 May, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

arXiv:2202.07820 [pdf, other]

A Survey of Semen Quality Evaluation in Microscopic Videos Using Computer Assisted Sperm Analysis

Authors: Wenwei Zhao, Pingli Ma, Chen Li, Xiaoning Bu, Shuojia Zou, Tao Jiang, Marcin Grzegorzek

Abstract: The Computer Assisted Sperm Analysis (CASA) plays a crucial role in male reproductive health diagnosis and infertility treatment. With the development of the computer industry in recent years, a great number of accurate algorithms have been proposed. With the assistance of those novel algorithms, it is possible for CASA to achieve a faster and higher quality result. Since image processing is the technical basis of CASA, including pre-processing, feature extraction, target detection and tracking, these methods are important technical steps in dealing with CASA. The various works related to Computer Assisted Sperm Analysis methods in the last 30 years (since 1988) are comprehensively introduced and analysed in this survey. To facilitate understanding, the methods involved are analysed in the sequence of general steps in sperm analysis. In other words, the methods related to sperm detection (localization) are first analysed, and then the methods of sperm tracking are analysed. Besides this, we analyse and prospect the present situation and future of CASA. According to our work, the feasibility of applying the methods mentioned in this review to sperm microscopic video is explained. Moreover, existing challenges of object detection and tracking in microscope video have the potential to be solved inspired by this survey.

Submitted 17 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

arXiv:2202.06948 [pdf]

doi 10.3389/fncom.2023.1232925

Towards Best Practice of Interpreting Deep Learning Models for EEG-based Brain Computer Interfaces

Authors: Jian Cui, Liqiang Yuan, Zhaoxiang Wang, Ruilin Li, Tianzi Jiang

Abstract: As deep learning has achieved state-of-the-art performance for many tasks of EEG-based BCI, many efforts have been made in recent years trying to understand what has been learned by the models. This is commonly done by generating a heatmap indicating to which extent each pixel of the input contributes to the final classification for a trained model. Despite the wide use, it is not yet understood to which extent the obtained interpretation results can be trusted and how accurately they can reflect the model decisions. In order to fill this research gap, we conduct a study to evaluate different deep interpretation techniques quantitatively on EEG datasets. The results reveal the importance of selecting a proper interpretation technique as the initial step. In addition, we also find that the quality of the interpretation results is inconsistent for individual samples even when a method with an overall good performance is used. Many factors, including model structure and dataset types, could potentially affect the quality of the interpretation results. Based on the observations, we propose a set of procedures that allow the interpretation results to be presented in an understandable and trusted way. We illustrate the usefulness of our method for EEG-based BCI with instances selected from different scenarios.

Submitted 17 April, 2023; v1 submitted 12 February, 2022; originally announced February 2022.

arXiv:2202.06465 [pdf, other]

A State-of-the-art Survey of U-Net in Microscopic Image Analysis: from Simple Usage to Structure Mortification

Authors: Jian Wu, Wanli Liu, Chen Li, Tao Jiang, Islam Mohammad Shariful, Hongzan Sun, Xiaoqi Li, Xintong Li, Xinyu Huang, Marcin Grzegorzek

Abstract: Image analysis technology is used to solve the inadvertences of artificial traditional methods in disease, wastewater treatment, environmental change monitoring analysis and convolutional neural networks (CNN) play an important role in microscopic image analysis. An important step in detection, tracking, monitoring, feature extraction, modeling and analysis is image segmentation, in which U-Net has been increasingly applied in microscopic image segmentation. This paper comprehensively reviews the development history of U-Net, and analyzes various research results of various segmentation methods since the emergence of U-Net and conducts a comprehensive review of related papers. First, this paper has summarized the improved methods of U-Net and then listed the existing significance of image segmentation techniques and their improvements that have been introduced over the years. Finally, focusing on the different improvement strategies of U-Net in different papers, the related work of each application target is reviewed according to detailed technical categories to facilitate future research. Researchers can clearly see the dynamics of transmission of technological development and keep up with future trends in this interdisciplinary field.

Submitted 23 April, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

arXiv:2112.13261 [pdf, other]

Interference Nulling Using Reconfigurable Intelligent Surface

Authors: Tao Jiang, Wei Yu

Abstract: This paper investigates the interference nulling capability of reconfigurable intelligent surface (RIS) in a multiuser environment where multiple single-antenna transceivers communicate simultaneously in a shared spectrum. From a theoretical perspective, we show that when the channels between the RIS and the transceivers have line-of-sight and the direct paths are blocked, it is possible to adjust the phases of the RIS elements to null out all the interference completely and to achieve the maximum $K$ degrees-of-freedom (DoF) in the overall $K$-user interference channel, provided that the number of RIS elements exceeds some finite value that depends on $K$. Algorithmically, for any fixed channel realization we formulate the interference nulling problem as a feasibility problem, and propose an alternating projection algorithm to efficiently solve the resulting nonconvex problem with local convergence guarantee. Numerical results show that the proposed alternating projection algorithm can null all the interference if the number of RIS elements is only slightly larger than a threshold of $2K(K-1)$. For the practical sum-rate maximization objective, this paper proposes to use the zero-forcing solution obtained from alternating projection as an initial point for subsequent Riemannian conjugate gradient optimization and shows that it has a significant performance advantage over random initializations. For the objective of maximizing the minimum rate, this paper proposes a subgradient projection method which is capable of achieving excellent performance at low complexity.

Submitted 27 January, 2022; v1 submitted 25 December, 2021; originally announced December 2021.

Comments: This paper is accepted in IEEE Journal on Selected Areas in Communications

arXiv:2110.09121 [pdf, ps, other]

KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

Authors: Xiaobin Zhuang, Huiran Yu, Weifeng Zhao, Tao Jiang, Peng Hu

Abstract: An automatic pitch correction system typically includes several stages, such as pitch extraction, deviation estimation, pitch shift processing, and cross-fade smoothing. However, designing these components with strategies often requires domain expertise and they are likely to fail on corner cases. In this paper, we present KaraTuner, an end-to-end neural architecture that predicts pitch curve and resynthesizes the singing voice directly from the tuned pitch and vocal spectrum extracted from the original recordings. Several vital technical points have been introduced in KaraTuner to ensure pitch accuracy, pitch naturalness, timbre consistency, and sound quality. A feed-forward Transformer is employed in the pitch predictor to capture long-term dependencies in the vocal spectrum and musical note. We also develop a pitch-controllable vocoder based on a novel source-filter block and the Fre-GAN architecture. KaraTuner obtains a higher preference than the rule-based pitch correction approach through A/B tests, and perceptual experiments show that the proposed vocoder achieves significant advantages in timbre consistency and sound quality compared with the parametric WORLD vocoder, phase vocoder and CLPC vocoder.

Submitted 26 June, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea

arXiv:2107.11617 [pdf, other]

LAConv: Local Adaptive Convolution for Image Fusion

Authors: Zi-Rong Jin, Liang-Jian Deng, Tai-Xiang Jiang, Tian-Jing Zhang

Abstract: The convolution operation is a powerful tool for feature extraction and plays a prominent role in the field of computer vision. However, when targeting the pixel-wise tasks like image fusion, it would not fully perceive the particularity of each pixel in the image if the uniform convolution kernel is used on different patches. In this paper, we propose a local adaptive convolution (LAConv), which is dynamically adjusted to different spatial locations. LAConv enables the network to pay attention to every specific local area in the learning process. Besides, the dynamic bias (DYB) is introduced to provide more possibilities for the depiction of features and make the network more flexible. We further design a residual structure network equipped with the proposed LAConv and DYB modules, and apply it to two image fusion tasks. Experiments for pansharpening and hyperspectral image super-resolution (HISR) demonstrate the superiority of our method over other state-of-the-art methods. It is worth mentioning that LAConv can also be competent for other super-resolution tasks with less computation effort.

Submitted 24 July, 2021; originally announced July 2021.

arXiv:2106.12470 [pdf, ps, other]

Bilateral Control of Teleoperators with Closed Architecture and Time-Varying Delay

Authors: Hanlei Wang, Yipeng Li, Tiantian Jiang

Abstract: This paper investigates bilateral control of teleoperators with closed architecture and subjected to arbitrary bounded time-varying delay. A prominent challenge for bilateral control of such teleoperators lies in the closed architecture, especially in the context not involving interaction force/torque measurement. This yields the long-standing situation that most bilateral control rigorously developed in the literature is hard to be justified as applied to teleoperators with closed architecture. With a new class of dynamic feedback, we propose kinematic and adaptive dynamic controllers for teleoperators with closed architecture, and we show that the proposed kinematic and dynamic controllers are robust with respect to arbitrary bounded time-varying delay. In addition, by exploiting the input-output properties of an inverted form of the dynamics of robot manipulators with closed architecture, we remove the assumption of uniform exponential stability of a linear time-varying system due to the adaptation to the gains of the inner controller in demonstrating stability of the presented adaptive dynamic control. The application of the proposed approach is illustrated by the experimental results using a Phantom Omni and a UR10 robot.

Submitted 23 June, 2021; originally announced June 2021.

Comments: This version is prepared with the consideration of the reviewers' and AE's comments

arXiv:2105.14320 [pdf, other]

doi 10.1109/TIP.2022.3176220

Self-Supervised Nonlinear Transform-Based Tensor Nuclear Norm for Multi-Dimensional Image Recovery

Authors: Yi-Si Luo, Xi-Le Zhao, Tai-Xiang Jiang, Yi Chang, Michael K. Ng, Chao Li

Abstract: In this paper, we study multi-dimensional image recovery. Recently, transform-based tensor nuclear norm minimization methods are considered to capture low-rank tensor structures to recover third-order tensors in multi-dimensional image processing applications. The main characteristic of such methods is to perform the linear transform along the third mode of third-order tensors, and then compute tensor nuclear norm minimization on the transformed tensor so that the underlying low-rank tensors can be recovered. The main aim of this paper is to propose a nonlinear multilayer neural network to learn a nonlinear transform via the observed tensor data under self-supervision. The proposed network makes use of low-rank representation of transformed tensors and data-fitting between the observed tensor and the reconstructed tensor to construct the nonlinear transformation. Extensive experimental results on tensor completion, background subtraction, robust tensor completion, and snapshot compressive imaging are presented to demonstrate that the performance of the proposed method is better than that of state-of-the-art methods.

Submitted 29 May, 2021; originally announced May 2021.

arXiv:2104.06243 [pdf, other]

A State-of-the-art Survey of Artificial Neural Networks for Whole-slide Image Analysis: from Popular Convolutional Neural Networks to Potential Visual Transformers

Authors: Xintong Li, Weiming Hu, Chen Li, Tao Jiang, Hongzan Sun, Xiaoyan Li, Xinyu Huang, Marcin Grzegorzek

Abstract: To increase the objectivity and accuracy of pathologists' work, artificial neural network (ANN) methods have been generally needed in the segmentation, classification, and detection of histopathological WSI. In this paper, WSI analysis methods based on ANN are reviewed. Firstly, the development status of WSI and ANN methods is introduced. Secondly, we summarize the common ANN methods. Next, we discuss publicly available WSI datasets and evaluation metrics. These ANN architectures for WSI processing are divided into classical neural networks and deep neural networks (DNNs) and then analyzed. Finally, the application prospect of the analytical method in this field is discussed. The important potential method is Visual Transformers.

Submitted 26 February, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: 22 pages, 38 figures. arXiv admin note: substantial text overlap with arXiv:2102.10553

arXiv:2103.13625 [pdf, other]

A Comprehensive Review of Image Analysis Methods for Microorganism Counting: From Classical Image Processing to Deep Learning Approaches

Authors: Jiawei Zhang, Chen Li, Md Mamunur Rahaman, Yudong Yao, Pingli Ma, Jinghua Zhang, Xin Zhao, Tao Jiang, Marcin Grzegorzek

Abstract: Microorganisms such as bacteria and fungi play essential roles in many application fields, like biotechnique, medical technique and industrial domain. Microorganism counting techniques are crucial in microorganism analysis, helping biologists and related researchers quantitatively analyze the microorganisms and calculate their characteristics, such as biomass concentration and biological activity. However, traditional microorganism manual counting methods, such as plate counting method, hemocytometry and turbidimetry, are time-consuming, subjective and need complex operations, which are difficult to be applied in large-scale applications. In order to improve this situation, image analysis has been applied for microorganism counting since the 1980s, which consists of digital image processing, image segmentation, image classification and suchlike. Image analysis-based microorganism counting methods are efficient compared with traditional plate counting methods. In this article, we have studied the development of microorganism counting methods using digital image analysis. Firstly, the microorganisms are grouped as bacteria and other microorganisms. Then, the related articles are summarized based on image segmentation methods. Each part of the article is reviewed by methodologies. Moreover, commonly used image processing methods for microorganism counting are summarized and analyzed to find common technological points. More than 144 papers are outlined in this article. In conclusion, this paper provides new ideas for the future development trend of microorganism counting, and provides systematic suggestions for implementing integrated microorganism counting systems in the future. Researchers in other fields can refer to the techniques analyzed in this paper.

Submitted 29 September, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2012.05720 [pdf, other]

Peer-to-Peer Localization for Single-Antenna Devices

Authors: Xianan Zhang, Wei Wang, Xuedou Xiao, Hang Yang, Xinyu Zhang, Tao Jiang

Abstract: Some important indoor localization applications, such as localizing a lost kid in a shopping mall, call for a new peer-to-peer localization technique that can localize an individual's smartphone or wearables by directly using another's on-body devices in unknown indoor environments. However, current localization solutions either require pre-deployed infrastructures or multiple antennas in both transceivers, impeding their wide-scale application. In this paper, we present P2PLocate, a peer-to-peer localization system that enables a single-antenna device co-located with a batteryless backscatter tag to localize another single-antenna device with decimeter-level accuracy. P2PLocate leverages the multipath variations intentionally created by an on-body backscatter tag, coupled with spatial information offered by user movements, to accomplish this objective without relying on any pre-deployed infrastructures or pre-training. P2PLocate incorporates novel algorithms to address two major challenges: (i) interference with strong direct-path signal while extracting multipath variations, and (ii) lack of direction information while using single-antenna transceivers. We implement P2PLocate on commercial off-the-shelf Google Nexus 6p, Intel 5300 WiFi card, and Raspberry Pi B4. Real-world experiments reveal that P2PLocate can localize both static and mobile targets with a median accuracy of 0.88 m.

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2011.04263 [pdf, other]

doi 10.1007/s11263-020-01408-w

Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training

Authors: Dingquan Li, Tingting Jiang, Ming Jiang

Abstract: Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is a challenging problem due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions among existing datasets are quite different, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically, content dependency and temporal-memory effects of human visual system. To face the cross-dataset evaluation challenge, we explore a mixed datasets training strategy for training a single VQA model with multiple datasets. The proposed unified framework explicitly includes three stages: relative quality assessor, nonlinear mapping, and dataset-specific perceptual scale alignment, to jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed datasets training strategy and prove the superior performance of the unified model in comparison with the state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at https://github.com/lidq92/MDTVSFA.

Submitted 15 November, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: 20 pages, 12 figures, 7 tables, accepted by IJCV. This is the version provided to IJCV office

arXiv:2009.14404 [pdf, other]

Learning to Reflect and to Beamform for Intelligent Reflecting Surface with Implicit Channel Estimation

Authors: Tao Jiang, Hei Victor Cheng, Wei Yu

Abstract: Intelligent reflecting surface (IRS), which consists of a large number of tunable reflective elements, is capable of enhancing the wireless propagation environment in a cellular network by intelligently reflecting the electromagnetic waves from the base-station (BS) toward the users. The optimal tuning of the phase shifters at the IRS is, however, a challenging problem, because, due to the passive nature of reflective elements, it is difficult to directly measure the channels between the IRS, the BS, and the users. Instead of following the traditional paradigm of first estimating the channels then optimizing the system parameters, this paper advocates a machine learning approach capable of directly optimizing both the beamformers at the BS and the reflective coefficients at the IRS based on a system objective. This is achieved by using a deep neural network to parameterize the mapping from the received pilots (plus any additional information, such as the user locations) to an optimized system configuration, and by adopting a permutation invariant/equivariant graph neural network (GNN) architecture to capture the interactions among the different users in the cellular network. Simulation results show that the proposed implicit channel estimation based approach is generalizable, can be interpreted, and can efficiently learn to maximize a sum-rate or minimum-rate objective from a much fewer number of pilots than the traditional explicit channel estimation based approaches.

Submitted 8 June, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

Comments: To appear in IEEE Journal of Selected Areas in Communications
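
Note on the abstract above: the permutation invariant/equivariant property it mentions can be illustrated with a toy layer. The sketch below is not the authors' GNN; it is a minimal example, assuming one scalar feature per user and hypothetical weights `w_self` and `w_mean`, of a layer whose outputs reorder exactly as the users are reordered (equivariance), followed by a mean readout that ignores user order (invariance).

```python
def equivariant_layer(x, w_self, w_mean, bias=0.0):
    """Toy permutation-equivariant layer (illustrative, not the paper's model).

    Each user's output combines its own feature with the mean over all users:
        y[i] = w_self * x[i] + w_mean * mean(x) + bias
    The same weights are shared by every user, so reordering the inputs
    reorders the outputs in the same way.
    """
    mean = sum(x) / len(x)
    return [w_self * xi + w_mean * mean + bias for xi in x]


def invariant_readout(y):
    """Mean over users: unchanged by any reordering of the users."""
    return sum(y) / len(y)


# Hypothetical per-user features and a reordering of the users.
users = [1.0, 2.0, 4.0]
perm = [2, 0, 1]

out = equivariant_layer(users, w_self=0.5, w_mean=0.25)
out_perm = equivariant_layer([users[i] for i in perm], w_self=0.5, w_mean=0.25)
```

Because the weights do not depend on the number of users, the same layer applies unchanged to inputs of any length, which is one plausible reading of why such architectures can generalize across user counts.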

arXiv:2008.05074 [pdf, ps, other]

doi 10.1109/JIOT.2021.3078462

A Review of Deep Reinforcement Learning for Smart Building Energy Management

Authors: Liang Yu, Shuqi Qin, Meng Zhang, Chao Shen, Tao Jiang, Xiaohong Guan

Abstract: Global buildings account for about 30% of the total energy consumption and carbon emission, raising severe energy and environmental concerns. Therefore, it is significant and urgent to develop novel smart building energy management (SBEM) technologies for the advance of energy-efficient and green buildings. However, it is a nontrivial task due to the following challenges. Firstly, it is generally difficult to develop an explicit building thermal dynamics model that is both accurate and efficient enough for building control. Secondly, there are many uncertain system parameters (e.g., renewable generation output, outdoor temperature, and the number of occupants). Thirdly, there are many spatially and temporally coupled operational constraints. Fourthly, building energy optimization problems cannot be solved in real-time by traditional methods when they have extremely large solution spaces. Fifthly, traditional building energy management methods have respective applicable premises, which means that they have low versatility when confronted with varying building environments. With the rapid development of Internet of Things technology and computation capability, artificial intelligence technology finds its significant competence in control and optimization. As a general artificial intelligence technology, deep reinforcement learning (DRL) is promising to address the above challenges. Notably, the recent years have seen the surge of DRL for SBEM. However, a systematic overview of different DRL methods for SBEM is lacking. To fill the gap, this paper provides a comprehensive review of DRL for SBEM from the perspective of system scale. In particular, we identify the existing unresolved issues and point out possible future research directions.

Submitted 22 September, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

Comments: 21 pages, 12 figures

Journal ref: IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12046-12063, 2021

arXiv:2008.03889 [pdf, other]

doi 10.1145/3394171.3413804

Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment

Authors: Dingquan Li, Tingting Jiang, Ming Jiang

Abstract: Currently, most image quality assessment (IQA) models are supervised by the MAE or MSE loss with empirically slow convergence. It is well-known that normalization can facilitate fast convergence. Therefore, we explore normalization in the design of loss functions for IQA. Specifically, we first normalize the predicted quality scores and the corresponding subjective quality scores. Then, the loss is defined based on the norm of the differences between these normalized values. The resulting "Norm-in-Norm" loss encourages the IQA model to make linear predictions with respect to subjective quality scores. After training, least squares regression is applied to determine the linear mapping from the predicted quality to the subjective quality. It is shown that the new loss is closely connected with two common IQA performance criteria (PLCC and RMSE). Through theoretical analysis, it is proved that the embedded normalization makes the gradients of the loss function more stable and more predictable, which is conducive to the faster convergence of the IQA model. Furthermore, to experimentally verify the effectiveness of the proposed loss, it is applied to solve a challenging problem: quality assessment of in-the-wild images. Experiments on two relevant datasets (KonIQ-10k and CLIVE) show that, compared to MAE or MSE loss, the new loss enables the IQA model to converge about 10 times faster and the final model achieves better performance. The proposed model also achieves state-of-the-art prediction performance on this challenging problem. For reproducible scientific research, our code is publicly available at https://github.com/lidq92/LinearityIQA.
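The two steps named in this abstract (normalize both score vectors, then take a norm of their difference) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' released implementation: the function names, the mean-centering, the default exponents `p=1` and `q=2`, and the `eps` guard are all choices made for the example. The sketch also includes the closed-form least squares fit that maps predictions to the subjective scale after training.

```python
def normalize(scores, q=2.0, eps=1e-8):
    """Center the scores and divide by the q-norm of the centered vector.

    Assumption: mean-centering and q-norm scaling are illustrative choices.
    """
    mean = sum(scores) / len(scores)
    centered = [s - mean for s in scores]
    norm = sum(abs(c) ** q for c in centered) ** (1.0 / q)
    return [c / (norm + eps) for c in centered]


def norm_in_norm_loss(predicted, subjective, p=1.0, q=2.0):
    """p-norm of the difference between the two normalized score vectors."""
    pred_n = normalize(predicted, q)
    subj_n = normalize(subjective, q)
    return sum(abs(a - b) ** p for a, b in zip(pred_n, subj_n)) ** (1.0 / p)


def fit_linear_map(predicted, subjective):
    """Least squares slope and intercept mapping predictions to subjective scores."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(subjective) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, subjective))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Because both vectors are centered and rescaled, the loss in this sketch does not change when the predictions are shifted or multiplied by a positive constant. Only the linear relationship to the subjective scores is penalized, which is why a separate linear map is fitted afterwards.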

Submitted 10 August, 2020; originally announced August 2020.

Comments: Accepted by ACM MM 2020, + supplemental materials

arXiv:2007.14137 [pdf, other]

Nonnegative Low Rank Tensor Approximation and its Application to Multi-dimensional Images

Authors: Tai-Xiang Jiang, Michael K. Ng, Junjun Pan, Guangjing Song

Abstract: The main aim of this paper is to develop a new algorithm for computing nonnegative low rank tensor approximation for nonnegative tensors that arise in many multi-dimensional imaging applications. Nonnegativity is an important property, as each pixel value refers to nonzero light intensity in image data acquisition. Our approach is different from classical nonnegative tensor factorization (NTF), which requires each factorized matrix and/or tensor to be nonnegative. In this paper, we determine a nonnegative low Tucker rank tensor to approximate a given nonnegative tensor. We propose an alternating projections algorithm for computing such nonnegative low rank tensor approximation, which is referred to as NLRT. The convergence of the proposed manifold projection method is established. Experimental results for synthetic data and multi-dimensional images are presented to demonstrate that NLRT performs better than state-of-the-art NTF methods.
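The alternating projections idea in this abstract can be illustrated on the simpler matrix case. The sketch below is not the NLRT algorithm itself, which works with low Tucker rank tensors. It assumes a plain matrix, uses a truncated SVD as the low-rank projection, and uses entrywise clipping as the nonnegativity projection. The function names and the fixed iteration count are hypothetical choices for the example.

```python
import numpy as np


def project_low_rank(matrix, rank):
    """Nearest matrix of rank at most `rank` in Frobenius norm (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def project_nonnegative(matrix):
    """Nearest entrywise nonnegative matrix: clip negative entries to zero."""
    return np.maximum(matrix, 0.0)


def alternating_projections(target, rank, num_iters=50):
    """Alternate between the low-rank set and the nonnegative orthant.

    The loop ends on the nonnegativity projection, so the result is
    nonnegative but only approximately of rank `rank`.
    """
    estimate = np.asarray(target, dtype=float).copy()
    for _ in range(num_iters):
        estimate = project_low_rank(estimate, rank)
        estimate = project_nonnegative(estimate)
    return estimate
```

Unlike a factorization approach, nothing here forces the SVD factors to be nonnegative. Only the approximating matrix itself is constrained, which mirrors the distinction the abstract draws between NLRT and classical NTF.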

Submitted 26 September, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

arXiv:2006.14156 [pdf, ps, other]

Multi-Agent Deep Reinforcement Learning for HVAC Control in Commercial Buildings

Authors: Liang Yu, Yi Sun, Zhanbo Xu, Chao Shen, Dong Yue, Tao Jiang, Xiaohong Guan

Abstract: In commercial buildings, about 40%-50% of the total electricity consumption is attributed to Heating, Ventilation, and Air Conditioning (HVAC) systems, which places an economic burden on building operators. In this paper, we intend to minimize the energy cost of an HVAC system in a multi-zone commercial building under dynamic pricing with the consideration of random zone occupancy, thermal comfort, and indoor air quality comfort. Due to the existence of unknown thermal dynamics models, parameter uncertainties (e.g., outdoor temperature, electricity price, and number of occupants), spatially and temporally coupled constraints associated with indoor temperature and CO2 concentration, a large discrete solution space, and a non-convex and non-separable objective function, it is very challenging to achieve the above aim. To this end, the above energy cost minimization problem is reformulated as a Markov game. Then, an HVAC control algorithm is proposed to solve the Markov game based on multi-agent deep reinforcement learning with an attention mechanism. The proposed algorithm does not require any prior knowledge of uncertain parameters and can operate without knowing building thermal dynamics models. Simulation results based on real-world traces show the effectiveness, robustness and scalability of the proposed algorithm.

Submitted 22 July, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: 14 pages, 21 figures, accepted by IEEE Transactions on Smart Grid

arXiv:2005.14400 [pdf, other]

Hyperspectral Image Super-resolution via Deep Spatio-spectral Convolutional Neural Networks

Authors: Jin-Fan Hu, Ting-Zhu Huang, Liang-Jian Deng, Tai-Xiang Jiang, Gemine Vivone, Jocelyn Chanussot

Abstract: Hyperspectral images are of crucial importance in order to better understand features of different materials. To reach this goal, they leverage a high number of spectral bands. However, this interesting characteristic is often paid for with a reduced spatial resolution compared with traditional multispectral image systems. In order to alleviate this issue, in this work, we propose a simple and efficient architecture for deep convolutional neural networks to fuse a low-resolution hyperspectral image (LR-HSI) and a high-resolution multispectral image (HR-MSI), yielding a high-resolution hyperspectral image (HR-HSI). The network is designed to preserve both spatial and spectral information thanks to a two-fold architecture: one is to utilize the HR-HSI at a different scale to get an output with satisfactory spectral preservation; the other is to apply concepts of multi-resolution analysis to extract high-frequency information, aiming to output high-quality spatial details. Finally, a plain mean squared error loss function is used to measure the performance during the training. Extensive experiments demonstrate that the proposed network architecture achieves the best performance (both qualitatively and quantitatively) compared with recent state-of-the-art hyperspectral image super-resolution approaches. Moreover, other significant advantages can be pointed out by the use of the proposed approach, such as better network generalization ability, a limited computational burden, and robustness with respect to the number of training samples.

Submitted 29 May, 2020; originally announced May 2020.

arXiv:2005.13534 [pdf, ps, other]

Robot-assisted Backscatter Localization for IoT Applications

Authors: Shengkai Zhang, Wei Wang, Sheyang Tang, Shi Jin, Tao Jiang

Abstract: Recent years have witnessed the rapid proliferation of backscatter technologies that realize ubiquitous and long-term connectivity to empower smart cities and smart homes. Localizing such backscatter tags is crucial for IoT-based smart applications. However, current backscatter localization systems require prior knowledge of the site, either a map or landmarks with known positions, which is laborious for deployment. To enable a universal localization service, this paper presents Rover, an indoor localization system that localizes multiple backscatter tags without any start-up cost using a robot equipped with inertial sensors. Rover runs in a joint optimization framework, fusing measurements from backscattered WiFi signals and inertial sensors to simultaneously estimate the locations of both the robot and the connected tags. Our design addresses practical issues including interference among multiple tags, real-time processing, as well as the data marginalization problem in dealing with degenerated motions. We prototype Rover using off-the-shelf WiFi chips and customized backscatter tags. Our experiments show that Rover achieves localization accuracies of 39.3 cm for the robot and 74.6 cm for the tags.

Submitted 21 May, 2020; originally announced May 2020.

Comments: To appear in IEEE Transactions on Wireless Communications. arXiv admin note: substantial text overlap with arXiv:1908.03297

Showing 1–50 of 75 results for author: Jiang, T