
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

arXiv:2403.09353 [pdf, ps, other]

doi 10.1109/LCOMM.2023.3344599

Intelligent Reflecting Surfaces vs. Full-Duplex Relays: A Comparison in the Air

Authors: Qian Ding, Jie Yang, Yang Luo, Chunbo Luo

Abstract: This letter aims to provide a fundamental analytical comparison for the two major types of relaying methods: intelligent reflecting surfaces and full-duplex relays, particularly focusing on unmanned aerial vehicle communication scenarios. Both amplify-and-forward and decode-and-forward relaying schemes are included in the comparison. In addition, optimal 3D UAV deployment and minimum transmit power under the quality of service constraint are derived. Our numerical results show that IRSs of medium size exhibit comparable performance to AF relays, meanwhile outperforming DF relays under extremely large surface size and high data rates.

Submitted 14 March, 2024; originally announced March 2024.

Journal ref: IEEE Communications Letters, vol. 28, no. 2, pp. 397-401, Feb. 2024

arXiv:2403.08680 [pdf, other]

Towards the THz Networks in the 6G Era

Authors: Qian Ding, Jie Yang, Yang Luo, Chunbo Luo

Abstract: This commentary dedicates to envision what role THz is going to play in the coming human-centric 6G era. Three distinct THz network types including outdoor, indoor, and body area networks are discussed, with an emphasis on their capabilities in human body detection. Synthesizing these networks will unlock a bunch of fascinating applications across industrial, biomedical and entertainment fields, significantly enhancing the quality of human life.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2401.08430 [pdf, other]

A Dynamic Capacitance Matching (DCM)-based Current Response Algorithm for Signal Line RC Network

Authors: Zhoujie Wu, Cai Luo, Zhong Guan

Abstract: This paper proposes a dynamic capacitance matching (DCM)-based RC current response algorithm for calculating the current waveform of a signal line without performing SPICE simulation. Specifically, unlike previous method such as CCS model, driver linear representation, waveform functional fitting or equivalent load capacitance, our algorithm does not rely on fixed reduced model of both standard cell driver and RC load. Instead, our algorithm approaches the current waveform dynamically by computing current responses of the target driver for various load scenarios. Besides, we creatively use symbolic expression to combine the y-parameter of RC network with the pre-characterized driver library in order to perform capacitance matching by considering over/under-shoot effect. Our algorithm is experimentally verified on 40nm CMOS technology and has been partially adopted by latest commercial tool for other nodes. Experimental results show that our algorithm has excellent resolution and promising efficiency compared with traditional methods and SPICE golden result, especially for application in computing delay, power and signal line electromigration.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2311.14473 [pdf, other]

Joint Diffusion: Mutual Consistency-Driven Diffusion Model for PET-MRI Co-Reconstruction

Authors: Taofeng Xie, Zhuo-Xu Cui, Chen Luo, Huayu Wang, Congcong Liu, Yuanzhi Zhang, Xuemei Wang, Yanjie Zhu, Qiyu Jin, Guoqing Chen, Yihang Zhou, Dong Liang, Haifeng Wang

Abstract: Positron Emission Tomography and Magnetic Resonance Imaging (PET-MRI) systems can obtain functional and anatomical scans. PET suffers from a low signal-to-noise ratio. Meanwhile, the k-space data acquisition process in MRI is time-consuming. The study aims to accelerate MRI and enhance PET image quality. Conventional approaches involve the separate reconstruction of each modality within PET-MRI systems. However, there exists complementary information among multi-modal images. The complementary information can contribute to image reconstruction. In this study, we propose a novel PET-MRI joint reconstruction model employing a mutual consistency-driven diffusion mode, namely MC-Diffusion. MC-Diffusion learns the joint probability distribution of PET and MRI for utilizing complementary information. We conducted a series of contrast experiments about LPLS, Joint ISAT-net and MC-Diffusion by the ADNI dataset. The results underscore the qualitative and quantitative improvements achieved by MC-Diffusion, surpassing the state-of-the-art method.

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.08720 [pdf, other]

Massive Wireless Energy Transfer without Channel State Information via Imperfect Intelligent Reflecting Surfaces

Authors: Cheng Luo, Jie Hu, Kun Yang, Kai-Kit Wong

Abstract: Intelligent Reflecting Surface (IRS) utilizes low-cost, passive reflecting elements to enhance the passive beam gain, improve Wireless Energy Transfer (WET) efficiency, and enable its deployment for numerous Internet of Things (IoT) devices. However, the increasing number of IRS elements presents considerable channel estimation challenges. This is due to the lack of active Radio Frequency (RF) chains in an IRS, while pilot overhead becomes intolerable. To address this issue, we propose a Channel State Information (CSI)-free scheme that maximizes received energy in a specific direction and covers the entire space through phased beam rotation. Furthermore, we take into account the impact of an imperfect IRS and meticulously design the active precoder and IRS reflecting phase shift to mitigate its effects. Our proposed technique does not alter the existing IRS hardware architecture, allowing for easy implementation in the current system, and enabling access or removal of any Energy Receivers (ERs) without additional cost. Numerical results illustrate the efficacy of our CSI-free scheme in facilitating large-scale IRS without compromising performance due to excessive pilot overhead. Furthermore, our scheme outperforms the CSI-based counterpart in scenarios involving large-scale ERs, making it a promising solution in the era of IoT.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.13335 [pdf, other]

doi 10.1109/TCOMM.2023.3337257

Reconfigurable Intelligent Sensing Surface aided Wireless Powered Communication Networks: A Sensing-Then-Reflecting Approach

Authors: Cheng Luo, Jie Hu, Kun Yang

Abstract: This paper presents a reconfigurable intelligent sensing surface (RISS) that combines passive and active elements to achieve simultaneous reflection and direction of arrival (DOA) estimation tasks. By utilizing DOA information from the RISS instead of conventional channel estimation, the pilot overhead is reduced and the RISS becomes independent of the hybrid access point (HAP), enabling efficient operation. Specifically, the RISS autonomously estimates the DOA of uplink signals from single-antenna users and reflects them using the HAP's slowly varying DOA information. During downlink transmission, it updates the HAP's DOA information and designs the reflection phase of energy signals based on the latest user DOA information. The paper includes a comprehensive performance analysis, covering system design, protocol details, receiving performance, and RISS deployment suggestions. We derive a closed-form expression to analyze system performance under DOA errors, and calculate the statistical distribution of user received energy using the moment-matching technique. We provide a recommended transmit power to meet a specified outage probability and energy threshold. Numerical results demonstrate that the proposed system outperforms the conventional counterpart by 2.3 dB and 4.7 dB for Rician factors $κ_h=κ_G=1$ and $κ_h=κ_G=10$, respectively.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2309.13571 [pdf, other]

Matrix Completion-Informed Deep Unfolded Equilibrium Models for Self-Supervised k-Space Interpolation in MRI

Authors: Chen Luo, Huayu Wang, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang

Abstract: Recently, regularization model-driven deep learning (DL) has gained significant attention due to its ability to leverage the potent representational capabilities of DL while retaining the theoretical guarantees of regularization models. However, most of these methods are tailored for supervised learning scenarios that necessitate fully sampled labels, which can pose challenges in practical MRI applications. To tackle this challenge, we propose a self-supervised DL approach for accelerated MRI that is theoretically guaranteed and does not rely on fully sampled labels. Specifically, we achieve neural network structure regularization by exploiting the inherent structural low-rankness of the $k$-space data. Simultaneously, we constrain the network structure to resemble a nonexpansive mapping, ensuring the network's convergence to a fixed point. Thanks to this well-defined network structure, this fixed point can completely reconstruct the missing $k$-space data based on matrix completion theory, even in situations where full-sampled labels are unavailable. Experiments validate the effectiveness of our proposed method and demonstrate its superiority over existing self-supervised approaches and traditional regularization methods, achieving performance comparable to that of supervised learning methods in certain scenarios.

Submitted 24 September, 2023; originally announced September 2023.

arXiv:2309.09250 [pdf, other]

Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems

Authors: Huayu Wang, Chen Luo, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang

Abstract: Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion of deep learning (DL) and variational regularization. Specifically, we employ a latent optimization technique to adversarially train an input convex neural network, and its set of minima can fully represent the real data manifold. We utilize it as a convex regularizer to formulate a CLEAR-informed variational regularization model that guides the solution of the imaging inverse problem on the real data manifold. Leveraging its inherent convexity, we have established the convergence of the projected subgradient descent algorithm for the CLEAR-informed regularization model. This convergence guarantees the attainment of a unique solution to the imaging inverse problem, subject to certain assumptions. Furthermore, we have demonstrated the robustness of our CLEAR-informed model, explicitly showcasing its capacity to achieve stable reconstruction even in the presence of measurement interference. Finally, we illustrate the superiority of our approach using MRI reconstruction as an example. Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches, excelling in both reconstruction quality and robustness.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2308.11980 [pdf, other]

Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

Authors: Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

Abstract: Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show the proposed HGRL successfully integrates AE with AR for AEC and ARP tasks, while coordinating the relations between cAE and fAE and further aligning the two different grains of AE information with the AR.

Submitted 23 August, 2023; originally announced August 2023.

Comments: INTERSPEECH 2023, Code and models: https://github.com/Yuanbo2020/HGRL

arXiv:2304.14795 [pdf, ps, other]

Semi-Supervised RF Fingerprinting with Consistency-Based Regularization

Authors: Weidong Wang, Cheng Luo, Jiancheng An, Lu Gan, Hongshu Liao, Chau Yuen

Abstract: As a promising non-password authentication technology, radio frequency (RF) fingerprinting can greatly improve wireless security. Recent work has shown that RF fingerprinting based on deep learning can significantly outperform conventional approaches. The superiority, however, is mainly attributed to supervised learning using a large amount of labeled data, and it significantly degrades if only limited labeled data is available, making many existing algorithms lack practicability. Considering that it is often easier to obtain enough unlabeled data in practice with minimal resources, we leverage deep semi-supervised learning for RF fingerprinting, which largely relies on a composite data augmentation scheme designed for radio signals, combined with two popular techniques: consistency-based regularization and pseudo-labeling. Experimental results on both simulated and real-world datasets demonstrate that our proposed method for semi-supervised RF fingerprinting is far superior to other competing ones, and it can achieve remarkable performance almost close to that of fully supervised learning with a very limited number of examples.

Submitted 28 April, 2023; originally announced April 2023.

Comments: 12 pages, 15 figures, submitted to IEEE Internet of Things Journal

arXiv:2304.05922 [pdf, other]

Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

Authors: Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo

Abstract: Filler words like "um" or "uh" are common in spontaneous speech. It is desirable to automatically detect and remove them in recordings, as they affect the fluency, confidence, and professionalism of speech. Previous studies and our preliminary experiments reveal that the biggest challenge in filler word detection is that fillers can be easily confused with other hard categories like "a" or "I". In this paper, we propose a novel filler word detection method that effectively addresses this challenge by adding auxiliary categories dynamically and applying an additional inter-category focal loss. The auxiliary categories force the model to explicitly model the confusing words by mining hard categories. In addition, inter-category focal loss adaptively adjusts the penalty weight between "filler" and "non-filler" categories to deal with other confusing words left in the "non-filler" category. Our system achieves the best results, with a huge improvement compared to other methods on the PodcastFillers dataset.

Submitted 12 April, 2023; originally announced April 2023.

Comments: accepted by ICASSP23

arXiv:2211.05396 [pdf]

Learning Visual Representation of Underwater Acoustic Imagery Using Transformer-Based Style Transfer Method

Authors: Xiaoteng Zhou, Changli Yu, Shihao Yuan, Xin Yuan, Hangchi Yu, Citong Luo

Abstract: Underwater automatic target recognition (UATR) has been a challenging research topic in ocean engineering. Although deep learning brings opportunities for target recognition on land and in the air, underwater target recognition techniques based on deep learning have lagged due to sensor performance and the size of trainable data. This letter proposed a framework for learning the visual representation of underwater acoustic imageries, which takes a transformer-based style transfer model as the main body. It could replace the low-level texture features of optical images with the visual features of underwater acoustic imageries while preserving their raw high-level semantic content. The proposed framework could fully use the rich optical image dataset to generate a pseudo-acoustic image dataset and use it as the initial sample to train the underwater acoustic target recognition model. The experiments select the dual-frequency identification sonar (DIDSON) as the underwater acoustic data source and also take fish, the most common marine creature, as the research subject. Experimental results show that the proposed method could generate high-quality and high-fidelity pseudo-acoustic samples, achieve the purpose of acoustic data enhancement and provide support for the underwater acoustic-optical images domain transfer research.

Submitted 10 November, 2022; originally announced November 2022.

Comments: 11 pages, 9 figures, conference

arXiv:2210.12995 [pdf, other]

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

Authors: Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo

Abstract: In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details. TridentSE maintains T-F bin level representation to capture details, and uses a small number of global tokens to process the global information. Information is propagated between the local and the global representations through cross attention modules. To capture both inter- and intra-frame information, the global tokens are divided into two groups to process along the time and the frequency axis respectively. A metric discriminator is further employed to guide our model to achieve higher perceptual quality. Even with significantly lower computational cost, TridentSE outperforms a variety of previous speech enhancement methods, achieving a PESQ of 3.47 on VoiceBank+DEMAND dataset and a PESQ of 3.44 on DNS no-reverb test set. Visualization shows that the global tokens learn diverse and interpretable global patterns.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 5 pages, 2 figures, 3 tables

arXiv:2208.06222 [pdf, other]

Scale-free and Task-agnostic Attack: Generating Photo-realistic Adversarial Patterns with Patch Quilting Generator

Authors: Xiangbo Gao, Cheng Luo, Qinliang Lin, Weicheng Xie, Minmin Liu, Linlin Shen, Keerthy Kusumam, Siyang Song

Abstract: Traditional L_p norm-restricted image attack algorithms suffer from poor transferability to black box scenarios and poor robustness to defense algorithms. Recent CNN generator-based attack approaches can synthesize unrestricted and semantically meaningful entities to the image, which is shown to be transferable and robust. However, such methods attack images by either synthesizing local adversarial entities, which are only suitable for attacking specific contents or performing global attacks, which are only applicable to a specific image scale. In this paper, we propose a novel Patch Quilting Generative Adversarial Networks (PQ-GAN) to learn the first scale-free CNN generator that can be applied to attack images with arbitrary scales for various computer vision tasks. The principal investigation on transferability of the generated adversarial examples, robustness to defense frameworks, and visual quality assessment show that the proposed PQG-based attack framework outperforms the other nine state-of-the-art adversarial attack approaches when attacking the neural networks trained on two standard evaluation datasets (i.e., ImageNet and CityScapes).

Submitted 19 November, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

arXiv:2208.04622 [pdf, other]

An Anchor-Free Detector for Continuous Speech Keyword Spotting

Authors: Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo

Abstract: Continuous Speech Keyword Spotting (CSKWS) is a task to detect predefined keywords in a continuous speech. In this paper, we regard CSKWS as a one-dimensional object detection task and propose a novel anchor-free detector, named AF-KWS, to solve the problem. AF-KWS directly regresses the center locations and lengths of the keywords through a single-stage deep neural network. In particular, AF-KWS is tailored for this speech task as we introduce an auxiliary unknown class to exclude other words from non-speech or silent background. We have built two benchmark datasets named LibriTop-20 and continuous meeting analysis keywords (CMAK) dataset for CSKWS. Evaluations on these two datasets show that our proposed AF-KWS outperforms reference schemes by a large margin, and therefore provides a decent baseline for future research.

Submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted by Interspeech 2022

arXiv:2207.09647 [pdf, other]

Deep Learning Based Automatic Modulation Recognition: Models, Datasets, and Challenges

Authors: Fuxin Zhang, Chunbo Luo, Jialang Xu, Yang Luo, FuChun Zheng

Abstract: Automatic modulation recognition (AMR) detects the modulation scheme of the received signals for further signal processing without needing prior information, and provides the essential function when such information is missing. Recent breakthroughs in deep learning (DL) have laid the foundation for developing high-performance DL-AMR approaches for communications systems. Comparing with traditional modulation detection methods, DL-AMR approaches have achieved promising performance including high recognition accuracy and low false alarms due to the strong feature extraction and classification abilities of deep neural networks. Despite the promising potential, DL-AMR approaches also bring concerns to complexity and explainability, which affect the practical deployment in wireless communications systems. This paper aims to present a review of the current DL-AMR research, with a focus on appropriate DL models and benchmark datasets. We further provide comprehensive experiments to compare the state of the art models for single-input-single-output (SISO) systems from both accuracy and complexity perspectives, and propose to apply DL-AMR in the new multiple-input-multiple-output (MIMO) scenario with precoding. Finally, existing challenges and possible future research directions are discussed.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2206.13865 [pdf, other]

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

Authors: Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo

Abstract: This paper proposes a new "decompose-and-edit" paradigm for the text-based speech insertion task that facilitates arbitrary-length speech insertion and even full sentence generation. In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody. Specifically, we proposed to represent the global factors by multiple tokens, which are extracted by cross-attention operation and then injected back by link-attention operation. Due to the rich representation of global factors, we manage to achieve high speaker similarity in a zero-shot manner. In addition, we introduce a prosody smoothing task to make the local prosody factor context-aware and therefore achieve satisfactory prosody continuity. We further achieve high voice quality with an adversarial training stage. In the subjective test, our method achieves state-of-the-art performance in both naturalness and similarity. Audio samples can be found at https://ydcustc.github.io/retrieverTTS-demo/.

Submitted 28 June, 2022; originally announced June 2022.

Comments: 5 pages, 1 figure, 3 tables. Accepted by Interspeech 2022

arXiv:2205.03599 [pdf, other]

GAN-Based Multi-View Video Coding with Spatio-Temporal EPI Reconstruction

Authors: Chengdong Lan, Hao Yan, Cheng Luo, Tiesong Zhao

Abstract: The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods to skip intermediate viewpoints during compression and delivery, and ultimately reconstruct them using Side Information (SI). Typically, depth maps are used to construct SI. However, their methods suffer from inaccuracies in reconstruction and inherently high bitrates. In this paper, we propose a novel multi-view video coding method that leverages the image generation capabilities of Generative Adversarial Network (GAN) to improve the reconstruction accuracy of SI. Additionally, we consider incorporating information from adjacent temporal and spatial viewpoints to further reduce SI redundancy. At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and further utilize a convolutional network to extract the latent code of a GAN as SI. At the decoder side, we combine the SI and adjacent viewpoints to reconstruct intermediate views using the GAN generator. Specifically, we establish a joint encoder constraint for reconstruction cost and SI entropy to achieve an optimal trade-off between reconstruction quality and bitrates overhead. Experiments demonstrate significantly improved Rate-Distortion (RD) performance compared with state-of-the-art methods.

Submitted 5 May, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

arXiv:2202.12307 [pdf, other]

Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

Authors: Dacheng Yin, Xuanchi Ren, Chong Luo, Yuwang Wang, Zhiwei Xiong, Wenjun Zeng

Abstract: This paper addresses the unsupervised learning of content-style decomposed representation. We first give a definition of style and then model the content-style representation as a token-level bipartite graph. An unsupervised framework, named Retriever, is proposed to learn such representations. First, a cross-attention module is employed to retrieve permutation invariant (P.I.) information, defined as style, from the input data. Second, a vector quantization (VQ) module is used, together with man-induced constraints, to produce interpretable content tokens. Last, an innovative link attention module serves as the decoder to reconstruct data from the decomposed content and style, with the help of the linking keys. Being modal-agnostic, the proposed Retriever is evaluated in both speech and image domains. The state-of-the-art zero-shot voice conversion performance confirms the disentangling ability of our framework. Top performance is also achieved in the part discovery task for images, verifying the interpretability of our representation. In addition, the vivid part-based style transfer quality demonstrates the potential of Retriever to support various fascinating generative tasks. Project page at https://ydcustc.github.io/retriever-demo/.

Submitted 24 February, 2022; originally announced February 2022.

Comments: Accepted to ICLR 2022. Project page at https://ydcustc.github.io/retriever-demo/

arXiv:2110.04980 [pdf, other]

An Efficient Deep Learning Model for Automatic Modulation Recognition Based on Parameter Estimation and Transformation

Authors: Fuxin Zhang, Chunbo Luo, Jialang Xu, Yang Luo

Abstract: Automatic modulation recognition (AMR) is a promising technology for intelligent communication receivers to detect signal modulation schemes. Recently, the emerging deep learning (DL) research has facilitated high-performance DL-AMR approaches. However, most DL-AMR models only focus on recognition accuracy, leading to huge model sizes and high computational complexity, while some lightweight and low-complexity models struggle to meet the accuracy requirements. This letter proposes an efficient DL-AMR model based on phase parameter estimation and transformation, with convolutional neural network (CNN) and gated recurrent unit (GRU) as the feature extraction layers, which can achieve high recognition accuracy equivalent to the existing state-of-the-art models while reducing the volume of their parameters by more than a third. Meanwhile, our model is more competitive in training time and test time than the benchmark models with similar recognition accuracy. Moreover, we further propose to compress our model by pruning, which maintains a recognition accuracy higher than 90% while having fewer than 1/8 of the number of parameters of state-of-the-art models.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2109.05426 [pdf, other]

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Authors: Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng

Abstract: Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript. Existing methods adopt a two-stage approach: synthesize the input text using a generic text-to-speech (TTS) engine and then transform the voice to the desired voice using voice conversion (VC). A major problem of this framework is that VC is a challenging problem which usually needs a moderate amount of parallel training data to work satisfactorily. In this paper, we propose a one-stage context-aware framework to generate natural and coherent target speech without any training data of the target speaker. In particular, we manage to perform accurate zero-shot duration prediction for the inserted text. The predicted duration is used to regulate both text embedding and speech embedding. Then, based on the aligned cross-modality input, we directly generate the mel-spectrogram of the edited speech with a transformer-based decoder. Subjective listening tests show that despite the lack of training data for the speaker, our method has achieved satisfactory results. It outperforms a recent zero-shot TTS engine by a large margin.

Submitted 12 September, 2021; originally announced September 2021.

Comments: Published in Interspeech'21

arXiv:2108.12083 [pdf]

Deep Denoising Method for Side Scan Sonar Images without High-quality Reference Data

Authors: Xiaoteng Zhou, Changli Yu, Xin Yuan, Citong Luo

Abstract: Subsea images measured by side scan sonars (SSSs) are necessary visual data in deep-sea exploration using autonomous underwater vehicles (AUVs). They can vividly reflect the topography of the seabed, but are usually accompanied by complex and severe noise. This paper proposes a deep denoising method for SSS images without high-quality reference data, which uses a single noisy SSS image to perform self-supervised denoising. Compared with the classical artificially designed filters, the deep denoising method shows obvious advantages. The denoising experiments are performed on real seabed SSS images, and the results demonstrate that our proposed method can effectively reduce the noise in the SSS image while minimizing the image quality and detail loss.

Submitted 26 August, 2021; originally announced August 2021.

arXiv:2102.01930 [pdf, other]

General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

Authors: Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha

Abstract: This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning. In the design of MGF, speech hierarchy is taken into consideration. Specifically, we propose to use generative learning approaches to capture fine-grained information at small time scales and use discriminative learning approaches to distill coarse-grained or semantic information at large time scales. For phoneme-scale learning, we borrow the idea of the masked language model but tailor it for the continuous speech signal by replacing the classification loss with a contrastive loss. We corroborate our design by evaluating the MGF representation on various downstream tasks, including phoneme classification, speaker classification, speech recognition, and emotion classification. Experiments verify that training at different time scales needs different training targets and loss functions, which in general complement each other and lead to better performance.

Submitted 3 February, 2021; originally announced February 2021.

arXiv:2012.13920 [pdf]

WHU-Hi: UAV-borne hyperspectral with high spatial resolution (H2) benchmark datasets for hyperspectral image classification

Authors: Xin Hu, Yanfei Zhong, Chang Luo, Xinyu Wang

Abstract: Classification is an important aspect of hyperspectral image processing and application. At present, researchers mostly use the classic airborne hyperspectral imagery as the benchmark dataset. However, existing datasets suffer from three bottlenecks: (1) low spatial resolution; (2) low labeled pixels proportion; (3) low degree of subclasses distinction. In this paper, a new benchmark dataset named the Wuhan UAV-borne hyperspectral image (WHU-Hi) dataset was built for hyperspectral image classification. The WHU-Hi dataset has a high spectral resolution (nm level) and a very high spatial resolution (cm level), which we refer to here as H2 imager. Besides, the WHU-Hi dataset has a higher pixel labeling ratio and finer subclasses. Some state-of-the-art hyperspectral image classification methods were benchmarked on the WHU-Hi dataset, and the experimental results show that WHU-Hi is a challenging dataset. We hope the WHU-Hi dataset can become a strong benchmark to accelerate future research.

Submitted 30 March, 2021; v1 submitted 27 December, 2020; originally announced December 2020.

Comments: 5 pages, 1 figure

arXiv:2009.08162

Online Speaker Diarization with Relation Network

Authors: Xiang Li, Yucheng Zhao, Chong Luo, Wenjun Zeng

Abstract: In this paper, we propose an online speaker diarization system based on Relation Network, named RenoSD. Unlike conventional diarization systems which consist of several independently-optimized modules, RenoSD implements voice-activity-detection (VAD), embedding extraction, and speaker identity association using a single deep neural network. The most striking feature of RenoSD is that it adopts a meta-learning strategy for speaker identity association. In particular, the relation network learns to learn a deep distance metric in a data-driven way and it can determine through a simple forward pass whether two given segments belong to the same speaker. As such, RenoSD can be performed in an online manner with low latency. Experimental results on AMI and CALLHOME datasets show that the proposed RenoSD system achieves consistent improvements over the state-of-the-art x-vector baseline. Compared with an existing online diarization system named UIS-RNN, RenoSD achieves better performance using much less training data and at a lower time complexity.

Submitted 18 September, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: We find potential incorrectness in our experimental results which may lead to a wrong conclusion. We decide to rerun the experiments to check our experimental results and temporarily withdraw this paper

arXiv:2005.05085 [pdf, other]

Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices

Authors: Chunjie Luo, Xiwen He, Jianfeng Zhan, Lei Wang, Wanling Gao, Jiahui Dai

Abstract: Due to increasing amounts of data and compute resources, deep learning achieves many successes in various domains. The application of deep learning on mobile and embedded devices is attracting more and more attention, and benchmarking and ranking the AI abilities of mobile and embedded devices has become an urgent problem to be solved. Considering the model diversity and framework diversity, we propose a benchmark suite, AIoTBench, which focuses on the evaluation of the inference abilities of mobile and embedded devices. AIoTBench covers three typical heavy-weight networks: ResNet50, InceptionV3, DenseNet121, as well as three light-weight networks: SqueezeNet, MobileNetV2, MnasNet. Each network is implemented by three frameworks which are designed for mobile and embedded devices: Tensorflow Lite, Caffe2, Pytorch Mobile. To compare and rank the AI capabilities of the devices, we propose two unified metrics as the AI scores: Valid Images Per Second (VIPS) and Valid FLOPs Per Second (VOPS). Currently, we have compared and ranked 5 mobile devices using our benchmark. This list will be extended and updated soon after.

Submitted 7 May, 2020; originally announced May 2020.

arXiv:2004.10987 [pdf, other]

COVID-19 Chest CT Image Segmentation -- A Deep Convolutional Neural Network Solution

Authors: Qingsen Yan, Bo Wang, Dong Gong, Chuan Luo, Wei Zhao, Jianhu Shen, Qinfeng Shi, Shuo **, Liang Zhang, Zheng You

Abstract: A novel coronavirus disease 2019 (COVID-19) was detected and has spread rapidly across various countries around the world since the end of the year 2019. Computed Tomography (CT) images have been used as a crucial alternative to the time-consuming RT-PCR test. However, pure manual segmentation of CT images faces a serious challenge with the increase of suspected cases, resulting in urgent requirements for accurate and automatic segmentation of COVID-19 infections. Unfortunately, since the imaging characteristics of the COVID-19 infection are diverse and similar to the backgrounds, existing medical image segmentation methods cannot achieve satisfactory performance. In this work, we try to establish a new deep convolutional neural network tailored for segmenting the chest CT images with COVID-19 infections. We first maintain a large and new chest CT image dataset consisting of 165,667 annotated chest CT images from 861 patients with confirmed COVID-19. Inspired by the observation that the boundary of the infected lung can be enhanced by adjusting the global intensity, in the proposed deep CNN, we introduce a feature variation (FV) block which adaptively adjusts the global properties of the features for segmenting COVID-19 infection. The proposed FV block can enhance the capability of feature representation effectively and adaptively for diverse cases. We fuse features at different scales by proposing Progressive Atrous Spatial Pyramid Pooling to handle the sophisticated infection areas with diverse appearance and shapes. We conducted experiments on the data collected in China and Germany and show that the proposed deep CNN can produce impressive performance effectively.

Submitted 25 April, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

arXiv:2002.11861 [pdf, other]

Simulation of Real-time Routing for UAS traffic Management with Communication and Airspace Safety Considerations

Authors: Zhao **, Ziyi Zhao, Chen Luo, Franco Basti, Adrian Solomon, M. Cenk Gursoy, Carlos Caicedo, Qinru Qiu

Abstract: Small Unmanned Aircraft Systems (sUAS) will be an important component of the smart city and intelligent transportation environments of the near future. The demand for sUAS related applications, such as commercial delivery and land surveying, is expected to grow rapidly in the next few years. In general, sUAS traffic routing and management functions are needed to coordinate the launching of sUAS from different launch sites and determine their trajectories to avoid conflict while considering several other constraints such as expected arrival time, minimum flight energy, and availability of communication resources. However, as the airborne sUAS density grows in a certain area, it is difficult to foresee the potential airspace and communications resource conflicts and make immediate decisions to avoid them. To address this challenge, we present a temporal and spatial routing algorithm and simulation platform for sUAS trajectory management in a high density urban area that plans sUAS movements in a spatial and temporal maze taking into account obstacles that are either static or dynamic in time. The routing allows the sUAS to avoid static no-fly areas (i.e. static obstacles) or other in-flight sUAS and areas that have congested communication resources (i.e. dynamic obstacles). The algorithm is evaluated using an agent-based simulation platform. The simulation results show that the proposed algorithm outperforms other route management algorithms in many areas, especially in processing speed and memory efficiency. Detailed comparisons are provided for the sUAS flight time, the overall throughput, conflict rate and communication resource utilization. The results demonstrate that our proposed algorithm can be used to address the airspace and communication resource utilization needs for a next generation smart city and smart transportation.

Submitted 26 February, 2020; originally announced February 2020.

Comments: The 38th AIAA/IEEE Digital Avionics Systems Conference (DASC)

arXiv:1912.07367 [pdf, other]

A Model-driven and Data-driven Fusion Framework for Accurate Air Quality Prediction

Authors: Haolin Fei, Xiaofeng Wu, Chunbo Luo

Abstract: Air quality is closely related to public health. Health issues such as cardiovascular diseases and respiratory diseases may be connected with long exposure to a highly polluted environment. Therefore, accurate air quality forecasts are extremely important to those who are vulnerable. To estimate the variation of several air pollution concentrations, previous researchers used various approaches, such as the Community Multiscale Air Quality model (CMAQ) or neural networks. Although the CMAQ model considers a coverage of the historic air pollution data and meteorological variables, extra bias is introduced due to additional adjustment. In this paper, a combination of model-based strategy and data-driven method, namely the physical-temporal collection (PTC) model, is proposed, aiming to fix the systematic error that traditional models deliver. In the data-driven part, the first components are the temporal pattern and the weather pattern to measure important features that contribute to the prediction performance. The less relevant input variables will be removed to eliminate negative weights in network training. Then, we deploy a long-short-term-memory (LSTM) to fetch the preliminary results, which will be further corrected by a neural network (NN) involving the meteorological index as well as other pollutants concentrations. The data-set we applied for forecasting is from January 1st, 2016 to December 31st, 2016. According to the results, our PTC achieves an excellent performance compared with the baseline models (CMAQ prediction, GRU, DNN, etc.). This joint model-based data-driven method for air quality prediction can be easily deployed on stations without extra adjustment, providing results with high-time-resolution information for vulnerable members to prevent heavy air pollution ahead.

Submitted 6 December, 2019; originally announced December 2019.

arXiv:1911.04697 [pdf, other]

PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

Authors: Dacheng Yin, Chong Luo, Zhiwei Xiong, Wenjun Zeng

Abstract: Time-frequency (T-F) domain masking is a mainstream approach for single-channel speech enhancement. Recently, attention has been paid to phase prediction in addition to amplitude prediction. In this paper, we propose a phase-and-harmonics-aware deep neural network (DNN), named PHASEN, for this task. Unlike previous methods that directly use a complex ideal ratio mask to supervise the DNN learning, we design a two-stream network, where amplitude stream and phase stream are dedicated to amplitude and phase prediction. We discover that the two streams should communicate with each other, and this is crucial to phase prediction. In addition, we propose frequency transformation blocks to catch long-range correlations along the frequency axis. The visualization shows that the learned transformation matrix spontaneously captures the harmonic correlation, which has been proven to be helpful for T-F spectrogram reconstruction. With these two innovations, PHASEN acquires the ability to handle detailed phase patterns and to utilize harmonic patterns, getting 1.76dB SDR improvement on AVSpeech + AudioSet dataset. It also achieves significant gains over Google's network on this dataset. On Voice Bank + DEMAND dataset, PHASEN outperforms previous methods by a large margin on four metrics.

Submitted 12 November, 2019; originally announced November 2019.

Comments: Accepted by AAAI'20
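
The T-F masking idea mentioned in the PHASEN abstract above can be illustrated with a toy sketch. This is not the PHASEN network: it only shows one common way an amplitude mask and an optional phase estimate are applied to noisy spectrogram bins. The function name `apply_tf_mask` and the sample values are hypothetical.

```python
import cmath

def apply_tf_mask(noisy_bins, amplitude_mask, phase_estimate=None):
    """Apply a per-bin amplitude mask to complex T-F bins.

    Assumption for illustration: if no phase estimate is supplied,
    the noisy phase is reused, as in amplitude-only masking.
    """
    enhanced = []
    for i, x in enumerate(noisy_bins):
        amplitude = amplitude_mask[i] * abs(x)
        phase = cmath.phase(x) if phase_estimate is None else phase_estimate[i]
        enhanced.append(cmath.rect(amplitude, phase))
    return enhanced

# Hypothetical noisy bins and mask values.
noisy = [3 + 4j, 1 - 1j, -2 + 0j]
mask = [0.5, 1.0, 0.0]
enhanced = apply_tf_mask(noisy, mask)
```

With the noisy phase reused, each output bin is simply the input bin scaled by its mask value; supplying `phase_estimate` replaces the phase as well, which is the part that amplitude-only masking leaves uncorrected.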

arXiv:1908.03398 [pdf, other]

No Need of Data Pre-processing: A General Framework for Radio-Based Device-Free Context Awareness

Authors: Bo Wei, Kai Li, Chengwen Luo, Weitao Xu, ** Zhang

Abstract: Device-free context awareness is important to many applications. There are two broadly used approaches for device-free context awareness, i.e. video-based and radio-based. Video-based applications can deliver good performance, but privacy is a serious concern. Radio-based context awareness has drawn researchers' attention instead because it does not violate privacy and radio signals can penetrate obstacles. Recently, deep learning has been introduced into radio-based device-free context awareness and helps boost the recognition accuracy. The present works design explicit methods for each radio-based application. They also use one additional step to extract features before conducting classification and exploit deep learning as a classification tool. The additional initial data processing step introduces unnecessary noise and information loss. Without initial data processing, it is, however, challenging to explore patterns of raw signals. In this paper, we are the first to propose an innovative deep learning based general framework for both signal processing and classification. The key novelty of this paper is that the framework can be generalised for all the radio-based context awareness applications. We also eliminate the additional effort to extract features from raw radio signals. We conduct extensive evaluations to show the superior performance of our proposed method and its generalisation.

Submitted 9 August, 2019; originally announced August 2019.

Comments: 21 pages

arXiv:1905.04921

A DoA Estimation Based Robust Beam Forming Method for UAV-BS Communication

Authors: Tianxiao Zhao, Chunbo Luo, Geyong Min, Jianming Zhou, Dechun Guo, Wang Miao, Yang Mi

Abstract: High data rate communication with Unmanned Aerial Vehicles (UAV) has been in growing demand among industrial and commercial applications over the last decade. In this paper, we investigate enhancing beam forming performance based on signal Direction of Arrival (DoA) estimation to support UAV-cellular network communication. We first study the UAV fast moving scenario, where we found that the drone's mobility causes degradation of beam forming algorithm performance. Then, we propose a DoA estimation algorithm and a steering vector adaptive receiving beam forming method. The DoA estimation algorithm is of high precision with low computational complexity. It also enables a beam former to adjust the steering vector value in a timely manner when calculating the beam forming weight. Simulation results show higher SINR performance and more stability of the proposed method than the traditional method based on the Multiple Signal Classification (MUSIC) DoA estimation algorithm.

Submitted 7 September, 2020; v1 submitted 13 May, 2019; originally announced May 2019.

Comments: We would like to make some variations to the simulation results
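
To make the role of the steering vector in the abstract above concrete, here is a minimal textbook sketch, not the authors' algorithm. It assumes a uniform linear array with half-wavelength spacing and conventional (delay-and-sum) weights; the function names and the angles are hypothetical.

```python
import cmath
import math

def steering_vector(theta_rad, n_elements, spacing_wavelengths=0.5):
    """Steering vector of a uniform linear array (assumed array model)."""
    phase_step = -2.0 * math.pi * spacing_wavelengths * math.sin(theta_rad)
    return [cmath.exp(1j * phase_step * n) for n in range(n_elements)]

def beamformer_gain(steer_theta_rad, arrival_theta_rad, n_elements=8):
    """Magnitude response of delay-and-sum weights steered to one angle
    for a plane wave arriving from another angle."""
    weights = [a / n_elements for a in steering_vector(steer_theta_rad, n_elements)]
    arrival = steering_vector(arrival_theta_rad, n_elements)
    # Inner product w^H a: conjugate the weights before summing.
    return abs(sum(w.conjugate() * a for w, a in zip(weights, arrival)))

# Weights matched to the true DoA versus weights left at a stale DoA.
matched = beamformer_gain(math.radians(20.0), math.radians(20.0))
stale = beamformer_gain(math.radians(20.0), math.radians(30.0))
```

In this toy setup the matched gain is 1, while the stale weights give a noticeably lower gain, which illustrates why a fast-moving transmitter calls for timely steering vector updates.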

arXiv:1902.04035 [pdf]

A Simulation Framework for Fast Design Space Exploration of Unmanned Air System Traffic Management Policies

Authors: Ziyi Zhao, Chen Luo, ** Zhao, Qinru Qiu, M. Cenk Gursoy, Carlos Caicedo, Franco Basti

Abstract: The number of daily small Unmanned Aircraft Systems (sUAS) operations in uncontrolled low altitude airspace is expected to reach into the millions. UAS Traffic Management (UTM) is an emerging concept aiming at the safe and efficient management of such very dense traffic, but few studies are addressing the policies to accommodate such demand and the required ground infrastructure in suburban or urban environments. Searching for the optimal air traffic management policy is a combinatorial optimization problem with intractable complexity when the number of sUAS and the constraints increase. As the demands on the airspace increase and traffic patterns get complicated, it is difficult to forecast the potential low altitude airspace hotspots and the corresponding ground resource requirements. This work presents a Multi-agent Air Traffic and Resource Usage Simulation (MATRUS) framework that aims for fast evaluation of different air traffic management policies and the relationship between policy, environment and resulting traffic patterns. It can also be used as a tool to decide the resource distribution and launch site location in the planning of a next-generation smart city. As a case study, detailed comparisons are provided for the sUAS flight time, conflict ratio, and cellular communication resource usage, for a managed (centrally coordinated) and an unmanaged (free flight) traffic scenario.

Submitted 14 February, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

Comments: The Integrated Communications Navigation and Surveillance (ICNS) Conference in 2019

arXiv:1808.03897 [pdf, other]

Engineering and Economic Analysis for Electric Vehicle Charging Infrastructure --- Placement, Pricing, and Market Design

Authors: Chao Luo

Abstract: This dissertation studies the interplay between large-scale electric vehicle (EV) charging and the power system. We address three important issues pertaining to EV charging and integration into the power system: (1) charging station placement, (2) pricing policy and energy management strategy, and (3) electricity trading market and distribution network design to facilitate integrating EVs and renewable energy sources (RES) into the power system. For the charging station placement problem, we propose a multi-stage consumer behavior based placement strategy with incremental EV penetration rates and model the EV charging industry as an oligopoly where the entire market is dominated by a few charging service providers (oligopolists). The optimal placement policy for each service provider is obtained by solving a Bayesian game. For pricing and energy management of EV charging stations, we provide guidelines for charging service providers to determine charging price and manage electricity reserve to balance the competing objectives of improving profitability, enhancing customer satisfaction, and reducing impact on the power system. Two algorithms --- a stochastic dynamic programming (SDP) algorithm and a greedy algorithm (benchmark algorithm) --- are applied to derive the pricing and electricity procurement strategy. We design a novel electricity trading market and distribution network, which supports seamless RES integration, grid to vehicle (G2V), vehicle to grid (V2G), vehicle to vehicle (V2V), and distributed generation (DG) and storage.
We apply a sharing economy model to the electricity sector to stimulate different entities to exchange and monetize their underutilized electricity. A fitness-score (FS)-based supply-demand matching algorithm is developed by considering consumer surplus, electricity network congestion, and economic dispatch.

Submitted 12 August, 2018; originally announced August 2018.

Comments: Doctoral Dissertation, University of Notre Dame, 2018

arXiv:1801.02783 [pdf, other]

doi 10.5220/0005797100490059

Dynamic Pricing and Energy Management Strategy for EV Charging Stations under Uncertainties

Authors: Chao Luo, Yih-Fang Huang, Vijay Gupta

Abstract: This paper presents a dynamic pricing and energy management framework for electric vehicle (EV) charging service providers. To set the charging prices, the service providers face three uncertainties: the volatility of wholesale electricity price, intermittent renewable energy generation, and spatial-temporal EV charging demand. The main objective of our work here is to help charging service providers improve their total profits while enhancing customer satisfaction and maintaining power grid stability, taking into account those uncertainties. We employ a linear regression model to estimate the EV charging demand at each charging station, and introduce a quantitative measure for customer satisfaction. Both the greedy algorithm and the dynamic programming (DP) algorithm are employed to derive the optimal charging prices and determine how much electricity to purchase from the wholesale market in each planning horizon. Simulation results show that the DP algorithm achieves an increased profit (up to 9%) compared to the greedy algorithm (the benchmark algorithm) under certain scenarios. Additionally, we observe that the integration of low-cost energy storage into the system can not only improve the profit, but also smooth out the charging price fluctuation, protecting the end customers from the volatile wholesale market.

Submitted 8 January, 2018; originally announced January 2018.

Comments: 11 pages, 9 figures, Proceedings of VEHITS 2016, ISBN: 978-989-758-185-4

Journal ref: In Proceedings of the International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2016)

arXiv:1801.02140 [pdf]

doi 10.1109/CSQRWC.2011.6037125

Performance Analysis of UWB Based Wireless Sensor Networks in Indoor Office LOS Environment

Authors: Xuanli Wu, Yang Cao, Xiaozong Yang, Chao Luo

Abstract: With the fast development of wireless sensor networks (WSN), more attention is paid to high data rate transmission of WSN, and hence, in the IEEE 802.15.4a standard, ultra-wideband (UWB) is introduced as one of the physical layer techniques to support high transmission data rate and precise localization applications. In order to analyze the bit error rate (BER) performance of UWB based WSN, a system model considering intra-symbol interference (IASI), inter-symbol interference (ISI), multiuser interference (MUI) and additive white Gaussian noise (AWGN) is proposed in this paper, and then verified using simulation results. Moreover, the pulse waveforms complying with the spectrum requirement of the IEEE 802.15.4a standard are given, and based on such obtained pulses, the effect of transmission data rate and user number is also shown. Results show that with the increase of SNR, the intra-symbol interference will decrease the system performance significantly, and system performance can be improved by using pulse waveforms with little intra-symbol interference.

Submitted 7 January, 2018; originally announced January 2018.

Comments: 5 pages, 6 figures, Cross Strait Quad-Regional Radio Science and Wireless Technology Conference (CSQRWC), 2011

arXiv:1801.02138 [pdf]

doi 10.1109/ChinaCom.2011.6158273

Partial Template Based Receiver in Impulse Radio Ultra-Wideband Communication Systems

Authors: Yang Cao, Chao Luo, Xuanli Wu

Abstract: For high speed ultra-wideband (UWB) communication systems, multipath interference is a primary obstacle to improving transmission performance. In order to enhance the signal to interference plus noise ratio (SINR) in the receiver, a partial template receiver is proposed in this paper. Instead of using the conventional template, the model in this paper adopts a partial template to demodulate signals. To analyze the performance of the proposed receiver, the bit error rate (BER) formulation of IR-UWB systems in the presence of multipath interference, multiuser interference and additive white Gaussian noise (AWGN) is derived in IEEE 802.15.4a channel models. Simulation results show that, compared with the conventional correlation receiver, the proposed receiver can achieve a better BER performance for high Eb/N0, which falls in the conventionally used Eb/N0 range.

Submitted 7 January, 2018; originally announced January 2018.

Comments: 4 pages, 5 figures, Communications and Networking in China (CHINACOM), 2011 6th International ICST Conference on

arXiv:1801.02137 [pdf]

doi 10.1109/ChinaCom.2011.6158272

Multipath interference analysis of IR-UWB systems in indoor office LOS environment

Authors: Chao Luo, Xuanli Wu, Yang Cao

Abstract: Bit error rate (BER) performance of impulse radio Ultra-Wideband (UWB) systems in the presence of intra-symbol interference, inter-symbol interference, multiuser interference and additive white Gaussian noise (AWGN) is presented in this paper. By analyzing the indoor office LOS channel model defined by IEEE 802.15.4a Task Group and deducing the variance for intra-symbol interference (IASI), inter-symbol interference (ISI) and multiuser interference (MUI), the system BER expression is obtained and verified by MATLAB simulations. Through comparing the simulation results with and without intra-symbol interference, the conclusion that intra-symbol interference cannot be neglected is drawn; moreover, such interference will significantly decrease performance of UWB based wireless sensor networks (WSN). Then, the BER performance of UWB systems in multiuser environment is also analyzed and analysis results show that multiuser interference will further worsen the transmission performance of UWB systems.

Submitted 7 January, 2018; originally announced January 2018.

Comments: 5 pages, 2 figures, Communications and Networking in China (CHINACOM), 2011 6th International ICST Conference on

arXiv:1801.02135 [pdf, other]

doi 10.1109/VTCSpring.2015.7145593

A Consumer Behavior Based Approach to Multi-Stage EV Charging Station Placement

Authors: Chao Luo, Yih-Fang Huang, Vijay Gupta

Abstract: This paper presents a multi-stage approach to the placement of charging stations under the scenarios of different electric vehicle (EV) penetration rates. The EV charging market is modeled as an oligopoly. A consumer behavior based approach is applied to forecast the charging demand of the charging stations using a nested logit model. The impacts of both the urban road network and the power grid network on charging station planning are also considered. At each planning stage, the optimal station placement strategy is derived through solving a Bayesian game among the service providers. To investigate the interplay of the travel pattern, the consumer behavior, urban road network, power grid network, and the charging station placement, a simulation platform (The EV Virtual City 1.0) is developed using Java on Repast. We conduct a case study in the San Pedro District of Los Angeles by importing the geographic and demographic data of that region into the platform. The simulation results demonstrate a strong consistency between the charging station placement and the traffic flow of EVs. The results also reveal an interesting phenomenon that service providers prefer clustering instead of spatial separation in this oligopoly market.

Submitted 7 January, 2018; originally announced January 2018.

Comments: 7 pages, 5 figures, Vehicular Technology Conference (VTC Spring), 2015 IEEE 81st. arXiv admin note: text overlap with arXiv:1801.02129

arXiv:1801.02129 [pdf, other]

doi 10.1109/TSG.2015.2508740

Placement of EV Charging Stations --- Balancing Benefits among Multiple Entities

Authors: Chao Luo, Yih-Fang Huang, Vijay Gupta

Abstract: This paper studies the problem of multi-stage placement of electric vehicle (EV) charging stations with incremental EV penetration rates. A nested logit model is employed to analyze the charging preference of the individual consumer (EV owner), and predict the aggregated charging demand at the charging stations. The EV charging industry is modeled as an oligopoly where the entire market is dominated by a few charging service providers (oligopolists). At the beginning of each planning stage, an optimal placement policy for each service provider is obtained through analyzing strategic interactions in a Bayesian game. To derive the optimal placement policy, we consider both the transportation network graph and the electric power network graph. A simulation software package --- The EV Virtual City 1.0 --- is developed using Java to investigate the interactions among the consumers (EV owners), the transportation network graph, the electric power network graph, and the charging stations. Through a series of experiments using the geographic and demographic data from the San Pedro District of Los Angeles, we show that the charging station placement is highly consistent with the heatmap of the traffic flow. In addition, we observe a spatial economic phenomenon that service providers prefer clustering instead of separation in the EV charging market.

Submitted 6 January, 2018; originally announced January 2018.

Comments: 10 pages, 10 figures

Journal ref: IEEE Transactions on Smart Grid, vol. 8, no. 2, pp. 759 - 768, 2015

arXiv:1801.02128 [pdf, other]

doi 10.1109/TSG.2017.2696493

Stochastic Dynamic Pricing for EV Charging Stations with Renewables Integration and Energy Storage

Authors: Chao Luo, Yih-Fang Huang, Vijay Gupta

Abstract: This paper studies the problem of stochastic dynamic pricing and energy management policy for electric vehicle (EV) charging service providers. In the presence of renewable energy integration and an energy storage system, EV charging service providers must deal with multiple uncertainties --- charging demand volatility, inherent intermittency of renewable energy generation, and wholesale electricity price fluctuation. The motivation behind our work is to offer guidelines for charging service providers to determine proper charging prices and manage electricity to balance the competing objectives of improving profitability, enhancing customer satisfaction, and reducing impact on the power grid in spite of these uncertainties. We propose a new metric to assess the impact on the power grid without solving complete power flow equations. To protect service providers from severe financial losses, a safeguard of profit is incorporated in the model. Two algorithms --- a stochastic dynamic programming (SDP) algorithm and a greedy algorithm (benchmark algorithm) --- are applied to derive the pricing and electricity procurement policy. A Pareto front of the multiobjective optimization is derived. Simulation results show that using the SDP algorithm can achieve up to 7% profit gain over using the greedy algorithm. Additionally, we observe that the charging service provider is able to reshape spatial-temporal charging demands to reduce the impact on the power grid via pricing signals.

Submitted 6 January, 2018; originally announced January 2018.

Comments: 13 pages, IEEE Transactions on Smart Grid, 2017

arXiv:1704.06917 [pdf, ps, other]

Identify Critical Branches with Cascading Failure Chain Statistics and Hypertext-Induced Topic Search Algorithm

Authors: Chao Luo, Jun Yang

Abstract: An effective way to suppress the cascading failure risk is the branch capacity upgrade, whose optimal decision making, however, may incur high computational burden. A practical way is to find out some critical branches as the candidates in advance. This paper proposes a simulation data oriented approach to identify the critical branches with higher importance in cascading failure propagation. First, a concept of cascading failure chain (CFC) is introduced and numerous samples of CFC are generated with an AC power flow based cascading failure simulator. Second, a directed weighted graph is constructed, whose edges denote the severities of branch interactions. Third, the weighted hypertext-induced topic search (HITS) algorithm is used to rate and rank this graph's vertices, through which the critical branches can be identified accordingly. Validations on the IEEE 118-bus and RTS96 systems show that the proposed approach can identify critical branches whose capacity upgrades suppress cascading failure risk more effectively. Moreover, it is also shown that the structural importance of a branch does not agree with its importance in cascading failure, which indicates the effectiveness of the proposed approach compared with identification methods based on structural vulnerabilities.

Submitted 23 April, 2017; originally announced April 2017.

Showing 1–43 of 43 results for author: Luo, C