-
Interference Analysis for Coexistence of UAVs and Civil Aircrafts Based on Automatic Dependent Surveillance-Broadcast
Authors:
Yiyang Liao,
Ziye Jia,
Chao Dong,
Lei Zhang,
Qihui Wu,
Huiling Hu,
Zhu Han
Abstract:
Due to the advantages of high mobility and easy deployment, unmanned aerial vehicles (UAVs) are widely applied in both military and civilian fields. To strengthen the flight surveillance of UAVs and guarantee airspace safety, UAVs can be equipped with the automatic dependent surveillance-broadcast (ADS-B) system, which periodically sends flight information to other aircraft and ground stations (GSs). However, because channel capacity is limited, equipping UAVs with ADS-B causes interference between UAVs and civil aircraft (CAs), which in turn degrades the accuracy of the information received at GSs. In detail, the channel capacity is mainly affected by the density of aircraft and the transmitting power of ADS-B. Hence, based on the three-dimensional Poisson point process, this work leverages stochastic geometry theory to build a model of the coexistence of UAVs and CAs and to analyze the interference performance of the ADS-B monitoring system. From simulation results, we reveal the effects of transmitting power, density, threshold, and path loss on the performance of the ADS-B monitoring system. In addition, we provide the suggested transmitting power and density for the safe coexistence of UAVs and CAs.
Submitted 12 June, 2024;
originally announced June 2024.
-
Joint ADS-B in 5G for Hierarchical Aerial Networks: Performance Analysis and Optimization
Authors:
Ziye Jia,
Yiyang Liao,
Chao Dong,
Lijun He,
Qihui Wu,
Lei Zhang
Abstract:
Unmanned aerial vehicles (UAVs) are widely applied in multiple fields, which emphasizes the challenge of obtaining UAV flight information to ensure airspace safety. UAVs equipped with automatic dependent surveillance-broadcast (ADS-B) devices are capable of sending flight information to nearby aircraft and ground stations (GSs). However, the saturation of the limited ADS-B frequency bands leads to interference among UAVs and impairs the GS's monitoring of civil planes. To address this issue, this paper proposes integrating the 5th generation mobile communication technology (5G) with ADS-B for UAV operations. Specifically, a hierarchical structure is proposed, in which the high-altitude central UAV is equipped with ADS-B and the low-altitude central UAV utilizes 5G modules to transmit flight information. Meanwhile, based on the mobile edge computing technique, the flight information of sub-UAVs is offloaded to the central UAV for further processing and then transmitted to the GS. We present a deterministic model and a stochastic geometry based model to build the air-to-ground channel and air-to-air channel, respectively. The effectiveness of the proposed monitoring system is verified via simulations and experiments. This research contributes to improving airspace safety and advancing air traffic flow management.
Submitted 29 April, 2024;
originally announced May 2024.
-
Towards Real-world Video Face Restoration: A New Benchmark
Authors:
Ziyan Chen,
Jingwen He,
Xinqi Lin,
Yu Qiao,
Chao Dong
Abstract:
Blind face restoration (BFR) on images has progressed significantly over the last several years, while real-world video face restoration (VFR), which is more challenging because it involves more complex face motions such as moving gaze directions and facial orientations, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face images, which are limited in their coverage of real-world video frames. In this work, we introduced new real-world datasets named FOS with a taxonomy of "Full, Occluded, and Side" faces, drawn mainly from video frames, to study the applicability of current methods on videos. Compared with existing test datasets, FOS datasets cover more diverse degradations and involve face samples from more complex scenarios, which helps to revisit current face restoration approaches more comprehensively. Given the established datasets, we benchmarked both the state-of-the-art BFR methods and the video super resolution (VSR) methods to comprehensively study current approaches, identifying their potential and limitations in VFR tasks. In addition, we studied the effectiveness of the commonly used image quality assessment (IQA) metrics and face IQA (FIQA) metrics by leveraging a subjective user study. With extensive experimental results and detailed analysis provided, we gained insights from the successes and failures of both current BFR and VSR methods. These results also pose challenges to current face restoration approaches, which we hope will stimulate future advances in VFR research.
Submitted 4 May, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Three-Dimension Collision-Free Trajectory Planning of UAVs Based on ADS-B Information in Low-Altitude Urban Airspace
Authors:
Chao Dong,
Yifan Zhang,
Ziye Jia,
Yiyang Liao,
Lei Zhang,
Qihui Wu
Abstract:
The environment of low-altitude urban airspace is complex and variable due to numerous obstacles, non-cooperative aircraft, and birds. For unmanned aerial vehicles (UAVs), leveraging environmental information to achieve three-dimensional collision-free trajectory planning is a prerequisite for ensuring airspace security. However, timely information about the surrounding situation is difficult for UAVs to acquire, which brings further security risks. As a mature technology in traditional civil aviation, automatic dependent surveillance-broadcast (ADS-B) realizes continuous surveillance of aircraft information. Consequently, we leverage ADS-B for surveillance and information broadcasting, and divide the airspace into multiple sub-airspaces to improve flight safety in UAV trajectory planning. In detail, we propose the secure sub-airspaces planning (SSP) algorithm and the particle swarm optimization rapidly-exploring random trees (PSO-RRT) algorithm for UAV trajectory planning in low-altitude airspace. The performance of the proposed algorithms is verified by simulations, and the results show that SSP reduces both the maximum number of UAVs in the sub-airspace and the length of the trajectory, and PSO-RRT reduces the cost of UAV trajectory in the sub-airspace.
Submitted 29 April, 2024;
originally announced April 2024.
-
Coexisting Passive RIS and Active Relay Assisted NOMA Systems
Authors:
Ao Huang,
Li Guo,
Xidong Mu,
Chao Dong,
Yuanwei Liu
Abstract:
A novel coexisting passive reconfigurable intelligent surface (RIS) and active decode-and-forward (DF) relay assisted non-orthogonal multiple access (NOMA) transmission framework is proposed. In particular, two communication protocols are conceived, namely Hybrid NOMA (H-NOMA) and Full NOMA (F-NOMA). Based on the two proposed protocols, both the sum rate maximization and max-min rate fairness problems are formulated for jointly optimizing the power allocation at the access point and relay as well as the passive beamforming design at the RIS. To tackle the non-convex problems, an alternating optimization (AO) based algorithm is first developed, where the transmit power and the RIS phase-shift are alternately optimized by leveraging the two-dimensional search and rank-relaxed difference-of-convex (DC) programming, respectively. Then, a two-layer penalty based joint optimization (JO) algorithm is developed to jointly optimize the resource allocation coefficients within each iteration. Finally, numerical results demonstrate that: i) the proposed coexisting RIS and relay assisted transmission framework is capable of achieving a significant user performance improvement over conventional schemes without RIS or relay; ii) compared with the AO algorithm, the JO algorithm requires less execution time at the cost of a slight performance loss; and iii) the H-NOMA and F-NOMA protocols are generally preferable for ensuring user rate fairness and enhancing user sum rate, respectively.
Submitted 22 March, 2024;
originally announced March 2024.
-
Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding
Authors:
Cunhui Dong,
Haichuan Ma,
Haotian Zhang,
Changsheng Gao,
Li Li,
Dong Liu
Abstract:
Neural network-based image coding has been developing rapidly since its birth. By 2022, its performance had surpassed that of the best-performing traditional image coding framework -- H.266/VVC. Witnessing such success, the IEEE 1857.11 working subgroup initiated a neural network-based image coding standard project and issued a corresponding call for proposals (CfP). In response to the CfP, this paper introduces a novel wavelet-like transform-based end-to-end image coding framework -- iWaveV3. iWaveV3 incorporates many new features such as affine wavelet-like transform, perceptual-friendly quality metric, and more advanced training and online optimization strategies into our previous wavelet-like transform-based framework iWave++. While preserving the features of supporting lossy and lossless compression simultaneously, iWaveV3 also achieves state-of-the-art compression efficiency for objective quality and is very competitive for perceptual quality. As a result, iWaveV3 is adopted as a candidate scheme for developing the IEEE Standard for neural-network-based image coding.
Submitted 9 March, 2024;
originally announced March 2024.
-
Semantic Importance-Aware Based for Multi-User Communication Over MIMO Fading Channels
Authors:
Haotai Liang,
Zhicheng Bao,
Wannian An,
Chen Dong,
Xiaodong Xu
Abstract:
Semantic communication, as a novel communication paradigm, has attracted the interest of many scholars, with multi-user, multi-input multi-output (MIMO) scenarios being one of the critical contexts. This paper presents a semantic importance-aware based communication system (SIA-SC) over MIMO Rayleigh fading channels. Combining the semantic symbols' inequality and the equivalent subchannels of MIMO channels based on Singular Value Decomposition (SVD) maximizes the end-to-end semantic performance through the new layer mapping method. For multi-user scenarios, a method of semantic interference cancellation is proposed. Furthermore, a new metric, namely semantic information distortion (SID), is established to unify the expressions of semantic performance, which is affected by channel bandwidth ratio (CBR) and signal-to-noise ratio (SNR). With the help of the proposed metric, we derived performance expressions and Semantic Outage Probability (SOP) of SIA-SC for Single-User Single-Input Single-Output (SU-SISO), Single-User MIMO (SU-MIMO), Multi-User SISO (MU-SISO) and Multi-User MIMO (MU-MIMO) scenarios. Numerical experiments show that SIA-SC can significantly improve semantic performance across various scenarios.
Submitted 26 December, 2023;
originally announced December 2023.
-
UAV Trajectory Tracking via RNN-enhanced IMM-KF with ADS-B Data
Authors:
Yian Zhu,
Ziye Jia,
Qihui Wu,
Chao Dong,
Zirui Zhuang,
Huiling Hu,
Qi Cai
Abstract:
With the increasing use of autonomous unmanned aerial vehicles (UAVs), it is critical to ensure that they are continuously tracked and controlled, especially when UAVs operate beyond the communication range of ground stations (GSs). Conventional surveillance methods for UAVs, such as satellite communications, ground mobile networks, and radars, are subject to high costs and latency. Automatic dependent surveillance-broadcast (ADS-B) emerges as a promising method to monitor UAVs, due to its real-time capabilities, easy deployment, and affordable cost. Therefore, we employ ADS-B for UAV trajectory tracking in this work. However, the inherent noise in the transmitted data poses an obstacle to precisely tracking UAVs. Hence, we propose a recurrent neural network-enhanced interacting multiple model-Kalman filter (RNN-enhanced IMM-KF) algorithm for UAV trajectory filtering. Specifically, the algorithm utilizes the RNN to capture the maneuvering behavior of UAVs and the noise level in the ADS-B data. Moreover, accurate UAV tracking is achieved by adaptively adjusting the process noise matrix and observation noise matrix of IMM-KF with the assistance of the RNN. The proposed algorithm can help GSs make timely decisions during trajectory deviations of UAVs and improve airspace safety. Finally, comprehensive simulations show that the total root mean square error of the proposed algorithm decreases by 28.56% compared to the traditional IMM-KF.
Submitted 25 December, 2023;
originally announced December 2023.
-
Semantic Synchronization for Enhanced Reliability in Communication Systems
Authors:
Xiaoyi Liu,
Haotai Liang,
Chen Dong,
Xiaodong Xu
Abstract:
As a new communication paradigm, semantic communication has received widespread attention in communication fields. However, since the decoding of semantic signals relies on contextual knowledge, misalignment between the starting position of the semantic signal and the AI-based semantic decoder would prevent source signal recovery and reconstruction. To achieve more precise semantic communication, this study proposes an image-based semantic synchronization method leveraging intrinsic semantic features of image content. Specifically, a shared synchronized image (SyncImg) is encoded into a synchronization vector header at the transmitter and sent to the receiver. The receiver adopts a sliding window semantic decoder combined with classification and template matching methods to locate the synchronization point. Experimental results demonstrate that compared with traditional methods, the proposed method achieves a lower miss detected ratio (MDR) and root-mean-square error (RMSE) under low signal-to-noise ratios, realizing accurate synchronization of semantic signals across different devices.
Submitted 2 December, 2023;
originally announced December 2023.
-
Semantics-Division Duplexing: A Novel Full-Duplex Paradigm
Authors:
Kai Niu,
Zijian Liang,
Chao Dong,
Jincheng Dai,
Zhongwei Si,
Ping Zhang
Abstract:
In-band full-duplex (IBFD) is a theoretically effective solution to increase the overall throughput for the future wireless communications system by enabling transmission and reception over the same time-frequency resources. However, reliable source reconstruction remains a great challenge in the practical IBFD systems due to the non-ideal elimination of the self-interference and the inherent limitations of the separate source and channel coding methods. On the other hand, artificial intelligence-enabled semantic communication can provide a viable direction for the optimization of the IBFD system. This article introduces a novel IBFD paradigm with the guidance of semantic communication called semantics-division duplexing (SDD). It utilizes semantic domain processing to further suppress self-interference, distinguish the expected semantic information, and recover the desired sources. Further integration of the digital and semantic domain processing can be implemented so as to achieve intelligent and concise communications. We present the advantages of the SDD paradigm with theoretical explanations and provide some visualized results to verify its effectiveness.
Submitted 14 December, 2023;
originally announced December 2023.
-
Ultrasensitive Textile Strain Sensors Redefine Wearable Silent Speech Interfaces with High Machine Learning Efficiency
Authors:
Chenyu Tang,
Muzi Xu,
Wentian Yi,
Zibo Zhang,
Edoardo Occhipinti,
Chaoqun Dong,
Dafydd Ravenscroft,
Sung-Min Jung,
Sanghyo Lee,
Shuo Gao,
Jong Min Kim,
Luigi G. Occhipinti
Abstract:
Our research presents a wearable Silent Speech Interface (SSI) technology that excels in device comfort, time-energy efficiency, and speech decoding accuracy for real-world use. We developed a biocompatible, durable textile choker with an embedded graphene-based strain sensor, capable of accurately detecting subtle throat movements. This sensor, surpassing other strain sensors in sensitivity by 420%, simplifies signal processing compared to traditional voice recognition methods. Our system uses a computationally efficient neural network, specifically a one-dimensional convolutional neural network with residual structures, to decode speech signals. This network is energy and time-efficient, reducing computational load by 90% while achieving 95.25% accuracy for a 20-word lexicon and swiftly adapting to new users and words with minimal samples. This innovation demonstrates a practical, sensitive, and precise wearable SSI suitable for daily communication applications.
Submitted 7 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Performance Analysis of MDMA-Based Cooperative MRC Networks with Relays in Dissimilar Rayleigh Fading Channels
Authors:
Lei Teng,
Wannian An,
Chen Dong,
Xiaoqi Qin,
Xiaodong Xu
Abstract:
Multiple access technology is a key technology in various generations of wireless communication systems. As a potential multiple access technology for next-generation wireless communication systems, model division multiple access (MDMA) improves spectrum efficiency and feasibility regions. This implies that the MDMA scheme can achieve greater performance gains compared to traditional schemes. Relay-assisted cooperative networks, as an infrastructure of wireless communication, can effectively utilize resources and improve performance when MDMA is applied. In this paper, a relay cooperative communication network based on MDMA in dissimilar Rayleigh fading channels is proposed, which consists of two source nodes, any number of decode-and-forward (DF) relay nodes, and one destination node, and uses maximal ratio combining (MRC) at the destination to combine the signals received from the source and relays. By applying the state transition matrix (STM) and moment generating function (MGF), closed-form analytical solutions for outage probability and resource utilization efficiency are derived. Theoretical and simulation results are presented to verify the validity of the theoretical analysis.
Submitted 27 November, 2023;
originally announced November 2023.
-
Physics-Informed Data Denoising for Real-Life Sensing Systems
Authors:
Xiyuan Zhang,
Xiaohan Fu,
Diyan Teng,
Chengyu Dong,
Keerthivasan Vijayakumar,
Jiayun Zhang,
Ranak Roy Chowdhury,
Junsheng Han,
Dezhi Hong,
Rashmi Kulkarni,
Jingbo Shang,
Rajesh Gupta
Abstract:
Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically rely on using ground truth clean data to train a denoising model, which is often challenging or prohibitive to obtain for many real-world applications. We observe that in many scenarios, the relationships between different sensor measurements (e.g., location and acceleration) are analytically described by laws of physics (e.g., second-order differential equation). By incorporating such physics constraints, we can guide the denoising process to improve even in the absence of ground truth data. In light of this, we design a physics-informed denoising model that leverages the inherent algebraic relationships between different measurements governed by the underlying physics. By obviating the need for ground truth clean data, our method offers a practical denoising solution for real-world applications. We conducted experiments in various domains, including inertial navigation, CO2 monitoring, and HVAC control, and achieved state-of-the-art performance compared with existing denoising methods. Our method can denoise data in real time (4ms for a sequence of 1s) for low-cost noisy sensors and produces results that closely align with those from high-precision, high-cost alternatives, leading to an efficient, cost-effective approach for more accurate sensor-based systems.
Submitted 12 November, 2023;
originally announced November 2023.
-
Unifying Image Processing as Visual Prompting Question Answering
Authors:
Yihao Liu,
Xiangyu Chen,
Xianzheng Ma,
Xintao Wang,
Jiantao Zhou,
Yu Qiao,
Chao Dong
Abstract:
Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications. Traditionally, task-specific models are developed for individual tasks and designing such models requires distinct expertise. Building upon the success of large language models (LLMs) in natural language processing (NLP), there is a similar trend in computer vision, which focuses on developing large-scale models through pretraining and in-context learning. This paradigm shift reduces the reliance on task-specific models, yielding a powerful unified model to deal with various tasks. However, these advances have predominantly concentrated on high-level vision tasks, with less attention paid to low-level vision tasks. To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction tasks, etc. Our proposed framework, named PromptGIP, unifies these diverse image processing tasks within a universal framework. Inspired by NLP question answering (QA) techniques, we employ a visual prompting question answering paradigm. Specifically, we treat the input-output image pair as a structured question-answer sentence, thereby reprogramming the image processing task as a prompting QA problem. PromptGIP can undertake diverse cross-domain tasks using provided visual prompts, eliminating the need for task-specific finetuning. Our methodology offers a universal and adaptive solution to general image processing. While PromptGIP has demonstrated a certain degree of out-of-domain task generalization capability, further research is expected to fully explore its more powerful emergent generalization.
Submitted 20 February, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
UAV Swarm Deployment and Trajectory for 3D Area Coverage via Reinforcement Learning
Authors:
Jia He,
Ziye Jia,
Chao Dong,
Junyu Liu,
Qihui Wu,
Jingxian Liu
Abstract:
Unmanned aerial vehicles (UAVs) are recognized as promising technologies for area coverage due to their flexibility and adaptability. However, the ability of a single UAV is limited, and in large-scale three-dimensional (3D) scenarios, UAV swarms can establish seamless wireless communication services. Hence, in this work, we consider a scenario of UAV swarm deployment and trajectory planning to satisfy 3D coverage while considering the effects of obstacles. In detail, we propose a hierarchical swarm framework to efficiently serve large-area users. Then, the problem is formulated to minimize the total trajectory loss of the UAV swarm. However, the problem is intractable due to its non-convexity, and we decompose it into the smaller subproblems of user clustering, UAV swarm hovering point selection, and swarm trajectory determination. Moreover, we design a Q-learning based algorithm to improve the solution efficiency. Finally, we conduct extensive simulations to verify the proposed mechanisms, and the designed algorithm outperforms the other methods considered.
Submitted 21 September, 2023;
originally announced September 2023.
-
Symbol Detection for Coarsely Quantized OTFS
Authors:
Junwei He,
Haochuan Zhang,
Chao Dong,
Huimin Zhu
Abstract:
This paper explicitly models a coarse and noisy quantization in a communication system empowered by orthogonal time frequency space (OTFS) for cost and power efficiency. We first point out that, with coarse quantization, the effective channel is imbalanced and thus no longer able to circularly shift the transmitted symbols along the delay-Doppler domain. Meanwhile, the effective channel is non-isotropic, which imposes a significant loss on symbol detection algorithms like the original approximate message passing (AMP). Although the algorithm of generalized expectation consistent for signal recovery (GEC-SR) can mitigate this loss, its computational complexity is prohibitively high, mainly due to a dramatic increase in the matrix size of OTFS. In this context, we propose a low-complexity algorithm that incorporates into the GEC-SR a quick inversion of quasi-banded matrices, reducing the complexity from a cubic order to a linear order while keeping the performance at the same level.
Submitted 20 January, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Introducing Shape Prior Module in Diffusion Model for Medical Image Segmentation
Authors:
Zhiqing Zhang,
Guojia Fan,
Tianyong Liu,
Nan Li,
Yuyang Liu,
Ziyu Liu,
Canwei Dong,
Shoujun Zhou
Abstract:
Medical image segmentation is critical for diagnosing and treating spinal disorders. However, the presence of high noise, ambiguity, and uncertainty makes this task highly challenging. Factors such as unclear anatomical boundaries, inter-class similarities, and irrational annotations contribute to this challenge. Achieving both accurate and diverse segmentation templates is essential to support radiologists in clinical practice. In recent years, denoising diffusion probabilistic modeling (DDPM) has emerged as a prominent research topic in computer vision. It has demonstrated effectiveness in various vision tasks, including image deblurring, super-resolution, anomaly detection, and even semantic representation generation at the pixel level. Despite the robustness of existing diffusion models in visual generation tasks, they still struggle with discrete masks and their various effects. To address the need for accurate and diverse spine medical image segmentation templates, we propose an end-to-end framework called VerseDiff-UNet, which leverages the denoising diffusion probabilistic model (DDPM). Our approach integrates the diffusion model into a standard U-shaped architecture. At each step, we combine the noise-added image with the labeled mask to guide the diffusion direction accurately towards the target region. Furthermore, to capture specific anatomical a priori information in medical images, we incorporate a shape a priori module. This module efficiently extracts structural semantic information from the input spine images. We evaluate our method on a single dataset of spine images acquired through X-ray imaging. Our results demonstrate that VerseDiff-UNet significantly outperforms other state-of-the-art methods in terms of accuracy while preserving the natural features and variations of anatomy.
Submitted 11 September, 2023;
originally announced September 2023.
-
Towards Efficient SDRTV-to-HDRTV by Learning from Image Formation
Authors:
Xiangyu Chen,
Zheyuan Li,
Zhengwen Zhang,
Jimmy S. Ren,
Yihao Liu,
Jingwen He,
Yu Qiao,
Jiantao Zhou,
Chao Dong
Abstract:
Modern displays are capable of rendering video content with high dynamic range (HDR) and wide color gamut (WCG). However, the majority of available resources are still in standard dynamic range (SDR). As a result, there is significant value in transforming existing SDR content into the HDRTV standard. In this paper, we define and analyze the SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV content. Our analysis and observations indicate that a naive end-to-end supervised training pipeline suffers from severe gamut transition errors. To address this issue, we propose a novel three-step solution pipeline called HDRTVNet++, which includes adaptive global color mapping, local enhancement, and highlight refinement. The adaptive global color mapping step uses global statistics as guidance to perform image-adaptive color mapping. A local enhancement network is then deployed to enhance local details. Finally, we combine the two sub-networks above as a generator and achieve highlight consistency through GAN-based joint training. Our method is primarily designed for ultra-high-definition TV content and is therefore effective and lightweight for processing 4K resolution images. We also construct a dataset using HDR videos in the HDR10 standard, named HDRTV1K, that contains 1235 training images and 117 testing images, all in 4K resolution. In addition, we select five metrics to evaluate the results of SDRTV-to-HDRTV algorithms. Our final results demonstrate state-of-the-art performance both quantitatively and visually. The code, model and dataset are available at https://github.com/xiaom233/HDRTVNet-plus.
Submitted 7 September, 2023;
originally announced September 2023.
-
A balanced Memristor-CMOS ternary logic family and its application
Authors:
Xiao-Yuan Wang,
Jia-Wei Zhou,
Chuan-Tao Dong,
Xin-Hui Chen,
Sanjoy Kumar Nandi,
Robert G. Elliman,
Sung-Mo Kang,
Herbert Ho-Ching Iu
Abstract:
The design of balanced ternary digital logic circuits based on memristors and conventional CMOS devices is proposed. First, balanced ternary minimum gate TMIN, maximum gate TMAX and ternary inverters are systematically designed and verified by simulation, and then logic circuits such as ternary encoders, decoders and multiplexers are designed on this basis. Two different schemes are then used to realize the design of functional combinational logic circuits such as a balanced ternary half adder, multiplier, and numerical comparator. Finally, we report a series of comparisons and analyses of the two design schemes, which provide a reference for subsequent research and development of three-valued logic circuits.
Submitted 4 September, 2023;
originally announced September 2023.
-
Novel Online-Offline MA2C-DDPG for Efficient Spectrum Allocation and Trajectory Optimization in Dynamic Spectrum Sharing UAV Networks
Authors:
Rui Ding,
Fuhui Zhou,
Yuben Qu,
Chao Dong,
Qihui Wu,
Tony Q. S. Quek
Abstract:
Unmanned aerial vehicle (UAV) communication is of crucial importance for diverse practical applications. However, it is susceptible to the severe spectrum scarcity problem and interference since it operates in the unlicensed spectrum band. In order to tackle those issues, a dynamic spectrum sharing network is considered with the anti-jamming technique. Moreover, an intelligent spectrum allocation and trajectory optimization scheme is proposed to adapt to diverse jamming models by exploiting our designed novel online-offline multi-agent actor-critic and deep deterministic policy-gradient framework. Simulation results demonstrate the high efficiency of our proposed framework. It is also shown that our proposed scheme achieves the largest transmission rate among all benchmark schemes.
Submitted 27 August, 2023; v1 submitted 4 August, 2023;
originally announced August 2023.
-
Impact of UAVs Equipped with ADS-B on the Civil Aviation Monitoring System
Authors:
Yiyang Liao,
Lei Zhang,
Ziye Jia,
Chao Dong,
Yifan Zhang,
Qihui Wu,
Huiling Hu,
Bin Wang
Abstract:
In recent years, there has been an increasing demand for unmanned aerial vehicles (UAVs) across multiple applications. However, as unmanned equipment, UAVs pose security risks to general civil aviation. In order to strengthen the flight management of UAVs and guarantee safety, UAVs can be equipped with automatic dependent surveillance-broadcast (ADS-B) devices. As an automatic system, ADS-B periodically broadcasts flight information to nearby aircraft or ground stations, and the technology is already used in civil aviation systems. However, due to the limited frequency resources of ADS-B, UAVs equipped with ADS-B devices cause packet loss for both UAVs and civil aviation, and the operation of civil aviation is seriously interfered with. Hence, this paper first examines the packet loss of civil planes at different distances, then analyzes the impact of UAVs equipped with ADS-B on the packet updating of civil planes. The results indicate that blocking of the 1090 MHz band is affected by the density of UAVs, and that the frequency capacity is affected by the required updating interval of civil planes. The position updating probability within 3 s is 92.3% if there are 200 planes within 50 km and 20 UAVs within 5 km, and 86.9% if there are 200 planes within 50 km and 40 UAVs within 5 km.
Submitted 4 July, 2023;
originally announced July 2023.
-
The Potential of LEO Satellites in 6G Space-Air-Ground Enabled Access Networks
Authors:
Ziye Jia,
Chao Dong,
Kun Guo,
Qihui Wu
Abstract:
Space-air-ground integrated networks (SAGINs) help enhance the service performance in the sixth generation communication system. SAGIN is basically composed of satellites, aerial vehicles, ground facilities, as well as multiple terrestrial users. Therein, the low earth orbit (LEO) satellites are popular in recent years due to the low cost of development and launch, global coverage and delay-enabled services. Moreover, LEO satellites can support various applications, e.g., direct access, relay, caching and computation. In this work, we firstly provide the preliminaries and framework of SAGIN, in which the characteristics of LEO satellites, high altitude platforms, as well as unmanned aerial vehicles are analyzed. Then, the roles and potentials of LEO satellite in SAGIN are analyzed for access services. A couple of advanced techniques such as multi-access edge computing (MEC) and network function virtualization are introduced to enhance the LEO-based access service abilities as hierarchical MEC and network slicing in SAGIN. In addition, corresponding use cases are provided to verify the propositions. Besides, we also discuss the open issues and promising directions in LEO-enabled SAGIN access services for the future research.
Submitted 1 July, 2023;
originally announced July 2023.
-
Crafting Training Degradation Distribution for the Accuracy-Generalization Trade-off in Real-World Super-Resolution
Authors:
Ruofan Zhang,
Jinjin Gu,
Haoyu Chen,
Chao Dong,
Yulun Zhang,
Wenming Yang
Abstract:
Super-resolution (SR) techniques designed for real-world applications commonly encounter two primary challenges: generalization performance and restoration accuracy. We demonstrate that when methods are trained using complex, large-range degradations to enhance generalization, a decline in accuracy is inevitable. However, since the degradation in a given real-world application typically exhibits a limited variation range, it becomes feasible to strike a trade-off between generalization performance and testing accuracy within this scope. In this work, we introduce a novel approach to craft training degradation distributions using a small set of reference images. Our strategy is founded upon the binned representation of the degradation space and the Fréchet distance between degradation distributions. Our results indicate that the proposed technique significantly improves performance on test images while preserving generalization capabilities in real-world applications.
Submitted 1 June, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Latent Semantic Diffusion-based Channel Adaptive De-Noising SemCom for Future 6G Systems
Authors:
Bingxuan Xu,
Rui Meng,
Yue Chen,
Xiaodong Xu,
Chen Dong,
Hao Sun
Abstract:
Compared with the current Shannon's Classical Information Theory (CIT) paradigm, semantic communication (SemCom) has recently attracted more attention, since it aims to transmit the meaning of information rather than bit-by-bit transmission, thus enhancing data transmission efficiency and supporting future human-centric, data-, and resource-intensive intelligent services in 6G systems. Nevertheless, channel noise is common and even serious in 6G-empowered scenarios, limiting the communication performance of SemCom, especially when signal-to-noise ratio (SNR) levels during the training and deployment stages are different, while training multiple networks to cover a broad range of SNRs is computationally inefficient. Hence, we develop a novel De-Noising SemCom (DNSC) framework, where the designed de-noiser module can eliminate noise interference from semantic vectors. Upon the designed DNSC architecture, we further combine adversarial learning, variational autoencoder, and diffusion model to propose the Latent Diffusion DNSC (Latent-Diff DNSC) scheme to realize intelligent online de-noising. During the offline training phase, noise is added to latent semantic vectors in a forward Markov diffusion manner and then eliminated in a reverse diffusion manner through the posterior distribution approximated by the U-shaped Network (U-Net), where the semantic de-noiser is optimized by maximizing the evidence lower bound (ELBO). Such a design can model real noisy channel environments with various SNRs and enables adaptive removal of noise from noisy semantic vectors during the online transmission phase. Simulations on open-source image datasets demonstrate the superiority of the proposed Latent-Diff DNSC scheme in PSNR and SSIM across different SNRs over state-of-the-art schemes, including JPEG, Deep JSCC, and ADJSCC.
Submitted 19 April, 2023;
originally announced April 2023.
-
Distributionally Robust Chance-Constrained Optimization for Hierarchical UAV-based MEC
Authors:
Can Cui,
Ziye Jia,
Chao Dong,
Zhuang Ling,
Jiahao You,
Qihui Wu
Abstract:
Multi-access edge computing (MEC) is regarded as a promising technology in sixth-generation communication. However, the antenna gain is always affected by the environment when unmanned aerial vehicles (UAVs) serve as MEC platforms, resulting in unexpected channel errors. In order to deal with this problem and reduce the power consumption in UAV-based MEC, we jointly optimize the access scheme and power allocation in the hierarchical UAV-based MEC. Specifically, UAVs are deployed in the lower layer to collect data from ground users. Moreover, a UAV with powerful computation ability is deployed in the upper layer to assist with computing. The goal is to guarantee the quality of service and minimize the total power consumption. We consider the errors caused by various perturbations in realistic circumstances and formulate a distributionally robust chance-constrained optimization problem with an uncertainty set. The problem with chance constraints is intractable. To tackle this issue, we utilize the conditional value-at-risk method to reformulate the problem into a semidefinite programming form. Then, a joint algorithm for access scheme and power allocation is designed. Finally, we conduct simulations to demonstrate the efficiency of the proposed algorithm.
Submitted 13 March, 2023;
originally announced March 2023.
-
Non-Orthogonal Multiple Access Enhanced Multi-User Semantic Communication
Authors:
Weizhi Li,
Haotai Liang,
Chen Dong,
Xiaodong Xu,
Ping Zhang,
Kaijun Liu
Abstract:
Semantic communication serves as a novel paradigm and attracts the broad interest of researchers. One critical aspect of it is the multi-user semantic communication theory, which can further promote its application to the practical network environment. While most existing works focus on the design of end-to-end single-user semantic transmission, a novel non-orthogonal multiple access (NOMA)-based multi-user semantic communication system named NOMASC is proposed in this paper. The proposed system can support semantic transmission of multiple users with diverse modalities of source information. To avoid high demand for hardware, an asymmetric quantizer is employed at the end of the semantic encoder for discretizing the continuous full-resolution semantic feature. In addition, a neural network model is proposed for mapping the discrete feature into self-learned symbols and accomplishing intelligent multi-user detection (MUD) at the receiver. Simulation results demonstrate that the proposed system performs well in non-orthogonal transmission of multiple user signals and outperforms the other methods, especially at low-to-medium SNRs. Moreover, it is highly robust under various simulation settings and mismatched test scenarios.
Submitted 20 November, 2023; v1 submitted 12 March, 2023;
originally announced March 2023.
-
SFC Deployment in Space-Air-Ground Integrated Networks Based on Matching Game
Authors:
Yilu Cao,
Ziye Jia,
Chao Dong,
Yanting Wang,
Jiahao You,
Qihui Wu
Abstract:
The space-air-ground integrated network (SAGIN) is dynamic and flexible, and can support transmitting data in environments lacking ground communication facilities. However, the nodes of SAGIN are heterogeneous, and it is intractable to share the resources to provide multiple services. Therefore, in this paper, we consider using network function virtualization technology to handle the problem of agile resource allocation. In particular, service function chains (SFCs) are constructed to deploy multiple virtual network functions of different tasks. To depict the dynamic model of SAGIN, we propose the reconfigurable time extension graph. Then, an optimization problem is formulated to maximize the number of completed tasks, i.e., the successfully deployed SFCs. It is a mixed integer linear programming problem, which is hard to solve within limited time. Hence, we transform it into a many-to-one two-sided matching game problem. Then, we design a Gale-Shapley based algorithm. Finally, extensive simulations verify that the designed algorithm can effectively deploy SFCs with efficient resource utilization.
Submitted 2 March, 2023;
originally announced March 2023.
-
OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer
Authors:
Fanghua Yu,
Xintao Wang,
Mingdeng Cao,
Gen Li,
Ying Shan,
Chao Dong
Abstract:
Omnidirectional images (ODIs) have obtained lots of research interest for immersive experiences. Although ODIs require extremely high resolution to capture details of the entire scene, the resolutions of most ODIs are insufficient. Previous methods attempt to solve this issue by image super-resolution (SR) on equirectangular projection (ERP) images. However, they omit geometric properties of ERP in the degradation process, and their models can hardly generalize to real ERP images. In this paper, we propose Fisheye downsampling, which mimics the real-world imaging process and synthesizes more realistic low-resolution samples. Then we design a distortion-aware Transformer (OSRT) to modulate ERP distortions continuously and self-adaptively. Without a cumbersome process, OSRT outperforms previous methods by about 0.2dB on PSNR. Moreover, we propose a convenient data augmentation strategy, which synthesizes pseudo ERP images from plain images. This simple strategy can alleviate the over-fitting problem of large networks and significantly boost the performance of ODISR. Extensive experiments have demonstrated the state-of-the-art performance of our OSRT. Codes and models will be available at https://github.com/Fanghua-Yu/OSRT.
Submitted 9 February, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
A Specific Task-oriented Semantic Image Communication System for substation patrol inspection
Authors:
Senran Fan,
Haotai Liang,
Chen Dong,
Xiaodong Xu,
Geng Liu
Abstract:
Intelligent inspection robots are widely used in substation patrol inspection, where they help check potential safety hazards by patrolling the substation and sending back scene images. However, when patrolling marginal areas with weak signal, the scene images cannot be successfully transmitted for hidden danger elimination, which greatly reduces the quality of the robots' daily work. To solve this problem, a Specific Task-oriented Semantic Communication System for Image (STSCI) is designed, which involves semantic feature extraction, transmission, restoration and enhancement to obtain clearer images sent by intelligent robots under weak signals. Motivated by the observation that only some specific details of the image are needed in the substation patrol inspection task, we propose a new paradigm of semantic enhancement for this specific task to ensure the clarity of key semantic information at a lower bit rate or a low signal-to-noise ratio. In reality-based simulations, experiments show that STSCI generally surpasses traditional image-compression-based and channel-coding-based systems, as well as other semantic communication systems, in the substation patrol inspection task at a lower bit rate, even under a low signal-to-noise ratio.
Submitted 13 April, 2024; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Efficient Image Super-Resolution using Vast-Receptive-Field Attention
Authors:
Lin Zhou,
Haoming Cai,
Jinjin Gu,
Zheyuan Li,
Yingqi Liu,
Xiangyu Chen,
Yu Qiao,
Chao Dong
Abstract:
The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks. In this work, we design an efficient SR network by improving the attention mechanism. We start from a simple pixel attention module and gradually modify it to achieve better super-resolution performance with reduced parameters. The specific approaches include: (1) increasing the receptive field of the attention branch, (2) replacing large dense convolution kernels with depth-wise separable convolutions, and (3) introducing pixel normalization. These approaches paint a clear evolutionary roadmap for the design of attention mechanisms. Based on these observations, we propose VapSR, the VAst-receptive-field Pixel attention network. Experiments demonstrate the superior performance of VapSR. VapSR outperforms the present lightweight networks with even fewer parameters. And the light version of VapSR can use only 21.68% and 28.18% parameters of IMDB and RFDN to achieve similar performances to those networks. The code and models are available at https://github.com/zhoumumu/VapSR.
Submitted 12 October, 2022;
originally announced October 2022.
-
Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images
Authors:
Jinjin Gu,
Haoming Cai,
Chenyu Dong,
Ruofan Zhang,
Yulun Zhang,
Wenming Yang,
Chun Yuan
Abstract:
Rendering high-resolution (HR) graphics brings substantial computational costs. Efficient graphics super-resolution (SR) methods may achieve HR rendering with small computing resources and have attracted extensive research interests in industry and research communities. We present a new method for real-time SR for computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). Our algorithm divides the image into two parts for processing, i.e., sharp edges and flatter areas. For edges, different from the previous SR methods that take the anti-aliased images as inputs, our proposed SRPO takes advantage of the characteristics of rasterized images to conduct SR on the rasterized images. To complement the residual between HR and low-resolution (LR) rasterized images, we train an ultra-efficient network to predict the offset maps to move the appropriate surrounding pixels to the new positions. For flat areas, we found simple interpolation methods can already generate reasonable output. We finally use a guided fusion operation to integrate the sharp edges generated by the network and flat areas by the interpolation method to get the final SR image. The proposed network only contains 8,434 parameters and can be accelerated by network quantization. Extensive experiments show that the proposed SRPO can achieve superior visual effects at a smaller computational cost than the existing state-of-the-art methods.
Submitted 9 October, 2022;
originally announced October 2022.
-
UDC-UNet: Under-Display Camera Image Restoration via U-Shape Dynamic Network
Authors:
Xina Liu,
Jinfan Hu,
Xiangyu Chen,
Chao Dong
Abstract:
Under-Display Camera (UDC) has been widely exploited to help smartphones realize full screen display. However, as the screen inevitably affects the light propagation process, the images captured by the UDC system usually contain flare, haze, blur, and noise. In particular, flare and blur in UDC images can severely deteriorate the user experience in high dynamic range (HDR) scenes. In this paper, we propose a new deep model, namely UDC-UNet, to address the UDC image restoration problem with the known Point Spread Function (PSF) in HDR scenes. On the premise that the PSF of the UDC system is known, we treat UDC image restoration as a non-blind image restoration problem and propose a novel learning-based approach. Our network consists of three parts: a U-shape base network to utilize multi-scale information, a condition branch to perform spatially variant modulation, and a kernel branch to provide the prior knowledge of the given PSF. According to the characteristics of HDR data, we additionally design a tone mapping loss to stabilize network optimization and achieve better visual quality. Experimental results show that the proposed UDC-UNet outperforms the state-of-the-art methods in quantitative and qualitative comparisons. Our approach won second place in the UDC image restoration track of the MIPI challenge. Codes will be publicly available.
Submitted 11 September, 2022; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Recurrent LSTM-based UAV Trajectory Prediction with ADS-B Information
Authors:
Yifan Zhang,
Ziye Jia,
Chao Dong,
Yuntian Liu,
Lei Zhang,
Qihui Wu
Abstract:
Recently, unmanned aerial vehicles (UAVs) are gathering increasing attentions from both the academia and industry. The ever-growing number of UAV brings challenges for air traffic control (ATC), and thus trajectory prediction plays a vital role in ATC, especially for avoiding collisions among UAVs. However, the dynamic flight of UAV aggravates the complexity of trajectory prediction. Different wit…
▽ More
Recently, unmanned aerial vehicles (UAVs) are gathering increasing attentions from both the academia and industry. The ever-growing number of UAV brings challenges for air traffic control (ATC), and thus trajectory prediction plays a vital role in ATC, especially for avoiding collisions among UAVs. However, the dynamic flight of UAV aggravates the complexity of trajectory prediction. Different with civil aviation aircrafts, the most intractable difficulty for UAV trajectory prediction depends on acquiring effective location information. Fortunately, the automatic dependent surveillance-broadcast (ADS-B) is an effective technique to help obtain positioning information. It is widely used in the civil aviation aircraft, due to its high data update frequency and low cost of corresponding ground stations construction. Hence, in this work, we consider leveraging ADS-B to help UAV trajectory prediction. However, with the ADS-B information for a UAV, it still lacks efficient mechanism to predict the UAV trajectory. It is noted that the recurrent neural network (RNN) is available for the UAV trajectory prediction, in which the long short-term memory (LSTM) is specialized in dealing with the time-series data. As above, in this work, we design a system of UAV trajectory prediction with the ADS-B information, and propose the recurrent LSTM (RLSTM) based algorithm to achieve the accurate prediction. Finally, extensive simulations are conducted by Python to evaluate the proposed algorithms, and the results show that the average trajectory prediction error is satisfied, which is in line with expectations.
Submitted 1 September, 2022;
originally announced September 2022.
-
NTIRE 2022 Challenge on Perceptual Image Quality Assessment
Authors:
Jinjin Gu,
Haoming Cai,
Chao Dong,
Jimmy S. Ren,
Radu Timofte
Abstract:
This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2022. This challenge is held to address the emerging challenge of IQA by perceptual image processing algorithms. The output images of these algorithms have completely different characteristics from traditional distortions and are included in the PIPAL dataset used in this challenge. This challenge is divided into two tracks: a full-reference IQA track similar to the previous NTIRE IQA challenge, and a new track that focuses on no-reference IQA methods. The challenge has 192 and 179 registered participants for the two tracks, respectively. In the final testing stage, 7 and 8 participating teams submitted their models and fact sheets. Almost all of them have achieved better results than existing IQA methods, and the winning method demonstrates state-of-the-art performance.
Submitted 23 June, 2022;
originally announced June 2022.
-
NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results
Authors:
Eduardo Pérez-Pellitero,
Sibi Catley-Chandar,
Richard Shaw,
Aleš Leonardis,
Radu Timofte,
Zexin Zhang,
Cen Liu,
Yunbo Peng,
Yue Lin,
Gaocheng Yu,
** Zhang,
Zhe Ma,
Hongbin Wang,
Xiangyu Chen,
Xintao Wang,
Haiwei Wu,
Lin Liu,
Chao Dong,
Jiantao Zhou,
Qingsen Yan,
Song Zhang,
Weiye Chen,
Yuhang Liu,
Zhen Zhang,
Yanning Zhang
, et al. (68 additional authors not shown)
Abstract:
This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e. solutions cannot exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e. solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: Fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).
Submitted 25 May, 2022;
originally announced May 2022.
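The fidelity metric named in the abstract above, PSNR against a ground-truth image, can be illustrated with a minimal sketch. This is the textbook definition, not the challenge's official scoring code: the function name `psnr`, the flat-list image representation, and the `peak` value of 1.0 are assumptions made for illustration, and the canonical tonemapping variant is omitted because the listing does not specify the operator.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `peak` is the assumed maximum pixel value (1.0 for normalized images).
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical three-pixel "images", for illustration only.
score = psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9])
```

A higher value means the estimate is closer to the ground truth.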
-
Evaluating the Generalization Ability of Super-Resolution Networks
Authors:
Yihao Liu,
Hengyuan Zhao,
Jinjin Gu,
Yu Qiao,
Chao Dong
Abstract:
Performance and generalization ability are two important aspects for evaluating deep learning models. However, research on the generalization ability of Super-Resolution (SR) networks is currently absent. Assessing the generalization ability of deep models not only helps us to understand their intrinsic mechanisms, but also allows us to quantitatively measure their applicability boundaries, which is important for unrestricted real-world applications. To this end, we make the first attempt to propose a Generalization Assessment Index for SR networks, namely SRGA. SRGA exploits the statistical characteristics of the internal features of deep networks to measure the generalization ability. Specifically, it is a non-parametric and non-learning metric. To better validate our method, we collect a patch-based image evaluation set (PIES) that includes both synthetic and real-world images, covering a wide range of degradations. With SRGA and the PIES dataset, we benchmark existing SR models on their generalization ability. This work provides insights and tools for future research on model generalization in low-level vision.
Submitted 3 September, 2023; v1 submitted 14 May, 2022;
originally announced May 2022.
-
Blueprint Separable Residual Network for Efficient Image Super-Resolution
Authors:
Zheyuan Li,
Yingqi Liu,
Xiangyu Chen,
Haoming Cai,
Jinjin Gu,
Yu Qiao,
Chao Dong
Abstract:
Recent advances in single image super-resolution (SISR) have achieved extraordinary performance, but the computational cost is too heavy for edge devices. To alleviate this problem, many novel and effective solutions have been proposed. Convolutional neural networks (CNNs) with the attention mechanism have attracted increasing attention due to their efficiency and effectiveness. However, there is still redundancy in the convolution operation. In this paper, we propose the Blueprint Separable Residual Network (BSRN), which contains two efficient designs. One is the use of blueprint separable convolution (BSConv), which takes the place of the redundant convolution operation. The other is to enhance the model's ability by introducing more effective attention modules. The experimental results show that BSRN achieves state-of-the-art performance among existing efficient SR methods. Moreover, a smaller variant of our model, BSRN-S, won first place in the model complexity track of the NTIRE 2022 Efficient SR Challenge. The code is available at https://github.com/xiaom233/BSRN.
Submitted 12 May, 2022;
originally announced May 2022.
-
NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results
Authors:
Yawei Li,
Kai Zhang,
Radu Timofte,
Luc Van Gool,
Fangyuan Kong,
Mingxi Li,
Songwei Liu,
Zongcai Du,
Ding Liu,
Chenhui Zhou,
Jingyi Chen,
Qingrui Han,
Zheyuan Li,
Yingqi Liu,
Xiangyu Chen,
Haoming Cai,
Yu Qiao,
Chao Dong,
Long Sun,
Jinshan Pan,
Yi Zhu,
Zhikai Zong,
Xiaoxiao Liu,
Zheng Hui,
Tao Yang
, et al. (86 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with a focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that improved efficiency, measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00dB on the DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks: the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranking of the teams was determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all five metrics mentioned in the description of the challenge, including runtime, parameter count, FLOPs, activations, and memory consumption, were considered. Similar to sub-track one, the rankings of the five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. These submissions gauge the state of the art in efficient single image super-resolution.
Submitted 11 May, 2022;
originally announced May 2022.
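The rank-sum rule described in the abstract above for sub-tracks one and two (rank each metric separately, then sum the ranks) can be sketched as follows. This is an illustrative sketch, not the organizers' scoring script: the function names, the team labels, the metric values, and the tie handling (tied values share the better rank) are all assumptions.

```python
def ranks(values):
    """1-based ranks where a lower value is better; ties share the better rank."""
    return [1 + sum(other < v for other in values) for v in values]

def rank_sum_order(teams, metrics):
    """Order teams by the sum of their per-metric ranks (lowest total first).

    `metrics` maps a metric name to one value per team, lower being better.
    """
    totals = [0] * len(teams)
    for per_team_values in metrics.values():
        for i, r in enumerate(ranks(per_team_values)):
            totals[i] += r
    return sorted(zip(teams, totals), key=lambda pair: pair[1])

# Hypothetical teams and numbers, for illustration only.
teams = ["A", "B", "C"]
metrics = {
    "parameters_k": [300, 250, 400],
    "flops_g": [20, 18, 25],
}
final = rank_sum_order(teams, metrics)
```

With these made-up values, team B ranks first on both metrics and therefore has the lowest rank sum.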
-
RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization
Authors:
Xintao Wang,
Chao Dong,
Ying Shan
Abstract:
This paper explores training efficient VGG-style super-resolution (SR) networks with the structural re-parameterization technique. The general pipeline of re-parameterization is to train networks with multi-branch topology first, and then merge them into standard 3x3 convolutions for efficient inference. In this work, we revisit those primary designs and investigate essential components for re-parameterizing SR networks. First of all, we find that batch normalization (BN) is important to bring training non-linearity and improve the final performance. However, BN is typically ignored in SR, as it usually degrades the performance and introduces unpleasant artifacts. We carefully analyze the cause of the BN issue and then propose a straightforward yet effective solution. In particular, we first train SR networks with mini-batch statistics as usual, and then switch to using population statistics in the later training period. Having successfully re-introduced BN into SR, we further design a new re-parameterizable block tailored for SR, namely RepSR. It consists of a clean residual path and two expand-and-squeeze convolution paths with the modified BN. Extensive experiments demonstrate that our simple RepSR is capable of achieving superior performance to previous SR re-parameterization methods across different model sizes. In addition, our RepSR can achieve a better trade-off between performance and actual running time (throughput) than previous SR methods. Codes will be available at https://github.com/TencentARC/RepSR.
Submitted 11 May, 2022;
originally announced May 2022.
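One reason BN with fixed population statistics is compatible with re-parameterization, as the abstract above relies on, is that it becomes a plain affine map and can be folded into the preceding linear layer. The sketch below shows this standard folding for a single scalar channel. It is not the RepSR block itself; the function name `fold_bn` and all numeric values are assumptions for illustration.

```python
import math

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
    into a single affine map y = w_fused * x + b_fused (one channel)."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta

# Hypothetical per-channel parameters, for illustration only.
w, b = 2.0, 0.5
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0
w_fused, b_fused = fold_bn(w, b, gamma, beta, mean, var)

x = 0.7
two_step = gamma * ((w * x + b) - mean) / math.sqrt(var + 1e-5) + beta
one_step = w_fused * x + b_fused  # matches two_step up to rounding
```

The same per-channel scaling applies to each output channel of a convolution kernel, which is what lets a trained branch with BN collapse into a single convolution at inference time.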
-
MM-RealSR: Metric Learning based Interactive Modulation for Real-World Super-Resolution
Authors:
Chong Mou,
Yanze Wu,
Xintao Wang,
Chao Dong,
Jian Zhang,
Ying Shan
Abstract:
Interactive image restoration aims to restore images by adjusting several controlling coefficients, which determine the restoration strength. Existing methods are restricted to learning the controllable functions under the supervision of known degradation types and levels. They usually suffer from a severe performance drop when the real degradation is different from their assumptions. Such a limitation is due to the complexity of real-world degradations, which cannot provide explicit supervision to the interactive modulation during training. However, how to realize interactive modulation in real-world super-resolution has not yet been studied. In this work, we present a Metric Learning based Interactive Modulation for Real-World Super-Resolution (MM-RealSR). Specifically, we propose an unsupervised degradation estimation strategy to estimate the degradation level in real-world scenarios. Instead of using known degradation levels as explicit supervision to the interactive mechanism, we propose a metric learning strategy to map the unquantifiable degradation levels in real-world scenarios to a metric space, which is trained in an unsupervised manner. Moreover, we introduce an anchor point strategy in the metric learning process to normalize the distribution of the metric space. Extensive experiments demonstrate that the proposed MM-RealSR achieves excellent modulation and restoration performance in real-world super-resolution. Codes are available at https://github.com/TencentARC/MM-RealSR.
Submitted 27 July, 2022; v1 submitted 10 May, 2022;
originally announced May 2022.
-
A Closer Look at Blind Super-Resolution: Degradation Models, Baselines, and Performance Upper Bounds
Authors:
Wenlong Zhang,
Guangyuan Shi,
Yihao Liu,
Chao Dong,
Xiao-Ming Wu
Abstract:
Degradation models play an important role in blind super-resolution (SR). The classical degradation model, which mainly involves blur degradation, is too simple to simulate real-world scenarios. The recently proposed practical degradation model includes a full spectrum of degradation types, but only considers complex cases that use all degradation types in the degradation process, while ignoring many important corner cases that are common in the real world. To address this problem, we propose a unified gated degradation model to generate a broad set of degradation cases using a random gate controller. Based on the gated degradation model, we propose simple baseline networks that can effectively handle non-blind, classical, and practical degradation cases as well as many other corner cases. To fairly evaluate the performance of our baseline networks against state-of-the-art methods and understand their limits, we introduce the performance upper bound of an SR network for every degradation type. Our empirical analysis shows that with the unified gated degradation model, the proposed baselines can achieve much better performance than existing methods in quantitative and qualitative results, which are close to the performance upper bounds.
Submitted 10 May, 2022;
originally announced May 2022.
-
Activating More Pixels in Image Super-Resolution Transformer
Authors:
Xiangyu Chen,
Xintao Wang,
Jiantao Zhou,
Yu Qiao,
Chao Dong
Abstract:
Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution. However, we find that these networks can only utilize a limited spatial range of input information through attribution analysis. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus making use of their complementary advantages of being able to utilize global statistics and strong local fitting capability. Moreover, to better aggregate the cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that the performance of this task can be greatly improved. Our overall method significantly outperforms the state-of-the-art methods by more than 1dB. Codes and models are available at https://github.com/XPixelGroup/HAT.
Submitted 18 March, 2023; v1 submitted 9 May, 2022;
originally announced May 2022.
-
VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution
Authors:
Liangbin Xie,
Xintao Wang,
Honglun Zhang,
Chao Dong,
Ying Shan
Abstract:
Most of the existing video face super-resolution (VFSR) methods are trained and evaluated on VoxCeleb1, which is designed specifically for speaker identification, and the frames in this dataset are of low quality. As a consequence, the VFSR models trained on this dataset cannot output visually pleasing results. In this paper, we develop an automatic and scalable pipeline to collect a high-quality video face dataset (VFHQ), which contains over $16,000$ high-fidelity clips of diverse interview scenarios. To verify the necessity of VFHQ, we further conduct experiments and demonstrate that VFSR models trained on our VFHQ dataset can generate results with sharper edges and finer textures than those trained on VoxCeleb1. In addition, we show that temporal information plays a pivotal role in eliminating video consistency issues as well as further improving visual performance. Based on VFHQ, we analyze a benchmarking study of several state-of-the-art algorithms under bicubic and blind settings. See our project page: https://liangbinxie.github.io/projects/vfhq
Submitted 6 May, 2022;
originally announced May 2022.
-
Innovative semantic communication system
Authors:
Chen Dong,
Haotai Liang,
Xiaodong Xu,
Shujun Han,
Bizhu Wang,
Ping Zhang
Abstract:
Traditional communication systems focus on the transmission process and ignore context-dependent meaning. As the 5G system has approached the Shannon limit, the increasing amount of data will cause communication bottlenecks, such as increased delay. Inspired by the ability of artificial intelligence to understand semantics, we propose a new communication paradigm that integrates artificial intelligence and communication: the semantic communication system. Semantic communication is at the second level of communication as defined by Shannon and Weaver; it retains the semantic features of the transmitted information and recovers the signal at the receiver, thus compressing the communication traffic without losing important information. Different from other semantic communication systems, the proposed system transmits not only semantic information but also the semantic decoder. In addition, a general semantic metric is proposed to measure the quality of a semantic communication system. In particular, the semantic communication system for images, namely AESC-I, is designed to verify the feasibility of the new paradigm. Simulations are conducted on our system with the additive white Gaussian noise (AWGN) channel and the multipath fading channel using the MNIST and CIFAR-10 datasets. The experimental results show that DeepSC-I can effectively extract semantic information and reconstruct images at a relatively low SNR.
Submitted 19 February, 2022;
originally announced February 2022.
-
Hierarchical Aerial Computing for Internet of Things via Cooperation of HAPs and UAVs
Authors:
Ziye Jia,
Qihui Wu,
Chao Dong,
Chau Yuen,
Zhu Han
Abstract:
With the explosive growth of computation requirements, the multi-access edge computing (MEC) paradigm appears as an effective mechanism. Moreover, for Internet of Things (IoT) devices in disaster or remote areas that require MEC services, unmanned aerial vehicles (UAVs) and high altitude platforms (HAPs) are available to provide aerial computing services. In this paper, we develop a hierarchical aerial computing framework composed of HAPs and UAVs to provide MEC services for various IoT applications. In particular, the problem is formulated to maximize the total IoT data computed by the aerial MEC platforms, subject to the delay requirement of IoT and multiple resource constraints of UAVs and HAPs, which is an integer programming problem and intractable to solve. Due to the prohibitive complexity of exhaustive search, we handle the problem by presenting a matching game theory based algorithm to deal with the offloading decisions from IoT devices to UAVs, as well as a heuristic algorithm for the offloading decisions between UAVs and HAPs. The externality caused by the interplay of different IoT devices in the matching is tackled by an externality elimination mechanism. In addition, an adjustment algorithm is proposed to make the best of aerial resources. The complexity of the proposed algorithms is analyzed, extensive simulation results verify their efficiency, and the system performance is also analyzed through the numerical results.
Submitted 12 February, 2022;
originally announced February 2022.
-
Unmanned Aerial Vehicle Swarm-Enabled Edge Computing: Potentials, Promising Technologies, and Challenges
Authors:
Wei Wu,
Fuhui Zhou,
Baoyun Wang,
Qihui Wu,
Chao Dong,
Rose Qingyang Hu
Abstract:
Unmanned aerial vehicle (UAV) swarm-enabled edge computing is envisioned to be promising in sixth generation wireless communication networks due to its wide application scenarios and flexible deployment. However, most of the existing works focus on edge computing enabled by a single UAV or a small number of UAVs, which is very different from UAV swarm-enabled edge computing. In order to facilitate the practical applications of UAV swarm-enabled edge computing, the state-of-the-art research is presented in this article. The potential applications, architectures and implementation considerations are illustrated. Moreover, the promising enabling technologies for UAV swarm-enabled edge computing are discussed. Furthermore, we outline challenges and open issues in order to shed light on future research directions.
Submitted 20 January, 2022;
originally announced January 2022.
-
Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning
Authors:
Yihao Liu,
Hengyuan Zhao,
Kelvin C. K. Chan,
Xintao Wang,
Chen Change Loy,
Yu Qiao,
Chao Dong
Abstract:
Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single image colorization, there is relatively less research effort on video colorization and existing methods always suffer from severe flickering artifacts (temporal inconsistency) or unsatisfying colorization performance. We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization framework (TCVC). TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the prediction difference obtained with different time steps. SRL does not require any ground-truth color videos for training and can further improve temporal consistency. Experiments demonstrate that our method can not only obtain visually pleasing colorized video, but also achieve clearly better temporal consistency than state-of-the-art methods.
Submitted 9 October, 2021;
originally announced October 2021.
-
End-to-End Image Compression with Probabilistic Decoding
Authors:
Haichuan Ma,
Dong Liu,
Cunhui Dong,
Li Li,
Feng Wu
Abstract:
Lossy image compression is a many-to-one process, thus one bitstream corresponds to multiple possible original images, especially at low bit rates. However, this nature was seldom considered in previous studies on image compression, which usually chose one possible image as reconstruction, e.g. the one with the maximal a posteriori probability. We propose a learned image compression framework to natively support probabilistic decoding. The compressed bitstream is decoded into a series of parameters that instantiate a pre-chosen distribution; then the distribution is used by the decoder to sample and reconstruct images. The decoder may adopt different sampling strategies and produce diverse reconstructions, among which some have higher signal fidelity and some others have better visual quality. The proposed framework is dependent on a revertible neural network-based transform to convert pixels into coefficients that obey the pre-chosen distribution as much as possible. Our code and models will be made publicly available.
Submitted 30 September, 2021;
originally announced September 2021.
-
A New Journey from SDRTV to HDRTV
Authors:
Xiangyu Chen,
Zhengwen Zhang,
Jimmy S. Ren,
Lynhoo Tian,
Yu Qiao,
Chao Dong
Abstract:
Modern displays are capable of rendering video content with high dynamic range (HDR) and wide color gamut (WCG). However, most available resources are still in standard dynamic range (SDR). Therefore, there is an urgent demand to transform existing SDR-TV content into HDR-TV versions. In this paper, we conduct an analysis of the SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV content. Based on the analysis, we propose a three-step solution pipeline including adaptive global color mapping, local enhancement and highlight generation. Moreover, the above analysis inspires us to present a lightweight network that utilizes global statistics as guidance to conduct image-adaptive color mapping. In addition, we construct a dataset using HDR videos in the HDR10 standard, named HDRTV1K, and select five metrics to evaluate the results of SDRTV-to-HDRTV algorithms. Furthermore, our final results achieve state-of-the-art performance in quantitative comparisons and visual quality. The code and dataset are available at https://github.com/chxy95/HDRTVNet.
Submitted 25 September, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.
-
Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
Authors:
Xintao Wang,
Liangbin Xie,
Chao Dong,
Ying Shan
Abstract:
Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images. In this work, we extend the powerful ESRGAN to a practical restoration application (namely, Real-ESRGAN), which is trained with pure synthetic data. Specifically, a high-order degradation modeling process is introduced to better simulate complex real-world degradations. We also consider the common ringing and overshoot artifacts in the synthesis process. In addition, we employ a U-Net discriminator with spectral normalization to increase discriminator capability and stabilize the training dynamics. Extensive comparisons show that its visual performance is superior to that of prior works on various real datasets. We also provide efficient implementations to synthesize training pairs on the fly.
Submitted 17 August, 2021; v1 submitted 22 July, 2021;
originally announced July 2021.