Search | arXiv e-print repository

MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos

Authors: Minghan Li, Shuai Li, Wangmeng Xiang, Lei Zhang

Abstract: While impressive progress has been achieved, video instance segmentation (VIS) methods with per-clip input often fail on challenging videos with occluded objects and crowded scenes. This is mainly because instance queries in these methods cannot encode the discriminative embeddings of instances well, making it difficult for the query-based segmenter to distinguish those 'hard' instances. To address these issues, we propose to mine discriminative query embeddings (MDQE) to segment occluded instances on challenging videos. First, we initialize the positional embeddings and content features of object queries by considering their spatial contextual information and the inter-frame object motion. Second, we propose an inter-instance mask repulsion loss to distance each instance from its nearby non-target instances. The proposed MDQE is the first VIS method with per-clip input that achieves state-of-the-art results on challenging videos and competitive performance on simple videos. Specifically, MDQE with ResNet50 achieves 33.0% and 44.5% mask AP on OVIS and YouTube-VIS 2021, respectively. Code of MDQE can be found at https://github.com/MinghanLi/MDQE_CVPR2023.

Submitted 25 March, 2023; originally announced March 2023.

Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023

arXiv:2303.09769 [pdf, other]

Denoising Diffusion Autoencoders are Unified Self-supervised Learners

Authors: Weilai Xiang, Hongyu Yang, Di Huang, Yunhong Wang

Abstract: Inspired by recent advances in diffusion models, which are reminiscent of denoising autoencoders, we investigate whether they can acquire discriminative representations for classification via generative pre-training. This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners: by pre-training on unconditional image generation, DDAE has already learned strongly linear-separable representations within its intermediate layers without auxiliary encoders, thus making diffusion pre-training emerge as a general approach for generative-and-discriminative dual learning. To validate this, we conduct linear probe and fine-tuning evaluations. Our diffusion-based approach achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and Tiny-ImageNet, respectively, and is comparable to contrastive learning and masked autoencoders for the first time. Transfer learning from ImageNet also confirms the suitability of DDAE for Vision Transformers, suggesting the potential to scale DDAEs as unified foundation models. Code is available at github.com/FutureXiang/ddae.

Submitted 19 August, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: ICCV 2023 Oral

arXiv:2303.08525 [pdf, other]

MRGAN360: Multi-stage Recurrent Generative Adversarial Network for 360 Degree Image Saliency Prediction

Authors: Pan Gao, Xinlang Chen, Rong Quan, Wei Xiang

Abstract: Thanks to the ability of providing an immersive and interactive experience, the uptake of 360 degree image content has been rapidly growing in consumer and industrial applications. Compared to planar 2D images, saliency prediction for 360 degree images is more challenging due to their high resolutions and spherical viewing ranges. Currently, most high-performance saliency prediction models for omnidirectional images (ODIs) rely on deeper or broader convolutional neural networks (CNNs), which benefit from CNNs' superior feature representation capabilities while suffering from their high computational costs. In this paper, inspired by the human visual cognitive process, i.e., a human being's perception of a visual scene is always accomplished by multiple stages of analysis, we propose a novel multi-stage recurrent generative adversarial network for ODIs, dubbed MRGAN360, to predict the saliency maps stage by stage. At each stage, the prediction model takes as input the original image and the output of the previous stage and outputs a more accurate saliency map. We employ a recurrent neural network among adjacent prediction stages to model their correlations, and exploit a discriminator at the end of each stage to supervise the output saliency map. In addition, we share the weights among all the stages to obtain a lightweight architecture that is computationally cheap. Extensive experiments are conducted to demonstrate that our proposed model outperforms the state-of-the-art model in terms of both prediction accuracy and model size.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2303.04935 [pdf, other]

X-Pruner: eXplainable Pruning for Vision Transformers

Authors: Lu Yu, Wei Xiang

Abstract: Recently, vision transformer models have become prominent models for a range of tasks. These models, however, usually suffer from intensive computational costs and heavy memory requirements, making them impractical for deployment on edge platforms. Recent studies have proposed to prune transformers in an unexplainable manner, which overlooks the relationship between internal units of the model and the target class, thereby leading to inferior performance. To alleviate this problem, we propose a novel explainable pruning framework dubbed X-Pruner, which is designed by considering the explainability of the pruning criterion. Specifically, to measure each prunable unit's contribution to predicting each target class, a novel explainability-aware mask is proposed and learned in an end-to-end manner. Then, to preserve the most informative units and learn the layer-wise pruning rate, we adaptively search the layer-wise threshold that differentiates between unpruned and pruned units based on their explainability-aware mask values. To verify and evaluate our method, we apply the X-Pruner on representative transformer models including the DeiT and Swin Transformer. Comprehensive simulation results demonstrate that the proposed X-Pruner outperforms the state-of-the-art black-box methods with significantly reduced computational costs and slight performance degradation.

Submitted 5 June, 2023; v1 submitted 8 March, 2023; originally announced March 2023.
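
The thresholding step described in the X-Pruner abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prune_layer`, the example mask values, and the fixed threshold are hypothetical, and the learned masks and adaptive threshold search mentioned in the abstract are replaced here by hand-picked numbers.

```python
def prune_layer(mask_values, threshold):
    """Split one layer's units into kept and pruned by mask value.

    mask_values: per-unit scores (hypothetical stand-ins for learned
    explainability-aware mask values).
    threshold: hypothetical layer-wise cutoff; units scoring below it are pruned.
    """
    if not mask_values:
        raise ValueError("mask_values must not be empty")
    kept = [i for i, m in enumerate(mask_values) if m >= threshold]
    pruned = [i for i, m in enumerate(mask_values) if m < threshold]
    # The layer-wise pruning rate is the fraction of units removed.
    pruning_rate = len(pruned) / len(mask_values)
    return kept, pruned, pruning_rate


# Made-up scores for a five-unit layer (illustration only).
masks = [0.91, 0.05, 0.40, 0.72, 0.10]
kept, pruned, rate = prune_layer(masks, threshold=0.5)
print(kept, pruned, rate)
```

In this sketch the threshold is supplied by hand; the abstract states that the actual method searches for it per layer.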

arXiv:2303.03808 [pdf, other]

Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis

Authors: Kang Han, Wei Xiang

Abstract: Rendering novel views from captured multi-view images has made considerable progress since the emergence of the neural radiance field. This paper aims to further advance the quality of view synthesis by proposing a novel approach dubbed the neural radiance feature field (NRFF). We first propose a multiscale tensor decomposition scheme to organize learnable features so as to represent scenes from coarse to fine scales. We demonstrate many benefits of the proposed multiscale representation, including more accurate scene shape and appearance reconstruction, and faster convergence compared with the single-scale representation. Instead of encoding view directions to model view-dependent effects, we further propose to encode the rendering equation in the feature space by employing the anisotropic spherical Gaussian mixture predicted from the proposed multiscale representation. The proposed NRFF improves state-of-the-art rendering results by over 1 dB in PSNR on both the NeRF and NSVF synthetic datasets. A significant improvement has also been observed on the real-world Tanks & Temples dataset. Code can be found at https://github.com/imkanghan/nrff.

Submitted 27 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.14302 [pdf, other]

Improving Model Generalization by On-manifold Adversarial Augmentation in the Frequency Domain

Authors: Chang Liu, Wenzhao Xiang, Yuan He, Hui Xue, Shibao Zheng, Hang Su

Abstract: Deep neural networks (DNNs) may suffer from significantly degenerated performance when the training and test data are of different underlying distributions. Despite the importance of model generalization to out-of-distribution (OOD) data, the accuracy of state-of-the-art (SOTA) models on OOD data can plummet. Recent work has demonstrated that regular or off-manifold adversarial examples, as a special case of data augmentation, can be used to improve OOD generalization. Inspired by this, we theoretically prove that on-manifold adversarial examples can better benefit OOD generalization. Nevertheless, it is nontrivial to generate on-manifold adversarial examples because the real manifold is generally complex. To address this issue, we propose a novel method of Augmenting data with Adversarial examples via a Wavelet module (AdvWavAug), an on-manifold adversarial data augmentation technique that is simple to implement. In particular, we project a benign image into a wavelet domain. With the assistance of the sparsity characteristic of wavelet transformation, we can modify an image on the estimated data manifold. We conduct adversarial augmentation based on the AdvProp training framework. Extensive experiments on different models and different datasets, including ImageNet and its distorted versions, demonstrate that our method can improve model generalization, especially on OOD data. By integrating AdvWavAug into the training process, we have achieved SOTA results on some recent transformer-based models.

Submitted 8 June, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.14301 [pdf, other]

A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking

Authors: Chang Liu, Yinpeng Dong, Wenzhao Xiang, Xiao Yang, Hang Su, Jun Zhu, Yuefeng Chen, Yuan He, Hui Xue, Shibao Zheng

Abstract: The robustness of deep neural networks is usually lacking under adversarial examples, common corruptions, and distribution shifts, which becomes an important research problem in the development of deep learning. Although new deep learning methods and robustness improvement techniques have been constantly proposed, the robustness evaluations of existing methods are often inadequate due to their rapid development, diverse noise patterns, and simple evaluation metrics. Without thorough robustness evaluations, it is hard to understand the advances in the field and identify the effective methods. In this paper, we establish a comprehensive robustness benchmark called ARES-Bench on the image classification task. In our benchmark, we evaluate the robustness of 55 typical deep learning models on ImageNet with diverse architectures (e.g., CNNs, Transformers) and learning algorithms (e.g., normal supervised training, pre-training, adversarial training) under numerous adversarial attacks and out-of-distribution (OOD) datasets. Using robustness curves as the major evaluation criteria, we conduct large-scale experiments and draw several important findings, including: 1) there is an inherent trade-off between adversarial and natural robustness for the same model architecture; 2) adversarial training effectively improves adversarial robustness, especially when performed on Transformer architectures; 3) pre-training significantly improves natural robustness based on more training data or self-supervised learning. Based on ARES-Bench, we further analyze the training tricks in large-scale adversarial training on ImageNet. By designing the training settings accordingly, we achieve the new state-of-the-art adversarial robustness. We have made the benchmarking results and code platform publicly available.

Submitted 27 February, 2023; originally announced February 2023.

Comments: International Journal of Computer Vision (IJCV) [under review]

arXiv:2302.12420 [pdf, other]

An Iterative Classification and Semantic Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images

Authors: Zili Lu, Yuexing Peng, Wei Li, Junchuan Yu, Daqing Ge, Wei Xiang

Abstract: Huge challenges exist for old landslide detection because their morphology features have been partially or strongly transformed over a long time and have little difference from their surroundings. Besides, the small-sample problem also restricts in-depth learning. In this paper, an iterative classification and semantic segmentation network (ICSSN) is developed, which can greatly enhance both object-level and pixel-level classification performance by iteratively upgrading the feature extractor shared by the two networks. An object-level contrastive learning (OCL) strategy is employed in the object classification sub-network featuring a siamese network to realize the global features extraction, and a sub-object-level contrastive learning (SOCL) paradigm is designed in the semantic segmentation sub-network to efficiently extract salient features from boundaries of landslides. Moreover, an iterative training strategy is elaborated to fuse features in semantic space such that both object-level and pixel-level classification performance are improved. The proposed ICSSN is evaluated on the real landslide data set, and the experimental results show that ICSSN can greatly improve the classification and segmentation accuracy of old landslide detection. For the semantic segmentation task, compared to the baseline, the F1 score increases from 0.5054 to 0.5448, the mIoU improves from 0.6405 to 0.6610, the landslide IoU improves from 0.3381 to 0.3743, and the object-level detection accuracy of old landslides is enhanced from 0.55 to 0.9. For the object classification task, the F1 score increases from 0.8846 to 0.9230, and the accuracy score is up from 0.8375 to 0.8875.

Submitted 24 April, 2023; v1 submitted 23 February, 2023; originally announced February 2023.
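
The landslide IoU and F1 figures quoted in the abstract above can be related through the standard identity F1 = 2·IoU / (1 + IoU), which holds when both metrics are computed for the same class from the same pixel counts. The sketch below is only a consistency check under that assumption; the function names are hypothetical, and the abstract does not state how the authors computed their metrics.

```python
def iou_and_f1(tp, fp, fn):
    """IoU and F1 for one class from true/false positive and false negative counts."""
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


def f1_from_iou(iou):
    """Standard single-class identity: F1 = 2 * IoU / (1 + IoU)."""
    return 2.0 * iou / (1.0 + iou)


# Sanity check of the identity on made-up pixel counts.
iou, f1 = iou_and_f1(tp=2, fp=1, fn=1)
print(iou, f1, f1_from_iou(iou))

# Apply the identity to the landslide IoU values quoted in the abstract.
for reported_iou in (0.3381, 0.3743):
    print(reported_iou, round(f1_from_iou(reported_iou), 4))
```

Under this assumption, the reported landslide IoU values of 0.3381 and 0.3743 map to F1 values close to the reported 0.5054 and 0.5448.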

arXiv:2302.03298 [pdf, other]

Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion

Authors: Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, Clinton Fookes

Abstract: In this work, we investigate the problem of Model-Agnostic Zero-Shot Classification (MA-ZSC), which refers to training non-specific classification architectures (downstream models) to classify real images without using any real images during training. Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC. However, the performance of this approach currently falls short of that achieved by large-scale vision-language models. One possible explanation is a potential significant domain gap between synthetic and real images. Our work offers a fresh perspective on the problem by providing initial insights that MA-ZSC performance can be improved by improving the diversity of images in the generated dataset. We propose a set of modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity, which we refer to as our "bag of tricks". Our approach shows notable improvements in various classification architectures, with results comparable to state-of-the-art models such as CLIP. To validate our approach, we conduct experiments on CIFAR10, CIFAR100, and EuroSAT, which is particularly difficult for zero-shot classification due to its satellite image domain. We evaluate our approach with five classification architectures, including ResNet and ViT. Our findings provide initial insights into the problem of MA-ZSC using diffusion models. All code will be available on GitHub.

Submitted 16 April, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: (10 pages, 6 figures, 3 tables, preprint)

MSC Class: 68T07 ACM Class: I.2; I.4; I.5

arXiv:2302.01825 [pdf, other]

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

Authors: Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie

Abstract: Human pose estimation is a challenging task due to its structured data sequence nature. Existing methods primarily focus on pair-wise interaction of body joints, which is insufficient for scenarios involving overlapping joints and rapidly changing poses. To overcome these issues, we introduce a novel approach, the High-order Directed Transformer (HDFormer), which leverages high-order bone and joint relationships for improved pose estimation. Specifically, HDFormer incorporates both self-attention and high-order attention to formulate a multi-order attention module. This module facilitates first-order "joint↔joint", second-order "bone↔joint", and high-order "hyperbone↔joint" interactions, effectively addressing issues in complex and occlusion-heavy situations. In addition, modern CNN techniques are integrated into the transformer-based architecture, balancing the trade-off between performance and efficiency. HDFormer significantly outperforms state-of-the-art (SOTA) models on Human3.6M and MPI-INF-3DHP datasets, requiring only 1/10 of the parameters and significantly lower computational costs. Moreover, HDFormer demonstrates broad real-world applicability, enabling real-time, accurate 3D pose estimation. The source code is at https://github.com/hyer/HDFormer

Submitted 22 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: Accepted to IJCAI 2023; 9 pages, 5 figures, 7 tables; the code is at https://github.com/hyer/HDFormer

Journal ref: In the 32nd international Joint Conference on Artificial Intelligence (IJCAI 2023)

arXiv:2301.07531 [pdf, other]

Safety Verification of Neural Network Control Systems Using Guaranteed Neural Network Model Reduction

Authors: Weiming Xiang, Zhongzhu Shao

Abstract: This paper aims to enhance the computational efficiency of safety verification of neural network control systems by developing a guaranteed neural network model reduction method. First, a concept of model reduction precision is proposed to describe the guaranteed distance between the outputs of a neural network and its reduced-size version. A reachability-based algorithm is proposed to accurately compute the model reduction precision. Then, by substituting a reduced-size neural network controller into the closed-loop system, an algorithm to compute the reachable set of the original system is developed, which is able to support much more computationally efficient safety verification processes. Finally, the developed methods are applied to a case study of the Adaptive Cruise Control system with a neural network controller, which is shown to significantly reduce the computational time of safety verification and thus validate the effectiveness of the method.

Submitted 16 January, 2023; originally announced January 2023.

Comments: CDC 2022, Cancun, Mexico

arXiv:2211.00936 [pdf, ps, other]

Local Well-posedness of Unsteady Potential Flows Near a Space Corner of Right Angle

Authors: Beixiang Fang, Wei Xiang, Feng Xiao

Abstract: In this paper we are concerned with the local well-posedness of the unsteady potential flows near a space corner of right angle, which could be formulated as an initial-boundary value problem of a hyperbolic equation of second order in a cornered-space domain. The corner singularity is the key difficulty in establishing the local well-posedness of the problem. Moreover, the boundary conditions on both edges of the corner angle are of Neumann-type and fail to satisfy the linear stability condition, which makes it more difficult to establish a priori estimates on the boundary terms in the analysis. In this paper, extension methods will be updated to deal with the corner singularity, and, based on a key observation that the boundary operators are co-normal, new techniques will be developed to control the boundary terms.

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.15843 [pdf, other]

Bi-Directional Iterative Prompt-Tuning for Event Argument Extraction

Authors: Lu Dai, Bang Wang, Wei Xiang, Yijun Mo

Abstract: Recently, prompt-tuning has attracted growing interest in event argument extraction (EAE). However, the existing prompt-tuning methods have not achieved satisfactory performance due to the lack of consideration of entity information. In this paper, we propose a bi-directional iterative prompt-tuning method for EAE, where the EAE task is treated as a cloze-style task to take full advantage of entity information and pre-trained language models (PLMs). Furthermore, our method explores event argument interactions by introducing the argument roles of contextual entities into prompt construction. Since template and verbalizer are two crucial components in a cloze-style prompt, we propose to utilize the role label semantic knowledge to construct a semantic verbalizer and design three kinds of templates for the EAE task. Experiments on the ACE 2005 English dataset with standard and low-resource settings show that the proposed method significantly outperforms the peer state-of-the-art methods. Our code is available at https://github.com/HustMinsLab/BIP.

Submitted 27 October, 2022; originally announced October 2022.

Comments: To be accepted by EMNLP 2022 as a full paper

arXiv:2210.15511 [pdf, other]

ProContEXT: Exploring Progressive Context Transformer for Tracking

Authors: Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Abstract: Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames. To this end, we revamped the tracking framework with Progressive Context Encoding Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories. Specifically, ProContEXT leverages a context-aware self-attention module to encode the spatial and temporal context, refining and updating the multi-scale static and dynamic templates to progressively perform accurate tracking. It explores the complementarity between spatial and temporal context, raising a new pathway to multi-context modeling for transformer-based trackers. In addition, ProContEXT revised the token pruning technique to reduce computational complexity. Extensive experiments on popular benchmark datasets such as GOT-10k and TrackingNet demonstrate that the proposed ProContEXT achieves state-of-the-art performance.

Submitted 30 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted at ICASSP 2023, source code is at https://github.com/zhiqic/ProContEXT

arXiv:2209.13933 [pdf, other]

DPNet: Dual-Path Network for Real-time Object Detection with Lightweight Attention

Authors: Quan Zhou, Huimin Shi, Weikang Xiang, Bin Kang, Xiaofu Wu, Longin Jan Latecki

Abstract: The recent advances of compressing high-accuracy convolution neural networks (CNNs) have witnessed remarkable progress for real-time object detection. To accelerate detection speed, lightweight detectors always have few convolution layers using a single-path backbone. Single-path architecture, however, involves continuous pooling and downsampling operations, always resulting in coarse and inaccurate feature maps that are disadvantageous to locate objects. On the other hand, due to limited network capacity, recent lightweight networks are often weak in representing large scale visual data. To address these problems, this paper presents a dual-path network, named DPNet, with a lightweight attention scheme for real-time object detection. The dual-path architecture enables us to extract high-level semantic features and low-level object details in parallel. Although DPNet has nearly duplicated shape with respect to single-path detectors, the computational costs and model size are not significantly increased. To enhance representation capability, a lightweight self-correlation module (LSCM) is designed to capture global interactions, with only a few computational overheads and network parameters. In the neck, LSCM is extended into a lightweight cross-correlation module (LCCM), capturing mutual dependencies among neighboring scale features. We have conducted exhaustive experiments on MS COCO and Pascal VOC 2007 datasets. The experimental results demonstrate that DPNet achieves a state-of-the-art trade-off between detection accuracy and implementation efficiency. Specifically, DPNet achieves 30.5% AP on MS COCO test-dev and 81.5% mAP on Pascal VOC 2007 test set, together with nearly 2.5M model size, 1.04 GFLOPs, and 164 FPS and 196 FPS for 320 x 320 input images of two datasets.

Submitted 28 September, 2022; originally announced September 2022.

arXiv:2208.09786 [pdf, other]

A New Radar Signal Multiparameter-Based Deinterleaving Method

Authors: Wang Chao, Liu Weisong, Li Xueqiong, Wang Xiang, Huang Zhitao

Abstract: Radar signal deinterleaving has been extensively and thoroughly investigated in the electronic reconnaissance field. In this work, a new radar signal multiparameter-based deinterleaving method is proposed. In this method, semantic information composed of the pulse repetition interval (PRI), pulse width (PW), radio frequency (RF), and pulse amplitude (PA) of a radar signal is used to deinterleave radar signals. A bidirectional gated recurrent unit (BGRU) is employed, and the difference of time of arrival (DTOA)/RF, DTOA/PW, and DTOA/PA of the pulse stream are input into the BGRU. Based on the semantic information contained in different radar signal types, each pulse in the obtained pulse stream is classified according to the semantic information category, and the radar signals are deinterleaved. Compared to the PRI-based deinterleaving methods, the proposed method utilizes the multidimensional information of radar signals. As a result, higher deinterleaving accuracy is achieved. Compared to other existing radar signal multiparameter-based deinterleaving methods, the proposed method can adapt to radar signals with complex parameter features as well as to complex signal environments, and can complete the use of multiparameter in one step.

Submitted 20 August, 2022; originally announced August 2022.
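
The abstract above describes feeding DTOA/RF, DTOA/PW, and DTOA/PA sequences of the pulse stream into a BGRU. A minimal sketch of how such paired sequences could be assembled is given below. The function name `dtoa_feature_pairs`, the list-based input layout, and the choice of 0.0 as the DTOA of the first pulse are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

Pair = Tuple[float, float]


def dtoa_feature_pairs(
    toa: List[float], rf: List[float], pw: List[float], pa: List[float]
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Build (DTOA, RF), (DTOA, PW) and (DTOA, PA) sequences from a pulse stream.

    Hypothetical helper: each list holds one value per received pulse,
    ordered by time of arrival (TOA).
    """
    if not (len(toa) == len(rf) == len(pw) == len(pa)):
        raise ValueError("all pulse parameter lists must have the same length")
    # DTOA is the difference between consecutive times of arrival.
    # Assumption: the first pulse has no predecessor, so its DTOA is set to 0.0.
    dtoa = [0.0] + [t1 - t0 for t0, t1 in zip(toa, toa[1:])]
    return list(zip(dtoa, rf)), list(zip(dtoa, pw)), list(zip(dtoa, pa))
```

Each returned sequence could then be passed to a recurrent classifier as a two-channel input per pulse; how the paper normalises or combines these inputs is not stated in the abstract.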

arXiv:2208.05318 [pdf, other]

Generative Action Description Prompts for Skeleton-based Action Recognition

Authors: Wangmeng Xiang, Chao Li, Yuxuan Zhou, Biao Wang, Lei Zhang

Abstract: Skeleton-based action recognition has recently received considerable attention. Current approaches to skeleton-based action recognition are typically formulated as one-hot classification tasks and do not fully exploit the semantic relations between actions. For example, "make victory sign" and "thumb up" are two actions of hand gestures, whose major difference lies in the movement of hands. This information is agnostic from the categorical one-hot encoding of action classes but could be unveiled from the action description. Therefore, utilizing action description in training could potentially benefit representation learning. In this work, we propose a Generative Action-description Prompts (GAP) approach for skeleton-based action recognition. More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions for body parts movements of actions, and propose a multi-modal training scheme by utilizing the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning. Experiments show that our proposed GAP method achieves noticeable improvements over various baseline models without extra computation cost at inference. GAP achieves new state-of-the-arts on popular skeleton-based action recognition benchmarks, including NTU RGB+D, NTU RGB+D 120 and NW-UCLA. The source code is available at https://github.com/MartinXM/GAP.

Submitted 5 September, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: Accepted by ICCV23

arXiv:2207.13670 [pdf, other]

Meta-Interpolation: Time-Arbitrary Frame Interpolation via Dual Meta-Learning

Authors: Shixing Yu, Yiyang Ma, Wenhan Yang, Wei Xiang, Jiaying Liu

Abstract: Existing video frame interpolation methods can only interpolate the frame at a given intermediate time-step, e.g. 1/2. In this paper, we aim to explore a more generalized kind of video frame interpolation, that at an arbitrary time-step. To this end, we consider processing different time-steps with adaptively generated convolutional kernels in a unified way with the help of meta-learning. Specifically, we develop a dual meta-learned frame interpolation framework to synthesize intermediate frames with the guidance of context information and optical flow as well as taking the time-step as side information. First, a content-aware meta-learned flow refinement module is built to improve the accuracy of the optical flow estimation based on the down-sampled version of the input frames. Second, with the refined optical flow and the time-step as the input, a motion-aware meta-learned frame interpolation module generates the convolutional kernels for every pixel used in the convolution operations on the feature map of the coarse warped version of the input frames to generate the predicted frame. Extensive qualitative and quantitative evaluations, as well as ablation studies, demonstrate that, via introducing meta-learning in our framework in such a well-designed way, our method not only achieves superior performance to state-of-the-art frame interpolation approaches but also owns an extended capacity to support the interpolation at an arbitrary time-step.

Submitted 27 July, 2022; originally announced July 2022.

arXiv:2207.13259 [pdf, other]

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

Authors: Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang

Abstract: Transformer-based methods have recently achieved great advancement on 2D image-based vision tasks. For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation. How to efficiently and effectively model the 3D self-attention of video data has been a great challenge for transformers. In this paper, we propose a Temporal Patch Shift (TPS) method for efficient 3D self-attention modeling in transformers for video-based action recognition. TPS shifts part of patches with a specific mosaic pattern in the temporal dimension, thus converting a vanilla spatial self-attention operation to a spatiotemporal one with little additional cost. As a result, we can compute 3D self-attention using nearly the same computation and memory cost as 2D self-attention. TPS is a plug-and-play module and can be inserted into existing 2D transformer models to enhance spatiotemporal feature learning. The proposed method achieves competitive performance with state-of-the-arts on Something-something V1 & V2, Diving-48, and Kinetics400 while being much more efficient on computation and memory cost. The source code of TPS can be found at https://github.com/MartinXM/TPS.

Submitted 26 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV22

arXiv:2207.05358 [pdf, other]

eX-ViT: A Novel eXplainable Vision Transformer for Weakly Supervised Semantic Segmentation

Authors: Lu Yu, Wei Xiang, Juan Fang, Yi-Ping Phoebe Chen, Lianhua Chi

Abstract: Recently vision transformer models have become prominent models for a range of vision tasks. These models, however, are usually opaque with weak feature interpretability. Moreover, there is no method currently built for an intrinsically interpretable transformer, which is able to explain its reasoning process and provide a faithful explanation. To close these crucial gaps, we propose a novel vision transformer dubbed the eXplainable Vision Transformer (eX-ViT), an intrinsically interpretable transformer model that is able to jointly discover robust interpretable features and perform the prediction. Specifically, eX-ViT is composed of the Explainable Multi-Head Attention (E-MHA) module, the Attribute-guided Explainer (AttE) module and the self-supervised attribute-guided loss. The E-MHA tailors explainable attention weights that are able to learn semantically interpretable representations from local patches in terms of model decisions with noise robustness. Meanwhile, AttE is proposed to encode discriminative attribute features for the target object through diverse attribute discovery, which constitutes faithful evidence for the model's predictions. In addition, a self-supervised attribute-guided loss is developed for our eX-ViT, which aims at learning enhanced representations through the attribute discriminability mechanism and attribute diversity mechanism, to localize diverse and discriminative attributes and generate more robust explanations. As a result, we can uncover faithful and robust interpretations with diverse attributes through the proposed eX-ViT.

Submitted 12 July, 2022; originally announced July 2022.

arXiv:2206.15138 [pdf, other]

DFGC 2022: The Second DeepFake Game Competition

Authors: Bo Peng, Wei Xiang, Yue Jiang, Wei Wang, Jing Dong, Zhenan Sun, Zhen Lei, Siwei Lyu

Abstract: This paper presents the summary report on our DFGC 2022 competition. The DeepFake is rapidly evolving, and realistic face-swaps are becoming more deceptive and difficult to detect. On the contrary, methods for detecting DeepFakes are also improving. There is a two-party game between DeepFake creators and defenders. This competition provides a common platform for benchmarking the game between the current state-of-the-arts in DeepFake creation and detection methods. The main research question to be answered by this competition is the current state of the two adversaries when competed with each other. This is the second edition after the last year's DFGC 2021, with a new, more diverse video dataset, a more realistic game setting, and more reasonable evaluation metrics. With this competition, we aim to stimulate research ideas for building better defenses against the DeepFake threats. We also release our DFGC 2022 dataset contributed by both our participants and ourselves to enrich the DeepFake data resources for the research community (https://github.com/NiCE-X/DFGC-2022).

Submitted 3 October, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: Accepted by IJCB 2022

arXiv:2206.07662 [pdf, other]

SP-ViT: Learning 2D Spatial Priors for Vision Transformers

Authors: Yuxuan Zhou, Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Lei Zhang, Margret Keuper, Xiansheng Hua

Abstract: Recently, transformers have shown great potential in image classification and established state-of-the-art results on the ImageNet benchmark. However, compared to CNNs, transformers converge slowly and are prone to overfitting in low-data regimes due to the lack of spatial inductive biases. Such spatial inductive biases can be especially beneficial since the 2D structure of an input image is not well preserved in transformers. In this work, we present Spatial Prior-enhanced Self-Attention (SP-SA), a novel variant of vanilla Self-Attention (SA) tailored for vision transformers. Spatial Priors (SPs) are our proposed family of inductive biases that highlight certain groups of spatial relations. Unlike convolutional inductive biases, which are forced to focus exclusively on hard-coded local regions, our proposed SPs are learned by the model itself and take a variety of spatial relations into account. Specifically, the attention score is calculated with emphasis on certain kinds of spatial relations at each head, and such learned spatial foci can be complementary to each other. Based on SP-SA we propose the SP-ViT family, which consistently outperforms other ViT models with similar GFlops or parameters. Our largest model SP-ViT-L achieves a record-breaking 86.3% Top-1 accuracy with a reduction in the number of parameters by almost 50% compared to previous state-of-the-art model (150M for SP-ViT-L vs 271M for CaiT-M-36) among all ImageNet-1K models trained on 224x224 and fine-tuned on 384x384 resolution w/o extra data.

Submitted 15 June, 2022; originally announced June 2022.

ACM Class: I.4

arXiv:2206.05168 [pdf, other]

Multi-faceted Graph Attention Network for Radar Target Recognition in Heterogeneous Radar Network

Authors: Han Meng, Yuexing Peng, Wei Xiang, Xu Pang, Wenbo Wang

Abstract: Radar target recognition (RTR), as a key technology of intelligent radar systems, has been well investigated. Accurate RTR at low signal-to-noise ratios (SNRs) still remains an open challenge. Most existing methods are based on a single radar or the homogeneous radar network, which do not fully exploit frequency-dimensional information. In this paper, a two-stream semantic feature fusion model, termed Multi-faceted Graph Attention Network (MF-GAT), is proposed to greatly improve the accuracy in the low SNR region of the heterogeneous radar network. By fusing the features extracted from the source domain and transform domain via a graph attention network model, the MF-GAT model distills higher-level semantic features before classification in a unified framework. Extensive experiments are presented to demonstrate that the proposed model can greatly improve the RTR performance at low SNRs.

Submitted 10 June, 2022; originally announced June 2022.

Comments: 6 pages, 4 figures

Journal ref: The paper is under consideration at Pattern Recognition Letters, Elsevier, 2022

arXiv:2205.06891 [pdf, ps, other]

doi 10.1109/TAI.2024.3397292

Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation

Authors: Jianan Liu, Hao Li, Tao Huang, Euijoon Ahn, Kang Han, Adeel Razi, Wei Xiang, Jinman Kim, David Dagan Feng

Abstract: High-resolution (HR) magnetic resonance imaging is critical in aiding doctors in their diagnoses and image-guided treatments. However, acquiring HR images can be time-consuming and costly. Consequently, deep learning-based super-resolution reconstruction (SRR) has emerged as a promising solution for generating super-resolution (SR) images from low-resolution (LR) images. Unfortunately, training such neural networks requires aligned authentic HR and LR image pairs, which are challenging to obtain due to patient movements during and between image acquisitions. While rigid movements of hard tissues can be corrected with image registration, aligning deformed soft tissues is complex, making it impractical to train neural networks with authentic HR and LR image pairs. Previous studies have focused on SRR using authentic HR images and down-sampled synthetic LR images. However, the difference in degradation representations between synthetic and authentic LR images suppresses the quality of SR images reconstructed from authentic LR images. To address this issue, we propose a novel Unsupervised Degradation Adaptation Network (UDEAN). Our network consists of a degradation learning network and an SRR network. The degradation learning network downsamples the HR images using the degradation representation learned from the misaligned or unpaired LR images. The SRR network then learns the mapping from the down-sampled HR images to the original ones. Experimental results show that our method outperforms state-of-the-art networks and is a promising solution to the challenges in clinical settings.

Submitted 24 April, 2024; v1 submitted 13 May, 2022; originally announced May 2022.

Comments: Accepted by IEEE Transactions on Artificial Intelligence

arXiv:2204.07360 [pdf, other]

Spatio-Temporal-Frequency Graph Attention Convolutional Network for Aircraft Recognition Based on Heterogeneous Radar Network

Authors: Han Meng, Yuexing Peng, Wenbo Wang, Peng Cheng, Yonghui Li, Wei Xiang

Abstract: This paper proposes a knowledge-and-data-driven graph neural network-based collaboration learning model for reliable aircraft recognition in a heterogeneous radar network. The aircraft recognizability analysis shows that: (1) the semantic feature of an aircraft is motion patterns driven by the kinetic characteristics, and (2) the grammatical features contained in the radar cross-section (RCS) signals present spatial-temporal-frequency (STF) diversity decided by both the electromagnetic radiation shape and motion pattern of the aircraft. Then a STF graph attention convolutional network (STFGACN) is developed to distill semantic features from the RCS signals received by the heterogeneous radar network. Extensive experiment results verify that the STFGACN outperforms the baseline methods in terms of detection accuracy, and ablation experiments are carried out to further show that the expansion of the information dimension can gain considerable benefits to perform robustly in the low signal-to-noise ratio region.

Submitted 15 April, 2022; originally announced April 2022.

Comments: 11 pages, 17 figures

arXiv:2203.06553 [pdf, other]

doi 10.1109/ITSC55140.2022.9922540

Contrastive Learning for Automotive mmWave Radar Detection Points Based Instance Segmentation

Authors: Weiyi Xiong, Jianan Liu, Yuxuan Xia, Tao Huang, Bing Zhu, Wei Xiang

Abstract: The automotive mmWave radar plays a key role in advanced driver assistance systems (ADAS) and autonomous driving. Deep learning-based instance segmentation enables real-time object identification from the radar detection points. In the conventional training process, accurate annotation is the key. However, high-quality annotations of radar detection points are challenging to achieve due to their ambiguity and sparsity. To address this issue, we propose a contrastive learning approach for implementing radar detection points-based instance segmentation. We define the positive and negative samples according to the ground-truth label, apply the contrastive loss to train the model first, and then perform fine-tuning for the following downstream task. In addition, these two steps can be merged into one, and pseudo labels can be generated for the unlabeled data to improve the performance further. Thus, there are four different training settings for our method. Experiments show that when the ground-truth information is only available for a small proportion of the training data, our method still achieves a comparable performance to the approach trained in a supervised manner with 100% ground-truth information.

Submitted 5 February, 2023; v1 submitted 12 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE ITSC 2022
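
The abstract above defines positive and negative samples from ground-truth labels and trains with a contrastive loss. The sketch below shows a generic supervised contrastive loss of the SupCon type over per-point feature vectors. It is an illustration of the general technique only: the function name, the temperature value of 0.1, and the exact loss form are assumptions, and the paper's own loss may differ.

```python
import math
from typing import List, Sequence


def supervised_contrastive_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Generic label-driven contrastive loss (hypothetical helper).

    Points sharing a label are positives for each other; all other points
    are negatives. Anchors without any positive are skipped.
    """
    n = len(features)
    # L2-normalise each feature vector so dot products are cosine similarities.
    normed: List[List[float]] = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f)) or 1.0
        normed.append([x / norm for x in f])

    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other point.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor points.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With features that already cluster by label the loss is close to zero, while labels that cut across the clusters give a much larger value, which is the signal a pre-training stage of this kind relies on.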

arXiv:2203.03980 [pdf, other]

Human Biometric Signals Monitoring based on WiFi Channel State Information using Deep Learning

Authors: Moyu Liu, Zihuai Lin, Pei Xiao, Wei Xiang

Abstract: In this paper, we first present a single-input, multiple-output convolutional neural network that can estimate both heart rate and respiration rate simultaneously by exploiting the underlying link between heart rate and respiration rate. The inputs to the neural network are the amplitude and phase of channel state information collected by a pair of WiFi devices. Our WiFi-based technique addresses privacy concerns and is adaptable to a variety of settings. The system's overall accuracy for heart rate and respiration rate estimation can reach 99.109% and 98.581%, respectively. Furthermore, we developed and analyzed two deep learning-based neural network classification algorithms for categorizing four types of sleep stages: wake, rapid eye movement (REM) sleep, non-rapid eye movement (NREM) light sleep, and NREM deep sleep. The system's overall classification accuracy can reach 95.925%.

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2203.02982 [pdf, other]

A Survey of Implicit Discourse Relation Recognition

Authors: Wei Xiang, Bang Wang

Abstract: A discourse containing one or more sentences describes daily issues and events for people to communicate their thoughts and opinions. As sentences normally consist of multiple text segments, a correct understanding of the theme of a discourse should take into consideration the relations between text segments. Although a connective sometimes exists in raw texts for conveying relations, it is more often the case that no connective exists between two text segments but some implicit relation does exist between them. The task of implicit discourse relation recognition (IDRR) is to detect an implicit relation and classify its sense between two text segments without a connective. Indeed, the IDRR task is important to diverse downstream natural language processing tasks, such as text summarization, machine translation and so on. This article provides a comprehensive and up-to-date survey for the IDRR task. We first summarize the task definition and data sources widely used in the field. We categorize the main solution approaches for the IDRR task from the viewpoint of its development history. In each solution category, we present and analyze the most representative methods, including their origins, ideas, strengths and weaknesses. We also present performance comparisons for those solutions experimented on a public corpus with standard data processing procedures. Finally, we discuss future research directions for discourse relation analysis.

Submitted 6 March, 2022; originally announced March 2022.

Comments: IDRRSurvey-LaTeX v1.6, 32 pages with 9 figures

arXiv:2202.01214 [pdf, ps, other]

Approximate Bisimulation Relations for Neural Networks and Application to Assured Neural Network Compression

Authors: Weiming Xiang, Zhongzhu Shao

Abstract: In this paper, we propose a concept of approximate bisimulation relation for feedforward neural networks. In the framework of approximate bisimulation relation, a novel neural network merging method is developed to compute the approximate bisimulation error between two neural networks based on reachability analysis of neural networks. The developed method is able to quantitatively measure the distance between the outputs of two neural networks with the same inputs. Then, we apply the approximate bisimulation relation results to perform neural networks model reduction and compute the compression precision, i.e., assured neural networks compression. At last, using the assured neural network compression, we accelerate the verification processes of ACAS Xu neural networks to illustrate the effectiveness and advantages of our proposed approximate bisimulation approach.

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2201.06750 [pdf, other]

doi 10.1109/TGRS.2022.3197546

DDU-Net: Dual-Decoder-U-Net for Road Extraction Using High-Resolution Remote Sensing Images

Authors: Ying Wang, Yuexing Peng, Xinran Liu, Wei Li, George C. Alexandropoulos, Junchuan Yu, Daqing Ge, Wei Xiang

Abstract: Extracting roads from high-resolution remote sensing images (HRSIs) is vital in a wide variety of applications, such as autonomous driving, path planning, and road navigation. Due to the long and thin shape as well as the shades induced by vegetation and buildings, small-sized roads are more difficult to discern. In order to improve the reliability and accuracy of small-sized road extraction when roads of multiple sizes coexist in an HRSI, an enhanced deep neural network model termed Dual-Decoder-U-Net (DDU-Net) is proposed in this paper. Motivated by the U-Net model, a small decoder is added to form a dual-decoder structure for more detailed features. In addition, we introduce the dilated convolution attention module (DCAM) between the encoder and decoders to increase the receptive field as well as to distill multi-scale features through cascading dilated convolution and global average pooling. The convolutional block attention module (CBAM) is also embedded in the parallel dilated convolution and pooling branches to capture more attention-aware features. Extensive experiments are conducted on the Massachusetts Roads dataset with experimental results showing that the proposed model outperforms the state-of-the-art DenseUNet, DeepLabv3+ and D-LinkNet by 6.5%, 3.3%, and 2.1% in the mean Intersection over Union (mIoU), and by 4%, 4.8%, and 3.1% in the F1 score, respectively. Both ablation and heatmap analyses are presented to validate the effectiveness of the proposed model.

Submitted 18 January, 2022; originally announced January 2022.

arXiv:2112.11703 [pdf, ps, other]

The finite time blow-up of the Yang-Mills flow

Authors: Wang Guan Xiang, Zhang Chuan **g

Abstract: In this paper, we shall prove that, on a non-flat Riemannian vector bundle over a compact Riemannian manifold, the smooth solution of the Yang-Mills flow will blow up in finite time if the energy of the initial connection is small enough. We also consider the finite time blow up for the Yang-Mills flow with the initial curvature near the harmonic form. Furthermore, when $E$ is a holomorphic vector bundle over a compact Kähler manifold, then $E$ will admit a projective flat structure if the trace free part of Chern curvature is small enough.

Submitted 22 December, 2021; originally announced December 2021.

arXiv:2111.02036 [pdf, other]

GRCN: Graph-Refined Convolutional Network for Multimedia Recommendation with Implicit Feedback

Authors: Wei Yinwei, Wang Xiang, Nie Liqiang, He Xiangnan, Chua Tat-Seng

Abstract: Reorganizing implicit feedback of users as a user-item interaction graph facilitates the applications of graph convolutional networks (GCNs) in recommendation tasks. In the interaction graph, edges between user and item nodes function as the main element of GCNs to perform information propagation and generate informative representations. Nevertheless, an underlying challenge lies in the quality of interaction graph, since observed interactions with less-interested items occur in implicit feedback (say, a user views micro-videos accidentally). This means that the neighborhoods involved with such false-positive edges will be influenced negatively and the signal on user preference can be severely contaminated. However, existing GCN-based recommender models leave such challenge under-explored, resulting in suboptimal representations and performance. In this work, we focus on adaptively refining the structure of interaction graph to discover and prune potential false-positive edges. Towards this end, we devise a new GCN-based recommender model, Graph-Refined Convolutional Network (GRCN), which adjusts the structure of interaction graph adaptively based on status of model training, instead of remaining the fixed structure. In particular, a graph refining layer is designed to identify the noisy edges with the high confidence of being false-positive interactions, and consequently prune them in a soft manner. We then apply a graph convolutional layer on the refined graph to distill informative signals on user preference. Through extensive experiments on three datasets for micro-video recommendation, we validate the rationality and effectiveness of our GRCN. Further in-depth analysis presents how the refined graph benefits the GCN-based recommender model.

Submitted 3 November, 2021; originally announced November 2021.

arXiv:2110.14925 [pdf, other]

Hierarchical User Intent Graph Network for Multimedia Recommendation

Authors: Wei Yinwei, Wang Xiang, He Xiangnan, Nie Liqiang, Rui Yong, Chua Tat-Seng

Abstract: In this work, we aim to learn multi-level user intents from the co-interacted patterns of items, so as to obtain high-quality representations of users and items and further enhance the recommendation performance. Towards this end, we develop a novel framework, Hierarchical User Intent Graph Network, which exhibits user intents in a hierarchical graph structure, from the fine-grained to coarse-grained intents. In particular, we get the multi-level user intents by recursively performing two operations: 1) intra-level aggregation, which distills the signal pertinent to user intents from co-interacted item graphs; and 2) inter-level aggregation, which constitutes the supernode in higher levels to model coarser-grained user intents via gathering the nodes' representations in the lower ones. Then, we refine the user and item representations as a distribution over the discovered intents, instead of simple pre-existing features. To demonstrate the effectiveness of our model, we conducted extensive experiments on three public datasets. Our model achieves significant improvements over the state-of-the-art methods, including MMGCN and DisenGCN. Furthermore, by visualizing the item representations, we provide the semantics of user intents.

Submitted 28 October, 2021; originally announced October 2021.

arXiv:2110.10854 [pdf, ps, other]

Performance Analysis for Covert Communications Under Faster-than-Nyquist Signaling

Authors: Yuan Li, Yuchen Zhang, Wanyu Xiang, Jianquan Wang, Sa Xiao, Liang Chang, Wanbin Tang

Abstract: In this letter, we analyze the performance of covert communications under faster-than-Nyquist (FTN) signaling in the Rayleigh block fading channel. Both Bayesian criterion- and Kullback-Leibler (KL) divergence-based covertness constraints are considered. Especially, for KL divergence-based one, we prove that both the maximum transmit power and covert rate under FTN signaling are higher than those under Nyquist signaling. Numerical results coincide with our analysis and validate the advantages of FTN signaling to realize covert data transmission.

Submitted 17 January, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

Comments: We have corrected the typos and inappropriate description throughout the paper as reviewers suggested. We have proved the superiority of FTN signaling on covert communications with the same detection time duration at Willie in Theorem 3 and renewed the simulation results in Section V as Reviewer 2 suggested. This paper has been resubmitted to IEEE Communications Letters on 14-Jan-2022

arXiv:2110.09903 [pdf, other]

Unrestricted Adversarial Attacks on ImageNet Competition

Authors: Yuefeng Chen, Xiaofeng Mao, Yuan He, Hui Xue, Chao Li, Yinpeng Dong, Qi-An Fu, Xiao Yang, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu, Fangcheng Liu, Chao Zhang, Hongyang Zhang, Yichi Zhang, Shilong Liu, Chang Liu, Wenzhao Xiang, Yajie Wang, Huipeng Zhou, Haoran Lyu, Yidan Xu, Zixuan Xu, Taoyu Zhu, et al. (13 additional authors not shown)

Abstract: Many works have investigated the adversarial attacks or defenses under the settings where a bounded and imperceptible perturbation can be added to the input. However in the real-world, the attacker does not need to comply with this restriction. In fact, more threats to the deep model come from unrestricted adversarial examples, that is, the attacker makes large and visible modifications on the image, which causes the model classifying mistakenly, but does not affect the normal observation in human perspective. Unrestricted adversarial attack is a popular and practical direction but has not been studied thoroughly. We organize this competition with the purpose of exploring more effective unrestricted adversarial attack algorithm, so as to accelerate the academical research on the model robustness under stronger unbounded attacks. The competition is held on the TianChi platform (\url{https://tianchi.aliyun.com/competition/entrance/531853/introduction}) as one of the series of AI Security Challengers Program.

Submitted 25 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

Comments: CVPR-2021 AIC Phase VI Track2: Unrestricted Adversarial Attacks on ImageNet

arXiv:2110.08256 [pdf, other]

Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness

Authors: Xiao Yang, Yinpeng Dong, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu

Abstract: The vulnerability of deep neural networks to adversarial examples has motivated an increasing number of defense strategies for promoting model robustness. However, the progress is usually hampered by insufficient robustness evaluations. As the de facto standard to evaluate adversarial robustness, adversarial attacks typically solve an optimization problem of crafting adversarial examples with an iterative process. In this work, we propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically. Our method learns the optimizer in adversarial attacks parameterized by a recurrent neural network, which is trained over a class of data samples and defenses to produce effective update directions during adversarial example generation. Furthermore, we develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses. Our approach can be flexibly incorporated with various attacks and consistently improves the performance with little extra computational cost. Extensive experiments demonstrate the effectiveness of the learned attacks by MAMA compared to the state-of-the-art attacks on different defenses, leading to a more reliable evaluation of adversarial robustness.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2110.08042 [pdf, other]

Adversarial Attacks on ML Defense Models Competition

Authors: Yinpeng Dong, Qi-An Fu, Xiao Yang, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu, Jiayu Tang, Yuefeng Chen, XiaoFeng Mao, Yuan He, Hui Xue, Chao Li, Ye Liu, Qilong Zhang, Lianli Gao, Yunrui Yu, Xitong Gao, Zhe Zhao, Daquan Lin, Jiadong Lin, Chuanbiao Song, Zihao Wang, Zhennan Wu, Yang Guo, et al. (3 additional authors not shown)

Abstract: Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years. However, the progress of building more robust models is usually hampered by the incomplete or incorrect robustness evaluation. To accelerate the research on reliable evaluation of adversarial robustness of the current defense models in image classification, the TSAIL group at Tsinghua University and the Alibaba Security group organized this competition along with a CVPR 2021 workshop on adversarial machine learning (https://aisecure-workshop.github.io/amlcvpr2021/). The purpose of this competition is to motivate novel attack algorithms to evaluate adversarial robustness more effectively and reliably. The participants were encouraged to develop stronger white-box attack algorithms to find the worst-case robustness of different defenses. This competition was conducted on an adversarial robustness evaluation platform -- ARES (https://github.com/thu-ml/ares), and is held on the TianChi platform (https://tianchi.aliyun.com/competition/entrance/531847/introduction) as one of the series of AI Security Challengers Program. After the competition, we summarized the results and established a new adversarial robustness benchmark at https://ml.cs.tsinghua.edu.cn/ares-bench/, which allows users to upload adversarial attack algorithms and defense models for evaluation.

Submitted 15 October, 2021; originally announced October 2021.

Comments: Competition Report

arXiv:2110.03626 [pdf, other]

doi 10.1098/rsta.2021.0302

Tracking the national and regional COVID-19 epidemic status in the UK using directed Principal Component Analysis

Authors: Ben Swallow, Wen Xiang, Jasmina Panovska-Griffiths

Abstract: One of the difficulties in monitoring an ongoing pandemic is deciding on the metric that best describes its status when multiple intercorrelated measurements are available. Having a single measure, such as the effective reproduction number R, has been a simple and useful metric for tracking the epidemic and for imposing policy interventions to curb the increase when R >1. While R is easy to interpret in a fully susceptible population, it is more difficult to interpret for a population with heterogeneous prior immunity, e.g., from vaccination and prior infection. We propose an additional metric for tracking the UK epidemic which can capture the different spatial scales. These are the principal scores (PCs) from a weighted Principal Component Analysis. In this paper, we have used the methodology across the four UK nations and across the first two epidemic waves (January 2020-March 2021) to show that first principal score across nations and epidemic waves is a representative indicator of the state of the pandemic and are correlated with the trend in R. Hospitalisations are shown to be consistently representative, however, the precise dominant indicator, i.e. the principal loading(s) of the analysis, can vary geographically and across epidemic waves.

Submitted 10 March, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

arXiv:2109.15177 [pdf, other]

You Cannot Easily Catch Me: A Low-Detectable Adversarial Patch for Object Detectors

Authors: Zijian Zhu, Hang Su, Chang Liu, Wenzhao Xiang, Shibao Zheng

Abstract: Blind spots or outright deceit can bedevil and deceive machine learning models. Unidentified objects such as digital "stickers," also known as adversarial patches, can fool facial recognition systems, surveillance systems and self-driving cars. Fortunately, most existing adversarial patches can be outwitted, disabled and rejected by a simple classification network called an adversarial patch detector, which distinguishes adversarial patches from original images. An object detector classifies and predicts the types of objects within an image, such as by distinguishing a motorcyclist from the motorcycle, while also localizing each object's placement within the image by "drawing" so-called bounding boxes around each object, once again separating the motorcyclist from the motorcycle. To train detectors even better, however, we need to keep subjecting them to confusing or deceitful adversarial patches as we probe for the models' blind spots. For such probes, we came up with a novel approach, a Low-Detectable Adversarial Patch, which attacks an object detector with small and texture-consistent adversarial patches, making these adversaries less likely to be recognized. Concretely, we use several geometric primitives to model the shapes and positions of the patches. To enhance our attack performance, we also assign different weights to the bounding boxes in terms of loss function. Our experiments on the common detection dataset COCO as well as the driving-video dataset D2-City show that LDAP is an effective attack method, and can resist the adversarial patch detector.

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2109.05820 [pdf, other]

Improving the Robustness of Adversarial Attacks Using an Affine-Invariant Gradient Estimator

Authors: Wenzhao Xiang, Hang Su, Chang Liu, Yandong Guo, Shibao Zheng

Abstract: As designers of artificial intelligence try to outwit hackers, both sides continue to hone in on AI's inherent vulnerabilities. Designed and trained from certain statistical distributions of data, AI's deep neural networks (DNNs) remain vulnerable to deceptive inputs that violate a DNN's statistical, predictive assumptions. Before being fed into a neural network, however, most existing adversarial examples cannot maintain malicious functionality when applied to an affine transformation. For practical purposes, maintaining that malicious functionality serves as an important measure of the robustness of adversarial attacks. To help DNNs learn to defend themselves more thoroughly against attacks, we propose an affine-invariant adversarial attack, which can consistently produce more robust adversarial examples over affine transformations. For efficiency, we propose to disentangle current affine-transformation strategies from the Euclidean geometry coordinate plane with its geometric translations, rotations and dilations; we reformulate the latter two in polar coordinates. Afterwards, we construct an affine-invariant gradient estimator by convolving the gradient at the original image with derived kernels, which can be integrated with any gradient-based attack methods. Extensive experiments on ImageNet, including some experiments under physical condition, demonstrate that our method can significantly improve the affine invariance of adversarial examples and, as a byproduct, improve the transferability of adversarial examples, compared with alternative state-of-the-art methods.

Submitted 22 April, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

arXiv:2108.12313 [pdf, other]

TE-YOLOF: Tiny and efficient YOLOF for blood cell detection

Authors: Fanxin Xu, Xiangkui Li, Hang Yang, Yali Wang, Wei Xiang

Abstract: Blood cell detection in microscopic images is an essential branch of medical image processing research. Since disease detection based on manual checking of blood cells is time-consuming and full of errors, testing of blood cells using object detectors with Deep Convolutional Neural Network can be regarded as a feasible solution. In this work, an object detector based on YOLOF has been proposed to detect blood cell objects such as red blood cells, white blood cells and platelets. This object detector is called TE-YOLOF, Tiny and Efficient YOLOF, and it is a One-Stage detector using dilated encoder to extract information from single-level feature maps. For increasing efficiency and flexibility, the EfficientNet Convolutional Neural Network is utilized as the backbone for the proposed object detector. Furthermore, the Depthwise Separable Convolution is applied to enhance the performance and minimize the parameters of the network. In addition, the Mish activation function is employed to increase the precision. Extensive experiments on the BCCD dataset prove the effectiveness of the proposed model, which is more efficient than other existing studies for blood cell detection.

Submitted 27 August, 2021; originally announced August 2021.

arXiv:2108.11032 [pdf, other]

Improving Visual Quality of Unrestricted Adversarial Examples with Wavelet-VAE

Authors: Wenzhao Xiang, Chang Liu, Shibao Zheng

Abstract: Traditional adversarial examples are typically generated by adding perturbation noise to the input image within a small matrix norm. In practice, un-restricted adversarial attack has raised great concern and presented a new threat to the AI safety. In this paper, we propose a wavelet-VAE structure to reconstruct an input image and generate adversarial examples by modifying the latent code. Different from perturbation-based attack, the modifications of the proposed method are not limited but imperceptible to human eyes. Experiments show that our method can generate high quality adversarial examples on ImageNet dataset.

Submitted 24 August, 2021; originally announced August 2021.

arXiv:2108.09207 [pdf, ps, other]

Persistence of the steady planar normal shock structure in 3-D unsteady potential flows

Authors: Beixiang Fang, Feimin Huang, Wei Xiang, Feng Xiao

Abstract: This paper concerns the dynamic stability of the steady 3-D wave structure of a planar normal shock front intersecting perpendicularly to a planar solid wall for unsteady potential flows. The stability problem can be formulated as a free boundary problem of a quasi-linear hyperbolic equation of second order in a dihedral-space domain between the shock front and the solid wall. The key difficulty is brought by the edge singularity of the space domain, the intersection curve between the shock front and the solid wall. Different from the 2-D case, for which the singular part of the boundary is only a point, it is a curve for the 3-D case in this paper. This difference brings new difficulties to the mathematical analysis of the stability problem. A modified partial hodograph transformation is introduced such that the extension technique developed for the 2-D case can be employed to establish the well-posed theory for the initial-boundary value problem of the linearized hyperbolic equation of second order in a dihedral-space domain. Moreover, the extension technique is improved in this paper such that loss of regularity in the a priori estimates on the shock front does not occur. Thus the classical nonlinear iteration scheme can be constructed to prove the existence of the solution to the stability problem, which shows the dynamic stability of the steady planar normal shock without applying the Nash-Moser iteration method.

Submitted 20 August, 2021; originally announced August 2021.

arXiv:2108.05061 [pdf, other]

NI-UDA: Graph Adversarial Domain Adaptation from Non-shared-and-Imbalanced Big Data to Small Imbalanced Applications

Authors: Guangyi Xiao, Weiwei Xiang, Huan Liu, Hao Chen, Shun Peng, Jingzhi Guo, Zhiguo Gong

Abstract: We propose a new general Graph Adversarial Domain Adaptation (GADA) based on semantic knowledge reasoning of class structure for solving the problem of unsupervised domain adaptation (UDA) from the big data with non-shared and imbalanced classes to specified small and imbalanced applications (NI-UDA), where non-shared classes mean the label space out of the target domain. Our goal is to leverage priori hierarchy knowledge to enhance domain adversarial aligned feature representation with graph reasoning. In this paper, to address two challenges in NI-UDA, we equip adversarial domain adaptation with Hierarchy Graph Reasoning (HGR) layer and the Source Classifier Filter (SCF). For sparse classes transfer challenge, our HGR layer can aggregate local feature to hierarchy graph nodes by node prediction and enhance domain adversarial aligned feature with hierarchy graph reasoning for sparse classes. Our HGR contributes to learn direct semantic patterns for sparse classes by hierarchy attention in self-attention, non-linear mapping and graph normalization. Our SCF is proposed for the challenge of knowledge sharing from non-shared data without negative transfer effect by filtering low-confidence non-shared data in HGR layer. Experiments on two benchmark datasets show our GADA methods consistently improve the state-of-the-art adversarial UDA algorithms, e.g. GADA(HGR) can greatly improve f1 of the MDD by \textbf{7.19\%} and GVB-GD by \textbf{7.89\%} respectively on imbalanced source task in Meal300 dataset. The code is available at https://gadatransfer.wixsite.com/gada.

Submitted 11 August, 2021; v1 submitted 11 August, 2021; originally announced August 2021.

Comments: 11 pages, 5 figures, and 8 tables

arXiv:2107.12801 [pdf, ps, other]

Robust Optimization Framework for Training Shallow Neural Networks Using Reachability Method

Authors: Yejiang Yang, Weiming Xiang

Abstract: In this paper, a robust optimization framework is developed to train shallow neural networks based on reachability analysis of neural networks. To characterize noises of input data, the input training data is disturbed in the description of interval sets. Interval-based reachability analysis is then performed for the hidden layer. With the reachability analysis results, a robust optimization training method is developed in the framework of robust least-square problems. Then, the developed robust least-square problem is relaxed to a semidefinite programming problem. It has been shown that the developed robust learning method can provide better robustness against perturbations at the price of loss of training accuracy to some extent. At last, the proposed method is evaluated on a robot arm model learning example.

Submitted 27 July, 2021; originally announced July 2021.

arXiv:2107.11725 [pdf, ps, other]

Convergence Rate of Hypersonic Similarity for Steady Potential Flows Over Two-Dimensional Lipschitz Wedge

Authors: Jie Kuang, Wei Xiang, Yongqian Zhang

Abstract: This paper is devoted to establishing the convergence rate of the hypersonic similarity for the inviscid steady irrotational Euler flow over a two-dimensional Lipschitz slender wedge in $BV\cap L^1$ space. The rate we established is the same as the one predicted by Newtonian-Busemann law (see (3.29) in \cite[Page 67]{anderson} for more details) as the incoming Mach number $\textrm{M}_{\infty}\rightarrow\infty$ for a fixed hypersonic similarity parameter $K$. The hypersonic similarity, which is also called the Mach-number independence principle, is equivalent to the following Van Dyke's similarity theory: For a given hypersonic similarity parameter $K$, when the Mach number of the flow is sufficiently large, the governing equations after the scaling are approximated by a simpler equation, that is called the hypersonic small-disturbance equation. To achieve the convergence rate, we approximate the curved boundary by piecewisely straight lines and find a new Lipschitz continuous map $\mathcal{P}_{h}$ such that the trajectory can be obtained by piecing together the Riemann solutions near the approximated boundary. Next, we derive the $L^1$ difference estimates between the approximate solutions $U^{(τ)}_{h,ν}(x,\cdot)$ to the initial-boundary value problem for the scaled equations and the trajectories $\mathcal{P}_{h}(x,0)(U^ν_{0})$ by piecing together all the Riemann solvers. Then, by the uniqueness and the compactness of $\mathcal{P}_{h}$ and $U^{(τ)}_{h,ν}$, we can further establish the $L^1$ estimates of order $τ^2$ between the solutions to the initial-boundary value problem for the scaled equations and the solutions to the initial-boundary value problem for the hypersonic small-disturbance equations, if the total variations of the initial data and the tangential derivative of the boundary are sufficiently small.

Submitted 25 July, 2021; originally announced July 2021.

arXiv:2105.14113 [pdf, ps, other]

Necessary and Sufficient Conditions for Stability of Discrete-Time Switched Linear Systems with Ranged Dwell Time

Authors: Weiming Xiang

Abstract: This paper deals with the stability analysis problem of discrete-time switched linear systems with ranged dwell time. A novel concept called L-switching-cycle is proposed, which contains sequences of multiple activation cycles satisfying the prescribed ranged dwell time constraint. Based on L-switching-cycle, two sufficient conditions are proposed to ensure the global uniform asymptotic stability of discrete-time switched linear systems. It is noted that two conditions are equivalent in stability analysis with the same $L$-switching-cycle. These two sufficient conditions can be viewed as generalizations of the clock-dependent Lyapunov and multiple Lyapunov function methods, respectively. Furthermore, it has been proven that the proposed L-switching-cycle can eventually achieve the nonconservativeness in stability analysis as long as a sufficiently long L-switching-cycle is adopted. A numerical example is provided to illustrate our theoretical results.

Submitted 28 May, 2021; originally announced May 2021.

arXiv:2105.09683 [pdf, other]

DPN-SENet: A self-attention mechanism neural network for detection and diagnosis of COVID-19 from chest x-ray images

Authors: Bo Cheng, Ruhui Xue, Hang Yang, Laili Zhu, Wei Xiang

Abstract: Background and Objective: The new type of coronavirus is also called COVID-19. It began to spread at the end of 2019 and has now spread across the world. Until October 2020, it has infected around 37 million people and claimed about 1 million lives. We propose a deep learning model that can help radiologists and clinicians use chest X-rays to diagnose COVID-19 cases and show the diagnostic features of pneumonia. Methods: The approach in this study is: 1) we propose a data enhancement method to increase the diversity of the data set, thereby improving the generalization performance of the model. 2) Our deep convolution neural network model DPN-SE adds a self-attention mechanism to the DPN network. The addition of a self-attention mechanism has greatly improved the performance of the network. 3) Use the Lime interpretable library to mark the feature regions on the X-ray medical image that helps doctors more quickly diagnose COVID-19 in people. Results: Under the same network model, the data with and without data enhancement is put into the model for training respectively. At last, comparing two experimental results: among the 10 network models with different structures, 7 network models have improved their effects after using data enhancement, with an average improvement of 1% in recognition accuracy. We propose that the accuracy and recall rates of the DPN-SE network are 93% and 98% of cases (COVID vs. pneumonia bacteria vs. viral pneumonia vs. normal). Compared with the original DPN, the respective accuracy is improved by 2%. Conclusion: The data augmentation method we used has achieved effective results on a small amount of data set, showing that a reasonable data augmentation method can improve the recognition accuracy without changing the sample size and model structure. Overall, the proposed method and model can effectively become a very useful tool for clinical radiologists.

Submitted 20 May, 2021; originally announced May 2021.

Comments: 11 pages, 7 figures

arXiv:2105.09596 [pdf, other]

AGSFCOS: Based on attention mechanism and Scale-Equalizing pyramid network of object detection

Authors: Li Wang, Wei Xiang, Ruhui Xue, Kaida Zou, Laili Zhu

Abstract: Recently, anchor-free object detection models have shown great potential to exceed anchor-based object detection in accuracy and speed. Two issues are therefore studied in this article: (1) How can the backbone network in an anchor-free object detection model learn feature extraction? (2) How can better use be made of the feature pyramid network? Experiments show that our model improves accuracy on the COCO dataset over current popular detection models, whether the anchor-based YOLOv3 and Faster RCNN or the anchor-free Foveabox, FSAF, and FCOS. The designed attention mechanism module captures contextual information well and improves detection accuracy, and the SEPC network helps balance abstract and detailed information and reduces the semantic gap in the feature pyramid network. Our optimal model achieves 39.5% COCO AP with a ResNet50 backbone.

Submitted 20 May, 2021; originally announced May 2021.

Comments: 9 pages, 9 figures

arXiv:2102.10371 [pdf, other]

doi 10.1109/LWC.2020.2969157

Denoising Higher-order Moments for Blind Digital Modulation Identification in Multiple-antenna Systems

Authors: Sofiane Kharbech, Eric Pierre Simon, Akram Belazi, Wei Xiang

Abstract: The paper proposes a new technique that substantially improves blind digital modulation identification (DMI) algorithms that are based on higher-order statistics (HOS). The proposed technique takes advantage of noise power estimation to apply an offset to higher-order moments (HOM), thus obtaining an estimate of the noise-free HOM. When tested for multiple-antenna systems, the proposed method outperforms, in terms of identification accuracy, other DMI algorithms that are based only on cumulants or do not consider HOM denoising, even for a receiver with impairments. The improvement is achieved with the same order of complexity as the common HOS-based DMI algorithms in the same context.

Submitted 20 February, 2021; originally announced February 2021.

Journal ref: IEEE Wireless Communications Letters, vol. 9, no. 6, pp. 765-769, June 2020

Showing 51–100 of 188 results for author: Xiang, W