Showing 1–50 of 57 results for author: Pei, W

Searching in archive cs.
  1. arXiv:2406.18958  [pdf, other]

    cs.CV

    AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

    Authors: Yanan Sun, Yanchen Liu, Yinhao Tang, Wenjie Pei, Kai Chen

    Abstract: The field of text-to-image (T2I) generation has made significant progress in recent years, largely driven by advancements in diffusion models. Linguistic control enables effective content creation, but struggles with fine-grained control over image generation. This challenge has been explored, to a great extent, by incorporating additional user-supplied spatial conditions, such as depth maps and e…

    Submitted 27 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2405.09185  [pdf, other]

    cs.SI cs.NE

    Influence Maximization in Hypergraphs Using A Genetic Algorithm with New Initialization and Evaluation Methods

    Authors: Xilong Qu, Wenbin Pei, Yingchao Yang, Xirong Xu, Renquan Zhang, Qiang Zhang

    Abstract: Influence maximization (IM) is a crucial optimization task related to analyzing complex networks in the real world, such as social networks, disease propagation networks, and marketing networks. Publications to date about the IM problem focus mainly on graphs, which fail to capture high-order interaction relationships from the real world. Therefore, the use of hypergraphs for addressing the IM pro…

    Submitted 15 May, 2024; originally announced May 2024.

  3. arXiv:2404.10322  [pdf, other]

    cs.CV

    Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation

    Authors: Jiapeng Su, Qi Fan, Guangming Lu, Fanglin Chen, Wenjie Pei

    Abstract: Few-shot semantic segmentation (FSS) has achieved great success on segmenting objects of novel classes, supported by only a few annotated samples. However, existing FSS methods often underperform in the presence of domain shifts, especially when encountering new domain styles that are unseen during training. It is suboptimal to directly adapt or generalize the entire model to new domains in the fe…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  4. arXiv:2402.00404  [pdf, other]

    cs.NE

    Improving Critical Node Detection Using Neural Network-based Initialization in a Genetic Algorithm

    Authors: Chanjuan Liu, Shike Ge, Zhihan Chen, Wenbin Pei, Enqiang Zhu, Yi Mei, Hisao Ishibuchi

    Abstract: The Critical Node Problem (CNP) is concerned with identifying the critical nodes in a complex network. These nodes play a significant role in maintaining the connectivity of the network, and removing them can negatively impact network performance. CNP has been studied extensively due to its numerous real-world applications. Among the different versions of CNP, CNP-1a has gained the most popularity…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 14 pages, 13 figures

  5. arXiv:2401.00755  [pdf, other]

    cs.LG

    Saliency-Aware Regularized Graph Neural Network

    Authors: Wenjie Pei, Weina Xu, Zongze Wu, Weichao Li, Jinfan Wang, Guangming Lu, Xiangrong Wang

    Abstract: The crux of graph classification lies in the effective representation learning for the entire graph. Typical graph neural networks focus on modeling the local dependencies when aggregating features of neighboring nodes, and obtain the representation for the entire graph by aggregating node features. Such methods have two potential limitations: 1) the global node saliency w.r.t. graph classificatio…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted by Artificial Intelligence Journal with minor revision

  6. arXiv:2312.10608  [pdf, other]

    cs.CV

    Robust 3D Tracking with Quality-Aware Shape Completion

    Authors: Jingwen Zhang, Zikun Zhou, Guangming Lu, Jiandong Tian, Wenjie Pei

    Abstract: 3D single object tracking remains a challenging problem due to the sparsity and incompleteness of the point clouds. Existing algorithms attempt to address the challenges in two strategies. The first strategy is to learn dense geometric features based on the captured sparse point cloud. Nevertheless, it is quite a formidable task since the learned dense geometric features are with high uncertainty…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: A detailed version of the paper accepted by AAAI 2024

  7. arXiv:2312.10376  [pdf, other]

    cs.CV

    SA$^2$VP: Spatially Aligned-and-Adapted Visual Prompt

    Authors: Wenjie Pei, Tongqi Xia, Fanglin Chen, Jinsong Li, Jiandong Tian, Guangming Lu

    Abstract: As a prominent parameter-efficient fine-tuning technique in NLP, prompt tuning is being explored its potential in computer vision. Typical methods for visual prompt tuning follow the sequential modeling paradigm stemming from NLP, which represents an input image as a flattened sequence of token embeddings and then learns a set of unordered parameterized tokens prefixed to the sequence representati…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  8. arXiv:2312.01431  [pdf, other]

    cs.CV

    D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition

    Authors: Wenjie Pei, Qizhong Tan, Guangming Lu, Jiandong Tian

    Abstract: Adapting large pre-trained image models to few-shot action recognition has proven to be an effective and efficient strategy for learning robust feature extractors, which is essential for few-shot learning. Typical fine-tuning based adaptation paradigm is prone to overfitting in the few-shot learning scenarios and offers little modeling flexibility for learning temporal features in video data. In t…

    Submitted 20 April, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

  9. arXiv:2308.14061  [pdf, other]

    cs.CV

    Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection

    Authors: Xin Feng, Yifeng Xu, Guangming Lu, Wenjie Pei

    Abstract: Effective image restoration with large-size corruptions, such as blind image inpainting, entails precise detection of corruption region masks which remains extremely challenging due to diverse shapes and patterns of corruptions. In this work, we present a novel method for automatic corruption detection, which allows for blind corruption restoration without known corruption masks. Specifically, we…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  10. arXiv:2308.05104  [pdf, other]

    cs.CV

    Scene-Generalizable Interactive Segmentation of Radiance Fields

    Authors: Songlin Tang, Wenjie Pei, Xin Tao, Tanghui Jia, Guangming Lu, Yu-Wing Tai

    Abstract: Existing methods for interactive segmentation in radiance fields entail scene-specific optimization and thus cannot generalize across different scenes, which greatly limits their applicability. In this work we make the first attempt at Scene-Generalizable Interactive Segmentation in Radiance Fields (SGISRF) and propose a novel SGISRF method, which can perform 3D object segmentation for novel (unse…

    Submitted 9 August, 2023; originally announced August 2023.

  11. arXiv:2308.03529  [pdf, other]

    cs.CV

    Feature Decoupling-Recycling Network for Fast Interactive Segmentation

    Authors: Huimin Zeng, Weinong Wang, Xin Tao, Zhiwei Xiong, Yu-Wing Tai, Wenjie Pei

    Abstract: Recent interactive segmentation methods iteratively take source image, user guidance and previously predicted mask as the input without considering the invariant nature of the source image. As a result, extracting features from the source image is repeated in each interaction, resulting in substantial computational redundancy. In this work, we propose the Feature Decoupling-Recycling Network (FDRN…

    Submitted 8 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023

  12. arXiv:2308.03177  [pdf, other]

    cs.CV

    Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement

    Authors: Zhenhua Ning, Zhuotao Tian, Guangming Lu, Wenjie Pei

    Abstract: Although extensive research has been conducted on 3D point cloud segmentation, effectively adapting generic models to novel categories remains a formidable challenge. This paper proposes a novel approach to improve point cloud few-shot segmentation (PC-FSS) models. Unlike existing PC-FSS methods that directly utilize categorical information from support prototypes to recognize novel classes in que…

    Submitted 8 August, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023

  13. arXiv:2303.14384  [pdf, other]

    cs.CV

    Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation

    Authors: Zikun Zhou, Kaige Mao, Wenjie Pei, Hongpeng Wang, Yaowei Wang, Zhenyu He

    Abstract: This paper aims to solve the video object segmentation (VOS) task in a scribble-supervised manner, in which VOS models are not only trained by the sparse scribble annotations but also initialized with the sparse target scribbles for inference. Thus, the annotation burdens for both training and initialization can be substantially lightened. The difficulties of scribble-supervised VOS lie in two asp…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: This project is available at https://github.com/mkg1204/RHMNet-for-SSVOS

  14. arXiv:2301.06690  [pdf, other]

    cs.CV

    Audio2Gestures: Generating Diverse Gestures from Audio

    Authors: Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He

    Abstract: People may perform diverse gestures affected by various mental and physical factors when speaking the same sentences. This inherent one-to-many relationship makes co-speech gesture generation from audio particularly challenging. Conventional CNNs/RNNs assume one-to-one mapping, and thus tend to predict the average of all possible target motions, easily resulting in plain/boring motions during infe…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2108.06720

  15. arXiv:2212.01131  [pdf, other]

    cs.CV

    Activating the Discriminability of Novel Classes for Few-shot Segmentation

    Authors: Dianwen Mei, Wei Zhuo, Jiandong Tian, Guangming Lu, Wenjie Pei

    Abstract: Despite the remarkable success of existing methods for few-shot segmentation, there remain two crucial challenges. First, the feature learning for novel classes is suppressed during the training on base classes in that the novel classes are always treated as background. Thus, the semantics of novel classes are not well learned. Second, most of existing methods fail to consider the underlying seman…

    Submitted 2 December, 2022; originally announced December 2022.

  16. arXiv:2211.15143  [pdf, other]

    cs.CV cs.LG

    Explaining Deep Convolutional Neural Networks for Image Classification by Evolving Local Interpretable Model-agnostic Explanations

    Authors: Bin Wang, Wenbin Pei, Bing Xue, Mengjie Zhang

    Abstract: Deep convolutional neural networks have proven their effectiveness, and have been acknowledged as the most dominant method for image classification. However, a severe drawback of deep convolutional neural networks is poor explainability. Unfortunately, in many real-world applications, users need to understand the rationale behind the predictions of deep convolutional neural networks when determini…

    Submitted 28 November, 2022; originally announced November 2022.

  17. arXiv:2211.14705  [pdf, other]

    cs.CV

    Semantic-Aware Local-Global Vision Transformer

    Authors: Jiatong Zhang, Zengwei Yao, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Vision Transformers have achieved remarkable progresses, among which Swin Transformer has demonstrated the tremendous potential of Transformer for vision tasks. It surmounts the key challenge of high computational complexity by performing local self-attention within shifted windows. In this work we propose the Semantic-Aware Local-Global Vision Transformer (SALG), to further investigate two potent…

    Submitted 26 November, 2022; originally announced November 2022.

  18. arXiv:2210.16834  [pdf, other]

    cs.CV

    Alleviating the Sample Selection Bias in Few-shot Learning by Removing Projection to the Centroid

    Authors: Jing Xu, Xu Luo, Xinglin Pan, Wenjie Pei, Yanan Li, Zenglin Xu

    Abstract: Few-shot learning (FSL) targets at generalization of vision models towards unseen tasks without sufficient annotations. Despite the emergence of a number of few-shot learning methods, the sample selection bias problem, i.e., the sensitivity to the limited amount of support data, has not been well understood. In this paper, we find that this problem usually occurs when the positions of support samp…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  19. arXiv:2208.14093  [pdf, other]

    cs.CV

    SSORN: Self-Supervised Outlier Removal Network for Robust Homography Estimation

    Authors: Yi Li, Wenjie Pei, Zhenyu He

    Abstract: The traditional homography estimation pipeline consists of four main steps: feature detection, feature matching, outlier removal and transformation estimation. Recent deep learning models intend to address the homography estimation problem using a single convolutional network. While these models are trained in an end-to-end fashion to simplify the homography estimation problem, they lack the featu…

    Submitted 30 August, 2022; originally announced August 2022.

  20. arXiv:2208.06162  [pdf, other]

    cs.CV

    Layout-Bridging Text-to-Image Synthesis

    Authors: Jiadong Liang, Wenjie Pei, Feng Lu

    Abstract: The crux of text-to-image synthesis stems from the difficulty of preserving the cross-modality semantic consistency between the input text and the synthesized image. Typical methods, which seek to model the text-to-image mapping directly, could only capture keywords in the text that indicates common objects or actions but fail to learn their spatial distribution patterns. An effective way to circu…

    Submitted 12 August, 2022; originally announced August 2022.

  21. arXiv:2207.12941  [pdf, other]

    cs.CV eess.IV

    Learning Generalizable Latent Representations for Novel Degradations in Super Resolution

    Authors: Fengjun Li, Xin Feng, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Typical methods for blind image super-resolution (SR) focus on dealing with unknown degradations by directly estimating them or learning the degradation representations in a latent space. A potential limitation of these methods is that they assume the unknown degradations can be simulated by the integration of various handcrafted degradations (e.g., bicubic downsampling), which is not necessarily…

    Submitted 25 July, 2022; originally announced July 2022.

  22. arXiv:2207.12049  [pdf, other]

    cs.CV

    Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations

    Authors: Wenjie Pei, Shuang Wu, Dianwen Mei, Fanglin Chen, Jiandong Tian, Guangming Lu

    Abstract: While fine-tuning based methods for few-shot object detection have achieved remarkable progress, a crucial challenge that has not been addressed well is the potential class-specific overfitting on base classes and sample-specific overfitting on novel classes. In this work we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain the overfi…

    Submitted 25 July, 2022; originally announced July 2022.

  23. arXiv:2207.11549  [pdf, other]

    cs.CV

    Self-Support Few-Shot Semantic Segmentation

    Authors: Qi Fan, Wenjie Pei, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Existing few-shot segmentation methods have achieved great progress based on the support-query matching framework. But they still heavily suffer from the limited coverage of intra-class variations from the few-shot supports provided. Motivated by the simple Gestalt principle that pixels belonging to the same object are more similar than those to different objects of same class, we propose a novel…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  24. arXiv:2207.11184  [pdf, other]

    cs.CV

    Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection

    Authors: Shuang Wu, Wenjie Pei, Dianwen Mei, Fanglin Chen, Jiandong Tian, Guangming Lu

    Abstract: Most of existing methods for few-shot object detection follow the fine-tuning paradigm, which potentially assumes that the class-agnostic generalizable knowledge can be learned and transferred implicitly from base classes with abundant samples to novel classes with limited samples via such a two-stage training strategy. However, it is not necessarily true since the object detector can hardly disti…

    Submitted 3 November, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  25. arXiv:2207.09710  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning Sequence Representations by Non-local Recurrent Neural Memory

    Authors: Wenjie Pei, Xin Feng, Canmiao Fu, Qiong Cao, Guangming Lu, Yu-Wing Tai

    Abstract: The key challenge of sequence representation learning is to capture the long-range temporal dependencies. Typical methods for supervised sequence representation learning are built upon recurrent neural networks to capture temporal dependencies. One potential limitation of these methods is that they only model one-order information interactions explicitly between adjacent time steps in a sequence,…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: To appear in International Journal of Computer Vision (IJCV). arXiv admin note: substantial text overlap with arXiv:1908.09535

  26. arXiv:2207.08808  [pdf, other]

    cs.CV

    Global-Local Stepwise Generative Network for Ultra High-Resolution Image Restoration

    Authors: Xin Feng, Haobo Ji, Wenjie Pei, Fanglin Chen, Guangming Lu

    Abstract: While the research on image background restoration from regular size of degraded images has achieved remarkable progress, restoring ultra high-resolution (e.g., 4K) images remains an extremely challenging task due to the explosion of computational complexity and memory usage, as well as the deficiency of annotated data. In this paper we present a novel model for ultra high-resolution image restora…

    Submitted 17 May, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2207.07253  [pdf, other]

    cs.CV

    Single Shot Self-Reliant Scene Text Spotter by Decoupled yet Collaborative Detection and Recognition

    Authors: Jingjing Wu, Pengyuan Lyu, Guangming Lu, Chengquan Zhang, Wenjie Pei

    Abstract: Typical text spotters follow the two-stage spotting paradigm which detects the boundary for a text instance first and then performs text recognition within the detected regions. Despite the remarkable progress of such spotting paradigm, an important limitation is that the performance of text recognition depends heavily on the precision of text detection, resulting in the potential error propagatio…

    Submitted 7 February, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

  28. arXiv:2203.16092  [pdf, other]

    cs.CV

    Global Tracking via Ensemble of Local Trackers

    Authors: Zikun Zhou, Jianqiu Chen, Wenjie Pei, Kaige Mao, Hongpeng Wang, Zhenyu He

    Abstract: The crux of long-term tracking lies in the difficulty of tracking the target with discontinuous moving caused by out-of-view or occlusion. Existing long-term tracking methods follow two typical strategies. The first strategy employs a local tracker to perform smooth tracking and uses another re-detector to detect the target when the target is lost. While it can exploit the temporal context like hi…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: 10 pages; 6 figures; accepted to CVPR2022

  29. arXiv:2112.07224  [pdf, other]

    cs.CV

    Exploring Category-correlated Feature for Few-shot Image Classification

    Authors: Jing Xu, Xinglin Pan, Xu Luo, Wenjie Pei, Zenglin Xu

    Abstract: Few-shot classification aims to adapt classifiers to novel classes with a few training samples. However, the insufficiency of training data may cause a biased estimation of feature distribution in a certain class. To alleviate this problem, we present a simple yet effective feature rectification method by exploring the category correlation between novel and base classes as the prior knowledge. We…

    Submitted 14 December, 2021; originally announced December 2021.

    Comments: 10 pages, 9 figures

  30. arXiv:2112.06467  [pdf, other]

    cs.CV

    An Informative Tracking Benchmark

    Authors: Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

    Abstract: Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming. Thus, a small and informative benchmark, which covers all typical challenging scenarios to facilitate assessing the tracker performance, is of great interest. In this…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 10 pages, 6 figures

  31. arXiv:2112.02279  [pdf, other]

    cs.CV

    U2-Former: A Nested U-shaped Transformer for Image Restoration

    Authors: Haobo Ji, Xin Feng, Wenjie Pei, Jinxing Li, Guangming Lu

    Abstract: While Transformer has achieved remarkable performance in various high-level vision tasks, it is still challenging to exploit the full potential of Transformer in image restoration. The crux lies in the limited depth of applying Transformer in the typical encoder-decoder framework for image restoration, resulting from heavy self-attention computation load and inefficient communications across diffe…

    Submitted 8 December, 2021; v1 submitted 4 December, 2021; originally announced December 2021.

  32. arXiv:2111.08974  [pdf, other]

    cs.CV

    Pedestrian Detection by Exemplar-Guided Contrastive Learning

    Authors: Zebin Lin, Wenjie Pei, Fanglin Chen, David Zhang, Guangming Lu

    Abstract: Typical methods for pedestrian detection focus on either tackling mutual occlusions between crowded pedestrians, or dealing with the various scales of pedestrians. Detecting pedestrians with substantial appearance diversities such as different pedestrian silhouettes, different viewpoints or different dressing, remains a crucial challenge. Instead of learning each of these diverse pedestrian appear…

    Submitted 9 July, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

  33. arXiv:2111.04901  [pdf, other]

    cs.LG cs.CV

    Label-Aware Distribution Calibration for Long-tailed Classification

    Authors: Chaozheng Wang, Shuzheng Gao, Cuiyun Gao, Pengyun Wang, Wenjie Pei, Lujia Pan, Zenglin Xu

    Abstract: Real-world data usually present long-tailed distributions. Training on imbalanced data tends to render neural networks perform well on head classes while much worse on tail classes. The severe sparseness of training instances for the tail classes is the main challenge, which results in biased distribution estimation during training. Plenty of efforts have been devoted to ameliorating the challenge…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: 9 pages

  34. arXiv:2110.04791  [pdf, other]

    eess.AS cs.LG cs.SD

    Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain

    Authors: Zengwei Yao, Wenjie Pei, Fanglin Chen, Guangming Lu, David Zhang

    Abstract: The crux of single-channel speech separation is how to encode the mixture of signals into such a latent embedding space that the signals from different speakers can be precisely separated. Existing methods for speech separation either transform the speech signals into frequency domain to perform separation or seek to learn a separable embedding space by constructing a latent domain based on convol…

    Submitted 31 January, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  35. arXiv:2110.00261  [pdf, other]

    cs.CV

    Generative Memory-Guided Semantic Reasoning Model for Image Inpainting

    Authors: Xin Feng, Wenjie Pei, Fengjun Li, Fanglin Chen, David Zhang, Guangming Lu

    Abstract: Most existing methods for image inpainting focus on learning the intra-image priors from the known regions of the current input image to infer the content of the corrupted regions in the same image. While such methods perform well on images with small corrupted regions, it is challenging for these methods to deal with images with large corrupted area due to two potential limitations: 1) such metho…

    Submitted 20 March, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: 13 pages, 10 figures

  36. arXiv:2108.06720  [pdf, other]

    cs.CV

    Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders

    Authors: Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao

    Abstract: Generating conversational gestures from speech audio is challenging due to the inherent one-to-many mapping between audio and body motions. Conventional CNNs/RNNs assume one-to-one mapping, and thus tend to predict the average of all possible target motions, resulting in plain/boring motions during inference. In order to overcome this problem, we propose a novel conditional variational autoencoder…

    Submitted 15 August, 2021; originally announced August 2021.

  37. arXiv:2108.03637  [pdf, other]

    cs.CV

    Saliency-Associated Object Tracking

    Authors: Zikun Zhou, Wenjie Pei, Xin Li, Hongpeng Wang, Feng Zheng, Zhenyu He

    Abstract: Most existing trackers based on deep learning perform tracking in a holistic strategy, which aims to learn deep representations of the whole target for localizing the target. It is arduous for such methods to track targets with various appearance variations. To address this limitation, another type of methods adopts a part-based tracking strategy which divides the target into equal patches and tra…

    Submitted 8 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV 2021

  38. arXiv:2106.10900  [pdf, other]

    cs.CV

    Self-Supervised Tracking via Target-Aware Data Synthesis

    Authors: Xin Li, Wenjie Pei, Yaowei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

    Abstract: While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training. To eliminate expensive and exhaustive annotation, we study self-supervised learning for visual tracking. In this work, we develop the Crop-Transform-Paste operation, which is able to synthesize sufficient training data by simulating various…

    Submitted 30 December, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: 11 pages, 7 figures, Accepted by IEEE Transactions on Neural Networks and Learning Systems

  39. arXiv:2104.07303  [pdf, other]

    cs.CV

    SiamCorners: Siamese Corner Networks for Visual Tracking

    Authors: Kai Yang, Zhenyu He, Wenjie Pei, Zikun Zhou, Xin Li, Di Yuan, Haijun Zhang

    Abstract: The current Siamese network based on region proposal network (RPN) has attracted great attention in visual tracking due to its excellent accuracy and high efficiency. However, the design of the RPN involves the selection of the number, scale, and aspect ratios of anchor boxes, which will affect the applicability and convenience of the model. Furthermore, these anchor boxes require complicated calc…

    Submitted 15 April, 2021; originally announced April 2021.

  40. arXiv:2011.00490  [pdf, ps, other]

    cs.IT

    On the Distribution of SINR for Widely Linear MMSE MIMO Systems with Rectilinear or Quasi-Rectilinear Signals

    Authors: Wei Deng, Yili Xia, Zhe Li, Wenjiang Pei

    Abstract: Although the widely linear least mean square error (WLMMSE) receiver has been an appealing option for multiple-input-multiple-output (MIMO) wireless systems, a statistical understanding on its post-detection signal-to-interference-plus-noise ratio (SINR) in detail is still missing. To this end, we consider a WLMMSE MIMO transmission system with rectilinear or quasi-rectilinear (QR) signals over th…

    Submitted 29 November, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

  41. Deep-Masking Generative Network: A Unified Framework for Background Restoration from Superimposed Images

    Authors: Xin Feng, Wenjie Pei, Zihui Jia, Fanglin Chen, David Zhang, Guangming Lu

    Abstract: Restoring the clean background from the superimposed images containing a noisy layer is the common crux of a classical category of tasks on image restoration such as image reflection removal, image deraining and image dehazing. These tasks are typically formulated and tackled individually due to the diverse and complicated appearance patterns of noise layers within the image. In this work we prese…

    Submitted 12 April, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: 16 pages, accepted for publication in IEEE Transactions on Image Processing (TIP)

  42. A Full Second-Order Analysis of the Widely Linear MVDR Beamformer for Noncircular Signals

    Authors: Zhe Li, Rui Pu, Yili Xia, Wenjiang Pei, Danilo P. Mandic

    Abstract: A full performance analysis of the widely linear (WL) minimum variance distortionless response (MVDR) beamformer is introduced. While the WL MVDR is known to outperform its strictly linear counterpart, the Capon beamformer, for noncircular complex signals, the existing approaches provide limited physical insights, since they explicitly or implicitly omit the complementary second-order (SO) statist…

    Submitted 29 November, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

  43. arXiv:2007.12387  [pdf, other]

    cs.CV

    Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation

    Authors: Qi Fan, Lei Ke, Wenjie Pei, Chi-Keung Tang, Yu-Wing Tai

    Abstract: Partially supervised instance segmentation aims to perform learning on limited mask-annotated categories of data thus eliminating expensive and exhaustive mask annotation. The learned models are expected to be generalizable to novel categories. Existing methods either learn a transfer function from detection to segmentation, or cluster shape priors for segmenting novel categories. We propose to le…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: Accepted by ECCV 2020

  44. arXiv:2003.11228  [pdf, other]

    cs.CV

    ASFD: Automatic and Scalable Face Detector

    Authors: Bin Zhang, Jian Li, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Yili Xia, Wenjiang Pei, Rongrong Ji

    Abstract: In this paper, we propose a novel Automatic and Scalable Face Detector (ASFD), which is based on a combination of neural architecture search techniques as well as a new loss design. First, we propose an automatic feature enhance module named Auto-FEM by improved differential architecture search, which allows efficient multi-scale feature fusion and context enhancement. Second, we use Distance-base…

    Submitted 31 March, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

    Comments: Ranked No.1 on WIDER Face (http://shuoyang1213.me/WIDERFACE/WiderFace_Results.html)

  45. arXiv:1912.08562  [pdf, other]

    cs.CV

    CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

    Authors: Jiadong Liang, Wenjie Pei, Feng Lu

    Abstract: Typical methods for text-to-image synthesis seek to design effective generative architecture to model the text-to-image mapping directly. It is fairly arduous due to the cross-modality translation. In this paper we circumvent this problem by focusing on parsing the content of both the input text and the synthesized image thoroughly to model the text-to-image consistency in the semantic level. Part…

    Submitted 12 July, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: 18 pages, 8 figures

  46. arXiv:1909.00206  [pdf, other]

    cs.CV

    Push for Quantization: Deep Fisher Hashing

    Authors: Yunqiang Li, Wenjie Pei, Yufei Zha, Jan van Gemert

    Abstract: Current massive datasets demand light-weight access for analysis. Discrete hashing methods are thus beneficial because they map high-dimensional data to compact binary codes that are efficient to store and process, while preserving semantic similarity. To optimize powerful deep learning methods for image hashing, gradient-based methods are required. Binary codes, however, are discrete and thus hav…

    Submitted 31 August, 2019; originally announced September 2019.

    Comments: BMVC 2019

  47. arXiv:1908.11824  [pdf, other]

    cs.CV

    Reflective Decoding Network for Image Captioning

    Authors: Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai

    Abstract: State-of-the-art image captioning methods mostly focus on improving visual features, less attention has been paid to utilizing the inherent properties of language to boost captioning performance. In this paper, we show that vocabulary coherence between words and syntactic paradigm of sentences are also important to generate high-quality image caption. Following the conventional encoder-decoder fra…

    Submitted 30 August, 2019; originally announced August 2019.

    Comments: ICCV 2019

  48. arXiv:1908.10535  [pdf, other]

    cs.CV

    Push for Center Learning via Orthogonalization and Subspace Masking for Person Re-Identification

    Authors: Weinong Wang, Wenjie Pei, Qiong Cao, Shu Liu, Yu-Wing Tai

    Abstract: Person re-identification aims to identify whether pairs of images belong to the same person or not. This problem is challenging due to large differences in camera views, lighting and background. One of the mainstream in learning CNN features is to design loss functions which reinforce both the class separation and intra-class compactness. In this paper, we propose a novel Orthogonal Center Learnin…

    Submitted 11 December, 2019; v1 submitted 27 August, 2019; originally announced August 2019.

  49. arXiv:1908.09535  [pdf, other]

    cs.CV

    Non-local Recurrent Neural Memory for Supervised Sequence Modeling

    Authors: Canmiao Fu, Wenjie Pei, Qiong Cao, Chaopeng Zhang, Yong Zhao, Xiaoyong Shen, Yu-Wing Tai

    Abstract: Typical methods for supervised sequence modeling are built upon the recurrent neural networks to capture temporal dependencies. One potential limitation of these methods is that they only model explicitly information interactions between adjacent time steps in a sequence, hence the high-order interactions between nonadjacent time steps are not fully exploited. It greatly limits the capability of m…

    Submitted 26 August, 2019; originally announced August 2019.

    Comments: Accepted by ICCV 2019, Oral

  50. arXiv:1905.03966  [pdf, other]

    cs.CV

    Memory-Attended Recurrent Network for Video Captioning

    Authors: Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai

    Abstract: Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on one source video being processed. A potential disadvantage of such design is that it cannot capture the multiple visual context information of a word appearing in more than one relevant videos in training data. To tackle this limitation, we propose the Memory-Attended Recurrent Network (MARN) for…

    Submitted 10 May, 2019; originally announced May 2019.

    Comments: Accepted by CVPR 2019