Showing 1–50 of 94 results for author: Wen, F

Searching in archive cs.
  1. arXiv:2406.08148  [pdf, other]

    cs.LG cs.AI

    Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker–Planck Equation

    Authors: Shuyu Yin, Fei Wen, Peilin Liu, Tao Luo

    Abstract: Semi-gradient Q-learning is applied in many fields, but due to the absence of an explicit loss function, studying its dynamics and implicit bias in the parameter space is challenging. This paper introduces the Fokker–Planck equation and employs partial data obtained through sampling to construct and visualize the effective loss landscape within a two-dimensional parameter space. This visualizatio…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2405.16265  [pdf, other]

    cs.LG

    MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

    Authors: Jikun Kang, Xin Zhe Li, Xi Chen, Amirreza Kazemi, Qianyi Sun, Boxing Chen, Dong Li, Xu He, Quan He, Feng Wen, Jianye Hao, Jun Yao

    Abstract: Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datase…

    Submitted 26 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  3. arXiv:2403.08108  [pdf, other]

    cs.CV

    TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection

    Authors: Hanning Chen, Wenjun Huang, Yang Ni, Sanggeon Yun, Fei Wen, Hugo Latapie, Mohsen Imani

    Abstract: Task-oriented object detection aims to find objects suitable for accomplishing specific tasks. As a challenging task, it requires simultaneous visual data processing and reasoning under ambiguous semantics. Recent solutions are mainly all-in-one models. However, the object detection backbones are pre-trained without text supervision. Thus, to incorporate task requirements, their intricate models u…

    Submitted 12 March, 2024; originally announced March 2024.

  4. arXiv:2403.05763  [pdf, other]

    cs.AR cs.AI cs.LG

    HDReason: Algorithm-Hardware Codesign for Hyperdimensional Knowledge Graph Reasoning

    Authors: Hanning Chen, Yang Ni, Ali Zakeri, Zhuowen Zou, Sanggeon Yun, Fei Wen, Behnam Khaleghi, Narayan Srinivasa, Hugo Latapie, Mohsen Imani

    Abstract: In recent times, a plethora of hardware accelerators have been put forth for graph learning applications such as vertex classification and graph classification. However, previous works have paid little attention to Knowledge Graph Completion (KGC), a task that is well-known for its significantly higher algorithm complexity. The state-of-the-art KGC solutions based on graph convolution neural netwo…

    Submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2402.16899  [pdf, other]

    cs.LG cs.AI

    A priori Estimates for Deep Residual Network in Continuous-time Reinforcement Learning

    Authors: Shuyu Yin, Qixuan Zhou, Fei Wen, Tao Luo

    Abstract: Deep reinforcement learning excels in numerous large-scale practical applications. However, existing performance analyses ignore the unique characteristics of continuous-time control problems, are unable to directly estimate the generalization error of the Bellman optimal loss, and require a boundedness assumption. Our work focuses on continuous-time control problems and proposes a method that is a…

    Submitted 7 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  6. arXiv:2402.08207  [pdf, other]

    cs.CV

    Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

    Authors: Jiachen Lu, Renyuan Peng, Xinyue Cai, Hang Xu, Hongyang Li, Feng Wen, Wei Zhang, Li Zhang

    Abstract: The extraction of road networks is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road networks poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Exi…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: ICCV 2023 Oral Presentation

  7. arXiv:2401.17609  [pdf, other]

    cs.CV

    LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

    Authors: Renyuan Peng, Xinyue Cai, Hang Xu, Jiachen Lu, Feng Wen, Wei Zhang, Li Zhang

    Abstract: Understanding road structures is crucial for autonomous driving. Intricate road structures are often depicted using lane graphs, which include centerline curves and connections forming a Directed Acyclic Graph (DAG). Accurate extraction of lane graphs relies on precisely estimating vertex and edge information within the DAG. Recent research highlights Transformer-based language models' impressive…

    Submitted 19 February, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: AAAI 2024

  8. arXiv:2401.15987  [pdf, other]

    cs.CV

    Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling

    Authors: Yuze Hao, Jianrong Zhang, Tao Zhuo, Fuan Wen, Hehe Fan

    Abstract: Hands are the main medium when people interact with the world. Generating proper 3D motion for hand-object interaction is vital for applications such as virtual reality and robotics. Although grasp tracking or object manipulation synthesis can produce coarse hand motion, this kind of motion is inevitably noisy and full of jitter. To address this problem, we propose a data-driven method for coarse…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  9. arXiv:2401.07218  [pdf, other]

    cs.CV

    Self-supervised Event-based Monocular Depth Estimation using Cross-modal Consistency

    Authors: Junyu Zhu, Lina Liu, Bofeng Jiang, Feng Wen, Hongbo Zhang, Wanlong Li, Yong Liu

    Abstract: An event camera is a novel vision sensor that can capture per-pixel brightness changes and output a stream of asynchronous "events". It has advantages over conventional cameras in those scenes with high-speed motions and challenging lighting conditions because of the high temporal resolution, high dynamic range, low bandwidth, low power consumption, and no motion blur. Therefore, several supervi…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: Accepted by IROS2023

  10. arXiv:2311.13361  [pdf, other]

    cs.AI cs.HC eess.SY

    Applying Large Language Models to Power Systems: Potential Security Threats

    Authors: Jiaqi Ruan, Gaoqi Liang, Huan Zhao, Guolong Liu, Xianzhuo Sun, Jing Qiu, Zhao Xu, Fushuan Wen, Zhao Yang Dong

    Abstract: Applying large language models (LLMs) to modern power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this article analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and d…

    Submitted 24 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

  11. arXiv:2310.11326  [pdf, other]

    cs.IT eess.SP

    Integrated Sensing and Channel Estimation by Exploiting Dual Timescales for Delay-Doppler Alignment Modulation

    Authors: Zhiqiang Xiao, Yong Zeng, Fuxi Wen, Zaichen Zhang, Derrick Wing Kwan Ng

    Abstract: For integrated sensing and communication (ISAC) systems, the channel information essential for communication and sensing tasks fluctuates across different timescales. Specifically, wireless sensing primarily focuses on acquiring path state information (PSI) (e.g., delay, angle, and Doppler) of individual multi-path components to sense the environment, which usually evolves much more slowly than th…

    Submitted 17 October, 2023; originally announced October 2023.

  12. arXiv:2309.04132  [pdf, other]

    cs.SD eess.AS

    A Two-Stage Training Framework for Joint Speech Compression and Enhancement

    Authors: Jiayi Huang, Zeyu Yan, Wenbin Jiang, Fei Wen

    Abstract: This paper considers the joint compression and enhancement problem for speech signals in the presence of noise. Recently, the SoundStream codec, which relies on end-to-end joint training of an encoder-decoder pair and a residual vector quantizer by a combination of adversarial and reconstruction losses, has shown very promising performance, especially in subjective perception quality. In this work,…

    Submitted 8 September, 2023; originally announced September 2023.

  13. arXiv:2308.14525  [pdf, other]

    cs.CV

    Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

    Authors: Junyu Zhu, Lina Liu, Yu Tang, Feng Wen, Wanlong Li, Yong Liu

    Abstract: Visual bird's eye view (BEV) semantic segmentation helps autonomous vehicles understand the surrounding environment only from images, including static elements (e.g., roads) and dynamic elements (e.g., vehicles, pedestrians). However, the high annotation cost of fully supervised methods limits the capability of visual BEV semantic segmentation, which usually needs HD maps, 3D obje…

    Submitted 26 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by ICRA2024

  14. arXiv:2305.00273  [pdf, other]

    cs.CV eess.IV

    Sparsity-Aware Optimal Transport for Unsupervised Restoration Learning

    Authors: Fei Wen, Wei Wang, Wenxian Yu

    Abstract: Recent studies show that, without any prior model, the unsupervised restoration learning problem can be optimally formulated as an optimal transport (OT) problem, which has shown promising performance on denoising tasks to approach the performance of supervised methods. However, it still significantly lags behind state-of-the-art supervised methods on complex restoration tasks such as super-resolu…

    Submitted 29 April, 2023; originally announced May 2023.

    Comments: 15 pages, 9 figures

  15. arXiv:2304.12033  [pdf, other]

    cs.RO cs.MA

    A Spatial Calibration Method for Robust Cooperative Perception

    Authors: Zhiying Song, Tenghui Xie, Hailiang Zhang, Jiaxin Liu, Fuxi Wen, Jun Li

    Abstract: Cooperative perception is a promising technique for intelligent and connected vehicles through vehicle-to-everything (V2X) cooperation, provided that accurate pose information and relative pose transforms are available. Nevertheless, obtaining precise positioning information often entails high costs associated with navigation systems. Hence, it is required to calibrate relative pose information f…

    Submitted 22 February, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

  16. arXiv:2304.10440  [pdf, other]

    cs.CV

    OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

    Authors: Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, Feng Wen, Hang Xu, Ping Luo, Junchi Yan, Wei Zhang, Hongyang Li

    Abstract: Accurately depicting the complex traffic scene is a vital component for autonomous vehicles to execute correct judgments. However, existing benchmarks tend to oversimplify the scene by solely focusing on lane perception tasks. Observing that human drivers rely on both lanes and traffic signals to operate their vehicles safely, we present OpenLane-V2, the first dataset on topology reasoning for tra…

    Submitted 28 October, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Accepted by NeurIPS 2023 Track on Datasets and Benchmarks | OpenLane-V2 Dataset: https://github.com/OpenDriveLab/OpenLane-V2

  17. arXiv:2304.05146  [pdf, other]

    cs.CV cs.RO

    Loop Closure Detection Based on Object-level Spatial Layout and Semantic Consistency

    Authors: Xingwu Ji, Peilin Liu, Haochen Niu, Xiang Chen, Rendong Ying, Fei Wen

    Abstract: Visual simultaneous localization and mapping (SLAM) systems face challenges in detecting loop closure under the circumstance of large viewpoint changes. In this paper, we present an object-based loop closure detection method based on the spatial layout and semantic consistency of the 3D scene graph. Firstly, we propose an object-level data association approach based on the semantic information from…

    Submitted 14 April, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

  18. HybridPoint: Point Cloud Registration Based on Hybrid Point Sampling and Matching

    Authors: Yiheng Li, Canhui Tang, Runzhao Yao, Aixue Ye, Feng Wen, Shaoyi Du

    Abstract: Patch-to-point matching has become a robust way of point cloud registration. However, previous patch-matching methods employ superpoints with poor localization precision as nodes, which may lead to ambiguous patch partitions. In this paper, we propose a HybridPoint-based network to find more robust and accurate correspondences. Firstly, we propose to use salient points with prominent local feature…

    Submitted 23 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME), 2023

  19. arXiv:2301.08414  [pdf, other]

    cs.CV cs.AI

    FG-Depth: Flow-Guided Unsupervised Monocular Depth Estimation

    Authors: Junyu Zhu, Lina Liu, Yong Liu, Wanlong Li, Feng Wen, Hongbo Zhang

    Abstract: The great potential of unsupervised monocular depth estimation has been demonstrated by many works due to low annotation cost and impressive accuracy comparable to supervised methods. To further improve the performance, recent works mainly focus on designing more complex network structures and exploiting extra supervised information, e.g., semantic segmentation. These methods optimize the models b…

    Submitted 7 February, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: Accepted by ICRA2023

  20. arXiv:2212.08062  [pdf, other]

    cs.CV

    MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

    Authors: Bowen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen

    Abstract: In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects. First, as opposed to interpolating from sparse flow, we claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields. Second, inspired by face-swapping methods, we adaptively fuse the source identity during synthesis, so that the network better pre…

    Submitted 26 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: CVPR 2023, project page: https://meta-portrait.github.io

  21. arXiv:2212.06138  [pdf, other]

    cs.CV cs.LG

    CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

    Authors: Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Shuyang Gu, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

    Abstract: Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory. In this paper, we identify that fine-tuning performance is significantly impacted by hyper-parameter choices. We examine various key hyper-parameters and empirically evaluate their impact in fine-tuning CLIP for classification tasks through a…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Technical Report, code will be available at https://github.com/LightDXY/FT-CLIP

  22. arXiv:2212.06135  [pdf, other]

    cs.CV

    Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

    Authors: Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo

    Abstract: This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields. A significant challenge in generating such avatars is that the memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars. To tackle this problem we propose the roll-out diffusion network (Ro…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Project Webpage: https://3d-avatar-diffusion.microsoft.com/

  23. arXiv:2212.03863  [pdf, other]

    cs.CV cs.LG

    X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

    Authors: Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

    Abstract: Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts the segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous wor…

    Submitted 31 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: ICML 2023, code is available at https://github.com/yoctta/XPaste

  24. arXiv:2211.13227  [pdf, other]

    cs.CV

    Paint by Example: Exemplar-based Image Editing with Diffusion Models

    Authors: Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen

    Abstract: Language-guided image editing has achieved great success recently. In this paper, for the first time, we investigate exemplar-guided image editing for more precise control. We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar. However, the naive approach will cause obvious fusing artifacts. We carefully analyze it and propose…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Code: https://github.com/Fantasy-Studio/Paint-by-Example

  25. arXiv:2210.06289  [pdf, other]

    cs.MA cs.CV cs.RO

    A Cooperative Perception System Robust to Localization Errors

    Authors: Zhiying Song, Fuxi Wen, Hailiang Zhang, Jun Li

    Abstract: Cooperative perception is challenging for safety-critical autonomous driving applications. The errors in the shared position and pose cause an inaccurate relative transform estimation and disrupt the robust mapping of the Ego vehicle. We propose a distributed object-level cooperative perception system called OptiMatch, in which the detected 3D bounding boxes and local state information are shared b…

    Submitted 25 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE IV 2023

  26. arXiv:2209.05434  [pdf, other]

    cs.CV

    3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation

    Authors: Junshu Tang, Bo Zhang, Binxin Yang, Ting Zhang, Dong Chen, Lizhuang Ma, Fang Wen

    Abstract: In contrast to the traditional avatar creation pipeline which is a costly process, contemporary generative approaches directly learn the data distribution from photographs. While plenty of works extend unconditional generative models and achieve some levels of controllability, it is still challenging to ensure multi-view consistency, especially in large poses. In this work, we propose a network th…

    Submitted 17 October, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: Project webpage: https://junshutang.github.io/control/index.html

  27. arXiv:2208.12262  [pdf, other]

    cs.CV

    MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

    Authors: Xiaoyi Dong, Jianmin Bao, Yinglin Zheng, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

    Abstract: This paper presents a simple yet effective framework MaskCLIP, which incorporates a newly proposed masked self-distillation into contrastive language-image pretraining. The core idea of masked self-distillation is to distill representation from a full image to the representation predicted from a masked image. Such incorporation enjoys two vital benefits. First, masked self-distillation targets loc…

    Submitted 9 April, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: CVPR 2023, code is available at https://github.com/LightDXY/MaskCLIP

  28. arXiv:2207.07116  [pdf, other]

    cs.CV cs.LG

    Bootstrapped Masked Autoencoders for Vision BERT Pretraining

    Authors: Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

    Abstract: We propose bootstrapped masked autoencoders (BootMAE), a new approach for vision BERT pretraining. BootMAE improves the original masked autoencoders (MAE) with two core designs: 1) momentum encoder that provides online feature as extra BERT prediction targets; 2) target-aware decoder that tries to reduce the pressure on the encoder to memorize target-specific information in BERT pretraining. The f…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: ECCV 2022, code is available at https://github.com/LightDXY/BootMAE

  29. arXiv:2206.10082  [pdf, other]

    cs.CV eess.IV

    Optimally Controllable Perceptual Lossy Compression

    Authors: Zeyu Yan, Fei Wen, Peilin Liu

    Abstract: Recent studies in lossy compression show that distortion and perceptual quality are at odds with each other, which puts forward the tradeoff between distortion and perception (D-P). Intuitively, to attain different perceptual quality, different decoders have to be trained. In this paper, we present a nontrivial finding that only two decoders are sufficient for optimally achieving arbitrary (an infi…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: ICML 2022

  30. arXiv:2205.16007  [pdf, other]

    cs.CV

    Improved Vector Quantized Diffusion Models

    Authors: Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

    Abstract: Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image synthesis, but sometimes can still generate low-quality samples or weakly correlated images with text input. We find these issues are mainly due to the flawed sampling strategy. In this paper, we propose two important techniques to further improve the sample quality of VQ-Diffusion. 1) We explore classifier-…

    Submitted 8 February, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: update reference

  31. arXiv:2205.12952  [pdf, other]

    cs.CV

    Pretraining is All You Need for Image-to-Image Translation

    Authors: Tengfei Wang, Ting Zhang, Bo Zhang, Hao Ouyang, Dong Chen, Qifeng Chen, Fang Wen

    Abstract: We propose to use pretraining to boost general image-to-image translation. Prior image-to-image translation methods usually need dedicated architectural design and train individual translation models from scratch, struggling for high-quality generation of complex scenes, especially when paired training data are not abundant. In this paper, we regard each image-to-image translation problem as a dow…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: Project Page: https://tengfei-wang.github.io/PITI/index.html

  32. arXiv:2204.11820  [pdf, other]

    cs.CV

    Real-Time Neural Character Rendering with Pose-Guided Multiplane Images

    Authors: Hao Ouyang, Bo Zhang, Pan Zhang, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen, Fang Wen

    Abstract: We propose pose-guided multiplane image (MPI) synthesis which can render an animatable character in real scenes with photorealistic quality. We use a portable camera rig to capture the multi-view images along with the driving signal for the moving subject. Our method generalizes the image-to-image translation paradigm, which translates the human pose to a 3D scene representation -- MPIs that can b…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: Project website: https://ken-ouyang.github.io/cmpi/index.html

  33. arXiv:2203.16533  [pdf, other]

    cs.CV

    Large-Scale Pre-training for Person Re-identification with Noisy Labels

    Authors: Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen

    Abstract: This paper aims to address the problem of pre-training for person re-identification (Re-ID) with noisy labels. To set up the pre-training task, we apply a simple online multi-object tracking system on raw videos of an existing unlabeled Re-ID dataset "LUPerson" and build the Noisy Labeled variant called "LUPerson-NL". Since these ID labels automatically derived from tracklets inevitably contain noi…

    Submitted 4 April, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: To Appear at CVPR 2022, code is available at https://github.com/DengpanFu/LUPerson-NL

  34. arXiv:2203.15241  [pdf, other]

    cs.CV

    Semi-Supervised Image-to-Image Translation using Latent Space Mapping

    Authors: Pan Zhang, Jianmin Bao, Ting Zhang, Dong Chen, Fang Wen

    Abstract: Recent image-to-image translation works have been transferred from supervised to unsupervised settings due to the expensive cost of capturing or labeling large amounts of paired data. However, current unsupervised methods using the cycle-consistency constraint may not find the desired mapping, especially for difficult translation tasks. On the other hand, a small number of paired data are usually…

    Submitted 29 March, 2022; originally announced March 2022.

  35. arXiv:2203.01318  [pdf, other]

    cs.CV

    Protecting Celebrities from DeepFake with Identity Consistency Transformer

    Authors: Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Ting Zhang, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

    Abstract: In this work we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detects a suspect face by finding identity inconsistency in inner and outer face regions. The Identity Consistency Transformer incorporates a consistency loss for identity consistency determination. We show that Identity Cons…

    Submitted 5 April, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: To Appear at CVPR 2022, code is available at https://github.com/LightDXY/ICT_DeepFake

  36. arXiv:2112.10762  [pdf, other]

    cs.CV

    StyleSwin: Transformer-based GAN for High-resolution Image Generation

    Authors: Bowen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, Baining Guo

    Abstract: Despite the tantalizing success in a broad range of vision tasks, transformers have not yet demonstrated on-par ability as ConvNets in high-resolution image generative modeling. In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. To this end, we believe that local attention is crucial to strike the balance between compu…

    Submitted 20 July, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: CVPR 2022

  37. arXiv:2112.07368  [pdf, other]

    cs.LG cs.AI cs.CV

    Simple and Robust Loss Design for Multi-Label Learning with Missing Labels

    Authors: Youcai Zhang, Yuhao Cheng, Xinyu Huang, Fei Wen, Rui Feng, Yaqian Li, Yandong Guo

    Abstract: Multi-label learning in the presence of missing labels (MLML) is a challenging problem. Existing methods mainly focus on the design of network structures or training schemes, which increase the complexity of implementation. This work seeks to fulfill the potential of loss function in MLML without increasing the procedure and complexity. Toward this end, we propose two simple yet effective methods…

    Submitted 27 December, 2021; v1 submitted 13 December, 2021; originally announced December 2021.

  38. arXiv:2112.03109  [pdf, other]

    cs.CV cs.CL

    General Facial Representation Learning in a Visual-Linguistic Manner

    Authors: Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen

    Abstract: How to learn a universal facial representation that boosts all face analysis tasks? This paper takes one step toward this goal. In this paper, we study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner. On one hand, the framework involves a contrastive loss to learn…

    Submitted 1 April, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: CVPR2022 Oral; 16 pages, 6 figures, 14 tables

  39. arXiv:2111.14822  [pdf, other]

    cs.CV cs.LG

    Vector Quantized Diffusion Model for Text-to-Image Synthesis

    Authors: Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

    Abstract: We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not…

    Submitted 3 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

  40. arXiv:2111.12710  [pdf, other]

    cs.CV cs.LG

    PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

    Authors: Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu, Baining Guo

    Abstract: This paper explores a better prediction target for BERT pre-training of vision transformers. We observe that current prediction targets disagree with human perception judgment. This contradiction motivates us to learn a perceptual prediction target. We argue that perceptually similar images should stay close to each other in the prediction target space. We surprisingly find one simple yet effective…

    Submitted 7 December, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: To appear at AAAI 2023

  41. arXiv:2109.11453  [pdf, other]

    cs.CV

    Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds

    Authors: Xuemeng Yang, Hao Zou, Xin Kong, Tianxin Huang, Yong Liu, Wanlong Li, Feng Wen, Hongbo Zhang

    Abstract: Outdoor scene completion is a challenging issue in 3D scene understanding, which plays an important role in intelligent robotics and autonomous driving. Due to the sparsity of LiDAR acquisition, it is far more complex for 3D scene completion and semantic segmentation. Since semantic features can provide constraints and semantic priors for completion tasks, the relationship between them is worth ex…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: 8 pages, 4 figures, accepted by The 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems

  42. arXiv:2108.06693  [pdf, other

    cs.CV

    Exploring Temporal Coherence for More General Video Face Forgery Detection

    Authors: Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, Fang Wen

    Abstract: Although current face manipulation techniques achieve impressive performance regarding quality and controllability, they are struggling to generate temporal coherent face videos. In this work, we explore to take full advantage of the temporal coherence for video face forgery detection. To achieve this, we propose a novel end-to-end framework, which consists of two major stages. The first stage is…

    Submitted 15 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV 2021

  43. arXiv:2108.06337  [pdf, other

    cs.CV

    Dual Path Learning for Domain Adaptation of Semantic Segmentation

    Authors: Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Fang Wen, Wenqiang Zhang

    Abstract: Domain adaptation for semantic segmentation enables to alleviate the need for large-scale pixel-wise annotations. Recently, self-supervised learning (SSL) with a combination of image-to-image translation shows great effectiveness in adaptive segmentation. The most common practice is to perform SSL along with image translation to well align a single domain (the source or target). However, in this s…

    Submitted 13 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV 2021

  44. arXiv:2108.02574  [pdf, other

    eess.IV cs.CV cs.LG

    Optimal Transport for Unsupervised Denoising Learning

    Authors: Wei Wang, Fei Wen, Zeyu Yan, Peilin Liu

    Abstract: Recently, much progress has been made in unsupervised denoising learning. However, existing methods more or less rely on some assumptions on the signal and/or degradation model, which limits their practical performance. How to construct an optimal criterion for unsupervised denoising learning without any prior knowledge on the degradation model is still an open question. Toward answering this ques…

    Submitted 28 April, 2022; v1 submitted 3 August, 2021; originally announced August 2021.

    Comments: Published in IEEE TPAMI, DOI: 10.1109/TPAMI.2022.3170155, https://ieeexplore.ieee.org/document/9763342 (40 pages, 33 figures)

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2022) 1-1

  45. arXiv:2106.11516  [pdf, other

    cs.RO cs.AI cs.CV

    SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure

    Authors: Lin Li, Xin Kong, Xiangrui Zhao, Wanlong Li, Feng Wen, Hongbo Zhang, Yong Liu

    Abstract: LiDAR-based SLAM system is admittedly more accurate and stable than others, while its loop closure detection is still an open issue. With the development of 3D semantic segmentation for point cloud, semantic information can be obtained conveniently and steadily, essential for high-level intelligence and conducive to SLAM. In this paper, we present a novel semantic-aided LiDAR SLAM with loop closu…

    Submitted 1 July, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: 8 pages. Accepted by ICRA-2021

  46. arXiv:2106.02782  [pdf, other

    cs.IT cs.AI

    On Perceptual Lossy Compression: The Cost of Perceptual Reconstruction and An Optimal Training Framework

    Authors: Zeyu Yan, Fei Wen, Rendong Ying, Chao Ma, Peilin Liu

    Abstract: Lossy compression algorithms are typically designed to achieve the lowest possible distortion at a given bit rate. However, recent studies show that pursuing high perceptual quality would lead to increase of the lowest achievable distortion (e.g., MSE). This paper provides nontrivial results theoretically revealing that, 1) the cost of achieving perfect perception quality is exactly a dou…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: ICML 2021

    Report number: Accepted by ICML 2021

  47. arXiv:2106.00609  [pdf, other

    cs.CV

    Robust Mutual Learning for Semi-supervised Semantic Segmentation

    Authors: Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Fang Wen

    Abstract: Recent semi-supervised learning (SSL) methods are commonly based on pseudo labeling. Since the SSL performance is greatly influenced by the quality of pseudo labels, mutual learning has been proposed to effectively suppress the noises in the pseudo supervision. In this work, we propose robust mutual learning that improves the prior approach in two aspects. First, the vanilla mutual learners suffer…

    Submitted 26 December, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

  48. arXiv:2105.11085  [pdf

    cs.LG cs.DC

    Fed-NILM: A Federated Learning-based Non-Intrusive Load Monitoring Method for Privacy-Protection

    Authors: Hai** Wang, Caomingzhe Si, Junhua Zhao, Guolong Liu, Fushuan Wen

    Abstract: Non-intrusive load monitoring (NILM) is essential for understanding customer's power consumption patterns and may find wide applications like carbon emission reduction and energy conservation. The training of NILM models requires massive load data containing different types of appliances. However, inadequate load data and the risk of power consumer privacy breaches may be encountered by local data…

    Submitted 25 June, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

  49. arXiv:2103.15814  [pdf, other

    cs.CV

    High-Fidelity and Arbitrary Face Editing

    Authors: Yue Gao, Fangyun Wei, Jianmin Bao, Shuyang Gu, Dong Chen, Fang Wen, Zhouhui Lian

    Abstract: Cycle consistency is widely used for face editing. However, we observe that the generator tends to find a tricky way to hide information from the original image to satisfy the constraint of cycle consistency, making it impossible to maintain the rich details (e.g., wrinkles and moles) of non-editing areas. In this work, we propose a simple yet effective method named HifaFace to address the above-m…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  50. arXiv:2103.04586  [pdf, other

    cs.SE

    Siri, Write the Next Method

    Authors: Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza, Gabriele Bavota

    Abstract: Code completion is one of the killer features of Integrated Development Environments (IDEs), and researchers have proposed different methods to improve its accuracy. While these techniques are valuable to speed up code writing, they are limited to recommendations related to the next few tokens a developer is likely to type given the current context. In the best case, they can recommend a few APIs…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering (ICSE 2021)