
Showing 1–20 of 20 results for author: Zhe, X

  1. arXiv:2401.00374  [pdf, other]

    cs.CV

    EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

    Authors: Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black

    Abstract: We propose EMAGE, a framework to generate full-body human gestures from audio and masked gestures, encompassing facial, local body, hands, and global movements. To achieve this, we first introduce BEAT2 (BEAT-SMPLX-FLAME), a new mesh-level holistic co-speech dataset. BEAT2 combines a MoShed SMPL-X body with FLAME head parameters and further refines the modeling of head, neck, and finger movements,…

    Submitted 30 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: Fix typos; Conflict of Interest Disclosure; CVPR Camera Ready; Project Page: https://pantomatrix.github.io/EMAGE/

  2. arXiv:2303.08658  [pdf, other]

    cs.CV cs.GR

    Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry

    Authors: Jiaxu Zhang, Junwu Weng, Di Kang, Fang Zhao, Shaoli Huang, Xuefei Zhe, Linchao Bao, Ying Shan, Jue Wang, Zhigang Tu

    Abstract: A good motion retargeting cannot be reached without reasonable consideration of source-target differences on both the skeleton and shape geometry levels. In this work, we propose a novel Residual RETargeting network (R2ET) structure, which relies on two neural modification modules, to adjust the source motions to fit the target skeletons and shapes progressively. In particular, a skeleton-aware mo…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  3. arXiv:2301.06690  [pdf, other]

    cs.CV

    Audio2Gestures: Generating Diverse Gestures from Audio

    Authors: Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He

    Abstract: People may perform diverse gestures affected by various mental and physical factors when speaking the same sentences. This inherent one-to-many relationship makes co-speech gesture generation from audio particularly challenging. Conventional CNNs/RNNs assume one-to-one mapping, and thus tend to predict the average of all possible target motions, easily resulting in plain/boring motions during infe…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2108.06720

  4. arXiv:2301.06059  [pdf, other]

    cs.GR cs.CV

    Learning Audio-Driven Viseme Dynamics for 3D Face Animation

    Authors: Linchao Bao, Haoxian Zhang, Yue Qian, Tangli Xue, Changhai Chen, Xuefei Zhe, Di Kang

    Abstract: We present a novel audio-driven facial animation approach that can generate realistic lip-synchronized 3D facial animations from the input audio. Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech inputs. The core of our approach is a novel parametric viseme fitting algorithm that utilizes phoneme priors to extract vis…

    Submitted 15 January, 2023; originally announced January 2023.

    Comments: Project page: https://linchaobao.github.io/viseme2023/

  5. arXiv:2301.04258  [pdf, other]

    cs.CV

    CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder

    Authors: Ye Huang, Di Kang, Liang Chen, Wenjing Jia, Xiangjian He, Lixin Duan, Xuefei Zhe, Linchao Bao

    Abstract: Semantic segmentation has recently achieved notable advances by exploiting "class-level" contextual information during learning. However, these approaches simply concatenate class-level information to pixel features to boost the pixel representation learning, which cannot fully utilize intra-class and inter-class contextual information. Moreover, these approaches learn soft class centers based on…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: Tech report, text extended from arXiv:2203.07160

  6. arXiv:2209.13204  [pdf, other]

    cs.CV cs.GR

    NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System

    Authors: Weiqiang Wang, Xuefei Zhe, Qiuhong Ke, Di Kang, Tingguang Li, Ruizhi Chen, Linchao Bao

    Abstract: We present a neural network-based system for long-term, multi-action human motion synthesis. The system, dubbed NEURAL MARIONETTE, can produce high-quality and meaningful motions with smooth transitions from simple user input, including a sequence of action tags with expected action duration and, optionally, a hand-drawn moving trajectory if the user specifies one. The core of our system is a novel…

    Submitted 27 November, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

  7. arXiv:2208.11948  [pdf, other]

    cs.CV

    Learning to Construct 3D Building Wireframes from 3D Line Clouds

    Authors: Yicheng Luo, Jing Ren, Xuefei Zhe, Di Kang, Yajing Xu, Peter Wonka, Linchao Bao

    Abstract: Line clouds, though under-investigated in previous work, potentially encode more compact structural information of buildings than point clouds extracted from multi-view images. In this work, we propose the first network to process line clouds for building wireframe abstraction. The network takes a line cloud as input, i.e., a nonstructural and unordered set of 3D line segments extracted from…

    Submitted 4 November, 2022; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: 10 pages, 6 figures

  8. arXiv:2206.06715  [pdf, other]

    cs.CV

    Semi-signed prioritized neural fitting for surface reconstruction from unoriented point clouds

    Authors: Runsong Zhu, Di Kang, Ka-Hei Hui, Yue Qian, Xuefei Zhe, Zhen Dong, Linchao Bao, Pheng-Ann Heng, Chi-Wing Fu

    Abstract: Reconstructing 3D geometry from unoriented point clouds can benefit many downstream tasks. Recent shape modeling methods mostly adopt implicit neural representation to fit a signed distance field (SDF) and optimize the network by unsigned supervision. However, these methods occasionally have difficulty in finding the coarse shape for complicated objects, especially suffering from the…

    Submitted 14 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

  9. arXiv:2206.03128  [pdf, other]

    cs.LG cs.GR

    Spatial-Temporal Adaptive Graph Convolution with Attention Network for Traffic Forecasting

    Authors: Chen Weikang, Li Yawen, Xue Zhe, Li Ang, Wu Guobin

    Abstract: Traffic forecasting is a canonical example of a spatial-temporal learning task in intelligent traffic systems. Existing approaches capture spatial dependency with a pre-determined matrix in graph convolution neural operators. However, the explicit graph structure loses some hidden representations of relationships among nodes. Furthermore, traditional graph convolution neural operators cannot aggre…

    Submitted 7 June, 2022; originally announced June 2022.

  10. arXiv:2203.12917  [pdf, other]

    cs.CV cs.AI

    WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation

    Authors: Yingzhi Tang, Yue Qian, Qijian Zhang, Yiming Zeng, Junhui Hou, Xuefei Zhe

    Abstract: We propose WarpingGAN, an effective and efficient 3D point cloud generation network. Unlike existing methods that generate point clouds by directly learning the mapping functions between latent codes and 3D shapes, WarpingGAN learns a unified local-warping function to warp multiple identical pre-defined priors (i.e., sets of points uniformly distributed on regular 3D grids) into 3D shapes driven…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: This paper has been accepted by CVPR 2022

  11. arXiv:2203.09729  [pdf, other]

    cs.CV cs.GR

    REALY: Rethinking the Evaluation of 3D Face Reconstruction

    Authors: Zenghao Chai, Haoxian Zhang, Jing Ren, Di Kang, Zhengzhuo Xu, Xuefei Zhe, Chun Yuan, Linchao Bao

    Abstract: The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This poses difficulties for precisely diagnosing and improving a 3D face reconstruction method. In this paper, we propose a novel evaluati…

    Submitted 19 July, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022, camera-ready version; Project page: https://realy3dface.com; Code: https://github.com/czh-98/REALY

  12. arXiv:2203.07160  [pdf, other]

    cs.CV

    CAR: Class-aware Regularizations for Semantic Segmentation

    Authors: Ye Huang, Di Kang, Liang Chen, Xuefei Zhe, Wenjing Jia, Xiangjian He, Linchao Bao

    Abstract: Recent segmentation methods, such as OCR and CPNet, utilizing "class level" information in addition to pixel features, have achieved notable success for boosting the accuracy of existing network modules. However, the extracted class-level information was simply concatenated to pixel features, without explicitly being exploited for better pixel representation learning. Moreover, these approaches le…

    Submitted 14 July, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: ECCV 2022 camera ready. Codes and models are available at https://github.com/edwardyehuang/CAR

  13. arXiv:2108.06720  [pdf, other]

    cs.CV

    Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders

    Authors: Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao

    Abstract: Generating conversational gestures from speech audio is challenging due to the inherent one-to-many mapping between audio and body motions. Conventional CNNs/RNNs assume one-to-one mapping, and thus tend to predict the average of all possible target motions, resulting in plain/boring motions during inference. In order to overcome this problem, we propose a novel conditional variational autoencoder…

    Submitted 15 August, 2021; originally announced August 2021.

  14. arXiv:2107.00327  [pdf, other]

    cs.CV

    Orthonormal Product Quantization Network for Scalable Face Image Retrieval

    Authors: Ming Zhang, Xuefei Zhe, Hong Yan

    Abstract: Existing deep quantization methods provided an efficient solution for large-scale image retrieval. However, the significant intra-class variations in face images, such as pose, illumination, and expression, still pose a challenge for face image retrieval. In light of this, face image retrieval requires sufficiently powerful learning metrics, which are absent in current deep quantization works. Moreov…

    Submitted 12 May, 2023; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: Published in Pattern Recognition, supplementary material can be found in Github project page

  15. arXiv:2106.13629  [pdf, other]

    cs.CV

    Animatable Neural Radiance Fields from Monocular RGB Videos

    Authors: Jianchuan Chen, Ying Zhang, Di Kang, Xuefei Zhe, Linchao Bao, Xu Jia, Huchuan Lu

    Abstract: We present animatable neural radiance fields (animatable NeRF) for detailed human avatar creation from monocular videos. Our approach extends neural radiance fields (NeRF) to the dynamic scenes with human movements via introducing explicit pose-guided deformation while learning the scene representation network. In particular, we estimate the human pose for each frame and learn a constant canonical…

    Submitted 7 September, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: 12 pages, 12 figures

  16. arXiv:2103.11703  [pdf, other]

    cs.CV

    Model-based 3D Hand Reconstruction via Self-Supervised Learning

    Authors: Yujin Chen, Zhigang Tu, Di Kang, Linchao Bao, Ying Zhang, Xuefei Zhe, Ruizhi Chen, Junsong Yuan

    Abstract: Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D ha…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR21

  17. arXiv:2010.05562  [pdf, other]

    cs.CV cs.GR

    High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies

    Authors: Linchao Bao, Xiangkai Lin, Yajing Chen, Haoxian Zhang, Sheng Wang, Xuefei Zhe, Di Kang, Haozhi Huang, Xinwei Jiang, Jue Wang, Dong Yu, Zhengyou Zhang

    Abstract: We present a fully automatic system that can produce high-fidelity, photo-realistic 3D digital human heads with a consumer RGB-D selfie camera. The system only needs the user to take a short selfie RGB-D video while rotating his/her head, and can produce a high quality head reconstruction in less than 30 seconds. Our main contribution is a new facial geometry modeling and reflectance synthesis pro…

    Submitted 29 June, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: Code: https://github.com/tencent-ailab/hifi3dface

  18. arXiv:1901.11259  [pdf, other]

    cs.CV

    Semantic Hierarchy Preserving Deep Hashing for Large-scale Image Retrieval

    Authors: Ming Zhang, Xuefei Zhe, Le Ou-Yang, Shifeng Chen, Hong Yan

    Abstract: Deep hashing models have been proposed as an efficient method for large-scale similarity search. However, most existing deep hashing methods only utilize fine-level labels for training while ignoring the natural semantic hierarchy structure. This paper presents an effective method that preserves the classwise similarity of full-level semantic hierarchy for large-scale image retrieval. Experiments…

    Submitted 22 June, 2021; v1 submitted 31 January, 2019; originally announced January 2019.

  19. arXiv:1803.04137  [pdf, other]

    cs.CV

    Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss

    Authors: Xuefei Zhe, Shifeng Chen, Hong Yan

    Abstract: Deep supervised hashing has emerged as an influential solution to large-scale semantic image retrieval problems in computer vision. In the light of recent progress, convolutional neural network based hashing methods typically seek pair-wise or triplet labels to conduct the similarity preserving learning. However, complex semantic concepts of visual contents are hard to capture by similar/dissimila…

    Submitted 12 March, 2018; originally announced March 2018.

  20. arXiv:1802.09662  [pdf, other]

    cs.CV

    Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval

    Authors: Xuefei Zhe, Shifeng Chen, Hong Yan

    Abstract: Deep distance metric learning (DDML), which is proposed to learn image similarity metrics in an end-to-end manner based on the convolution neural network, has achieved encouraging results in many computer vision tasks. $L2$-normalization in the embedding space has been used to improve the performance of several DDML methods. However, the commonly used Euclidean distance is no longer an accurate met…

    Submitted 27 March, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

    Comments: codes will come soon