Showing 1–50 of 81 results for author: Bao, L

Searching in archive cs.
  1. arXiv:2406.15859  [pdf, other]

    cs.IR cs.AI

    LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning

    Authors: Guangsi Shi, Xiaofeng Deng, Linhao Luo, Lijuan Xia, Lei Bao, Bei Ye, Fei Du, Shirui Pan, Yuxiao Li

    Abstract: Recommender systems are pivotal in enhancing user experiences across various web applications by analyzing the complicated relationships between users and items. Knowledge graphs (KGs) have been widely used to enhance the performance of recommender systems. However, KGs are known to be noisy and incomplete, which are hard to provide reliable explanations for recommendation results. An explainable r…

    Submitted 29 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  3. arXiv:2405.15705  [pdf, other]

    cs.AR eess.SY

    Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates

    Authors: Jinbo Peng, Zhe Chen, Zheng Lin, Haoxuan Yuan, Zihan Fang, Lingzhong Bao, Zihang Song, Ying Li, Jing Ren, Yue Gao

    Abstract: Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals distribute in a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer for extracting physical layer information, such as structure of packet, with low sampling rate (especially, sub-Nyquist…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures

  4. Quality-aware Selective Fusion Network for V-D-T Salient Object Detection

    Authors: Liuxin Bao, Xiaofei Zhou, Xiankai Lu, Yaoqi Sun, Haibing Yin, Zhenghui Hu, Jiyong Zhang, Chenggang Yan

    Abstract: Depth images and thermal images contain the spatial geometry information and surface temperature information, which can act as complementary information for the RGB modality. However, the quality of the depth and thermal images is often unreliable in some challenging scenarios, which will result in the performance degradation of the two-modal based salient object detection (SOD). Meanwhile, some r…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Image Processing (TIP)

  5. Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning

    Authors: Yubo Mai, Zhipeng Gao, Xing Hu, Lingfeng Bao, Yu Liu, Jianling Sun

    Abstract: Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manual crafting rules and can be easily deployed on personal computers without relying on other external tools. Sp…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2404.19026  [pdf, other]

    cs.CV

    MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

    Authors: Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, Song-Hai Zhang

    Abstract: Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gau…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Project page: https://conallwang.github.io/MeGA_Pages/

  7. arXiv:2404.17070  [pdf, other]

    cs.RO

    Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey

    Authors: Lingfan Bao, Joseph Humphreys, Tianhu Peng, Chengxu Zhou

    Abstract: Bipedal robots are garnering increasing global attention due to their potential applications and advancements in artificial intelligence, particularly in Deep Reinforcement Learning (DRL). While DRL has driven significant progress in bipedal locomotion, developing a comprehensive and unified framework capable of adeptly performing a wide range of tasks remains a challenge. This survey systematical…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures

  8. arXiv:2403.09601  [pdf, other]

    cs.NI eess.SY

    Network-Controlled Repeater -- An Introduction

    Authors: Fco. Italo G. Carvalho, Raul Victor de O. Paiva, Tarcisio F. Maciel, Victor F. Monteiro, Fco. Rafael M. Lima, Darlan C. Moreira, Diego A. Sousa, Behrooz Makki, Magnus Astrom, Lei Bao

    Abstract: In fifth generation (5G) wireless cellular networks, millimeter wave spectrum opens room for several potential improvements in throughput, reliability, latency, among other aspects. However, it also brings challenges, such as a higher influence of blockage which may significantly limit the coverage. In this context, network-controlled repeaters (NCRs) are network nodes with low complexity that rep…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE Communications Standards Magazine

  9. arXiv:2401.01078  [pdf, other]

    cs.CL cs.AI

    Vietnamese Poem Generation & The Prospect Of Cross-Language Poem-To-Poem Translation

    Authors: Triet Minh Huynh, Quan Le Bao

    Abstract: Poetry generation has been a challenging task in the field of Natural Language Processing, as it requires the model to understand the nuances of language, sentiment, and style. In this paper, we propose using Large Language Models to generate Vietnamese poems of various genres from natural language prompts, thereby facilitating an intuitive process with enhanced content control. Our most efficacio…

    Submitted 4 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  10. arXiv:2307.05000  [pdf, other]

    cs.CV

    Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar

    Authors: Cong Wang, Di Kang, Yan-Pei Cao, Linchao Bao, Ying Shan, Song-Hai Zhang

    Abstract: Rendering photorealistic and dynamically moving human heads is crucial for ensuring a pleasant and immersive experience in AR/VR and video conferencing applications. However, existing methods often struggle to model challenging facial regions (e.g., mouth interior, eyes, hair/beard), resulting in unrealistic and blurry results. In this paper, we propose {\fullname} ({\name}), a method that adopts…

    Submitted 13 October, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: Accepted by SIGGRAPH Asia 2023

  11. arXiv:2304.05554  [pdf, other]

    cs.CV cs.AI

    Learning Transferable Pedestrian Representation from Multimodal Information Supervision

    Authors: Liping Bao, Longhui Wei, Xiaoyu Qiu, Wengang Zhou, Houqiang Li, Qi Tian

    Abstract: Recent researches on unsupervised person re-identification (reID) have demonstrated that pre-training on unlabeled person images achieves superior performance on downstream reID tasks than pre-training on ImageNet. However, those pre-trained methods are specifically designed for reID and suffer flexible adaption to other pedestrian analysis tasks. In this paper, we propose VAL-PAT, a novel framewo…

    Submitted 11 April, 2023; originally announced April 2023.

  12. arXiv:2303.15967  [pdf, other]

    cs.SE

    CM-CASL: Comparison-based Performance Modeling of Software Systems via Collaborative Active and Semisupervised Learning

    Authors: Rong Cao, Liang Bao, Chase Wu, Panpan Zhangsun, Yufei Li, Zhe Zhang

    Abstract: Configuration tuning for large software systems is generally challenging due to the complex configuration space and expensive performance evaluation. Most existing approaches follow a two-phase process, first learning a regression-based performance prediction model on available samples and then searching for the configurations with satisfactory performance using the learned model. Such regression-…

    Submitted 28 March, 2023; originally announced March 2023.

  13. arXiv:2303.08658  [pdf, other]

    cs.CV cs.GR

    Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry

    Authors: Jiaxu Zhang, Junwu Weng, Di Kang, Fang Zhao, Shaoli Huang, Xuefei Zhe, Linchao Bao, Ying Shan, Jue Wang, Zhigang Tu

    Abstract: A good motion retargeting cannot be reached without reasonable consideration of source-target differences on both the skeleton and shape geometry levels. In this work, we propose a novel Residual RETargeting network (R2ET) structure, which relies on two neural modification modules, to adjust the source motions to fit the target skeletons and shapes progressively. In particular, a skeleton-aware mo…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  14. arXiv:2303.07744  [pdf, other]

    math.OC cs.CV

    Sliding at first order: Higher-order momentum distributions for discontinuous image registration

    Authors: Lili Bao, Jiahao Lu, Shihui Ying, Stefan Sommer

    Abstract: In this paper, we propose a new approach to deformable image registration that captures sliding motions. The large deformation diffeomorphic metric mapping (LDDMM) registration method faces challenges in representing sliding motion since it per construction generates smooth warps. To address this issue, we extend LDDMM by incorporating both zeroth- and first-order momenta with a non-differentiable…

    Submitted 14 March, 2023; originally announced March 2023.

    MSC Class: 65D18; 65K10; 34A36; 68U10

  15. arXiv:2302.01162  [pdf, other]

    cs.CV

    Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors

    Authors: Zhangyang Xiong, Di Kang, Derong Jin, Weikai Chen, Linchao Bao, Shuguang Cui, Xiaoguang Han

    Abstract: Fast generation of high-quality 3D digital humans is important to a vast number of applications ranging from entertainment to professional concerns. Recent advances in differentiable rendering have enabled the training of 3D generative models without requiring 3D ground truths. However, the quality of the generated 3D humans still has much room to improve in terms of both fidelity and diversity. I…

    Submitted 24 July, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: ICCV 2023, project page: https://x-zhangyang.github.io/2023_Get3DHuman/

  16. arXiv:2301.11546  [pdf, other]

    cs.LG

    Adapting Step-size: A Unified Perspective to Analyze and Improve Gradient-based Methods for Adversarial Attacks

    Authors: Wei Tao, Lei Bao, Sheng Long, Gaowei Wu, Qing Tao

    Abstract: Learning adversarial examples can be formulated as an optimization problem of maximizing the loss function with some box-constraints. However, for solving this induced optimization problem, the state-of-the-art gradient-based methods such as FGSM, I-FGSM and MI-FGSM look different from their original methods especially in updating the direction, which makes it difficult to understand them and then…

    Submitted 1 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  17. arXiv:2301.06690  [pdf, other]

    cs.CV

    Audio2Gestures: Generating Diverse Gestures from Audio

    Authors: Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He

    Abstract: People may perform diverse gestures affected by various mental and physical factors when speaking the same sentences. This inherent one-to-many relationship makes co-speech gesture generation from audio particularly challenging. Conventional CNNs/RNNs assume one-to-one mapping, and thus tend to predict the average of all possible target motions, easily resulting in plain/boring motions during infe…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2108.06720

  18. arXiv:2301.06059  [pdf, other]

    cs.GR cs.CV

    Learning Audio-Driven Viseme Dynamics for 3D Face Animation

    Authors: Linchao Bao, Haoxian Zhang, Yue Qian, Tangli Xue, Changhai Chen, Xuefei Zhe, Di Kang

    Abstract: We present a novel audio-driven facial animation approach that can generate realistic lip-synchronized 3D facial animations from the input audio. Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech inputs. The core of our approach is a novel parametric viseme fitting algorithm that utilizes phoneme priors to extract vis…

    Submitted 15 January, 2023; originally announced January 2023.

    Comments: Project page: https://linchaobao.github.io/viseme2023/

  19. arXiv:2301.04258  [pdf, other]

    cs.CV

    CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder

    Authors: Ye Huang, Di Kang, Liang Chen, Wenjing Jia, Xiangjian He, Lixin Duan, Xuefei Zhe, Linchao Bao

    Abstract: Semantic segmentation has recently achieved notable advances by exploiting "class-level" contextual information during learning. However, these approaches simply concatenate class-level information to pixel features to boost the pixel representation learning, which cannot fully utilize intra-class and inter-class contextual information. Moreover, these approaches learn soft class centers based on…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: Tech report, text extended from arXiv:2203.07160

  20. arXiv:2211.13874  [pdf, other]

    cs.CV

    FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

    Authors: Haoran Bai, Di Kang, Haoxian Zhang, Jinshan Pan, Linchao Bao

    Abstract: We present a large-scale facial UV-texture dataset that contains over 50,000 high-quality texture UV-maps with even illuminations, neutral expressions, and cleaned facial regions, which are desired characteristics for rendering realistic 3D face models under different lighting conditions. The dataset is derived from a large-scale face image dataset namely FFHQ, with the help of our fully automatic…

    Submitted 24 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: The dataset, code, and pre-trained texture decoder are publicly available at https://github.com/csbhr/FFHQ-UV

  21. arXiv:2211.06974  [pdf]

    cs.NI cs.IT eess.SP

    A Comparison between Network-Controlled Repeaters and Reconfigurable Intelligent Surfaces

    Authors: Hao Guo, Charitha Madapatha, Behrooz Makki, Boris Dortschy, Lei Bao, Magnus Åström, Tommy Svensson

    Abstract: Network-controlled repeater (NCR) has been recently considered as a study-item in 3GPP Release 18, and the discussions are continuing in a work-item. In this paper, we introduce the concept of NCRs, as a possible low-complexity device to support for network densification and compare the performance of the NCRs with those achieved by reconfigurable intelligent surfaces (RISs). The results are prese…

    Submitted 13 November, 2022; originally announced November 2022.

    Comments: 7 pages, 7 figures, submitted to potential IEEE publication

  22. arXiv:2211.05256  [pdf, other]

    eess.IV cs.CV

    Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

    Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

  23. arXiv:2210.12723  [pdf]

    eess.IV cs.AI cs.LG

    A Faithful Deep Sensitivity Estimation for Accelerated Magnetic Resonance Imaging

    Authors: Zi Wang, Haoming Fang, Chen Qian, Boxuan Shi, Lijun Bao, Liuhong Zhu, Jianjun Zhou, Wenping Wei, Jianzhong Lin, Di Guo, Xiaobo Qu

    Abstract: Magnetic resonance imaging (MRI) is an essential diagnostic tool that suffers from prolonged scan time. To alleviate this limitation, advanced fast MRI technology attracts extensive research interests. Recent deep learning has shown its great potential in improving image quality and reconstruction speed. Faithful coil sensitivity estimation is vital for MRI reconstruction. However, most deep learn…

    Submitted 24 December, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: 12 pages, 13 figures, 7 tables

  24. arXiv:2210.00841  [pdf, other]

    cs.CV cs.LG

    Smooth image-to-image translations with latent space interpolations

    Authors: Yahui Liu, Enver Sangineto, Yajing Chen, Linchao Bao, Haoxian Zhang, Nicu Sebe, Bruno Lepri, Marco De Nadai

    Abstract: Multi-domain image-to-image (I2I) translations can transform a source image according to the style of a target domain. One important, desired characteristic of these transformations, is their graduality, which corresponds to a smooth change between the source and the target image when their respective latent-space representations are linearly interpolated. However, state-of-the-art methods usually…

    Submitted 14 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  25. arXiv:2209.13204  [pdf, other]

    cs.CV cs.GR

    NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System

    Authors: Weiqiang Wang, Xuefei Zhe, Qiuhong Ke, Di Kang, Tingguang Li, Ruizhi Chen, Linchao Bao

    Abstract: We present a neural network-based system for long-term, multi-action human motion synthesis. The system, dubbed as NEURAL MARIONETTE, can produce high-quality and meaningful motions with smooth transitions from simple user input, including a sequence of action tags with expected action duration, and optionally a hand-drawn moving trajectory if the user specifies. The core of our system is a novel…

    Submitted 27 November, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

  26. arXiv:2208.14600  [pdf, other]

    cs.CV eess.IV

    ELSR: Extreme Low-Power Super Resolution Network For Mobile Devices

    Authors: Tianyu Xu, Zhuang Jia, Yijian Zhang, Long Bao, Heng Sun

    Abstract: With the popularity of mobile devices, e.g., smartphone and wearable devices, lighter and faster model is crucial for the application of video super resolution. However, most previous lightweight models tend to concentrate on reducing latency of model inference on desktop GPU, which may be not energy efficient in current mobile devices. In this paper, we proposed Extreme Low-Power Super Resolutio…

    Submitted 30 August, 2022; originally announced August 2022.

  27. arXiv:2208.11948  [pdf, other]

    cs.CV

    Learning to Construct 3D Building Wireframes from 3D Line Clouds

    Authors: Yicheng Luo, Jing Ren, Xuefei Zhe, Di Kang, Yajing Xu, Peter Wonka, Linchao Bao

    Abstract: Line clouds, though under-investigated in the previous work, potentially encode more compact structural information of buildings than point clouds extracted from multi-view images. In this work, we propose the first network to process line clouds for building wireframe abstraction. The network takes a line cloud as input, i.e., a nonstructural and unordered set of 3D line segments extracted from…

    Submitted 4 November, 2022; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: 10 pages, 6 figures

  28. arXiv:2208.10769  [pdf, other]

    cs.CV

    PIFu for the Real World: A Self-supervised Framework to Reconstruct Dressed Human from Single-view Images

    Authors: Zhangyang Xiong, Dong Du, Yushuang Wu, Jingqi Dong, Di Kang, Linchao Bao, Xiaoguang Han

    Abstract: It is very challenging to accurately reconstruct sophisticated human geometry caused by various poses and garments from a single image. Recently, works based on pixel-aligned implicit function (PIFu) have made a big step and achieved state-of-the-art fidelity on image-based 3D human digitization. However, the training of PIFu relies heavily on expensive and limited 3D ground truth data (i.e. synth…

    Submitted 8 March, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: CVM 2024

  29. arXiv:2206.06715  [pdf, other]

    cs.CV

    Semi-signed prioritized neural fitting for surface reconstruction from unoriented point clouds

    Authors: Runsong Zhu, Di Kang, Ka-Hei Hui, Yue Qian, Xuefei Zhe, Zhen Dong, Linchao Bao, Pheng-Ann Heng, Chi-Wing Fu

    Abstract: Reconstructing 3D geometry from unoriented point clouds can benefit many downstream tasks. Recent shape modeling methods mostly adopt implicit neural representation to fit a signed distance field (SDF) and optimize the network by unsigned supervision. However, these methods occasionally have difficulty in finding the coarse shape for complicated objects, especially suffering from the…

    Submitted 14 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

  30. arXiv:2205.12633  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

    Authors: Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, et al. (68 additional authors not shown)

    Abstract: This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR)…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR Workshops 2022. 15 pages, 21 figures, 2 tables

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022

  31. arXiv:2204.01147  [pdf, other]

    cs.RO

    Continuous Jumping for Legged Robots on Stepping Stones via Trajectory Optimization and Model Predictive Control

    Authors: Chuong Nguyen, Lingfan Bao, Quan Nguyen

    Abstract: Performing highly agile dynamic motions, such as jumping or running on uneven stepping stones has remained a challenging problem in legged robot locomotion. This paper presents a framework that combines trajectory optimization and model predictive control to perform robust and consecutive jumping on stepping stones. In our approach, we first utilize trajectory optimization based on full-nonlinear…

    Submitted 16 September, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

    Comments: Accepted to the 61st IEEE Conference on Decision and Control (CDC 2022)

  32. arXiv:2203.09729  [pdf, other]

    cs.CV cs.GR

    REALY: Rethinking the Evaluation of 3D Face Reconstruction

    Authors: Zenghao Chai, Haoxian Zhang, Jing Ren, Di Kang, Zhengzhuo Xu, Xuefei Zhe, Chun Yuan, Linchao Bao

    Abstract: The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This poses difficulties for precisely diagnosing and improving a 3D face reconstruction method. In this paper, we propose a novel evaluati…

    Submitted 19 July, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022, camera-ready version; Project page: https://realy3dface.com; Code: https://github.com/czh-98/REALY

  33. arXiv:2203.07160  [pdf, other]

    cs.CV

    CAR: Class-aware Regularizations for Semantic Segmentation

    Authors: Ye Huang, Di Kang, Liang Chen, Xuefei Zhe, Wenjing Jia, Xiangjian He, Linchao Bao

    Abstract: Recent segmentation methods, such as OCR and CPNet, utilizing "class level" information in addition to pixel features, have achieved notable success for boosting the accuracy of existing network modules. However, the extracted class-level information was simply concatenated to pixel features, without explicitly being exploited for better pixel representation learning. Moreover, these approaches le…

    Submitted 14 July, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: ECCV 2022 camera ready. Codes and models are available at https://github.com/edwardyehuang/CAR

  34. arXiv:2203.00803  [pdf, other]

    cs.SE cs.LG

    Code Smells in Machine Learning Systems

    Authors: Jiri Gesi, Siqi Liu, Jiawei Li, Iftekhar Ahmed, Nachiappan Nagappan, David Lo, Eduardo Santana de Almeida, Pavneet Singh Kochhar, Lingfeng Bao

    Abstract: As Deep learning (DL) systems continuously evolve and grow, assuring their quality becomes an important yet challenging task. Compared to non-DL systems, DL systems have more complex team compositions and heavier data dependency. These inherent characteristics would potentially cause DL systems to be more vulnerable to bugs and, in the long run, to maintenance issues. Code smells are empirically t…

    Submitted 1 March, 2022; originally announced March 2022.

  35. arXiv:2202.04879  [pdf, other]

    cs.CV

    PVSeRF: Joint Pixel-, Voxel- and Surface-Aligned Radiance Field for Single-Image Novel View Synthesis

    Authors: Xianggang Yu, Jiapeng Tang, Yipeng Qin, Chenghong Li, Linchao Bao, Xiaoguang Han, Shuguang Cui

    Abstract: We present PVSeRF, a learning framework that reconstructs neural radiance fields from single-view RGB images, for novel view synthesis. Previous solutions, such as pixelNeRF, rely only on pixel-aligned features and suffer from feature ambiguity issues. As a result, they struggle with the disentanglement of geometry and appearance, leading to implausible geometries and blurry results. To address th…

    Submitted 10 February, 2022; originally announced February 2022.

  36. Consistent 3D Hand Reconstruction in Video via self-supervised Learning

    Authors: Zhigang Tu, Zhisheng Huang, Yujin Chen, Di Kang, Linchao Bao, Bisheng Yang, Junsong Yuan

    Abstract: We present a method for reconstructing accurate and consistent 3D hands from a monocular video. We observe that detected 2D hand keypoints and the image texture provide important cues about the geometry and texture of the 3D hand, which can reduce or even eliminate the requirement on 3D hand annotation. Thus we propose $\mathrm{S}^{2}\mathrm{HAND}$, a self-supervised 3D hand reconstruction model, that can j…

    Submitted 20 March, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2103.11703

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023

  37. arXiv:2111.15234  [pdf, other]

    cs.CV cs.GR

    NeRFReN: Neural Radiance Fields with Reflections

    Authors: Yuan-Chen Guo, Di Kang, Linchao Bao, Yu He, Song-Hai Zhang

    Abstract: Neural Radiance Fields (NeRF) has achieved unprecedented view synthesis quality using coordinate-based neural scene representations. However, NeRF's view dependency can only handle simple reflections like highlights but cannot deal with complex reflections such as those from glass and mirrors. In these scenarios, NeRF models the virtual image as real geometries which leads to inaccurate depth esti…

    Submitted 6 April, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

    Comments: Accepted to CVPR 2022. Project page: https://bennyguo.github.io/nerfren/

  38. arXiv:2109.12492  [pdf, other]

    cs.CV

    ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation

    Authors: Yahui Liu, Yajing Chen, Linchao Bao, Nicu Sebe, Bruno Lepri, Marco De Nadai

    Abstract: Recently, there has been an increasing interest in image editing methods that employ pre-trained unconditional image generators (e.g., StyleGAN). However, applying these methods to translate images to multiple visual domains remains challenging. Existing works do not often preserve the domain-invariant part of the image (e.g., the identity in human face translations), they do not usually handle mu…

    Submitted 23 February, 2022; v1 submitted 26 September, 2021; originally announced September 2021.

    Comments: 14 pages, 15 figures

    Journal ref: IEEE Transactions on Multimedia, 2022

  39. arXiv:2108.06720  [pdf, other]

    cs.CV

    Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders

    Authors: Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao

    Abstract: Generating conversational gestures from speech audio is challenging due to the inherent one-to-many mapping between audio and body motions. Conventional CNNs/RNNs assume one-to-one mapping, and thus tend to predict the average of all possible target motions, resulting in plain/boring motions during inference. In order to overcome this problem, we propose a novel conditional variational autoencoder…

    Submitted 15 August, 2021; originally announced August 2021.

  40. arXiv:2108.05650  [pdf, other]

    cs.CV

    UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing

    Authors: Meng Cao, Haozhi Huang, Hao Wang, Xuan Wang, Li Shen, Sheng Wang, Linchao Bao, Zhifeng Li, Jiebo Luo

    Abstract: Recent research has witnessed advances in facial image editing tasks including face swapping and face reenactment. However, these methods are confined to dealing with one specific task at a time. In addition, for video facial editing, previous methods either simply apply transformations frame by frame or utilize multiple frames in a concatenated or iterative fashion, which leads to noticeable visu…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: Accepted by IEEE Transactions on Image Processing (TIP). arXiv admin note: text overlap with arXiv:2007.01466

  41. arXiv:2106.13629  [pdf, other]

    cs.CV

    Animatable Neural Radiance Fields from Monocular RGB Videos

    Authors: Jianchuan Chen, Ying Zhang, Di Kang, Xuefei Zhe, Linchao Bao, Xu Jia, Huchuan Lu

    Abstract: We present animatable neural radiance fields (animatable NeRF) for detailed human avatar creation from monocular videos. Our approach extends neural radiance fields (NeRF) to the dynamic scenes with human movements via introducing explicit pose-guided deformation while learning the scene representation network. In particular, we estimate the human pose for each frame and learn a constant canonical…

    Submitted 7 September, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: 12 pages, 12 figures

  42. arXiv:2106.09016  [pdf, other]

    cs.CV cs.LG

    Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation

    Authors: Yahui Liu, Enver Sangineto, Yajing Chen, Linchao Bao, Haoxian Zhang, Nicu Sebe, Bruno Lepri, Wei Wang, Marco De Nadai

    Abstract: Image-to-Image (I2I) multi-domain translation models are usually evaluated also using the quality of their semantic interpolation results. However, state-of-the-art models frequently show abrupt changes in the image appearance during interpolation, and usually perform poorly in interpolations across domains. In this paper, we propose a new training protocol based on three specific losses which hel…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2021

  43. arXiv:2103.11703  [pdf, other]

    cs.CV

    Model-based 3D Hand Reconstruction via Self-Supervised Learning

    Authors: Yujin Chen, Zhigang Tu, Di Kang, Linchao Bao, Ying Zhang, Xuefei Zhe, Ruizhi Chen, Junsong Yuan

    Abstract: Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D ha…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR21

  44. psc2code: Denoising Code Extraction from Programming Screencasts

    Authors: Lingfeng Bao, Zhenchang Xing, Xin Xia, David Lo, Minghui Wu, Xiaohu Yang

    Abstract: In this paper, we propose an approach named psc2code to denoise the process of extracting source code from programming screencasts. First, psc2code leverages the Convolutional Neural Network based image classification to remove non-code and noisy-code frames. Then, psc2code performs edge detection and clustering-based image segmentation to detect sub-windows in a code frame, and based on the detec…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: pre-print TOSEM paper for ICSE 2021

    Journal ref: ACM Trans. Softw. Eng. Methodol. 29, 3, Article 21 (July 2020), 38 pages

  45. arXiv:2012.07565  [pdf, other]

    cs.CL cs.IR cs.LG

    Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews

    Authors: Xiaoxiao Li, Rabah Al-Zaidy, Amy Zhang, Stefan Baral, Le Bao, C. Lee Giles

    Abstract: Objective: Systematic reviews of scholarly documents often provide complete and exhaustive summaries of literature relevant to a research question. However, well-done systematic reviews are expensive, time-demanding, and labor-intensive. Here, we propose an automatic document classification approach to significantly reduce the effort in reviewing documents. Methods: We first describe a manual docu…

    Submitted 9 December, 2020; originally announced December 2020.

  46. arXiv:2011.14238  [pdf, other]

    stat.ML cs.LG stat.CO

    Approximate Cross-validated Mean Estimates for Bayesian Hierarchical Regression Models

    Authors: Amy X. Zhang, Le Bao, Changcheng Li, Michael J. Daniels

    Abstract: We introduce a novel procedure for obtaining cross-validated predictive estimates for Bayesian hierarchical regression models (BHRMs). Bayesian hierarchical models are popular for their ability to model complex dependence structures and provide probabilistic uncertainty estimates, but can be computationally expensive to run. Cross-validation (CV) is therefore not a common practice to evaluate the…

    Submitted 17 January, 2024; v1 submitted 28 November, 2020; originally announced November 2020.

    Comments: 26 pages, 2 figures

  47. arXiv:2011.02055  [pdf, other]

    eess.IV cs.CV

    Self-Adaptively Learning to Demoire from Focused and Defocused Image Pairs

    Authors: Lin Liu, Shanxin Yuan, Jianzhuang Liu, Liping Bao, Gregory Slabaugh, Qi Tian

    Abstract: Moire artifacts are common in digital photography, resulting from the interference between high-frequency scene content and the color filter array of the camera. Existing deep learning-based demoireing methods trained on large scale datasets are limited in handling various complex moire patterns, and mainly focus on demoireing of photos taken of digital displays. Moreover, obtaining moire-free gro…

    Submitted 5 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to NeurIPS 2020. Project page: "http://home.ustc.edu.cn/~ll0825/project_FDNet.html"

  48. arXiv:2010.05562  [pdf, other]

    cs.CV cs.GR

    High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies

    Authors: Linchao Bao, Xiangkai Lin, Yajing Chen, Haoxian Zhang, Sheng Wang, Xuefei Zhe, Di Kang, Haozhi Huang, Xinwei Jiang, Jue Wang, Dong Yu, Zhengyou Zhang

    Abstract: We present a fully automatic system that can produce high-fidelity, photo-realistic 3D digital human heads with a consumer RGB-D selfie camera. The system only needs the user to take a short selfie RGB-D video while rotating his/her head, and can produce a high quality head reconstruction in less than 30 seconds. Our main contribution is a new facial geometry modeling and reflectance synthesis pro…

    Submitted 29 June, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: Code: https://github.com/tencent-ailab/hifi3dface

  49. The star-structure connectivity and star-substructure connectivity of hypercubes and folded hypercubes

    Authors: Lina Ba, Heping Zhang

    Abstract: As a generalization of vertex connectivity, for connected graphs $G$ and $T$, the $T$-structure connectivity $κ(G, T)$ (resp. $T$-substructure connectivity $κ^{s}(G, T)$) of $G$ is the minimum cardinality of a set of subgraphs $F$ of $G$ that each is isomorphic to $T$ (resp. to a connected subgraph of $T$) so that $G-F$ is disconnected. For $n$-dimensional hypercube $Q_{n}$, Lin et al. [6] showed…

    Submitted 28 September, 2020; originally announced September 2020.

    Journal ref: The Computer Journal 2021

  50. arXiv:2008.13426  [pdf, other]

    cs.CV

    Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics

    Authors: Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Wei Liu, Yun-hui Liu

    Abstract: This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, the spatial location and dominant color of the largest color diversity along the temporal axis, etc. Then a…

    Submitted 28 January, 2021; v1 submitted 31 August, 2020; originally announced August 2020.

    Comments: Accepted by TPAMI. An extension of our previous work at arXiv:1904.03597