Showing 1–50 of 131 results for author: Tan, P

Searching in archive cs.
  1. arXiv:2406.02096  [pdf, other]

    cs.RO

    MS-Mapping: Multi-session LiDAR Mapping with Wasserstein-based Keyframe Selection

    Authors: Xiangcheng Hu, Jin Wu, Jianhao Jiao, Wei Zhang, Ping Tan

    Abstract: Large-scale multi-session LiDAR mapping plays a crucial role in various applications but faces significant challenges in data redundancy and pose graph scalability. This paper presents MS-Mapping, a novel multi-session LiDAR mapping system that combines an incremental mapping scheme with support for various LiDAR-based odometry, enabling high-precision and consistent map assembly in large-scale env…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  2. arXiv:2406.01467  [pdf, other]

    cs.GR cs.CV

    RaDe-GS: Rasterizing Depth in Gaussian Splatting

    Authors: Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, Ping Tan

    Abstract: Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent tech…

    Submitted 24 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2405.14979  [pdf, other]

    cs.GR cs.CV

    CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

    Authors: Weiyu Li, Jiarui Liu, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, Xiaoxiao Long

    Abstract: We present a novel generative 3D modeling system, coined CraftsMan, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies, and detailed surfaces, and, notably, allows for refining the geometry in an interactive manner. Despite the significant advancements in 3D generation, existing methods still struggle with lengthy optimization processes, irregular mes…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: HomePage: https://craftsman3d.github.io/, Code: https://github.com/wyysf-98/CraftsMan

  4. arXiv:2405.14198  [pdf, other]

    cs.MA

    Enabling Sustainable Freight Forwarding Network via Collaborative Games

    Authors: Pang-Jin Tan, Shih-Fen Cheng, Richard Chen

    Abstract: Freight forwarding plays a crucial role in facilitating global trade and logistics. However, as the freight forwarding market is extremely fragmented, freight forwarders often face the issue of not being able to fill the available shipping capacity. This recurrent issue motivates the creation of various freight forwarding networks that aim at exchanging capacities and demands so that the resource…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted to the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24)

  5. arXiv:2405.11616  [pdf, other]

    cs.CV

    Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

    Authors: Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo

    Abstract: In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images should…

    Submitted 29 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  6. arXiv:2405.05814  [pdf]

    eess.IV cs.CV

    MSDiff: Multi-Scale Diffusion Model for Ultra-Sparse View CT Reconstruction

    Authors: Pinhuang Tan, Mengxiao Geng, Jingya Lu, Liu Shi, Bin Huang, Qiegen Liu

    Abstract: Computed Tomography (CT) technology reduces radiation hazards to the human body through sparse sampling, but fewer sampling angles pose challenges for image reconstruction. Score-based generative models are widely used in sparse-view CT reconstruction, but their performance diminishes significantly with a sharp reduction in projection angles. Therefore, we propose an ultra-sparse view CT reconstruction me…

    Submitted 9 May, 2024; originally announced May 2024.

  7. arXiv:2404.15121  [pdf, other]

    cs.GR cs.AI cs.CV

    Taming Diffusion Probabilistic Models for Character Control

    Authors: Rui Chen, Mingyi Shi, Shaoli Huang, Ping Tan, Taku Komura, Xuelin Chen

    Abstract: We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real-time to a variety of dynamic user-supplied control signals. At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model (CAMDM), which takes as input the character's his…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGGRAPH 2024 (Conference Track). Project page and source codes: https://aiganimation.github.io/CAMDM/

  8. arXiv:2404.14850  [pdf, other]

    cs.CL cs.LG q-bio.BM

    Simple, Efficient and Scalable Structure-aware Adapter Boosts Protein Language Models

    Authors: Yang Tan, Mingchen Li, Bingxin Zhou, Bozitao Zhong, Lirong Zheng, Pan Tan, Ziyi Zhou, Huiqun Yu, Guisheng Fan, Liang Hong

    Abstract: Fine-tuning pre-trained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. As a widely applied powerful technique in natural language processing, employing Parameter-Efficient Fine-Tuning techniques could potentially enhance the performance of PLMs. However, the direct transfe…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 30 pages, 4 figures, 8 tables

  9. arXiv:2404.02788  [pdf, other]

    cs.CV

    GenN2N: Generative NeRF2NeRF Translation

    Authors: Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, Li Yi

    Abstract: We present GenN2N, a unified NeRF-to-NeRF translation framework for various NeRF translation tasks such as text-driven NeRF editing, colorization, super-resolution, inpainting, etc. Unlike previous methods designed for individual translation tasks with task-specific schemes, GenN2N achieves all these NeRF editing tasks by employing a plug-and-play image-to-image translator to perform editing in th…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Project page: https://xiangyueliu.github.io/GenN2N/

  10. arXiv:2404.01543  [pdf, other]

    cs.CV cs.GR

    Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes

    Authors: Ziqian Bai, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang

    Abstract: 3D head avatars built with neural implicit volumetric representations have achieved unprecedented levels of photorealism. However, the computational cost of these methods remains a significant barrier to their widespread adoption, particularly in real-time applications such as virtual reality and teleconferencing. While attempts have been made to develop fast neural rendering approaches for static…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: In CVPR2024. Project page: https://augmentedperception.github.io/monoavatar-plus

  11. arXiv:2403.12013  [pdf, other]

    cs.CV

    GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

    Authors: Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long

    Abstract: We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, the progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, the prior works either are constrained to limited scenar…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://fuxiao0719.github.io/projects/geowizard/

  12. arXiv:2403.11270  [pdf, other]

    cs.CV

    Bilateral Propagation Network for Depth Completion

    Authors: Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, Ping Tan

    Abstract: Depth completion aims to derive a dense depth map from sparse depth measurements with a synchronized color image. Current state-of-the-art (SOTA) methods are predominantly propagation-based, which work as an iterative refinement on the initial estimated dense depth. However, the initial depth estimations mostly result from direct applications of convolutional layers on the sparse depth map. In thi…

    Submitted 1 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  13. arXiv:2402.16479  [pdf, other]

    cs.CV

    Edge Detectors Can Make Deep Convolutional Neural Networks More Robust

    Authors: Jin Ding, Jie-Chao Zhao, Yong-Zhi Sun, Ping Tan, Jia-Wei Wang, Ji-En Ma, You-Tong Fang

    Abstract: Deep convolutional neural networks (DCNN for short) are vulnerable to examples with small perturbations. Improving DCNN's robustness is of great significance to the safety-critical applications, such as autonomous driving and industry automation. Inspired by the principal way that human eyes recognize objects, i.e., largely relying on the shape features, this paper first employs the edge detectors…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 26 pages, 18 figures, 7 tables. submitted to Neural Networks, under review

  14. arXiv:2402.10551  [pdf, other]

    cs.LG q-bio.QM

    Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information

    Authors: Aishwarya Jayagopal, Hansheng Xue, Ziyang He, Robert J. Walsh, Krishna Kumar Hariprasannan, David Shao Peng Tan, Tuan Zea Tan, Jason J. Pitt, Anand D. Jeyasekharan, Vaibhav Rajan

    Abstract: Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are chall…

    Submitted 16 February, 2024; originally announced February 2024.

  15. arXiv:2401.14427  [pdf, other]

    cs.SE cs.CR cs.LG

    Beimingwu: A Learnware Dock System

    Authors: Zhi-Hao Tan, Jian-Dong Liu, Xiao-Dong Bi, Peng Tan, Qin-Cheng Zheng, Hai-Tian Liu, Yi Xie, Xiao-Chuan Zou, Yang Yu, Zhi-Hua Zhou

    Abstract: The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond models' original purposes. In this paradigm, developers worldwide can submit their high-performing models spontaneously to the learnware dock system (formerly known as learnwa…

    Submitted 24 January, 2024; originally announced January 2024.

  16. arXiv:2401.05478  [pdf, other]

    cs.SI cs.AI cs.LG

    Population Graph Cross-Network Node Classification for Autism Detection Across Sample Groups

    Authors: Anna Stephens, Francisco Santos, Pang-Ning Tan, Abdol-Hossein Esfahanian

    Abstract: Graph neural networks (GNN) are a powerful tool for combining imaging and non-imaging medical information for node classification tasks. Cross-network node classification extends GNN techniques to account for domain drift, allowing for node classification on an unlabeled target network. In this paper we present OTGCN, a powerful, novel approach to cross-network node classification. This approach l…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: To appear ICDM DMBIH workshop 2023

  17. Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation

    Authors: Xueyan Oh, Leonard Loh, Shaohui Foong, Zhong Bao Andy Koh, Kow Leong Ng, Poh Kang Tan, Pei Lin Pearlin Toh, U-Xuan Tan

    Abstract: General Visual Inspection is a manual inspection process regularly used to detect and localise obvious damage on the exterior of commercial aircraft. There has been increasing demand to perform this process at the boarding gate to minimize the downtime of the aircraft and automating this process is desired to reduce the reliance on human labour. This automation typically requires the first step of…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: This paper has been accepted by 2021 IEEE International Conference on Robotics and Automation (ICRA) with DOI: 10.1109/ICRA48506.2021.9561575

  18. arXiv:2310.17415  [pdf, other]

    cs.CL cs.AI q-bio.BM

    PETA: Evaluating the Impact of Protein Transfer Learning with Sub-word Tokenization on Downstream Applications

    Authors: Yang Tan, Mingchen Li, Pan Tan, Ziyi Zhou, Huiqun Yu, Guisheng Fan, Liang Hong

    Abstract: Large protein language models are adept at capturing the underlying evolutionary information in primary structures, offering significant practical value for protein engineering. Compared to natural language models, protein amino acid sequences have a smaller data volume and a limited combinatorial space. Choosing an appropriate vocabulary size to optimize the pre-trained model is a pivotal issue.…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 46 pages, 4 figures, 9 tables

  19. arXiv:2310.16857  [pdf]

    eess.IV cs.LG

    Improvement in Alzheimer's Disease MRI Images Analysis by Convolutional Neural Networks Via Topological Optimization

    Authors: Peiwen Tan

    Abstract: This research underscores the efficacy of Fourier topological optimization in refining MRI imagery, thereby bolstering the classification precision of Alzheimer's Disease through convolutional neural networks. Recognizing that MRI scans are indispensable for neurological assessments, but frequently grapple with issues like blurriness and contrast irregularities, the deployment of Fourier topologic…

    Submitted 24 October, 2023; originally announced October 2023.

  20. arXiv:2310.05456  [pdf]

    cs.LG cs.AI

    Ensemble-based Hybrid Optimization of Bayesian Neural Networks and Traditional Machine Learning Algorithms

    Authors: Peiwen Tan

    Abstract: This research introduces a novel methodology for optimizing Bayesian Neural Networks (BNNs) by synergistically integrating them with traditional machine learning algorithms such as Random Forests (RF), Gradient Boosting (GB), and Support Vector Machines (SVM). Feature integration solidifies these results by emphasizing the second-order conditions for optimality, including stationarity and positive…

    Submitted 9 October, 2023; originally announced October 2023.

  21. arXiv:2310.03602  [pdf, other]

    cs.CV

    Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints

    Authors: Chuan Fang, Yuan Dong, Kunming Luo, Xiaotao Hu, Rakesh Shrestha, Ping Tan

    Abstract: Text-driven 3D indoor scene generation is useful for gaming, the film industry, and AR/VR applications. However, existing methods cannot faithfully capture the room layout, nor do they allow flexible editing of individual objects in the room. To address these problems, we present Ctrl-Room, which can generate convincing 3D rooms with designer-style layouts and high-fidelity textures from just a te…

    Submitted 1 July, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  22. arXiv:2310.02596  [pdf, other]

    cs.CV

    SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D

    Authors: Weiyu Li, Rui Chen, Xuelin Chen, Ping Tan

    Abstract: It is inherently ambiguous to lift 2D results from pre-trained diffusion models to a 3D world for text-to-3D generation. 2D diffusion models solely learn view-agnostic priors and thus lack 3D knowledge during the lifting, leading to the multi-view inconsistency problem. We find that this problem primarily stems from geometric inconsistency, and avoiding misplaced geometric structures substantially…

    Submitted 20 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Project page: https://sweetdreamer3d.github.io/

  23. arXiv:2309.13814  [pdf, other]

    cs.CV

    DVI-SLAM: A Dual Visual Inertial SLAM Network

    Authors: Xiongfeng Peng, Zhihua Liu, Weiming Li, Ping Tan, SoonYong Cho, Qiang Wang

    Abstract: Recent deep learning based visual simultaneous localization and mapping (SLAM) methods have made significant progress. However, how to make full use of visual information as well as better integrate with inertial measurement unit (IMU) in visual SLAM has potential research value. This paper proposes a novel deep SLAM network with dual visual factors. The basic idea is to integrate both photometric…

    Submitted 26 May, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted to ICRA2024

    Journal ref: The 2024 IEEE International Conference on Robotics and Automation (ICRA2024)

  24. arXiv:2308.11162  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.QM

    A Preliminary Investigation into Search and Matching for Tumour Discrimination in WHO Breast Taxonomy Using Deep Networks

    Authors: Abubakr Shafique, Ricardo Gonzalez, Liron Pantanowitz, Puay Hoon Tan, Alberto Machado, Ian A Cree, Hamid R. Tizhoosh

    Abstract: Breast cancer is one of the most common cancers affecting women worldwide. They include a group of malignant neoplasms with a variety of biological, clinical, and histopathological characteristics. There are more than 35 different histological forms of breast lesions that can be classified and diagnosed histologically according to cell morphology, growth, and architecture patterns. Recently, deep…

    Submitted 21 August, 2023; originally announced August 2023.

  25. arXiv:2308.03492  [pdf, other]

    cs.CV

    Learning Photometric Feature Transform for Free-form Object Scan

    Authors: Xiang Feng, Kaizhang Kang, Fan Pei, Huakeng Ding, Jinjiang You, Ping Tan, Kun Zhou, Hongzhi Wu

    Abstract: We propose a novel framework to automatically learn to aggregate and transform photometric measurements from multiple unstructured views into spatially distinctive and view-invariant low-level features, which are fed to a multi-view stereo method to enhance 3D reconstruction. The illumination conditions during acquisition and the feature transform are jointly trained on a large amount of synthetic…

    Submitted 7 August, 2023; originally announced August 2023.

  26. arXiv:2307.13282  [pdf, other]

    cs.CV

    High-Resolution Volumetric Reconstruction for Clothed Humans

    Authors: Sicong Tang, Guangyuan Wang, Qing Ran, Lingzhi Li, Li Shen, Ping Tan

    Abstract: We present a novel method for reconstructing clothed humans from a sparse set of, e.g., 1 to 6 RGB images. Despite impressive results from recent works employing deep implicit representation, we revisit the volumetric approach and demonstrate that better performance can be achieved with proper system design. The volumetric representation offers significant advantages in leveraging 3D spatial conte…

    Submitted 25 July, 2023; originally announced July 2023.

  27. arXiv:2306.12361  [pdf, other]

    eess.SY cs.LG

    Sigma-point Kalman Filter with Nonlinear Unknown Input Estimation via Optimization and Data-driven Approach for Dynamic Systems

    Authors: Junn Yong Loo, Ze Yang Ding, Vishnu Monn Baskaran, Surya Girinatha Nurzaman, Chee Pin Tan

    Abstract: Most works on joint state and unknown input (UI) estimation require the assumption that the UIs are linear; this is potentially restrictive as it does not hold in many intelligent autonomous systems. To overcome this restriction and circumvent the need to linearize the system, we propose a derivative-free Unknown Input Sigma-point Kalman Filter (SPKF-nUI) where the SPKF is interconnected with a ge…

    Submitted 24 June, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

  28. arXiv:2306.04919  [pdf, other]

    cs.LG

    Unsupervised Cross-Domain Soft Sensor Modelling via Deep Physics-Inspired Particle Flow Bayes

    Authors: Junn Yong Loo, Ze Yang Ding, Surya G. Nurzaman, Chee-Ming Ting, Vishnu Monn Baskaran, Chee Pin Tan

    Abstract: Data-driven soft sensors are essential for achieving accurate perception through reliable state inference. However, developing representative soft sensor models is challenged by issues such as missing labels, domain adaptability, and temporal coherence in data. To address these challenges, we propose a deep Particle Flow Bayes (DPFB) framework for cross-domain soft sensor modeling in the absence o…

    Submitted 8 July, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  29. arXiv:2305.18163  [pdf, other]

    cs.CV

    Compact Real-time Radiance Fields with Neural Codebook

    Authors: Lingzhi Li, Zhongshu Wang, Zhen Shen, Li Shen, Ping Tan

    Abstract: Reconstructing neural radiance fields with explicit volumetric representations, demonstrated by Plenoxels, has shown remarkable advantages on training and rendering efficiency, while grid-based representations typically induce considerable overhead for storage and transmission. In this work, we present a simple and effective framework for pursuing compact radiance fields from the perspective of co…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted by ICME 2023

  30. arXiv:2305.17445  [pdf, other]

    cs.SE

    Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on the False Alarms in Automated Speech Recognition Testing

    Authors: Julia Kaiwen Lau, Kelvin Kai Wen Kong, Julian Hao Yong, Per Hoong Tan, Zhou Yang, Zi Qian Yong, Joshua Chern Wey Low, Chun Yong Chong, Mei Kuan Lim, David Lo

    Abstract: Recent studies have proposed the use of Text-To-Speech (TTS) systems to automatically synthesise speech test cases on a scale and uncover a large number of failures in ASR systems. However, the failures uncovered by synthetic test cases may not reflect the actual performance of an ASR system when it transcribes human audio, which we refer to as false alarms. Given a failed test case synthesised fr…

    Submitted 18 July, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: 13 pages, Accepted at ISSTA2023

  31. arXiv:2305.12497  [pdf, other]

    cs.CV

    PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

    Authors: Yuan Dong, Chuan Fang, Liefeng Bo, Zilong Dong, Ping Tan

    Abstract: Panoramic image enables deeper understanding and more holistic perception of $360^\circ$ surrounding environment, which can naturally encode enriched scene context information compared to standard perspective image. Previous work has made lots of effort to solve the scene understanding task in a bottom-up form, thus each sub-task is processed separately and few correlations are explored in this pr…

    Submitted 5 June, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

  32. arXiv:2304.08299  [pdf, other]

    q-bio.QM cs.LG

    Accurate and Definite Mutational Effect Prediction with Lightweight Equivariant Graph Neural Networks

    Authors: Bingxin Zhou, Outongyi Lv, Kai Yi, Xinye Xiong, Pan Tan, Liang Hong, Yu Guang Wang

    Abstract: Directed evolution as a widely-used engineering strategy faces obstacles in finding desired mutants from the massive size of candidate modifications. While deep learning methods learn protein contexts to establish feasible searching space, many existing models are computationally demanding and fail to predict how specific mutational tests will affect a protein's sequence or function. This research…

    Submitted 13 April, 2023; originally announced April 2023.

  33. arXiv:2304.01436  [pdf, other]

    cs.CV cs.GR

    Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

    Authors: Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, Yinda Zhang

    Abstract: We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduc…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: In CVPR2023. Project page: https://augmentedperception.github.io/monoavatar/

  34. arXiv:2303.11011  [pdf, other]

    cs.CV

    Learning Optical Flow from Event Camera with Rendered Dataset

    Authors: Xinglong Luo, Kunming Luo, Ao Luo, Zhengning Wang, Ping Tan, Shuaicheng Liu

    Abstract: We study the problem of estimating optical flow from event cameras. One important issue is how to build a high-quality event-flow dataset with accurate event values and flow labels. Previous datasets are created by either capturing real scenes by event cameras or synthesizing from images with pasted foreground objects. The former case can produce real event values but with calculated flow labels,…

    Submitted 20 March, 2023; originally announced March 2023.

  35. arXiv:2303.06425  [pdf, ps, other]

    cs.CV

    Improving the Robustness of Deep Convolutional Neural Networks Through Feature Learning

    Authors: Jin Ding, Jie-Chao Zhao, Yong-Zhi Sun, Ping Tan, Ji-En Ma, You-Tong Fang

    Abstract: Deep convolutional neural network (DCNN for short) models are vulnerable to examples with small perturbations. Adversarial training (AT for short) is a widely used approach to enhance the robustness of DCNN models by data augmentation. In AT, the DCNN models are trained with clean examples and adversarial examples (AE for short) which are generated using a specific attack method, aiming to gain ab…

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: 8 pages, 12 figures, 6 tables. Work in process

  36. Cross-domain Transfer Learning and State Inference for Soft Robots via a Semi-supervised Sequential Variational Bayes Framework

    Authors: Shageenderan Sapai, Junn Yong Loo, Ze Yang Ding, Chee Pin Tan, Raphael CW Phan, Vishnu Monn Baskaran, Surya Girinatha Nurzaman

    Abstract: Recently, data-driven models such as deep neural networks have shown to be promising tools for modelling and state inference in soft robots. However, voluminous amounts of data are necessary for deep models to perform effectively, which requires exhaustive and quality data collection, particularly of state labels. Consequently, obtaining labelled state data for soft robotic systems is challenged f…

    Submitted 25 August, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted at the International Conference on Robotics and Automation (ICRA) 2023

  37. arXiv:2301.08930  [pdf, other]

    cs.CV cs.LG cs.RO

    Dense RGB SLAM with Neural Implicit Maps

    Authors: Heng Li, Xiaodong Gu, Weihao Yuan, Luwei Yang, Zilong Dong, Ping Tan

    Abstract: There is an emerging trend of using neural implicit functions for map representation in Simultaneous Localization and Mapping (SLAM). Some pioneer works have achieved encouraging results on RGB-D SLAM. In this paper, we present a dense RGB SLAM method with neural implicit map representation. To reach this challenging goal without depth input, we introduce a hierarchical feature volume to facilitat…

    Submitted 19 February, 2023; v1 submitted 21 January, 2023; originally announced January 2023.

    Comments: Accepted by ICLR 2023; Camera-Ready Version; The code is at poptree.github.io/DIM-SLAM

  38. SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering

    Authors: Mingchen Li, Liqi Kang, Yi Xiong, Yu Guang Wang, Guisheng Fan, Pan Tan, Liang Hong

    Abstract: Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model to predict the fitness for protein mutants by leveraging both sequence and structure information, and exploiting attention mech…

    Submitted 28 December, 2022; originally announced January 2023.

    Journal ref: Journal of Cheminformatics (2023) 15:12

  39. RAGO: Recurrent Graph Optimizer For Multiple Rotation Averaging

    Authors: Heng Li, Zhaopeng Cui, Shuaicheng Liu, Ping Tan

    Abstract: This paper proposes a deep recurrent Rotation Averaging Graph Optimizer (RAGO) for Multiple Rotation Averaging (MRA). Conventional optimization-based methods usually fail to produce accurate results due to corrupted and noisy relative measurements. Recent learning-based approaches regard MRA as a regression problem, while these methods are sensitive to initialization due to the gauge freedom probl…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Accepted by CVPR 2022

    Journal ref: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15766-15775

  40. arXiv:2212.02073  [pdf, other]

    cs.CV

    Minimum Latency Deep Online Video Stabilization

    Authors: Zhuofan Zhang, Zhen Liu, Ping Tan, Bing Zeng, Shuaicheng Liu

    Abstract: We present a novel camera path optimization framework for the task of online video stabilization. Typically, a stabilization pipeline consists of three steps: motion estimating, path smoothing, and novel view rendering. Most previous methods concentrate on motion estimation, proposing various global or local motion models. In contrast, path optimization receives relatively less attention, especial…

    Submitted 15 August, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: Accepted by ICCV 2023

  41. arXiv:2211.11177  [pdf, other]

    cs.CV

    NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

    Authors: Shitao Tang, Sicong Tang, Andrea Tagliasacchi, Ping Tan, Yasutaka Furukawa

    Abstract: This paper presents an end-to-end neural mapping method for camera localization, dubbed NeuMap, encoding a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels. State-of-the-art feature matching methods require each scene to be stored as a 3D point cloud with per-point features, consuming several gigabytes of storage per scen…

    Submitted 26 March, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: CVPR2023

  42. arXiv:2210.14831  [pdf, other]

    cs.CV

    Streaming Radiance Fields for 3D Video Synthesis

    Authors: Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, Ping Tan

    Abstract: We present an explicit-grid based method for efficiently reconstructing streaming radiance fields for novel view synthesis of real world dynamic scenes. Instead of training a single model that combines all the frames, we formulate the dynamic modeling problem with an incremental learning paradigm in which per-frame model difference is trained to complement the adaption of a base model on the curre…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  43. arXiv:2210.07650  [pdf, other]

    cs.CV cs.AI cs.GR

    DART: Articulated Hand Model with Diverse Accessories and Rich Textures

    Authors: Daiheng Gao, Yuliang Xiu, Kailin Li, Lixin Yang, Feng Wang, Peng Zhang, Bang Zhang, Cewu Lu, Ping Tan

    Abstract: The hand, the bearer of human productivity and intelligence, is receiving much attention due to the recent fever of digital twins. Among different hand morphable models, MANO has been widely used in the vision and graphics community. However, MANO disregards textures and accessories, which largely limits its power to synthesize photorealistic hand data. In this paper, we extend MANO with Diverse Accessori…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Homepage: dart2022.github.io. Accepted by NeurIPS 2022 Datasets and Benchmarks Track

  44. arXiv:2210.02018  [pdf, other]

    cs.CV

    InterFace: Adjustable Angular Margin Inter-class Loss for Deep Face Recognition

    Authors: Meng Sang, Jiaxuan Chen, Mengzhen Li, Pan Tan, Anning Pan, Shan Zhao, Yang Yang

    Abstract: In the field of face recognition, improving the loss function so that the face features extracted by the network have greater discriminative power is a perennially active research topic. Research in recent years has improved the discriminative power of the face model by normalizing softmax to the cosine space step by step and then adding a fixed penalty margin to reduce the intra-class distance…

    Submitted 9 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: text overlap with arXiv:2109.09416 by other authors

  45. arXiv:2209.06094  [pdf, other]

    cs.LG cs.AI

    Learning to Solve Multiple-TSP with Time Window and Rejections via Deep Reinforcement Learning

    Authors: Rongkai Zhang, Cong Zhang, Zhiguang Cao, Wen Song, Puay Siew Tan, Jie Zhang, Bihan Wen, Justin Dauwels

    Abstract: We propose a manager-worker framework based on deep reinforcement learning to tackle a hard yet nontrivial variant of the Travelling Salesman Problem (TSP), i.e., multiple-vehicle TSP with time window and rejections (mTSPTWR), where customers who cannot be served before the deadline are subject to rejection. Particularly, in the proposed framework, a manager agent learns to divide mTSPTWR into sub-rout…

    Submitted 13 September, 2022; originally announced September 2022.

  46. arXiv:2208.03792  [pdf, other]

    cs.CV

    Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

    Authors: Qiyu Dai, Jiyao Zhang, Qiwei Li, Tianhao Wu, Hao Dong, Ziyuan Liu, Ping Tan, He Wang

    Abstract: Commercial depth sensors usually generate noisy and missing depths, especially on specular and transparent objects, which poses critical issues to downstream depth or point cloud-based tasks. To mitigate this problem, we propose a powerful RGBD fusion network, SwinDRNet, for depth restoration. We further propose the Domain Randomization-Enhanced Depth Simulation (DREDS) approach to simulate an active…

    Submitted 23 November, 2022; v1 submitted 7 August, 2022; originally announced August 2022.

    Comments: ECCV 2022

  47. arXiv:2207.12579  [pdf, other]

    cs.CV

    RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

    Authors: Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

    Abstract: Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated. Relocalization in large-scale indoor environments enables attractive applications such as augmented reality and robot navigation. However, appearance changes fast in such environments when the camera moves, which is challe…

    Submitted 25 July, 2022; originally announced July 2022.

  48. arXiv:2206.11808  [pdf, other]

    cs.CV cs.RO

    Unseen Object 6D Pose Estimation: A Benchmark and Baselines

    Authors: Minghao Gou, Haolin Pan, Hao-Shu Fang, Ziyuan Liu, Cewu Lu, Ping Tan

    Abstract: Estimating the 6D pose for unseen objects is in great demand for many real-world applications. However, current state-of-the-art pose estimation methods can only handle objects seen during training. In this paper, we propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing. We collect a dataset with both real and synthetic…

    Submitted 23 June, 2022; originally announced June 2022.

  49. arXiv:2206.02367  [pdf, other]

    cs.MM

    Subtitle-based Viewport Prediction for 360-degree Virtual Tourism Video

    Authors: Chuanzhe Jing, Tho Nguyen Duc, Phan Xuan Tan, Eiji Kamioka

    Abstract: 360-degree streaming videos can provide rich immersive experiences to users. However, they require an extremely high-bandwidth network. One common solution for saving bandwidth is to stream only the portion of the video covered by the user's viewport. To do that, prediction of the user's viewpoint is indispensable. In existing viewport prediction methods, they mainly concentrate on…

    Submitted 6 June, 2022; originally announced June 2022.

  50. arXiv:2205.15573  [pdf, other]

    cs.GR

    Text/Speech-Driven Full-Body Animation

    Authors: Wenlin Zhuang, Jinwei Qi, Peng Zhang, Bang Zhang, Ping Tan

    Abstract: Due to the increasing demand in films and games, synthesizing 3D avatar animation has attracted much attention recently. In this work, we present a production-ready text/speech-driven full-body animation synthesis system. Given the text and corresponding speech, our system synthesizes face and body animations simultaneously, which are then skinned and rendered to obtain a video stream output. We a…

    Submitted 31 May, 2022; originally announced May 2022.

    Comments: IJCAI-2022 demo track, video see https://youtu.be/MipiwU3Em_8