Showing 1–50 of 106 results for author: Lee, G H

Searching in archive cs.
  1. arXiv:2406.11311  [pdf, other]

    cs.CV

    Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection

    Authors: Yunsong Wang, Na Zhao, Gim Hee Lee

    Abstract: The use of synthetic data in indoor 3D object detection offers the potential of greatly reducing the manual labor involved in 3D annotations and training effective zero-shot detectors. However, the complicated domain shifts across syn-to-real indoor datasets remain underexplored. In this paper, we propose a novel Object-wise Hierarchical Domain Alignment (OHDA) framework for syn-to-real unsupervi…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.11283  [pdf, other]

    cs.CV

    Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding

    Authors: Yunsong Wang, Na Zhao, Gim Hee Lee

    Abstract: The field of self-supervised 3D representation learning has emerged as a promising solution to alleviate the challenge presented by the scarcity of extensive, well-annotated datasets. However, it continues to be hindered by the lack of diverse, large-scale, real-world 3D scene datasets for source data. To address this shortfall, we propose Generalizable Representation Learning (GRL), where we devi…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.06050  [pdf, other]

    cs.CV

    Generalizable Human Gaussians from Single-View Image

    Authors: Jinnan Chen, Chen Li, Jianfeng Zhang, Hanlin Chen, Buzhen Huang, Gim Hee Lee

    Abstract: In this work, we tackle the task of learning generalizable 3D human Gaussians from a single image. The main challenge for this task is to recover detailed geometry and appearance, especially for the unobserved regions. To this end, we propose single-view generalizable Human Gaussian model (HGM), a diffusion-guided framework for 3D human modeling from a single image. We design a diffusion-based coa…

    Submitted 10 June, 2024; originally announced June 2024.

  4. arXiv:2406.05774  [pdf, other]

    cs.CV

    VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction

    Authors: Hanlin Chen, Fangyin Wei, Chen Li, Tianxin Huang, Yunsong Wang, Gim Hee Lee

    Abstract: Although 3D Gaussian Splatting has been widely studied because of its realistic and efficient novel-view synthesis, it is still challenging to extract a high-quality surface from the point-based representation. Previous works improve the surface by incorporating geometric priors from the off-the-shelf normal estimator. However, there are two main limitations: 1) Supervising normal rendered from 3D…

    Submitted 9 June, 2024; originally announced June 2024.

  5. arXiv:2405.17958  [pdf, other]

    cs.CV

    FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes

    Authors: Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee

    Abstract: Empowering 3D Gaussian Splatting with generalization ability is appealing. However, existing generalizable 3D Gaussian Splatting methods are largely confined to narrow-range interpolation between stereo images due to their heavy backbones, thus lacking the ability to accurately localize 3D Gaussian and support free-view synthesis across wide view range. In this paper, we present a novel framework…

    Submitted 9 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2405.13943  [pdf, other]

    cs.CV

    DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus

    Authors: Yu Chen, Gim Hee Lee

    Abstract: The recent advances in 3D Gaussian Splatting (3DGS) show promising results on the novel view synthesis (NVS) task. With its superior rendering performance and high-fidelity rendering quality, 3DGS outperforms its previous NeRF counterparts. The most recent 3DGS method focuses either on improving the instability of rendering efficiency or reducing the model size. On the other hand, the training…

    Submitted 22 May, 2024; originally announced May 2024.

  7. arXiv:2404.14329  [pdf, other]

    cs.CV

    X-Ray: A Sequential 3D Representation For Generation

    Authors: Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee

    Abstract: We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of x-ray scans. X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images. Our method utilizes ray casting from the camera center to capture geometric and textured details, including depth, normal, and color, across all intersected s…

    Submitted 1 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  8. arXiv:2404.11291  [pdf, other]

    cs.CV

    Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

    Authors: Buzhen Huang, Chen Li, Chongyang Xu, Liang Pan, Yangang Wang, Gim Hee Lee

    Abstract: Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration, but overlook the modeling of close interactions. In this work, we tackle the task of reconstructing closely interactive humans from a monocular video. The main challenge of this task comes from insufficient visual information caused by depth ambiguity and severe inter-person occ…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  9. arXiv:2404.06814  [pdf, other]

    cs.CV

    Zero-shot Point Cloud Completion Via 2D Priors

    Authors: Tianxin Huang, Zhiwen Yan, Yuyang Zhao, Gim Hee Lee

    Abstract: 3D point cloud completion is designed to recover complete shapes from partially observed point clouds. Conventional completion methods typically depend on extensive point cloud data for training, with their effectiveness often constrained to object categories similar to those seen during training. In contrast, we propose a zero-shot framework aimed at completing partially observed point clouds a…

    Submitted 10 April, 2024; originally announced April 2024.

  10. arXiv:2404.02157  [pdf, other]

    cs.CV cs.AI

    Segment Any 3D Object with Language

    Authors: Seungjun Lee, Yuyang Zhao, Gim Hee Lee

    Abstract: In this paper, we investigate Open-Vocabulary 3D Instance Segmentation (OV-3DIS) with free-form language instructions. Earlier works that rely on only annotated base categories for training suffer from limited generalization to unseen novel categories. Recent works mitigate poor generalizability to novel categories by generating class-agnostic masks or projecting generalized masks from 2D to 3D, b…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Project Page: https://cvrp-sole.github.io

  11. arXiv:2404.00931  [pdf, other]

    cs.CV

    GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields

    Authors: Yunsong Wang, Hanlin Chen, Gim Hee Lee

    Abstract: Recent advancements in vision-language foundation models have significantly enhanced open-vocabulary 3D scene understanding. However, the generalizability of existing methods is constrained due to their framework designs and their reliance on 3D data. We address this limitation by introducing Generalizable Open-Vocabulary Neural Semantic Fields (GOV-NeSF), a novel approach offering a generalizable…

    Submitted 1 April, 2024; originally announced April 2024.

  12. arXiv:2404.00874  [pdf, other]

    cs.CV

    DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

    Authors: Jie Long Lee, Chen Li, Gim Hee Lee

    Abstract: We present DiSR-NeRF, a diffusion-guided framework for view-consistent super-resolution (SR) NeRF. Unlike prior works, we circumvent the requirement for high-resolution (HR) reference images by leveraging existing powerful 2D super-resolution models. Nonetheless, independent SR 2D images are often inconsistent across different views. We thus propose Iterative 3D Synchronization (I3DS) to mitigate…

    Submitted 31 March, 2024; originally announced April 2024.

  13. arXiv:2403.11324  [pdf, other]

    cs.CV

    GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

    Authors: Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari

    Abstract: During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue,…

    Submitted 17 March, 2024; originally announced March 2024.

  14. arXiv:2403.10119  [pdf, other]

    cs.CV

    URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields

    Authors: Bo Xu, Ziao Liu, Mengqi Guo, Jiancheng Li, Gim Hee Lee

    Abstract: We propose a novel rolling shutter bundle adjustment method for neural radiance fields (NeRF), which utilizes the unordered rolling shutter (RS) images to obtain the implicit 3D representation. Existing NeRF methods suffer from low-quality images and inaccurate initial camera poses due to the RS effect in the image, whereas the previous method that incorporates the RS into NeRF requires strict se…

    Submitted 24 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  15. arXiv:2401.13146  [pdf, other]

    eess.AS cs.CL cs.SD

    Locality enhanced dynamic biasing and sampling strategies for contextual ASR

    Authors: Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias the ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases is selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the t…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  16. arXiv:2401.12085  [pdf, other]

    eess.AS cs.SD

    Consistency Based Unsupervised Self-training For ASR Personalisation

    Authors: Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, which differ in the user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  17. arXiv:2312.00846  [pdf, other]

    cs.CV

    NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance

    Authors: Hanlin Chen, Chen Li, Gim Hee Lee

    Abstract: Existing neural implicit surface reconstruction methods have achieved impressive performance in multi-view 3D reconstruction by leveraging explicit geometry priors such as depth maps or point clouds as regularization. However, the reconstruction results still lack fine details because of the over-smoothed depth map or sparse point cloud. In this work, we propose a neural implicit surface reconstru…

    Submitted 1 December, 2023; originally announced December 2023.

  18. arXiv:2311.17089  [pdf, other]

    cs.CV

    Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering

    Authors: Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee

    Abstract: 3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite their high rendering quality and speed at high resolutions, both deteriorate drastically when rendered at lower resolutions or from faraway camera positions. During low-resolution or faraway rendering, the pixel size of the image can fall below the Nyquist frequency compared to…

    Submitted 28 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  19. arXiv:2311.16657  [pdf, other]

    cs.CV

    SCALAR-NeRF: SCAlable LARge-scale Neural Radiance Fields for Scene Reconstruction

    Authors: Yu Chen, Gim Hee Lee

    Abstract: In this work, we introduce SCALAR-NeRF, a novel framework tailored for scalable large-scale neural scene reconstruction. We structure the neural representation as an encoder-decoder architecture, where the encoder processes 3D point coordinates to produce encoded features, and the decoder generates geometric values that include volume densities of signed distances and colors. Our approach first tr…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Project Page: https://aibluefisher.github.io/SCALAR-NeRF

  20. arXiv:2311.14603  [pdf, other]

    cs.CV

    Animate124: Animating One Image to 4D Dynamic Scene

    Authors: Yuyang Zhao, Zhiwen Yan, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee

    Abstract: We introduce Animate124 (Animate-one-image-to-4D), the first work to animate a single in-the-wild image into 3D video through textual motion descriptions, an underexplored problem with significant applications. Our 4D generation leverages an advanced 4D grid dynamic Neural Radiance Field (NeRF) model, optimized in three distinct stages using multiple diffusion priors. Initially, a static model is…

    Submitted 18 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Project Page: https://animate124.github.io

  21. arXiv:2311.08151  [pdf, other]

    cs.CV

    Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing

    Authors: Yating Xu, Conghui Hu, Gim Hee Lee

    Abstract: Existing works on weakly-supervised audio-visual video parsing adopt hybrid attention network (HAN) as the multi-modal embedding to capture the cross-modal context. It embeds the audio and visual modalities with a shared network, where the cross-attention is performed at the input. However, such an early fusion method highly entangles the two non-fully correlated modalities and leads to sub-optima…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  22. arXiv:2310.15712  [pdf, other]

    cs.CV

    GNeSF: Generalizable Neural Semantic Fields

    Authors: Hanlin Chen, Chen Li, Mengqi Guo, Zhiwen Yan, Gim Hee Lee

    Abstract: 3D scene segmentation based on neural implicit representation has emerged recently with the advantage of training only on 2D supervision. However, existing approaches still require expensive per-scene optimization that prohibits generalization to novel scenes during inference. To circumvent this problem, we introduce a generalizable 3D segmentation framework based on implicit representation. Spec…

    Submitted 26 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  23. arXiv:2310.07247  [pdf, other]

    cs.CV cs.RO

    Optimizing the Placement of Roadside LiDARs for Autonomous Driving

    Authors: Wentao Jiang, Hao Xiang, Xinyu Cai, Runsheng Xu, Jiaqi Ma, Yikang Li, Gim Hee Lee, Si Liu

    Abstract: Multi-agent cooperative perception is an increasingly popular topic in the field of autonomous driving, where roadside LiDARs play an essential role. However, how to optimize the placement of roadside LiDARs is a crucial but often overlooked problem. This paper proposes an approach to optimize the placement of roadside LiDARs by selecting optimized positions within the scene for better perception…

    Submitted 11 October, 2023; originally announced October 2023.

  24. arXiv:2309.11228  [pdf, other]

    cs.CV

    Towards Robust Few-shot Point Cloud Semantic Segmentation

    Authors: Yating Xu, Na Zhao, Gim Hee Lee

    Abstract: Few-shot point cloud semantic segmentation aims to train a model to quickly adapt to new unseen classes with only a handful of support set samples. However, the noise-free assumption in the support set can be easily violated in many practical real-world settings. In this paper, we focus on improving the robustness of few-shot point cloud segmentation under the detrimental influence of noisy suppor…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: BMVC 2023

  25. arXiv:2309.11222  [pdf, other]

    cs.CV

    Generalized Few-Shot Point Cloud Segmentation Via Geometric Words

    Authors: Yating Xu, Conghui Hu, Na Zhao, Gim Hee Lee

    Abstract: Existing fully-supervised point cloud segmentation methods suffer in the dynamic testing environment with emerging new classes. Few-shot point cloud segmentation algorithms address this problem by learning to adapt to new classes at the sacrifice of segmentation accuracy for the base classes, which severely impedes its practicality. This largely motivates us to present the first attempt at a more…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  26. arXiv:2309.08596  [pdf, other]

    cs.CV cs.GR cs.RO

    Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion

    Authors: Weng Fei Low, Gim Hee Lee

    Abstract: Event cameras offer many advantages over standard cameras due to their distinctive principle of operation: low power, low latency, high temporal resolution and high dynamic range. Nonetheless, the success of many downstream visual applications also hinges on an efficient and effective scene representation, where Neural Radiance Field (NeRF) is seen as the leading candidate. Such promise and potent…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023. Project website is accessible at https://wengflow.github.io/robust-e-nerf

  27. arXiv:2308.16576  [pdf, other]

    cs.CV

    GHuNeRF: Generalizable Human NeRF from a Monocular Video

    Authors: Chen Li, Jiahao Lin, Gim Hee Lee

    Abstract: In this paper, we tackle the challenging task of learning a generalizable human NeRF model from a monocular video. Although existing generalizable human NeRFs have achieved impressive results, they require multi-view images or videos which might not always be available. On the other hand, some works on free-viewpoint rendering of humans from monocular videos cannot be generalized to unseen identitie…

    Submitted 12 December, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: Add in more baseline for comparison

  28. arXiv:2308.10205  [pdf, other]

    cs.CV

    GeT: Generative Target Structure Debiasing for Domain Adaptation

    Authors: Can Zhang, Gim Hee Lee

    Abstract: Domain adaptation (DA) aims to transfer knowledge from a fully labeled source to a scarcely labeled or totally unlabeled target under domain shift. Recently, semi-supervised learning-based (SSL) techniques that leverage pseudo labeling have been increasingly used in DA. Despite the competitive performance, these pseudo labeling methods rely heavily on the source domain to generate pseudo labels fo…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023

  29. arXiv:2308.09386  [pdf, other]

    cs.CV

    DReg-NeRF: Deep Registration for Neural Radiance Fields

    Authors: Yu Chen, Gim Hee Lee

    Abstract: Although Neural Radiance Fields (NeRF) is popular in the computer vision community recently, registering multiple NeRFs has yet to gain much attention. Unlike the existing work, NeRF2NeRF, which is based on traditional optimization methods and needs human annotated keypoints, we propose DReg-NeRF to solve the NeRF registration problem on object-centric scenes without human intervention. After trai…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  30. arXiv:2307.13459  [pdf, other]

    cs.CV

    Weakly-supervised 3D Pose Transfer with Keypoints

    Authors: Jinnan Chen, Chen Li, Gim Hee Lee

    Abstract: The main challenges of 3D pose transfer are: 1) Lack of paired training data with different characters performing the same pose; 2) Disentangling pose and shape information from the target mesh; 3) Difficulty in applying to meshes with different topologies. We thus propose a novel weakly-supervised keypoint-based framework to overcome these difficulties. Specifically, we use a topology-agnostic ke…

    Submitted 17 August, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023, Project page: https://jinnan-chen.github.io/ws3dpt/

  31. arXiv:2307.09112  [pdf, other]

    cs.CV

    NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF

    Authors: Stefan Lionar, Xiangyu Xu, Min Lin, Gim Hee Lee

    Abstract: Remarkable progress has been made in 3D reconstruction from single-view RGB-D inputs. MCC is the current state-of-the-art method in this field, which achieves unprecedented success by combining vision Transformers with large-scale training. However, we identified two key limitations of MCC: 1) The Transformer decoder is inefficient in handling a large number of query points; 2) The 3D representation…

    Submitted 21 November, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023. Project page: https://numcc.github.io/ Code: https://github.com/sail-sg/numcc

  32. arXiv:2305.14831  [pdf, other]

    cs.CV

    OD-NeRF: Efficient Training of On-the-Fly Dynamic Neural Radiance Fields

    Authors: Zhiwen Yan, Chen Li, Gim Hee Lee

    Abstract: Dynamic neural radiance fields (dynamic NeRFs) have demonstrated impressive results in novel view synthesis on 3D dynamic scenes. However, they often require complete video sequences for training followed by novel view synthesis, which is similar to playing back the recording of a dynamic 3D scene. In contrast, we propose OD-NeRF to efficiently train and render dynamic NeRFs on-the-fly which inste…

    Submitted 24 May, 2023; originally announced May 2023.

  33. arXiv:2305.12833  [pdf, other]

    cs.CV

    Boosting Long-tailed Object Detection via Step-wise Learning on Smooth-tail Data

    Authors: Na Dong, Yongqiang Zhang, Mingli Ding, Gim Hee Lee

    Abstract: Real-world data tends to follow a long-tailed distribution, where the class imbalance results in dominance of the head classes during training. In this paper, we propose a frustratingly simple but effective step-wise learning framework to gradually enhance the capability of the model in detecting all categories of long-tailed datasets. Specifically, we build smooth-tail data where the long-tailed…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 10 pages, 5 figures

  34. arXiv:2305.11031  [pdf, other]

    cs.CV

    ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis

    Authors: Shoukang Hu, Kaichen Zhou, Kaiyu Li, Longhui Yu, Lanqing Hong, Tianyang Hu, Zhenguo Li, Gim Hee Lee, Ziwei Liu

    Abstract: Neural Radiance Fields (NeRF) has demonstrated remarkable 3D reconstruction capabilities with dense view images. However, its performance significantly deteriorates under sparse view settings. We observe that learning the 3D consistency of pixels among different views is crucial for improving reconstruction quality in such cases. In this paper, we propose ConsistentNeRF, a method that leverages de…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: https://github.com/skhu101/ConsistentNeRF

  35. arXiv:2305.08850  [pdf, other]

    cs.CV

    Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts

    Authors: Yuyang Zhao, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee

    Abstract: The text-driven image and video diffusion models have achieved unprecedented success in generating realistic and diverse content. Recently, the editing and variation of existing images and videos in diffusion-based generative models have garnered significant attention. However, previous works are limited to editing content with text or providing coarse personalization using a single visual clue, r…

    Submitted 18 February, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Project page: https://make-a-protagonist.github.io

  36. arXiv:2305.00646  [pdf, other]

    cs.CV

    Overcoming the Trade-off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction

    Authors: Ziwei Yu, Chen Li, Linlin Yang, Xiaoxu Zheng, Michael Bi Mi, Gim Hee Lee, Angela Yao

    Abstract: Direct mesh fitting for 3D hand shape reconstruction is highly accurate. However, the reconstructed meshes are prone to artifacts and do not appear as plausible hand shapes. Conversely, parametric models like MANO ensure plausible hand shapes but are not as accurate as the non-parametric methods. In this work, we introduce a novel weakly-supervised hand shape estimation framework that integrates n…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: CVPR 2023

  37. arXiv:2303.15023  [pdf, other]

    cs.CV

    ScarceNet: Animal Pose Estimation with Scarce Annotations

    Authors: Chen Li, Gim Hee Lee

    Abstract: Animal pose estimation is an important but under-explored task due to the lack of labeled data. In this paper, we tackle the task of animal pose estimation with scarce annotations, where only a small set of labeled data and unlabeled images are available. At the core of the solution to this problem setting is the use of the unlabeled data to compensate for the lack of well-labeled animal pose data…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  38. arXiv:2303.14478  [pdf, other]

    cs.CV

    DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields

    Authors: Yu Chen, Gim Hee Lee

    Abstract: Recent works such as BARF and GARF can bundle adjust camera poses with neural radiance fields (NeRF) which is based on coordinate-MLPs. Despite the impressive results, these methods cannot be applied to Generalizable NeRFs (GeNeRFs) which require image feature extractions that are often based on more complicated 3D CNN or transformer architectures. In this work, we first analyze the difficulties o…

    Submitted 25 March, 2023; originally announced March 2023.

  39. arXiv:2303.14435  [pdf, other]

    cs.CV cs.GR

    NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects

    Authors: Zhiwen Yan, Chen Li, Gim Hee Lee

    Abstract: Dynamic Neural Radiance Field (NeRF) is a powerful algorithm capable of rendering photo-realistic novel view images from a monocular RGB video of a dynamic scene. Although it warps moving points across frames from the observation spaces to a common canonical space for rendering, dynamic NeRF does not model the change of the reflected color during the warping. As a result, this approach often fails…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  40. arXiv:2303.11052  [pdf, other]

    cs.CV

    ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-real Novel View Synthesis via Contrastive Learning

    Authors: Hao Yang, Lanqing Hong, Aoxue Li, Tianyang Hu, Zhenguo Li, Gim Hee Lee, Liwei Wang

    Abstract: Although many recent works have investigated generalizable NeRF-based novel view synthesis for unseen scenes, they seldom consider the synthetic-to-real generalization, which is desired in many practical applications. In this work, we first investigate the effects of synthetic data in synthetic-to-real novel view synthesis and surprisingly observe that models trained with synthetic data tend to pr…

    Submitted 22 June, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  41. arXiv:2301.12135  [pdf, other]

    cs.CV cs.DC

    AdaSfM: From Coarse Global to Fine Incremental Adaptive Structure from Motion

    Authors: Yu Chen, Zihao Yu, Shu Song, Tianning Yu, Jianming Li, Gim Hee Lee

    Abstract: Despite the impressive results achieved by many existing Structure from Motion (SfM) approaches, there is still a need to improve the robustness, accuracy, and efficiency on large-scale scenes with many outlier matches and sparse view graphs. In this paper, we propose AdaSfM: a coarse-to-fine adaptive SfM approach that is scalable to large-scale and challenging datasets. Our approach first does a…

    Submitted 28 January, 2023; originally announced January 2023.

    Comments: accepted by ICRA 2023

  42. arXiv:2212.10950  [pdf, other]

    cs.CV

    Incremental Neural Implicit Representation with Uncertainty-Filtered Knowledge Distillation

    Authors: Mengqi Guo, Chen Li, Hanlin Chen, Gim Hee Lee

    Abstract: Recent neural implicit representations (NIRs) have achieved great success in the tasks of 3D reconstruction and novel view synthesis. However, they suffer from the catastrophic forgetting problem when continuously learning from streaming data without revisiting the previously seen data. This limitation prohibits the application of existing NIRs to scenarios where images come in sequentially. In vi…

    Submitted 4 June, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

  43. arXiv:2212.09068  [pdf, other]

    cs.CV

    Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization

    Authors: Yuyang Zhao, Zhun Zhong, Na Zhao, Nicu Sebe, Gim Hee Lee

    Abstract: Domain shift widely exists in the visual world, while modern deep neural networks commonly suffer from severe performance degradation under domain shift due to the poor generalization ability, which limits the real-world applications. The domain shift mainly lies in the limited source environmental variations and the large distribution gap between source and unseen target data. To this end, we pro…

    Submitted 24 November, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: Accepted by IJCV. Journal extension of arXiv:2204.02548. Code is available at https://github.com/HeliosZhao/SHADE-VisualDG

  44. arXiv:2212.04679  [pdf, other]

    cs.CV

    Motion and Context-Aware Audio-Visual Conditioned Video Prediction

    Authors: Yating Xu, Conghui Hu, Gim Hee Lee

    Abstract: The existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames from a multimodal stochastic network and a frame encoder to predict the next visual frame. However, a direct inference of per-pixel intensity for the next visual frame is extremely challenging because of the high-dimensional image space. To this end, we decouple the a…

    Submitted 20 September, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: BMVC 2023

  45. arXiv:2212.04668  [pdf, other]

    cs.CV

    Synthetic-to-Real Domain Generalized Semantic Segmentation for 3D Indoor Point Clouds

    Authors: Yuyang Zhao, Na Zhao, Gim Hee Lee

    Abstract: Semantic segmentation in 3D indoor scenes has achieved remarkable performance under the supervision of large-scale annotated data. However, previous works rely on the assumption that the training and testing data are of the same distribution, which may suffer from performance degradation when evaluated on the out-of-distribution scenes. To alleviate the annotation cost and the performance degradat…

    Submitted 9 December, 2022; originally announced December 2022.

  46. arXiv:2212.02969  [pdf, other]

    cs.CV

    Open World DETR: Transformer based Open World Object Detection

    Authors: Na Dong, Yongqiang Zhang, Mingli Ding, Gim Hee Lee

    Abstract: Open world object detection aims at detecting objects that are absent in the object classes of the training data as unknown objects without explicit supervision. Furthermore, the exact classes of the unknown objects must be identified without catastrophic forgetting of the previous known classes when the corresponding annotations of unknown objects are given incrementally. In this paper, we propos…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 13 pages, 6 figures

  47. arXiv:2207.14782  [pdf, other]

    cs.CV cs.AI cs.GR

    Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion

    Authors: Weng Fei Low, Gim Hee Lee

    Abstract: Explicit neural surface representations allow for exact and efficient extraction of the encoded surface at arbitrary precision, as well as analytic derivation of differential geometric properties such as surface normal and curvature. Such desirable properties, which are absent in its implicit counterpart, make it ideal for various applications in computer vision, graphics and robotics. However, S…

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022. Code is available at https://github.com/low5545/minimal-neural-atlas

  48. arXiv:2207.09721  [pdf, other]

    cs.CV

    Feature Representation Learning for Unsupervised Cross-domain Image Retrieval

    Authors: Conghui Hu, Gim Hee Lee

    Abstract: Current supervised cross-domain image retrieval methods can achieve excellent performance. However, the cost of data collection and labeling imposes an intractable barrier to practical deployment in real applications. In this paper, we investigate the unsupervised cross-domain image retrieval task, where class labels and pairing annotations are no longer a prerequisite for training. This is an ext…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV2022

  49. arXiv:2207.09332  [pdf, other]

    cs.CV

    Rethinking IoU-based Optimization for Single-stage 3D Object Detection

    Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Jianqiang Huang, Xian-Sheng Hua, Min-Jian Zhao, Gim Hee Lee

    Abstract: Since Intersection-over-Union (IoU) based optimization maintains the consistency of the final IoU prediction metric and losses, it has been widely used in both regression and classification branches of single-stage 2D object detectors. Recently, several 3D object detection methods adopt IoU-based optimization and directly replace the 2D IoU with 3D IoU. However, such a direct computation in 3D is…

    Submitted 20 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV2022. The code is available at https://github.com/hlsheng1/RDIoU

  50. arXiv:2207.04892  [pdf, other]

    cs.CV

    Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation

    Authors: Zhun Zhong, Yuyang Zhao, Gim Hee Lee, Nicu Sebe

    Abstract: In this paper, we consider the problem of domain generalization in semantic segmentation, which aims to learn a robust model using only labeled synthetic (source) data. The model is expected to perform well on unseen real (target) domains. Our study finds that the image style variation can largely influence the model's performance and the style features can be well represented by the channel-wise…

    Submitted 12 October, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2022