
Showing 1–28 of 28 results for author: Cham, T

Searching in archive cs.
  1. arXiv:2403.14627 [pdf, other]

    cs.CV

    MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

    Authors: Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai

    Abstract: We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian prim…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://donydchen.github.io/mvsplat Code: https://github.com/donydchen/mvsplat

  2. arXiv:2403.14619 [pdf, other]

    cs.CV

    ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition

    Authors: Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham, Qianyi Wu

    Abstract: 3D decomposition/segmentation still remains a challenge as large-scale 3D annotated data is not readily available. Contemporary approaches typically leverage 2D machine-generated segments, integrating them for 3D consistency. While the majority of these methods are based on NeRFs, they face a potential weakness that the instance/semantic embedding features derive from independent MLPs, thus preven…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project Page: https://sm0kywu.github.io/ClusteringSDF/

  3. arXiv:2402.19159 [pdf, other]

    cs.CV

    Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping

    Authors: Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, Tat-Jen Cham

    Abstract: Latent Consistency Model (LCM) extends the Consistency Model to the latent space and leverages the guided consistency distillation technique to achieve impressive performance in accelerating text-to-image synthesis. However, we observed that LCM struggles to generate images with both clarity and detailed intricacy. Consequently, we introduce Trajectory Consistency Distillation (TCD), which encompa…

    Submitted 15 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Project Page: https://mhh0318.github.io/tcd

  4. arXiv:2311.15744 [pdf, other]

    cs.CV

    One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls

    Authors: Minghui Hu, Jianbin Zheng, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham

    Abstract: It is well known that many open-released foundational diffusion models have difficulty in generating images that substantially depart from average brightness, despite such images being present in the training data. This is due to an inconsistency: while denoising starts from pure Gaussian noise during inference, the training noise schedule retains residual data even in the final timestep distribut…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Project Page: https://jabir-zheng.github.io/OneMoreStep/, Demo Page: https://huggingface.co/spaces/h1t/oms_sdxl_lcm

  5. arXiv:2307.03177 [pdf, other]

    cs.CV

    PanoDiffusion: 360-degree Panorama Outpainting via Diffusion

    Authors: Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham

    Abstract: Generating complete 360-degree panoramas from narrow field of view images is ongoing research as omnidirectional RGB data is not readily available. Existing GAN-based approaches face some barriers to achieving higher quality output, and have poor generalization performance over different mask types. In this paper, we present our 360-degree indoor RGB-D panorama outpainting model using latent diffu…

    Submitted 20 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Project Page: https://sm0kywu.github.io/panodiffusion/

  6. arXiv:2306.00964 [pdf, other]

    cs.CV cs.LG

    Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

    Authors: Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham

    Abstract: Text-conditional diffusion models are able to generate high-fidelity images with diverse contents. However, linguistic representations frequently exhibit ambiguous descriptions of the envisioned objective imagery, requiring the incorporation of additional control signals to bolster the efficacy of text-guided diffusion models. In this work, we propose Cocktail, a pipeline to mix various modalities…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Project Page: https://mhh0318.github.io/cocktail/

  7. arXiv:2304.12294 [pdf, other]

    cs.CV

    Explicit Correspondence Matching for Generalizable Neural Radiance Fields

    Authors: Yuedong Chen, Haofei Xu, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

    Abstract: We present a new generalizable NeRF method that is able to directly generalize to new unseen scenarios and perform novel view synthesis with as few as two source views. The key to our approach lies in the explicitly modeled correspondence matching information, so as to provide the geometry prior to the prediction of NeRF color and density for volume rendering. The explicit correspondence matching…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: Code and pre-trained models: https://github.com/donydchen/matchnerf Project Page: https://donydchen.github.io/matchnerf/

  8. arXiv:2303.13817 [pdf, other]

    cs.CV

    ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field

    Authors: Zhe Jun Tang, Tat-Jen Cham, Haiyu Zhao

    Abstract: Neural Radiance Field (NeRF) is a popular method in representing 3D scenes by optimising a continuous volumetric scene function. Its large success, which lies in applying volumetric rendering (VR), is also its Achilles' heel in producing view-dependent effects. As a consequence, glossy and transparent surfaces often appear murky. A remedy to reduce these artefacts is to constrain this VR equation by…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2023

    Journal ref: IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2023

  9. arXiv:2211.14842 [pdf, other]

    cs.CV

    Unified Discrete Diffusion for Simultaneous Vision-Language Generation

    Authors: Minghui Hu, Chuanxia Zheng, Heliang Zheng, Tat-Jen Cham, Chaoyue Wang, Zuopeng Yang, Dacheng Tao, Ponnuthurai N. Suganthan

    Abstract: The recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling the multi-modality signals. In this work, we harness these traits and present a unified multimodal generation model that can conduct both the "modality translation" and "multi-modality generation" tasks using a single model, performing text-based, image-…

    Submitted 27 November, 2022; originally announced November 2022.

  10. arXiv:2207.06235 [pdf, other]

    cs.CV

    Entry-Flipped Transformer for Inference and Prediction of Participant Behavior

    Authors: Bo Hu, Tat-Jen Cham

    Abstract: Some group activities, such as team sports and choreographed dances, involve closely coupled interaction between participants. Here we investigate the tasks of inferring and predicting participant behavior, in terms of motion paths and actions, under such conditions. We narrow the problem to that of estimating how a set of target participants react to the behavior of other observed participants. Our…

    Submitted 14 July, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: Accepted in ECCV 2022

  11. arXiv:2204.01931 [pdf]

    cs.CV

    High-Quality Pluralistic Image Completion via Code Shared VQGAN

    Authors: Chuanxia Zheng, Guoxian Song, Tat-Jen Cham, Jianfei Cai, Dinh Phung, Linjie Luo

    Abstract: PICNet pioneered the generation of multiple and diverse results for image completion task, but it required a careful balance between $\mathcal{KL}$ loss (diversity) and reconstruction loss (quality), resulting in a limited diversity and quality. Separately, iGPT-based architecture has been employed to infer distributions in a discrete space derived from a pixel-level pre-clustered palette, which…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: 12 pages, 15 figures

  12. Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

    Authors: Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

    Abstract: Image translation and manipulation have gained increasing attention along with the rapid development of deep generative models. Although existing approaches have brought impressive results, they mainly operated in 2D space. In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene modelled by NeRF,…

    Submitted 20 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: ECCV2022, Code: https://github.com/donydchen/sem2nerf Project Page: https://donydchen.github.io/sem2nerf/

  13. arXiv:2112.01799 [pdf, other]

    cs.CV cs.LG

    Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation

    Authors: Minghui Hu, Yujie Wang, Tat-Jen Cham, Jianfei Yang, P. N. Suganthan

    Abstract: The integration of Vector Quantised Variational AutoEncoder (VQ-VAE) with autoregressive models as the generation part has yielded high-quality results on image generation. However, the autoregressive models will strictly follow the progressive scanning order during the sampling phase. This leads the existing VQ series models to hardly escape the trap of lacking global information. Denoising Diffusion…

    Submitted 3 December, 2021; originally announced December 2021.

  14. arXiv:2107.12096 [pdf, other]

    cs.CV

    Towards Unbiased Visual Emotion Recognition via Causal Intervention

    Authors: Yuedong Chen, Xu Yang, Tat-Jen Cham, Jianfei Cai

    Abstract: Although much progress has been made in visual emotion recognition, researchers have realized that modern deep networks tend to exploit dataset characteristics to learn spurious statistical associations between the input and the target. Such dataset characteristics are usually treated as dataset bias, which damages the robustness and generalization performance of these recognition systems. In this…

    Submitted 20 July, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: Accepted to ACM Multimedia 2022, code is available at https://github.com/donydchen/causal_emotion

  15. Half-body Portrait Relighting with Overcomplete Lighting Representation

    Authors: Guoxian Song, Tat-Jen Cham, Jianfei Cai, Jianmin Zheng

    Abstract: We present a neural-based model for relighting a half-body portrait image by simply referring to another portrait image with the desired lighting condition. Rather than following classical inverse rendering methodology that involves estimating normals, albedo and environment maps, we implicitly encode the subject and lighting in a latent space, and use these latent codes to generate relighted imag…

    Submitted 25 June, 2021; originally announced June 2021.

  16. arXiv:2104.05367 [pdf, other]

    cs.CV

    Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition

    Authors: Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, Jianfei Cai

    Abstract: Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real world. Concurrently, image completion has aimed to create plausible appearance for the invisible regions, but requires a manual mask as input. In this work, we propose a higher-level scene understanding system to tackle both visible and invis…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: 20 pages, 16 pages

  17. arXiv:2104.00854 [pdf, other]

    cs.CV

    The Spatially-Correlative Loss for Various Image Translation Tasks

    Authors: Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

    Abstract: We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation. Previous methods attempt this by using pixel-level cycle-consistency or feature-level matching losses, but the domain-specific nature of these losses hinders translation across…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: 14 pages, 12 figures

  18. arXiv:2104.00845 [pdf, other]

    cs.CV

    Bridging Global Context Interactions for High-Fidelity Image Completion

    Authors: Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, Dinh Phung

    Abstract: Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a…

    Submitted 22 November, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  19. arXiv:2003.03055 [pdf, other]

    cs.CV

    GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition

    Authors: Yuedong Chen, Guoxian Song, Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Jianming Zheng

    Abstract: Automatic facial action unit (AU) recognition has attracted great attention but still remains a challenging task, as subtle changes of local facial muscles are difficult to thoroughly capture. Most existing AU recognition approaches leverage geometry information in a straightforward 2D or 3D manner, which either ignore 3D manifold information or suffer from high computational costs. In this paper,…

    Submitted 6 March, 2020; originally announced March 2020.

    Comments: 16 pages, 3 figures

  20. arXiv:1911.11999 [pdf, other]

    cs.CV cs.GR

    Recovering Facial Reflectance and Geometry from Multi-view Images

    Authors: Guoxian Song, Jianmin Zheng, Jianfei Cai, Tat-Jen Cham

    Abstract: While the problem of estimating shapes and diffuse reflectances of human faces from images has been extensively studied, there is relatively less work done on recovering the specular albedo. This paper presents a lightweight solution for inferring photorealistic facial reflectance and geometry. Our system processes video streams from two views of a subject, and outputs two reflectance maps for dif…

    Submitted 27 November, 2019; originally announced November 2019.

  21. arXiv:1905.02114 [pdf, other]

    cs.CV

    Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

    Authors: Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

    Abstract: In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in the unconstrained scenarios with heavy occlusions and arbitrary facial expression variations. Specifically, we introduce a statistical 3D morphable model that flexibly describes the distribution of points on the surface of the face model, with an efficient swit…

    Submitted 6 May, 2019; originally announced May 2019.

  22. Unconstrained Facial Action Unit Detection via Latent Feature Domain

    Authors: Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Xuequan Lu, Lizhuang Ma

    Abstract: Facial action unit (AU) detection in the wild is a challenging problem, due to the unconstrained variability in facial appearances and the lack of accurate annotations. Most existing methods depend on either impractical labor-intensive labeling or inaccurate pseudo labels. In this paper, we propose an end-to-end unconstrained facial AU detection framework based on domain adaptation, which transfer…

    Submitted 20 June, 2021; v1 submitted 25 March, 2019; originally announced March 2019.

    Comments: This paper has been accepted by IEEE Transactions on Affective Computing

  23. arXiv:1903.04227 [pdf, other]

    cs.CV

    Pluralistic Image Completion

    Authors: Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

    Abstract: Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion -- the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that usually only one ground truth training i…

    Submitted 5 April, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: 21 pages, 16 figures

  24. arXiv:1903.00304 [pdf, other]

    cs.CV

    Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos

    Authors: Bo Hu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan

    Abstract: Previous spatial-temporal action localization methods commonly follow the pipeline of object detection to estimate bounding boxes and labels of actions. However, the temporal relation of an action has not been fully explored. In this paper, we propose an end-to-end Progress Regression Recurrent Neural Network (PR-RNN) for online spatial-temporal action localization, which learns to infer the actio…

    Submitted 1 March, 2019; originally announced March 2019.

    Comments: 11 pages, 5 figures

  25. Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset

    Authors: Guoxian Song, Jianfei Cai, Tat-Jen Cham, Jianmin Zheng, Juyong Zhang, Henry Fuchs

    Abstract: Teleconference or telepresence based on virtual reality (VR) head-mounted display (HMD) device is a very interesting and promising application since HMD can provide immersive feelings for users. However, in order to facilitate face-to-face communications for HMD users, real-time 3D facial performance capture of a person wearing HMD is needed, which is a very challenging task due to the large occlusio…

    Submitted 20 January, 2019; originally announced January 2019.

    Comments: ACM Multimedia Conference 2018

  26. arXiv:1808.01454 [pdf, other]

    cs.CV

    T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks

    Authors: Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

    Abstract: Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire. We propose a framework, trained on synthetic image-depth pairs and unpaired real images, that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network. A key idea is having the first network a…

    Submitted 4 August, 2018; originally announced August 2018.

    Comments: 15 pages, 8 figures

  27. Conditional Adversarial Synthesis of 3D Facial Action Units

    Authors: Zhilei Liu, Guoxian Song, Jianfei Cai, Tat-Jen Cham, Juyong Zhang

    Abstract: Employing deep learning-based approaches for fine-grained facial expression analysis, such as those involving the estimation of Action Unit (AU) intensities, is difficult due to the lack of a large-scale dataset of real faces with sufficiently diverse AU labels for training. In this paper, we consider how AU-level facial image synthesis can be used to substantially augment such a dataset. We propo…

    Submitted 14 March, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

    Journal ref: Neurocomputing 355 (2019) 200-208

  28. arXiv:1507.02779 [pdf, other]

    cs.CV

    Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes

    Authors: Hai X. Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic, Jianfei Cai, Tat-Jen Cham

    Abstract: We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user. In particular, we emphasize improving the tracking performance in instances where the tracked subject is at a large distance from the cameras, and the quality of point cloud deteriorates severely. Th…

    Submitted 10 July, 2015; originally announced July 2015.

    Comments: 10 pages, 8 figures, 4 tables