Showing 1–16 of 16 results for author: Tan, D J

  1. arXiv:2406.09801  [pdf, other]

    cs.CV

    RaNeuS: Ray-adaptive Neural Surface Reconstruction

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: Our objective is to leverage a differentiable radiance field, e.g., NeRF, to reconstruct detailed 3D surfaces in addition to producing the standard novel view renderings. There have been related methods that perform such tasks, usually by utilizing a signed distance field (SDF). However, the state-of-the-art approaches still fail to correctly reconstruct the small-scale details, such as the leaves, ro…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 3DV 2024, oral. In: Proceedings of the IEEE/CVF International Conference on 3D Vision (2023)

  2. arXiv:2311.16241  [pdf, other]

    cs.CV

    SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

    Authors: Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari

    Abstract: In semi-supervised semantic segmentation, a model is trained with a limited number of labeled images along with a large corpus of unlabeled images to reduce the high annotation effort. While previous methods are able to learn good segmentation boundaries, they are prone to confuse classes with similar visual appearance due to the limited supervision. On the other hand, vision-language models (VLMs…

    Submitted 27 November, 2023; originally announced November 2023.

  3. arXiv:2309.15487  [pdf, other]

    cs.CV

    Tackling VQA with Pretrained Foundation Models without Further Training

    Authors: Alvin De Jun Tan, Bingquan Shen

    Abstract: Large language models (LLMs) have achieved state-of-the-art results in many natural language processing tasks. They have also demonstrated an ability to adapt well to different tasks through zero-shot or few-shot settings. With the capability of these LLMs, researchers have looked into how to adopt them for use with Visual Question Answering (VQA). Many methods require further training to align the i…

    Submitted 27 September, 2023; originally announced September 2023.

  4. arXiv:2309.15486  [pdf, other]

    cs.CV

    Transferability of Representations Learned using Supervised Contrastive Learning Trained on a Multi-Domain Dataset

    Authors: Alvin De Jun Tan, Clement Tan, Chai Kiat Yeo

    Abstract: Contrastive learning has been shown to learn better-quality representations than models trained using cross-entropy loss. These representations also transfer better to downstream datasets from different domains. However, little work has been done to explore the transferability of representations learned using contrastive learning when trained on a multi-domain dataset. In this paper, a study has been conducted using th…

    Submitted 27 September, 2023; originally announced September 2023.

  5. arXiv:2303.17408  [pdf, other]

    cs.CL

    P-Transformer: A Prompt-based Multimodal Transformer Architecture For Medical Tabular Data

    Authors: Yucheng Ruan, Xiang Lan, Daniel J. Tan, Hairil Rizal Abdullah, Mengling Feng

    Abstract: Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, problems remain for existing work to be effectively adapted to the medical domain, such as under-utilization…

    Submitted 9 January, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

  6. arXiv:2211.11674  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion

    Authors: Dario Pavllo, David Joseph Tan, Marie-Julie Rakotosaona, Federico Tombari

    Abstract: Neural Radiance Fields (NeRF) coupled with GANs represent a promising direction in the area of 3D reconstruction from a single view, owing to their ability to efficiently model arbitrary topologies. Recent work in this area, however, has mostly focused on synthetic datasets where exact ground-truth poses are known, and has overlooked pose estimation, which is important for certain downstream appli…

    Submitted 20 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Code and models are available at https://github.com/google-research/nerf-from-image

  7. SoftPool++: An Encoder-Decoder Network for Point Cloud Completion

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a novel convolutional operator for the task of point cloud completion. One striking characteristic of our approach is that, unlike related work, it does not require any max-pooling or voxelization operation. Instead, the proposed operator used to learn the point cloud embedding in the encoder extracts permutation-invariant features from the point cloud via a soft-pooling of featur…

    Submitted 8 May, 2022; originally announced May 2022.

    Comments: Accepted in International Journal of Computer Vision

    Journal ref: Int J Comput Vis 130, 1145-1164 (2022)

  8. arXiv:2203.16600  [pdf, other]

    cs.CV cs.AI

    Learning Local Displacements for Point Cloud Completion

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a novel approach aimed at object and semantic scene completion from a partial scan represented as a 3D point cloud. Our architecture relies on three novel layers that are used successively within an encoder-decoder structure and specifically developed for the task at hand. The first one carries out feature extraction by matching the point features to a set of pre-trained local descripto…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  9. arXiv:2201.05675  [pdf, other]

    cs.CV cs.LG

    Transformers in Action: Weakly Supervised Action Segmentation

    Authors: John Ridley, Huseyin Coskun, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: The video action segmentation task is regularly explored under weaker forms of supervision, such as transcript supervision, where a list of actions is easier to obtain than dense frame-wise labels. In this formulation, the task presents various challenges for sequence modeling approaches due to the emphasis on action transition points, long sequence lengths, and frame contextualization, making the…

    Submitted 20 January, 2022; v1 submitted 14 January, 2022; originally announced January 2022.

    Comments: Under Review

  10. arXiv:2112.13384  [pdf, other]

    cs.LG cs.MM cs.SI

    Will You Dance To The Challenge? Predicting User Participation of TikTok Challenges

    Authors: Lynnette Hui Xian Ng, John Yeh Han Tan, Darryl Jing Heng Tan, Roy Ka-Wei Lee

    Abstract: TikTok is a popular new social media, where users express themselves through short video clips. A common form of interaction on the platform is participating in "challenges", which are songs and dances for users to iterate upon. Challenge contagion can be measured through replication reach, i.e., users uploading videos of their participation in the challenges. The uniqueness of the TikTok platform…

    Submitted 26 December, 2021; originally announced December 2021.

    Comments: Accepted at ASONAM 2021

  11. arXiv:2011.08534  [pdf, other]

    cs.CV

    A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views

    Authors: Riccardo Spezialetti, David Joseph Tan, Alessio Tonioni, Keisuke Tateno, Federico Tombari

    Abstract: Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning. Most approaches regress the full object shape in a canonical pose, possibly extrapolating the occluded parts based on the learned priors. However, their viewpoint invariant technique often discards the unique structures visible from the input imag…

    Submitted 18 November, 2020; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted to 3DV 2020 as oral

  12. arXiv:2008.07358  [pdf, other]

    cs.CV eess.IV

    SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: Point clouds are often the default choice for many applications as they exhibit more flexibility and efficiency than volumetric data. Nevertheless, their unorganized nature -- points are stored in an unordered way -- makes them less suited to be processed by deep learning pipelines. In this paper, we propose a method for 3D object completion and classification based on point clouds. We introduce a…

    Submitted 17 August, 2020; originally announced August 2020.

    Comments: Accepted in ECCV 2020 as oral

  13. arXiv:1909.01106  [pdf, other]

    cs.CV cs.AI cs.CG cs.LG eess.IV

    ForkNet: Multi-branch Volumetric Semantic Completion from a Single Depth Image

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a novel model for 3D semantic completion from a single depth image, based on a single encoder and three separate generators used to reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space. To transfer information between the geometric and semantic branches of the network, we introduce paths between them concaten…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: Accepted in International Conference on Computer Vision 2019

  14. Adversarial Semantic Scene Completion from a Single Depth Image

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a method to reconstruct, complete and semantically label a 3D scene from a single input depth image. We improve the accuracy of the regressed semantic 3D maps by a novel architecture based on adversarial learning. In particular, we suggest using multiple adversarial loss terms that not only enforce realistic outputs with respect to the ground truth, but also an effective embedding of th…

    Submitted 25 October, 2018; originally announced October 2018.

    Comments: 2018 International Conference on 3D Vision (3DV)

    Journal ref: 2018 International Conference on 3D Vision (3DV), Verona, Italy, 2018, pp. 426-434

  15. arXiv:1807.11176  [pdf, other]

    cs.CV

    Human Motion Analysis with Deep Metric Learning

    Authors: Huseyin Coskun, David Joseph Tan, Sailesh Conjeti, Nassir Navab, Federico Tombari

    Abstract: Effectively measuring the similarity between two human motions is necessary for several computer vision tasks such as gait analysis, person identification and action retrieval. Nevertheless, we believe that traditional approaches such as L2 distance or Dynamic Time Warping based on hand-crafted local pose metrics fail to appropriately capture the semantic relationship across motions and, as such…

    Submitted 5 August, 2018; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: To appear in ECCV 2018

  16. arXiv:1709.01459  [pdf, other]

    cs.CV

    6D Object Pose Estimation with Depth Images: A Seamless Approach for Robotic Interaction and Augmented Reality

    Authors: David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: To determine the 3D orientation and 3D location of objects in the surroundings of a camera mounted on a robot or mobile device, we developed two powerful algorithms in object detection and temporal tracking that are combined seamlessly for robotic perception and interaction as well as Augmented Reality (AR). A separate evaluation of, respectively, the object detection and the temporal tracker demo…

    Submitted 5 September, 2017; originally announced September 2017.