
Showing 1–50 of 503 results for author: Van Gool, L

Searching in archive cs.
  1. arXiv:2406.19811 [pdf, ps, other]

    cs.CV

    EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting

    Authors: Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang

    Abstract: Human activities are inherently complex, and even simple household tasks involve numerous object interactions. To better understand these activities and behaviors, it is crucial to model their dynamic interactions with the environment. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand dynamic human-object inter…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.17438 [pdf, other]

    cs.CV

    Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes

    Authors: Qi Ma, Danda Pani Paudel, Ender Konukoglu, Luc Van Gool

    Abstract: Neural implicit functions have demonstrated significant importance in various areas such as computer vision, graphics. Their advantages include the ability to represent complex shapes and scenes with high fidelity, smooth interpolation capabilities, and continuous representations. Despite these benefits, the development and analysis of implicit functions have been limited by the lack of comprehens…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.10898 [pdf, other]

    cs.RO cs.CV

    TrafficBots V1.5: Traffic Simulation via Conditional VAEs and Transformers with Relative Pose Encoding

    Authors: Zhejun Zhang, Christos Sakaridis, Luc Van Gool

    Abstract: In this technical report we present TrafficBots V1.5, a baseline method for the closed-loop simulation of traffic agents. TrafficBots V1.5 achieves baseline-level performance and a 3rd place ranking in the Waymo Open Sim Agents Challenge (WOSAC) 2024. It is a simple baseline that combines TrafficBots, a CVAE-based multi-agent policy conditioned on each agent's individual destination and personalit…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: A Technical Report for Waymo Open Sim Agents Challenge and CVPR 2024 Workshop on Autonomous Driving

  4. arXiv:2406.04221 [pdf, other]

    cs.CV

    Matching Anything by Segmenting Anything

    Authors: Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu

    Abstract: The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits the cross-domain generalization of learned similarity embeddings. We propose MASA, a novel method for robust instance association learning, capable of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Highlight. Code at: https://github.com/siyuanliii/masa

  5. arXiv:2405.20008 [pdf, other]

    cs.CV

    Sharing Key Semantics in Transformer Makes Efficient Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Ming-Hsuan Yang, Nicu Sebe

    Abstract: Image Restoration (IR), a classic low-level vision task, has witnessed significant advancements through deep models that effectively model global information. Notably, the Vision Transformers (ViTs) emergence has further propelled these advancements. When computing, the self-attention mechanism, a cornerstone of ViTs, tends to encompass all global cues, even those from semantically unrelated objec…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 9 pages

  6. arXiv:2405.17773 [pdf, other]

    cs.CV

    Towards a Generalist and Blind RGB-X Tracker

    Authors: Yuedong Tan, Zongwei Wu, Yuqian Fu, Zhuyun Zhou, Guolei Sun, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte

    Abstract: With the emergence of a single large model capable of successfully solving a multitude of tasks in NLP, there has been growing research interest in achieving similar goals in computer vision. On the one hand, most of these generic models, referred to as generalist vision models, aim at producing unified outputs serving different tasks. On the other hand, some existing models aim to combine differe…

    Submitted 27 May, 2024; originally announced May 2024.

  7. arXiv:2405.16544 [pdf, other]

    cs.CV

    Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians

    Authors: Erik Sandström, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Luc Van Gool, Martin R. Oswald, Federico Tombari

    Abstract: 3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous Localization and Mapping (SLAM), as it provides a compact dense map representation while enabling efficient and high-quality map rendering. However, existing methods show significantly worse reconstruction quality than competing methods using other 3D representations, e.g. neur…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 21 pages

  8. arXiv:2405.04662 [pdf, other]

    cs.CV

    Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar

    Authors: David Borts, Erich Liang, Tim Brödermann, Andrea Ramazzina, Stefanie Walz, Edoardo Palladin, Jipeng Sun, David Bruggemann, Christos Sakaridis, Luc Van Gool, Mario Bijelic, Felix Heide

    Abstract: Neural fields have been broadly investigated as scene representations for the reproduction and novel generation of diverse outdoor scenes, including those autonomous vehicles and robots must handle. While successful approaches for RGB and LiDAR data exist, neural reconstruction methods for radar as a sensing modality have been largely unexplored. Operating at millimeter wavelengths, radar sensors…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures, to be published in SIGGRAPH 2024

  9. arXiv:2404.05603 [pdf, other]

    cs.CV cs.AI

    Self-Explainable Affordance Learning with Embodied Caption

    Authors: Zhipeng Zhang, Zhimin Wei, Guolei Sun, Peng Wang, Luc Van Gool

    Abstract: In the field of visual affordance learning, previous methods mainly used abundant images or videos that delineate human behavior patterns to identify action possibility regions for object manipulation, with a variety of applications in robotic tasks. However, they encounter a main challenge of action ambiguity, illustrated by the vagueness like whether to beat or carry a drum, and the complexities…

    Submitted 8 April, 2024; originally announced April 2024.

  10. arXiv:2404.05519 [pdf, other]

    cs.CV cs.LG

    Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models

    Authors: Saman Motamed, Wouter Van Gansbeke, Luc Van Gool

    Abstract: With recent advances in image and video diffusion models for content creation, a plethora of techniques have been proposed for customizing their generated content. In particular, manipulating the cross-attention layers of Text-to-Image (T2I) diffusion models has shown great promise in controlling the shape and location of objects in the scene. Transferring image-editing techniques to the video dom…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Generative Models for Computer Vision, CVPR 2024 Workshop

  11. arXiv:2404.04617 [pdf, other]

    cs.CV

    Empowering Image Recovery: A Multi-Attention Approach

    Authors: Juan Wen, Yawei Li, Chao Zhang, Weiyan Hou, Radu Timofte, Luc Van Gool

    Abstract: We propose Diverse Restormer (DART), a novel image restoration method that effectively integrates information from various sources (long sequences, local and global regions, feature dimensions, and positional dimensions) to address restoration challenges. While Transformer models have demonstrated excellent performance in image restoration due to their self-attention mechanism, they face limitatio…

    Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 12 pages, 10 figures, 12 tables

    MSC Class: 68T07 (Primary) 68T45 (Secondary); ACM Class: I.4.4

  12. arXiv:2404.03799 [pdf, other]

    cs.CV cs.AI

    Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation

    Authors: Elham Amin Mansour, Ozan Unal, Suman Saha, Benjamin Bejar, Luc Van Gool

    Abstract: The increasing relevance of panoptic segmentation is tied to the advancements in autonomous driving and AR/VR applications. However, the deployment of such models has been limited due to the expensive nature of dense data annotation, giving rise to unsupervised domain adaptation (UDA). A key challenge in panoptic UDA is reducing the domain gap between a labeled source and an unlabeled target domai…

    Submitted 4 April, 2024; originally announced April 2024.

  13. arXiv:2404.03658 [pdf, other]

    cs.CV

    Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

    Authors: Rui Li, Tobias Fischer, Mattia Segu, Marc Pollefeys, Luc Van Gool, Federico Tombari

    Abstract: Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page: https://ruili3.github.io/kyn

  14. arXiv:2404.03159 [pdf, other]

    cs.CV

    HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

    Authors: Wencan Cheng, Hao Tang, Luc Van Gool, Jong Hwan Ko

    Abstract: Extracting keypoint locations from input hand frames, known as 3D hand pose estimation, is a critical task in various human-computer interaction applications. Essentially, the 3D hand pose estimation can be regarded as a 3D point subset generative problem conditioned on input frames. Thanks to the recent significant progress on diffusion-based generative models, hand pose estimation can also benef…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted as a conference paper to the Conference on Computer Vision and Pattern Recognition (2024)

  15. arXiv:2404.02838 [pdf, other]

    cs.AI

    I-Design: Personalized LLM Interior Designer

    Authors: Ata Çelen, Guo Han, Konrad Schindler, Luc Van Gool, Iro Armeni, Anton Obukhov, Xi Wang

    Abstract: Interior design allows us to be who we are and live how we want - each design is as unique as our distinct personality. However, it is not trivial for non-professionals to express and materialize this since it requires aligning functional and visual expectations with the constraints of physical space; this renders interior design a luxury. To make it more accessible, we present I-Design, a persona…

    Submitted 3 April, 2024; originally announced April 2024.

  16. arXiv:2404.01243 [pdf, other]

    cs.CV

    A Unified and Interpretable Emotion Representation and Expression Generation

    Authors: Reni Paskaleva, Mykyta Holubakha, Andela Ilic, Saman Motamed, Luc Van Gool, Danda Paudel

    Abstract: Canonical emotions, such as happy, sad, and fearful, are easy to understand and annotate. However, emotions are often compound, e.g. happily surprised, and can be mapped to the action units (AUs) used for expressing emotions, and trivially to the canonical ones. Intuitively, emotions are continuous as represented by the arousal-valence (AV) model. An interpretable unification of these four modalit…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 10 pages, 9 figures, 3 tables. Accepted at CVPR 2024. Project page: https://emotion-diffusion.github.io

  17. arXiv:2403.19549 [pdf, other]

    cs.CV cs.RO

    GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM

    Authors: Ganlin Zhang, Erik Sandström, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald

    Abstract: Recent advancements in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly utilized grid-based neural implicit encodings and/or struggle to efficiently realize global map and pose consistency. To this end, we propose an efficient RGB-only dense SLAM system using a flexible neural point cloud scene representation that adapts to keyframe poses and depth updates, without ne…

    Submitted 27 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  18. arXiv:2403.18913 [pdf, other]

    cs.CV

    UniDepth: Universal Monocular Metric Depth Estimation

    Authors: Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, Fisher Yu

    Abstract: Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepth, capable o…

    Submitted 27 March, 2024; originally announced March 2024.

  19. arXiv:2403.16161 [pdf, other]

    cs.CV

    Towards Online Real-Time Memory-based Video Inpainting Transformers

    Authors: Guillaume Thiry, Hao Tang, Radu Timofte, Luc Van Gool

    Abstract: Video inpainting tasks have seen significant improvements in recent years with the rise of deep neural networks and, in particular, vision transformers. Although these models show promising reconstruction quality and temporal consistency, they are still unsuitable for live videos, one of the last steps to make them completely convincing and usable. The main limitations are that these state-of-the-…

    Submitted 24 March, 2024; originally announced March 2024.

  20. arXiv:2403.06904 [pdf, other]

    cs.CV

    FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

    Authors: Muhammad Saif Ullah Khan, Muhammad Ferjad Naeem, Federico Tombari, Luc Van Gool, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: We propose FocusCLIP, integrating subject-level guidance--a specialized mechanism for target-specific supervision--into the CLIP framework for improved zero-shot transfer on human-centric tasks. Our novel contributions enhance CLIP on both the vision and text sides. On the vision side, we incorporate ROI heatmaps emulating human visual attention mechanisms to emphasize subject-relevant image regio…

    Submitted 25 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  21. arXiv:2403.00592 [pdf, other]

    cs.CV

    Rethinking Few-shot 3D Point Cloud Semantic Segmentation

    Authors: Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc Van Gool, Serge Belongie

    Abstract: This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS), with a focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution. The former arises from non-uniform point sampling, allowing models to distinguish the density disparities between foreground and background for easier segmentation. The latter results from sampling only 2,048 p…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  22. arXiv:2402.09944 [pdf, other]

    cs.CV

    Loopy-SLAM: Dense Neural SLAM with Loop Closures

    Authors: Lorenzo Liso, Erik Sandström, Vladimir Yugay, Luc Van Gool, Martin R. Oswald

    Abstract: Neural RGBD SLAM techniques have shown promise in dense Simultaneous Localization And Mapping (SLAM), yet face challenges such as error accumulation during camera tracking resulting in distorted maps. In response, we introduce Loopy-SLAM that globally optimizes poses and the dense 3D model. We use frame-to-model tracking using a data-driven point-based submap generation method and trigger loop clo…

    Submitted 10 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  23. arXiv:2402.03094 [pdf, other]

    cs.CV cs.LG

    Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

    Authors: Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, Xingqun Jiang

    Abstract: This paper studies the challenging cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors, such as DE-ViT, show promise in traditional few-shot object detection, their generalization to CD-FSOD remains unclear: 1) can such open-set detection methods easily generalize…

    Submitted 19 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  24. arXiv:2402.02634 [pdf, other]

    cs.CV cs.LG eess.IV

    Key-Graph Transformer for Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 9 pages, 6 figures

  25. arXiv:2402.02235 [pdf, other]

    cs.CV

    Image Fusion via Vision-Language Model

    Authors: Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc Van Gool

    Abstract: Image fusion integrates essential information from multiple source images into a single composite, emphasizing the highlighting structure and textures, and refining imperfect areas. Existing methods predominantly focus on pixel-level and semantic visual features for recognition. However, they insufficiently explore the deeper semantic information at a text-level beyond vision. Therefore, we introd…

    Submitted 3 February, 2024; originally announced February 2024.

  26. arXiv:2401.15261 [pdf, other]

    cs.CV

    Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

    Authors: Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc Van Gool

    Abstract: The estimation of implicit cross-frame correspondences and the high computational cost have long been major challenges in video semantic segmentation (VSS) for driving scenes. Prior works utilize keyframes, feature propagation, or cross-frame attention to address these issues. By contrast, we are the first to harness vanishing point (VP) priors for more effective segmentation. Intuitively, objects…

    Submitted 25 April, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: CVPR 2024 highlight

  27. arXiv:2401.12761 [pdf, other]

    cs.CV

    MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty

    Authors: Tim Brödermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc Van Gool

    Abstract: Achieving level-5 driving automation in autonomous vehicles necessitates a robust semantic visual perception system capable of parsing data from different sensors across diverse conditions. However, existing semantic perception datasets often lack important non-camera modalities typically used in autonomous vehicles, or they do not exploit such modalities to aid and improve semantic annotations in…

    Submitted 23 January, 2024; originally announced January 2024.

  28. arXiv:2401.07721 [pdf, other]

    cs.CV

    Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation

    Authors: Hao Tang, Ling Shao, Nicu Sebe, Luc Van Gool

    Abstract: We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for challenging graph-constrained architectural layout generation tasks. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attentions in a Transformer to model both local and gl…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted to TPAMI, an extended version of a paper published in CVPR2023. arXiv admin note: substantial text overlap with arXiv:2303.08225

  29. arXiv:2401.05335 [pdf, other]

    cs.CV cs.GR cs.LG

    InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes

    Authors: Mohamad Shahbazi, Liesbeth Claessens, Michael Niemeyer, Edo Collins, Alessio Tonioni, Luc Van Gool, Federico Tombari

    Abstract: We introduce InseRF, a novel method for generative object insertion in the NeRF reconstructions of 3D scenes. Based on a user-provided textual description and a 2D bounding box in a reference viewpoint, InseRF generates new objects in 3D scenes. Recently, methods for 3D scene editing have been profoundly transformed, owing to the use of strong priors of text-to-image diffusion models in 3D generat…

    Submitted 10 January, 2024; originally announced January 2024.

  30. arXiv:2401.02418 [pdf, other]

    cs.CV

    Learning to Prompt with Text Only Supervision for Vision-Language Models

    Authors: Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc Van Gool, Federico Tombari

    Abstract: Foundational vision-language models such as CLIP are becoming a new paradigm in vision, due to their excellent generalization abilities. However, adapting these models for downstream tasks while maintaining their generalization remains a challenge. In literature, one branch of methods adapts CLIP by learning prompts using visual information. While effective, most of these works require labeled dat…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Project Page: https://muzairkhattak.github.io/ProText/

  31. arXiv:2312.15471 [pdf, other]

    cs.CV cs.RO

    Residual Learning for Image Point Descriptors

    Authors: Rashik Shrestha, Ajad Chhatkuli, Menelaos Kanakis, Luc Van Gool

    Abstract: Local image feature descriptors have had a tremendous impact on the development and application of computer vision methods. It is therefore unsurprising that significant efforts are being made for learning-based image point descriptors. However, the advantage of learned methods over handcrafted methods in real applications is subtle and more nuanced than expected. Moreover, handcrafted descriptors…

    Submitted 24 December, 2023; originally announced December 2023.

  32. arXiv:2312.13332 [pdf, other]

    cs.CV

    Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM

    Authors: Junru Lin, Asen Nachkov, Songyou Peng, Luc Van Gool, Danda Pani Paudel

    Abstract: The opacity of rigid 3D scenes with opaque surfaces is considered to be of a binary type. However, we observed that this property is not followed by the existing RGB-only NeRF-SLAM. Therefore, we are motivated to introduce this prior into the RGB-only NeRF-SLAM pipeline. Unfortunately, the optimization through the volumetric rendering function does not facilitate easy integration of the desired pr…

    Submitted 22 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

  33. arXiv:2312.11578 [pdf, other]

    cs.CV

    Diffusion-Based Particle-DETR for BEV Perception

    Authors: Asen Nachkov, Martin Danelljan, Danda Pani Paudel, Luc Van Gool

    Abstract: The Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs) due to its well suited compatibility to downstream tasks. For the enhanced safety of AVs, modeling perception uncertainty in BEV is crucial. Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively det…

    Submitted 18 December, 2023; originally announced December 2023.

  34. arXiv:2312.08558 [pdf, other]

    cs.CV

    G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving

    Authors: M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel, Nikola Popovic, Christian Vater, Otmar Hilliges, Luc Van Gool, Xi Wang

    Abstract: Understanding the decision-making process of drivers is one of the keys to ensuring road safety. While the driver intent and the resulting ego-motion trajectory are valuable in developing driver-assistance systems, existing methods mostly focus on the motions of other vehicles. In contrast, we focus on inferring the ego trajectory of a driver's vehicle using their gaze data. For this purpose, we f…

    Submitted 13 December, 2023; originally announced December 2023.

  35. arXiv:2312.03048 [pdf, other]

    cs.CV

    DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

    Authors: Yuru Jia, Lukas Hoyer, Shengyu Huang, Tianfu Wang, Luc Van Gool, Konrad Schindler, Anton Obukhov

    Abstract: Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content, specialize to user data through few-shot fine-tuning, and condition their output on other modalities, such as semantic maps. However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation? We investigate this…

    Submitted 8 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  36. arXiv:2312.03032 [pdf, other]

    cs.CV

    Zero-Shot Point Cloud Registration

    Authors: Weijie Wang, Guofeng Mei, Bin Ren, Xiaoshui Huang, Fabio Poiesi, Luc Van Gool, Nicu Sebe, Bruno Lepri

    Abstract: Learning-based point cloud registration approaches have significantly outperformed their traditional counterparts. However, they typically require extensive training on specific datasets. In this paper, we propose ZeroReg, the first zero-shot point cloud registration approach that eliminates the need for training on point cloud datasets. The cornerstone of ZeroReg is the novel transfer of image features…

    Submitted 8 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  37. arXiv:2311.17944 [pdf, other]

    cs.CV

    LALM: Long-Term Action Anticipation with Language Models

    Authors: Sanghwan Kim, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc Van Gool, Xi Wang

    Abstract: Understanding human activity is a crucial yet intricate task in egocentric vision, a field that focuses on capturing visual perspectives from the camera wearer's viewpoint. While traditional methods heavily rely on representation learning trained on extensive video data, there exists a significant limitation: obtaining effective video representations proves challenging due to the inherent complexi…

    Submitted 28 November, 2023; originally announced November 2023.

  38. arXiv:2311.17119 [pdf, other]

    cs.CV

    Continuous Pose for Monocular Cameras in Neural Implicit Representation

    Authors: Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool

    Abstract: In this paper, we showcase the effectiveness of optimizing monocular camera poses as a continuous function of time. The camera poses are represented using an implicit neural function which maps the given time to the corresponding camera pose. The mapped camera poses are then used for the downstream tasks where joint camera pose optimization is also required. While doing so, the network parameters…

    Submitted 2 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  39. arXiv:2311.16241 [pdf, other]

    cs.CV

    SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

    Authors: Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari

    Abstract: In semi-supervised semantic segmentation, a model is trained with a limited number of labeled images along with a large corpus of unlabeled images to reduce the high annotation effort. While previous methods are able to learn good segmentation boundaries, they are prone to confuse classes with similar visual appearance due to the limited supervision. On the other hand, vision-language models (VLMs…

    Submitted 27 November, 2023; originally announced November 2023.

  40. arXiv:2311.15851 [pdf, other]

    cs.CV

    Single-Model and Any-Modality for Video Object Tracking

    Authors: Zongwei Wu, Jilai Zheng, Xiangxuan Ren, Florin-Alexandru Vasluianu, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte

    Abstract: In the realm of video object tracking, auxiliary modalities such as depth, thermal, or event data have emerged as valuable assets to complement the RGB trackers. In practice, most existing RGB trackers learn a single set of parameters to use them across datasets and applications. However, a similar single-model unification for multi-modality tracking presents several challenges. These challenges s…

    Submitted 29 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR2024

  41. arXiv:2311.15605 [pdf, other]

    cs.CV

    2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation

    Authors: Ozan Unal, Dengxin Dai, Lukas Hoyer, Yigit Baran Can, Luc Van Gool

    Abstract: As 3D perception problems grow in popularity and the need for large-scale labeled datasets for LiDAR semantic segmentation increase, new methods arise that aim to reduce the necessity for dense annotations by employing weakly-supervised training. However these methods continue to show weak boundary estimation and high false negative rates for small objects and distant sparse regions. We argue that…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted at WACV 2024

  42. arXiv:2311.13833 [pdf, other]

    cs.CV cs.CL cs.LG

    Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

    Authors: Saman Motamed, Danda Pani Paudel, Luc Van Gool

    Abstract: Diffusion models have revolutionized generative content creation and text-to-image (T2I) diffusion models in particular have increased the creative freedom of users by allowing scene synthesis using natural language. T2I models excel at synthesizing concepts such as nouns, appearances, and styles. To enable customized content creation based on a few example images of a concept, methods such as Tex…

    Submitted 23 November, 2023; originally announced November 2023.

  43. arXiv:2311.13009 [pdf, ps, other]

    cs.CV

    3D Compression Using Neural Fields

    Authors: Janis Postels, Yannick Strümpler, Klara Reichard, Luc Van Gool, Federico Tombari

    Abstract: Neural Fields (NFs) have gained momentum as a tool for compressing various data modalities - e.g. images and videos. This work leverages previous advances and proposes a novel NF-based compression algorithm for 3D data. We derive two versions of our approach - one tailored to watertight shapes based on Signed Distance Fields (SDFs) and, more generally, one for arbitrary non-watertight shapes using…

    Submitted 21 November, 2023; originally announced November 2023.

  44. arXiv:2311.12157  [pdf, other

    cs.CV

    Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions

    Authors: Nikola Popovic, Dimitrios Christodoulou, Danda Pani Paudel, Xi Wang, Luc Van Gool

    Abstract: The task of predicting 3D eye gaze from eye images can be performed either by (a) end-to-end learning for image-to-gaze mapping or by (b) fitting a 3D eye model onto images. The former case requires 3D gaze labels, while the latter requires eye semantics or landmarks to facilitate the model fitting. Although obtaining eye semantics and landmarks is relatively easy, fitting an accurate 3D eye model…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted to ISMAR2023 as a poster paper

  45. arXiv:2311.11600  [pdf, other

    cs.CV

    Deep Equilibrium Diffusion Restoration with Parallel Sampling

    Authors: Jiezhang Cao, Yue Shi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: Diffusion model-based image restoration (IR) aims to use diffusion models to recover high-quality (HQ) images from degraded images, achieving promising performance. Due to the inherent property of diffusion models, most existing methods need long serial sampling chains to restore HQ images step-by-step, resulting in expensive sampling time and high computation costs. Moreover, such long sampling c…

    Submitted 29 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: CVPR'2024

  46. arXiv:2311.11325  [pdf, other

    cs.CV eess.IV

    MoVideo: Motion-Aware Video Generation with Diffusion Models

    Authors: Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan

    Abstract: While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos and images, i.e., motion. In this paper, we propose a novel motion-aware video generation (MoVideo) framework that takes motion into consideration from two aspe…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: project homepage: https://jingyunliang.github.io/MoVideo

  47. arXiv:2311.08043  [pdf, other

    cs.CV

    Contrastive Learning for Multi-Object Tracking with Transformers

    Authors: Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool

    Abstract: The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-l…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  48. arXiv:2311.04521  [pdf, other

    cs.CV

    Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images

    Authors: Nishant Jain, Suryansh Kumar, Luc Van Gool

    Abstract: We introduce an improved solution to the neural image-based rendering problem in computer vision. Given a set of images taken from a freely moving camera at train time, the proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time. The key ideas presented in this paper are (i) recovering accurate camera parameters via a robust pipeline from unposed day-t…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted for publication at International Journal of Computer Vision (IJCV). Draft info: 22 pages, 12 figures and 14 tables

  49. arXiv:2311.03345  [pdf, other

    cs.CV

    Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences

    Authors: Zador Pataki, Mohammad Altillawi, Menelaos Kanakis, Rémi Pautrat, Fengyi Shen, Ziyuan Liu, Luc Van Gool, Marc Pollefeys

    Abstract: Modern learning-based visual feature extraction networks perform well in intra-domain localization; however, their performance significantly declines when image pairs are captured across long-term visual domain variations, such as different seasonal and daytime variations. In this paper, our first contribution is a benchmark to investigate the performance impact of long-term variations on visual l…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 14 pages + 5 pages appendix, 13 figures

  50. arXiv:2311.00932  [pdf, other

    cs.CV eess.IV

    Towards High-quality HDR Deghosting with Conditional Diffusion Models

    Authors: Qingsen Yan, Tao Hu, Yuan Sun, Hao Tang, Yu Zhu, Wei Dong, Luc Van Gool, Yanning Zhang

    Abstract: High Dynamic Range (HDR) images can be recovered from several Low Dynamic Range (LDR) images by existing Deep Neural Network (DNN) techniques. Despite the remarkable progress, DNN-based methods still generate ghosting artifacts when LDR images have saturation and large motion, which hinders potential applications in real-world scenarios. To address this challenge, we formulate the HDR deghosting…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: accepted by IEEE TCSVT