
Showing 101–150 of 505 results for author: van Gool, L

  1. arXiv:2303.08225  [pdf, other]

    cs.CV cs.AI

    Graph Transformer GANs for Graph-Constrained House Generation

    Authors: Hao Tang, Zhenyu Zhang, Humphrey Shi, Bo Li, Ling Shao, Nicu Sebe, Radu Timofte, Luc Van Gool

    Abstract: We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attentions in a Transformer to model both local and global interac…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  2. arXiv:2303.06840  [pdf, other]

    cs.CV

    DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

    Authors: Zixiang Zhao, Haowen Bai, Yuanzhi Zhu, Jiangshe Zhang, Shuang Xu, Yulun Zhang, Kai Zhang, Deyu Meng, Radu Timofte, Luc Van Gool

    Abstract: Multi-modality image fusion aims to combine different modalities to produce fused images that retain the complementary features of each modality, such as functional highlights and texture details. To leverage strong generative priors and address challenges such as unstable training and lack of interpretability for GAN-based generative methods, we propose a novel fusion algorithm based on the denoi…

    Submitted 22 August, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV 2023 (Oral)

  3. arXiv:2303.05194  [pdf, other]

    cs.CV

    Contrastive Model Adaptation for Cross-Condition Robustness in Semantic Segmentation

    Authors: David Bruggemann, Christos Sakaridis, Tim Brödermann, Luc Van Gool

    Abstract: Standard unsupervised domain adaptation methods adapt models from a source to a target domain using labeled source data and unlabeled target data jointly. In model adaptation, on the other hand, access to the labeled source data is prohibited, i.e., only the source-trained model and unlabeled target data are available. We investigate normal-to-adverse condition model adaptation for semantic segmen…

    Submitted 17 August, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: International Conference on Computer Vision (ICCV) 2023

  4. arXiv:2303.04118  [pdf, other]

    cs.RO cs.LG

    A Multiplicative Value Function for Safe and Efficient Reinforcement Learning

    Authors: Nick Bührer, Zhejun Zhang, Alexander Liniger, Fisher Yu, Luc Van Gool

    Abstract: An emerging field of sequential decision problems is safe Reinforcement Learning (RL), where the objective is to maximize the reward while obeying safety constraints. Being able to handle constraints is essential for deploying RL agents in real-world environments, where constraint violations can harm the agent and the environment. To this end, we propose a safe model-free RL algorithm with a novel…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Repository available at https://github.com/nikeke19/Safe-Mult-RL

  5. arXiv:2303.04116  [pdf, other]

    cs.RO cs.CV

    TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction

    Authors: Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, Luc Van Gool

    Abstract: Data-driven simulation has become a favorable way to train and test autonomous driving algorithms. The idea of replacing the actual environment with a learned simulator has also been explored in model-based reinforcement learning in the context of world models. In this work, we show data-driven traffic simulation can be formulated as a world model. We present TrafficBots, a multi-agent policy buil…

    Submitted 28 September, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Published at ICRA 2023. The repository is available at https://github.com/zhejz/TrafficBots

  6. arXiv:2303.00748  [pdf, other]

    cs.CV

    Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

    Authors: Yawei Li, Yuchen Fan, Xiaoyu Xiang, Denis Demandolx, Rakesh Ranjan, Radu Timofte, Luc Van Gool

    Abstract: The aim of this paper is to propose a mechanism to efficiently and explicitly model image hierarchies in the global, regional, and local range for image restoration. To achieve that, we start by analyzing two important properties of natural images including cross-scale similarity and anisotropic image features. Inspired by that, we propose the anchored stripe self-attention which achieves a good b…

    Submitted 25 May, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023. 12 pages, 7 figures, 11 tables

  7. arXiv:2302.06556  [pdf, other]

    cs.CV cs.LG

    VA-DepthNet: A Variational Approach to Single Image Depth Prediction

    Authors: Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc Van Gool

    Abstract: We introduce VA-DepthNet, a simple, effective, and accurate deep neural network approach for the single-image depth prediction (SIDP) problem. The proposed approach advocates using classical first-order variational constraints for this problem. While state-of-the-art deep neural network methods for SIDP learn the scene depth from images in a supervised setting, they often overlook the invaluable i…

    Submitted 15 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Accepted for publication at ICLR 2023 (Spotlight Oral Presentation). Draft info: 21 pages, 13 tables, 8 figures

  8. arXiv:2302.00903  [pdf, other]

    cs.CV

    No One Left Behind: Real-World Federated Class-Incremental Learning

    Authors: Jiahua Dong, Hongliu Li, Yang Cong, Gan Sun, Yulun Zhang, Luc Van Gool

    Abstract: Federated learning (FL) is a hot collaborative training framework via aggregating model parameters of decentralized local clients. However, most FL methods unreasonably assume data categories of FL framework are known and fixed in advance. Moreover, some new local clients that collect novel categories unseen by other clients may be introduced to FL training irregularly. These issues render global…

    Submitted 15 November, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence 2023 (TPAMI 2023)

  9. arXiv:2301.09209  [pdf, other]

    cs.CV cs.CL

    Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

    Authors: Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza, Yen-Ling Kuo, Kaichun Mo, Luc Van Gool, Otmar Hilliges, Xi Wang

    Abstract: We study object interaction anticipation in egocentric videos. This task requires an understanding of the spatio-temporal context formed by past actions on objects, coined action context. We propose TransFusion, a multimodal transformer-based architecture. It exploits the representational power of language by summarizing the action context. TransFusion leverages pre-trained image captioning and vi…

    Submitted 10 March, 2024; v1 submitted 22 January, 2023; originally announced January 2023.

  10. arXiv:2301.05191  [pdf, other]

    cs.CV

    Event-Based Frame Interpolation with Ad-hoc Deblurring

    Authors: Lei Sun, Christos Sakaridis, Jingyun Liang, Peng Sun, Jiezhang Cao, Kai Zhang, Qi Jiang, Kaiwei Wang, Luc Van Gool

    Abstract: The performance of video frame interpolation is inherently correlated with the ability to handle motion in the input scene. Even though previous works recognize the utility of asynchronous event information for this task, they ignore the fact that motion may or may not result in blur in the input video to be interpolated, depending on the length of the exposure time of the frames and the speed of…

    Submitted 12 January, 2023; originally announced January 2023.

  11. arXiv:2212.11920  [pdf, other]

    cs.CV

    Beyond SOT: Tracking Multiple Generic Objects at Once

    Authors: Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc Van Gool, Alina Kuznetsova

    Abstract: Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single object setting. Multi-object GOT benefits from a wider applicability, rendering it more attractive in real-world applications. We attribute the la…

    Submitted 25 February, 2024; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: accepted by WACV'24

  12. arXiv:2212.07292  [pdf, other]

    cs.CV

    One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers

    Authors: Rui Gong, Qin Wang, Dengxin Dai, Luc Van Gool

    Abstract: Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. It can save the cost of manually labeling data in real-world applications such as robot vision and autonomous driving. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for th…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 15 pages, 6 figures, 10 Tables

  13. arXiv:2212.06570  [pdf, other]

    cs.CV

    CamoFormer: Masked Separable Attention for Camouflaged Object Detection

    Authors: Bowen Yin, Xuying Zhang, Qibin Hou, Bo-Yuan Sun, Deng-Ping Fan, Luc Van Gool

    Abstract: How to identify and segment camouflaged objects from the background is challenging. Inspired by the multi-head self-attention in Transformers, we present a simple masked separable attention (MSA) for camouflaged object detection. We first separate the multi-head self-attention into three parts, which are responsible for distinguishing the camouflaged objects from the background using different mas…

    Submitted 10 December, 2022; originally announced December 2022.

  14. arXiv:2212.05370  [pdf, other]

    cs.CV

    Source-free Depth for Object Pop-out

    Authors: Zongwei Wu, Danda Pani Paudel, Deng-Ping Fan, Jingjing Wang, Shuo Wang, Cédric Demonceaux, Radu Timofte, Luc Van Gool

    Abstract: Depth cues are known to be useful for visual perception. However, direct measurement of depth is often impracticable. Fortunately, though, modern learning-based methods offer promising depth maps by inference in the wild. In this work, we adapt such depth inference models for object segmentation using the objects' "pop-out" prior in 3D. The "pop-out" is a simple composition prior that assumes obje…

    Submitted 25 September, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

    Comments: Accepted to ICCV 2023

  15. arXiv:2212.04362  [pdf, other]

    cs.CV

    CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

    Authors: Jiezhang Cao, Qin Wang, Yongqin Xian, Yawei Li, Bingbing Ni, Zhiming Pi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: Learning continuous image representations is recently gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no l…

    Submitted 13 April, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  16. arXiv:2212.02291  [pdf, other]

    cs.CV

    I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification

    Authors: Muhammad Ferjad Naeem, Muhammad Gul Zain Ali Khan, Yongqin Xian, Muhammad Zeshan Afzal, Didier Stricker, Luc Van Gool, Federico Tombari

    Abstract: Recent works have shown that unstructured text (documents) from online sources can serve as useful auxiliary information for zero-shot image classification. However, these methods require access to a high-quality source like Wikipedia and are limited to a single source of information. Large Language Models (LLM) trained on web-scale text show impressive abilities to repurpose their learned knowled…

    Submitted 5 December, 2022; originally announced December 2022.

  17. arXiv:2212.01331  [pdf, other]

    cs.CV

    Surface Normal Clustering for Implicit Representation of Manhattan Scenes

    Authors: Nikola Popovic, Danda Pani Paudel, Luc Van Gool

    Abstract: Novel view synthesis and 3D modeling using implicit neural field representation are shown to be very effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. Most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors. These methods cannot benefit from high-leve…

    Submitted 27 September, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: Paper accepted to ICCV23

  18. arXiv:2212.01322  [pdf, other]

    cs.CV

    MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation

    Authors: Lukas Hoyer, Dengxin Dai, Haoran Wang, Luc Van Gool

    Abstract: In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotation. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Im…

    Submitted 24 March, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  19. arXiv:2211.16928  [pdf, other]

    eess.IV cs.CV

    Knowledge Distillation based Degradation Estimation for Blind Super-Resolution

    Authors: Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc Van Gool

    Abstract: Blind image super-resolution (Blind-SR) aims to recover a high-resolution (HR) image from its corresponding low-resolution (LR) input image with unknown degradations. Most of the existing works design an explicit degradation estimator for each degradation to guide SR. However, it is infeasible to provide concrete labels of multiple degradation combinations (e.g., blur, noise, jpeg compression) to…

    Submitted 16 February, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: ICLR2023, code is available at https://github.com/Zj-BinXia/KDSR

  20. arXiv:2211.14461  [pdf, other]

    cs.CV

    CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

    Authors: Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Shuang Xu, Zudi Lin, Radu Timofte, Luc Van Gool

    Abstract: Multi-modality (MM) image fusion aims to render fused images that maintain the merits of different modalities, e.g., functional highlight and detailed textures. To tackle the challenge in modeling cross-modality features and decomposing desirable modality-specific and modality-shared features, we propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. Firstly, CDDFuse us…

    Submitted 10 April, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted by CVPR 2023

  21. arXiv:2211.12131  [pdf, other]

    cs.CV

    DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models

    Authors: Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein

    Abstract: Scene extrapolation -- the idea of generating novel views by flying into a given image -- is a promising, yet challenging task. For each predicted frame, a joint inpainting and 3D refinement problem has to be solved, which is ill posed and includes a high level of ambiguity. Moreover, training data for long-range scenes is difficult to obtain and usually lacks sufficient views to infer accurate ca…

    Submitted 18 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

  22. arXiv:2211.07491  [pdf, other]

    cs.CV

    Piecewise Planar Hulls for Semi-Supervised Learning of 3D Shape and Pose from 2D Images

    Authors: Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool

    Abstract: We study the problem of estimating 3D shape and pose of an object in terms of keypoints, from a single 2D image. The shape and pose are learned directly from images collected by categories and their partial 2D keypoint annotations. In this work, we first propose an end-to-end training framework for intermediate 2D keypoints extraction and final 3D shape and pose estimation. The proposed framewo…

    Submitted 14 November, 2022; originally announced November 2022.

  23. Advancing Learned Video Compression with In-loop Frame Prediction

    Authors: Ren Yang, Radu Timofte, Luc Van Gool

    Abstract: Recent years have witnessed an increasing interest in end-to-end learned video compression. Most previous works explore temporal redundancy by detecting and compressing a motion map to warp the reference frame towards the target frame. Yet, they fail to adequately take advantage of the historical priors in the sequential reference frames. In this paper, we propose an Advanced Learned Video Compres…

    Submitted 18 November, 2022; v1 submitted 13 November, 2022; originally announced November 2022.

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology (2022)

  24. arXiv:2211.06770  [pdf, other]

    cs.CV cs.LG eess.IV

    MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning

    Authors: Andrey Ignatov, Anastasia Sycheva, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc Van Gool

    Abstract: While neural networks-based photo processing solutions can provide a better image quality compared to the traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity. In this paper, we present a novel MicroISP model designed specifically for edge devices, taking into account their computational and memory limitations. The propo…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2211.06263

  25. arXiv:2211.06263  [pdf, other]

    cs.CV cs.LG eess.IV

    PyNet-V2 Mobile: Efficient On-Device Photo Processing With Neural Networks

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc Van Gool

    Abstract: The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address th…

    Submitted 8 November, 2022; originally announced November 2022.

  26. arXiv:2210.16822  [pdf, other]

    cs.CV

    Towards Versatile Embodied Navigation

    Authors: Hanqing Wang, Wei Liang, Luc Van Gool, Wenguan Wang

    Abstract: With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well. Given plenty of embodied navigation tasks and task-specific solutions, we address a more fundamental question: ca…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022; Code: https://github.com/hanqingwangai/VXN

  27. TripletTrack: 3D Object Tracking using Triplet Embeddings and LSTM

    Authors: Nicola Marinello, Marc Proesmans, Luc Van Gool

    Abstract: 3D object tracking is a critical task in autonomous driving systems. It plays an essential role for the system's awareness about the surrounding environment. At the same time there is an increasing interest in algorithms for autonomous cars that solely rely on inexpensive sensors, such as cameras. In this paper we investigate the use of triplet embeddings in combination with motion representations…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted to CVPR 2022 Workshop on Autonomous Driving

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops June 2022 4500-4510

  28. Masked Vision-Language Transformer in Fashion

    Authors: Ge-Peng Ji, Mingcheng Zhuge, Dehong Gao, Deng-Ping Fan, Christos Sakaridis, Luc Van Gool

    Abstract: We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation. Technically, we simply utilize vision transformer architecture for replacing the BERT in the pre-training model, making MVLT the first end-to-end framework for the fashion domain. Besides, we designed masked image reconstruction (MIR) for a fine-grained understanding of fashion. MVLT is an exten…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted by Machine Intelligence Research (2023)

    Journal ref: Machine Intelligence Research. 20, 421-434 (2023)

  29. arXiv:2210.11557  [pdf, other]

    cs.CV

    Learning Attention Propagation for Compositional Zero-Shot Learning

    Authors: Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc Van Gool, Alain Pagani, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Compositional zero-shot learning aims to recognize unseen compositions of seen visual primitives of object classes and their states. While all primitives (states and objects) are observable during training in some combination, their complex interaction makes this task especially hard. For example, wet changes the visual appearance of a dog very differently from a bicycle. Furthermore, we argue tha…

    Submitted 20 October, 2022; originally announced October 2022.

  30. arXiv:2210.07670  [pdf, other]

    cs.CV

    Multi-View Photometric Stereo Revisited

    Authors: Berk Kaya, Suryansh Kumar, Carlos Oliveira, Vittorio Ferrari, Luc Van Gool

    Abstract: Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images. Although popular methods for MVPS can provide outstanding results, they are often complex to execute and limited to isotropic material objects. To address such limitations, we present a simple, practical approach to MVPS, which works well for isotropic as well as other objec…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted for publication at IEEE/CVF WACV 2023. Draft info: 10 pages, 5 figures, and 3 tables

  31. arXiv:2210.07239  [pdf, other]

    cs.CV

    Composite Learning for Robust and Effective Dense Predictions

    Authors: Menelaos Kanakis, Thomas E. Huang, David Bruggemann, Fisher Yu, Luc Van Gool

    Abstract: Multi-task learning promises better model generalization on a target task by jointly optimizing it with an auxiliary task. However, the current practice requires additional labeling efforts for the auxiliary task, while not guaranteeing better model performance. In this paper, we find that jointly training a dense prediction (target) task with a self-supervised (auxiliary) task can consistently im…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Winter Conference on Applications of Computer Vision (WACV), 2023

  32. arXiv:2210.06196  [pdf, other]

    cs.LO

    On the Preservation of Properties when Changing Communication Models

    Authors: Olav Bunte, Louis C. M. van Gool, Tim A. C. Willemse

    Abstract: In a system of processes that communicate asynchronously by means of FIFO channels, there are many options in which these channels can be laid out. In this paper, we compare channel layouts in how they affect the behaviour of the system using an ordering based on splitting and merging channels. This order induces a simulation relation, from which the preservation of safety properties follows. Also…

    Submitted 12 October, 2022; originally announced October 2022.

  33. arXiv:2210.04553  [pdf, other]

    cs.CV

    SiNeRF: Sinusoidal Neural Radiance Fields for Joint Pose Estimation and Scene Reconstruction

    Authors: Yitong Xia, Hao Tang, Radu Timofte, Luc Van Gool

    Abstract: NeRFmm is the Neural Radiance Fields (NeRF) that deal with Joint Optimization tasks, i.e., reconstructing real-world scenes and registering camera parameters simultaneously. Despite NeRFmm producing precise scene synthesis and pose estimations, it still struggles to outperform the full-annotated baseline on challenging scenes. In this work, we identify that there exists a systematic sub-optimality…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted yet not published by BMVC2022

  34. arXiv:2210.04233  [pdf, other]

    cs.CV

    Robustifying the Multi-Scale Representation of Neural Radiance Fields

    Authors: Nishant Jain, Suryansh Kumar, Luc Van Gool

    Abstract: Neural Radiance Fields (NeRF) recently emerged as a new paradigm for object representation from multi-view (MV) images. Yet, it cannot handle multi-scale (MS) images and camera pose estimation errors, which generally is the case with multi-view images captured from a day-to-day commodity camera. Although recently proposed Mip-NeRF could handle multi-scale imaging problems with NeRF, it cannot hand…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: Accepted for publication at British Machine Vision Conference (BMVC) 2022. Draft info: 13 pages, 3 Figures, and 4 Tables

  35. arXiv:2210.00405  [pdf, other]

    cs.CV eess.IV

    Basic Binary Convolution Unit for Binarized Image Restoration Network

    Authors: Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc Van Gool

    Abstract: Lighter and faster image restoration (IR) models are crucial for the deployment on resource-limited devices. Binary neural network (BNN), one of the most promising model compression methods, can dramatically reduce the computations and parameters of full-precision convolutional neural networks (CNN). However, there are different properties between BNN and full-precision CNN, and we can hardly use…

    Submitted 16 February, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

    Comments: ICLR2023, code is available at https://github.com/Zj-BinXia/BBCU

  36. arXiv:2209.15529  [pdf, other]

    cs.LG cs.CV stat.ML

    TT-NF: Tensor Train Neural Fields

    Authors: Anton Obukhov, Mikhail Usvyatsov, Christos Sakaridis, Konrad Schindler, Luc Van Gool

    Abstract: Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations. In this paper, we introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense regular grids and efficient methods for sampling from them. Our representation is a TT parameterizat…

    Submitted 30 September, 2022; originally announced September 2022.

    Comments: Preprint, under review

  37. arXiv:2209.15439  [pdf, other]

    cs.CV

    Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection

    Authors: Yifan Lu, Gurkirt Singh, Suman Saha, Luc Van Gool

    Abstract: We propose a novel domain adaptive action detection approach and a new adaptation protocol that leverages the recent advancements in image-level unsupervised domain adaptation (UDA) techniques and handle vagaries of instance-level video data. Self-training combined with cross-domain mixed sampling has shown remarkable performance gain in semantic segmentation in UDA (unsupervised domain adaptation…

    Submitted 6 October, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

  38. arXiv:2209.15179  [pdf, other]

    cs.CV

    Physical Adversarial Attack meets Computer Vision: A Decade Survey

    Authors: Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin'ichi Satoh, Luc Van Gool, Zheng Wang

    Abstract: Despite the impressive achievements of Deep Neural Networks (DNNs) in computer vision, their vulnerability to adversarial attacks remains a critical concern. Extensive research has demonstrated that incorporating sophisticated perturbations into input images can lead to a catastrophic degradation in DNNs' performance. This perplexing phenomenon not only exists in the digital space but also in the…

    Submitted 1 October, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: 19 pages. Under Review

  39. arXiv:2209.10304  [pdf, other]

    cs.CV

    I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification

    Authors: Muhammad Ferjad Naeem, Yongqin Xian, Luc Van Gool, Federico Tombari

    Abstract: Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing methods still rely on human-annotated attributes, which are difficult to annotate and scale. An unsupervised alternative is to represent each class using the word embedding associated with its semantic class name. However, word embeddings extracted from pre-trained language models do not necessarily capture visual…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  40. arXiv:2209.02250  [pdf, other]

    cs.CV

    Spatio-Temporal Action Detection Under Large Motion

    Authors: Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, Luc Van Gool

    Abstract: Current methods for spatiotemporal action tube detection often extend a bounding box proposal at a given keyframe into a 3D temporal cuboid and pool features from nearby frames. However, such pooling fails to accumulate meaningful spatiotemporal features if the position or shape of the actor shows large 2D motion and variability through the frames, due to large camera motion, large actor shape def…

    Submitted 25 October, 2022; v1 submitted 6 September, 2022; originally announced September 2022.

    Comments: 10 pages, 5 figures, 5 tables

  41. arXiv:2208.11803  [pdf, other]

    cs.CV eess.IV

    Learning Task-Oriented Flows to Mutually Guide Feature Alignment in Synthesized and Real Video Denoising

    Authors: Jiezhang Cao, Qin Wang, Jingyun Liang, Yulun Zhang, Kai Zhang, Radu Timofte, Luc Van Gool

    Abstract: Video denoising aims at removing noise from videos to recover clean ones. Some existing works show that optical flow can help the denoising by exploiting the additional spatial-temporal clues from nearby frames. However, the flow estimation itself is also sensitive to noise, and can be unusable under large noise levels. To this end, we propose a new multi-scale refined optical flow-guided video de…

    Submitted 25 March, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

  42. arXiv:2208.08932  [pdf, other]

    cs.CV stat.ML

    ManiFlow: Implicitly Representing Manifolds with Normalizing Flows

    Authors: Janis Postels, Martin Danelljan, Luc Van Gool, Federico Tombari

    Abstract: Normalizing Flows (NFs) are flexible explicit generative models that have been shown to accurately model complex real-world data distributions. However, their invertibility constraint imposes limitations on data distributions that reside on lower dimensional manifolds embedded in higher dimensional space. Practically, this shortcoming is often bypassed by adding noise to the data which impacts the…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: International Conference on 3D Vision 2022

  43. arXiv:2208.06888  [pdf, other]

    cs.CV

    AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility

    Authors: Mubashir Noman, Wafa Al Ghallabi, Daniya Najiha, Christoph Mayer, Akshay Dudhane, Martin Danelljan, Hisham Cholakkal, Salman Khan, Luc Van Gool, Fahad Shahbaz Khan

    Abstract: One of the key factors behind the recent success in visual tracking is the availability of dedicated benchmarks. While greatly benefiting tracking research, existing benchmarks do not pose the same difficulty as before, with recent trackers achieving higher performance mainly due to (i) the introduction of more sophisticated transformer-based methods and (ii) the lack of diverse scena…

    Submitted 14 August, 2022; originally announced August 2022.

  44. arXiv:2207.11938  [pdf, other]

    cs.CV

    Reference-based Image Super-Resolution with Deformable Attention Transformer

    Authors: Jiezhang Cao, Jingyun Liang, Kai Zhang, Yawei Li, Yulun Zhang, Wenguan Wang, Luc Van Gool

    Abstract: Reference-based image super-resolution (RefSR) aims to exploit auxiliary reference (Ref) images to super-resolve low-resolution (LR) images. Recently, RefSR has been attracting great attention as it provides an alternative way to surpass single image SR. However, addressing the RefSR problem has two critical challenges: (i) It is difficult to match the correspondence between LR and Ref images when…

    Submitted 4 August, 2022; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  45. arXiv:2207.10765  [pdf, other]

    cs.CV

    Towards Interpretable Video Super-Resolution via Alternating Optimization

    Authors: Jiezhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc Van Gool

    Abstract: In this paper, we study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate low-resolution blurry video. Such a problem often occurs when recording a fast dynamic event with a low-framerate and low-resolution camera, and the captured video would suffer from three typical issues: i) motion blur occurs…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  46. arXiv:2207.10436  [pdf, other]

    cs.CV

    Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation

    Authors: Guolei Sun, Yun Liu, Hao Tang, Ajad Chhatkuli, Le Zhang, Luc Van Gool

    Abstract: The essence of video semantic segmentation (VSS) is how to leverage temporal information for prediction. Previous efforts are mainly devoted to developing new techniques to calculate the cross-frame affinities such as optical flow and attention. Instead, this paper contributes from a different angle by mining relations among cross-frame affinities, upon which better temporal information aggregatio…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  47. arXiv:2207.06825  [pdf, other]

    cs.CV

    Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions

    Authors: David Bruggemann, Christos Sakaridis, Prune Truong, Luc Van Gool

    Abstract: Due to the scarcity of dense pixel-level semantic annotations for images recorded in adverse visual conditions, there has been a keen interest in unsupervised domain adaptation (UDA) for the semantic segmentation of such images. UDA adapts models trained on normal conditions to the target adverse-condition domains. Meanwhile, multiple datasets with driving scenes provide corresponding images of th…

    Submitted 3 July, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

  48. arXiv:2207.06262  [pdf, other]

    cs.CV cs.CG

    Organic Priors in Non-Rigid Structure from Motion

    Authors: Suryansh Kumar, Luc Van Gool

    Abstract: This paper advocates the use of organic priors in classical non-rigid structure from motion (NRSfM). By organic priors, we mean invaluable intermediate prior information intrinsic to the NRSfM matrix factorization theory. It is shown that such priors reside in the factorized matrices, and quite surprisingly, existing methods generally disregard them. The paper's main contribution is to put forward…

    Submitted 16 July, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: To appear in ECCV 2022 Conference (Oral Presentation). Draft info: 18 Pages, 4 Figures, and 6 Tables. Project webpage: https://suryanshkumar.github.io/Organic_Prior_NRSfM/

  49. arXiv:2207.02255  [pdf, other]

    cs.CV

    OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers

    Authors: Jialun Pei, Tianyang Cheng, Deng-Ping Fan, He Tang, Chuanbo Chen, Luc Van Gool

    Abstract: We present OSFormer, the first one-stage transformer framework for camouflaged instance segmentation (CIS). OSFormer is based on two key designs. First, we design a location-sensing transformer (LST) to obtain the location label and instance-aware parameters by introducing the location-guided queries and the blend-convolution feedforward network. Second, we develop a coarse-to-fine fusion (CFF) to…

    Submitted 2 August, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: This paper has been accepted by ECCV2022

  50. arXiv:2207.01009  [pdf, other]

    cs.CV cs.RO

    L2E: Lasers to Events for 6-DoF Extrinsic Calibration of Lidars and Event Cameras

    Authors: Kevin Ta, David Bruggemann, Tim Brödermann, Christos Sakaridis, Luc Van Gool

    Abstract: As neuromorphic technology is maturing, its application to robotics and autonomous vehicle systems has become an area of active research. In particular, event cameras have emerged as a compelling alternative to frame-based cameras in low-power and latency-demanding applications. To enable event cameras to operate alongside staple sensors like lidar in perception tasks, we propose a direct, tempora…

    Submitted 20 February, 2023; v1 submitted 3 July, 2022; originally announced July 2022.

    Comments: Accepted to ICRA2023