Showing 151–200 of 505 results for author: van Gool, L

  1. arXiv:2206.15157

    cs.CV cs.LG

    HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection

    Authors: Tim Broedermann, Christos Sakaridis, Dengxin Dai, Luc Van Gool

    Abstract: Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors - such as camera with lidar or radar - by using architectural components specific to the examined setting, a generic and modu…

    Submitted 11 August, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: IEEE International Conference on Intelligent Transportation Systems (ITSC) 2023

  2. arXiv:2206.14797

    cs.CV cs.LG

    3D-Aware Video Generation

    Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang, Gordon Wetzstein, Leonidas Guibas, Luc Van Gool, Radu Timofte

    Abstract: Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled high-quality 3D or video content to be generated that exhibits either multi-view or temporal consistency. With our work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neu…

    Submitted 9 August, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: TMLR 2023; Project page: https://sherwinbahmani.github.io/3dvidgen

  3. arXiv:2206.08367

    cs.CV cs.LG

    SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation

    Authors: Tao Sun, Mattia Segu, Janis Postels, Yuxuan Wang, Luc Van Gool, Bernt Schiele, Federico Tombari, Fisher Yu

    Abstract: Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous driving systems. Existing image and video driving datasets, however, fall short of capturing the mutable nature of the real world. In this paper, we introduce the largest multi-task synthetic dataset for autonomous driving, SHIFT. It presents discrete and continuous shifts in cloudines…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: Published at IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  4. arXiv:2206.07687

    cs.CV eess.IV

    Structured Sparsity Learning for Efficient Video Super-Resolution

    Authors: Bin Xia, Jingwen He, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Luc Van Gool

    Abstract: The high computational costs of video super-resolution (VSR) models hinder their deployment on resource-limited devices (e.g., smartphones and drones). Existing VSR models contain considerable redundant filters, which drag down the inference efficiency. To prune these unimportant filters, we develop a structured pruning scheme called Structured Sparsity Learning (SSL) according to the properties…

    Submitted 25 March, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Accepted by CVPR2023, code is available at https://github.com/Zj-BinXia/SSL

  5. arXiv:2206.06363

    cs.CV cs.LG

    Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation

    Authors: Wouter Van Gansbeke, Simon Vandenhende, Luc Van Gool

    Abstract: The task of unsupervised semantic segmentation aims to cluster pixels into semantically meaningful groups. Specifically, pixels assigned to the same cluster should share high-level semantic properties like their object or part category. This paper presents MaskDistill: a novel framework for unsupervised semantic segmentation based on three key ideas. First, we advocate a data-driven strategy to ge…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Code: https://github.com/wvangansbeke/MaskDistill

  6. arXiv:2206.02146

    cs.CV eess.IV

    Recurrent Video Restoration Transformer with Guided Deformable Attention

    Authors: Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, Luc Van Gool

    Abstract: Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusi…

    Submitted 12 November, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted by NeurIPS 2022. Code: https://github.com/JingyunLiang/RVRT

  7. arXiv:2206.01705

    cs.CV

    Gradient Obfuscation Checklist Test Gives a False Sense of Security

    Authors: Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool

    Abstract: One popular group of defense techniques against adversarial attacks is based on injecting stochastic noise into the network. The robustness of such stochastic defenses, however, often stems mainly from the obfuscation of the gradients, offering a false sense of security. Since most of the popular adversarial attacks are optimization-based, obfuscated gradients reduce their attacking ability,…

    Submitted 3 June, 2022; originally announced June 2022.

  8. GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector

    Authors: Peng Zheng, Huazhu Fu, Deng-Ping Fan, Qi Fan, Jie Qin, Yu-Wing Tai, Chi-Keung Tang, Luc Van Gool

    Abstract: In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes. The proposed GCoNet+ achieves new state-of-the-art performance for co-salient object detection (CoSOD) through mining consensus representations based on the following two essential criteria: 1) intra-gr…

    Submitted 10 April, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: T-PAMI 2023

  9. Deep Gradient Learning for Efficient Camouflaged Object Detection

    Authors: Ge-Peng Ji, Deng-Ping Fan, Yu-Cheng Chou, Dengxin Dai, Alexander Liniger, Luc Van Gool

    Abstract: This paper introduces DGNet, a novel deep framework that exploits object gradient supervision for camouflaged object detection (COD). It decouples the task into two connected branches, i.e., a context and a texture encoder. The essential connection is the gradient-induced transition, representing a soft grouping between context and texture features. Benefiting from the simple but efficient framewo…

    Submitted 8 August, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted by Machine Intelligence Research

    Journal ref: Machine Intelligence Research. 20, 92-108 (2023)

  10. arXiv:2205.10195

    cs.CV

    Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration

    Authors: Jing Lin, Xiaowan Hu, Yuanhao Cai, Haoqian Wang, Youliang Yan, Xueyi Zou, Yulun Zhang, Luc Van Gool

    Abstract: How to properly model the inter-frame relation within the video sequence is an important but unsolved challenge for video restoration (VR). In this work, we propose an unsupervised flow-aligned sequence-to-sequence model (S2SVR) to address this problem. On the one hand, the sequence-to-sequence model, which has proven capable of sequence modeling in the field of natural language processing, is exp…

    Submitted 16 June, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: ICML 2022; The first sequence-to-sequence model for video restoration

  11. arXiv:2205.10102

    eess.IV cs.CV

    Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging

    Authors: Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: In coded aperture snapshot spectral compressive imaging (CASSI) systems, hyperspectral image (HSI) reconstruction methods are employed to recover the spatial-spectral signal from a compressed measurement. Among these algorithms, deep unfolding methods demonstrate promising performance but suffer from two issues. Firstly, they do not estimate the degradation patterns and ill-posedness degree from t…

    Submitted 16 October, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022; The first Transformer-based deep unfolding method for spectral compressive imaging

  12. arXiv:2205.05676

    cs.CV

    Revisiting Random Channel Pruning for Neural Network Compression

    Authors: Yawei Li, Kamil Adamczewski, Wen Li, Shuhang Gu, Radu Timofte, Luc Van Gool

    Abstract: Channel (or 3D filter) pruning serves as an effective way to accelerate the inference of neural networks. There has been a flurry of algorithms that try to solve this practical problem, each being claimed effective in some ways. Yet, a benchmark to compare those algorithms directly is lacking, mainly due to the complexity of the algorithms and some custom settings such as the particular network co…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR2022. Code will be released at https://github.com/ofsoundof/random_channel_pruning

  13. arXiv:2205.05675

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  14. arXiv:2205.05467

    cs.CV cs.LG

    A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials

    Authors: Chuqiao Li, Zhiwu Huang, Danda Pani Paudel, Yabin Wang, Mohamad Shahbazi, Xiaopeng Hong, Luc Van Gool

    Abstract: A number of benchmarks and techniques for the detection of deepfakes have emerged. However, very few works study the detection of incrementally appearing deepfakes in real-world scenarios. To simulate the wild scenes, this paper suggests a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models. The suggested CD…

    Submitted 14 November, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted to WACV 2023

  15. arXiv:2204.13132

    cs.CV

    HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation

    Authors: Lukas Hoyer, Dengxin Dai, Luc Van Gool

    Abstract: Unsupervised domain adaptation (UDA) aims to adapt a model trained on the source domain (e.g. synthetic data) to the target domain (e.g. real-world data) without requiring further annotations on the target domain. This work focuses on UDA for semantic segmentation as real-world pixel-wise annotations are particularly expensive to acquire. As UDA methods for semantic segmentation are usually GPU me…

    Submitted 26 July, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

    Comments: ECCV 2022

  16. arXiv:2204.07908

    cs.CV

    MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction

    Authors: Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Zhang, Hanspeter Pfister, Radu Timofte, Luc Van Gool

    Abstract: Existing leading methods for spectral reconstruction (SR) focus on designing deeper or wider convolutional neural networks (CNNs) to learn the end-to-end mapping from the RGB image to its hyperspectral image (HSI). These CNN-based methods achieve impressive restoration performance while showing limitations in capturing the long-range dependencies and self-similarity prior. To cope with this proble…

    Submitted 16 April, 2022; originally announced April 2022.

    Comments: Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB; The First Transformer-based Method for Spectral Reconstruction

    Journal ref: CVPRW 2022

  17. arXiv:2204.06552

    cs.CV

    Neural Vector Fields for Implicit Surface Representation and Inference

    Authors: Edoardo Mello Rella, Ajad Chhatkuli, Ender Konukoglu, Luc Van Gool

    Abstract: Implicit fields have recently shown increasing success in representing and learning 3D shapes accurately. Signed distance fields and occupancy fields are decades old and still the preferred representations, both with well-studied properties, despite their restriction to closed surfaces. With neural networks, several other variations and training principles have been proposed with the goal to repre…

    Submitted 7 April, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

  18. arXiv:2204.03353

    cs.CV

    Learning Online Multi-Sensor Depth Fusion

    Authors: Erik Sandström, Martin R. Oswald, Suryansh Kumar, Silvan Weder, Fisher Yu, Cristian Sminchisescu, Luc Van Gool

    Abstract: Many hand-held or mixed reality devices are used with a single sensor for 3D reconstruction, although they often comprise multiple sensors. Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction methods, but existing techniques are not robust enough to handle sensors which operate with diverse value ranges as well as noise and outlier statistics…

    Submitted 21 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to ECCV 2022. 31 pages, 17 figures, 15 tables

  19. arXiv:2204.03330

    cs.CV cs.AI

    Learning Local and Global Temporal Contexts for Video Semantic Segmentation

    Authors: Guolei Sun, Yun Liu, Henghui Ding, Min Wu, Luc Van Gool

    Abstract: Contextual information plays a core role in video semantic segmentation (VSS). This paper divides contexts for VSS into two kinds: local temporal contexts (LTC), which define the contexts from neighboring frames, and global temporal contexts (GTC), which represent the contexts from the whole video. As for LTC, it includes static and motional contexts, corresponding to static and moving content in n…

    Submitted 9 April, 2024; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to TPAMI, an extended version of a paper published in CVPR 2022

  20. arXiv:2204.02392

    cs.RO cs.AI cs.MA

    Deep Interactive Motion Prediction and Planning: Playing Games with Motion Prediction Models

    Authors: Jose L. Vazquez, Alexander Liniger, Wilko Schwarting, Daniela Rus, Luc Van Gool

    Abstract: In most classical Autonomous Vehicle (AV) stacks, the prediction and planning layers are separated, limiting the planner to react to predictions that are not informed by the planned trajectory of the AV. This work presents a module that tightly couples these layers via a game-theoretic Model Predictive Controller (MPC) that uses a novel interactive multi-agent neural network policy as part of its…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted to L4DC

  21. arXiv:2204.02273

    cs.CV

    Arbitrary-Scale Image Synthesis

    Authors: Evangelos Ntavelis, Mohamad Shahbazi, Iason Kastanis, Radu Timofte, Martin Danelljan, Luc Van Gool

    Abstract: Positional encodings have enabled recent works to train a single adversarial network that can generate images of different scales. However, these approaches are either limited to a set of discrete scales or struggle to maintain good perceptual quality at the scales for which the model is not trained explicitly. We propose the design of scale-consistent positional encodings invariant to our generat…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: CVPR2022, code: https://github.com/vglsd/ScaleParty

  22. arXiv:2204.02091

    cs.CV cs.AI cs.LG cs.RO

    P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior

    Authors: Vaishakh Patil, Christos Sakaridis, Alexander Liniger, Luc Van Gool

    Abstract: Monocular depth estimation is vital for scene understanding and downstream tasks. We focus on the supervised setup, in which ground-truth depth is available only at training time. Based on knowledge about the high regularity of real 3D scenes, we propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth. In particular, we introduce a piece…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted at CVPR 2022

  23. arXiv:2204.01267

    cs.CV

    FoV-Net: Field-of-View Extrapolation Using Self-Attention and Uncertainty

    Authors: Liqian Ma, Stamatios Georgoulis, Xu Jia, Luc Van Gool

    Abstract: The ability to make educated predictions about their surroundings, and associate them with certain confidence, is important for intelligent systems, like autonomous vehicles and robots. It allows them to plan early and decide accordingly. Motivated by this observation, in this paper we utilize information from a video sequence with a narrow field-of-view to infer the scene at a wider field-of-view…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to IEEE Robotics and Automation Letters and ICRA2021. Project page http://charliememory.github.io/RAL21_FoV

  24. arXiv:2204.01263

    cs.CV

    Direct Dense Pose Estimation

    Authors: Liqian Ma, Lingjie Liu, Christian Theobalt, Luc Van Gool

    Abstract: Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies, which finds various applications, such as human body reconstruction, human pose transfer, and human action recognition. Prior dense pose estimation methods are all based on Mask R-CNN framework and operate in a top-down manner of first attempting to identify a bounding…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to 3DV 2021. Project page http://charliememory.github.io/3DV21_DDP/

  25. arXiv:2203.16586

    cs.CV

    Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

    Authors: Hanqing Wang, Wei Liang, Jianbing Shen, Luc Van Gool, Wenguan Wang

    Abstract: Since the rise of vision-language navigation (VLN), great progress has been made in instruction following -- building a follower to navigate environments under the guidance of instructions. However, far less attention has been paid to the inverse task: instruction generation -- learning a speaker to generate grounded descriptions for navigation routes. Existing VLN methods train a speaker independ…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  26. arXiv:2203.15118

    cs.CV cs.LG

    LiDAR Snowfall Simulation for Robust 3D Object Detection

    Authors: Martin Hahner, Christos Sakaridis, Mario Bijelic, Felix Heide, Fisher Yu, Dengxin Dai, Luc Van Gool

    Abstract: 3D object detection is a central task for applications such as autonomous driving, in which the system needs to localize and classify surrounding traffic agents, even in the presence of adverse weather. In this paper, we address the problem of LiDAR-based 3D object detection under snowfall. Due to the difficulty of collecting and annotating training data in this setting, we propose a physically ba…

    Submitted 5 June, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Oral at CVPR 2022

  27. arXiv:2203.15102

    cs.CV

    Rethinking Semantic Segmentation: A Prototype View

    Authors: Tianfei Zhou, Wenguan Wang, Ender Konukoglu, Luc Van Gool

    Abstract: Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such parametric se…

    Submitted 4 April, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022 (Oral); Code: https://github.com/tfzhou/ProtoSeg

  28. Video Polyp Segmentation: A Deep Learning Perspective

    Authors: Ge-Peng Ji, Guobao Xiao, Yu-Cheng Chou, Deng-Ping Fan, Kai Zhao, Geng Chen, Luc Van Gool

    Abstract: We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS are not moving forward with ease due to the lack of large-scale fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy frames from the we…

    Submitted 31 August, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: Accepted by Machine Intelligence Research 2022 (Project Page: https://github.com/GewelsJI/VPS)

    Journal ref: Machine Intelligence Research, vol. 19, no. 6, pp.531-549, 2022

  29. arXiv:2203.13812

    cs.CV

    Spatially Multi-conditional Image Generation

    Authors: Ritika Chakraborty, Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool

    Abstract: In most scenarios, conditional image generation can be thought of as an inversion of the image understanding process. Since generic image understanding involves solving multiple tasks, it is natural to aim at generating images via multi-conditioning. However, multi-conditional image generation is a very challenging problem due to the heterogeneity and the sparsity of the (in practice) available co…

    Submitted 14 July, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

  30. arXiv:2203.13591

    cs.CV

    Continual Test-Time Domain Adaptation

    Authors: Qin Wang, Olga Fink, Luc Van Gool, Dengxin Dai

    Abstract: Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data. Existing works mainly consider the case where the target domain is static. However, real-world machine perception systems are running in non-stationary and continually changing environments where the target domain distribution can change over time. Existing methods, which are most…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  31. arXiv:2203.13278

    cs.CV cs.GR eess.IV

    Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis

    Authors: Kai Zhang, Yawei Li, Jingyun Liang, Jiezhang Cao, Yulun Zhang, Hao Tang, Deng-Ping Fan, Radu Timofte, Luc Van Gool

    Abstract: While recent years have witnessed a dramatic upsurge of exploiting deep neural networks toward solving image denoising, existing methods mostly rely on simple noise assumptions, such as additive white Gaussian noise (AWGN), JPEG compression noise and camera sensor noise, and a general-purpose blind denoising method for real images remains unsolved. In this paper, we attempt to solve this problem f…

    Submitted 1 December, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Codes: https://github.com/cszn/SCUNet

    Journal ref: Machine Intelligence Research, 2023

  32. arXiv:2203.11192

    cs.CV

    Transforming Model Prediction for Tracking

    Authors: Christoph Mayer, Martin Danelljan, Goutam Bhat, Matthieu Paul, Danda Pani Paudel, Fisher Yu, Luc Van Gool

    Abstract: Optimization based tracking methods have been widely successful by integrating a target model prediction module, providing effective global reasoning by minimizing an objective function. While this inductive bias integrates valuable domain knowledge, it limits the expressivity of the tracking network. In this work, we therefore propose a tracker architecture employing a Transformer-based model pre…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022. The code and trained models are available at https://github.com/visionml/pytracking

  33. arXiv:2203.11191

    cs.CV

    Robust Visual Tracking by Segmentation

    Authors: Matthieu Paul, Martin Danelljan, Christoph Mayer, Luc Van Gool

    Abstract: Estimating the target extent poses a fundamental challenge in visual object tracking. Typically, trackers are box-centric and fully rely on a bounding box to define the target in the scene. In practice, objects often have complex shapes and are not aligned with the image axis. In these cases, bounding boxes do not provide an accurate description of the target and often contain a majority of backgr…

    Submitted 20 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted at ECCV 2022. Code and trained models are available at: https://github.com/visionml/pytracking

  34. arXiv:2203.10636

    cs.CV

    Transform your Smartphone into a DSLR Camera: Learning the ISP in the Wild

    Authors: Ardhendu Shekhar Tripathi, Martin Danelljan, Samarth Shukla, Radu Timofte, Luc Van Gool

    Abstract: We propose a trainable Image Signal Processing (ISP) framework that produces DSLR quality images given RAW images captured by a smartphone. To address the color misalignments between training image pairs, we employ a color-conditional ISP network and optimize a novel parametric color mapping between each input RAW and reference DSLR image. During inference, we predict the target color image by des…

    Submitted 12 July, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: Accepted at ECCV 2022

  35. arXiv:2203.08795

    cs.CV cs.LG

    Zero Pixel Directional Boundary by Vector Transform

    Authors: Edoardo Mello Rella, Ajad Chhatkuli, Yun Liu, Ender Konukoglu, Luc Van Gool

    Abstract: Boundaries are among the primary visual cues used by human and computer vision systems. One of the key problems in boundary detection is the label representation, which typically leads to class imbalance and, as a consequence, to thick boundaries that require non-differentiable post-processing steps to be thinned. In this paper, we re-interpret boundaries as 1-D surfaces and formulate a one-to-one v…

    Submitted 8 September, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Published at the Tenth International Conference on Learning Representations (ICLR 2022)

  36. arXiv:2203.08537

    cs.CV

    Scribble-Supervised LiDAR Semantic Segmentation

    Authors: Ozan Unal, Dengxin Dai, Luc Van Gool

    Abstract: Densely annotating LiDAR point clouds remains too expensive and time-consuming to keep up with the ever growing volume of data. While current literature focuses on fully-supervised performance, developing efficient methods that take advantage of realistic weak supervision has yet to be explored. In this paper, we propose using scribbles to annotate LiDAR point clouds and release ScribbleKITTI, th…

    Submitted 31 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022 (ORAL)

  37. arXiv:2203.06639

    cs.CV

    Revisiting Deep Semi-supervised Learning: An Empirical Distribution Alignment Framework and Its Generalization Bound

    Authors: Feiyu Wang, Qin Wang, Wen Li, Dong Xu, Luc Van Gool

    Abstract: In this work, we revisit the semi-supervised learning (SSL) problem from a new perspective of explicitly reducing empirical distribution mismatch between labeled and unlabeled samples. Benefiting from this new perspective, we first propose a new deep semi-supervised learning framework called Semi-supervised Learning by Empirical Distribution Alignment (SLEDA), in which existing technologies from th…

    Submitted 13 March, 2022; originally announced March 2022.

    Comments: Submitted to T-PAMI in August 2021

  38. arXiv:2203.04845

    cs.CV

    Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction

    Authors: Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: Many algorithms have been developed to solve the inverse problem of coded aperture snapshot spectral imaging (CASSI), i.e., recovering the 3D hyperspectral images (HSIs) from a 2D compressive measurement. In recent years, learning-based methods have demonstrated promising performance and dominated the mainstream research direction. However, existing CNN-based methods show limitations in capturing…

    Submitted 10 July, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: ECCV 2022

  39. arXiv:2203.04279

    cs.CV

    Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences

    Authors: Prune Truong, Martin Danelljan, Fisher Yu, Luc Van Gool

    Abstract: We propose Probabilistic Warp Consistency, a weakly-supervised learning objective for semantic matching. Our approach directly supervises the dense matching scores predicted by the network, encoded as a conditional probability distribution. We first construct an image triplet by applying a known warp to one of the images in a pair depicting different instances of the same object class. Our probabi…

    Submitted 31 October, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022. Code: https://github.com/PruneTruong/DenseMatching

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  40. arXiv:2203.03727

    cs.CV

    Barlow constrained optimization for Visual Question Answering

    Authors: Abhishek Jha, Badri N. Patro, Luc Van Gool, Tinne Tuytelaars

    Abstract: Visual question answering is a vision-and-language multimodal task that aims at predicting answers given samples from the question and image modalities. Most recent methods focus on learning a good joint embedding space of images and questions, either by improving the interaction between these two modalities, or by making it a more discriminant space. However, how informative this joint space is,…

    Submitted 7 March, 2022; originally announced March 2022.

  41. arXiv:2203.03610  [pdf, other]

    cs.CV cs.LG cs.RO

    ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization

    Authors: Menelaos Kanakis, Simon Maurer, Matteo Spallanzani, Ajad Chhatkuli, Luc Van Gool

    Abstract: Efficient detection and description of geometric regions in images is a prerequisite in visual systems for localization and mapping. Such systems still rely on traditional hand-crafted methods for efficient generation of lightweight descriptors, a common limitation of the more powerful neural network models that come with high compute and specific hardware requirements. In this paper, we focus on…

    Submitted 8 April, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: Computer Vision and Pattern Recognition Workshop (CVPRW), 2023

  42. arXiv:2203.03041  [pdf, other]

    cs.CV

    Highly Accurate Dichotomous Image Segmentation

    Authors: Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, Luc Van Gool

    Abstract: We present a systematic study on a new task called dichotomous image segmentation (DIS), which aims to segment highly accurate objects from natural images. To this end, we collected the first large-scale DIS dataset, called DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images covering camouflaged, salient, or meticulous objects in various backgrounds. DIS is annotated with…

    Submitted 15 July, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: 29 pages, 18 figures, ECCV 2022

  43. arXiv:2203.02149  [pdf, other]

    eess.IV cs.CV

    HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging

    Authors: Xiaowan Hu, Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: The rapid development of deep learning provides a better solution for the end-to-end reconstruction of hyperspectral image (HSI). However, existing learning-based methods have two major defects. Firstly, networks with self-attention usually sacrifice internal resolution to balance model performance against complexity, losing fine-grained high-resolution (HR) features. Secondly, even if the optimiz…

    Submitted 16 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  44. arXiv:2202.13162  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

    Authors: Shengqu Cai, Anton Obukhov, Dengxin Dai, Luc Van Gool

    Abstract: We propose a pipeline to generate Neural Radiance Fields (NeRF) of an object or a scene of a specific class, conditioned on a single input image. This is a challenging task, as training NeRF requires multiple views of the same scene, coupled with corresponding poses, which are hard to obtain. Our method is based on π-GAN, a generative model for unconditional 3D-aware image synthesis, which maps…

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: 16 pages, 10 figures

  45. arXiv:2202.13071  [pdf, other]

    cs.CV

    Uncertainty-Aware Deep Multi-View Photometric Stereo

    Authors: Berk Kaya, Suryansh Kumar, Carlos Oliveira, Vittorio Ferrari, Luc Van Gool

    Abstract: This paper presents a simple and effective solution to the longstanding classical multi-view photometric stereo (MVPS) problem. It is well-known that photometric stereo (PS) is excellent at recovering high-frequency surface details, whereas multi-view stereo (MVS) can help remove the low-frequency distortion due to PS and retain the global geometry of the shape. This paper proposes an approach tha…

    Submitted 28 March, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: Accepted for publication in IEEE/CVF CVPR 2022. (11 Pages, 6 Figures, 3 Tables)

  46. arXiv:2202.08837  [pdf, other]

    cs.CV cs.AI cs.LG

    Adiabatic Quantum Computing for Multi Object Tracking

    Authors: Jan-Nico Zaech, Alexander Liniger, Martin Danelljan, Dengxin Dai, Luc Van Gool

    Abstract: Multi-Object Tracking (MOT) is most often approached in the tracking-by-detection paradigm, where object detections are associated through time. The association step naturally leads to discrete optimization problems. As these optimization problems are often NP-hard, they can only be solved exactly for small instances on current hardware. Adiabatic quantum computing (AQC) offers a solution for this…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 16 Pages

  47. arXiv:2202.01731  [pdf, other]

    eess.IV cs.CV

    Fast Online Video Super-Resolution with Deformable Attention Pyramid

    Authors: Dario Fuoli, Martin Danelljan, Radu Timofte, Luc Van Gool

    Abstract: Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV. We address the VSR problem under these settings, which poses additional important challenges since information from future frames is unavailable. Importantly, designing efficient yet effective frame alignment and fusion modules remains a central problem.…

    Submitted 6 April, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

  48. arXiv:2201.12288  [pdf, other]

    cs.CV eess.IV

    VRT: A Video Restoration Transformer

    Authors: Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool

    Abstract: Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Different from single image restoration, video restoration generally requires utilizing temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this by exploiting a sliding window strategy or a recurrent architecture, wh…

    Submitted 15 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Added results on VFI and STVSR; SOTA results (up to +2.16 dB) on video SR, video deblurring, video denoising, video frame interpolation and space-time video super-resolution. Code: https://github.com/JingyunLiang/VRT

  49. arXiv:2201.11279  [pdf, other]

    cs.CV

    Revisiting RCAN: Improved Training for Image Super-Resolution

    Authors: Zudi Lin, Prateek Garg, Atmadeep Banerjee, Salma Abdel Magid, Deqing Sun, Yulun Zhang, Luc Van Gool, Donglai Wei, Hanspeter Pfister

    Abstract: Image super-resolution (SR) is a fast-moving field with novel architectures attracting the spotlight. However, most SR models were optimized with dated training strategies. In this work, we revisit the popular RCAN model and examine the effect of different training options in SR. Surprisingly (or perhaps as expected), we show that RCAN can outperform or match nearly all the CNN-based SR architectu…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: 13 pages with 10 tables and 4 figures

  50. arXiv:2201.09865  [pdf, other]

    cs.CV

    RePaint: Inpainting using Denoising Diffusion Probabilistic Models

    Authors: Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool

    Abstract: Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of sem…

    Submitted 31 August, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: We missed out on other diffusion models that work on inpainting. We corrected that and apologize for this mistake.