
Showing 201–250 of 505 results for author: van Gool, L

  1. arXiv:2201.06578

    cs.CV cs.AI

    Collapse by Conditioning: Training Class-conditional GANs with Limited Data

    Authors: Mohamad Shahbazi, Martin Danelljan, Danda Pani Paudel, Luc Van Gool

    Abstract: Class-conditioning offers a direct means to control a Generative Adversarial Network (GAN) based on a discrete input variable. While necessary in many applications, the additional information provided by the class labels could even be expected to benefit the training of the GAN itself. On the contrary, we observe that class-conditioning causes mode collapse in limited data settings, where uncondit…

    Submitted 16 March, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

  2. End-To-End Optimization of LiDAR Beam Configuration for 3D Object Detection and Localization

    Authors: Niclas Vödisch, Ozan Unal, Ke Li, Luc Van Gool, Dengxin Dai

    Abstract: Existing learning methods for LiDAR-based applications use 3D points scanned under a pre-determined beam configuration, e.g., the elevation angles of beams are often evenly distributed. Those fixed configurations are task-agnostic, so simply using them can lead to sub-optimal performance. In this work, we take a new route to learn to optimize the LiDAR beam configuration for a given application. S…

    Submitted 28 March, 2023; v1 submitted 11 January, 2022; originally announced January 2022.

    Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2242-2249, April 2022

  3. arXiv:2201.01893

    eess.IV cs.CV

    Flow-Guided Sparse Transformer for Video Deblurring

    Authors: Jing Lin, Yuanhao Cai, Xiaowan Hu, Haoqian Wang, Youliang Yan, Xueyi Zou, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: Exploiting similar and sharper scene patches in spatio-temporal neighborhoods is critical for video deblurring. However, CNN-based methods show limitations in capturing long-range dependencies and modeling non-local self-similarity. In this paper, we propose a novel framework, Flow-Guided Sparse Transformer (FGST), for video deblurring. In FGST, we customize a self-attention module, Flow-Guided Sp…

    Submitted 29 May, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: ICML 2022; The First Transformer-based method for Video Deblurring

  4. arXiv:2201.01046

    cs.CV cs.MM

    Sound and Visual Representation Learning with Multiple Pretraining Tasks

    Authors: Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

    Abstract: Different self-supervised tasks (SSL) reveal different features from the data. The learned feature representations can exhibit different performance for each downstream task. In this light, this work aims to combine Multiple SSL tasks (Multi-SSL) that generalizes well for all downstream tasks. Specifically, for this study, we investigate binaural sounds and image data in isolation. For binaural so…

    Submitted 4 January, 2022; originally announced January 2022.

    Comments: 11 pages, 3 figures

  5. Facial-Sketch Synthesis: A New Challenge

    Authors: Deng-Ping Fan, Ziling Huang, Peng Zheng, Hong Liu, Xuebin Qin, Luc Van Gool

    Abstract: This paper aims to conduct a comprehensive study on facial-sketch synthesis (FSS). However, due to the high costs of obtaining hand-drawn sketch datasets, there lacks a complete benchmark for assessing the development of FSS algorithms over the last decade. We first introduce a high-quality dataset for FSS, named FS2K, which consists of 2,104 image-sketch pairs spanning three types of sketch style…

    Submitted 11 July, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

    Comments: Accepted to Machine Intelligence Research (MIR)

  6. arXiv:2112.15111

    cs.CV

    Improving the Behaviour of Vision Transformers with Token-consistent Stochastic Layers

    Authors: Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool

    Abstract: We introduce token-consistent stochastic layers in vision transformers, without causing any severe drop in performance. The added stochasticity improves network calibration, robustness and strengthens privacy. We use linear layers with token-consistent stochastic parameters inside the multilayer perceptron blocks, without altering the architecture of the transformer. The stochastic parameters are…

    Submitted 14 July, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

    Comments: This article is under consideration at the Computer Vision and Image Understanding journal

  7. arXiv:2112.10196

    cs.CV

    End-to-End Learning of Multi-category 3D Pose and Shape Estimation

    Authors: Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool

    Abstract: In this paper, we study the representation of the shape and pose of objects using their keypoints. Therefore, we propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D. The proposed method learns both 2D detection and 3D lifting only from 2D keypoints annotations. In addition to being end-to-end from images to 3D keypoints, our method also handles…

    Submitted 9 March, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

  8. arXiv:2112.10155

    cs.CV

    Topology Preserving Local Road Network Estimation from Single Onboard Camera Image

    Authors: Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool

    Abstract: Knowledge of the road network topology is crucial for autonomous planning and navigation. Yet, recovering such topology from a single image has only been explored in part. Furthermore, it needs to refer to the ground plane, where also the driving actions are taken. This paper aims at extracting the local road network topology, directly in the bird's-eye-view (BEV), all in a complex urban setting.…

    Submitted 30 March, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

    Comments: CVPR 2022

  9. arXiv:2112.09686

    cs.CV

    Efficient Visual Tracking with Exemplar Transformers

    Authors: Philippe Blatter, Menelaos Kanakis, Martin Danelljan, Luc Van Gool

    Abstract: The design of more complex and powerful neural network models has significantly advanced the state-of-the-art in visual object tracking. These advances can be attributed to deeper networks, or the introduction of new building blocks, such as transformers. However, in the pursuit of increased tracking performance, runtime is often hindered. Furthermore, efficient tracking architectures have receive…

    Submitted 4 October, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  10. arXiv:2112.04267

    eess.IV cs.CV cs.LG

    Implicit Neural Representations for Image Compression

    Authors: Yannick Strümpler, Janis Postels, Ren Yang, Luc van Gool, Federico Tombari

    Abstract: Recently Implicit Neural Representations (INRs) gained attention as a novel and effective representation for various data types. Thus far, prior work mostly focused on optimizing their reconstruction performance. This work investigates INRs from a novel perspective, i.e., as a tool for image compression. To this end, we propose the first comprehensive compression pipeline based on INRs including q…

    Submitted 3 August, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

  11. Configuration Space Exploration for Digital Printing Systems

    Authors: Jasper Denkers, Marvin Brunner, Louis van Gool, Eelco Visser

    Abstract: Within the printing industry, much of the variety in printed applications comes from the variety in finishing. Finishing comprises the processing of sheets of paper after being printed, e.g. to form books. The configuration space of finishers, i.e. all possible configurations given the available features and hardware capabilities, are large. Current control software minimally assists operators in…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: 24 pages, 11 figures. This is an extended version of https://link.springer.com/chapter/10.1007/978-3-030-92124-8_24

    Journal ref: Calinescu R., Păsăreanu C.S. (eds) Software Engineering and Formal Methods. SEFM 2021. Lecture Notes in Computer Science, vol 13085. Springer, Cham

  12. Event-Based Fusion for Motion Deblurring with Cross-modal Attention

    Authors: Lei Sun, Christos Sakaridis, Jingyun Liang, Qi Jiang, Kailun Yang, Peng Sun, Yaozu Ye, Kaiwei Wang, Luc Van Gool

    Abstract: Traditional frame-based cameras inevitably suffer from motion blur due to long exposure times. As a kind of bio-inspired camera, the event camera records the intensity changes in an asynchronous way with high temporal resolution, providing valid image degradation information within the exposure time. In this paper, we rethink the eventbased image deblurring problem and unfold it into an end-to-end…

    Submitted 11 January, 2023; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: Accepted by ECCV 2022 as oral presentation

  13. arXiv:2111.14887

    cs.CV

    DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

    Authors: Lukas Hoyer, Dengxin Dai, Luc Van Gool

    Abstract: As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in unsupervised domain adaptation (UDA). Even though a large number of methods propose new adaptation strategies, they are mostly based on ou…

    Submitted 29 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: CVPR 2022

  14. arXiv:2111.14673

    cs.CV

    3D Compositional Zero-shot Learning with DeCompositional Consensus

    Authors: Muhammad Ferjad Naeem, Evin Pınar Örnek, Yongqin Xian, Luc Van Gool, Federico Tombari

    Abstract: Parts represent a basic unit of geometric and semantic similarity across different objects. We argue that part knowledge should be composable beyond the observed object classes. Towards this, we present 3D Compositional Zero-shot Learning as a problem of part generalization from seen to unseen object classes for semantic segmentation. We provide a structured study through benchmarking the task wit…

    Submitted 15 April, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

  15. arXiv:2111.13333

    cs.CV

    Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

    Authors: Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc Van Gool, Errui Ding

    Abstract: To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to a pre-defined set the models were trained for. We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety o…

    Submitted 24 March, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: To appear in CVPR 2022

  16. arXiv:2111.12707

    cs.CV cs.AI cs.LG

    MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

    Authors: Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc Van Gool

    Abstract: Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Tra…

    Submitted 28 June, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: Accepted by CVPR 2022. Open Sourced

  17. arXiv:2111.10346

    cs.CV

    Global and Local Alignment Networks for Unpaired Image-to-Image Translation

    Authors: Guanglei Yang, Hao Tang, Humphrey Shi, Mingli Ding, Nicu Sebe, Radu Timofte, Luc Van Gool, Elisa Ricci

    Abstract: The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style while keeping unrelated contents of the input source image unchanged. However, due to the lack of attention to the content change in existing methods, the semantic information from source images suffers from degradation during translation. In the paper, to address this issue, we intro…

    Submitted 19 November, 2021; originally announced November 2021.

  18. arXiv:2111.07910

    eess.IV cs.CV

    Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

    Authors: Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement in the coded aperture snapshot spectral imaging (CASSI) system. The HSI representations are highly similar and correlated across the spectral dimension. Modeling the inter-spectra interactions is beneficial for HSI reconstruction. However, existing CNN-based methods show limitations in cap…

    Submitted 21 March, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: CVPR 2022; The first Transformer-based method for snapshot compressive imaging

  19. arXiv:2111.03649

    cs.CV eess.IV

    Normalizing Flow as a Flexible Fidelity Objective for Photo-Realistic Super-resolution

    Authors: Andreas Lugmayr, Martin Danelljan, Fisher Yu, Luc Van Gool, Radu Timofte

    Abstract: Super-resolution is an ill-posed problem, where a ground-truth high-resolution image represents only one possibility in the space of plausible solutions. Yet, the dominant paradigm is to employ pixel-wise losses, such as L_1, which drive the prediction towards a blurry average. This leads to fundamentally conflicting objectives when combined with adversarial losses, which degrades the final qualit…

    Submitted 5 November, 2021; originally announced November 2021.

    Journal ref: WACV 2022

  20. arXiv:2110.05621

    cs.CV

    Neural Architecture Search for Efficient Uncalibrated Deep Photometric Stereo

    Authors: Francesco Sarno, Suryansh Kumar, Berk Kaya, Zhiwu Huang, Vittorio Ferrari, Luc Van Gool

    Abstract: We present an automated machine learning approach for uncalibrated photometric stereo (PS). Our work aims at discovering lightweight and computationally efficient PS neural networks with excellent surface normal accuracy. Unlike previous uncalibrated deep PS networks, which are handcrafted and carefully tuned, we leverage differentiable neural architecture search (NAS) strategy to find uncalibrate…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at IEEE/CVF, WACV 2022. (11 pages)

  21. arXiv:2110.05594

    cs.CV

    Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo

    Authors: Berk Kaya, Suryansh Kumar, Francesco Sarno, Vittorio Ferrari, Luc Van Gool

    Abstract: We present a modern solution to the multi-view photometric stereo problem (MVPS). Our work suitably exploits the image formation model in a MVPS experimental setup to recover the dense 3D reconstruction of an object from images. We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at IEEE/CVF WACV 2022. 18 pages

  22. arXiv:2110.01997

    cs.CV

    Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images

    Authors: Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool

    Abstract: Autonomous navigation requires structured representation of the road network and instance-wise identification of the other traffic agents. Since the traffic scene is defined on the ground plane, this corresponds to scene understanding in the bird's-eye-view (BEV). However, the onboard cameras of autonomous cars are customarily mounted horizontally for a better view of the surrounding, making this…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: ICCV 2021

  23. arXiv:2110.00464

    cs.CV cs.AI cs.RO

    MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation

    Authors: Jonas Heylen, Mark De Wolf, Bruno Dawagne, Marc Proesmans, Luc Van Gool, Wim Abbeloos, Hazem Abdelkawy, Daniel Olmeda Reino

    Abstract: Monocular 3D object detection has recently shown promising results, however there remain challenging problems. One of those is the lack of invariance to different camera intrinsic parameters, which can be observed across different 3D object datasets. Little effort has been made to exploit the combination of heterogeneous 3D object datasets. In contrast to general intuition, we show that more data…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: Accepted to ICCV2021 Workshop on 3D Object Detection from Images

  24. arXiv:2109.13912

    cs.CV

    PDC-Net+: Enhanced Probabilistic Dense Correspondence Network

    Authors: Prune Truong, Martin Danelljan, Radu Timofte, Luc Van Gool

    Abstract: Establishing robust and accurate correspondences between a pair of images is a long-standing computer vision problem with numerous applications. While classically dominated by sparse methods, emerging dense approaches offer a compelling alternative paradigm that avoids the keypoint detection step. However, dense flow estimation is often inaccurate in the case of large displacements, occlusions, or…

    Submitted 29 September, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: Code: https://github.com/PruneTruong/DenseMatching. Paper extension of PDC-Net. arXiv admin note: substantial text overlap with arXiv:2101.01710

  25. arXiv:2109.07854

    cs.CV

    Context-aware Padding for Semantic Segmentation

    Authors: Yu-Hui Huang, Marc Proesmans, Luc Van Gool

    Abstract: Zero padding is widely used in convolutional neural networks to prevent the size of feature maps diminishing too fast. However, it has been claimed to disturb the statistics at the border. As an alternative, we propose a context-aware (CA) padding approach to extend the image. We reformulate the padding problem as an image extrapolation problem and illustrate the effects on the semantic segmentati…

    Submitted 16 September, 2021; originally announced September 2021.

  26. arXiv:2109.04813

    cs.CV

    TACS: Taxonomy Adaptive Cross-Domain Semantic Segmentation

    Authors: Rui Gong, Martin Danelljan, Dengxin Dai, Danda Pani Paudel, Ajad Chhatkuli, Fisher Yu, Luc Van Gool

    Abstract: Traditional domain adaptive semantic segmentation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, different datasets are often labeled according to different semantic taxonomies. In many r…

    Submitted 28 July, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted by ECCV 2022

  27. arXiv:2109.03082

    eess.IV cs.CV

    Perceptual Learned Video Compression with Recurrent Conditional GAN

    Authors: Ren Yang, Radu Timofte, Luc Van Gool

    Abstract: This paper proposes a Perceptual Learned Video Compression (PLVC) approach with recurrent conditional GAN. We employ the recurrent auto-encoder-based compression network as the generator, and most importantly, we propose a recurrent conditional discriminator, which judges on raw vs. compressed video conditioned on both spatial and temporal features, including the latent representation, temporal mo…

    Submitted 30 April, 2022; v1 submitted 7 September, 2021; originally announced September 2021.

    Comments: IJCAI 2022 camera ready

  28. arXiv:2109.02763

    cs.SD cs.CV eess.AS

    Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds

    Authors: Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool

    Abstract: Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene understanding purely based on binaural sounds. The considered tasks include predicting the semantic masks of sound-making objects, the motion of sound-making objects, a…

    Submitted 27 February, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted by TPAMI. arXiv admin note: substantial text overlap with arXiv:2003.04210

  29. arXiv:2108.12545

    cs.CV

    Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

    Authors: Lukas Hoyer, Dengxin Dai, Qin Wang, Yuhua Chen, Luc Van Gool

    Abstract: Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised and domain-adaptive semantic segmentation, which is enhanced by self-supervised monocular depth estimation (SDE) trained o…

    Submitted 27 August, 2021; originally announced August 2021.

    Comments: arXiv admin note: text overlap with arXiv:2012.10782

  30. arXiv:2108.11505

    eess.IV cs.CV cs.LG

    Generalized Real-World Super-Resolution through Adversarial Robustness

    Authors: Angela Castillo, María Escobar, Juan C. Pérez, Andrés Romero, Radu Timofte, Luc Van Gool, Pablo Arbeláez

    Abstract: Real-world Super-Resolution (SR) has been traditionally tackled by first learning a specific degradation model that resembles the noise and corruption artifacts in low-resolution imagery. Thus, current methods lack generalization and lose their accuracy when tested on unseen types of corruption. In contrast to the traditional proposal, we present Robust Super-Resolution (RSR), a method that levera…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: ICCV Workshops, 2021

  31. arXiv:2108.10257

    eess.IV cs.CV

    SwinIR: Image Restoration Using Swin Transformer

    Authors: Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, Radu Timofte

    Abstract: Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propo…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: SOTA results on classical/lightweight/real-world image SR, image denoising and JPEG compression artifact reduction. Code: https://github.com/JingyunLiang/SwinIR

  32. arXiv:2108.08286

    eess.IV cs.CV

    Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

    Authors: Goutam Bhat, Martin Danelljan, Fisher Yu, Luc Van Gool, Radu Timofte

    Abstract: We propose a deep reparametrization of the maximum a posteriori formulation commonly employed in multi-frame image restoration tasks. Our approach is derived by introducing a learned error metric and a latent representation of the target image, which transforms the MAP objective to a deep feature space. The deep reparametrization allows us to directly model the image formation process in the laten…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 Oral

  33. arXiv:2108.08265

    cs.CV cs.RO

    End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

    Authors: Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, Luc Van Gool

    Abstract: End-to-end approaches to autonomous driving commonly rely on expert demonstrations. Although humans are good drivers, they are not good coaches for end-to-end algorithms that demand dense on-policy supervision. On the contrary, automated experts that leverage privileged information can efficiently generate large scale on-policy and off-policy demonstrations. However, existing automated experts for…

    Submitted 4 October, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

    Comments: Published at ICCV 2021

  34. Decoder Fusion RNN: Context and Interaction Aware Decoders for Trajectory Prediction

    Authors: Edoardo Mello Rella, Jan-Nico Zaech, Alexander Liniger, Luc Van Gool

    Abstract: Forecasting the future behavior of all traffic agents in the vicinity is a key task to achieve safe and reliable autonomous driving systems. It is a challenging problem as agents adjust their behavior depending on their intentions, the others' actions, and the road layout. In this paper, we propose Decoder Fusion RNN (DF-RNN), a recurrent, attention-based approach for motion forecasting. Our netwo…

    Submitted 12 August, 2021; originally announced August 2021.

    ACM Class: I.2.9; J.7

  35. arXiv:2108.05302

    cs.CV

    Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution

    Authors: Jingyun Liang, Guolei Sun, Kai Zhang, Luc Van Gool, Radu Timofte

    Abstract: Existing blind image super-resolution (SR) methods mostly assume blur kernels are spatially invariant across the whole image. However, such an assumption is rarely applicable for real images whose blur kernels are usually spatially variant due to factors such as object motion and out-of-focus. Hence, existing blind SR methods would inevitably give rise to poor performance in real applications. To…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV2021. Code: https://github.com/JingyunLiang/MANet

  36. arXiv:2108.05301

    eess.IV cs.CV

    Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling

    Authors: Jingyun Liang, Andreas Lugmayr, Kai Zhang, Martin Danelljan, Luc Van Gool, Radu Timofte

    Abstract: Normalizing flows have recently demonstrated promising results for low-level vision tasks. For image super-resolution (SR), it learns to predict diverse photo-realistic high-resolution (HR) images from the low-resolution (LR) image rather than learning a deterministic mapping. For image rescaling, it achieves high accuracy by jointly modelling the downscaling and upscaling processes. While existin…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV2021. Code: https://github.com/JingyunLiang/HCFlow

  37. arXiv:2108.05249

    cs.CV cs.LG

    Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather

    Authors: Martin Hahner, Christos Sakaridis, Dengxin Dai, Luc Van Gool

    Abstract: This work addresses the challenging task of LiDAR-based 3D object detection in foggy weather. Collecting and annotating data in such a scenario is very time, labor and cost intensive. In this paper, we tackle this problem by simulating physically accurate fog into clear-weather scenes, so that the abundant existing real datasets captured in clear weather can be repurposed for our task. Our contrib…

    Submitted 16 August, 2021; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: Camera-Ready Version for ICCV 2021

  38. arXiv:2108.05246

    cs.CV

    A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes

    Authors: Davide Menini, Suryansh Kumar, Martin R. Oswald, Erik Sandstrom, Cristian Sminchisescu, Luc Van Gool

    Abstract: This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic label. Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed deep neural network based approach learns to fuse the depth over frames with suitable semantic labels in the scene space. Our approach exploits the joint volumetric representatio…

    Submitted 28 December, 2021; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: Accepted for publication at IEEE Robotics and Automation Letters (RA-L), 2022. Draft info: 9 pages, 5 figures, 4 tables

  39. arXiv:2108.02266

    cs.CV

    Boosting Few-shot Semantic Segmentation with Transformers

    Authors: Guolei Sun, Yun Liu, Jingyun Liang, Luc Van Gool

    Abstract: Due to the fact that fully supervised semantic segmentation methods require sufficient fully-labeled data to work well and can not generalize to unseen classes, few-shot segmentation has attracted lots of research attention. Previous arts extract features from support and query images, which are processed jointly before making predictions on query images. The whole process is based on convolutiona…

    Submitted 4 August, 2021; originally announced August 2021.

    Comments: Technical report. Code and pretrained models will be available: https://github.com/GuoleiSun/TRFS

  40. arXiv:2107.01153

    cs.CV

    A Survey on Deep Learning Technique for Video Segmentation

    Authors: Tianfei Zhou, Fatih Porikli, David Crandall, Luc Van Gool, Wenguan Wang

    Abstract: Video segmentation -- partitioning video frames into multiple segments or objects -- plays a critical role in a broad range of practical applications, from enhancing visual effects in movie, to understanding scenes in autonomous driving, to creating virtual background in video conferencing. Recently, with the renaissance of connectionism in computer vision, there has been an influx of deep learnin…

    Submitted 29 November, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: Accepted by TPAMI. Website: https://github.com/tfzhou/VS-Survey

  41. arXiv:2107.00649

    cs.CV

    On the Practicality of Deterministic Epistemic Uncertainty

    Authors: Janis Postels, Mattia Segu, Tao Sun, Luca Sieber, Luc Van Gool, Fisher Yu, Federico Tombari

    Abstract: A set of novel approaches for estimating epistemic uncertainty in deep neural networks with a single forward pass has recently emerged as a valid alternative to Bayesian Neural Networks. On the premise of informative representations, these deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution (OOD) data while adding negligible computational costs at i…

    Submitted 5 July, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: International Conference on Machine Learning 2022

  42. arXiv:2106.06847

    cs.CV

    Video Super-Resolution Transformer

    Authors: Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool

    Abstract: Video super-resolution (VSR), with the aim to restore a high-resolution video from its corresponding low-resolution version, is a spatial-temporal sequence prediction problem. Recently, Transformer has been gaining popularity due to its parallel computing ability for sequence-to-sequence modeling. Thus, it seems to be straightforward to apply the vision Transformer to solve VSR. However, the typic…

    Submitted 4 July, 2023; v1 submitted 12 June, 2021; originally announced June 2021.

  43. arXiv:2106.05967

    cs.CV cs.LG

    Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

    Authors: Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool

    Abstract: Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. However, current methods are still primarily applied to curated datasets like ImageNet. In this paper, we first study how biases in the dataset affect existing methods. Our results show that current contrastive approaches work surprisingly well across: (i) o…

    Submitted 14 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021. Code: https://github.com/wvangansbeke/Revisiting-Contrastive-SSL

  44. arXiv:2106.03959

    cs.LG cs.CV

    Generative Flows with Invertible Attentions

    Authors: Rhea Sanjay Sukthanker, Zhiwu Huang, Suryansh Kumar, Radu Timofte, Luc Van Gool

    Abstract: Flow-based generative models have shown an excellent ability to explicitly learn the probability density function of data via a sequence of invertible transformations. Yet, learning attentions in generative flows remains understudied, while it has made breakthroughs in other domains. To fill the gap, this paper introduces two types of invertible attention mechanisms, i.e., map-based and transforme…

    Submitted 31 March, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2022

  45. arXiv:2106.03180

    cs.CV

    Vision Transformers with Hierarchical Attention

    Authors: Yun Liu, Yu-Huan Wu, Guolei Sun, Le Zhang, Ajad Chhatkuli, Luc Van Gool

    Abstract: This paper tackles the high computational/space complexity associated with Multi-Head Self-Attention (MHSA) in vanilla vision transformers. To this end, we propose Hierarchical MHSA (H-MHSA), a novel approach that computes self-attention in a hierarchical fashion. Specifically, we first divide the input image into patches as commonly done, and each patch is viewed as a token. Then, the proposed H-…

    Submitted 26 March, 2024; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: Machine Intelligence Research (MIR), DOI: 10.1007/s11633-024-1393-8

  46. arXiv:2106.03135

    cs.CV

    Go with the Flows: Mixtures of Normalizing Flows for Point Cloud Generation and Reconstruction

    Authors: Janis Postels, Mengya Liu, Riccardo Spezialetti, Luc Van Gool, Federico Tombari

    Abstract: Recently normalizing flows (NFs) have demonstrated state-of-the-art performance on modeling 3D point clouds while allowing sampling with arbitrary resolution at inference time. However, these flow-based models still require long training times and large models for representing complicated geometries. This work enhances their representational power by applying mixtures of NFs to point clouds. We sh…

    Submitted 29 November, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Journal ref: International Conference on 3D Vision 2021

  47. arXiv:2106.00783

    eess.IV cs.CV

    Fourier Space Losses for Efficient Perceptual Image Super-Resolution

    Authors: Dario Fuoli, Luc Van Gool, Radu Timofte

    Abstract: Many super-resolution (SR) models are optimized for high performance only and therefore lack efficiency due to large model complexity. As large models are often not practical in real-world applications, we investigate and propose novel loss functions, to enable SR with high perceptual quality from much more efficient models. The representative power for a given low-complexity generator network can…

    Submitted 1 June, 2021; originally announced June 2021.

  48. arXiv:2105.10926

    cs.CV

    Rethinking Global Context in Crowd Counting

    Authors: Guolei Sun, Yun Liu, Thomas Probst, Danda Pani Paudel, Nikola Popovic, Luc Van Gool

    Abstract: This paper investigates the role of global context for crowd counting. Specifically, a pure transformer is used to extract features with global information from overlapping image patches. Inspired by classification, we add a context token to the input sequence, to facilitate information exchange with tokens corresponding to image patches throughout transformer layers. Due to the fact that transfor…

    Submitted 25 November, 2023; v1 submitted 23 May, 2021; originally announced May 2021.

    Comments: Accepted by Machine Intelligence Research (MIR)

    DOI: 10.1007/s11633-023-1475-z

  49. arXiv:2105.08463

    cs.CV cs.LG

    Unsupervised Compound Domain Adaptation for Face Anti-Spoofing

    Authors: Ankush Panwar, Pratyush Singh, Suman Saha, Danda Pani Paudel, Luc Van Gool

    Abstract: We address the problem of face anti-spoofing which aims to make the face verification systems robust in the real world settings. The context of detecting live vs. spoofed face images may differ significantly in the target domain, when compared to that of labeled source domain where the model is trained. Such difference may be caused due to new and unknown spoof types, illumination conditions, scen…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: 9 pages, 6 figures

  50. arXiv:2105.07830

    cs.CV cs.LG

    Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation

    Authors: Suman Saha, Anton Obukhov, Danda Pani Paudel, Menelaos Kanakis, Yuhua Chen, Stamatios Georgoulis, Luc Van Gool

    Abstract: We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting. Semantic segmentation and monocular depth estimation are shown to be complementary tasks; in a multi-task learning setting, a proper encoding of their relationships can further improve performance on both tasks. Motivated by this observation, we propose a n…

    Submitted 3 July, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

    Comments: Accepted at CVPR 2021; updated results according to the released source code