Showing 1–50 of 190 results for author: Schiele, B

Searching in archive cs.
  1. arXiv:2407.00503  [pdf, other]

    cs.CV

    Toward a Diffusion-Based Generalist for Dense Vision Tasks

    Authors: Yue Fan, Yongqin Xian, Xiaohua Zhai, Alexander Kolesnikov, Muhammad Ferjad Naeem, Bernt Schiele, Federico Tombari

    Abstract: Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown image itself can be used as a natural interface for general-purpose visual perception and demonstrated inspiring results. In this paper, we explore diffusion-based vision generalists, where we unify different types of dense prediction tasks as conditional image g…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Published at CVPR 2024 as a workshop paper

  2. arXiv:2405.16517  [pdf, other]

    cs.CV

    Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors

    Authors: Soumava Paul, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen

    Abstract: We aim to tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM). The sparse-view setting is ill-posed and underconstrained, especially for scenes where the camera rotates 360 degrees around a point, as no visual information is available beyond some frontal views focused on the central object(s) of interest. In this work, we show that pretrained 2D diff…

    Submitted 2 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: 18 pages, 11 figures, 4 tables

  3. arXiv:2403.19811  [pdf, other]

    cs.CV

    X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

    Authors: Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma

    Abstract: Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition. However, the adaptation of these models to egocentric videos has been largely unexplored. To address this gap, we propose a simple yet effective cross-modal adaptation framework, which we call X-MIC. Using a video adapter, o…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  4. arXiv:2403.18550  [pdf, other]

    cs.CV

    OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning

    Authors: Noor Ahmed, Anna Kukleva, Bernt Schiele

    Abstract: Few-Shot Class-Incremental Learning (FSCIL) introduces a paradigm in which the problem space expands with limited data. FSCIL methods inherently face the challenge of catastrophic forgetting as data arrives incrementally, making models susceptible to overwriting previously acquired knowledge. Moreover, given the scarcity of labeled samples available at any given time, models may be prone to overfi…

    Submitted 27 March, 2024; originally announced March 2024.

  5. arXiv:2403.16292  [pdf, other]

    cs.CV

    latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

    Authors: Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen

    Abstract: We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D architecture. Existing methods for generalizable 3D reconstruction either do not enable fast inference of high resolution novel views due to slow volume rendering, or are limited to interpolation of close input views, even in simpler settings with a s…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Project website: https://geometric-rl.mpi-inf.mpg.de/latentsplat/

  6. arXiv:2403.09394  [pdf, other]

    cs.CV

    GiT: Towards Generalist Vision Transformer through Universal Language Interface

    Authors: Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang

    Abstract: This paper proposes a simple, yet effective framework, called GiT, simultaneously applicable for various vision tasks only with a vanilla ViT. Motivated by the universality of the Multi-layer Transformer architecture (e.g., GPT) widely used in large language models (LLMs), we seek to broaden its scope to serve as a powerful vision foundation model (VFM). However, unlike language modeling, visual ta…

    Submitted 14 March, 2024; originally announced March 2024.

  7. arXiv:2402.08400  [pdf, other]

    cs.LG cs.CV

    Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing

    Authors: Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz

    Abstract: Certification for machine learning is proving that no adversarial sample can evade a model within a range under certain conditions, a necessity for safety-critical domains. Common certification methods for segmentation use a flat set of fine-grained classes, leading to high abstain rates due to model uncertainty across many classes. We propose a novel, more practical setting, which certifies pixel…

    Submitted 3 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Journal ref: International Conference on Machine Learning (ICML), 2024

  8. arXiv:2402.03119  [pdf, other]

    cs.CV cs.AI cs.LG

    Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

    Authors: Amin Parchami-Araghi, Moritz Böhle, Sukrut Rao, Bernt Schiele

    Abstract: Knowledge Distillation (KD) has proven effective for compressing large teacher models into smaller student models. While it is well known that student models can achieve similar accuracies as the teachers, it has also been shown that they nonetheless often do not learn the same function. It is, however, often highly desirable that the student's and teacher's functions share similar properties such…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 21 pages, 12 figures

  9. arXiv:2401.01505  [pdf, other]

    cs.CV

    Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports

    Authors: Haopeng Li, Andong Deng, Qiuhong Ke, Jun Liu, Hossein Rahmani, Yulan Guo, Bernt Schiele, Chen Chen

    Abstract: Reasoning over sports videos for question answering is an important task with numerous applications, such as player training and information retrieval. However, this task has not been explored due to the lack of relevant datasets and the challenging nature it presents. Most datasets for video question answering (VideoQA) focus mainly on general and coarse-grained understanding of daily-life videos…

    Submitted 14 February, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  10. arXiv:2311.10572  [pdf, other]

    cs.CV cs.LG

    SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning

    Authors: Yue Fan, Anna Kukleva, Dengxin Dai, Bernt Schiele

    Abstract: Semi-supervised learning (SSL) methods effectively leverage unlabeled data to improve model generalization. However, SSL models often underperform in open-set scenarios, where unlabeled data contain outliers from novel categories that do not appear in the labeled set. In this paper, we study the challenging and realistic open-set SSL setting, where the goal is to both correctly classify inliers an…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Paper accepted in ICCV 2023

  11. arXiv:2310.16115  [pdf, other]

    cs.CV cs.LG

    Wakening Past Concepts without Past Data: Class-Incremental Learning from Online Placebos

    Authors: Yaoyao Liu, Yingying Li, Bernt Schiele, Qianru Sun

    Abstract: Not forgetting old class knowledge is a key challenge for class-incremental learning (CIL) when the model continuously adapts to new classes. A common technique to address this is knowledge distillation (KD), which penalizes prediction inconsistencies between old and new models. Such prediction is made with almost new class data, as old class data is extremely scarce due to the strict memory limit…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to WACV 2024. Code: https://github.com/yaoyao-liu/online-placebos

  12. arXiv:2310.04900  [pdf, other]

    cs.CV

    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

    Authors: Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

    Abstract: Instructional videos are an excellent source for learning multimodal representations by leveraging video-subtitle pairs extracted with automatic speech recognition systems (ASR) from the audio signal in the videos. However, in contrast to human-annotated captions, both speech and subtitles naturally differ from the visual content of the videos and thus provide only noisy supervision for multimodal…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: https://github.com/ninatu/howtocaption

  13. arXiv:2310.01926  [pdf, other]

    cs.CV cs.AI

    DARTH: Holistic Test-time Adaptation for Multiple Object Tracking

    Authors: Mattia Segu, Bernt Schiele, Fisher Yu

    Abstract: Multiple object tracking (MOT) is a fundamental component of perception systems for autonomous driving, and its robustness to unseen conditions is a requirement to avoid life-critical failures. Despite the urge of safety in driving systems, no solution to the MOT adaptation problem to domain shift in test-time conditions has ever been proposed. However, the nature of a MOT system is manifold - req…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Proceedings of the IEEE/CVF International Conference on Computer Vision

  14. arXiv:2309.09858  [pdf, other]

    cs.CV

    Unsupervised Open-Vocabulary Object Localization in Videos

    Authors: Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

    Abstract: In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization. We propose a method that first localizes objects in videos via an object-centric approach with slot attention and then assigns text to the obtained slots. The latter is achieved by an unsupervised way to…

    Submitted 26 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023; Presented at CVPR 2024 Workshop CORR; Project Page: https://github.com/amazon-science/object-centric-vol

  15. arXiv:2309.08928  [pdf, other]

    cs.CV

    In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

    Authors: Nina Shvetsova, Anna Kukleva, Bernt Schiele, Hilde Kuehne

    Abstract: Large-scale noisy web image-text datasets have been proven to be efficient for learning robust vision-language models. However, when transferring them to the task of video retrieval, models still need to be fine-tuned on hand-curated paired text-video data to adapt to the diverse styles of video descriptions. To address this problem without the need for hand-annotated pairs, we propose a new setti…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: Published at ICCV 2023, code: https://github.com/ninatu/in_style

  16. arXiv:2309.06166  [pdf, other]

    cs.LG cs.CV stat.ML

    Certified Robust Models with Slack Control and Large Lipschitz Constants

    Authors: Max Losch, David Stutz, Bernt Schiele, Mario Fritz

    Abstract: Despite recent success, state-of-the-art learning-based models remain highly vulnerable to input changes such as adversarial examples. In order to obtain certifiable robustness against such perturbations, recent work considers Lipschitz-based regularizers or constraints while at the same time increasing prediction margin. Unfortunately, this comes at the cost of significantly decreased accuracy. I…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: To be published at GCPR 2023

  17. arXiv:2309.03809  [pdf, other]

    cs.CV

    SimNP: Learning Self-Similarity Priors Between Neural Points

    Authors: Christopher Wewer, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen

    Abstract: Existing neural field representations for 3D object reconstruction either (1) utilize object-level representations, but suffer from low-quality details due to conditioning on a global latent code, or (2) are able to perfectly reconstruct the observations, but fail to utilize object-level prior knowledge to infer unobserved regions. We present SimNP, a method to learn category-level self-similariti…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  18. arXiv:2309.00233  [pdf, other]

    cs.CV

    Object-Centric Multiple Object Tracking

    Authors: Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

    Abstract: Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines. Unfortunately, they lack two key properties: objects are often split into parts and are not consistently tracked over time. In fact, state-of-the-art model…

    Submitted 5 September, 2023; v1 submitted 31 August, 2023; originally announced September 2023.

    Comments: ICCV 2023 camera-ready version

  19. arXiv:2308.07732  [pdf, other]

    cs.CV

    UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

    Authors: Haiyang Wang, Hao Tang, Shaoshuai Shi, Aoxue Li, Zhenguo Li, Bernt Schiele, Liwei Wang

    Abstract: Jointly processing information from multiple sensors is crucial to achieving accurate and robust perception for reliable autonomous driving systems. However, current 3D perception research follows a modality-specific paradigm, leading to additional computation overheads and inefficient collaboration between different sensor data. In this paper, we present an efficient multi-modal backbone for outd…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  20. arXiv:2306.17770  [pdf, other]

    cs.CV

    MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying

    Authors: Shaoshuai Shi, Li Jiang, Dengxin Dai, Bernt Schiele

    Abstract: Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions. However, this task is challenging due to the diverse behaviors of traffic participants and complex environmental contexts. In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges. The initial MTR framework utilizes a transformer encoder-…

    Submitted 9 March, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2024). The winning approaches for the Waymo Motion Prediction Challenge in 2022 and 2023

  21. arXiv:2306.10898  [pdf, other]

    cs.CV

    B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers

    Authors: Moritz Böhle, Navdeeppal Singh, Mario Fritz, Bernt Schiele

    Abstract: We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training. For this, we propose to replace the linear transformations in DNNs by our novel B-cos transformation. As we show, a sequence (network) of such transformations induces a single linear transformation that faithfully summarises the full model computations.…

    Submitted 15 January, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: Extension of B-cos Networks: Alignment is All We Need for Interpretability (Böhle et al., CVPR 2022). Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: substantial text overlap with arXiv:2205.10268

  22. arXiv:2305.13031  [pdf, other]

    cs.CV

    HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation

    Authors: Jian Ding, Nan Xue, Gui-Song Xia, Bernt Schiele, Dengxin Dai

    Abstract: Current semantic segmentation models have achieved great success under the independent and identically distributed (i.i.d.) condition. However, in real-world applications, test data might come from a different domain than training data. Therefore, it is important to improve model robustness against domain differences. This work studies semantic segmentation under the domain generalization setting,…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by CVPR 2023

  23. arXiv:2305.05026  [pdf, other]

    cs.CV

    Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding

    Authors: Li Jiang, Zetong Yang, Shaoshuai Shi, Vladislav Golyanik, Dengxin Dai, Bernt Schiele

    Abstract: Masked signal modeling has greatly advanced self-supervised pre-training for language and 2D images. However, it is still not fully explored in 3D scene understanding. Thus, this paper introduces Masked Shape Prediction (MSP), a new framework to conduct masked signal modeling in 3D scenes. MSP uses the essential 3D semantic cue, i.e., geometric shape, as the prediction target for masked points. Th…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: CVPR 2023

  24. arXiv:2304.03110  [pdf, other]

    cs.CV

    Continual Detection Transformer for Incremental Object Detection

    Authors: Yaoyao Liu, Bernt Schiele, Andrea Vedaldi, Christian Rupprecht

    Abstract: Incremental object detection (IOD) aims to train an object detector in phases, each with annotations for new object categories. As other incremental settings, IOD is subject to catastrophic forgetting, which is often addressed by techniques such as knowledge distillation (KD) and exemplar replay (ER). However, KD and ER do not work well if applied directly to state-of-the-art transformer-based obj…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2023

  25. arXiv:2303.14042  [pdf, other]

    cs.CV

    Class-Incremental Exemplar Compression for Class-Incremental Learning

    Authors: Zilin Luo, Yaoyao Liu, Bernt Schiele, Qianru Sun

    Abstract: Exemplar-based class-incremental learning (CIL) finetunes the model with all samples of new classes but few-shot exemplars of old classes in each incremental phase, where the "few-shot" abides by the limited memory budget. In this paper, we break this "few-shot" limit based on a simple yet surprisingly effective idea: compressing exemplars by downsampling non-discriminative pixels and saving "many…

    Submitted 7 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  26. arXiv:2303.13664  [pdf, other]

    cs.CV cs.LG

    Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data

    Authors: Anna Kukleva, Moritz Böhle, Bernt Schiele, Hilde Kuehne, Christian Rupprecht

    Abstract: Most approaches for self-supervised learning (SSL) are optimised on curated balanced datasets, e.g. ImageNet, despite the fact that natural data usually exhibits long-tail distributions. In this paper, we analyse the behaviour of one of the most popular variants of SSL, i.e. contrastive methods, on long-tail data. In particular, we investigate the role of the temperature parameter $τ$ in the contr…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: ICLR 2023

  27. arXiv:2303.11932  [pdf, other]

    cs.CV cs.AI cs.LG

    Using Explanations to Guide Models

    Authors: Sukrut Rao, Moritz Böhle, Amin Parchami-Araghi, Bernt Schiele

    Abstract: Deep neural networks are highly performant, but might base their decision on spurious or background features that co-occur with certain classes, which can hurt generalization. To mitigate this issue, the usage of 'model guidance' has gained popularity recently: for this, models are guided to be "right for the right reasons" by regularizing the models' explanations to highlight the right features.…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 38 pages, 35 figures, 4 tables

  28. arXiv:2303.11884  [pdf, other]

    cs.CV cs.AI cs.LG

    Better Understanding Differences in Attribution Methods via Systematic Evaluations

    Authors: Sukrut Rao, Moritz Böhle, Bernt Schiele

    Abstract: Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more relia…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 35 pages, 37 figures, 2 tables, extended version of arXiv:2205.10435

  29. arXiv:2303.11126  [pdf, other]

    cs.CV

    Robustifying Token Attention for Vision Transformers

    Authors: Yong Guo, David Stutz, Bernt Schiele

    Abstract: Despite the success of vision transformers (ViTs), they still suffer from significant drops in accuracy in the presence of common corruptions, such as noise or blur. Interestingly, we observe that the attention mechanism of ViTs tends to rely on few important tokens, a phenomenon we call token overfocusing. More critically, these tokens are not robust to corruptions, often leading to highly diverg…

    Submitted 6 September, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: To appear in ICCV 2023

  30. arXiv:2303.01598  [pdf, other]

    cs.CV cs.LG

    A Meta-Learning Approach to Predicting Performance and Data Requirements

    Authors: Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto

    Abstract: We propose an approach to estimate the number of samples required for a model to reach a target performance. We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset (e.g., 5 samples per class) for extrapolation. This is because the log-performance error against the log-dataset size follows a nonlinear progression in the few-…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  31. arXiv:2301.10921  [pdf, other]

    cs.LG cs.AI cs.CV

    SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning

    Authors: Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, Marios Savvides

    Abstract: The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model's generalization performance. In this paper, we first revisit the popular pseudo-labeling methods via a unified sample weighting formulation and demonstrate the inherent quantity-quality trade-off problem of pseudo-labeling with thresholdi…

    Submitted 15 March, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: Accepted by ICLR 2023

  32. arXiv:2301.08669  [pdf, other]

    cs.CV stat.ML

    Holistically Explainable Vision Transformers

    Authors: Moritz Böhle, Mario Fritz, Bernt Schiele

    Abstract: Transformers increasingly dominate the machine learning landscape across many tasks and domains, which increases the importance for understanding their outputs. While their attention modules provide partial insight into their inner workings, the attention scores have been shown to be insufficient for explaining the models as a whole. To address this, we propose B-cos transformers, which inherently…

    Submitted 20 January, 2023; originally announced January 2023.

  33. arXiv:2301.08571  [pdf, other]

    cs.CL cs.CV cs.LG

    Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences

    Authors: Xudong Hong, Asad Sayeed, Khushboo Mehra, Vera Demberg, Bernt Schiele

    Abstract: Current work on image-based story generation suffers from the fact that the existing image sequence collections do not have coherent plots behind them. We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP). VWP contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K st…

    Submitted 20 January, 2023; originally announced January 2023.

    Comments: Paper accepted by Transactions of the Association for Computational Linguistics (TACL). This is a pre-MIT Press publication version. 15 pages, 6 figures

  34. arXiv:2301.06051  [pdf, other]

    cs.CV

    DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

    Authors: Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, Liwei Wang

    Abstract: Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is a fundamental problem in 3D perception. Compared with the customized sparse convolution, the attention mechanism in Transformers is more appropriate for flexibly modeling long-range relationships and is easier to be deployed in real-world applications. However, due to the sparse characteristics of point clo…

    Submitted 20 March, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

    Comments: Accepted by CVPR 2023

  35. arXiv:2301.05792  [pdf, other]

    cs.CV

    RMM: Reinforced Memory Management for Class-Incremental Learning

    Authors: Yaoyao Liu, Bernt Schiele, Qianru Sun

    Abstract: Class-Incremental Learning (CIL) [40] trains classifiers under a strict memory budget: in each incremental phase, learning is done for new data, most of which is abandoned to free space for the next phase. The preserved data are exemplars used for replaying. However, existing methods use a static and ad hoc strategy for memory allocation, which is often sub-optimal. In this work, we propose a dyna…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: NeurIPS 2021

  36. Online Hyperparameter Optimization for Class-Incremental Learning

    Authors: Yaoyao Liu, Yingying Li, Bernt Schiele, Qianru Sun

    Abstract: Class-incremental learning (CIL) aims to train a classification model while the number of classes increases phase-by-phase. An inherent challenge of CIL is the stability-plasticity tradeoff, i.e., CIL models should keep stable to retain old knowledge and keep plastic to absorb new knowledge. However, none of the existing CIL models can achieve the optimal tradeoff in different data-receiving setti…

    Submitted 3 May, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: AAAI 2023 Oral. Code is available at https://class-il.mpi-inf.mpg.de/online/code/

  37. arXiv:2301.02009  [pdf, other]

    cs.CV

    Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

    Authors: Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne

    Abstract: Contrastive learning has become an important tool in learning representations from unlabeled data mainly relying on the idea of minimizing distance between positive data pairs, e.g., views from the same images, and maximizing distance between negative data pairs, e.g., views from different images. This paper proposes a new variation of the contrastive learning objective, Group Ordering Constraints…

    Submitted 18 August, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: Published at ICCV 2023, Code @ https://github.com/ninatu/learning_by_sorting

  38. arXiv:2212.07911  [pdf, other]

    cs.CV

    Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation

    Authors: Anurag Das, Yongqin Xian, Yang He, Zeynep Akata, Bernt Schiele

    Abstract: For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models. Considering the urban scene segmentation scenario, we leverage cheap coarse annotations for real-world captured data, as we…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at WACV 2023

  39. arXiv:2212.01455  [pdf, other]

    cs.CV

    Discovering Class-Specific GAN Controls for Semantic Image Synthesis

    Authors: Edgar Schönfeld, Julio Borges, Vadim Sushko, Bernt Schiele, Anna Khoreva

    Abstract: Prior work has extensively studied the latent space structure of GANs for unconditional image synthesis, enabling global editing of generated images by the unsupervised discovery of interpretable latent directions. However, the discovery of latent directions for conditional GANs for semantic image synthesis (SIS) has remained unexplored. In this work, we specifically focus on addressing this gap.…

    Submitted 2 December, 2022; originally announced December 2022.

  40. arXiv:2211.11086   

    cs.CV cs.AI cs.LG

    An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning

    Authors: Hao Chen, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Marios Savvides, Bhiksha Raj

    Abstract: Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance. While standard SSL assumes uniform data distribution, we consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data. Although there are existing endeavors to tackle this challenge, their perform…

    Submitted 18 January, 2024; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Issues in the paper, will re-open later

  41. arXiv:2211.04393  [pdf, other]

    cs.CV

    Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts

    Authors: Qi Fan, Mattia Segu, Yu-Wing Tai, Fisher Yu, Chi-Keung Tang, Bernt Schiele, Dengxin Dai

    Abstract: Improving model's generalizability against domain shifts is crucial, especially for safety-critical applications such as autonomous driving. Real-world domain styles can vary substantially due to environment changes and sensor noises, but deep models only know the training domain style. Such domain style gap impedes model generalization on diverse real-world domains. Our proposed Normalization Per…

    Submitted 8 November, 2022; v1 submitted 8 November, 2022; originally announced November 2022.

  42. arXiv:2209.13508

    cs.CV

    Motion Transformer with Global Intention Localization and Local Movement Refinement

    Authors: Shaoshuai Shi, Li Jiang, Dengxin Dai, Bernt Schiele

    Abstract: Predicting the multimodal future behavior of traffic participants is essential for robotic vehicles to make safe decisions. Existing works either directly predict future trajectories from latent features or use dense goal candidates to identify agents' destinations, where the former strategy converges slowly since all motion modes are derived from the same feature while the latter strategy…

    Submitted 18 March, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by NeurIPS 2022 as an oral presentation

  43. arXiv:2209.11870

    cs.CV

    Leveraging Self-Supervised Training for Unintentional Action Recognition

    Authors: Enea Duka, Anna Kukleva, Bernt Schiele

    Abstract: Unintentional actions are rare occurrences that are difficult to define precisely and that are highly dependent on the temporal context of the action. In this work, we explore such actions and seek to identify the points in videos where the actions transition from intentional to unintentional. We propose a multi-stage framework that exploits inherent biases such as motion speed, motion direction,…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Accepted at ECCV 2022 Workshops (ECCVW 2022)

  44. arXiv:2209.11459

    cs.CV cs.LG

    TeST: Test-time Self-Training under Distribution Shift

    Authors: Samarth Sinha, Peter Gehler, Francesco Locatello, Bernt Schiele

    Abstract: Despite their recent success, deep neural networks continue to perform poorly when they encounter distribution shifts at test time. Many recently proposed approaches try to counter this by aligning the model to the new distribution prior to inference. With no labels available, this requires unsupervised objectives to adapt the model on the observed test data. In this paper, we propose Test-Time Sel…

    Submitted 23 September, 2022; originally announced September 2022.

    Journal ref: WACV 2023

  45. arXiv:2209.10033

    cs.CV

    MTR-A: 1st Place Solution for 2022 Waymo Open Dataset Challenge -- Motion Prediction

    Authors: Shaoshuai Shi, Li Jiang, Dengxin Dai, Bernt Schiele

    Abstract: In this report, we present the 1st place solution for the motion prediction track of the 2022 Waymo Open Dataset Challenges. We propose a novel Motion Transformer framework for multimodal motion prediction, which introduces a small set of novel motion query pairs for generating better multimodal future trajectories by jointly performing the intention localization and iterative motion refinement. A simple…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: The 1st place solution report for the Waymo Motion Prediction Challenge at the CVPR 2022 Workshop on Autonomous Driving

  46. arXiv:2209.05654

    cs.CV

    ComplETR: Reducing the cost of annotations for object detection in dense scenes with vision transformers

    Authors: Achin Jain, Kibok Lee, Gurumurthy Swaminathan, Hao Yang, Bernt Schiele, Avinash Ravichandran, Onkar Dabeer

    Abstract: Annotating bounding boxes for object detection is expensive, time-consuming, and error-prone. In this work, we propose a DETR-based framework called ComplETR that is designed to explicitly complete missing annotations in partially annotated dense scene datasets. This reduces the need to annotate every object instance in the scene, thereby reducing annotation cost. ComplETR augments object queries i…

    Submitted 12 September, 2022; originally announced September 2022.

  47. arXiv:2208.08439

    cs.CV

    MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes

    Authors: Zhi Li, Soshi Shimada, Bernt Schiele, Christian Theobalt, Vladislav Golyanik

    Abstract: 3D human motion capture from monocular RGB images respecting interactions of a subject with complex and possibly deformable environments is a very challenging, ill-posed and under-explored problem. Existing methods address it only weakly and do not model possible surface deformations often occurring when humans interact with scene surfaces. In contrast, this paper proposes MoCapDeform, i.e., a new…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 11 pages, 8 figures, 3 tables; project page: https://4dqv.mpi-inf.mpg.de/MoCapDeform/

    Journal ref: International Conference on 3D Vision 2022 (Oral)

  48. arXiv:2208.07204

    cs.LG cs.AI cs.CV

    USB: A Unified Semi-supervised Learning Benchmark for Classification

    Authors: Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang

    Abstract: Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, currently, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issu…

    Submitted 13 October, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: Accepted by NeurIPS'22 dataset and benchmark track; code at https://github.com/microsoft/Semi-supervised-learning

  49. arXiv:2207.09239

    cs.LG stat.ML

    Assaying Out-Of-Distribution Generalization in Transfer Learning

    Authors: Florian Wenzel, Andrea Dittadi, Peter Vincent Gehler, Carl-Johann Simon-Gabriel, Max Horn, Dominik Zietlow, David Kernert, Chris Russell, Thomas Brox, Bernt Schiele, Bernhard Schölkopf, Francesco Locatello

    Abstract: Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions…

    Submitted 21 October, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

  50. arXiv:2206.08367

    cs.CV cs.LG

    SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation

    Authors: Tao Sun, Mattia Segu, Janis Postels, Yuxuan Wang, Luc Van Gool, Bernt Schiele, Federico Tombari, Fisher Yu

    Abstract: Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous driving systems. Existing image and video driving datasets, however, fall short of capturing the mutable nature of the real world. In this paper, we introduce the largest multi-task synthetic dataset for autonomous driving, SHIFT. It presents discrete and continuous shifts in cloudines…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: Published at IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022