
Showing 1–50 of 118 results for author: Brox, T

Searching in archive cs.
  1. arXiv:2404.07983

    cs.CV cs.LG

    Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning

    Authors: Simon Schrodi, David T. Hoffmann, Max Argus, Volker Fischer, Thomas Brox

    Abstract: Contrastive vision-language models like CLIP have gained popularity for their versatile learned representations, which are applicable to various downstream tasks. Despite their successes in some tasks, like zero-shot image recognition, they also perform surprisingly poorly on other tasks, like attribute detection. Previous work has attributed these challenges to the modality gap, a separation of image and text…

    Submitted 11 April, 2024; originally announced April 2024.
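
The modality gap mentioned above is often quantified as the Euclidean distance between the centroids of the L2-normalized image embeddings and the L2-normalized text embeddings. The sketch below assumes that definition for illustration; it is not taken from this paper, and the toy embedding vectors are made up.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def modality_gap(image_embs, text_embs):
    """Distance between the centroids of normalized image and text embeddings."""
    ci = centroid([l2_normalize(v) for v in image_embs])
    ct = centroid([l2_normalize(v) for v in text_embs])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ci, ct)))

# Toy embeddings (made up): images point roughly along x, texts roughly along y.
images = [[1.0, 0.1], [0.9, 0.0]]
texts = [[0.1, 1.0], [0.0, 0.8]]
print(round(modality_gap(images, texts), 3))
```

A gap near zero would mean both modalities occupy the same region of the shared embedding space; the toy example gives a large gap because the two sets point in nearly orthogonal directions.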

  2. arXiv:2403.15203

    cs.RO cs.CV

    DITTO: Demonstration Imitation by Trajectory Transformation

    Authors: Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox, Abhinav Valada

    Abstract: Teaching robots new skills quickly and conveniently is crucial for the broader adoption of robotic systems. In this work, we address the problem of one-shot imitation from a single human demonstration, given as an RGB-D video recording, through a two-stage process. In the first stage, which is offline, we extract the trajectory of the demonstration. This entails segmenting manipulated objects and de…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures, 3 tables, submitted to IROS 2024

  3. arXiv:2403.14508

    cs.LG cs.AI eess.SY

    Constrained Reinforcement Learning with Smoothed Log Barrier Function

    Authors: Baohe Zhang, Yuan Zhang, Lilli Frison, Thomas Brox, Joschka Bödecker

    Abstract: Reinforcement Learning (RL) has been widely applied to many control tasks and has substantially improved performance compared to conventional control methods in many domains where the reward function is well defined. However, for many real-world problems, it is often more convenient to formulate optimization problems in terms of rewards and constraints simultaneously. Optimizing such constrained…

    Submitted 21 March, 2024; originally announced March 2024.
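
The abstract is truncated before the barrier is defined. Purely as an illustration of what "smoothing" a log barrier can mean, the sketch below uses the log-barrier extension of Kervadec et al. for a constraint value z = g(x) with feasibility z ≤ 0 and sharpness t: ψ_t(z) = -log(-z)/t for z ≤ -1/t², and t·z - log(1/t²)/t + 1/t otherwise. This is an assumed stand-in; the paper's own smoothed function may differ.

```python
import math

def smoothed_log_barrier(z, t):
    """Penalty for a constraint value z = g(x), where feasibility means z <= 0.

    Inside the feasible region (z <= -1/t**2) this is the classic log barrier
    -log(-z)/t. Beyond that threshold it switches to a linear tail with the same
    value and slope, so the penalty stays finite and differentiable even for
    infeasible z (where the plain barrier would be undefined).
    """
    threshold = -1.0 / t**2
    if z <= threshold:
        return -math.log(-z) / t
    return t * z - math.log(1.0 / t**2) / t + 1.0 / t

t = 5.0
for z in (-1.0, -0.04, 0.0, 0.5):
    print(z, round(smoothed_log_barrier(z, t), 4))
```

Larger t makes the penalty closer to a hard constraint; the linear tail is what keeps gradients usable when a policy temporarily violates the constraint.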

  4. arXiv:2402.07270

    cs.CV cs.CL cs.LG

    Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy

    Authors: Simon Ging, María A. Bravo, Thomas Brox

    Abstract: The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our research seeks to advance our understanding of these models' capabilities. We propose a novel VQA benchmark based on well-known visual classification datasets which…

    Submitted 5 May, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: Accepted as Spotlight Paper for ICLR 2024. The first two authors contributed equally to this work

  5. arXiv:2402.03170

    cs.LG

    Is Mamba Capable of In-Context Learning?

    Authors: Riccardo Grazzi, Julien Siems, Simon Schrodi, Thomas Brox, Frank Hutter

    Abstract: State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL), a variant of meta-learning concerning the learned ability to solve tasks during a neural network forward pass, exploiting contextual information provided as input to the model. This useful ability emerges as a side product of the foundation model's massive pretraining. While transformer models…

    Submitted 24 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  6. arXiv:2312.14124

    cs.CV

    Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation

    Authors: Philipp Schröppel, Christopher Wewer, Jan Eric Lenssen, Eddy Ilg, Thomas Brox

    Abstract: Controllable generation of 3D assets is important for many practical applications like content creation in movies, games and engineering, as well as in AR/VR. Recently, diffusion models have shown remarkable results in generation quality of 3D objects. However, none of the existing models enable disentangled generation to control the shape and appearance separately. For the first time, we present…

    Submitted 21 December, 2023; originally announced December 2023.

  7. arXiv:2311.07761

    cs.CV cs.AI cs.RO

    Amodal Optical Flow

    Authors: Maximilian Luz, Rohit Mohan, Ahmed Rida Sekkat, Oliver Sawade, Elmar Matthes, Thomas Brox, Abhinav Valada

    Abstract: Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible…

    Submitted 7 May, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  8. arXiv:2310.12956

    cs.LG cs.AI cs.CV

    Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

    Authors: David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox

    Abstract: In this work, we study rapid improvements of the training loss in transformers when being confronted with multi-step decision tasks. We found that transformers struggle to learn the intermediate task and both training and validation loss saturate for hundreds of epochs. When transformers finally learn the intermediate task, they do this rapidly and unexpectedly. We call these abrupt improvements E…

    Submitted 6 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted at ICML 2024

  9. arXiv:2310.06668

    cs.LG cs.CV

    Latent Diffusion Counterfactual Explanations

    Authors: Karim Farid, Simon Schrodi, Max Argus, Thomas Brox

    Abstract: Counterfactual explanations have emerged as a promising method for elucidating the behavior of opaque black-box models. Recently, several works leveraged pixel-space diffusion models for counterfactual generation. To handle noisy, adversarial gradients during counterfactual generation -- causing unrealistic artifacts or mere adversarial perturbations -- they required either auxiliary adversarially…

    Submitted 10 October, 2023; originally announced October 2023.

  10. arXiv:2310.05691

    cs.CV physics.ao-ph

    Climate-sensitive Urban Planning through Optimization of Tree Placements

    Authors: Simon Schrodi, Ferdinand Briegel, Max Argus, Andreas Christen, Thomas Brox

    Abstract: Climate change is increasing the intensity and frequency of many extreme weather events, including heatwaves, which results in increased thermal discomfort and mortality rates. While global mitigation action is undoubtedly necessary, so is climate adaptation, e.g., through climate-sensitive urban planning. Among the most promising strategies is harnessing the benefits of urban trees in shading and…

    Submitted 9 October, 2023; originally announced October 2023.

  11. arXiv:2310.04271

    cs.RO cs.CV

    Compositional Servoing by Recombining Demonstrations

    Authors: Max Argus, Abhijeet Nayak, Martin Büchner, Silvio Galesso, Abhinav Valada, Thomas Brox

    Abstract: Learning-based manipulation policies from image inputs often show weak task transfer capabilities. In contrast, visual servoing methods allow efficient task transfer in high-precision scenarios while requiring only a few demonstrations. In this work, we present a framework that formulates the visual servoing task as graph traversal. Our method not only extends the robustness of visual servoing, bu…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: http://compservo.cs.uni-freiburg.de

  12. arXiv:2309.09858

    cs.CV

    Unsupervised Open-Vocabulary Object Localization in Videos

    Authors: Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

    Abstract: In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization. We propose a method that first localizes objects in videos via an object-centric approach with slot attention and then assigns text to the obtained slots. The latter is achieved by an unsupervised way to…

    Submitted 26 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023; presented at the CVPR 2024 Workshop CORR; project page: https://github.com/amazon-science/object-centric-vol

  13. arXiv:2309.00233

    cs.CV

    Object-Centric Multiple Object Tracking

    Authors: Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

    Abstract: Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines. Unfortunately, they lack two key properties: objects are often split into parts and are not consistently tracked over time. In fact, state-of-the-art model…

    Submitted 5 September, 2023; v1 submitted 31 August, 2023; originally announced September 2023.

    Comments: ICCV 2023 camera-ready version

  14. arXiv:2305.15956

    cs.CV

    Anomaly Detection with Conditioned Denoising Diffusion Models

    Authors: Arian Mousakhan, Thomas Brox, Jawad Tayyub

    Abstract: Traditional reconstruction-based methods have struggled to achieve competitive performance in anomaly detection. In this paper, we introduce Denoising Diffusion Anomaly Detection (DDAD), a novel denoising process for image reconstruction conditioned on a target image. This ensures a coherent restoration that closely resembles the target image. Our anomaly detection framework employs the conditioni…

    Submitted 3 December, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  15. arXiv:2302.04075

    cs.CV

    Best Practices in Active Learning for Semantic Segmentation

    Authors: Sudhanshu Mittal, Joshua Niemeijer, Jörg P. Schäfer, Thomas Brox

    Abstract: Active learning is particularly of interest for semantic segmentation, where annotations are costly. Previous academic studies focused on datasets that are already very diverse and where the model is trained in a supervised manner with a large annotation budget. In contrast, data collected in many driving scenarios is highly redundant, and most medical applications are subject to very constrained…

    Submitted 15 March, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

  16. arXiv:2211.12914

    cs.CV cs.LG

    Open-vocabulary Attribute Detection

    Authors: María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox

    Abstract: Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corres…

    Submitted 8 March, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted at CVPR 2023. https://ovad-benchmark.github.io

  17. arXiv:2211.06660

    cs.CV

    Far Away in the Deep Space: Dense Nearest-Neighbor-Based Out-of-Distribution Detection

    Authors: Silvio Galesso, Max Argus, Thomas Brox

    Abstract: The key to out-of-distribution detection is density estimation of the in-distribution data or of its feature representations. This is particularly challenging for dense anomaly detection in domains where the in-distribution data has a complex underlying structure. Nearest-Neighbors approaches have been shown to work well in object-centric data domains, such as industrial inspection and image class…

    Submitted 14 September, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

    Comments: Workshop on Uncertainty Quantification for Computer Vision, ICCV 2023. Code at: https://github.com/silviogalesso/dense-ood-knns
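
The nearest-neighbor idea referred to above can be sketched as scoring each test feature by its distance to the k-th nearest in-distribution training feature, so that larger distances suggest out-of-distribution inputs; in the dense setting the same score would be computed per pixel feature. The feature bank, the value of k, and the Euclidean metric below are illustrative assumptions, not the configuration of the linked repository.

```python
import math

def knn_ood_score(feature, bank, k=1):
    """Distance from `feature` to its k-th nearest neighbor in `bank`.

    `bank` holds feature vectors of in-distribution training data; a larger
    score means the query lies farther from that data (more likely OoD).
    """
    if not 1 <= k <= len(bank):
        raise ValueError("k must be between 1 and len(bank)")
    dists = sorted(math.dist(feature, b) for b in bank)
    return dists[k - 1]

# Hypothetical 2-D features of in-distribution pixels.
bank = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
print(knn_ood_score([0.05, 0.05], bank, k=2))  # close to the bank: low score
print(knn_ood_score([3.0, 3.0], bank, k=2))    # far from the bank: high score
```

Using the k-th rather than the first neighbor makes the score less sensitive to a single stray training feature that happens to lie near an anomaly.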

  18. arXiv:2211.01842

    cs.LG cs.AI cs.CV stat.ML

    Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars

    Authors: Simon Schrodi, Danny Stoll, Binxin Ru, Rhea Sukthanker, Thomas Brox, Frank Hutter

    Abstract: The discovery of neural architectures from simple building blocks is a long-standing goal of Neural Architecture Search (NAS). Hierarchical search spaces are a promising step towards this goal but lack a unifying search space design framework and typically only search over some limited aspect of architectures. In this work, we introduce a unifying search space design framework based on context-fre…

    Submitted 8 December, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2023
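
To illustrate how a context-free grammar can define a hierarchical search space, the sketch below samples architecture strings by recursively expanding nonterminals. The grammar, its symbols (`Seq`, `Res`, `conv3x3`, `conv1x1`, `maxpool`, `id`), the uniform choice among productions, and the depth cap are all hypothetical; they are not the grammars used in the paper.

```python
import random

# Hypothetical grammar: nonterminals map to lists of productions;
# any symbol that is not a key is a terminal and is emitted verbatim.
GRAMMAR = {
    "ARCH": [["Seq(", "BLOCK", ",", "BLOCK", ")"]],
    "BLOCK": [["OP"], ["Res(", "OP", ",", "OP", ")"], ["Seq(", "OP", ",", "BLOCK", ")"]],
    "OP": [["conv3x3"], ["conv1x1"], ["maxpool"], ["id"]],
}

def sample(symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` into a string by choosing productions at random.

    At or beyond `max_depth`, only the first (non-recursive) production is
    used, which guarantees that sampling terminates.
    """
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    productions = GRAMMAR[symbol]
    rule = productions[0] if depth >= max_depth else rng.choice(productions)
    return "".join(sample(s, rng, depth + 1, max_depth) for s in rule)

rng = random.Random(0)
for _ in range(3):
    print(sample("ARCH", rng))
```

Because every sampled string is a derivation of the grammar, the search space is exactly the language of the grammar, and hierarchy comes from nesting nonterminals such as `BLOCK`.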

  19. arXiv:2209.14860

    cs.CV cs.LG

    Bridging the Gap to Real-World Object-Centric Learning

    Authors: Maximilian Seitzer, Max Horn, Andrii Zadaianchuk, Dominik Zietlow, Tianjun Xiao, Carl-Johann Simon-Gabriel, Tong He, Zheng Zhang, Bernhard Schölkopf, Thomas Brox, Francesco Locatello

    Abstract: Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully d…

    Submitted 6 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: ICLR 2023 camera-ready version

  20. arXiv:2209.08532

    cs.CV

    SF2SE3: Clustering Scene Flow into SE(3)-Motions via Proposal and Selection

    Authors: Leonhard Sommer, Philipp Schröppel, Thomas Brox

    Abstract: We propose SF2SE3, a novel approach to estimate scene dynamics in the form of a segmentation into independently moving rigid objects and their SE(3)-motions. SF2SE3 operates on two consecutive stereo or RGB-D images. First, noisy scene flow is obtained by application of existing optical flow and depth estimation algorithms. SF2SE3 then iteratively (1) samples pixel sets to compute SE(3)-motion proposa…

    Submitted 26 September, 2022; v1 submitted 18 September, 2022; originally announced September 2022.

    Comments: German Conference on Pattern Recognition 2022, Konstanz, Germany

  21. arXiv:2209.06681

    cs.CV

    A Benchmark and a Baseline for Robust Multi-view Depth Estimation

    Authors: Philipp Schröppel, Jan Bechtold, Artemij Amiranashvili, Thomas Brox

    Abstract: Recent deep learning approaches for multi-view depth estimation are employed either in a depth-from-video or a multi-view stereo setting. Despite different settings, these approaches are technically similar: they correlate multiple source views with a keyview to estimate a depth map for the keyview. In this work, we introduce the Robust Multi-View Depth Benchmark that is built upon a set of public…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: Accepted at 3DV 2022

  22. arXiv:2208.14195

    cs.CV

    Probing Contextual Diversity for Dense Out-of-Distribution Detection

    Authors: Silvio Galesso, Maria Alejandra Bravo, Mehdi Naouar, Thomas Brox

    Abstract: Detection of out-of-distribution (OoD) samples in the context of image classification has recently become an area of interest and active study, along with the topic of uncertainty estimation, to which it is closely related. In this paper we explore the task of OoD segmentation, which has been studied less than its classification counterpart and presents additional challenges. Segmentation is a den…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: Safe Artificial Intelligence for Automated Driving Workshop, ECCV 2022

  23. arXiv:2207.13591

    cs.RO

    RobotIO: A Python Library for Robot Manipulation Experiments

    Authors: Lukas Hermann, Max Argus, Adrian Roefer, Abhinav Valada, Thomas Brox

    Abstract: Setting up robot environments to quickly test newly developed algorithms is still a difficult and time-consuming process. This presents a significant hurdle to researchers interested in performing real-world robotic experiments. RobotIO is a Python library designed to solve this problem. It focuses on providing common, simple, and well-structured Python interfaces for robots, grippers, and cameras…

    Submitted 16 August, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

    Comments: 6 pages, 3 figures

  24. arXiv:2207.09239

    cs.LG stat.ML

    Assaying Out-Of-Distribution Generalization in Transfer Learning

    Authors: Florian Wenzel, Andrea Dittadi, Peter Vincent Gehler, Carl-Johann Simon-Gabriel, Max Horn, Dominik Zietlow, David Kernert, Chris Russell, Thomas Brox, Bernt Schiele, Bernhard Schölkopf, Francesco Locatello

    Abstract: Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions…

    Submitted 21 October, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

  25. arXiv:2207.05027

    cs.CV

    Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations

    Authors: Andrii Zadaianchuk, Matthaeus Kleindessner, Yi Zhu, Francesco Locatello, Thomas Brox

    Abstract: In this paper, we show that recent advances in self-supervised feature learning enable unsupervised object discovery and semantic segmentation with a performance that matches the state of the field on supervised semantic segmentation 10 years ago. We propose a methodology based on unsupervised saliency masks and self-supervised feature clustering to kickstart object discovery followed by training…

    Submitted 30 April, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

  26. arXiv:2207.03866

    cs.CV

    Pixel-level Correspondence for Self-Supervised Learning from Video

    Authors: Yash Sharma, Yi Zhu, Chris Russell, Thomas Brox

    Abstract: While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision. To address this, we propose Pixel-level Correspondence (PiCo), a method for dense contrastive learning from video. By tracking points with optical flow, we obtain a correspondence map which can be used to match local features…

    Submitted 8 July, 2022; originally announced July 2022.
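
Tracking a point by chaining optical flow, as the abstract describes, can be sketched by repeatedly adding the flow vector found at the point's current position; the first and last positions of each track then form one entry of a correspondence map. Representing each flow field as a dict from integer pixel coordinates to (dx, dy), and using nearest-pixel instead of bilinear sampling, are simplifying assumptions for illustration and not PiCo's implementation.

```python
def track_point(start, flows):
    """Follow a pixel through a sequence of forward optical flow fields.

    `flows[i]` maps integer (x, y) pixel coordinates in frame i to the
    displacement (dx, dy) towards frame i + 1. Returns the point's position
    in each frame, stopping early if it leaves the area covered by a field.
    """
    x, y = start
    path = [(x, y)]
    for flow in flows:
        key = (round(x), round(y))  # nearest-pixel lookup
        if key not in flow:
            break  # point left the (toy) flow field
        dx, dy = flow[key]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

# Toy flows on a 5x5 image: everything moves 1 px right, then 1 px down.
flow01 = {(x, y): (1.0, 0.0) for x in range(5) for y in range(5)}
flow12 = {(x, y): (0.0, 1.0) for x in range(5) for y in range(5)}
print(track_point((2, 2), [flow01, flow12]))  # [(2, 2), (3.0, 2.0), (3.0, 3.0)]
```

Here pixel (2, 2) in frame 0 corresponds to (3.0, 3.0) in frame 2, which is the kind of long-range pair a dense contrastive loss could use as a positive.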

  27. arXiv:2205.08441

    cs.RO cs.CV cs.LG

    Conditional Visual Servoing for Multi-Step Tasks

    Authors: Sergio Izquierdo, Max Argus, Thomas Brox

    Abstract: Visual Servoing has been effectively used to move a robot into specific target locations or to track a recorded demonstration. It does not require manual programming, but it is typically limited to settings where one demonstration maps to one environment state. We propose a modular approach to extend visual servoing to scenarios with multiple demonstration sequences. We call this conditional servo…

    Submitted 17 May, 2022; originally announced May 2022.

  28. arXiv:2205.06160

    cs.CV cs.LG

    Localized Vision-Language Matching for Open-vocabulary Object Detection

    Authors: Maria A. Bravo, Sudhanshu Mittal, Thomas Brox

    Abstract: In this work, we propose an open-vocabulary object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes the mo…

    Submitted 28 July, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted at DAGM German Conference on Pattern Recognition (GCPR 2022)

  29. arXiv:2202.07242

    cs.CV cs.LG

    Neural Architecture Search for Dense Prediction Tasks in Computer Vision

    Authors: Thomas Elsken, Arber Zela, Jan Hendrik Metzen, Benedikt Staffler, Thomas Brox, Abhinav Valada, Frank Hutter

    Abstract: The success of deep learning in recent years has led to a rising demand for neural network architecture engineering. As a consequence, neural architecture search (NAS), which aims at automatically designing neural network architectures in a data-driven manner rather than manually, has evolved as a popular field of research. With the advent of weight sharing strategies across architectures, NAS ha…

    Submitted 15 February, 2022; originally announced February 2022.

  30. arXiv:2201.11736

    cs.CV

    Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives

    Authors: David T. Hoffmann, Nadine Behrmann, Juergen Gall, Thomas Brox, Mehdi Noroozi

    Abstract: This paper introduces Ranking Info Noise Contrastive Estimation (RINCE), a new member in the family of InfoNCE losses that preserves a ranked ordering of positive samples. In contrast to the standard InfoNCE loss, which requires a strict binary separation of the training pairs into similar and dissimilar samples, RINCE can exploit information about a similarity ranking for learning a corresponding…

    Submitted 27 January, 2022; originally announced January 2022.

    Comments: AAAI 2022 (Main Track)
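
For reference, the standard InfoNCE loss that RINCE extends can be written for a single anchor as -log(exp(s⁺/τ) / Σ_j exp(s_j/τ)), where s⁺ is the similarity to the positive sample and the sum runs over the positive and all negatives. The sketch implements only this standard, binary-positive loss, not RINCE's ranked variant; the similarity values and the temperature τ are illustrative.

```python
import math

def info_nce(pos_sim, neg_sims, temperature=0.1):
    """Standard InfoNCE loss for a single anchor.

    pos_sim:   similarity between the anchor and its positive sample
    neg_sims:  similarities between the anchor and the negative samples
    Returns the negative log-softmax of the positive logit among all logits.
    """
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]

print(info_nce(0.9, [0.1, 0.2, 0.0]))  # positive well separated: small loss
print(info_nce(0.1, [0.9, 0.2, 0.0]))  # a negative is closer: large loss
```

The binary split is visible in the signature: every sample is either the single positive or a negative, which is exactly the restriction the abstract says RINCE relaxes with a similarity ranking.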

  31. arXiv:2112.08817

    cs.CV cs.LG eess.SP q-bio.QM

    Search for temporal cell segmentation robustness in phase-contrast microscopy videos

    Authors: Estibaliz Gómez-de-Mariscal, Hasini Jayatilaka, Özgün Çiçek, Thomas Brox, Denis Wirtz, Arrate Muñoz-Barrutia

    Abstract: Studying cell morphology changes in time is critical to understanding cell migration mechanisms. In this work, we present a deep learning-based workflow to segment cancer cells embedded in 3D collagen matrices and imaged with phase-contrast microscopy. Our approach uses transfer learning and recurrent convolutional long short-term memory units to exploit the temporal information from the past and…

    Submitted 16 December, 2021; originally announced December 2021.

  32. arXiv:2110.06562

    cs.CV cs.LG stat.ML

    Unsupervised Object Learning via Common Fate

    Authors: Matthias Tangemann, Steffen Schneider, Julius von Kügelgen, Francesco Locatello, Peter Gehler, Thomas Brox, Matthias Kümmerer, Matthias Bethge, Bernhard Schölkopf

    Abstract: Learning generative object models from unlabelled videos is a long-standing problem and required for causal scene modeling. We decompose this problem into three easier subtasks, and provide candidate solutions for each of them. Inspired by the Common Fate Principle of Gestalt Psychology, we first extract (noisy) masks of moving objects via unsupervised motion segmentation. Second, generative model…

    Submitted 15 May, 2023; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Published at CLeaR 2023

  33. arXiv:2110.05304

    cs.LG

    You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory Prediction

    Authors: Osama Makansi, Julius von Kügelgen, Francesco Locatello, Peter Gehler, Dominik Janzing, Thomas Brox, Bernhard Schölkopf

    Abstract: Predicting the future trajectory of a moving agent can be easy when the past trajectory continues smoothly but is challenging when complex interactions with other agents are involved. Recent deep learning approaches for trajectory prediction show promising performance and partially attribute this to successful reasoning about agent-agent interactions. However, it remains unclear which features suc…

    Submitted 11 October, 2021; originally announced October 2021.

  34. arXiv:2109.14910

    cs.CV cs.AI cs.LG

    CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations

    Authors: Mohammadreza Zolfaghari, Yi Zhu, Peter Gehler, Thomas Brox

    Abstract: Contrastive learning allows us to flexibly define powerful losses by contrasting positive pairs from sets of negative samples. Recently, the principle has also been used to learn cross-modal embeddings for video and text, yet without exploiting its full potential. In particular, previous losses do not take the intra-modality similarities into account, which leads to inefficient embeddings, as the…

    Submitted 30 September, 2021; originally announced September 2021.

    Comments: ICCV 2021, 14 pages, 13 figures

  35. arXiv:2107.04369

    cs.LG stat.ML

    Multi-headed Neural Ensemble Search

    Authors: Ashwin Raaghav Narayanan, Arber Zela, Tonmoy Saikia, Thomas Brox, Frank Hutter

    Abstract: Ensembles of CNN models trained with different seeds (also known as Deep Ensembles) are known to achieve superior performance over a single copy of the CNN. Neural Ensemble Search (NES) can further boost performance by adding architectural diversity. However, the scope of NES remains prohibitive under limited computational resources. In this work, we extend NES to multi-headed ensembles, which con…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 8 pages, 12 figures, 3 tables
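
A deep ensemble is commonly evaluated by averaging the class probabilities predicted by its members. The sketch shows only that generic averaging step, assuming each member (or head) outputs logits over the same classes; it does not implement the multi-headed search procedure of the paper, and the logits are made up.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(member_logits):
    """Average the softmax probabilities of all ensemble members (or heads)."""
    probs = [softmax(lg) for lg in member_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]

# Three hypothetical members disagreeing on a 3-class input.
avg = ensemble_predict([[2.0, 0.5, 0.1], [0.2, 1.8, 0.3], [1.5, 1.4, 0.0]])
print(avg, "-> predicted class", avg.index(max(avg)))
```

Averaging probabilities rather than logits keeps the result a valid distribution and lets disagreement between members show up as lower confidence.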

  36. arXiv:2106.14999

    stat.ML cs.LG

    Test-Time Adaptation to Distribution Shift by Confidence Maximization and Input Transformation

    Authors: Chaithanya Kumar Mummadi, Robin Hutmacher, Kilian Rambach, Evgeny Levinkov, Thomas Brox, Jan Hendrik Metzen

    Abstract: Deep neural networks often exhibit poor performance on data that is unlikely under the train-time data distribution, for instance data affected by corruptions. Previous works demonstrate that test-time adaptation to data shift, for instance using entropy minimization, effectively improves performance on such shifted distributions. This paper focuses on the fully test-time adaptation setting, where…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: 16 pages, 5 figures, 7 tables
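
Entropy minimization, mentioned in the abstract as a prior test-time adaptation approach, uses the mean Shannon entropy of the model's softmax predictions on an unlabeled test batch as the loss to minimize. The sketch computes only this objective on made-up logits; the gradient update of the model parameters is omitted, and the paper's own confidence-maximization loss and input transformation are not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_entropy(batch_logits):
    """Mean Shannon entropy (in nats) of the softmax predictions for a batch."""
    total = 0.0
    for logits in batch_logits:
        p = softmax(logits)
        total -= sum(pi * math.log(pi) for pi in p if pi > 0)
    return total / len(batch_logits)

confident = [[5.0, 0.0, 0.0], [0.0, 6.0, 0.0]]
uncertain = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0]]
print(mean_entropy(confident), mean_entropy(uncertain))
```

The entropy is bounded by log(number of classes), reached for a uniform prediction; driving it down pushes the adapted model towards confident predictions on the shifted data.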

  37. arXiv:2106.08265

    cs.CV

    Towards Total Recall in Industrial Anomaly Detection

    Authors: Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schölkopf, Thomas Brox, Peter Gehler

    Abstract: Being able to spot defective parts is a critical component in large-scale industrial manufacturing. A particular challenge that we address in this work is the cold-start problem: fit a model using nominal (non-defective) example images only. While handcrafted solutions per class are possible, the goal is to build systems that work well simultaneously on many different tasks automatically. The best…

    Submitted 5 May, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2022
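
One common recipe for this cold-start setting, assumed here for illustration, is to store features of nominal image patches in a memory bank, shrink the bank with greedy k-center (coreset) selection, and score a test image by the largest distance from any of its patches to the nearest stored feature. The toy 2-D "patch features" and the coreset size below are made up, and a real system would extract features with a pretrained network.

```python
import math

def greedy_coreset(bank, size):
    """Greedy k-center selection: repeatedly add the point farthest from the chosen set."""
    chosen = [bank[0]]
    # Distance from every bank point to its nearest already-chosen point.
    min_d = [math.dist(p, bank[0]) for p in bank]
    while len(chosen) < min(size, len(bank)):
        idx = max(range(len(bank)), key=lambda i: min_d[i])
        chosen.append(bank[idx])
        min_d = [min(d, math.dist(p, bank[idx])) for d, p in zip(min_d, bank)]
    return chosen

def anomaly_score(test_patches, memory):
    """Image-level score: max over patches of the distance to the nearest memory feature."""
    return max(min(math.dist(p, m) for m in memory) for p in test_patches)

nominal = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0]]
memory = greedy_coreset(nominal, size=3)
print(anomaly_score([[0.05, 0.0], [1.0, 1.05]], memory))  # all patches look nominal
print(anomaly_score([[0.05, 0.0], [3.0, 3.0]], memory))   # one anomalous patch
```

The coreset keeps the memory bank small while still covering both clusters of nominal features, and taking the maximum over patches means a single defective region is enough to flag the whole image.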

  38. arXiv:2106.04324

    cs.CV

    Contrastive Representation Learning for Hand Shape Estimation

    Authors: Christian Zimmermann, Max Argus, Thomas Brox

    Abstract: This work presents improvements in monocular hand shape estimation by building on top of recent advances in unsupervised learning. We extend momentum contrastive learning and contribute a structured collection of hand images, well suited for visual representation learning, which we call HanCo. We find that the representation learned by established contrastive learning methods can be improved signi…

    Submitted 2 July, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  39. arXiv:2106.00318

    cs.CV

    Semi-Supervised Disparity Estimation with Deep Feature Reconstruction

    Authors: Julia Guerrero-Viu, Sergio Izquierdo, Philipp Schröppel, Thomas Brox

    Abstract: Despite the success of deep learning in disparity estimation, the domain generalization gap remains an issue. We propose a semi-supervised pipeline that successfully adapts DispNet to a real-world domain by joint supervised training on labeled synthetic data and self-supervised training on unlabeled real data. Furthermore, accounting for the limitations of the widely-used photometric loss, we anal…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: Women in Computer Vision workshop CVPR 2021

  40. arXiv:2104.14386

    cs.LG cs.AI cs.RO

    Pre-training of Deep RL Agents for Improved Learning under Domain Randomization

    Authors: Artemij Amiranashvili, Max Argus, Lukas Hermann, Wolfram Burgard, Thomas Brox

    Abstract: Visual domain randomization in simulated environments is a widely used method to transfer policies trained in simulation to real robots. However, domain randomization and augmentation hamper the training of a policy. As reinforcement learning struggles with a noisy training signal, this additional nuisance can drastically impede training. For difficult tasks it can even result in complete failure…

    Submitted 29 April, 2021; originally announced April 2021.

  41. arXiv:2104.00476

    cs.CV

    Fostering Generalization in Single-view 3D Reconstruction by Learning a Hierarchy of Local and Global Shape Priors

    Authors: Jan Bechtold, Maxim Tatarchenko, Volker Fischer, Thomas Brox

    Abstract: Single-view 3D object reconstruction has seen much progress, yet methods still struggle to generalize to novel shapes unseen during training. Common approaches predominantly rely on learned global shape priors and, hence, disregard detailed local observations. In this work, we address this issue by learning a hierarchy of priors at different levels of locality from ground truth input depth maps. We…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: Accepted at CVPR 2021

  42. arXiv:2103.16255

    cs.CV

    Towards Understanding Adversarial Robustness of Optical Flow Networks

    Authors: Simon Schrodi, Tonmoy Saikia, Thomas Brox

    Abstract: Recent work demonstrated the lack of robustness of optical flow networks to physical patch-based adversarial attacks. The possibility to physically attack a basic component of automotive systems is a reason for serious concerns. In this paper, we analyze the cause of the problem and show that the lack of robustness is rooted in the classical aperture problem of optical flow estimation in combinati…

    Submitted 15 June, 2022; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: CVPR 2022

  43. arXiv:2103.16241  [pdf, other]

    cs.CV

    Improving robustness against common corruptions with frequency biased models

    Authors: Tonmoy Saikia, Cordelia Schmid, Thomas Brox

    Abstract: CNNs perform remarkably well when the training and test distributions are i.i.d., but unseen image corruptions can cause a surprisingly large drop in performance. In various real scenarios, unexpected distortions, such as random noise, compression artefacts, or weather distortions are common phenomena. Improving performance on corrupted images must not result in degraded i.i.d. performance - a chall…

    Submitted 30 March, 2021; originally announced March 2021.

  44. arXiv:2103.12474  [pdf, other]

    cs.CV

    On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors

    Authors: Osama Makansi, Özgün Cicek, Yassine Marrakchi, Thomas Brox

    Abstract: Predicting the states of dynamic traffic actors into the future is important for autonomous systems to operate safely and efficiently. Remarkably, the most critical scenarios are much less frequent and more complex than the uncritical ones. Therefore, uncritical cases dominate the prediction. In this paper, we address specifically the challenging scenarios at the long tail of the dataset distribution…

    Submitted 8 August, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Journal ref: ICCV 2021

  45. arXiv:2102.09517  [pdf, other]

    cs.CV cs.LG

    Essentials for Class Incremental Learning

    Authors: Sudhanshu Mittal, Silvio Galesso, Thomas Brox

    Abstract: Contemporary neural networks are limited in their ability to learn from evolving streams of training data. When trained sequentially on new or evolving tasks, their accuracy drops sharply, making them unsuitable for many real-world applications. In this work, we shed light on the causes of this well-known yet unsolved phenomenon - often referred to as catastrophic forgetting - in a class-increment…

    Submitted 18 February, 2021; originally announced February 2021.

  46. Recovering the Imperfect: Cell Segmentation in the Presence of Dynamically Localized Proteins

    Authors: Özgün Çiçek, Yassine Marrakchi, Enoch Boasiako Antwi, Barbara Di Ventura, Thomas Brox

    Abstract: Deploying off-the-shelf segmentation networks on biomedical data has become common practice, yet if structures of interest in an image sequence are visible only temporarily, existing frame-by-frame methods fail. In this paper, we provide a solution to segmentation of imperfect data through time based on temporal propagation and uncertainty estimation. We integrate uncertainty estimation into Mask…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Accepted at MICCAI Workshop on Medical Image Learning with Less Labels and Imperfect Data, 2020

  47. arXiv:2011.00597  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning

    Authors: Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox

    Abstract: Many real-world video-text tasks involve different levels of granularity, such as frames and words, clips and sentences, or videos and paragraphs, each with distinct semantics. In this paper, we propose a Cooperative hierarchical Transformer (COOT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities. The method consists o…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: 27 pages, 5 figures, 19 tables. To be published in the 34th conference on Neural Information Processing Systems (NeurIPS 2020). The first two authors contributed equally to this work

    ACM Class: I.2.7; I.2.10

  48. arXiv:2007.09746  [pdf, other]

    cs.CV cs.RO

    Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation

    Authors: Gabriel L. Oliveira, Senthil Yogamani, Wolfram Burgard, Thomas Brox

    Abstract: Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers. To address these limitations, we propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content. The new decoder has a new topology of skip connections, namely backward and stacked res…

    Submitted 19 July, 2020; originally announced July 2020.

  49. arXiv:2007.02701  [pdf, other]

    cs.LG cs.AI stat.ML

    Scaling Imitation Learning in Minecraft

    Authors: Artemij Amiranashvili, Nicolai Dorka, Wolfram Burgard, Vladlen Koltun, Thomas Brox

    Abstract: Imitation learning is a powerful family of techniques for learning sensorimotor coordination in immersive environments. We apply imitation learning to attain state-of-the-art performance on hard exploration problems in the Minecraft environment. We report experiments that highlight the influence of network architecture, loss function, and data augmentation. An early version of our approach reached…

    Submitted 6 July, 2020; originally announced July 2020.

  50. arXiv:2007.00291  [pdf, other]

    cs.RO cs.CV

    FlowControl: Optical Flow Based Visual Servoing

    Authors: Max Argus, Lukas Hermann, Jon Long, Thomas Brox

    Abstract: One-shot imitation is the vision of robot programming from a single demonstration, rather than by tedious construction of computer code. We present a practical method for realizing one-shot imitation for manipulation tasks, exploiting modern learning-based optical flow to perform real-time visual servoing. Our approach, which we call FlowControl, continuously tracks a demonstration video, using a…

    Submitted 1 July, 2020; originally announced July 2020.