Showing 51–100 of 121 results for author: Brox, T

  1. arXiv:2007.00291  [pdf, other]

    cs.RO cs.CV

    FlowControl: Optical Flow Based Visual Servoing

    Authors: Max Argus, Lukas Hermann, Jon Long, Thomas Brox

    Abstract: One-shot imitation is the vision of robot programming from a single demonstration, rather than by tedious construction of computer code. We present a practical method for realizing one-shot imitation for manipulation tasks, exploiting modern learning-based optical flow to perform real-time visual servoing. Our approach, which we call FlowControl, continuously tracks a demonstration video, using a…

    Submitted 1 July, 2020; originally announced July 2020.

  2. arXiv:2006.07872  [pdf, other]

    cs.CV

    Explicitly Modeled Attention Maps for Image Classification

    Authors: Andong Tan, Duc Tam Nguyen, Maximilian Dax, Matthias Nießner, Thomas Brox

    Abstract: Self-attention networks have shown remarkable progress in computer vision tasks such as image classification. The main benefit of the self-attention mechanism is the ability to capture long-range feature interactions in attention-maps. However, the computation of attention-maps requires a learnable key, query, and positional encoding, whose usage is often not intuitive and computationally expensiv…

    Submitted 18 March, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: Accepted by AAAI2021

  3. arXiv:2006.04700  [pdf, other]

    cs.CV

    Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View with a Reachability Prior

    Authors: Osama Makansi, Özgün Cicek, Kevin Buchicchio, Thomas Brox

    Abstract: In this paper, we investigate the problem of anticipating future dynamics, particularly the future location of other vehicles and pedestrians, in the view of a moving vehicle. We approach two fundamental challenges: (1) the partial visibility due to the egocentric view with a single RGB camera and considerable field-of-view change due to the egomotion of the vehicle; (2) the multimodality of the d…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: In CVPR 2020

  4. arXiv:2004.01823  [pdf, other]

    cs.CV cs.LG eess.IV

    Temporal Shift GAN for Large Scale Video Generation

    Authors: Andres Munoz, Mohammadreza Zolfaghari, Max Argus, Thomas Brox

    Abstract: Video generation models have become increasingly popular in the last few years; however, the standard 2D architectures used today lack natural spatio-temporal modelling capabilities. In this paper, we present a network architecture for video generation that models spatio-temporal consistency without resorting to costly 3D architectures. The architecture facilitates information exchange between neig…

    Submitted 10 November, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: 14 pages, 15 figures

    ACM Class: I.2.10

  5. arXiv:2001.07926  [pdf, other]

    cs.CV

    Optimized Generic Feature Learning for Few-shot Classification across Domains

    Authors: Tonmoy Saikia, Thomas Brox, Cordelia Schmid

    Abstract: To learn models or features that generalize across tasks and domains is one of the grand goals of machine learning. In this paper, we propose to use cross-domain, cross-task data as validation objective for hyper-parameter optimization (HPO) to improve on this goal. Given a rich enough search space, optimization of hyper-parameters learns features that maximize validation performance and, due to th…

    Submitted 22 January, 2020; originally announced January 2020.

  6. arXiv:1912.05361  [pdf, other]

    cs.CV

    Parting with Illusions about Deep Active Learning

    Authors: Sudhanshu Mittal, Maxim Tatarchenko, Özgün Çiçek, Thomas Brox

    Abstract: Active learning aims to reduce the high labeling cost involved in training machine learning models on large datasets by efficiently labeling only the most informative samples. Recently, deep active learning has shown success on various tasks. However, the conventional evaluation scheme used for deep active learning is below par. Current methods disregard some apparent parallel work in the closely…

    Submitted 11 December, 2019; originally announced December 2019.

  7. arXiv:1910.07972  [pdf, other]

    cs.RO cs.CV cs.LG

    Adaptive Curriculum Generation from Demonstrations for Sim-to-Real Visuomotor Control

    Authors: Lukas Hermann, Max Argus, Andreas Eitel, Artemij Amiranashvili, Wolfram Burgard, Thomas Brox

    Abstract: We propose Adaptive Curriculum Generation from Demonstrations (ACGD) for reinforcement learning in the presence of sparse rewards. Rather than designing shaped reward functions, ACGD adaptively sets the appropriate task difficulty for the learner by controlling where to sample from the demonstration trajectories and which set of simulation parameters to use. We show that training vision-based cont…

    Submitted 8 July, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

    Comments: Accepted at the 2020 IEEE International Conference on Robotics and Automation (ICRA). Project page: https://lmb.informatik.uni-freiburg.de/projects/curriculum/

  8. arXiv:1910.07948  [pdf, other]

    cs.RO cs.CV cs.LG eess.IV

    Self-supervised 3D Shape and Viewpoint Estimation from Single Images for Robotics

    Authors: Oier Mees, Maxim Tatarchenko, Thomas Brox, Wolfram Burgard

    Abstract: We present a convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image. During training, our network gets the learning signal from a silhouette of an object in the input image - a form of self-supervision. It does not require ground truth data for 3D shapes and the viewpoints. Because it relies on such a weak form of supervision, our approach can…

    Submitted 17 October, 2019; originally announced October 2019.

    Comments: Accepted at the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Video at https://www.youtube.com/watch?v=oQgHG9JdMP4

  9. arXiv:1910.01842  [pdf, other]

    cs.CV cs.LG stat.ML

    SELF: Learning to Filter Noisy Labels with Self-Ensembling

    Authors: Duc Tam Nguyen, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Laura Beggel, Thomas Brox

    Abstract: Deep neural networks (DNNs) have been shown to over-fit a dataset when being trained with noisy labels for a long enough time. To overcome this problem, we present a simple and effective method, self-ensemble label filtering (SELF), to progressively filter out the wrong labels during training. Our method improves the task performance by gradually allowing supervision only from the potentially non-no…

    Submitted 4 October, 2019; originally announced October 2019.

  10. arXiv:1909.13055  [pdf, other]

    cs.CV cs.LG eess.IV

    DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision

    Authors: Duc Tam Nguyen, Maximilian Dax, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Zhongyu Lou, Thomas Brox

    Abstract: Deep neural network (DNN) based salient object detection in images based on high-quality labels is expensive. Alternative unsupervised approaches rely on careful selection of multiple handcrafted saliency methods to generate noisy pseudo-ground-truth labels. In this work, we propose a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement…

    Submitted 15 March, 2021; v1 submitted 28 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2019 (Vancouver, Canada): camera ready version

  11. arXiv:1909.09656  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Understanding and Robustifying Differentiable Architecture Search

    Authors: Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter

    Abstract: Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem. However, DARTS does not work robustly for new problems: we identify a wide range of search spaces for which DARTS yields degenerate architectures with very poor test performa…

    Submitted 28 January, 2020; v1 submitted 20 September, 2019; originally announced September 2019.

    Comments: In: International Conference on Learning Representations (ICLR 2020); 28 pages, 30 figures

  12. arXiv:1909.04349  [pdf, other]

    cs.CV cs.LG cs.RO

    FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images

    Authors: Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max Argus, Thomas Brox

    Abstract: Estimating 3D hand pose from single RGB images is a highly ambiguous problem that relies on an unbiased training dataset. In this paper, we analyze cross-dataset generalization when training on existing datasets. We find that approaches perform well on the datasets they are trained on, but do not generalize to other datasets or in-the-wild scenarios. As a consequence, we introduce the first large-…

    Submitted 13 September, 2019; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: Accepted to ICCV 2019, Project page: https://lmb.informatik.uni-freiburg.de/projects/freihand/

  13. arXiv:1908.05724  [pdf, other]

    cs.CV

    Semi-Supervised Semantic Segmentation with High- and Low-level Consistency

    Authors: Sudhanshu Mittal, Maxim Tatarchenko, Thomas Brox

    Abstract: The ability to understand visual information from limited labeled data is an important aspect of machine learning. While image-level classification has been extensively studied in a semi-supervised setting, dense pixel-level classification with limited data has only drawn attention recently. In this work, we propose an approach for semi-supervised semantic segmentation that learns from limited pix…

    Submitted 15 August, 2019; originally announced August 2019.

  14. arXiv:1908.03463  [pdf, other]

    stat.ML cs.LG

    Group Pruning using a Bounded-Lp norm for Group Gating and Regularization

    Authors: Chaithanya Kumar Mummadi, Tim Genewein, Dan Zhang, Thomas Brox, Volker Fischer

    Abstract: Deep neural networks achieve state-of-the-art results on several tasks while increasing in complexity. It has been shown that neural networks can be pruned during training by imposing sparsity inducing regularizers. In this paper, we investigate two techniques for group-wise pruning during training in order to improve network efficiency. We propose a gating factor after every convolutional layer t…

    Submitted 9 August, 2019; originally announced August 2019.

    Comments: German Conference on Pattern Recognition (GCPR) 2019, 12 main pages, 3 pages of appendix, 4 figures, 2 tables

  15. Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

    Authors: Osama Makansi, Eddy Ilg, Özgün Cicek, Thomas Brox

    Abstract: Future prediction is a fundamental principle of intelligence that helps plan actions and avoid possible dangers. As the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. Existing approaches are rather limited in this regard and mostly yield a single hypothesis of the future or, at the best, strongly constrained mixture com…

    Submitted 8 June, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Comments: In CVPR 2019

  16. arXiv:1906.00216  [pdf, other]

    cs.LG cs.CV stat.ML

    Robust Learning Under Label Noise With Iterative Noise-Filtering

    Authors: Duc Tam Nguyen, Thi-Phuong-Nhung Ngo, Zhongyu Lou, Michael Klar, Laura Beggel, Thomas Brox

    Abstract: We consider the problem of training a model under the presence of label noise. Current approaches identify samples with potentially incorrect labels and reduce their influence on the learning process by either assigning lower weights to them or completely removing them from the training set. In the first case the model however still learns from noisy labels; in the latter approach, good training d…

    Submitted 1 June, 2019; originally announced June 2019.

  17. arXiv:1905.07443  [pdf, other]

    cs.CV cs.AI cs.LG

    AutoDispNet: Improving Disparity Estimation With AutoML

    Authors: Tonmoy Saikia, Yassine Marrakchi, Arber Zela, Frank Hutter, Thomas Brox

    Abstract: Much research work in computer vision is being spent on optimizing existing network architectures to obtain a few more percentage points on benchmarks. Recent AutoML approaches promise to relieve us from this effort. However, they are mainly designed for comparatively small-scale classification tasks. In this work, we show how to use and extend existing AutoML techniques to efficiently optimize la…

    Submitted 6 October, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

    Comments: In Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV)

  18. arXiv:1905.03678  [pdf, other]

    cs.CV

    What Do Single-view 3D Reconstruction Networks Learn?

    Authors: Maxim Tatarchenko, Stephan R. Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, Thomas Brox

    Abstract: Convolutional networks for single-view object reconstruction have shown impressive performance and have become a popular subject of research. All existing techniques are united by the idea of having an encoder-decoder network that performs non-trivial reasoning about the 3D structure of the output space. In this work, we set up two alternative approaches that perform image classification and retri…

    Submitted 9 May, 2019; originally announced May 2019.

  19. arXiv:1905.03578  [pdf, other]

    cs.CV cs.AI cs.IT cs.RO

    Learning Representations for Predicting Future Activities

    Authors: Mohammadreza Zolfaghari, Özgün Çiçek, Syed Mohsin Ali, Farzaneh Mahdisoltani, Can Zhang, Thomas Brox

    Abstract: Foreseeing the future is one of the key factors of intelligence. It involves understanding of the past and current environment as well as decent experience of its possible dynamics. In this work, we address future prediction at the abstract level of activities. We propose a network module for learning embeddings of the environment's dynamics in a self-supervised way. To take the ambiguities and hi…

    Submitted 9 May, 2019; originally announced May 2019.

    Comments: 14 pages, ICCV 2019 submission, Code and Models: https://github.com/lmb-freiburg/PreFAct

  20. arXiv:1904.05847  [pdf, other]

    cs.CV

    MAIN: Multi-Attention Instance Network for Video Segmentation

    Authors: Juan Leon Alcazar, Maria A. Bravo, Ali K. Thabet, Guillaume Jeanneret, Thomas Brox, Pablo Arbelaez, Bernard Ghanem

    Abstract: Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Netwo…

    Submitted 11 April, 2019; originally announced April 2019.

  21. arXiv:1904.02028  [pdf, other]

    cs.CV

    CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth

    Authors: Jose M. Facil, Benjamin Ummenhofer, Huizhong Zhou, Luis Montesano, Thomas Brox, Javier Civera

    Abstract: Single-view depth estimation suffers from the problem that a network trained on images from one camera does not generalize to images taken with a different camera model. Thus, changing the camera model requires collecting an entirely new training dataset. In this work, we propose a new type of convolution that can take the camera parameters into account, thus allowing neural networks to learn cali…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: Camera ready version for CVPR 2019. Project page: http://webdiis.unizar.es/~jmfacil/camconvs/

  22. arXiv:1902.05605  [pdf, other]

    cs.LG stat.ML

    CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

    Authors: Aditya Bhatt, Daniel Palenicek, Boris Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, Jan Peters

    Abstract: Sample efficiency is a crucial problem in deep reinforcement learning. Recent algorithms, such as REDQ and DroQ, found a way to improve the sample efficiency by increasing the update-to-data (UTD) ratio to 20 gradient update steps on the critic per environment sample. However, this comes at the expense of a greatly increased computational cost. To reduce this computational burden, we introduce Cro…

    Submitted 25 March, 2024; v1 submitted 14 February, 2019; originally announced February 2019.

    Comments: Published at ICLR 2024. Project page at http://aditya.bhatts.org/CrossQ and code release at https://github.com/adityab/CrossQ

  23. arXiv:1901.03162  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Motion Perception in Reinforcement Learning with Dynamic Objects

    Authors: Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox

    Abstract: In dynamic environments, learned controllers are supposed to take motion into account when selecting the action to be taken. However, in existing reinforcement learning works motion is rarely treated explicitly; it is rather assumed that the controller learns the necessary motion representation from temporal stacks of frames implicitly. In this paper, we show that for continuous control tasks lear…

    Submitted 1 February, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

  24. arXiv:1812.03705  [pdf, other]

    cs.CV cs.CR cs.LG stat.ML

    Defending Against Universal Perturbations With Shared Adversarial Training

    Authors: Chaithanya Kumar Mummadi, Thomas Brox, Jan Hendrik Metzen

    Abstract: Classifiers such as deep neural networks have been shown to be vulnerable against adversarial perturbations on problems with high-dimensional input space. While adversarial training improves the robustness of image classifiers against such adversarial perturbations, it leaves them sensitive to perturbations on a non-negligible fraction of the inputs. In this work, we show that adversarial training…

    Submitted 13 August, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

    Comments: ICCV 2019, 8 main pages, 9 appendix pages, 16 figures, 2 tables

  25. arXiv:1810.13292  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Anomaly Detection With Multiple-Hypotheses Predictions

    Authors: Duc Tam Nguyen, Zhongyu Lou, Michael Klar, Thomas Brox

    Abstract: In one-class-learning tasks, only the normal case (foreground) can be modeled with data, whereas the variation of all possible anomalies is too erratic to be described by samples. Thus, due to the lack of representative data, the wide-spread discriminative approaches cannot cover such learning tasks, and rather generative models, which attempt to learn the input density of the foreground, are used…

    Submitted 31 May, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

    Comments: In proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, California, PMLR 97, 2019

  26. arXiv:1808.06389  [pdf, other]

    cs.CV

    FusionNet and AugmentedFlowNet: Selective Proxy Ground Truth for Training on Unlabeled Images

    Authors: Osama Makansi, Eddy Ilg, Thomas Brox

    Abstract: Recent work has shown that convolutional neural networks (CNNs) can be used to estimate optical flow with high quality and fast runtime. This makes them preferable for real-world applications. However, such networks require very large training datasets. Engineering the training data is difficult and/or laborious. This paper shows how to augment a network trained on an existing synthetic dataset wi…

    Submitted 20 August, 2018; originally announced August 2018.

    Comments: See video at: https://www.youtube.com/watch?v=HdMeb20Rybs

  27. arXiv:1808.01900  [pdf, other]

    cs.CV

    DeepTAM: Deep Tracking and Mapping

    Authors: Huizhong Zhou, Benjamin Ummenhofer, Thomas Brox

    Abstract: We present a system for keyframe-based dense camera tracking and depth map estimation that is entirely learned. For tracking, we estimate small pose increments between the current camera image and a synthetic viewpoint. This significantly simplifies the learning problem and alleviates the dataset bias for camera motions. Further, we show that generating a large number of pose hypotheses leads to m…

    Submitted 7 August, 2018; v1 submitted 6 August, 2018; originally announced August 2018.

    Comments: Accepted to ECCV 2018 as oral. Project page: https://lmb.informatik.uni-freiburg.de/people/zhouh/deeptam/

  28. arXiv:1808.01838  [pdf, other]

    cs.CV

    Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation

    Authors: Eddy Ilg, Tonmoy Saikia, Margret Keuper, Thomas Brox

    Abstract: Occlusions play an important role in disparity and optical flow estimation, since matching costs are not available in occluded areas and occlusions indicate depth or motion boundaries. Moreover, occlusions are relevant for motion segmentation and scene flow estimation. In this paper, we present an efficient learning-based approach to estimate occlusion areas jointly with disparities or optical flo…

    Submitted 8 August, 2018; v1 submitted 6 August, 2018; originally announced August 2018.

    Comments: Accepted to ECCV 2018 as poster. See video at: https://www.youtube.com/watch?v=SwOdSaBRysI

  29. arXiv:1806.01175  [pdf, other]

    cs.LG cs.AI stat.ML

    TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning

    Authors: Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox

    Abstract: Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear function approximators. These results suggest that RL methods that use temporal differencing (TD) are superior to direct Monte Carlo estimation (MC). How do these results hold up in deep RL, which deals with perceptually compl…

    Submitted 4 June, 2018; originally announced June 2018.

  30. arXiv:1804.09066  [pdf, other]

    cs.CV cs.AI cs.IR cs.MM

    ECO: Efficient Convolutional Network for Online Video Understanding

    Authors: Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox

    Abstract: The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video, therefore, it misses important relationships within actions that span several seconds. (2) While there are local methods with fast per-frame processing, the processing of the whole video is not efficient and hampers fast video retrieval or online classification…

    Submitted 7 May, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

    Comments: Submitted to ECCV 2018. 17 pages, 7 figures, Supplementary Material, https://github.com/mzolfaghari/ECO-efficient-video-understanding

  31. arXiv:1804.01792  [pdf, other]

    cs.RO

    TrimBot2020: an outdoor robot for automatic gardening

    Authors: Nicola Strisciuglio, Radim Tylecek, Michael Blaich, Nicolai Petkov, Peter Bieber, Jochen Hemming, Eldert van Henten, Torsten Sattler, Marc Pollefeys, Theo Gevers, Thomas Brox, Robert B. Fisher

    Abstract: Robots are increasingly present in modern industry and also in everyday life. Their applications range from health-related situations, for assistance to elderly people or in surgical operations, to automatic and driver-less vehicles (on wheels or flying) or for driving assistance. Recently, an interest towards robotics applied in agriculture and gardening has arisen, with applications to automatic…

    Submitted 15 May, 2018; v1 submitted 5 April, 2018; originally announced April 2018.

    Comments: Accepted for publication at International Symposium on Robotics 2018

  32. arXiv:1803.02622  [pdf, other]

    cs.CV cs.RO

    3D Human Pose Estimation in RGBD Images for Robotic Task Learning

    Authors: Christian Zimmermann, Tim Welschehold, Christian Dornhege, Wolfram Burgard, Thomas Brox

    Abstract: We propose an approach to estimate 3D human pose in real world units from a single RGBD image and show that it exceeds performance of monocular 3D pose estimation approaches from color as well as pose estimation exclusively from depth. Our approach builds on robust human keypoint detectors for color images and incorporates depth for lifting into 3D. We combine the system with our learning from dem…

    Submitted 13 March, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: Accepted to ICRA 2018. Video and Code (ROS node) are available: http://lmb.informatik.uni-freiburg.de/projects/rgbd-pose3d/

  33. arXiv:1802.07095  [pdf, other]

    cs.CV

    Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow

    Authors: Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox

    Abstract: Optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make such networks estimate their local uncertainty about the correctness of their prediction, which is vital information when building decisions on top of the estimations. For the first time…

    Submitted 20 December, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

    Comments: Accepted to ECCV 2018 as poster. See Video at: https://youtu.be/HvyovWSo8uE

  34. What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

    Authors: Nikolaus Mayer, Eddy Ilg, Philipp Fischer, Caner Hazirbas, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox

    Abstract: The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method…

    Submitted 22 March, 2018; v1 submitted 19 January, 2018; originally announced January 2018.

    Comments: added references (UCL dataset); added IJCV copyright information

  35. arXiv:1708.06500  [pdf, other]

    cs.CV

    Sparsity Invariant CNNs

    Authors: Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, Andreas Geiger

    Abstract: In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution lay…

    Submitted 30 August, 2017; v1 submitted 22 August, 2017; originally announced August 2017.

  36. Artistic style transfer for videos and spherical images

    Authors: Manuel Ruder, Alexey Dosovitskiy, Thomas Brox

    Abstract: Manually re-drawing an image in a certain artistic style takes a professional artist a long time. Doing this for a video sequence single-handedly is beyond imagination. We present two computational approaches that transfer the style from one image (for example, a painting) to a whole video sequence. In our first approach, we adapt to videos the original image style transfer technique by Gatys et a…

    Submitted 5 August, 2018; v1 submitted 13 August, 2017; originally announced August 2017.

    Comments: v3: added ref to conference. This paper is a successor of and overlaps with arXiv:1604.08610, International Journal of Computer Vision (IJCV), 2018

  37. arXiv:1707.02278  [pdf, other]

    math.OC math.NA

    Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms

    Authors: Peter Ochs, Jalal Fadili, Thomas Brox

    Abstract: We propose a unifying algorithm for non-smooth non-convex optimization. The algorithm approximates the objective function by a convex model function and finds an approximate (Bregman) proximal point of the convex model. This approximate minimizer of the model function yields a descent direction, along which the next iterate is found. Complemented with an Armijo-like line search strategy, we obtain…

    Submitted 25 June, 2018; v1 submitted 7 July, 2017; originally announced July 2017.

  38. arXiv:1707.00471  [pdf, other]

    cs.CV

    End-to-End Learning of Video Super-Resolution with Motion Compensation

    Authors: Osama Makansi, Eddy Ilg, Thomas Brox

    Abstract: Learning approaches have shown great success in the task of super-resolving an image given a low resolution input. Video super-resolution aims for exploiting additionally the information from multiple images. Typically, the images are related via optical flow and consecutive image warping. In this paper, we provide an end-to-end video super-resolution network that, in contrast to previous works, i…

    Submitted 3 July, 2017; originally announced July 2017.

    Comments: Accepted to GCPR2017

  39. arXiv:1706.08775  [pdf, other]

    cs.CV cs.RO

    Topometric Localization with Deep Learning

    Authors: Gabriel L. Oliveira, Noha Radwan, Wolfram Burgard, Thomas Brox

    Abstract: Compared to LiDAR-based localization methods, which provide high accuracy but rely on expensive sensors, visual localization approaches only require a camera and thus are more cost-effective while their accuracy and reliability typically is inferior to LiDAR-based methods. In this work, we propose a vision-based localization approach that learns from LiDAR-based localization methods by using their…

    Submitted 27 June, 2017; originally announced June 2017.

    Comments: 16 pages, 7 figures, ISRR 2017 submission

  40. arXiv:1705.01389  [pdf, other]

    cs.CV

    Learning to Estimate 3D Hand Pose from Single RGB Images

    Authors: Christian Zimmermann, Thomas Brox

    Abstract: Low-cost consumer depth cameras and deep learning have enabled reasonable 3D hand pose estimation from single depth images. In this paper, we present an approach that estimates 3D hand pose from regular RGB images. This task has far more ambiguities due to the missing depth information. To this end, we propose a deep network that learns a network-implicit 3D articulation prior. Together with detec…

    Submitted 15 October, 2017; v1 submitted 3 May, 2017; originally announced May 2017.

    Comments: Accepted to ICCV 2017. Code and dataset are released: https://lmb.informatik.uni-freiburg.de/projects/hand3d/

  41. arXiv:1704.05712  [pdf, other]

    stat.ML cs.AI cs.CV cs.LG cs.NE

    Universal Adversarial Perturbations Against Semantic Image Segmentation

    Authors: Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, Volker Fischer

    Abstract: While deep learning is remarkably successful on perceptual tasks, it was also shown to be vulnerable to adversarial perturbations of the input. These perturbations denote noise added to the input that was generated specifically to fool the system while being quasi-imperceptible for humans. More severely, there even exist universal perturbations that are input-agnostic but fool the network on the m…

    Submitted 31 July, 2017; v1 submitted 19 April, 2017; originally announced April 2017.

    Comments: Final version for ICCV including supplementary material

  42. arXiv:1704.00616  [pdf, other]

    cs.CV cs.AI cs.HC cs.MM cs.NE

    Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

    Authors: Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox

    Abstract: General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classif…

    Submitted 26 May, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

    Comments: 10 pages, 7 figures, ICCV 2017 submission

  43. arXiv:1703.09554  [pdf, other]

    cs.CV

    Lucid Data Dreaming for Video Object Segmentation

    Authors: Anna Khoreva, Rodrigo Benenson, Eddy Ilg, Thomas Brox, Bernt Schiele

    Abstract: Convolutional networks reach top quality in pixel-level video object segmentation but require a large amount of training data (1k~100k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20x~1000x less annotated data than competing methods. Our approach is suitable for both single and multiple object segm…

    Submitted 13 March, 2019; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: Accepted in International Journal of Computer Vision (IJCV)

  44. arXiv:1703.09438  [pdf, other]

    cs.CV

    Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs

    Authors: Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox

    Abstract: We present a deep convolutional decoder architecture that can generate volumetric 3D outputs in a compute- and memory-efficient manner by using an octree representation. The network learns to predict both the structure of the octree, and the occupancy values of individual cells. This makes it a particularly valuable technique for generating 3D shapes. In contrast to standard decoders acting on reg…

    Submitted 7 August, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

  45. arXiv:1703.01101  [pdf, other]

    stat.ML cs.CR cs.CV cs.LG cs.NE

    Adversarial Examples for Semantic Image Segmentation

    Authors: Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, Thomas Brox

    Abstract: Machine learning methods in general and Deep Neural Networks in particular have shown to be vulnerable to adversarial perturbations. So far this phenomenon has mainly been studied in the context of whole-image classification. In this contribution, we analyse how adversarial perturbations can affect the task of semantic segmentation. We show how existing adversarial attackers can be transferred to…

    Submitted 3 March, 2017; originally announced March 2017.

    Comments: ICLR 2017 workshop submission

  46. arXiv:1612.03777  [pdf, other]

    cs.CV

    Hybrid Learning of Optical Flow and Next Frame Prediction to Boost Optical Flow in the Wild

    Authors: Nima Sedaghat, Mohammadreza Zolfaghari, Thomas Brox

    Abstract: CNN-based optical flow estimation has attracted attention recently, mainly due to its impressively high frame rates. These networks perform well on synthetic datasets, but they are still far behind the classical methods in real-world videos. This is because there is no ground truth optical flow for training these networks on real data. In this paper, we boost CNN-based optical flow estimation in r…

    Submitted 7 April, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

  47. DeMoN: Depth and Motion Network for Learning Monocular Stereo

    Authors: Benjamin Ummenhofer, Huizhong Zhou, Jonas Uhrig, Nikolaus Mayer, Eddy Ilg, Alexey Dosovitskiy, Thomas Brox

    Abstract: In this paper we formulate structure from motion as a learning problem. We train a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs. The architecture is composed of multiple stacked encoder-decoder networks, the core part being an iterative network that is able to improve its own predictions. The network estimates not only depth and mot…

    Submitted 11 April, 2017; v1 submitted 7 December, 2016; originally announced December 2016.

    Comments: Camera ready version for CVPR 2017. Supplementary material included. Project page: http://lmb.informatik.uni-freiburg.de/people/ummenhof/depthmotionnet/

  48. arXiv:1612.01925  [pdf, other]

    cs.CV

    FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

    Authors: Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox

    Abstract: The FlowNet demonstrated that optical flow estimation can be cast as a learning problem. However, the state of the art with regard to the quality of the flow has still been defined by traditional methods. Particularly on small displacements and real-world data, FlowNet cannot compete with variational methods. In this paper, we advance the concept of end-to-end learning of optical flow and make it…

    Submitted 6 December, 2016; originally announced December 2016.

    Comments: Including supplementary material. For the video see: http://lmb.informatik.uni-freiburg.de/Publications/2016/IMKDB16/

  49. arXiv:1611.04399  [pdf, other]

    cs.CV cs.DM

    Joint Graph Decomposition and Node Labeling: Problem, Algorithms, Applications

    Authors: Evgeny Levinkov, Jonas Uhrig, Siyu Tang, Mohamed Omran, Eldar Insafutdinov, Alexander Kirillov, Carsten Rother, Thomas Brox, Bernt Schiele, Bjoern Andres

    Abstract: We state a combinatorial optimization problem whose feasible solutions define both a decomposition and a node labeling of a given graph. This problem offers a common mathematical abstraction of seemingly unrelated computer vision tasks, including instance-separating semantic segmentation, articulated human body pose estimation and multiple object tracking. Conceptually, the problem we state genera…

    Submitted 21 February, 2017; v1 submitted 14 November, 2016; originally announced November 2016.

  50. arXiv:1608.03066  [pdf, other]

    cs.CV

    Object Detection, Tracking, and Motion Segmentation for Object-level Video Segmentation

    Authors: Benjamin Drayer, Thomas Brox

    Abstract: We present an approach for object segmentation in videos that combines frame-level object detection with concepts from object tracking and motion segmentation. The approach extracts temporally consistent object tubes based on an off-the-shelf detector. Besides the class label for each tube, this provides a location prior that is independent of motion. For the final video segmentation, we combine t…

    Submitted 10 August, 2016; originally announced August 2016.