Search | arXiv e-print repository

Image retrieval outperforms diffusion models on data augmentation

Authors: Max F. Burg, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco Locatello, Chris Russell

Abstract: Many approaches have been proposed to use diffusion models to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it remains an open question to which extent these models contribute to downstream classification performance. In particular, it remains unclear if they generalize enough to improve over directly using the additional data of their pre-training process for augmentation. We systematically evaluate a range of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. Personalizing diffusion models towards the target data outperforms simpler prompting strategies. However, using the pre-training data of the diffusion model alone, via a simple nearest-neighbor retrieval procedure, leads to even stronger downstream performance. Our study explores the potential of diffusion models in generating new training data, and surprisingly finds that these sophisticated models are not yet able to beat a simple and strong image retrieval baseline on simple downstream vision tasks.

Submitted 30 November, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

arXiv:2110.05304 [pdf, other]

You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory Prediction

Authors: Osama Makansi, Julius von Kügelgen, Francesco Locatello, Peter Gehler, Dominik Janzing, Thomas Brox, Bernhard Schölkopf

Abstract: Predicting the future trajectory of a moving agent can be easy when the past trajectory continues smoothly but is challenging when complex interactions with other agents are involved. Recent deep learning approaches for trajectory prediction show promising performance and partially attribute this to successful reasoning about agent-agent interactions. However, it remains unclear which features such black-box models actually learn to use for making predictions. This paper proposes a procedure that quantifies the contributions of different cues to model performance based on a variant of Shapley values. Applying this procedure to state-of-the-art trajectory prediction methods on standard benchmark datasets shows that they are, in fact, unable to reason about interactions. Instead, the past trajectory of the target is the only feature used for predicting its future. For a task with richer social interaction patterns, on the other hand, the tested models do pick up such interactions to a certain extent, as quantified by our feature attribution method. We discuss the limits of the proposed method and its links to causality.

Submitted 11 October, 2021; originally announced October 2021.

arXiv:2103.12474 [pdf, other]

On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors

Authors: Osama Makansi, Özgün Cicek, Yassine Marrakchi, Thomas Brox

Abstract: Predicting the states of dynamic traffic actors into the future is important for autonomous systems to operate safely and efficiently. Remarkably, the most critical scenarios are much less frequent and more complex than the uncritical ones. Therefore, uncritical cases dominate the prediction. In this paper, we address specifically the challenging scenarios at the long tail of the dataset distribution. Our analysis shows that the common losses tend to place challenging cases suboptimally in the embedding space. As a consequence, we propose to supplement the usual loss with a loss that places challenging cases closer to each other. This triggers sharing information among challenging cases and learning specific predictive features. We show on four public datasets that this leads to improved performance on the challenging scenarios while the overall performance stays stable. The approach is agnostic w.r.t. the used network architecture, input modality or viewpoint, and can be integrated into existing solutions easily. Code is available at https://github.com/lmb-freiburg/Contrastive-Future-Trajectory-Prediction

Submitted 8 August, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Journal ref: ICCV 2021

arXiv:2006.04700 [pdf, other]

Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View with a Reachability Prior

Authors: Osama Makansi, Özgün Cicek, Kevin Buchicchio, Thomas Brox

Abstract: In this paper, we investigate the problem of anticipating future dynamics, particularly the future location of other vehicles and pedestrians, in the view of a moving vehicle. We approach two fundamental challenges: (1) the partial visibility due to the egocentric view with a single RGB camera and considerable field-of-view change due to the egomotion of the vehicle; (2) the multimodality of the distribution of future states. In contrast to many previous works, we do not assume structural knowledge from maps. We rather estimate a reachability prior for certain classes of objects from the semantic map of the present image and propagate it into the future using the planned egomotion. Experiments show that the reachability prior combined with multi-hypotheses learning improves multimodal prediction of the future location of tracked objects and, for the first time, the emergence of new objects. We also demonstrate promising zero-shot transfer to unseen datasets. Source code is available at https://github.com/lmb-freiburg/FLN-EPN-RPN

Submitted 8 June, 2020; originally announced June 2020.

Comments: In CVPR 2020

arXiv:1906.03631 [pdf, other]

doi 10.1109/CVPR.2019.00731

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Authors: Osama Makansi, Eddy Ilg, Özgün Cicek, Thomas Brox

Abstract: Future prediction is a fundamental principle of intelligence that helps plan actions and avoid possible dangers. As the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. Existing approaches are rather limited in this regard and mostly yield a single hypothesis of the future or, at the best, strongly constrained mixture components that suffer from instabilities in training and mode collapse. In this work, we present an approach that involves the prediction of several samples of the future with a winner-takes-all loss and iterative grouping of samples to multiple modes. Moreover, we discuss how to evaluate predicted multimodal distributions, including the common real scenario, where only a single sample from the ground-truth distribution is available for evaluation. We show on synthetic and real data that the proposed approach triggers good estimates of multimodal distributions and avoids mode collapse. Source code is available at https://github.com/lmb-freiburg/Multimodal-Future-Prediction

Submitted 8 June, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

Comments: In CVPR 2019

arXiv:1808.06389 [pdf, other]

FusionNet and AugmentedFlowNet: Selective Proxy Ground Truth for Training on Unlabeled Images

Authors: Osama Makansi, Eddy Ilg, Thomas Brox

Abstract: Recent work has shown that convolutional neural networks (CNNs) can be used to estimate optical flow with high quality and fast runtime. This makes them preferable for real-world applications. However, such networks require very large training datasets. Engineering the training data is difficult and/or laborious. This paper shows how to augment a network trained on an existing synthetic dataset with large amounts of additional unlabelled data. In particular, we introduce a selection mechanism to assemble from multiple estimates a joint optical flow field, which outperforms that of all input methods. The latter can be used as proxy-ground-truth to train a network on real-world data and to adapt it to specific domains of interest. Our experimental results show that the performance of networks improves considerably, both in cross-domain and in domain-specific scenarios. As a consequence, we obtain state-of-the-art results on the KITTI benchmarks.

Submitted 20 August, 2018; originally announced August 2018.

Comments: See video at: https://www.youtube.com/watch?v=HdMeb20Rybs

arXiv:1802.07095 [pdf, other]

Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow

Authors: Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox

Abstract: Optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make such networks estimate their local uncertainty about the correctness of their prediction, which is vital information when building decisions on top of the estimations. For the first time we compare several strategies and techniques to estimate uncertainty in a large-scale computer vision task like optical flow estimation. Moreover, we introduce a new network architecture utilizing the Winner-Takes-All loss and show that this can provide complementary hypotheses and uncertainty estimates efficiently with a single forward pass and without the need for sampling or ensembles. Finally, we demonstrate the quality of the different uncertainty estimates, which is clearly above previous confidence measures on optical flow and allows for interactive frame rates.

Submitted 20 December, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

Comments: Accepted to ECCV 2018 as poster. See Video at: https://youtu.be/HvyovWSo8uE

arXiv:1707.00471 [pdf, other]

End-to-End Learning of Video Super-Resolution with Motion Compensation

Authors: Osama Makansi, Eddy Ilg, Thomas Brox

Abstract: Learning approaches have shown great success in the task of super-resolving an image given a low resolution input. Video super-resolution aims for exploiting additionally the information from multiple images. Typically, the images are related via optical flow and consecutive image warping. In this paper, we provide an end-to-end video super-resolution network that, in contrast to previous works, includes the estimation of optical flow in the overall network architecture. We analyze the usage of optical flow for video super-resolution and find that common off-the-shelf image warping does not allow video super-resolution to benefit much from optical flow. We rather propose an operation for motion compensation that performs warping from low to high resolution directly. We show that with this network configuration, video super-resolution can benefit from optical flow and we obtain state-of-the-art results on the popular test sets. We also show that the processing of whole images rather than independent patches is responsible for a large increase in accuracy.

Submitted 3 July, 2017; originally announced July 2017.

Comments: Accepted to GCPR 2017

Showing 1–8 of 8 results for author: Makansi, O