Search | arXiv e-print repository

CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine Context-Guided Motion Reasoning

Authors: Azin Jahedi, Maximilian Luz, Marc Rivinius, Andrés Bruhn

Abstract: Attention-based motion aggregation concepts have recently shown their usefulness in optical flow estimation, in particular when it comes to handling occluded regions. However, due to their complexity, such concepts have been mainly restricted to coarse-resolution single-scale approaches that fail to provide the detailed outcome of high-resolution multi-scale networks. In this paper, we hence propo… ▽ More Attention-based motion aggregation concepts have recently shown their usefulness in optical flow estimation, in particular when it comes to handling occluded regions. However, due to their complexity, such concepts have been mainly restricted to coarse-resolution single-scale approaches that fail to provide the detailed outcome of high-resolution multi-scale networks. In this paper, we hence propose CCMR: a high-resolution coarse-to-fine approach that leverages attention-based motion grou** concepts to multi-scale optical flow estimation. CCMR relies on a hierarchical two-step attention-based context-motion grou** strategy that first computes global multi-scale context features and then uses them to guide the actual motion grou**. As we iterate both steps over all coarse-to-fine scales, we adapt cross covariance image transformers to allow for an efficient realization while maintaining scale-dependent properties. Experiments and ablations demonstrate that our efforts of combining multi-scale and attention-based concepts pay off. By providing highly detailed flow fields with strong improvements in both occluded and non-occluded regions, our CCMR approach not only outperforms both the corresponding single-scale attention-based and multi-scale attention-free baselines by up to 23.0% and 21.6%, respectively, it also achieves state-of-the-art results, ranking first on KITTI 2015 and second on MPI Sintel Clean and Final. Code and trained models are available at https://github.com/cv-stuttgart /CCMR. △ Less

Submitted 5 November, 2023; originally announced November 2023.

Comments: WACV 2024

arXiv:2310.17403 [pdf, other]

Detection Defenses: An Empty Promise against Adversarial Patch Attacks on Optical Flow

Authors: Erik Scheurer, Jenny Schmalfuss, Alexander Lis, Andrés Bruhn

Abstract: Adversarial patches undermine the reliability of optical flow predictions when placed in arbitrary scene locations. Therefore, they pose a realistic threat to real-world motion detection and its downstream applications. Potential remedies are defense strategies that detect and remove adversarial patches, but their influence on the underlying motion prediction has not been investigated. In this pap… ▽ More Adversarial patches undermine the reliability of optical flow predictions when placed in arbitrary scene locations. Therefore, they pose a realistic threat to real-world motion detection and its downstream applications. Potential remedies are defense strategies that detect and remove adversarial patches, but their influence on the underlying motion prediction has not been investigated. In this paper, we thoroughly examine the currently available detect-and-remove defenses ILP and LGS for a wide selection of state-of-the-art optical flow methods, and illuminate their side effects on the quality and robustness of the final flow predictions. In particular, we implement defense-aware attacks to investigate whether current defenses are able to withstand attacks that take the defense mechanism into account. Our experiments yield two surprising results: Detect-and-remove defenses do not only lower the optical flow quality on benign scenes, in doing so, they also harm the robustness under patch attacks for all tested optical flow methods except FlowNetC. As currently employed detect-and-remove defenses fail to deliver the promised adversarial robustness for optical flow, they evoke a false sense of security. The code is available at https://github.com/cv-stuttgart/DetectionDefenses. △ Less

Submitted 2 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to WACV 2024

arXiv:2305.06716 [pdf, other]

Distracting Downpour: Adversarial Weather Attacks for Motion Estimation

Authors: Jenny Schmalfuss, Lukas Mehl, Andrés Bruhn

Abstract: Current adversarial attacks on motion estimation, or optical flow, optimize small per-pixel perturbations, which are unlikely to appear in the real world. In contrast, adverse weather conditions constitute a much more realistic threat scenario. Hence, in this work, we present a novel attack on motion estimation that exploits adversarially optimized particles to mimic weather effects like snowflake… ▽ More Current adversarial attacks on motion estimation, or optical flow, optimize small per-pixel perturbations, which are unlikely to appear in the real world. In contrast, adverse weather conditions constitute a much more realistic threat scenario. Hence, in this work, we present a novel attack on motion estimation that exploits adversarially optimized particles to mimic weather effects like snowflakes, rain streaks or fog clouds. At the core of our attack framework is a differentiable particle rendering system that integrates particles (i) consistently over multiple time steps (ii) into the 3D space (iii) with a photo-realistic appearance. Through optimization, we obtain adversarial weather that significantly impacts the motion estimation. Surprisingly, methods that previously showed good robustness towards small per-pixel perturbations are particularly vulnerable to adversarial weather. At the same time, augmenting the training with non-optimized weather increases a method's robustness towards weather effects and improves generalizability at almost no additional cost. Our code will be available at https://github.com/cv-stuttgart/DistractingDownpour. △ Less

Submitted 27 July, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: Acepted by ICCV 2023. This work is a direct extension of our extended abstract from arXiv:2210.11242

arXiv:2303.01943 [pdf, other]

Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo

Authors: Lukas Mehl, Jenny Schmalfuss, Azin Jahedi, Yaroslava Nalivayko, Andrés Bruhn

Abstract: While recent methods for motion and stereo estimation recover an unprecedented amount of details, such highly detailed structures are neither adequately reflected in the data of existing benchmarks nor their evaluation methodology. Hence, we introduce Spring $-$ a large, high-resolution, high-detail, computer-generated benchmark for scene flow, optical flow, and stereo. Based on rendered scenes fr… ▽ More While recent methods for motion and stereo estimation recover an unprecedented amount of details, such highly detailed structures are neither adequately reflected in the data of existing benchmarks nor their evaluation methodology. Hence, we introduce Spring $-$ a large, high-resolution, high-detail, computer-generated benchmark for scene flow, optical flow, and stereo. Based on rendered scenes from the open-source Blender movie "Spring", it provides photo-realistic HD datasets with state-of-the-art visual effects and ground truth training data. Furthermore, we provide a website to upload, analyze and compare results. Using a novel evaluation methodology based on a super-resolved UHD ground truth, our Spring benchmark can assess the quality of fine structures and provides further detailed performance statistics on different image regions. Regarding the number of ground truth frames, Spring is 60$\times$ larger than the only scene flow benchmark, KITTI 2015, and 15$\times$ larger than the well-established MPI Sintel optical flow benchmark. Initial results for recent methods on our benchmark show that estimating fine details is indeed challenging, as their accuracy leaves significant room for improvement. The Spring benchmark and the corresponding datasets are available at http://spring-benchmark.org. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: CVPR 2023

arXiv:2210.16900 [pdf, other]

High Resolution Multi-Scale RAFT (Robust Vision Challenge 2022)

Authors: Azin Jahedi, Maximilian Luz, Lukas Mehl, Marc Rivinius, Andrés Bruhn

Abstract: In this report, we present our optical flow approach, MS-RAFT+, that won the Robust Vision Challenge 2022. It is based on the MS-RAFT method, which successfully integrates several multi-scale concepts into single-scale RAFT. Our approach extends this method by exploiting an additional finer scale for estimating the flow, which is made feasible by on-demand cost computation. This way, it can not on… ▽ More In this report, we present our optical flow approach, MS-RAFT+, that won the Robust Vision Challenge 2022. It is based on the MS-RAFT method, which successfully integrates several multi-scale concepts into single-scale RAFT. Our approach extends this method by exploiting an additional finer scale for estimating the flow, which is made feasible by on-demand cost computation. This way, it can not only operate at half the original resolution, but also use MS-RAFT's shared convex upsampler to obtain full resolution flow. Moreover, our approach relies on an adjusted fine-tuning scheme during training. This in turn aims at improving the generalization across benchmarks. Among all participating methods in the Robust Vision Challenge, our approach ranks first on VIPER and second on KITTI, Sintel, and Middlebury, resulting in the first place of the overall ranking. △ Less

Submitted 30 October, 2022; originally announced October 2022.

Comments: Technical report for the Robust Vision Challenge 2022

arXiv:2210.11242 [pdf, other]

Attacking Motion Estimation with Adversarial Snow

Authors: Jenny Schmalfuss, Lukas Mehl, Andrés Bruhn

Abstract: Current adversarial attacks for motion estimation (optical flow) optimize small per-pixel perturbations, which are unlikely to appear in the real world. In contrast, we exploit a real-world weather phenomenon for a novel attack with adversarially optimized snow. At the core of our attack is a differentiable renderer that consistently integrates photorealistic snowflakes with realistic motion into… ▽ More Current adversarial attacks for motion estimation (optical flow) optimize small per-pixel perturbations, which are unlikely to appear in the real world. In contrast, we exploit a real-world weather phenomenon for a novel attack with adversarially optimized snow. At the core of our attack is a differentiable renderer that consistently integrates photorealistic snowflakes with realistic motion into the 3D scene. Through optimization we obtain adversarial snow that significantly impacts the optical flow while being indistinguishable from ordinary snow. Surprisingly, the impact of our novel attack is largest on methods that previously showed a high robustness to small L_p perturbations. △ Less

Submitted 20 October, 2022; originally announced October 2022.

arXiv:2207.12163 [pdf, other]

Multi-Scale RAFT: Combining Hierarchical Concepts for Learning-based Optical FLow Estimation

Authors: Azin Jahedi, Lukas Mehl, Marc Rivinius, Andrés Bruhn

Abstract: Many classical and learning-based optical flow methods rely on hierarchical concepts to improve both accuracy and robustness. However, one of the currently most successful approaches -- RAFT -- hardly exploits such concepts. In this work, we show that multi-scale ideas are still valuable. More precisely, using RAFT as a baseline, we propose a novel multi-scale neural network that combines several… ▽ More Many classical and learning-based optical flow methods rely on hierarchical concepts to improve both accuracy and robustness. However, one of the currently most successful approaches -- RAFT -- hardly exploits such concepts. In this work, we show that multi-scale ideas are still valuable. More precisely, using RAFT as a baseline, we propose a novel multi-scale neural network that combines several hierarchical concepts within a single estimation framework. These concepts include (i) a partially shared coarse-to-fine architecture, (ii) multi-scale features, (iii) a hierarchical cost volume and (iv) a multi-scale multi-iteration loss. Experiments on MPI Sintel and KITTI clearly demonstrate the benefits of our approach. They show not only substantial improvements compared to RAFT, but also state-of-the-art results -- in particular in non-occluded regions. Code will be available at https://github.com/cv-stuttgart/MS_RAFT. △ Less

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2207.05704 [pdf, other]

M-FUSE: Multi-frame Fusion for Scene Flow Estimation

Authors: Lukas Mehl, Azin Jahedi, Jenny Schmalfuss, Andrés Bruhn

Abstract: Recently, neural network for scene flow estimation show impressive results on automotive data such as the KITTI benchmark. However, despite of using sophisticated rigidity assumptions and parametrizations, such networks are typically limited to only two frame pairs which does not allow them to exploit temporal information. In our paper we address this shortcoming by proposing a novel multi-frame a… ▽ More Recently, neural network for scene flow estimation show impressive results on automotive data such as the KITTI benchmark. However, despite of using sophisticated rigidity assumptions and parametrizations, such networks are typically limited to only two frame pairs which does not allow them to exploit temporal information. In our paper we address this shortcoming by proposing a novel multi-frame approach that considers an additional preceding stereo pair. To this end, we proceed in two steps: Firstly, building upon the recent RAFT-3D approach, we develop an improved two-frame baseline by incorporating an advanced stereo method. Secondly, and even more importantly, exploiting the specific modeling concepts of RAFT-3D, we propose a U-Net architecture that performs a fusion of forward and backward flow estimates and hence allows to integrate temporal information on demand. Experiments on the KITTI benchmark do not only show that the advantages of the improved baseline and the temporal fusion approach complement each other, they also demonstrate that the computed scene flow is highly accurate. More precisely, our approach ranks second overall and first for the even more challenging foreground objects, in total outperforming the original RAFT-3D method by more than 16%. Code is available at https://github.com/cv-stuttgart/M-FUSE. △ Less

Submitted 28 October, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023. Copyright: IEEE

arXiv:2205.06597 [pdf, other]

Blind Image Inpainting with Sparse Directional Filter Dictionaries for Lightweight CNNs

Authors: Jenny Schmalfuss, Erik Scheurer, Heng Zhao, Nikolaos Karantzas, Andrés Bruhn, Demetrio Labate

Abstract: Blind inpainting algorithms based on deep learning architectures have shown a remarkable performance in recent years, typically outperforming model-based methods both in terms of image quality and run time. However, neural network strategies typically lack a theoretical explanation, which contrasts with the well-understood theory underlying model-based methods. In this work, we leverage the advant… ▽ More Blind inpainting algorithms based on deep learning architectures have shown a remarkable performance in recent years, typically outperforming model-based methods both in terms of image quality and run time. However, neural network strategies typically lack a theoretical explanation, which contrasts with the well-understood theory underlying model-based methods. In this work, we leverage the advantages of both approaches by integrating theoretically founded concepts from transform domain methods and sparse approximations into a CNN-based approach for blind image inpainting. To this end, we present a novel strategy to learn convolutional kernels that applies a specifically designed filter dictionary whose elements are linearly combined with trainable weights. Numerical experiments demonstrate the competitiveness of this approach. Our results show not only an improved inpainting quality compared to conventional CNNs but also significantly faster network convergence within a lightweight network design. △ Less

Submitted 13 May, 2022; originally announced May 2022.

arXiv:2203.13214 [pdf, other]

A Perturbation-Constrained Adversarial Attack for Evaluating the Robustness of Optical Flow

Authors: Jenny Schmalfuss, Philipp Scholze, Andrés Bruhn

Abstract: Recent optical flow methods are almost exclusively judged in terms of accuracy, while their robustness is often neglected. Although adversarial attacks offer a useful tool to perform such an analysis, current attacks on optical flow methods focus on real-world attacking scenarios rather than a worst case robustness assessment. Hence, in this work, we propose a novel adversarial attack - the Pertur… ▽ More Recent optical flow methods are almost exclusively judged in terms of accuracy, while their robustness is often neglected. Although adversarial attacks offer a useful tool to perform such an analysis, current attacks on optical flow methods focus on real-world attacking scenarios rather than a worst case robustness assessment. Hence, in this work, we propose a novel adversarial attack - the Perturbation-Constrained Flow Attack (PCFA) - that emphasizes destructivity over applicability as a real-world attack. PCFA is a global attack that optimizes adversarial perturbations to shift the predicted flow towards a specified target flow, while kee** the L2 norm of the perturbation below a chosen bound. Our experiments demonstrate PCFA's applicability in white- and black-box settings, and show it finds stronger adversarial samples than previous attacks. Based on these strong samples, we provide the first joint ranking of optical flow methods considering both prediction quality and adversarial robustness, which reveals state-of-the-art methods to be particularly vulnerable. Code is available at https://github.com/cv-stuttgart/PCFA. △ Less

Submitted 18 July, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: Accepted at the European Conference on Computer Vision (ECCV) 2022

arXiv:2001.06409 [pdf, other]

doi 10.1007/s41233-020-00037-y

Subjective Annotation for a Frame Interpolation Benchmark using Artefact Amplification

Authors: Hui Men, Vlad Hosu, Hanhe Lin, Andrés Bruhn, Dietmar Saupe

Abstract: Current benchmarks for optical flow algorithms evaluate the estimation either directly by comparing the predicted flow fields with the ground truth or indirectly by using the predicted flow fields for frame interpolation and then comparing the interpolated frames with the actual frames. In the latter case, objective quality measures such as the mean squared error are typically employed. However, i… ▽ More Current benchmarks for optical flow algorithms evaluate the estimation either directly by comparing the predicted flow fields with the ground truth or indirectly by using the predicted flow fields for frame interpolation and then comparing the interpolated frames with the actual frames. In the latter case, objective quality measures such as the mean squared error are typically employed. However, it is well known that for image quality assessment, the actual quality experienced by the user cannot be fully deduced from such simple measures. Hence, we conducted a subjective quality assessment crowdscouring study for the interpolated frames provided by one of the optical flow benchmarks, the Middlebury benchmark. We collected forced-choice paired comparisons between interpolated images and corresponding ground truth. To increase the sensitivity of observers when judging minute difference in paired comparisons we introduced a new method to the field of full-reference quality assessment, called artefact amplification. From the crowdsourcing data, we reconstructed absolute quality scale values according to Thurstone's model. As a result, we obtained a re-ranking of the 155 participating algorithms w.r.t. the visual quality of the interpolated frames. This re-ranking not only shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks, the results also provide the ground truth for designing novel image quality assessment (IQA) methods dedicated to perceptual quality of interpolated images. As a first step, we proposed such a new full-reference method, called WAE-IQA. By weighing the local differences between an interpolated image and its ground truth WAE-IQA performed slightly better than the currently best FR-IQA approach from the literature. △ Less

Submitted 28 April, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

Comments: arXiv admin note: text overlap with arXiv:1901.05362

arXiv:1901.05362 [pdf, other]

Technical Report on Visual Quality Assessment for Frame Interpolation

Authors: Hui Men, Hanhe Lin, Vlad Hosu, Daniel Maurer, Andres Bruhn, Dietmar Saupe

Abstract: Current benchmarks for optical flow algorithms evaluate the estimation quality by comparing their predicted flow field with the ground truth, and additionally may compare interpolated frames, based on these predictions, with the correct frames from the actual image sequences. For the latter comparisons, objective measures such as mean square errors are applied. However, for applications like image… ▽ More Current benchmarks for optical flow algorithms evaluate the estimation quality by comparing their predicted flow field with the ground truth, and additionally may compare interpolated frames, based on these predictions, with the correct frames from the actual image sequences. For the latter comparisons, objective measures such as mean square errors are applied. However, for applications like image interpolation, the expected user's quality of experience cannot be fully deduced from such simple quality measures. Therefore, we conducted a subjective quality assessment study by crowdsourcing for the interpolated images provided in one of the optical flow benchmarks, the Middlebury benchmark. We used paired comparisons with forced choice and reconstructed absolute quality scale values according to Thurstone's model using the classical least squares method. The results give rise to a re-ranking of 141 participating algorithms w.r.t. visual quality of interpolated frames mostly based on optical flow estimation. Our re-ranking result shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks. △ Less

Submitted 26 March, 2019; v1 submitted 16 January, 2019; originally announced January 2019.

Comments: 11 pages, 6 figures

arXiv:1806.00800 [pdf, other]

ProFlow: Learning to Predict Optical Flow

Authors: Daniel Maurer, Andrés Bruhn

Abstract: Temporal coherence is a valuable source of information in the context of optical flow estimation. However, finding a suitable motion model to leverage this information is a non-trivial task. In this paper we propose an unsupervised online learning approach based on a convolutional neural network (CNN) that estimates such a motion model individually for each frame. By relating forward and backward… ▽ More Temporal coherence is a valuable source of information in the context of optical flow estimation. However, finding a suitable motion model to leverage this information is a non-trivial task. In this paper we propose an unsupervised online learning approach based on a convolutional neural network (CNN) that estimates such a motion model individually for each frame. By relating forward and backward motion these learned models not only allow to infer valuable motion information based on the backward flow, they also help to improve the performance at occlusions, where a reliable prediction is particularly useful. Moreover, our learned models are spatially variant and hence allow to estimate non-rigid motion per construction. This, in turns, allows to overcome the major limitation of recent rigidity-based approaches that seek to improve the estimation by incorporating additional stereo/SfM constraints. Experiments demonstrate the usefulness of our new approach. They not only show a consistent improvement of up to 27% for all major benchmarks (KITTI 2012, KITTI 2015, MPI Sintel) compared to a baseline without prediction, they also show top results for the MPI Sintel benchmark -- the one of the three benchmarks that contains the largest amount of non-rigid motion. △ Less

Submitted 3 June, 2018; originally announced June 2018.

arXiv:1505.06163 [pdf, other]

Direct Variational Perspective Shape from Shading with Cartesian Depth Parametrisation

Authors: Yong Chul Ju, Daniel Maurer, Michael Breuß, Andrés Bruhn

Abstract: Most of today's state-of-the-art methods for perspective shape from shading are modelled in terms of partial differential equations (PDEs) of Hamilton-Jacobi type. To improve the robustness of such methods w.r.t. noise and missing data, first approaches have recently been proposed that seek to embed the underlying PDE into a variational framework with data and smoothness term. So far, however, suc… ▽ More Most of today's state-of-the-art methods for perspective shape from shading are modelled in terms of partial differential equations (PDEs) of Hamilton-Jacobi type. To improve the robustness of such methods w.r.t. noise and missing data, first approaches have recently been proposed that seek to embed the underlying PDE into a variational framework with data and smoothness term. So far, however, such methods either make use of a radial depth parametrisation that makes the regularisation hard to interpret from a geometrical viewpoint or they consider indirect smoothness terms that require additional consistency constraints to provide valid solutions. Moreover the minimisation of such frameworks is an intricate task, since the underlying energy is typically non-convex. In our paper we address all three of the aforementioned issues. First, we propose a novel variational model that operates directly on the Cartesian depth. In this context, we also point out a common mistake in the derivation of the surface normal. Moreover, we employ a direct second-order regulariser with edge-preservation property. This direct regulariser yields by construction valid solutions without requiring additional consistency constraints. Finally, we also propose a novel coarse-to-fine minimisation framework based on an alternating explicit scheme. This framework allows us to avoid local minima during the minimisation and thus to improve the accuracy of the reconstruction. Experiments show the good quality of our model as well as the usefulness of the proposed numerical scheme. △ Less

Submitted 13 July, 2015; v1 submitted 22 May, 2015; originally announced May 2015.

Showing 1–14 of 14 results for author: Bruhn, A