Search | arXiv e-print repository

FlowSDF: Flow Matching for Medical Image Segmentation Using Distance Transforms

Authors: Lea Bogensperger, Dominik Narnhofer, Alexander Falk, Konrad Schindler, Thomas Pock

Abstract: Medical image segmentation is a crucial task that relies on the ability to accurately identify and isolate regions of interest in medical images. Thereby, generative approaches allow to capture the statistical properties of segmentation masks that are dependent on the respective structures. In this work we propose FlowSDF, an image-guided conditional flow matching framework to represent the signed… ▽ More Medical image segmentation is a crucial task that relies on the ability to accurately identify and isolate regions of interest in medical images. Thereby, generative approaches allow to capture the statistical properties of segmentation masks that are dependent on the respective structures. In this work we propose FlowSDF, an image-guided conditional flow matching framework to represent the signed distance function (SDF) leading to an implicit distribution of segmentation masks. The advantage of leveraging the SDF is a more natural distortion when compared to that of binary masks. By learning a vector field that is directly related to the probability path of a conditional distribution of SDFs, we can accurately sample from the distribution of segmentation masks, allowing for the evaluation of statistical quantities. Thus, this probabilistic representation allows for the generation of uncertainty maps represented by the variance, which can aid in further analysis and enhance the predictive robustness. We qualitatively and quantitatively illustrate competitive performance of the proposed method on a public nuclei and gland segmentation data set, highlighting its utility in medical image segmentation applications. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2403.12710 [pdf, other]

Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition

Authors: Filip Ilic, He Zhao, Thomas Pock, Richard P. Wildes

Abstract: Concerns for the privacy of individuals captured in public imagery have led to privacy-preserving action recognition. Existing approaches often suffer from issues arising through obfuscation being applied globally and a lack of interpretability. Global obfuscation hides privacy sensitive regions, but also contextual regions important for action recognition. Lack of interpretability erodes trust in… ▽ More Concerns for the privacy of individuals captured in public imagery have led to privacy-preserving action recognition. Existing approaches often suffer from issues arising through obfuscation being applied globally and a lack of interpretability. Global obfuscation hides privacy sensitive regions, but also contextual regions important for action recognition. Lack of interpretability erodes trust in these new technologies. We highlight the limitations of current paradigms and propose a solution: Human selected privacy templates that yield interpretability by design, an obfuscation scheme that selectively hides attributes and also induces temporal consistency, which is important in action recognition. Our approach is architecture agnostic and directly modifies input imagery, while existing approaches generally require architecture training. Our approach offers more flexibility, as no retraining is required, and outperforms alternatives on three widely used datasets. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2311.08199 [pdf, other]

Diffusion-based generation of Histopathological Whole Slide Images at a Gigapixel scale

Authors: Robert Harb, Thomas Pock, Heimo Müller

Abstract: We present a novel diffusion-based approach to generate synthetic histopathological Whole Slide Images (WSIs) at an unprecedented gigapixel scale. Synthetic WSIs have many potential applications: They can augment training datasets to enhance the performance of many computational pathology applications. They allow the creation of synthesized copies of datasets that can be shared without violating p… ▽ More We present a novel diffusion-based approach to generate synthetic histopathological Whole Slide Images (WSIs) at an unprecedented gigapixel scale. Synthetic WSIs have many potential applications: They can augment training datasets to enhance the performance of many computational pathology applications. They allow the creation of synthesized copies of datasets that can be shared without violating privacy regulations. Or they can facilitate learning representations of WSIs without requiring data annotations. Despite this variety of applications, no existing deep-learning-based method generates WSIs at their typically high resolutions. Mainly due to the high computational complexity. Therefore, we propose a novel coarse-to-fine sampling scheme to tackle image generation of high-resolution WSIs. In this scheme, we increase the resolution of an initial low-resolution image to a high-resolution WSI. Particularly, a diffusion model sequentially adds fine details to images and increases their resolution. In our experiments, we train our method with WSIs from the TCGA-BRCA dataset. Additionally to quantitative evaluations, we also performed a user study with pathologists. The study results suggest that our generated WSIs resemble the structure of real WSIs. △ Less

Submitted 14 November, 2023; originally announced November 2023.

ACM Class: I.4.9; I.5.4; I.2.10

arXiv:2306.16854 [pdf, other]

On the Relationship Between RNN Hidden State Vectors and Semantic Ground Truth

Authors: Edi Muškardin, Martin Tappler, Ingo Pill, Bernhard K. Aichernig, Thomas Pock

Abstract: We examine the assumption that the hidden-state vectors of recurrent neural networks (RNNs) tend to form clusters of semantically similar vectors, which we dub the clustering hypothesis. While this hypothesis has been assumed in the analysis of RNNs in recent years, its validity has not been studied thoroughly on modern neural network architectures. We examine the clustering hypothesis in the cont… ▽ More We examine the assumption that the hidden-state vectors of recurrent neural networks (RNNs) tend to form clusters of semantically similar vectors, which we dub the clustering hypothesis. While this hypothesis has been assumed in the analysis of RNNs in recent years, its validity has not been studied thoroughly on modern neural network architectures. We examine the clustering hypothesis in the context of RNNs that were trained to recognize regular languages. This enables us to draw on perfect ground-truth automata in our evaluation, against which we can compare the RNN's accuracy and the distribution of the hidden-state vectors. We start with examining the (piecewise linear) separability of an RNN's hidden-state vectors into semantically different classes. We continue the analysis by computing clusters over the hidden-state vector space with multiple state-of-the-art unsupervised clustering approaches. We formally analyze the accuracy of computed clustering functions and the validity of the clustering hypothesis by determining whether clusters group semantically similar vectors to the same state in the ground-truth model. Our evaluation supports the validity of the clustering hypothesis in the majority of examined cases. We observed that the hidden-state vectors of well-trained RNNs are separable, and that the unsupervised clustering techniques succeed in finding clusters of similar state vectors. △ Less

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2305.15988 [pdf, other]

Non-Log-Concave and Nonsmooth Sampling via Langevin Monte Carlo Algorithms

Authors: Tim Tsz-Kit Lau, Han Liu, Thomas Pock

Abstract: We study the problem of approximate sampling from non-log-concave distributions, e.g., Gaussian mixtures, which is often challenging even in low dimensions due to their multimodality. We focus on performing this task via Markov chain Monte Carlo (MCMC) methods derived from discretizations of the overdamped Langevin diffusions, which are commonly known as Langevin Monte Carlo algorithms. Furthermor… ▽ More We study the problem of approximate sampling from non-log-concave distributions, e.g., Gaussian mixtures, which is often challenging even in low dimensions due to their multimodality. We focus on performing this task via Markov chain Monte Carlo (MCMC) methods derived from discretizations of the overdamped Langevin diffusions, which are commonly known as Langevin Monte Carlo algorithms. Furthermore, we are also interested in two nonsmooth cases for which a large class of proximal MCMC methods have been developed: (i) a nonsmooth prior is considered with a Gaussian mixture likelihood; (ii) a Laplacian mixture distribution. Such nonsmooth and non-log-concave sampling tasks arise from a wide range of applications to Bayesian inference and imaging inverse problems such as image deconvolution. We perform numerical simulations to compare the performance of most commonly used Langevin Monte Carlo algorithms. △ Less

Submitted 29 May, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2303.05966 [pdf, other]

Score-Based Generative Models for Medical Image Segmentation using Signed Distance Functions

Authors: Lea Bogensperger, Dominik Narnhofer, Filip Ilic, Thomas Pock

Abstract: Medical image segmentation is a crucial task that relies on the ability to accurately identify and isolate regions of interest in medical images. Thereby, generative approaches allow to capture the statistical properties of segmentation masks that are dependent on the respective structures. In this work we propose a conditional score-based generative modeling framework to represent the signed dist… ▽ More Medical image segmentation is a crucial task that relies on the ability to accurately identify and isolate regions of interest in medical images. Thereby, generative approaches allow to capture the statistical properties of segmentation masks that are dependent on the respective structures. In this work we propose a conditional score-based generative modeling framework to represent the signed distance function (SDF) leading to an implicit distribution of segmentation masks. The advantage of leveraging the SDF is a more natural distortion when compared to that of binary masks. By learning the score function of the conditional distribution of SDFs we can accurately sample from the distribution of segmentation masks, allowing for the evaluation of statistical quantities. Thus, this probabilistic representation allows for the generation of uncertainty maps represented by the variance, which can aid in further analysis and enhance the predictive robustness. We qualitatively and quantitatively illustrate competitive performance of the proposed method on a public nuclei and gland segmentation data set, highlighting its potential utility in medical image segmentation applications. △ Less

Submitted 21 July, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.10502 [pdf, other]

Learning Gradually Non-convex Image Priors Using Score Matching

Authors: Erich Kobler, Thomas Pock

Abstract: In this paper, we propose a unified framework of denoising score-based models in the context of graduated non-convex energy minimization. We show that for sufficiently large noise variance, the associated negative log density -- the energy -- becomes convex. Consequently, denoising score-based models essentially follow a graduated non-convexity heuristic. We apply this framework to learning genera… ▽ More In this paper, we propose a unified framework of denoising score-based models in the context of graduated non-convex energy minimization. We show that for sufficiently large noise variance, the associated negative log density -- the energy -- becomes convex. Consequently, denoising score-based models essentially follow a graduated non-convexity heuristic. We apply this framework to learning generalized Fields of Experts image priors that approximate the joint density of noisy images and their associated variances. These priors can be easily incorporated into existing optimization algorithms for solving inverse problems and naturally implement a fast and robust graduated non-convexity mechanism. △ Less

Submitted 21 February, 2023; originally announced February 2023.

Comments: 13 pages, 3 figures

ACM Class: I.2.6; I.4.10

arXiv:2302.08411 [pdf, other]

doi 10.1007/978-3-031-31975-4_1

Explicit Diffusion of Gaussian Mixture Model Based Image Priors

Authors: Martin Zach, Thomas Pock, Erich Kobler, Antonin Chambolle

Abstract: In this work we tackle the problem of estimating the density $f_X$ of a random variable $X$ by successive smoothing, such that the smoothed random variable $Y$ fulfills $(\partial_t - Δ_1)f_Y(\,\cdot\,, t) = 0$, $f_Y(\,\cdot\,, 0) = f_X$. With a focus on image processing, we propose a product/fields of experts model with Gaussian mixture experts that admits an analytic expression for… ▽ More In this work we tackle the problem of estimating the density $f_X$ of a random variable $X$ by successive smoothing, such that the smoothed random variable $Y$ fulfills $(\partial_t - Δ_1)f_Y(\,\cdot\,, t) = 0$, $f_Y(\,\cdot\,, 0) = f_X$. With a focus on image processing, we propose a product/fields of experts model with Gaussian mixture experts that admits an analytic expression for $f_Y (\,\cdot\,, t)$ under an orthogonality constraint on the filters. This construction naturally allows the model to be trained simultaneously over the entire diffusion horizon using empirical Bayes. We show preliminary results on image denoising where our model leads to competitive results while being tractable, interpretable, and having only a small number of learnable parameters. As a byproduct, our model can be used for reliable noise estimation, allowing blind denoising of images corrupted by heteroscedastic noise. △ Less

Submitted 16 February, 2023; originally announced February 2023.

Journal ref: Scale Space and Variational Methods in Computer Vision (2023). Lecture Notes in Computer Science, vol 14009. 3-15

arXiv:2212.12499 [pdf, other]

Posterior-Variance-Based Error Quantification for Inverse Problems in Imaging

Authors: Dominik Narnhofer, Andreas Habring, Martin Holler, Thomas Pock

Abstract: In this work, a method for obtaining pixel-wise error bounds in Bayesian regularization of inverse imaging problems is introduced. The proposed method employs estimates of the posterior variance together with techniques from conformal prediction in order to obtain coverage guarantees for the error bounds, without making any assumption on the underlying data distribution. It is generally applicable… ▽ More In this work, a method for obtaining pixel-wise error bounds in Bayesian regularization of inverse imaging problems is introduced. The proposed method employs estimates of the posterior variance together with techniques from conformal prediction in order to obtain coverage guarantees for the error bounds, without making any assumption on the underlying data distribution. It is generally applicable to Bayesian regularization approaches, independent, e.g., of the concrete choice of the prior. Furthermore, the coverage guarantees can also be obtained in case only approximate sampling from the posterior is possible. With this in particular, the proposed framework is able to incorporate any learned prior in a black-box manner. Guaranteed coverage without assumptions on the underlying distributions is only achievable since the magnitude of the error bounds is, in general, unknown in advance. Nevertheless, experiments with multiple regularization approaches presented in the paper confirm that in practice, the obtained error bounds are rather tight. For realizing the numerical experiments, also a novel primal-dual Langevin algorithm for sampling from non-smooth distributions is introduced in this work. △ Less

Submitted 23 December, 2022; originally announced December 2022.

MSC Class: 68U10; 62F15; 65C40; 65C60; 65J22

arXiv:2210.13834 [pdf, other]

doi 10.1109/TMI.2023.3311345

Stable Deep MRI Reconstruction using Generative Priors

Authors: Martin Zach, Florian Knoll, Thomas Pock

Abstract: Data-driven approaches recently achieved remarkable success in magnetic resonance imaging (MRI) reconstruction, but integration into clinical routine remains challenging due to a lack of generalizability and interpretability. In this paper, we address these challenges in a unified framework based on generative image priors. We propose a novel deep neural network based regularizer which is trained… ▽ More Data-driven approaches recently achieved remarkable success in magnetic resonance imaging (MRI) reconstruction, but integration into clinical routine remains challenging due to a lack of generalizability and interpretability. In this paper, we address these challenges in a unified framework based on generative image priors. We propose a novel deep neural network based regularizer which is trained in a generative setting on reference magnitude images only. After training, the regularizer encodes higher-level domain statistics which we demonstrate by synthesizing images without data. Embedding the trained model in a classical variational approach yields high-quality reconstructions irrespective of the sub-sampling pattern. In addition, the model shows stable behavior when confronted with out-of-distribution data in the form of contrast variation. Furthermore, a probabilistic interpretation provides a distribution of reconstructions and hence allows uncertainty quantification. To reconstruct parallel MRI, we propose a fast algorithm to jointly estimate the image and the sensitivity maps. The results demonstrate competitive performance, on par with state-of-the-art end-to-end deep learning methods, while preserving the flexibility with respect to sub-sampling patterns and allowing for uncertainty quantification. △ Less

Submitted 15 June, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

arXiv:2207.06261 [pdf, other]

Is Appearance Free Action Recognition Possible?

Authors: Filip Ilic, Thomas Pock, Richard P. Wildes

Abstract: Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absenc… ▽ More Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well as its related RGB video. Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complimentary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. Our results motivate a novel architecture that revives explicit recovery of optical flow, within a contemporary design for best performance on AFD and RGB. △ Less

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2203.12658 [pdf, other]

Computed Tomography Reconstruction using Generative Energy-Based Priors

Authors: Martin Zach, Erich Kobler, Thomas Pock

Abstract: In the past decades, Computed Tomography (CT) has established itself as one of the most important imaging techniques in medicine. Today, the applicability of CT is only limited by the deposited radiation dose, reduction of which manifests in noisy or incomplete measurements. Thus, the need for robust reconstruction algorithms arises. In this work, we learn a parametric regularizer with a global re… ▽ More In the past decades, Computed Tomography (CT) has established itself as one of the most important imaging techniques in medicine. Today, the applicability of CT is only limited by the deposited radiation dose, reduction of which manifests in noisy or incomplete measurements. Thus, the need for robust reconstruction algorithms arises. In this work, we learn a parametric regularizer with a global receptive field by maximizing it's likelihood on reference CT data. Due to this unsupervised learning strategy, our trained regularizer truly represents higher-level domain statistics, which we empirically demonstrate by synthesizing CT images. Moreover, this regularizer can easily be applied to different CT reconstruction problems by embedding it in a variational framework, which increases flexibility and interpretability compared to feed-forward learning-based approaches. In addition, the accompanying probabilistic perspective enables experts to explore the full posterior distribution and may quantify uncertainty of the reconstruction approach. We apply the regularizer to limited-angle and few-view CT reconstruction problems, where it outperforms traditional reconstruction algorithms by a large margin. △ Less

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2102.10863 [pdf, other]

Learning atrial fiber orientations and conductivity tensors from intracardiac maps using physics-informed neural networks

Authors: Thomas Grandits, Simone Pezzuto, Francisco Sahli Costabal, Paris Perdikaris, Thomas Pock, Gernot Plank, Rolf Krause

Abstract: Electroanatomical maps are a key tool in the diagnosis and treatment of atrial fibrillation. Current approaches focus on the activation times recorded. However, more information can be extracted from the available data. The fibers in cardiac tissue conduct the electrical wave faster, and their direction could be inferred from activation times. In this work, we employ a recently developed approach,… ▽ More Electroanatomical maps are a key tool in the diagnosis and treatment of atrial fibrillation. Current approaches focus on the activation times recorded. However, more information can be extracted from the available data. The fibers in cardiac tissue conduct the electrical wave faster, and their direction could be inferred from activation times. In this work, we employ a recently developed approach, called physics informed neural networks, to learn the fiber orientations from electroanatomical maps, taking into account the physics of the electrical wave propagation. In particular, we train the neural network to weakly satisfy the anisotropic eikonal equation and to predict the measured activation times. We use a local basis for the anisotropic conductivity tensor, which encodes the fiber orientation. The methodology is tested both in a synthetic example and for patient data. Our approach shows good agreement in both cases, with an RMSE of 2.2ms on the in-silico data and outperforming a state of the art method on the patient data. The results show a first step towards learning the fiber orientations from electroanatomical maps with physics-informed neural networks. △ Less

Submitted 6 May, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: 10 pages, 3 figures

arXiv:2102.06665 [pdf, other]

Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction

Authors: Dominik Narnhofer, Alexander Effland, Erich Kobler, Kerstin Hammernik, Florian Knoll, Thomas Pock

Abstract: Recent deep learning approaches focus on improving quantitative scores of dedicated benchmarks, and therefore only reduce the observation-related (aleatoric) uncertainty. However, the model-immanent (epistemic) uncertainty is less frequently systematically analyzed. In this work, we introduce a Bayesian variational framework to quantify the epistemic uncertainty. To this end, we solve the linear i… ▽ More Recent deep learning approaches focus on improving quantitative scores of dedicated benchmarks, and therefore only reduce the observation-related (aleatoric) uncertainty. However, the model-immanent (epistemic) uncertainty is less frequently systematically analyzed. In this work, we introduce a Bayesian variational framework to quantify the epistemic uncertainty. To this end, we solve the linear inverse problem of undersampled MRI reconstruction in a variational setting. The associated energy functional is composed of a data fidelity term and the total deep variation (TDV) as a learned parametric regularizer. To estimate the epistemic uncertainty we draw the parameters of the TDV regularizer from a multivariate Gaussian distribution, whose mean and covariance matrix are learned in a stochastic optimal control problem. In several numerical experiments, we demonstrate that our approach yields competitive results for undersampled MRI reconstruction. Moreover, we can accurately quantify the pixelwise epistemic uncertainty, which can serve radiologists as an additional resource to visualize reconstruction reliability. △ Less

Submitted 22 October, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: 19 pages, 11 figures

MSC Class: 68T45; 65K10; 65D19

arXiv:2011.06539 [pdf, other]

Shared Prior Learning of Energy-Based Models for Image Reconstruction

Authors: Thomas Pinetz, Erich Kobler, Thomas Pock, Alexander Effland

Abstract: We propose a novel learning-based framework for image reconstruction particularly designed for training without ground truth data, which has three major building blocks: energy-based learning, a patch-based Wasserstein loss functional, and shared prior learning. In energy-based learning, the parameters of an energy functional composed of a learned data fidelity term and a data-driven regularizer a… ▽ More We propose a novel learning-based framework for image reconstruction particularly designed for training without ground truth data, which has three major building blocks: energy-based learning, a patch-based Wasserstein loss functional, and shared prior learning. In energy-based learning, the parameters of an energy functional composed of a learned data fidelity term and a data-driven regularizer are computed in a mean-field optimal control problem. In the absence of ground truth data, we change the loss functional to a patch-based Wasserstein functional, in which local statistics of the output images are compared to uncorrupted reference patches. Finally, in shared prior learning, both aforementioned optimal control problems are optimized simultaneously with shared learned parameters of the regularizer to further enhance unsupervised image reconstruction. We derive several time discretization schemes of the gradient flow and verify their consistency in terms of Mosco convergence. In numerous numerical experiments, we demonstrate that the proposed method generates state-of-the-art results for various image reconstruction applications--even if no ground truth images are available for training. △ Less

Submitted 13 November, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: 37 pages, 19 figures

MSC Class: 49J15; 65C30; 65K10; 65L09; 68U10

arXiv:2010.12436 [pdf, other]

BP-MVSNet: Belief-Propagation-Layers for Multi-View-Stereo

Authors: Christian Sormann, Patrick Knöbelreiter, Andreas Kuhn, Mattia Rossi, Thomas Pock, Friedrich Fraundorfer

Abstract: In this work, we propose BP-MVSNet, a convolutional neural network (CNN)-based Multi-View-Stereo (MVS) method that uses a differentiable Conditional Random Field (CRF) layer for regularization. To this end, we propose to extend the BP layer and add what is necessary to successfully use it in the MVS setting. We therefore show how we can calculate a normalization based on the expected 3D error, whi… ▽ More In this work, we propose BP-MVSNet, a convolutional neural network (CNN)-based Multi-View-Stereo (MVS) method that uses a differentiable Conditional Random Field (CRF) layer for regularization. To this end, we propose to extend the BP layer and add what is necessary to successfully use it in the MVS setting. We therefore show how we can calculate a normalization based on the expected 3D error, which we can then use to normalize the label jumps in the CRF. This is required to make the BP layer invariant to different scales in the MVS setting. In order to also enable fractional label jumps, we propose a differentiable interpolation step, which we embed into the computation of the pairwise term. These extensions allow us to integrate the BP layer into a multi-scale MVS network, where we continuously improve a rough initial estimate until we get high quality depth maps as a result. We evaluate the proposed BP-MVSNet in an ablation study and conduct extensive experiments on the DTU, Tanks and Temples and ETH3D data sets. The experiments show that we can significantly outperform the baseline and achieve state-of-the-art results. △ Less

Submitted 23 October, 2020; originally announced October 2020.

Comments: accepted at 3DV 2020

arXiv:2006.08789 [pdf, other]

Total Deep Variation: A Stable Regularizer for Inverse Problems

Authors: Erich Kobler, Alexander Effland, Karl Kunisch, Thomas Pock

Abstract: Various problems in computer vision and medical imaging can be cast as inverse problems. A frequent method for solving inverse problems is the variational approach, which amounts to minimizing an energy composed of a data fidelity term and a regularizer. Classically, handcrafted regularizers are used, which are commonly outperformed by state-of-the-art deep learning approaches. In this work, we co… ▽ More Various problems in computer vision and medical imaging can be cast as inverse problems. A frequent method for solving inverse problems is the variational approach, which amounts to minimizing an energy composed of a data fidelity term and a regularizer. Classically, handcrafted regularizers are used, which are commonly outperformed by state-of-the-art deep learning approaches. In this work, we combine the variational formulation of inverse problems with deep learning by introducing the data-driven general-purpose total deep variation regularizer. In its core, a convolutional neural network extracts local features on multiple scales and in successive blocks. This combination allows for a rigorous mathematical analysis including an optimal control formulation of the training problem in a mean-field setting and a stability analysis with respect to the initial values and the parameters of the regularizer. In addition, we experimentally verify the robustness against adversarial attacks and numerically derive upper bounds for the generalization error. Finally, we achieve state-of-the-art results for numerous imaging tasks. △ Less

Submitted 15 June, 2020; originally announced June 2020.

Comments: 30 pages, 12 figures. arXiv admin note: text overlap with arXiv:2001.05005

arXiv:2003.06258 [pdf, other]

Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems

Authors: Patrick Knöbelreiter, Christian Sormann, Alexander Shekhovtsov, Friedrich Fraundorfer, Thomas Pock

Abstract: It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest infer… ▽ More It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation. △ Less

Submitted 13 March, 2020; originally announced March 2020.

Comments: CVPR 2020

arXiv:2001.05005 [pdf, other]

Total Deep Variation for Linear Inverse Problems

Authors: Erich Kobler, Alexander Effland, Karl Kunisch, Thomas Pock

Abstract: Diverse inverse problems in imaging can be cast as variational problems composed of a task-specific data fidelity term and a regularization term. In this paper, we propose a novel learnable general-purpose regularizer exploiting recent architectural design patterns from deep learning. We cast the learning problem as a discrete sampled optimal control problem, for which we derive the adjoint state… ▽ More Diverse inverse problems in imaging can be cast as variational problems composed of a task-specific data fidelity term and a regularization term. In this paper, we propose a novel learnable general-purpose regularizer exploiting recent architectural design patterns from deep learning. We cast the learning problem as a discrete sampled optimal control problem, for which we derive the adjoint state equations and an optimality condition. By exploiting the variational structure of our approach, we perform a sensitivity analysis with respect to the learned parameters obtained from different training datasets. Moreover, we carry out a nonlinear eigenfunction analysis, which reveals interesting properties of the learned regularizer. We show state-of-the-art performance for classical image restoration and medical image reconstruction problems. △ Less

Submitted 17 February, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

Comments: 21 pages, 10 figures

MSC Class: 68T45; 93A30; 34H05; 49K15; 65L05

arXiv:1912.10739 [pdf, other]

Improving Optical Flow on a Pyramid Level

Authors: Markus Hofinger, Samuel Rota Bulò, Lorenzo Porzi, Arno Knapitsch, Thomas Pock, Peter Kontschieder

Abstract: In this work we review the coarse-to-fine spatial feature pyramid concept, which is used in state-of-the-art optical flow estimation networks to make exploration of the pixel flow search space computationally tractable and efficient. Within an individual pyramid level, we improve the cost volume construction process by departing from a war**- to a sampling-based strategy, which avoids ghosting a… ▽ More In this work we review the coarse-to-fine spatial feature pyramid concept, which is used in state-of-the-art optical flow estimation networks to make exploration of the pixel flow search space computationally tractable and efficient. Within an individual pyramid level, we improve the cost volume construction process by departing from a war**- to a sampling-based strategy, which avoids ghosting and hence enables us to better preserve fine flow details. We further amplify the positive effects through a level-specific, loss max-pooling strategy that adaptively shifts the focus of the learning process on under-performing predictions. Our second contribution revises the gradient flow across pyramid levels. The typical operations performed at each pyramid level can lead to noisy, or even contradicting gradients across levels. We show and discuss how properly blocking some of these gradient components leads to improved convergence and ultimately better performance. Finally, we introduce a distillation concept to counteract the issue of catastrophic forgetting and thus preserving knowledge over models sequentially trained on multiple datasets. Our findings are conceptually simple and easy to implement, yet result in compelling improvements on relevant error measures that we demonstrate via exhaustive ablations on datasets like Flying Chairs2, Flying Things, Sintel and KITTI. We establish new state-of-the-art results on the challenging Sintel and KITTI 2012 test datasets, and even show the portability of our findings to different optical flow and depth from stereo approaches. △ Less

Submitted 18 July, 2020; v1 submitted 23 December, 2019; originally announced December 2019.

arXiv:1910.00888 [pdf, other]

On the estimation of the Wasserstein distance in generative models

Authors: Thomas Pinetz, Daniel Soukup, Thomas Pock

Abstract: Generative Adversarial Networks (GANs) have been used to model the underlying probability distribution of sample based datasets. GANs are notoriuos for training difficulties and their dependence on arbitrary hyperparameters. One recent improvement in GAN literature is to use the Wasserstein distance as loss function leading to Wasserstein Generative Adversarial Networks (WGANs). Using this as a ba… ▽ More Generative Adversarial Networks (GANs) have been used to model the underlying probability distribution of sample based datasets. GANs are notoriuos for training difficulties and their dependence on arbitrary hyperparameters. One recent improvement in GAN literature is to use the Wasserstein distance as loss function leading to Wasserstein Generative Adversarial Networks (WGANs). Using this as a basis, we show various ways in which the Wasserstein distance is estimated for the task of generative modelling. Additionally, the secrets in training such models are shown and summarized at the end of this work. Where applicable, we extend current works to different algorithms, different cost functions, and different regularization schemes to improve generative models. △ Less

Submitted 2 October, 2019; originally announced October 2019.

Comments: Accepted and presented at GCPR 2019 (http://gcpr2019.tu-dortmund.de/)

arXiv:1907.13391 [pdf, other]

Learned Collaborative Stereo Refinement

Authors: Patrick Knöbelreiter, Thomas Pock

Abstract: In this work, we propose a learning-based method to denoise and refine disparity maps of a given stereo method. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method allows to learn a robust collaborative regularizer leveraging the joint… ▽ More In this work, we propose a learning-based method to denoise and refine disparity maps of a given stereo method. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method allows to learn a robust collaborative regularizer leveraging the joint statistics of the color image, the confidence map and the disparity map. Due to the variational structure of our method, the individual steps can be easily visualized, thus enabling interpretability of the method. We can therefore provide interesting insights into how our method refines and denoises disparity maps. The efficiency of our method is demonstrated by the publicly available stereo benchmarks Middlebury 2014 and Kitti 2015. △ Less

Submitted 31 July, 2019; originally announced July 2019.

Comments: @German Conference on Pattern Recognition 2019

arXiv:1907.12446 [pdf, other]

Self-Supervised Learning for Stereo Reconstruction on Aerial Images

Authors: Patrick Knöbelreiter, Christoph Vogel, Thomas Pock

Abstract: Recent developments established deep learning as an inevitable tool to boost the performance of dense matching and stereo estimation. On the downside, learning these networks requires a substantial amount of training data to be successful. Consequently, the application of these models outside of the laboratory is far from straight forward. In this work we propose a self-supervised training procedu… ▽ More Recent developments established deep learning as an inevitable tool to boost the performance of dense matching and stereo estimation. On the downside, learning these networks requires a substantial amount of training data to be successful. Consequently, the application of these models outside of the laboratory is far from straight forward. In this work we propose a self-supervised training procedure that allows us to adapt our network to the specific (imaging) characteristics of the dataset at hand, without the requirement of external ground truth data. We instead generate interim training data by running our intermediate network on the whole dataset, followed by conservative outlier filtering. Bootstrapped from a pre-trained version of our hybrid CNN-CRF model, we alternate the generation of training data and network training. With this simple concept we are able to lift the completeness and accuracy of the pre-trained version significantly. We also show that our final model compares favorably to other popular stereo estimation algorithms on an aerial dataset. △ Less

Submitted 29 July, 2019; originally announced July 2019.

Comments: Symposium Prize Paper Award @IGARSS 2018

arXiv:1907.08488 [pdf, other]

An Optimal Control Approach to Early Stop** Variational Methods for Image Restoration

Authors: Alexander Effland, Erich Kobler, Karl Kunisch, Thomas Pock

Abstract: We investigate a well-known phenomenon of variational approaches in image processing, where typically the best image quality is achieved when the gradient flow process is stopped before converging to a stationary point. This paradox originates from a tradeoff between optimization and modelling errors of the underlying variational model and holds true even if deep learning methods are used to learn… ▽ More We investigate a well-known phenomenon of variational approaches in image processing, where typically the best image quality is achieved when the gradient flow process is stopped before converging to a stationary point. This paradox originates from a tradeoff between optimization and modelling errors of the underlying variational model and holds true even if deep learning methods are used to learn highly expressive regularizers from data. In this paper, we take advantage of this paradox and introduce an optimal stop** time into the gradient flow process, which in turn is learned from data by means of an optimal control approach. As a result, we obtain highly efficient numerical schemes that achieve competitive results for image denoising and image deblurring. A nonlinear spectral analysis of the gradient of the learned regularizer gives enlightening insights about the different regularization properties. △ Less

Submitted 19 July, 2019; originally announced July 2019.

Comments: 14 figures

MSC Class: 68T45; 93A30; 34H05; 49K15; 65L05

arXiv:1905.11327 [pdf, ps, other]

Fast Decomposable Submodular Function Minimization using Constrained Total Variation

Authors: K S Sesh Kumar, Francis Bach, Thomas Pock

Abstract: We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function. Most existing approaches reformulate the problem as the convex minimization of the sum of the corresponding Lovász extensions and the squared Euclidean norm, leading to algorithms requiring total variation oracles of the summand functions; without further assumptions, t… ▽ More We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function. Most existing approaches reformulate the problem as the convex minimization of the sum of the corresponding Lovász extensions and the squared Euclidean norm, leading to algorithms requiring total variation oracles of the summand functions; without further assumptions, these more complex oracles require many calls to the simpler minimization oracles often available in practice. In this paper, we consider a modified convex problem requiring constrained version of the total variation oracles that can be solved with significantly fewer calls to the simple minimization oracles. We support our claims by showing results on graph cuts for 2D and 3D graphs △ Less

Submitted 27 May, 2019; originally announced May 2019.

arXiv:1904.03537 [pdf, other]

Convex-Concave Backtracking for Inertial Bregman Proximal Gradient Algorithms in Non-Convex Optimization

Authors: Mahesh Chandra Mukkamala, Peter Ochs, Thomas Pock, Shoham Sabach

Abstract: Backtracking line-search is an old yet powerful strategy for finding a better step sizes to be used in proximal gradient algorithms. The main principle is to locally find a simple convex upper bound of the objective function, which in turn controls the step size that is used. In case of inertial proximal gradient algorithms, the situation becomes much more difficult and usually leads to very restr… ▽ More Backtracking line-search is an old yet powerful strategy for finding a better step sizes to be used in proximal gradient algorithms. The main principle is to locally find a simple convex upper bound of the objective function, which in turn controls the step size that is used. In case of inertial proximal gradient algorithms, the situation becomes much more difficult and usually leads to very restrictive rules on the extrapolation parameter. In this paper, we show that the extrapolation parameter can be controlled by locally finding also a simple concave lower bound of the objective function. This gives rise to a double convex-concave backtracking procedure which allows for an adaptive choice of both the step size and extrapolation parameters. We apply this procedure to the class of inertial Bregman proximal gradient methods, and prove that any sequence generated by these algorithms converges globally to a critical point of the function at hand. Numerical experiments on a number of challenging non-convex problems in image processing and machine learning were conducted and show the power of combining inertial step and double backtracking strategy in achieving improved performances. △ Less

Submitted 5 November, 2019; v1 submitted 6 April, 2019; originally announced April 2019.

Comments: 29 pages

MSC Class: 90C25; 26B25; 49M27; 52A41; 65K05

arXiv:1904.01112 [pdf, other]

Deep Learning Methods for Parallel Magnetic Resonance Image Reconstruction

Authors: Florian Knoll, Kerstin Hammernik, Chi Zhang, Steen Moeller, Thomas Pock, Daniel K. Sodickson, Mehmet Akcakaya

Abstract: Following the success of deep learning in a wide range of applications, neural network-based machine learning techniques have received interest as a means of accelerating magnetic resonance imaging (MRI). A number of ideas inspired by deep learning techniques from computer vision and image processing have been successfully applied to non-linear image reconstruction in the spirit of compressed sens… ▽ More Following the success of deep learning in a wide range of applications, neural network-based machine learning techniques have received interest as a means of accelerating magnetic resonance imaging (MRI). A number of ideas inspired by deep learning techniques from computer vision and image processing have been successfully applied to non-linear image reconstruction in the spirit of compressed sensing for both low dose computed tomography and accelerated MRI. The additional integration of multi-coil information to recover missing k-space lines in the MRI reconstruction process, is still studied less frequently, even though it is the de-facto standard for currently used accelerated MR acquisitions. This manuscript provides an overview of the recent machine learning approaches that have been proposed specifically for improving parallel imaging. A general background introduction to parallel MRI is given that is structured around the classical view of image space and k-space based methods. Both linear and non-linear methods are covered, followed by a discussion of recent efforts to further improve parallel imaging using machine learning, and specifically using artificial neural networks. Image-domain based techniques that introduce improved regularizers are covered as well as k-space based methods, where the focus is on better interpolation strategies using neural networks. Issues and open problems are discussed as well as recent efforts for producing open datasets and benchmarks for the community. △ Less

Submitted 1 April, 2019; originally announced April 2019.

Comments: 14 pages, 7 figures

arXiv:1811.03721 [pdf, other]

Learning Energy Based Inpainting for Optical Flow

Authors: Christoph Vogel, Patrick Knöbelreiter, Thomas Pock

Abstract: Modern optical flow methods are often composed of a cascade of many independent steps or formulated as a black box neural network that is hard to interpret and analyze. In this work we seek for a plain, interpretable, but learnable solution. We propose a novel inpainting based algorithm that approaches the problem in three steps: feature selection and matching, selection of supporting points and e… ▽ More Modern optical flow methods are often composed of a cascade of many independent steps or formulated as a black box neural network that is hard to interpret and analyze. In this work we seek for a plain, interpretable, but learnable solution. We propose a novel inpainting based algorithm that approaches the problem in three steps: feature selection and matching, selection of supporting points and energy based inpainting. To facilitate the inference we propose an optimization layer that allows to backpropagate through 10K iterations of a first-order method without any numerical or memory problems. Compared to recent state-of-the-art networks, our modular CNN is very lightweight and competitive with other, more involved, inpainting based methods. △ Less

Submitted 8 November, 2018; originally announced November 2018.

Journal ref: Proc. Asian Conf. on Computer Vision (ACCV), 2018

arXiv:1804.03037 [pdf, other]

doi 10.1007/s11263-019-01261-6

3D Fluid Flow Estimation with Integrated Particle Reconstruction

Authors: Katrin Lasinger, Christoph Vogel, Thomas Pock, Konrad Schindler

Abstract: The standard approach to densely reconstruct the motion in a volume of fluid is to inject high-contrast tracer particles and record their motion with multiple high-speed cameras. Almost all existing work processes the acquired multi-view video in two separate steps, utilizing either a pure Eulerian or pure Lagrangian approach. Eulerian methods perform a voxel-based reconstruction of particles per… ▽ More The standard approach to densely reconstruct the motion in a volume of fluid is to inject high-contrast tracer particles and record their motion with multiple high-speed cameras. Almost all existing work processes the acquired multi-view video in two separate steps, utilizing either a pure Eulerian or pure Lagrangian approach. Eulerian methods perform a voxel-based reconstruction of particles per time step, followed by 3D motion estimation, with some form of dense matching between the precomputed voxel grids from different time steps. In this sequential procedure, the first step cannot use temporal consistency considerations to support the reconstruction, while the second step has no access to the original, high-resolution image data. Alternatively, Lagrangian methods reconstruct an explicit, sparse set of particles and track the individual particles over time. Physical constraints can only be incorporated in a post-processing step when interpolating the particle tracks to a dense motion field. We show, for the first time, how to jointly reconstruct both the individual tracer particles and a dense 3D fluid motion field from the image data, using an integrated energy minimization. Our hybrid Lagrangian/Eulerian model reconstructs individual particles, and at the same time recovers a dense 3D motion field in the entire domain. Making particles explicit greatly reduces the memory consumption and allows one to use the high-res input images for matching. Whereas the dense motion field makes it possible to include physical a-priori constraints and account for the incompressibility and viscosity of the fluid. The method exhibits greatly (~70%) improved results over our recently published baseline with two separate steps for 3D reconstruction and motion estimation. Our results with only two time steps are comparable to those of sota tracking-based methods that require much longer sequences. △ Less

Submitted 21 November, 2019; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: To appear in International Journal of Computer Vision (IJCV)

arXiv:1804.02872 [pdf, other]

doi 10.1088/1361-6501/aab5a0

Variational 3D-PIV with Sparse Descriptors

Authors: Katrin Lasinger, Christoph Vogel, Thomas Pock, Konrad Schindler

Abstract: 3D Particle Imaging Velocimetry (3D-PIV) aim to recover the flow field in a volume of fluid, which has been seeded with tracer particles and observed from multiple camera viewpoints. The first step of 3D-PIV is to reconstruct the 3D locations of the tracer particles from synchronous views of the volume. We propose a new method for iterative particle reconstruction (IPR), in which the locations and… ▽ More 3D Particle Imaging Velocimetry (3D-PIV) aim to recover the flow field in a volume of fluid, which has been seeded with tracer particles and observed from multiple camera viewpoints. The first step of 3D-PIV is to reconstruct the 3D locations of the tracer particles from synchronous views of the volume. We propose a new method for iterative particle reconstruction (IPR), in which the locations and intensities of all particles are inferred in one joint energy minimization. The energy function is designed to penalize deviations between the reconstructed 3D particles and the image evidence, while at the same time aiming for a sparse set of particles. We find that the new method, without any post-processing, achieves significantly cleaner particle volumes than a conventional, tomographic MART reconstruction, and can handle a wide range of particle densities. The second step of 3D-PIV is to then recover the dense motion field from two consecutive particle reconstructions. We propose a variational model, which makes it possible to directly include physical properties, such as incompressibility and viscosity, in the estimation of the motion field. To further exploit the sparse nature of the input data, we propose a novel, compact descriptor of the local particle layout. Hence, we avoid the memory-intensive storage of high-resolution intensity volumes. Our framework is generic and allows for a variety of different data costs (correlation measures) and regularizers. We quantitatively evaluate it with both the sum of squared differences (SSD) and the normalized cross-correlation (NCC), respectively with both a hard and a soft version of the incompressibility constraint. △ Less

Submitted 9 April, 2018; originally announced April 2018.

Comments: to be published in Measurement Science and Technology

arXiv:1802.04546 [pdf, other]

Robust Deformation Estimation in Wood-Composite Materials using Variational Optical Flow

Authors: Markus Hofinger, Thomas Pock, Thomas Moosbrugger

Abstract: Wood-composite materials are widely used today as they homogenize humidity related directional deformations. Quantification of these deformations as coefficients is important for construction and engineering and topic of current research but still a manual process. This work introduces a novel computer vision approach that automatically extracts these properties directly from scans of the wooden… ▽ More Wood-composite materials are widely used today as they homogenize humidity related directional deformations. Quantification of these deformations as coefficients is important for construction and engineering and topic of current research but still a manual process. This work introduces a novel computer vision approach that automatically extracts these properties directly from scans of the wooden specimens, taken at different humidity levels during the long lasting humidity conditioning process. These scans are used to compute a humidity dependent deformation field for each pixel, from which the desired coefficients can easily be calculated. The overall method includes automated registration of the wooden blocks, numerical optimization to compute a variational optical flow field which is further used to calculate dense strain fields and finally the engineering coefficients and their variance throughout the wooden blocks. The methods regularization is fully parameterizable which allows to model and suppress artifacts due to surface appearance changes of the specimens from mold, cracks, etc. that typically arise in the conditioning process. △ Less

Submitted 13 February, 2018; originally announced February 2018.

Comments: 8 pages, 8 figures, originally published in 23 rd Computer Vision Winter Workshop proceedings 2018 http://cmp.felk.cvut.cz/cvww2018/papers/28.pdf

Journal ref: 23rd Computer Vision Winter Workshop proceedings February 2018 page 97-104

arXiv:1710.01749 [pdf, other]

Semantic 3D Reconstruction with Finite Element Bases

Authors: Audrey Richard, Christoph Vogel, Maros Blaha, Thomas Pock, Konrad Schindler

Abstract: We propose a novel framework for the discretisation of multi-label problems on arbitrary, continuous domains. Our work bridges the gap between general FEM discretisations, and labeling problems that arise in a variety of computer vision tasks, including for instance those derived from the generalised Potts model. Starting from the popular formulation of labeling as a convex relaxation by functiona… ▽ More We propose a novel framework for the discretisation of multi-label problems on arbitrary, continuous domains. Our work bridges the gap between general FEM discretisations, and labeling problems that arise in a variety of computer vision tasks, including for instance those derived from the generalised Potts model. Starting from the popular formulation of labeling as a convex relaxation by functional lifting, we show that FEM discretisation is valid for the most general case, where the regulariser is anisotropic and non-metric. While our findings are generic and applicable to different vision problems, we demonstrate their practical implementation in the context of semantic 3D reconstruction, where such regularisers have proved particularly beneficial. The proposed FEM approach leads to a smaller memory footprint as well as faster computation, and it constitutes a very simple way to enable variable, adaptive resolution within the same model. △ Less

Submitted 4 October, 2017; originally announced October 2017.

Journal ref: BMVC 2017, 28th British Machine Vision Conference

arXiv:1707.06427 [pdf, other]

Scalable Full Flow with Learned Binary Descriptors

Authors: Gottfried Munda, Alexander Shekhovtsov, Patrick Knöbelreiter, Thomas Pock

Abstract: We propose a method for large displacement optical flow in which local matching costs are learned by a convolutional neural network (CNN) and a smoothness prior is imposed by a conditional random field (CRF). We tackle the computation- and memory-intensive operations on the 4D cost volume by a min-projection which reduces memory complexity from quadratic to linear and binary descriptors for effici… ▽ More We propose a method for large displacement optical flow in which local matching costs are learned by a convolutional neural network (CNN) and a smoothness prior is imposed by a conditional random field (CRF). We tackle the computation- and memory-intensive operations on the 4D cost volume by a min-projection which reduces memory complexity from quadratic to linear and binary descriptors for efficient matching. This enables evaluation of the cost on the fly and allows to perform learning and CRF inference on high resolution images without ever storing the 4D cost volume. To address the problem of learning binary descriptors we propose a new hybrid learning scheme. In contrast to current state of the art approaches for learning binary CNNs we can compute the exact non-zero gradient within our model. We compare several methods for training binary descriptors and show results on public available benchmarks. △ Less

Submitted 20 July, 2017; originally announced July 2017.

Comments: GCPR 2017

arXiv:1704.00447 [pdf, other]

Learning a Variational Network for Reconstruction of Accelerated MRI Data

Authors: Kerstin Hammernik, Teresa Klatzer, Erich Kobler, Michael P Recht, Daniel K Sodickson, Thomas Pock, Florian Knoll

Abstract: Purpose: To allow fast and high-quality reconstruction of clinical accelerated multi-coil MR data by learning a variational network that combines the mathematical structure of variational models with deep learning. Theory and Methods: Generalized compressed sensing reconstruction formulated as a variational model is embedded in an unrolled gradient descent scheme. All parameters of this formulat… ▽ More Purpose: To allow fast and high-quality reconstruction of clinical accelerated multi-coil MR data by learning a variational network that combines the mathematical structure of variational models with deep learning. Theory and Methods: Generalized compressed sensing reconstruction formulated as a variational model is embedded in an unrolled gradient descent scheme. All parameters of this formulation, including the prior model defined by filter kernels and activation functions as well as the data term weights, are learned during an offline training procedure. The learned model can then be applied online to previously unseen data. Results: The variational network approach is evaluated on a clinical knee imaging protocol. The variational network reconstructions outperform standard reconstruction algorithms in terms of image quality and residual artifacts for all tested acceleration factors and sampling patterns. Conclusion: Variational network reconstructions preserve the natural appearance of MR images as well as pathologies that were not included in the training data set. Due to its high computational performance, i.e., reconstruction time of 193 ms on a single graphics card, and the omission of parameter tuning once the network is trained, this new approach to image reconstruction can easily be integrated into clinical workflow. △ Less

Submitted 3 April, 2017; originally announced April 2017.

Comments: Submitted to Magnetic Resonance in Medicine

arXiv:1703.05161 [pdf, other]

Real-Time Panoramic Tracking for Event Cameras

Authors: Christian Reinbacher, Gottfried Munda, Thomas Pock

Abstract: Event cameras are a paradigm shift in camera technology. Instead of full frames, the sensor captures a sparse set of events caused by intensity changes. Since only the changes are transferred, those cameras are able to capture quick movements of objects in the scene or of the camera itself. In this work we propose a novel method to perform camera tracking of event cameras in a panoramic setting wi… ▽ More Event cameras are a paradigm shift in camera technology. Instead of full frames, the sensor captures a sparse set of events caused by intensity changes. Since only the changes are transferred, those cameras are able to capture quick movements of objects in the scene or of the camera itself. In this work we propose a novel method to perform camera tracking of event cameras in a panoramic setting with three degrees of freedom. We propose a direct camera tracking formulation, similar to state-of-the-art in visual odometry. We show that the minimal information needed for simultaneous tracking and map** is the spatial position of events, without using the appearance of the imaged scene point. We verify the robustness to fast camera movements and dynamic objects in the scene on a recently proposed dataset and self-recorded sequences. △ Less

Submitted 21 March, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

Comments: Accepted to International Conference on Computational Photography 2017

arXiv:1611.10229 [pdf, other]

End-to-End Training of Hybrid CNN-CRF Models for Stereo

Authors: Patrick Knöbelreiter, Christian Reinbacher, Alexander Shekhovtsov, Thomas Pock

Abstract: We propose a novel and principled hybrid CNN+CRF model for stereo estimation. Our model allows to exploit the advantages of both, convolutional neural networks (CNNs) and conditional random fields (CRFs) in an unified approach. The CNNs compute expressive features for matching and distinctive color edges, which in turn are used to compute the unary and binary costs of the CRF. For inference, we ap… ▽ More We propose a novel and principled hybrid CNN+CRF model for stereo estimation. Our model allows to exploit the advantages of both, convolutional neural networks (CNNs) and conditional random fields (CRFs) in an unified approach. The CNNs compute expressive features for matching and distinctive color edges, which in turn are used to compute the unary and binary costs of the CRF. For inference, we apply a recently proposed highly parallel dual block descent algorithm which only needs a small fixed number of iterations to compute a high-quality approximate minimizer. As the main contribution of the paper, we propose a theoretically sound method based on the structured output support vector machine (SSVM) to train the hybrid CNN+CRF model on large-scale data end-to-end. Our trained models perform very well despite the fact that we are using shallow CNNs and do not apply any kind of post-processing to the final output of the CRF. We evaluate our combined models on challenging stereo benchmarks such as Middlebury 2014 and Kitti 2015 and also investigate the performance of each individual component. △ Less

Submitted 3 May, 2017; v1 submitted 30 November, 2016; originally announced November 2016.

Comments: To appear at CVPR 2017

arXiv:1607.06283 [pdf, other]

Real-Time Intensity-Image Reconstruction for Event Cameras Using Manifold Regularisation

Authors: Christian Reinbacher, Gottfried Graber, Thomas Pock

Abstract: Event cameras or neuromorphic cameras mimic the human perception system as they measure the per-pixel intensity change rather than the actual intensity level. In contrast to traditional cameras, such cameras capture new information about the scene at MHz frequency in the form of sparse events. The high temporal resolution comes at the cost of losing the familiar per-pixel intensity information. In… ▽ More Event cameras or neuromorphic cameras mimic the human perception system as they measure the per-pixel intensity change rather than the actual intensity level. In contrast to traditional cameras, such cameras capture new information about the scene at MHz frequency in the form of sparse events. The high temporal resolution comes at the cost of losing the familiar per-pixel intensity information. In this work we propose a variational model that accurately models the behaviour of event cameras, enabling reconstruction of intensity images with arbitrary frame rate in real-time. Our method is formulated on a per-event-basis, where we explicitly incorporate information about the asynchronous nature of events via an event manifold induced by the relative timestamps of events. In our experiments we verify that solving the variational model on the manifold produces high-quality images without explicitly estimating optical flow. △ Less

Submitted 4 August, 2016; v1 submitted 21 July, 2016; originally announced July 2016.

Comments: Accepted to BMVC 2016 as oral presentation, 12 pages

arXiv:1601.06274 [pdf, other]

Solving Dense Image Matching in Real-Time using Discrete-Continuous Optimization

Authors: Alexander Shekhovtsov, Christian Reinbacher, Gottfried Graber, Thomas Pock

Abstract: Dense image matching is a fundamental low-level problem in Computer Vision, which has received tremendous attention from both discrete and continuous optimization communities. The goal of this paper is to combine the advantages of discrete and continuous optimization in a coherent framework. We devise a model based on energy minimization, to be optimized by both discrete and continuous algorithms… ▽ More Dense image matching is a fundamental low-level problem in Computer Vision, which has received tremendous attention from both discrete and continuous optimization communities. The goal of this paper is to combine the advantages of discrete and continuous optimization in a coherent framework. We devise a model based on energy minimization, to be optimized by both discrete and continuous algorithms in a consistent way. In the discrete setting, we propose a novel optimization algorithm that can be massively parallelized. In the continuous setting we tackle the problem of non-convex regularizers by a formulation based on differences of convex functions. The resulting hybrid discrete-continuous algorithm can be efficiently accelerated by modern GPUs and we demonstrate its real-time performance for the applications of dense stereo matching and optical flow. △ Less

Submitted 23 January, 2016; originally announced January 2016.

Comments: 21 st Computer Vision Winter Workshop

arXiv:1511.06566 [pdf, other]

doi 10.1007/s10851-016-0692-2

Acceleration of the PDHGM on strongly convex subspaces

Authors: Tuomo Valkonen, Thomas Pock

Abstract: We propose several variants of the primal-dual method due to Chambolle and Pock. Without requiring full strong convexity of the objective functions, our methods are accelerated on subspaces with strong convexity. This yields mixed rates, $O(1/N^2)$ with respect to initialisation and $O(1/N)$ with respect to the dual sequence, and the residual part of the primal sequence. We demonstrate the efficac… ▽ More We propose several variants of the primal-dual method due to Chambolle and Pock. Without requiring full strong convexity of the objective functions, our methods are accelerated on subspaces with strong convexity. This yields mixed rates, $O(1/N^2)$ with respect to initialisation and $O(1/N)$ with respect to the dual sequence, and the residual part of the primal sequence. We demonstrate the efficacy of the proposed methods on image processing problems lacking strong convexity, such as total generalised variation denoising and total variation deblurring. △ Less

Submitted 10 February, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

MSC Class: 90C25; 49M29; 94A08

arXiv:1508.02848 [pdf, other]

doi 10.1109/TPAMI.2016.2596743

Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration

Authors: Yun** Chen, Thomas Pock

Abstract: Image restoration is a long-standing problem in low-level computer vision with many interesting applications. We describe a flexible learning framework based on the concept of nonlinear reaction diffusion models for various image restoration problems. By embodying recent improvements in nonlinear diffusion models, we propose a dynamic nonlinear reaction diffusion model with time-dependent paramete… ▽ More Image restoration is a long-standing problem in low-level computer vision with many interesting applications. We describe a flexible learning framework based on the concept of nonlinear reaction diffusion models for various image restoration problems. By embodying recent improvements in nonlinear diffusion models, we propose a dynamic nonlinear reaction diffusion model with time-dependent parameters (\ie, linear filters and influence functions). In contrast to previous nonlinear diffusion models, all the parameters, including the filters and the influence functions, are simultaneously learned from training data through a loss based approach. We call this approach TNRD -- \textit{Trainable Nonlinear Reaction Diffusion}. The TNRD approach is applicable for a variety of image restoration tasks by incorporating appropriate reaction force. We demonstrate its capabilities with three representative applications, Gaussian image denoising, single image super resolution and JPEG deblocking. Experiments show that our trained nonlinear diffusion models largely benefit from the training of the parameters and finally lead to the best reported performance on common test datasets for the tested applications. Our trained models preserve the structural simplicity of diffusion models and take only a small number of diffusion steps, thus are highly efficient. Moreover, they are also well-suited for parallel computation on GPUs, which makes the inference procedure extremely fast. △ Less

Submitted 20 August, 2016; v1 submitted 12 August, 2015; originally announced August 2015.

Comments: 14 pages, 13 figures, to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:1503.05768 [pdf, other]

On learning optimized reaction diffusion processes for effective image restoration

Authors: Yun** Chen, Wei Yu, Thomas Pock

Abstract: For several decades, image restoration remains an active research topic in low-level computer vision and hence new approaches are constantly emerging. However, many recently proposed algorithms achieve state-of-the-art performance only at the expense of very high computation time, which clearly limits their practical relevance. In this work, we propose a simple but effective approach with both hig… ▽ More For several decades, image restoration remains an active research topic in low-level computer vision and hence new approaches are constantly emerging. However, many recently proposed algorithms achieve state-of-the-art performance only at the expense of very high computation time, which clearly limits their practical relevance. In this work, we propose a simple but effective approach with both high computational efficiency and high restoration quality. We extend conventional nonlinear reaction diffusion models by several parametrized linear filters as well as several parametrized influence functions. We propose to train the parameters of the filters and the influence functions through a loss based approach. Experiments show that our trained nonlinear reaction diffusion models largely benefit from the training of the parameters and finally lead to the best reported performance on common test datasets for image restoration. Due to their structural simplicity, our trained models are highly efficient and are also well-suited for parallel computation on GPUs. △ Less

Submitted 25 March, 2015; v1 submitted 19 March, 2015; originally announced March 2015.

Comments: 9 pages, 3 figures, 3 tables. CVPR2015 oral presentation together with the supplemental material of 13 pages, 8 pages (Notes on diffusion networks)

arXiv:1502.07770 [pdf, other]

Total variation on a tree

Authors: Vladimir Kolmogorov, Thomas Pock, Michal Rolinek

Abstract: We consider the problem of minimizing the continuous valued total variation subject to different unary terms on trees and propose fast direct algorithms based on dynamic programming to solve these problems. We treat both the convex and the non-convex case and derive worst case complexities that are equal or better than existing methods. We show applications to total variation based 2D image proces… ▽ More We consider the problem of minimizing the continuous valued total variation subject to different unary terms on trees and propose fast direct algorithms based on dynamic programming to solve these problems. We treat both the convex and the non-convex case and derive worst case complexities that are equal or better than existing methods. We show applications to total variation based 2D image processing and computer vision problems based on a Lagrangian decomposition approach. The resulting algorithms are very efficient, offer a high degree of parallelism and come along with memory requirements which are only in the order of the number of image pixels. △ Less

Submitted 25 April, 2016; v1 submitted 26 February, 2015; originally announced February 2015.

Comments: accepted to SIAM Journal on Imaging Sciences (SIIMS)

arXiv:1404.5344 [pdf, other]

doi 10.1109/LSP.2014.2337274

A higher-order MRF based variational model for multiplicative noise reduction

Authors: Yun** Chen, Wensen Feng, René Ranftl, Hong Qiao, Thomas Pock

Abstract: The Fields of Experts (FoE) image prior model, a filter-based higher-order Markov Random Fields (MRF) model, has been shown to be effective for many image restoration problems. Motivated by the successes of FoE-based approaches, in this letter, we propose a novel variational model for multiplicative noise reduction based on the FoE image prior model. The resulted model corresponds to a non-convex… ▽ More The Fields of Experts (FoE) image prior model, a filter-based higher-order Markov Random Fields (MRF) model, has been shown to be effective for many image restoration problems. Motivated by the successes of FoE-based approaches, in this letter, we propose a novel variational model for multiplicative noise reduction based on the FoE image prior model. The resulted model corresponds to a non-convex minimization problem, which can be solved by a recently published non-convex optimization algorithm. Experimental results based on synthetic speckle noise and real synthetic aperture radar (SAR) images suggest that the performance of our proposed method is on par with the best published despeckling algorithm. Besides, our proposed model comes along with an additional advantage, that the inference is extremely efficient. {Our GPU based implementation takes less than 1s to produce state-of-the-art despeckling performance.} △ Less

Submitted 7 July, 2014; v1 submitted 21 April, 2014; originally announced April 2014.

Comments: 5 pages, 5 figures, to appear in IEEE Signal Processing Letters

arXiv:1404.4805 [pdf, other]

iPiano: Inertial Proximal Algorithm for Non-Convex Optimization

Authors: Peter Ochs, Yun** Chen, Thomas Brox, Thomas Pock

Abstract: In this paper we study an algorithm for solving a minimization problem composed of a differentiable (possibly non-convex) and a convex (possibly non-differentiable) function. The algorithm iPiano combines forward-backward splitting with an inertial force. It can be seen as a non-smooth split version of the Heavy-ball method from Polyak. A rigorous analysis of the algorithm for the proposed class o… ▽ More In this paper we study an algorithm for solving a minimization problem composed of a differentiable (possibly non-convex) and a convex (possibly non-differentiable) function. The algorithm iPiano combines forward-backward splitting with an inertial force. It can be seen as a non-smooth split version of the Heavy-ball method from Polyak. A rigorous analysis of the algorithm for the proposed class of problems yields global convergence of the function values and the arguments. This makes the algorithm robust for usage on non-convex problems. The convergence result is obtained based on the \KL inequality. This is a very weak restriction, which was used to prove convergence for several other gradient methods. First, an abstract convergence theorem for a generic algorithm is proved, and, then iPiano is shown to satisfy the requirements of this theorem. Furthermore, a convergence rate is established for the general problem class. We demonstrate iPiano on computer vision problems: image denoising with learned priors and diffusion based image compression. △ Less

Submitted 18 April, 2014; originally announced April 2014.

Comments: 32pages, 7 figures, to appear in SIAM Journal on Imaging Sciences

arXiv:1403.3522 [pdf, other]

doi 10.1007/s10851-014-0523-2

An inertial forward-backward algorithm for monotone inclusions

Authors: Dirk A. Lorenz, Thomas Pock

Abstract: In this paper, we propose an inertial forward backward splitting algorithm to compute a zero of the sum of two monotone operators, with one of the two operators being co-coercive. The algorithm is inspired by the accelerated gradient method of Nesterov, but can be applied to a much larger class of problems including convex-concave saddle point problems and general monotone inclusions. We prove con… ▽ More In this paper, we propose an inertial forward backward splitting algorithm to compute a zero of the sum of two monotone operators, with one of the two operators being co-coercive. The algorithm is inspired by the accelerated gradient method of Nesterov, but can be applied to a much larger class of problems including convex-concave saddle point problems and general monotone inclusions. We prove convergence of the algorithm in a Hilbert space setting and show that several recently proposed first-order methods can be obtained as special cases of the general algorithm. Numerical results show that the proposed algorithm converges faster than existing methods, while kee** the computational cost of each iteration basically unchanged. △ Less

Submitted 12 September, 2014; v1 submitted 14 March, 2014; originally announced March 2014.

Comments: The final publication is available at http://link.springer.com

arXiv:1401.4112 [pdf, other]

A bi-level view of inpainting - based image compression

Authors: Yun** Chen, René Ranftl, Thomas Pock

Abstract: Inpainting based image compression approaches, especially linear and non-linear diffusion models, are an active research topic for lossy image compression. The major challenge in these compression models is to find a small set of descriptive supporting points, which allow for an accurate reconstruction of the original image. It turns out in practice that this is a challenging problem even for the… ▽ More Inpainting based image compression approaches, especially linear and non-linear diffusion models, are an active research topic for lossy image compression. The major challenge in these compression models is to find a small set of descriptive supporting points, which allow for an accurate reconstruction of the original image. It turns out in practice that this is a challenging problem even for the simplest Laplacian interpolation model. In this paper, we revisit the Laplacian interpolation compression model and introduce two fast algorithms, namely successive preconditioning primal dual algorithm and the recently proposed iPiano algorithm, to solve this problem efficiently. Furthermore, we extend the Laplacian interpolation based compression model to a more general form, which is based on principles from bi-level optimization. We investigate two different variants of the Laplacian model, namely biharmonic interpolation and smoothed Total Variation regularization. Our numerical results show that significant improvements can be obtained from the biharmonic interpolation model, and it can recover an image with very high quality from only 5% pixels. △ Less

Submitted 9 May, 2014; v1 submitted 16 January, 2014; originally announced January 2014.

Comments: 8 pages, 4 figures, best paper award of CVWW 2014, Computer Vision Winter Workshop, Křtiny, Czech Republic, 3-5th February 2014

arXiv:1401.4107 [pdf, other]

doi 10.1007/978-3-642-40602-7_30

Revisiting loss-specific training of filter-based MRFs for image restoration

Authors: Yun** Chen, Thomas Pock, René Ranftl, Horst Bischof

Abstract: It is now well known that Markov random fields (MRFs) are particularly effective for modeling image priors in low-level vision. Recent years have seen the emergence of two main approaches for learning the parameters in MRFs: (1) probabilistic learning using sampling-based algorithms and (2) loss-specific training based on MAP estimate. After investigating existing training approaches, it turns out… ▽ More It is now well known that Markov random fields (MRFs) are particularly effective for modeling image priors in low-level vision. Recent years have seen the emergence of two main approaches for learning the parameters in MRFs: (1) probabilistic learning using sampling-based algorithms and (2) loss-specific training based on MAP estimate. After investigating existing training approaches, it turns out that the performance of the loss-specific training has been significantly underestimated in existing work. In this paper, we revisit this approach and use techniques from bi-level optimization to solve it. We show that we can get a substantial gain in the final performance by solving the lower-level problem in the bi-level framework with high accuracy using our newly proposed algorithm. As a result, our trained model is on par with highly specialized image denoising algorithms and clearly outperforms probabilistically trained MRF models. Our findings suggest that for the loss-specific training scheme, solving the lower-level problem with higher accuracy is beneficial. Our trained model comes along with the additional advantage, that inference is extremely efficient. Our GPU-based implementation takes less than 1s to produce state-of-the-art performance. △ Less

Submitted 16 January, 2014; originally announced January 2014.

Comments: 10 pages, 2 figures, appear at 35th German Conference, GCPR 2013, Saarbrücken, Germany, September 3-6, 2013. Proceedings

arXiv:1401.4105 [pdf, other]

Learning $\ell_1$-based analysis and synthesis sparsity priors using bi-level optimization

Authors: Yun** Chen, Thomas Pock, Horst Bischof

Abstract: We consider the analysis operator and synthesis dictionary learning problems based on the the $\ell_1$ regularized sparse representation model. We reveal the internal relations between the $\ell_1$-based analysis model and synthesis model. We then introduce an approach to learn both analysis operator and synthesis dictionary simultaneously by using a unified framework of bi-level optimization. Our… ▽ More We consider the analysis operator and synthesis dictionary learning problems based on the the $\ell_1$ regularized sparse representation model. We reveal the internal relations between the $\ell_1$-based analysis model and synthesis model. We then introduce an approach to learn both analysis operator and synthesis dictionary simultaneously by using a unified framework of bi-level optimization. Our aim is to learn a meaningful operator (dictionary) such that the minimum energy solution of the analysis (synthesis)-prior based model is as close as possible to the ground-truth. We solve the bi-level optimization problem using the implicit differentiation technique. Moreover, we demonstrate the effectiveness of our leaning approach by applying the learned analysis operator (dictionary) to the image denoising task and comparing its performance with state-of-the-art methods. Under this unified framework, we can compare the performance of the two types of priors. △ Less

Submitted 16 January, 2014; originally announced January 2014.

Comments: 5 pages, 1 figure, appear at the Workshop on Analysis Operator Learning vs. Dictionary Learning, NIPS 2012

arXiv:1401.2804 [pdf, other]

doi 10.1109/TIP.2014.2299065

Insights into analysis operator learning: From patch-based sparse models to higher-order MRFs

Authors: Yun** Chen, René Ranftl, Thomas Pock

Abstract: This paper addresses a new learning algorithm for the recently introduced co-sparse analysis model. First, we give new insights into the co-sparse analysis model by establishing connections to filter-based MRF models, such as the Field of Experts (FoE) model of Roth and Black. For training, we introduce a technique called bi-level optimization to learn the analysis operators. Compared to existing… ▽ More This paper addresses a new learning algorithm for the recently introduced co-sparse analysis model. First, we give new insights into the co-sparse analysis model by establishing connections to filter-based MRF models, such as the Field of Experts (FoE) model of Roth and Black. For training, we introduce a technique called bi-level optimization to learn the analysis operators. Compared to existing analysis operator learning approaches, our training procedure has the advantage that it is unconstrained with respect to the analysis operator. We investigate the effect of different aspects of the co-sparse analysis model and show that the sparsity promoting function (also called penalty function) is the most important factor in the model. In order to demonstrate the effectiveness of our training approach, we apply our trained models to various classical image restoration problems. Numerical experiments show that our trained models clearly outperform existing analysis operator learning approaches and are on par with state-of-the-art image denoising algorithms. Our approach develops a framework that is intuitive to understand and easy to implement. △ Less

Submitted 13 January, 2014; originally announced January 2014.

Comments: 13 pages, 10 figures, accepted to IEEE Image Processing

arXiv:1304.7153 [pdf, other]

A Convex Approach for Image Hallucination

Authors: Peter Innerhofer, Thomas Pock

Abstract: In this paper we propose a global convex approach for image hallucination. Altering the idea of classical multi image super resolution (SU) systems to single image SU, we incorporate aligned images to hallucinate the output. Our work is based on the paper of Tappen et al. where they use a non-convex model for image hallucination. In comparison we formulate a convex primal optimization problem and… ▽ More In this paper we propose a global convex approach for image hallucination. Altering the idea of classical multi image super resolution (SU) systems to single image SU, we incorporate aligned images to hallucinate the output. Our work is based on the paper of Tappen et al. where they use a non-convex model for image hallucination. In comparison we formulate a convex primal optimization problem and derive a fast converging primal-dual algorithm with a global optimal solution. We use a database with face images to incorporate high-frequency details to the high-resolution output. We show that we can achieve state-of-the-art results by using a convex approach. △ Less

Submitted 26 April, 2013; originally announced April 2013.

Comments: submitted to ÖAGM-AAPR 2013, 8 pages, 3 figures

Report number: OAGM-AAPR/2013/18

Showing 1–50 of 51 results for author: Pock, T