-
Enumerating Complexity Revisited
Authors:
Alexander Shekhovtsov,
Georgii Zakharov
Abstract:
Consider a subset of positive integers $S$. In this paper, we reduce the upper bound on the length of a minimum program that enumerates $S$ in terms of the probability of $S$ being enumerated by a random program.
So far, the best-known upper bound was given by Solovay, who proved that the minimum length of a program enumerating $S$ is bounded by $3$ times the negative binary logarithm of the probability that a random program enumerates $S$. Later, Vereshchagin showed that the constant can be improved from $3$ to $2$ for finite sets. By improving the method proposed by Solovay, we demonstrate that any bound for finite sets implies the same bound for infinite sets, modulo logarithmic factors. Thus, the constant can be replaced by $2$ for every set $S$ due to the result of Vereshchagin.
Submitted 14 December, 2023; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Symmetric Equilibrium Learning of VAEs
Authors:
Boris Flach,
Dmitrij Schlesinger,
Alexander Shekhovtsov
Abstract:
We view variational autoencoders (VAE) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa. The standard learning approach for VAEs is the maximisation of the evidence lower bound (ELBO). It is asymmetric in that it aims at learning a latent variable model while using the encoder as an auxiliary means only. Moreover, it requires a closed-form a priori latent distribution. This limits its applicability in more complex scenarios, such as general semi-supervised learning and employing complex generative models as priors. We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling. The flexibility and simplicity of this approach allow its application to a wide range of learning scenarios and downstream tasks.
Submitted 12 March, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Generalized Differentiable RANSAC
Authors:
Tong Wei,
Yash Patel,
Alexander Shekhovtsov,
Jiri Matas,
Daniel Barath
Abstract:
We propose $\nabla$-RANSAC, a generalized differentiable RANSAC that allows learning the entire randomized robust estimation pipeline. The proposed approach enables the use of relaxation techniques for estimating the gradients in the sampling distribution, which are then propagated through a differentiable solver. The trainable quality function marginalizes over the scores from all the models estimated within $\nabla$-RANSAC to guide the network in learning accurate and useful inlier probabilities or to train feature detection and matching networks. Our method directly maximizes the probability of drawing a good hypothesis, allowing us to learn better sampling distributions. We test $\nabla$-RANSAC on various real-world scenarios on fundamental and essential matrix estimation, and 3D point cloud registration, outdoors and indoors, with handcrafted and learning-based features. It is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives. The code and trained models are available at https://github.com/weitong8591/differentiable_ransac.
Submitted 8 September, 2023; v1 submitted 26 December, 2022;
originally announced December 2022.
-
Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators
Authors:
Alexander Shekhovtsov
Abstract:
Discrete and especially binary random variables occur in many machine learning models, notably in variational autoencoders with binary latent states and in stochastic binary networks. When learning such models, a key tool is an estimator of the gradient of the expected loss with respect to the probabilities of binary variables. The straight-through (ST) estimator gained popularity due to its simplicity and efficiency, in particular in deep networks where unbiased estimators are impractical. Several techniques were proposed to improve over ST while keeping the same low computational complexity: Gumbel-Softmax, ST-Gumbel-Softmax, BayesBiNN, FouST. We conduct a theoretical analysis of bias and variance of these methods in order to understand tradeoffs and verify the originally claimed properties. The presented theoretical results allow for better understanding of these methods and in some cases reveal serious issues.
Submitted 15 October, 2021; v1 submitted 7 October, 2021;
originally announced October 2021.
-
VAE Approximation Error: ELBO and Exponential Families
Authors:
Alexander Shekhovtsov,
Dmitrij Schlesinger,
Boris Flach
Abstract:
The importance of Variational Autoencoders reaches far beyond standalone generative models -- the approach is also used for learning latent representations and can be generalized to semi-supervised learning. This requires a thorough analysis of their commonly known shortcomings: posterior collapse and approximation errors. This paper analyzes VAE approximation errors caused by the combination of the ELBO objective and encoder models from conditional exponential families, including, but not limited to, commonly used conditionally independent discrete and continuous models. We characterize subclasses of generative models consistent with these encoder families. We show that the ELBO optimizer is pulled away from the likelihood optimizer towards the consistent subset and study this effect experimentally. Importantly, this subset cannot be enlarged, and the respective error cannot be decreased, by considering deeper encoder/decoder networks.
Submitted 11 April, 2022; v1 submitted 18 February, 2021;
originally announced February 2021.
-
Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks
Authors:
Alexander Shekhovtsov,
Viktor Yanush
Abstract:
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights. Many successful experimental results have been achieved with empirical straight-through (ST) approaches, proposing a variety of ad-hoc rules for propagating gradients through non-differentiable activations and updating discrete weights. At the same time, ST methods can be truly derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights. We advance these derivations to a more complete and systematic study. We analyze properties and estimation accuracy, obtain different forms of correct ST estimators for activations and weights, explain existing empirical approaches and their shortcomings, and explain how latent weights arise from the mirror descent method when optimizing over probabilities. This allows us to reintroduce ST methods, long known empirically, as sound approximations, apply them with clarity, and develop further improvements.
Submitted 19 October, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks
Authors:
Alexander Shekhovtsov,
Viktor Yanush,
Boris Flach
Abstract:
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated because the model has a piecewise constant response. We consider stochastic binary networks, obtained by adding noise in front of activations. The expected model response becomes a smooth function of parameters; its gradient is well defined, but it is challenging to estimate accurately. We propose a new method for this estimation problem combining sampling and analytic approximation steps. The method has a significantly reduced variance at the price of a small bias, which gives a very practical tradeoff in comparison with existing unbiased and biased estimators. We further show that one extra linearization step leads to a deep straight-through estimator previously known only as an ad-hoc heuristic. We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models with both proposed methods.
Submitted 4 November, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
-
MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models
Authors:
Siddharth Tourani,
Alexander Shekhovtsov,
Carsten Rother,
Bogdan Savchynskyy
Abstract:
Dense, discrete Graphical Models with pairwise potentials are a powerful class of models which are employed in state-of-the-art computer vision and bio-imaging applications. This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, by making a small change to the low-performing solver, the Max Product Linear Programming (MPLP) algorithm, we derive the new solver MPLP++ that outperforms all existing solvers by a large margin, including the state-of-the-art solver Tree-Reweighted Sequential (TRWS) message-passing algorithm. Additionally, our solver is highly parallel, in contrast to TRWS, which gives a further boost in performance with the proposed GPU and multi-thread CPU implementations. We verify the superiority of our algorithm on dense problems from publicly available benchmarks, as well as on a new benchmark for 6D Object Pose estimation. We also provide an ablation study with respect to graph density.
Submitted 16 April, 2020;
originally announced April 2020.
-
Taxonomy of Dual Block-Coordinate Ascent Methods for Discrete Energy Minimization
Authors:
Siddharth Tourani,
Alexander Shekhovtsov,
Carsten Rother,
Bogdan Savchynskyy
Abstract:
We consider the maximum-a-posteriori inference problem in discrete graphical models and study solvers based on the dual block-coordinate ascent rule. We map all existing solvers into a single framework, allowing for a better understanding of their design principles. We theoretically show that some block-optimizing updates are sub-optimal and how to strictly improve them. On a wide range of problem instances of varying graph connectivity, we study the performance of existing solvers as well as new variants that can be obtained within the framework. As a result of this exploration we build a new state-of-the-art solver, performing uniformly better on the whole range of test instances.
Submitted 16 April, 2020;
originally announced April 2020.
-
Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems
Authors:
Patrick Knöbelreiter,
Christian Sormann,
Alexander Shekhovtsov,
Friedrich Fraundorfer,
Thomas Pock
Abstract:
It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
Submitted 13 March, 2020;
originally announced March 2020.
-
Graph-based Nearest Neighbor Search: From Practice to Theory
Authors:
Liudmila Prokhorenkova,
Aleksandr Shekhovtsov
Abstract:
Graph-based approaches are empirically shown to be very successful for the nearest neighbor search (NNS). However, there has been very little research on their theoretical guarantees. We fill this gap and rigorously analyze the performance of graph-based NNS algorithms, specifically focusing on the low-dimensional ($d \ll \log n$) regime. In addition to the basic greedy algorithm on nearest neighbor graphs, we also analyze the most successful heuristics commonly used in practice: speeding up via adding shortcut edges and improving accuracy via maintaining a dynamic list of candidates. We believe that our theoretical insights supported by experimental analysis are an important step towards understanding the limits and benefits of graph-based NNS algorithms.
Submitted 20 August, 2020; v1 submitted 1 July, 2019;
originally announced July 2019.
-
Stochastic Normalizations as Bayesian Learning
Authors:
Alexander Shekhovtsov,
Boris Flach
Abstract:
In this work we investigate the reasons why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, distinguishing it from data-independent normalization methods, is randomness of batch statistics. This randomness appears in the parameters rather than in activations and admits an interpretation as a practical Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size. We show that their generalization performance can be improved significantly by Bayesian learning of the same form. We obtain test performance comparable to BN and, at the same time, better validation losses suitable for subsequent output uncertainty estimation through approximate Bayesian posterior.
Submitted 1 November, 2018;
originally announced November 2018.
-
Feed-forward Uncertainty Propagation in Belief and Neural Networks
Authors:
Alexander Shekhovtsov,
Boris Flach,
Michal Busta
Abstract:
We propose a feed-forward inference method applicable to belief and neural networks. In a belief network, the method estimates an approximate factorized posterior of all hidden units given the input. In neural networks the method propagates uncertainty of the input through all the layers. In neural networks with injected noise, the method analytically takes into account uncertainties resulting from this noise. Such feed-forward analytic propagation is differentiable in parameters and can be trained end-to-end. Compared to standard NN, which can be viewed as propagating only the means, we propagate the mean and variance. The method can be useful in all scenarios that require knowledge of the neuron statistics, e.g. when dealing with uncertain inputs, considering sigmoid activations as probabilities of Bernoulli units, training the models regularized by injected noise (dropout) or estimating activation statistics over the dataset (as needed for normalization methods). In the experiments we show the possible utility of the method in all these tasks as well as its current limitations.
Submitted 1 November, 2018; v1 submitted 28 March, 2018;
originally announced March 2018.
-
Normalization of Neural Networks using Analytic Variance Propagation
Authors:
Alexander Shekhovtsov,
Boris Flach
Abstract:
We address the problem of estimating statistics of hidden units in a neural network using a method of analytic moment propagation. These statistics are useful for approximate whitening of the inputs in front of saturating non-linearities such as a sigmoid function. This is important for initialization of training and for reducing the accumulated scale and bias dependencies (compensating covariate shift), which presumably eases the learning. In batch normalization, which is currently a very widely applied technique, sample estimates of statistics of hidden units over a batch are used. The proposed estimation uses an analytic propagation of mean and variance of the training set through the network. The result depends on the network structure and its current weights but not on the specific batch input. The estimates are suitable for initialization and normalization, efficient to compute and independent of the batch size. The experimental verification well supports these claims. However, the method does not share the generalization properties of BN, to which our experiments give some additional insight.
Submitted 28 March, 2018;
originally announced March 2018.
-
Generative learning for deep networks
Authors:
Boris Flach,
Alexander Shekhovtsov,
Ondrej Fikar
Abstract:
Learning that takes into account the full distribution of the data, referred to as generative, is not feasible with deep neural networks (DNNs) because they model only the conditional distribution of the outputs given the inputs. Current solutions are either based on joint probability models facing difficult estimation problems or learn two separate networks, mapping inputs to outputs (recognition) and vice-versa (generation). We propose an intermediate approach. First, we show that forward computation in DNNs with logistic sigmoid activations corresponds to a simplified approximate Bayesian inference in a directed probabilistic multi-layer model. This connection allows interpreting a DNN as a probabilistic model of the output and all hidden units given the input. Second, we propose that in order for the recognition and generation networks to be more consistent with the joint model of the data, weights of the recognition and generator network should be related by transposition. We demonstrate in a tentative experiment that such a coupled pair can be learned generatively, modelling the full distribution of the data, and has enough capacity to perform well in both recognition and generation.
Submitted 25 September, 2017;
originally announced September 2017.
-
Scalable Full Flow with Learned Binary Descriptors
Authors:
Gottfried Munda,
Alexander Shekhovtsov,
Patrick Knöbelreiter,
Thomas Pock
Abstract:
We propose a method for large displacement optical flow in which local matching costs are learned by a convolutional neural network (CNN) and a smoothness prior is imposed by a conditional random field (CRF). We tackle the computation- and memory-intensive operations on the 4D cost volume by a min-projection, which reduces memory complexity from quadratic to linear, and by binary descriptors for efficient matching. This enables evaluation of the cost on the fly and allows performing learning and CRF inference on high resolution images without ever storing the 4D cost volume. To address the problem of learning binary descriptors we propose a new hybrid learning scheme. In contrast to current state-of-the-art approaches for learning binary CNNs, we can compute the exact non-zero gradient within our model. We compare several methods for training binary descriptors and show results on publicly available benchmarks.
Submitted 20 July, 2017;
originally announced July 2017.
-
End-to-End Training of Hybrid CNN-CRF Models for Stereo
Authors:
Patrick Knöbelreiter,
Christian Reinbacher,
Alexander Shekhovtsov,
Thomas Pock
Abstract:
We propose a novel and principled hybrid CNN+CRF model for stereo estimation. Our model allows exploiting the advantages of both convolutional neural networks (CNNs) and conditional random fields (CRFs) in a unified approach. The CNNs compute expressive features for matching and distinctive color edges, which in turn are used to compute the unary and binary costs of the CRF. For inference, we apply a recently proposed highly parallel dual block descent algorithm which only needs a small fixed number of iterations to compute a high-quality approximate minimizer. As the main contribution of the paper, we propose a theoretically sound method based on the structured output support vector machine (SSVM) to train the hybrid CNN+CRF model on large-scale data end-to-end. Our trained models perform very well despite the fact that we are using shallow CNNs and do not apply any kind of post-processing to the final output of the CRF. We evaluate our combined models on challenging stereo benchmarks such as Middlebury 2014 and KITTI 2015 and also investigate the performance of each individual component.
Submitted 3 May, 2017; v1 submitted 30 November, 2016;
originally announced November 2016.
-
Complexity of Discrete Energy Minimization Problems
Authors:
Mengtian Li,
Alexander Shekhovtsov,
Daniel Huber
Abstract:
Discrete energy minimization is widely-used in computer vision and machine learning for problems such as MAP inference in graphical models. The problem, in general, is notoriously intractable, and finding the global optimal solution is known to be NP-hard. However, is it possible to approximate this problem with a reasonable ratio bound on the solution quality in polynomial time? We show in this paper that the answer is no. Specifically, we show that general energy minimization, even in the 2-label pairwise case, and planar energy minimization with three or more labels are exp-APX-complete. This finding rules out the existence of any approximation algorithm with a sub-exponential approximation ratio in the input size for these two problems, including constant factor approximations. Moreover, we collect and review the computational complexity of several subclass problems and arrange them on a complexity scale consisting of three major complexity classes -- PO, APX, and exp-APX, corresponding to problems that are solvable, approximable, and inapproximable in polynomial time. Problems in the first two complexity classes can serve as alternative tractable formulations to the inapproximable ones. This paper can help vision researchers to select an appropriate model for an application or guide them in designing new algorithms.
Submitted 29 July, 2016;
originally announced July 2016.
-
Joint M-Best-Diverse Labelings as a Parametric Submodular Minimization
Authors:
Alexander Kirillov,
Alexander Shekhovtsov,
Carsten Rother,
Bogdan Savchynskyy
Abstract:
We consider the problem of jointly inferring the M-best diverse labelings for a binary (high-order) submodular energy of a graphical model. Recently, it was shown that this problem can be solved to a global optimum for many practically interesting diversity measures. It was noted that the labelings are so-called nested. This nestedness property also holds for labelings of a class of parametric submodular minimization problems, where different values of the global parameter $\gamma$ give rise to different solutions. A popular example of parametric submodular minimization is the monotonic parametric max-flow problem, which is also widely used for computing multiple labelings. As the main contribution of this work we establish a close relationship between diversity with submodular energies and parametric submodular minimization. In particular, the joint M-best diverse labelings can be obtained by running a non-parametric submodular minimization (in the special case, max-flow) solver for M different values of $\gamma$ in parallel, for certain diversity measures. Importantly, the values for $\gamma$ can be computed in a closed form in advance, prior to any optimization. These theoretical results suggest two simple yet efficient algorithms for the joint M-best diverse problem, which outperform competitors in terms of runtime and quality of results. In particular, as we show in the paper, the new methods compute the exact M-best diverse labelings faster than a popular method of Batra et al., which in some sense only obtains approximate solutions.
Submitted 23 June, 2016; v1 submitted 22 June, 2016;
originally announced June 2016.
-
Solving Dense Image Matching in Real-Time using Discrete-Continuous Optimization
Authors:
Alexander Shekhovtsov,
Christian Reinbacher,
Gottfried Graber,
Thomas Pock
Abstract:
Dense image matching is a fundamental low-level problem in Computer Vision, which has received tremendous attention from both discrete and continuous optimization communities. The goal of this paper is to combine the advantages of discrete and continuous optimization in a coherent framework. We devise a model based on energy minimization, to be optimized by both discrete and continuous algorithms in a consistent way. In the discrete setting, we propose a novel optimization algorithm that can be massively parallelized. In the continuous setting we tackle the problem of non-convex regularizers by a formulation based on differences of convex functions. The resulting hybrid discrete-continuous algorithm can be efficiently accelerated by modern GPUs and we demonstrate its real-time performance for the applications of dense stereo matching and optical flow.
Submitted 23 January, 2016;
originally announced January 2016.
-
Maximum Persistency via Iterative Relaxed Inference with Graphical Models
Authors:
Alexander Shekhovtsov,
Paul Swoboda,
Bogdan Savchynskyy
Abstract:
We consider the NP-hard problem of MAP-inference for undirected discrete graphical models. We propose a polynomial time and practically efficient algorithm for finding a part of its optimal solution. Specifically, our algorithm marks some labels of the considered graphical model either as (i) optimal, meaning that they belong to all optimal solutions of the inference problem; (ii) non-optimal if they provably do not belong to any solution. With access to an exact solver of a linear programming relaxation to the MAP-inference problem, our algorithm marks the maximal possible (in a specified sense) number of labels. We also present a version of the algorithm, which has access to a suboptimal dual solver only and still can ensure the (non-)optimality for the marked labels, although the overall number of the marked labels may decrease. We propose an efficient implementation, which runs in time comparable to a single run of a suboptimal dual solver. Our method is well-scalable and shows state-of-the-art results on computational benchmarks from machine learning and computer vision.
Submitted 3 February, 2017; v1 submitted 31 August, 2015;
originally announced August 2015.
-
Higher Order Maximum Persistency and Comparison Theorems
Authors:
Alexander Shekhovtsov
Abstract:
We address combinatorial problems that can be formulated as minimization of a partially separable function of discrete variables (energy minimization in graphical models, weighted constraint satisfaction, pseudo-Boolean optimization, 0-1 polynomial programming). For polyhedral relaxations of such problems it is generally not true that variables integer in the relaxed solution will retain the same values in the optimal discrete solution. Those which do are called persistent. Such persistent variables define a part of a globally optimal solution. Once identified, they can be excluded from the problem, reducing its size.
To any polyhedral relaxation we associate a sufficient condition proving persistency of a subset of variables. We set up a specially constructed linear program which determines the set of persistent variables maximal with respect to the relaxation. The condition improves as the relaxation is tightened and possesses all its invariances. The proposed framework explains a variety of existing methods originating from different areas of research and based on different principles. A theoretical comparison is established that relates these methods to the standard linear relaxation and proves that the proposed technique identifies the same or a larger set of persistent variables.
Submitted 4 May, 2015;
originally announced May 2015.
-
Partial Optimality by Pruning for MAP-Inference with General Graphical Models
Authors:
Paul Swoboda,
Alexander Shekhovtsov,
Jörg Hendrik Kappes,
Christoph Schnörr,
Bogdan Savchynskyy
Abstract:
We consider the energy minimization problem for undirected graphical models, also known as MAP-inference problem for Markov random fields which is NP-hard in general. We propose a novel polynomial time algorithm to obtain a part of its optimal non-relaxed integral solution. Our algorithm is initialized with variables taking integral values in the solution of a convex relaxation of the MAP-inference problem and iteratively prunes those, which do not satisfy our criterion for partial optimality. We show that our pruning strategy is in a certain sense theoretically optimal. Also empirically our method outperforms previous approaches in terms of the number of persistently labelled variables. The method is very general, as it is applicable to models with arbitrary factors of an arbitrary order and can employ any solver for the considered relaxed problem. Our method's runtime is determined by the runtime of the convex relaxation solver for the MAP-inference problem.
Submitted 18 August, 2015; v1 submitted 24 October, 2014;
originally announced October 2014.
-
Maximum Persistency in Energy Minimization
Authors:
Alexander Shekhovtsov
Abstract:
We consider the discrete pairwise energy minimization problem (weighted constraint satisfaction, max-sum labeling) and methods that identify a globally optimal partial assignment of variables. When finding a complete optimal assignment is intractable, determining optimal values for a part of the variables is an interesting possibility. Existing methods are based on different sufficient conditions. We propose a new sufficient condition for partial optimality which (1) is verifiable in polynomial time, (2) is invariant to reparametrization of the problem and permutation of labels, and (3) includes many existing sufficient conditions as special cases. We pose the problem of finding the maximum optimal partial assignment identifiable by the new sufficient condition. A polynomial method is proposed which is guaranteed to assign the same or a larger part of the variables than several existing approaches. The core of the method is a specially constructed linear program that identifies persistent assignments in an arbitrary multi-label setting.
Submitted 16 June, 2014; v1 submitted 14 April, 2014;
originally announced April 2014.
-
Curvature Prior for MRF-based Segmentation and Shape Inpainting
Authors:
Alexander Shekhovtsov,
Pushmeet Kohli,
Carsten Rother
Abstract:
Most image labeling problems such as segmentation and image reconstruction are fundamentally ill-posed and suffer from ambiguities and noise. Higher order image priors encode high level structural dependencies between pixels and are key to overcoming these problems. However, these priors in general lead to computationally intractable models. This paper addresses the problem of discovering compact representations of higher order priors which allow efficient inference. We propose a framework for solving this problem which uses a recently proposed representation of higher order functions where they are encoded as lower envelopes of linear functions. Maximum a Posteriori inference on our learned models reduces to minimizing a pairwise function of discrete variables, which can be done approximately using standard methods. Although this is a primarily theoretical paper, we also demonstrate the practical effectiveness of our framework on the problem of learning a shape prior for image segmentation and reconstruction. We show that our framework can learn a compact representation that approximates a prior that encourages low curvature shapes. We evaluate the approximation accuracy, discuss properties of the trained model, and show various results for shape inpainting and image segmentation.
Submitted 7 September, 2011;
originally announced September 2011.
-
On Partial Opimality by Auxiliary Submodular Problems
Authors:
Alexander Shekhovtsov,
Vaclav Hlavac
Abstract:
In this work, we prove several relations between three different energy minimization techniques: a recently proposed method for determining a provably optimal partial assignment of variables by Ivan Kovtun (IK), the linear programming relaxation approach (LP), and the popular expansion move algorithm by Yuri Boykov. We propose a novel sufficient condition of optimal partial assignment, which is based on LP relaxation and called LP-autarky. We show that the methods of Kovtun, which build auxiliary submodular problems, fulfill this sufficient condition. The following link is thus established: LP relaxation cannot be tightened by IK. For non-submodular problems this is a non-trivial result. In the case of two labels, LP relaxation provides an optimal partial assignment, known as persistency, which, as we show, dominates IK. Relating IK with expansion move, we show that the set of fixed points of expansion move with any "truncation" rule for the initial problem and the problem restricted by the one-vs-all method of IK would coincide, i.e. expansion move cannot be improved by this method. In the case of two labels, expansion move with a particular truncation rule coincides with the one-vs-all method.
Submitted 6 September, 2011;
originally announced September 2011.
-
A Distributed Mincut/Maxflow Algorithm Combining Path Augmentation and Push-Relabel
Authors:
Alexander Shekhovtsov,
Vaclav Hlavac
Abstract:
We develop a novel distributed algorithm for the minimum cut problem. We primarily aim at solving large sparse problems. Assuming vertices of the graph are partitioned into several regions, the algorithm performs path augmentations inside the regions and updates of the push-relabel style between the regions. The interaction between regions is considered expensive (regions are loaded into the memory one-by-one or located on separate machines in a network). The algorithm works in sweeps, that is, passes over all regions. Let $B$ be the set of vertices incident to inter-region edges of the graph. We present sequential and parallel versions of the algorithm which terminate in at most $2|B|^2+1$ sweeps. The competing algorithm by Delong and Boykov uses push-relabel updates inside regions. In the case of a fixed partition we prove that this algorithm has a tight $O(n^2)$ bound on the number of sweeps, where $n$ is the number of vertices. We tested sequential versions of the algorithms on instances of maxflow problems in computer vision. Experimentally, the number of sweeps required by the new algorithm is much lower than for Delong and Boykov's variant. Large problems (up to $10^8$ vertices and $6\cdot 10^8$ edges) are solved using under 1GB of memory in about 10 sweeps.
Submitted 6 September, 2011;
originally announced September 2011.