
Enumerating Complexity Revisited

Authors: Alexander Shekhovtsov, Georgii Zakharov

Abstract: Consider a subset of positive integers $S$. In this paper, we reduce the upper bound on the length of a minimum program that enumerates $S$ in terms of the probability of $S$ being enumerated by a random program. So far, the best-known upper bound was given by Solovay. Solovay proved that the minimum length of a program enumerating $S$ is bounded by $3$ times minus binary logarithm of the probability that a random program enumerates $S$. Later, Vereshchagin showed that the constant can be improved from $3$ to $2$ for finite sets. By improving the method proposed by Solovay, we demonstrate that any bound for finite sets implies the same bound for infinite sets, modulo logarithmic factors. Thus, the constant can be replaced by $2$ for every set $S$ due to the result of Vereshchagin.

Submitted 14 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.
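
For orientation, the three bounds named in this abstract can be written side by side. The symbols $I(S)$ (length of a shortest program enumerating $S$) and $P(S)$ (probability that a random program enumerates $S$) are notation assumed here for illustration and do not appear in the listing. The abstract does not state the additive lower-order terms, so they are left unspecified.

```latex
% Assumed notation, not taken from the listing:
%   I(S): length of a shortest program that enumerates S
%   P(S): probability that a random program enumerates S
\begin{align*}
  \text{Solovay, every } S: \quad
    & I(S) \le 3 \cdot \bigl(-\log_2 P(S)\bigr) + \text{(lower-order terms)} \\
  \text{Vereshchagin, finite } S: \quad
    & I(S) \le 2 \cdot \bigl(-\log_2 P(S)\bigr) + \text{(lower-order terms)} \\
  \text{this paper, every } S: \quad
    & I(S) \le 2 \cdot \bigl(-\log_2 P(S)\bigr) \quad \text{modulo logarithmic factors}
\end{align*}
```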

arXiv:2307.09883 [pdf, other]

Symmetric Equilibrium Learning of VAEs

Authors: Boris Flach, Dmitrij Schlesinger, Alexander Shekhovtsov

Abstract: We view variational autoencoders (VAE) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa. The standard learning approach for VAEs is the maximisation of the evidence lower bound (ELBO). It is asymmetric in that it aims at learning a latent variable model while using the encoder as an auxiliary means only. Moreover, it requires a closed form a-priori latent distribution. This limits its applicability in more complex scenarios, such as general semi-supervised learning and employing complex generative models as priors. We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling. The flexibility and simplicity of this approach allows its application to a wide range of learning scenarios and downstream tasks.

Submitted 12 March, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: 13 pages, 6 figures, accepted for AISTATS 2024

ACM Class: I.2.6

arXiv:2212.13185 [pdf, other]

Generalized Differentiable RANSAC

Authors: Tong Wei, Yash Patel, Alexander Shekhovtsov, Jiri Matas, Daniel Barath

Abstract: We propose $\nabla$-RANSAC, a generalized differentiable RANSAC that allows learning the entire randomized robust estimation pipeline. The proposed approach enables the use of relaxation techniques for estimating the gradients in the sampling distribution, which are then propagated through a differentiable solver. The trainable quality function marginalizes over the scores from all the models estimated within $\nabla$-RANSAC to guide the network learning accurate and useful inlier probabilities or to train feature detection and matching networks. Our method directly maximizes the probability of drawing a good hypothesis, allowing us to learn better sampling distributions. We test $\nabla$-RANSAC on various real-world scenarios on fundamental and essential matrix estimation, and 3D point cloud registration, outdoors and indoors, with handcrafted and learning-based features. It is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives. The code and trained models are available at https://github.com/weitong8591/differentiable_ransac.

Submitted 8 September, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

arXiv:2110.03549 [pdf, other]

Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators

Authors: Alexander Shekhovtsov

Abstract: Discrete and especially binary random variables occur in many machine learning models, notably in variational autoencoders with binary latent states and in stochastic binary networks. When learning such models, a key tool is an estimator of the gradient of the expected loss with respect to the probabilities of binary variables. The straight-through (ST) estimator gained popularity due to its simplicity and efficiency, in particular in deep networks where unbiased estimators are impractical. Several techniques were proposed to improve over ST while keeping the same low computational complexity: Gumbel-Softmax, ST-Gumbel-Softmax, BayesBiNN, FouST. We conduct a theoretical analysis of bias and variance of these methods in order to understand tradeoffs and verify the originally claimed properties. The presented theoretical results allow for better understanding of these methods and in some cases reveal serious issues.

Submitted 15 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: 22 pages, GCPR 2021
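
As a toy illustration of why the bias of a straight-through estimator is worth analysing, the sketch below compares the exact gradient of an expected loss with the expectation of one common ST form (identity backward pass: the loss derivative at the sampled value is used as the gradient with respect to the probability). This is not the paper's analysis or code. The quadratic loss, the target `t`, and the function names are assumptions chosen so that everything can be computed in closed form.

```python
def true_gradient(f):
    """Exact d/dp of E[f(b)] for b ~ Bernoulli(p).

    Since E[f(b)] = p * f(1) + (1 - p) * f(0), the derivative is f(1) - f(0),
    which does not depend on p.
    """
    return f(1.0) - f(0.0)


def expected_st_gradient(df, p):
    """Expectation of the identity straight-through estimate.

    ST samples b and reports f'(b) as the gradient with respect to p;
    averaging over b ~ Bernoulli(p) gives p * f'(1) + (1 - p) * f'(0).
    """
    return p * df(1.0) + (1.0 - p) * df(0.0)


# Hypothetical toy loss f(b) = (b - t)^2 with its derivative.
t = 0.3
f = lambda b: (b - t) ** 2
df = lambda b: 2.0 * (b - t)

exact = true_gradient(f)
for p in (0.1, 0.5, 0.9):
    st = expected_st_gradient(df, p)
    # For this quadratic loss the bias works out to 2p - 1.
    print(f"p={p:.1f}  exact={exact:+.3f}  E[ST]={st:+.3f}  bias={st - exact:+.3f}")
```

For this particular loss the expected ST estimate matches the exact gradient only at p = 0.5 and drifts linearly away from it elsewhere; other losses and other ST variants behave differently.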

arXiv:2102.09310 [pdf, other]

VAE Approximation Error: ELBO and Exponential Families

Authors: Alexander Shekhovtsov, Dmitrij Schlesinger, Boris Flach

Abstract: The importance of Variational Autoencoders reaches far beyond standalone generative models -- the approach is also used for learning latent representations and can be generalized to semi-supervised learning. This requires a thorough analysis of their commonly known shortcomings: posterior collapse and approximation errors. This paper analyzes VAE approximation errors caused by the combination of the ELBO objective and encoder models from conditional exponential families, including, but not limited to, commonly used conditionally independent discrete and continuous models. We characterize subclasses of generative models consistent with these encoder families. We show that the ELBO optimizer is pulled away from the likelihood optimizer towards the consistent subset and study this effect experimentally. Importantly, this subset can not be enlarged, and the respective error cannot be decreased, by considering deeper encoder/decoder networks.

Submitted 11 April, 2022; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: ICLR 2022 spotlight

arXiv:2006.06880 [pdf, other]

Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks

Authors: Alexander Shekhovtsov, Viktor Yanush

Abstract: Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights. Many successful experimental results have been achieved with empirical straight-through (ST) approaches, proposing a variety of ad-hoc rules for propagating gradients through non-differentiable activations and updating discrete weights. At the same time, ST methods can be truly derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights. We advance these derivations to a more complete and systematic study. We analyze properties, estimation accuracy, obtain different forms of correct ST estimators for activations and weights, explain existing empirical approaches and their shortcomings, explain how latent weights arise from the mirror descent method when optimizing over probabilities. This allows to reintroduce ST methods, long known empirically, as sound approximations, apply them with clarity and develop further improvements.

Submitted 19 October, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 33 pages, DAGM 2021 version (presented, to be published)

arXiv:2006.03143 [pdf, other]

Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks

Authors: Alexander Shekhovtsov, Viktor Yanush, Boris Flach

Abstract: In neural networks with binary activations and/or binary weights the training by gradient descent is complicated as the model has piecewise constant response. We consider stochastic binary networks, obtained by adding noises in front of activations. The expected model response becomes a smooth function of parameters, its gradient is well defined but it is challenging to estimate it accurately. We propose a new method for this estimation problem combining sampling and analytic approximation steps. The method has a significantly reduced variance at the price of a small bias which gives a very practical tradeoff in comparison with existing unbiased and biased estimators. We further show that one extra linearization step leads to a deep straight-through estimator previously known only as an ad-hoc heuristic. We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models with both proposed methods.

Submitted 4 November, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: NeurIPS 2020

arXiv:2004.08227 [pdf, other]

MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models

Authors: Siddharth Tourani, Alexander Shekhovtsov, Carsten Rother, Bogdan Savchynskyy

Abstract: Dense, discrete Graphical Models with pairwise potentials are a powerful class of models which are employed in state-of-the-art computer vision and bio-imaging applications. This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, by making a small change to the low-performing solver, the Max Product Linear Programming (MPLP) algorithm, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin, including the state-of-the-art solver Tree-Reweighted Sequential (TRWS) message-passing algorithm. Additionally, our solver is highly parallel, in contrast to TRWS, which gives a further boost in performance with the proposed GPU and multi-thread CPU implementations. We verify the superiority of our algorithm on dense problems from publicly available benchmarks, as well as a new benchmark for 6D Object Pose estimation. We also provide an ablation study with respect to graph density.

Submitted 16 April, 2020; originally announced April 2020.

Comments: Accepted in ECCV-2018

arXiv:2004.07715 [pdf, other]

Taxonomy of Dual Block-Coordinate Ascent Methods for Discrete Energy Minimization

Authors: Siddharth Tourani, Alexander Shekhovtsov, Carsten Rother, Bogdan Savchynskyy

Abstract: We consider the maximum-a-posteriori inference problem in discrete graphical models and study solvers based on the dual block-coordinate ascent rule. We map all existing solvers in a single framework, allowing for a better understanding of their design principles. We theoretically show that some block-optimizing updates are sub-optimal and how to strictly improve them. On a wide range of problem instances of varying graph connectivity, we study the performance of existing solvers as well as new variants that can be obtained within the framework. As a result of this exploration we build a new state-of-the-art solver, performing uniformly better on the whole range of test instances.

Submitted 16 April, 2020; originally announced April 2020.

Comments: Accepted in AISTATS 2020

arXiv:2003.06258 [pdf, other]

Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems

Authors: Patrick Knöbelreiter, Christian Sormann, Alexander Shekhovtsov, Friedrich Fraundorfer, Thomas Pock

Abstract: It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.

Submitted 13 March, 2020; originally announced March 2020.

Comments: CVPR 2020

arXiv:1907.00845 [pdf, other]

Graph-based Nearest Neighbor Search: From Practice to Theory

Authors: Liudmila Prokhorenkova, Aleksandr Shekhovtsov

Abstract: Graph-based approaches are empirically shown to be very successful for the nearest neighbor search (NNS). However, there has been very little research on their theoretical guarantees. We fill this gap and rigorously analyze the performance of graph-based NNS algorithms, specifically focusing on the low-dimensional ($d \ll \log n$) regime. In addition to the basic greedy algorithm on nearest neighbor graphs, we also analyze the most successful heuristics commonly used in practice: speeding up via adding shortcut edges and improving accuracy via maintaining a dynamic list of candidates. We believe that our theoretical insights supported by experimental analysis are an important step towards understanding the limits and benefits of graph-based NNS algorithms.

Submitted 20 August, 2020; v1 submitted 1 July, 2019; originally announced July 2019.

arXiv:1811.00639 [pdf, other]

Stochastic Normalizations as Bayesian Learning

Authors: Alexander Shekhovtsov, Boris Flach

Abstract: In this work we investigate the reasons why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, distinguishing it from data-independent normalization methods, is randomness of batch statistics. This randomness appears in the parameters rather than in activations and admits an interpretation as a practical Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size. We show that their generalization performance can be improved significantly by Bayesian learning of the same form. We obtain test performance comparable to BN and, at the same time, better validation losses suitable for subsequent output uncertainty estimation through approximate Bayesian posterior.

Submitted 1 November, 2018; originally announced November 2018.

Comments: Accepted to ACCV 2018

arXiv:1803.10590 [pdf, other]

Feed-forward Uncertainty Propagation in Belief and Neural Networks

Authors: Alexander Shekhovtsov, Boris Flach, Michal Busta

Abstract: We propose a feed-forward inference method applicable to belief and neural networks. In a belief network, the method estimates an approximate factorized posterior of all hidden units given the input. In neural networks the method propagates uncertainty of the input through all the layers. In neural networks with injected noise, the method analytically takes into account uncertainties resulting from this noise. Such feed-forward analytic propagation is differentiable in parameters and can be trained end-to-end. Compared to standard NN, which can be viewed as propagating only the means, we propagate the mean and variance. The method can be useful in all scenarios that require knowledge of the neuron statistics, e.g. when dealing with uncertain inputs, considering sigmoid activations as probabilities of Bernoulli units, training the models regularized by injected noise (dropout) or estimating activation statistics over the dataset (as needed for normalization methods). In the experiments we show the possible utility of the method in all these tasks as well as its current limitations.

Submitted 1 November, 2018; v1 submitted 28 March, 2018; originally announced March 2018.

Comments: error corrections
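
The idea of propagating a mean and a variance rather than a single value can be illustrated on the simplest building block, a linear layer. The sketch below is not the authors' method or code: it covers only the linear case, and it assumes the inputs are uncorrelated so that variances add. The function name and the example weights are hypothetical.

```python
def propagate_linear(weights, bias, mean, var):
    """Propagate per-unit mean and variance through y = W x + b.

    Assumes the components of x are uncorrelated, so that
    E[y_i] = sum_j W_ij * E[x_j] + b_i  and  Var[y_i] = sum_j W_ij^2 * Var[x_j].
    """
    out_mean = [
        sum(w * m for w, m in zip(row, mean)) + b
        for row, b in zip(weights, bias)
    ]
    out_var = [sum(w * w * v for w, v in zip(row, var)) for row in weights]
    return out_mean, out_var


# Hypothetical 2x2 layer with uncertain inputs.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, 1.0]
out_mean, out_var = propagate_linear(W, b, mean=[1.0, 2.0], var=[0.25, 1.0])
print(out_mean)  # [-3.0, 2.5]
print(out_var)   # [4.25, 0.3125]
```

Setting every input variance to zero recovers the ordinary forward pass, which matches the abstract's remark that a standard network can be viewed as propagating only the means.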

arXiv:1803.10560 [pdf, other]

Normalization of Neural Networks using Analytic Variance Propagation

Authors: Alexander Shekhovtsov, Boris Flach

Abstract: We address the problem of estimating statistics of hidden units in a neural network using a method of analytic moment propagation. These statistics are useful for approximate whitening of the inputs in front of saturating non-linearities such as a sigmoid function. This is important for initialization of training and for reducing the accumulated scale and bias dependencies (compensating covariate shift), which presumably eases the learning. In batch normalization, which is currently a very widely applied technique, sample estimates of statistics of hidden units over a batch are used. The proposed estimation uses an analytic propagation of mean and variance of the training set through the network. The result depends on the network structure and its current weights but not on the specific batch input. The estimates are suitable for initialization and normalization, efficient to compute and independent of the batch size. The experimental verification well supports these claims. However, the method does not share the generalization properties of BN, to which our experiments give some additional insight.

Submitted 28 March, 2018; originally announced March 2018.

Journal ref: In Proceedings of Computer Vision Winter Workshop 2018

arXiv:1709.08524 [pdf, other]

Generative learning for deep networks

Authors: Boris Flach, Alexander Shekhovtsov, Ondrej Fikar

Abstract: Learning, taking into account full distribution of the data, referred to as generative, is not feasible with deep neural networks (DNNs) because they model only the conditional distribution of the outputs given the inputs. Current solutions are either based on joint probability models facing difficult estimation problems or learn two separate networks, mapping inputs to outputs (recognition) and vice-versa (generation). We propose an intermediate approach. First, we show that forward computation in DNNs with logistic sigmoid activations corresponds to a simplified approximate Bayesian inference in a directed probabilistic multi-layer model. This connection allows to interpret DNN as a probabilistic model of the output and all hidden units given the input. Second, we propose that in order for the recognition and generation networks to be more consistent with the joint model of the data, weights of the recognition and generator network should be related by transposition. We demonstrate in a tentative experiment that such a coupled pair can be learned generatively, modelling the full distribution of the data, and has enough capacity to perform well in both recognition and generation.

Submitted 25 September, 2017; originally announced September 2017.

Comments: submitted to AAAI

arXiv:1707.06427 [pdf, other]

Scalable Full Flow with Learned Binary Descriptors

Authors: Gottfried Munda, Alexander Shekhovtsov, Patrick Knöbelreiter, Thomas Pock

Abstract: We propose a method for large displacement optical flow in which local matching costs are learned by a convolutional neural network (CNN) and a smoothness prior is imposed by a conditional random field (CRF). We tackle the computation- and memory-intensive operations on the 4D cost volume by a min-projection which reduces memory complexity from quadratic to linear and binary descriptors for efficient matching. This enables evaluation of the cost on the fly and allows to perform learning and CRF inference on high resolution images without ever storing the 4D cost volume. To address the problem of learning binary descriptors we propose a new hybrid learning scheme. In contrast to current state of the art approaches for learning binary CNNs we can compute the exact non-zero gradient within our model. We compare several methods for training binary descriptors and show results on publicly available benchmarks.

Submitted 20 July, 2017; originally announced July 2017.

Comments: GCPR 2017

arXiv:1611.10229 [pdf, other]

End-to-End Training of Hybrid CNN-CRF Models for Stereo

Authors: Patrick Knöbelreiter, Christian Reinbacher, Alexander Shekhovtsov, Thomas Pock

Abstract: We propose a novel and principled hybrid CNN+CRF model for stereo estimation. Our model allows to exploit the advantages of both convolutional neural networks (CNNs) and conditional random fields (CRFs) in a unified approach. The CNNs compute expressive features for matching and distinctive color edges, which in turn are used to compute the unary and binary costs of the CRF. For inference, we apply a recently proposed highly parallel dual block descent algorithm which only needs a small fixed number of iterations to compute a high-quality approximate minimizer. As the main contribution of the paper, we propose a theoretically sound method based on the structured output support vector machine (SSVM) to train the hybrid CNN+CRF model on large-scale data end-to-end. Our trained models perform very well despite the fact that we are using shallow CNNs and do not apply any kind of post-processing to the final output of the CRF. We evaluate our combined models on challenging stereo benchmarks such as Middlebury 2014 and Kitti 2015 and also investigate the performance of each individual component.

Submitted 3 May, 2017; v1 submitted 30 November, 2016; originally announced November 2016.

Comments: To appear at CVPR 2017

arXiv:1607.08905 [pdf, other]

Complexity of Discrete Energy Minimization Problems

Authors: Mengtian Li, Alexander Shekhovtsov, Daniel Huber

Abstract: Discrete energy minimization is widely used in computer vision and machine learning for problems such as MAP inference in graphical models. The problem, in general, is notoriously intractable, and finding the global optimal solution is known to be NP-hard. However, is it possible to approximate this problem with a reasonable ratio bound on the solution quality in polynomial time? We show in this paper that the answer is no. Specifically, we show that general energy minimization, even in the 2-label pairwise case, and planar energy minimization with three or more labels are exp-APX-complete. This finding rules out the existence of any approximation algorithm with a sub-exponential approximation ratio in the input size for these two problems, including constant factor approximations. Moreover, we collect and review the computational complexity of several subclass problems and arrange them on a complexity scale consisting of three major complexity classes -- PO, APX, and exp-APX, corresponding to problems that are solvable, approximable, and inapproximable in polynomial time. Problems in the first two complexity classes can serve as alternative tractable formulations to the inapproximable ones. This paper can help vision researchers to select an appropriate model for an application or guide them in designing new algorithms.

Submitted 29 July, 2016; originally announced July 2016.

Comments: ECCV'16 accepted

arXiv:1606.07015 [pdf, other]

Joint M-Best-Diverse Labelings as a Parametric Submodular Minimization

Authors: Alexander Kirillov, Alexander Shekhovtsov, Carsten Rother, Bogdan Savchynskyy

Abstract: We consider the problem of jointly inferring the M-best diverse labelings for a binary (high-order) submodular energy of a graphical model. Recently, it was shown that this problem can be solved to a global optimum, for many practically interesting diversity measures. It was noted that the labelings are, so-called, nested. This nestedness property also holds for labelings of a class of parametric submodular minimization problems, where different values of the global parameter $\gamma$ give rise to different solutions. The popular example of the parametric submodular minimization is the monotonic parametric max-flow problem, which is also widely used for computing multiple labelings. As the main contribution of this work we establish a close relationship between diversity with submodular energies and the parametric submodular minimization. In particular, the joint M-best diverse labelings can be obtained by running a non-parametric submodular minimization (in the special case - max-flow) solver for M different values of $\gamma$ in parallel, for certain diversity measures. Importantly, the values for $\gamma$ can be computed in a closed form in advance, prior to any optimization. These theoretical results suggest two simple yet efficient algorithms for the joint M-best diverse problem, which outperform competitors in terms of runtime and quality of results. In particular, as we show in the paper, the new methods compute the exact M-best diverse labelings faster than a popular method of Batra et al., which in some sense only obtains approximate solutions.

Submitted 23 June, 2016; v1 submitted 22 June, 2016; originally announced June 2016.

arXiv:1601.06274 [pdf, other]

Solving Dense Image Matching in Real-Time using Discrete-Continuous Optimization

Authors: Alexander Shekhovtsov, Christian Reinbacher, Gottfried Graber, Thomas Pock

Abstract: Dense image matching is a fundamental low-level problem in Computer Vision, which has received tremendous attention from both discrete and continuous optimization communities. The goal of this paper is to combine the advantages of discrete and continuous optimization in a coherent framework. We devise a model based on energy minimization, to be optimized by both discrete and continuous algorithms in a consistent way. In the discrete setting, we propose a novel optimization algorithm that can be massively parallelized. In the continuous setting we tackle the problem of non-convex regularizers by a formulation based on differences of convex functions. The resulting hybrid discrete-continuous algorithm can be efficiently accelerated by modern GPUs and we demonstrate its real-time performance for the applications of dense stereo matching and optical flow.

Submitted 23 January, 2016; originally announced January 2016.

Comments: 21st Computer Vision Winter Workshop

arXiv:1508.07902 [pdf, other]

Maximum Persistency via Iterative Relaxed Inference with Graphical Models

Authors: Alexander Shekhovtsov, Paul Swoboda, Bogdan Savchynskyy

Abstract: We consider the NP-hard problem of MAP-inference for undirected discrete graphical models. We propose a polynomial time and practically efficient algorithm for finding a part of its optimal solution. Specifically, our algorithm marks some labels of the considered graphical model either as (i) optimal, meaning that they belong to all optimal solutions of the inference problem; (ii) non-optimal if they provably do not belong to any solution. With access to an exact solver of a linear programming relaxation to the MAP-inference problem, our algorithm marks the maximal possible (in a specified sense) number of labels. We also present a version of the algorithm, which has access to a suboptimal dual solver only and still can ensure the (non-)optimality for the marked labels, although the overall number of the marked labels may decrease. We propose an efficient implementation, which runs in time comparable to a single run of a suboptimal dual solver. Our method is well-scalable and shows state-of-the-art results on computational benchmarks from machine learning and computer vision.

Submitted 3 February, 2017; v1 submitted 31 August, 2015; originally announced August 2015.

Comments: Reworked version, submitted to PAMI

arXiv:1505.00571 [pdf, other]

Higher Order Maximum Persistency and Comparison Theorems

Authors: Alexander Shekhovtsov

Abstract: We address combinatorial problems that can be formulated as minimization of a partially separable function of discrete variables (energy minimization in graphical models, weighted constraint satisfaction, pseudo-Boolean optimization, 0-1 polynomial programming). For polyhedral relaxations of such problems it is generally not true that variables that are integer in the relaxed solution will retain the same values in the optimal discrete solution. Those which do are called persistent. Such persistent variables define a part of a globally optimal solution. Once identified, they can be excluded from the problem, reducing its size. To any polyhedral relaxation we associate a sufficient condition proving persistency of a subset of variables. We set up a specially constructed linear program which determines the set of persistent variables maximal with respect to the relaxation. The condition improves as the relaxation is tightened and possesses all its invariances. The proposed framework explains a variety of existing methods originating from different areas of research and based on different principles. A theoretical comparison is established that relates these methods to the standard linear relaxation and proves that the proposed technique identifies the same or a larger set of persistent variables.

Submitted 4 May, 2015; originally announced May 2015.

Comments: Submitted to CVIU Special Issue on Inference in Graphical Models

arXiv:1410.6641 [pdf, other]

Partial Optimality by Pruning for MAP-Inference with General Graphical Models

Authors: Paul Swoboda, Alexander Shekhovtsov, Jörg Hendrik Kappes, Christoph Schnörr, Bogdan Savchynskyy

Abstract: We consider the energy minimization problem for undirected graphical models, also known as MAP-inference problem for Markov random fields which is NP-hard in general. We propose a novel polynomial time algorithm to obtain a part of its optimal non-relaxed integral solution. Our algorithm is initialized with variables taking integral values in the solution of a convex relaxation of the MAP-inference problem and iteratively prunes those, which do not satisfy our criterion for partial optimality. We show that our pruning strategy is in a certain sense theoretically optimal. Also empirically our method outperforms previous approaches in terms of the number of persistently labelled variables. The method is very general, as it is applicable to models with arbitrary factors of an arbitrary order and can employ any solver for the considered relaxed problem. Our method's runtime is determined by the runtime of the convex relaxation solver for the MAP-inference problem.

Submitted 18 August, 2015; v1 submitted 24 October, 2014; originally announced October 2014.

Comments: 16 pages, 4 tables and 4 figures

arXiv:1404.3653 [pdf, other]

Maximum Persistency in Energy Minimization

Authors: Alexander Shekhovtsov

Abstract: We consider the discrete pairwise energy minimization problem (weighted constraint satisfaction, max-sum labeling) and methods that identify a globally optimal partial assignment of variables. When finding a complete optimal assignment is intractable, determining optimal values for a part of the variables is an interesting possibility. Existing methods are based on different sufficient conditions. We propose a new sufficient condition for partial optimality which is: (1) verifiable in polynomial time, (2) invariant to reparametrization of the problem and permutation of labels, and (3) includes many existing sufficient conditions as special cases. We pose the problem of finding the maximum optimal partial assignment identifiable by the new sufficient condition. A polynomial method is proposed which is guaranteed to assign the same or a larger part of the variables than several existing approaches. The core of the method is a specially constructed linear program that identifies persistent assignments in an arbitrary multi-label setting.

Submitted 16 June, 2014; v1 submitted 14 April, 2014; originally announced April 2014.

Comments: Extended technical report for the CVPR 2014 paper. Update: correction to the proof of characterization theorem

arXiv:1109.1480 [pdf, other]

Curvature Prior for MRF-based Segmentation and Shape Inpainting

Authors: Alexander Shekhovtsov, Pushmeet Kohli, Carsten Rother

Abstract: Most image labeling problems such as segmentation and image reconstruction are fundamentally ill-posed and suffer from ambiguities and noise. Higher order image priors encode high level structural dependencies between pixels and are key to overcoming these problems. However, these priors in general lead to computationally intractable models. This paper addresses the problem of discovering compact representations of higher order priors which allow efficient inference. We propose a framework for solving this problem which uses a recently proposed representation of higher order functions where they are encoded as lower envelopes of linear functions. Maximum a posteriori inference on our learned models reduces to minimizing a pairwise function of discrete variables, which can be done approximately using standard methods. Although this is a primarily theoretical paper, we also demonstrate the practical effectiveness of our framework on the problem of learning a shape prior for image segmentation and reconstruction. We show that our framework can learn a compact representation that approximates a prior that encourages low curvature shapes. We evaluate the approximation accuracy, discuss properties of the trained model, and show various results for shape inpainting and image segmentation.

Submitted 7 September, 2011; originally announced September 2011.

Comments: 17 pages, 16 figures

Report number: CTU--CMP--2011--11

arXiv:1109.1149 [pdf, ps, other]

On Partial Optimality by Auxiliary Submodular Problems

Authors: Alexander Shekhovtsov, Vaclav Hlavac

Abstract: In this work, we prove several relations between three different energy minimization techniques: recently proposed methods for determining a provably optimal partial assignment of variables by Ivan Kovtun (IK), the linear programming relaxation approach (LP), and the popular expansion move algorithm by Yuri Boykov. We propose a novel sufficient condition of optimal partial assignment, which is based on LP relaxation and called LP-autarky. We show that the methods of Kovtun, which build auxiliary submodular problems, fulfill this sufficient condition. The following link is thus established: LP relaxation cannot be tightened by IK. For non-submodular problems this is a non-trivial result. In the case of two labels, LP relaxation provides an optimal partial assignment, known as persistency, which, as we show, dominates IK. Relating IK with expansion move, we show that the set of fixed points of expansion move with any "truncation" rule for the initial problem and for the problem restricted by the one-vs-all method of IK would coincide, i.e. expansion move cannot be improved by this method. In the case of two labels, expansion move with a particular truncation rule coincides with the one-vs-all method.

Submitted 6 September, 2011; originally announced September 2011.

Comments: 9 pages, 0 figures; Control Systems and Computers #2/2011, Special issue: "Optimal Labeling Problem in Structural Pattern Recognition", pp. 71-78, issn 0130-5395

arXiv:1109.1146 [pdf, other]

A Distributed Mincut/Maxflow Algorithm Combining Path Augmentation and Push-Relabel

Authors: Alexander Shekhovtsov, Vaclav Hlavac

Abstract: We develop a novel distributed algorithm for the minimum cut problem. We primarily aim at solving large sparse problems. Assuming vertices of the graph are partitioned into several regions, the algorithm performs path augmentations inside the regions and updates of the push-relabel style between the regions. The interaction between regions is considered expensive (regions are loaded into the memory one-by-one or located on separate machines in a network). The algorithm works in sweeps, i.e. passes over all regions. Let $B$ be the set of vertices incident to inter-region edges of the graph. We present sequential and parallel versions of the algorithm which terminate in at most $2|B|^2+1$ sweeps. The competing algorithm by Delong and Boykov uses push-relabel updates inside regions. In the case of a fixed partition we prove that this algorithm has a tight $O(n^2)$ bound on the number of sweeps, where $n$ is the number of vertices. We tested sequential versions of the algorithms on instances of maxflow problems in computer vision. Experimentally, the number of sweeps required by the new algorithm is much lower than for Delong and Boykov's variant. Large problems (up to $10^8$ vertices and $6\cdot 10^8$ edges) are solved using under 1GB of memory in about 10 sweeps.

Submitted 6 September, 2011; originally announced September 2011.

Comments: 40 pages, 15 figures

Report number: K333-43/11, CTU-CMP-2011-03

Showing 1–27 of 27 results for author: Shekhovtsov, A