-
Collision Cross-entropy for Soft Class Labels and Deep Clustering
Authors:
Zhongwen Zhang,
Yuri Boykov
Abstract:
We propose "collision cross-entropy" as a robust alternative to Shannon's cross-entropy (CE) loss when class labels are represented by soft categorical distributions y. In general, soft labels can naturally represent ambiguous targets in classification. They are particularly relevant for self-labeled clustering methods, where latent pseudo-labels are jointly estimated with the model parameters and…
▽ More
We propose "collision cross-entropy" as a robust alternative to Shannon's cross-entropy (CE) loss when class labels are represented by soft categorical distributions y. In general, soft labels can naturally represent ambiguous targets in classification. They are particularly relevant for self-labeled clustering methods, where latent pseudo-labels are jointly estimated with the model parameters and uncertainty is prevalent. In case of soft labels, Shannon's CE teaches the model predictions to reproduce the uncertainty in each training example, which inhibits the model's ability to learn and generalize from these examples. As an alternative loss, we propose the negative log of "collision probability" that maximizes the chance of equality between two random variables, predicted class and unknown true class. We show that it has the properties of a generalized CE. The proposed collision CE agrees with Shannon's CE for one-hot labels, but the training from soft labels differs. For example, unlike Shannon's CE, data points where y is a uniform distribution have zero contribution to the training. Collision CE significantly improves classification supervised by soft uncertain targets. Unlike Shannon's, collision CE is symmetric for y and network predictions, which is particularly relevant when both distributions are estimated in the context of self-labeled clustering. Focusing on discriminative deep clustering where self-labeling and entropy-based losses are dominant, we show that the use of collision CE improves the state-of-the-art. We also derive an efficient EM algorithm that significantly speeds up the pseudo-label estimation with collision CE.
△ Less
Submitted 27 November, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Discriminative Entropy Clustering and its Relation to K-means and SVM
Authors:
Zhongwen Zhang,
Yuri Boykov
Abstract:
Maximization of mutual information between the model's input and output is formally related to "decisiveness" and "fairness" of the softmax predictions, motivating these unsupervised entropy-based criteria for clustering. First, in the context of linear softmax models, we discuss some general properties of entropy-based clustering. Disproving some earlier claims, we point out fundamental differenc…
▽ More
Maximization of mutual information between the model's input and output is formally related to "decisiveness" and "fairness" of the softmax predictions, motivating these unsupervised entropy-based criteria for clustering. First, in the context of linear softmax models, we discuss some general properties of entropy-based clustering. Disproving some earlier claims, we point out fundamental differences with K-means. On the other hand, we prove the margin maximizing property for decisiveness establishing a relation to SVM-based clustering. Second, we propose a new self-labeling formulation of entropy clustering for general softmax models. The pseudo-labels are introduced as auxiliary variables "splitting" the fairness and decisiveness. The derived self-labeling loss includes the reverse cross-entropy robust to pseudo-label errors and allows an efficient EM solver for pseudo-labels. Our algorithm improves the state of the art on several standard benchmarks for deep clustering.
△ Less
Submitted 24 May, 2024; v1 submitted 26 January, 2023;
originally announced January 2023.
-
Robust Trust Region for Weakly Supervised Segmentation
Authors:
Dmitrii Marin,
Yuri Boykov
Abstract:
Acquisition of training data for the standard semantic segmentation is expensive if requiring that each pixel is labeled. Yet, current methods significantly deteriorate in weakly supervised settings, e.g. where a fraction of pixels is labeled or when only image-level tags are available. It has been shown that regularized losses - originally developed for unsupervised low-level segmentation and rep…
▽ More
Acquisition of training data for the standard semantic segmentation is expensive if requiring that each pixel is labeled. Yet, current methods significantly deteriorate in weakly supervised settings, e.g. where a fraction of pixels is labeled or when only image-level tags are available. It has been shown that regularized losses - originally developed for unsupervised low-level segmentation and representing geometric priors on pixel labels - can considerably improve the quality of weakly supervised training. However, many common priors require optimization stronger than gradient descent. Thus, such regularizers have limited applicability in deep learning. We propose a new robust trust region approach for regularized losses improving the state-of-the-art results. Our approach can be seen as a higher-order generalization of the classic chain rule. It allows neural network optimization to use strong low-level solvers for the corresponding regularizers, including discrete ones.
△ Less
Submitted 1 September, 2021; v1 submitted 5 April, 2021;
originally announced April 2021.
-
Confluent Vessel Trees with Accurate Bifurcations
Authors:
Zhongwen Zhang,
Dmitrii Marin,
Maria Drangova,
Yuri Boykov
Abstract:
We are interested in unsupervised reconstruction of complex near-capillary vasculature with thousands of bifurcations where supervision and learning are infeasible. Unsupervised methods can use many structural constraints, e.g. topology, geometry, physics. Common techniques use variants of MST on geodesic tubular graphs minimizing symmetric pairwise costs, i.e. distances. We show limitations of su…
▽ More
We are interested in unsupervised reconstruction of complex near-capillary vasculature with thousands of bifurcations where supervision and learning are infeasible. Unsupervised methods can use many structural constraints, e.g. topology, geometry, physics. Common techniques use variants of MST on geodesic tubular graphs minimizing symmetric pairwise costs, i.e. distances. We show limitations of such standard undirected tubular graphs producing typical errors at bifurcations where flow "directedness" is critical. We introduce a new general concept of confluence for continuous oriented curves forming vessel trees and show how to enforce it on discrete tubular graphs. While confluence is a high-order property, we present an efficient practical algorithm for reconstructing confluent vessel trees using minimum arborescence on a directed graph enforcing confluence via simple flow-extrapolating arc construction. Empirical tests on large near-capillary sub-voxel vasculature volumes demonstrate significantly improved reconstruction accuracy at bifurcations. Our code has also been made publicly available.
△ Less
Submitted 26 March, 2021;
originally announced March 2021.
-
RePose: Learning Deep Kinematic Priors for Fast Human Pose Estimation
Authors:
Hossam Isack,
Christian Haene,
Cem Keskin,
Sofien Bouaziz,
Yuri Boykov,
Shahram Izadi,
Sameh Khamis
Abstract:
We propose a novel efficient and lightweight model for human pose estimation from a single image. Our model is designed to achieve competitive results at a fraction of the number of parameters and computational cost of various state-of-the-art methods. To this end, we explicitly incorporate part-based structural and geometric priors in a hierarchical prediction framework. At the coarsest resolutio…
▽ More
We propose a novel efficient and lightweight model for human pose estimation from a single image. Our model is designed to achieve competitive results at a fraction of the number of parameters and computational cost of various state-of-the-art methods. To this end, we explicitly incorporate part-based structural and geometric priors in a hierarchical prediction framework. At the coarsest resolution, and in a manner similar to classical part-based approaches, we leverage the kinematic structure of the human body to propagate convolutional feature updates between the keypoints or body parts. Unlike classical approaches, we adopt end-to-end training to learn this geometric prior through feature updates from data. We then propagate the feature representation at the coarsest resolution up the hierarchy to refine the predicted pose in a coarse-to-fine fashion. The final network effectively models the geometric prior and intuition within a lightweight deep neural network, yielding state-of-the-art results for a model of this size on two standard datasets, Leeds Sports Pose and MPII Human Pose.
△ Less
Submitted 10 February, 2020;
originally announced February 2020.
-
Image Segmentation Using Deep Learning: A Survey
Authors:
Shervin Minaee,
Yuri Boykov,
Fatih Porikli,
Antonio Plaza,
Nasser Kehtarnavaz,
Demetri Terzopoulos
Abstract:
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of v…
▽ More
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at develo** image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.
△ Less
Submitted 14 November, 2020; v1 submitted 15 January, 2020;
originally announced January 2020.
-
Efficient Segmentation: Learning Downsampling Near Semantic Boundaries
Authors:
Dmitrii Marin,
Zijian He,
Peter Vajda,
Priyam Chatterjee,
Sam Tsai,
Fei Yang,
Yuri Boykov
Abstract:
Many automated processes such as auto-piloting rely on a good semantic segmentation as a critical component. To speed up performance, it is common to downsample the input frame. However, this comes at the cost of missed small objects and reduced accuracy at semantic boundaries. To address this problem, we propose a new content-adaptive downsampling technique that learns to favor sampling locations…
▽ More
Many automated processes such as auto-piloting rely on a good semantic segmentation as a critical component. To speed up performance, it is common to downsample the input frame. However, this comes at the cost of missed small objects and reduced accuracy at semantic boundaries. To address this problem, we propose a new content-adaptive downsampling technique that learns to favor sampling locations near semantic boundaries of target classes. Cost-performance analysis shows that our method consistently outperforms the uniform sampling improving balance between accuracy and computational efficiency. Our adaptive sampling gives segmentation with better quality of boundaries and more reliable support for smaller-size objects.
△ Less
Submitted 16 July, 2019;
originally announced July 2019.
-
Divergence Prior and Vessel-tree Reconstruction
Authors:
Zhongwen Zhang,
Egor Chesakov,
Dmitrii Marin,
Yuri Boykov
Abstract:
We propose a new geometric regularization principle for reconstructing vector fields based on prior knowledge about their divergence. As one important example of this general idea, we focus on vector fields modelling blood flow pattern that should be divergent in arteries and convergent in veins. We show that this previously ignored regularization constraint can significantly improve the quality o…
▽ More
We propose a new geometric regularization principle for reconstructing vector fields based on prior knowledge about their divergence. As one important example of this general idea, we focus on vector fields modelling blood flow pattern that should be divergent in arteries and convergent in veins. We show that this previously ignored regularization constraint can significantly improve the quality of vessel tree reconstruction particularly around bifurcations where non-zero divergence is concentrated. Our divergence prior is critical for resolving (binary) sign ambiguity in flow orientations produced by standard vessel filters, e.g. Frangi. Our vessel tree centerline reconstruction combines divergence constraints with robust curvature regularization. Our unsupervised method can reconstruct complete vessel trees with near-capillary details on synthetic and real 3D volumes.
△ Less
Submitted 23 November, 2018;
originally announced November 2018.
-
Beyond Gradient Descent for Regularized Segmentation Losses
Authors:
Dmitrii Marin,
Meng Tang,
Ismail Ben Ayed,
Yuri Boykov
Abstract:
The simplicity of gradient descent (GD) made it the default method for training ever-deeper and complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In the context of weakly-supervised CNN segmentation, we demonstrate a well-motivated loss function where an alternative optimizer (ADM) achieves the state-of-the-ar…
▽ More
The simplicity of gradient descent (GD) made it the default method for training ever-deeper and complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In the context of weakly-supervised CNN segmentation, we demonstrate a well-motivated loss function where an alternative optimizer (ADM) achieves the state-of-the-art while GD performs poorly. Interestingly, GD obtains its best result for a "smoother" tuning of the loss function. The results are consistent across different network architectures. Our loss is motivated by well-understood MRF/CRF regularization models in "shallow" segmentation and their known global solvers. Our work suggests that network design/training should pay more attention to optimization methods.
△ Less
Submitted 29 April, 2019; v1 submitted 7 September, 2018;
originally announced September 2018.
-
Constrained-CNN losses for weakly supervised segmentation
Authors:
Hoel Kervadec,
Jose Dolz,
Meng Tang,
Eric Granger,
Yuri Boykov,
Ismail Ben Ayed
Abstract:
Weakly-supervised learning based on, e.g., partially labelled images or image-tags, is currently attracting significant attention in CNN segmentation as it can mitigate the need for full and laborious pixel/voxel annotations. Enforcing high-order (global) inequality constraints on the network output (for instance, to constrain the size of the target region) can leverage unlabeled data, guiding the…
▽ More
Weakly-supervised learning based on, e.g., partially labelled images or image-tags, is currently attracting significant attention in CNN segmentation as it can mitigate the need for full and laborious pixel/voxel annotations. Enforcing high-order (global) inequality constraints on the network output (for instance, to constrain the size of the target region) can leverage unlabeled data, guiding the training process with domain-specific knowledge. Inequality constraints are very flexible because they do not assume exact prior knowledge. However, constrained Lagrangian dual optimization has been largely avoided in deep networks, mainly for computational tractability reasons. To the best of our knowledge, the method of [Pathak et al., 2015] is the only prior work that addresses deep CNNs with linear constraints in weakly supervised segmentation. It uses the constraints to synthesize fully-labeled training masks (proposals) from weak labels, mimicking full supervision and facilitating dual optimization. We propose to introduce a differentiable penalty, which enforces inequality constraints directly in the loss function, avoiding expensive Lagrangian dual iterates and proposal generation. From constrained-optimization perspective, our simple penalty-based approach is not optimal as there is no guarantee that the constraints are satisfied. However, surprisingly, it yields substantially better results than the Lagrangian-based constrained CNNs in [Pathak et al., 2015], while reducing the computational demand for training. By annotating only a small fraction of the pixels, the proposed approach can reach a level of segmentation performance that is comparable to full supervision on three separate tasks. While our experiments focused on basic linear constraints such as the target-region size and image tags, our framework can be easily extended to other non-linear constraints.
△ Less
Submitted 8 February, 2019; v1 submitted 11 May, 2018;
originally announced May 2018.
-
Normalized Cut Loss for Weakly-supervised CNN Segmentation
Authors:
Meng Tang,
Abdelaziz Djelouah,
Federico Perazzi,
Yuri Boykov,
Christopher Schroers
Abstract:
Most recent semantic segmentation methods train deep convolutional neural networks with fully annotated masks requiring pixel-accuracy for good quality training. Common weakly-supervised approaches generate full masks from partial input (e.g. scribbles or seeds) using standard interactive segmentation methods as preprocessing. But, errors in such masks result in poorer training since standard loss…
▽ More
Most recent semantic segmentation methods train deep convolutional neural networks with fully annotated masks requiring pixel-accuracy for good quality training. Common weakly-supervised approaches generate full masks from partial input (e.g. scribbles or seeds) using standard interactive segmentation methods as preprocessing. But, errors in such masks result in poorer training since standard loss functions (e.g. cross-entropy) do not distinguish seeds from potentially mislabeled other pixels. Inspired by the general ideas in semi-supervised learning, we address these problems via a new principled loss function evaluating network output with criteria standard in "shallow" segmentation, e.g. normalized cut. Unlike prior work, the cross entropy part of our loss evaluates only seeds where labels are known while normalized cut softly evaluates consistency of all pixels. We focus on normalized cut loss where dense Gaussian kernel is efficiently implemented in linear time by fast Bilateral filtering. Our normalized cut loss approach to segmentation brings the quality of weakly-supervised training significantly closer to fully supervised methods.
△ Less
Submitted 4 April, 2018;
originally announced April 2018.
-
On Regularized Losses for Weakly-supervised CNN Segmentation
Authors:
Meng Tang,
Federico Perazzi,
Abdelaziz Djelouah,
Ismail Ben Ayed,
Christopher Schroers,
Yuri Boykov
Abstract:
Minimization of regularized losses is a principled approach to weak supervision well-established in deep learning, in general. However, it is largely overlooked in semantic segmentation currently dominated by methods mimicking full supervision via "fake" fully-labeled training masks (proposals) generated from available partial input. To obtain such full masks the typical methods explicitly use sta…
▽ More
Minimization of regularized losses is a principled approach to weak supervision well-established in deep learning, in general. However, it is largely overlooked in semantic segmentation currently dominated by methods mimicking full supervision via "fake" fully-labeled training masks (proposals) generated from available partial input. To obtain such full masks the typical methods explicitly use standard regularization techniques for "shallow" segmentation, e.g. graph cuts or dense CRFs. In contrast, we integrate such standard regularizers directly into the loss functions over partial input. This approach simplifies weakly-supervised training by avoiding extra MRF/CRF inference steps or layers explicitly generating full masks, while improving both the quality and efficiency of training. This paper proposes and experimentally compares different losses integrating MRF/CRF regularization terms. We juxtapose our regularized losses with earlier proposal-generation methods using explicit regularization steps or layers. Our approach achieves state-of-the-art accuracy in semantic segmentation with near full-supervision quality.
△ Less
Submitted 10 April, 2018; v1 submitted 26 March, 2018;
originally announced March 2018.
-
Kernel clustering: density biases and solutions
Authors:
Dmitrii Marin,
Meng Tang,
Ismail Ben Ayed,
Yuri Boykov
Abstract:
Kernel methods are popular in clustering due to their generality and discriminating power. However, we show that many kernel clustering criteria have density biases theoretically explaining some practically significant artifacts empirically observed in the past. For example, we provide conditions and formally prove the density mode isolation bias in kernel K-means for a common class of kernels. We…
▽ More
Kernel methods are popular in clustering due to their generality and discriminating power. However, we show that many kernel clustering criteria have density biases theoretically explaining some practically significant artifacts empirically observed in the past. For example, we provide conditions and formally prove the density mode isolation bias in kernel K-means for a common class of kernels. We call it Breiman's bias due to its similarity to the histogram mode isolation previously discovered by Breiman in decision tree learning with Gini impurity. We also extend our analysis to other popular kernel clustering methods, e.g. average/normalized cut or dominant sets, where density biases can take different forms. For example, splitting isolated points by cut-based criteria is essentially the sparsest subset bias, which is the opposite of the density mode bias. Our findings suggest that a principled solution for density biases in kernel clustering should directly address data inhomogeneity. We show that density equalization can be implicitly achieved using either locally adaptive weights or locally adaptive kernels. Moreover, density equalization makes many popular kernel clustering objectives equivalent. Our synthetic and real data experiments illustrate density biases and proposed solutions. We anticipate that theoretical understanding of kernel clustering limitations and their principled solutions will be important for a broad spectrum of data analysis applications across the disciplines.
△ Less
Submitted 6 December, 2017; v1 submitted 16 May, 2017;
originally announced May 2017.
-
Efficient optimization for Hierarchically-structured Interacting Segments (HINTS)
Authors:
Hossam Isack,
Olga Veksler,
Ipek Oguz,
Milan Sonka,
Yuri Boykov
Abstract:
We propose an effective optimization algorithm for a general hierarchical segmentation model with geometric interactions between segments. Any given tree can specify a partial order over object labels defining a hierarchy. It is well-established that segment interactions, such as inclusion/exclusion and margin constraints, make the model significantly more discriminant. However, existing optimizat…
▽ More
We propose an effective optimization algorithm for a general hierarchical segmentation model with geometric interactions between segments. Any given tree can specify a partial order over object labels defining a hierarchy. It is well-established that segment interactions, such as inclusion/exclusion and margin constraints, make the model significantly more discriminant. However, existing optimization methods do not allow full use of such models. Generic -expansion results in weak local minima, while common binary multi-layered formulations lead to non-submodularity, complex high-order potentials, or polar domain unwrap** and shape biases. In practice, applying these methods to arbitrary trees does not work except for simple cases. Our main contribution is an optimization method for the Hierarchically-structured Interacting Segments (HINTS) model with arbitrary trees. Our Path-Moves algorithm is based on multi-label MRF formulation and can be seen as a combination of well-known a-expansion and Ishikawa techniques. We show state-of-the-art biomedical segmentation for many diverse examples of complex trees.
△ Less
Submitted 30 March, 2017;
originally announced March 2017.
-
A-expansion for multiple "hedgehog" shapes
Authors:
Hossam Isack,
Yuri Boykov,
Olga Veksler
Abstract:
Overlap** colors and cluttered or weak edges are common segmentation problems requiring additional regularization. For example, star-convexity is popular for interactive single object segmentation due to simplicity and amenability to exact graph cut optimization. This paper proposes an approach to multiobject segmentation where objects could be restricted to separate "hedgehog" shapes. We show t…
▽ More
Overlap** colors and cluttered or weak edges are common segmentation problems requiring additional regularization. For example, star-convexity is popular for interactive single object segmentation due to simplicity and amenability to exact graph cut optimization. This paper proposes an approach to multiobject segmentation where objects could be restricted to separate "hedgehog" shapes. We show that a-expansion moves are submodular for our multi-shape constraints. Each "hedgehog" shape has its surface normals constrained by some vector field, e.g. gradients of a distance transform for user scribbles. Tight constraint give an extreme case of a shape prior enforcing skeleton consistency with the scribbles. Wider cones of allowed normals gives more relaxed hedgehog shapes. A single click and +/-90 degrees normal orientation constraints reduce our hedgehog prior to star-convexity. If all hedgehogs come from single clicks then our approach defines multi-star prior. Our general method has significantly more applications than standard one-star segmentation. For example, in medical data we can separate multiple non-star organs with similar appearances and weak or noisy edges.
△ Less
Submitted 2 February, 2016;
originally announced February 2016.
-
Kernel Cuts: MRF meets Kernel & Spectral Clustering
Authors:
Meng Tang,
Dmitrii Marin,
Ismail Ben Ayed,
Yuri Boykov
Abstract:
We propose a new segmentation model combining common regularization energies, e.g. Markov Random Field (MRF) potentials, and standard pairwise clustering criteria like Normalized Cut (NC), average association (AA), etc. These clustering and regularization models are widely used in machine learning and computer vision, but they were not combined before due to significant differences in the correspo…
▽ More
We propose a new segmentation model combining common regularization energies, e.g. Markov Random Field (MRF) potentials, and standard pairwise clustering criteria like Normalized Cut (NC), average association (AA), etc. These clustering and regularization models are widely used in machine learning and computer vision, but they were not combined before due to significant differences in the corresponding optimization, e.g. spectral relaxation and combinatorial max-flow techniques. On the one hand, we show that many common applications using MRF segmentation energies can benefit from a high-order NC term, e.g. enforcing balanced clustering of arbitrary high-dimensional image features combining color, texture, location, depth, motion, etc. On the other hand, standard clustering applications can benefit from an inclusion of common pairwise or higher-order MRF constraints, e.g. edge alignment, bin-consistency, label cost, etc. To address joint energies like NC+MRF, we propose efficient Kernel Cut algorithms based on bound optimization. While focusing on graph cut and move-making techniques, our new unary (linear) kernel and spectral bound formulations for common pairwise clustering criteria allow to integrate them with any regularization functionals with existing discrete or continuous solvers.
△ Less
Submitted 20 September, 2016; v1 submitted 24 June, 2015;
originally announced June 2015.
-
Thin Structure Estimation with Curvature Regularization
Authors:
Dmitrii Marin,
Yuri Boykov,
Yuchen Zhong
Abstract:
Many applications in vision require estimation of thin structures such as boundary edges, surfaces, roads, blood vessels, neurons, etc. Unlike most previous approaches, we simultaneously detect and delineate thin structures with sub-pixel localization and real-valued orientation estimation. This is an ill-posed problem that requires regularization. We propose an objective function combining detect…
▽ More
Many applications in vision require estimation of thin structures such as boundary edges, surfaces, roads, blood vessels, neurons, etc. Unlike most previous approaches, we simultaneously detect and delineate thin structures with sub-pixel localization and real-valued orientation estimation. This is an ill-posed problem that requires regularization. We propose an objective function combining detection likelihoods with a prior minimizing curvature of the center-lines or surfaces. Unlike simple block-coordinate descent, we develop a novel algorithm that is able to perform joint optimization of location and detection variables more effectively. Our lower bound optimization algorithm applies to quadratic or absolute curvature. The proposed early vision framework is sufficiently general and it can be used in many higher-level applications. We illustrate the advantage of our approach on a range of 2D and 3D examples.
△ Less
Submitted 16 September, 2015; v1 submitted 15 June, 2015;
originally announced June 2015.
-
Volumetric Bias in Segmentation and Reconstruction: Secrets and Solutions
Authors:
Yuri Boykov,
Hossam Isack,
Carl Olsson,
Ismail Ben Ayed
Abstract:
Many standard optimization methods for segmentation and reconstruction compute ML model estimates for appearance or geometry of segments, e.g. Zhu-Yuille 1996, Torr 1998, Chan-Vese 2001, GrabCut 2004, Delong et al. 2012. We observe that the standard likelihood term in these formulations corresponds to a generalized probabilistic K-means energy. In learning it is well known that this energy has a s…
▽ More
Many standard optimization methods for segmentation and reconstruction compute ML model estimates for appearance or geometry of segments, e.g. Zhu-Yuille 1996, Torr 1998, Chan-Vese 2001, GrabCut 2004, Delong et al. 2012. We observe that the standard likelihood term in these formulations corresponds to a generalized probabilistic K-means energy. In learning it is well known that this energy has a strong bias to clusters of equal size, which can be expressed as a penalty for KL divergence from a uniform distribution of cardinalities. However, this volumetric bias has been mostly ignored in computer vision. We demonstrate significant artifacts in standard segmentation and reconstruction methods due to this bias. Moreover, we propose binary and multi-label optimization techniques that either (a) remove this bias or (b) replace it by a KL divergence term for any given target volume distribution. Our general ideas apply to many continuous or discrete energy formulations in segmentation, stereo, and other reconstruction problems.
△ Less
Submitted 1 May, 2015;
originally announced May 2015.
-
Joint Energy-based Detection and Classificationon of Multilingual Text Lines
Authors:
Igor Milevskiy,
Yuri Boykov
Abstract:
This paper proposes a new hierarchical MDL-based model for a joint detection and classification of multilingual text lines in im- ages taken by hand-held cameras. The majority of related text detec- tion methods assume alphabet-based writing in a single language, e.g. in Latin. They use simple clustering heuristics specific to such texts: prox- imity between letters within one line, larger distanc…
▽ More
This paper proposes a new hierarchical MDL-based model for a joint detection and classification of multilingual text lines in im- ages taken by hand-held cameras. The majority of related text detec- tion methods assume alphabet-based writing in a single language, e.g. in Latin. They use simple clustering heuristics specific to such texts: prox- imity between letters within one line, larger distance between separate lines, etc. We are interested in a significantly more ambiguous problem where images combine alphabet and logographic characters from multiple languages and typographic rules vary a lot (e.g. English, Korean, and Chinese). Complexity of detecting and classifying text lines in multiple languages calls for a more principled approach based on information- theoretic principles. Our new MDL model includes data costs combining geometric errors with classification likelihoods and a hierarchical sparsity term based on label costs. This energy model can be efficiently minimized by fusion moves. We demonstrate robustness of the proposed algorithm on a large new database of multilingual text images collected in the pub- lic transit system of Seoul.
△ Less
Submitted 22 July, 2014;
originally announced July 2014.
-
An Experimental Comparison of Trust Region and Level Sets
Authors:
Lena Gorelick,
Ismail BenAyed,
Frank R. Schmidt,
Yuri Boykov
Abstract:
High-order (non-linear) functionals have become very popular in segmentation, stereo and other computer vision problems. Level sets is a well established general gradient descent framework, which is directly applicable to optimization of such functionals and widely used in practice. Recently, another general optimization approach based on trust region methodology was proposed for regional non-line…
▽ More
High-order (non-linear) functionals have become very popular in segmentation, stereo and other computer vision problems. Level sets is a well established general gradient descent framework, which is directly applicable to optimization of such functionals and widely used in practice. Recently, another general optimization approach based on trust region methodology was proposed for regional non-linear functionals. Our goal is a comprehensive experimental comparison of these two frameworks in regard to practical efficiency, robustness to parameters, and optimality. We experiment on a wide range of problems with non-linear constraints on segment volume, appearance and shape.
△ Less
Submitted 8 November, 2013;
originally announced November 2013.
-
Submodularization for Quadratic Pseudo-Boolean Optimization
Authors:
Lena Gorelick,
Yuri Boykov,
Olga Veksler,
Ismail Ben Ayed,
Andrew Delong
Abstract:
Many computer vision problems require optimization of binary non-submodular energies. We propose a general optimization framework based on local submodular approximations (LSA). Unlike standard LP relaxation methods that linearize the whole energy globally, our approach iteratively approximates the energies locally. On the other hand, unlike standard local optimization methods (e.g. gradient desce…
▽ More
Many computer vision problems require optimization of binary non-submodular energies. We propose a general optimization framework based on local submodular approximations (LSA). Unlike standard LP relaxation methods that linearize the whole energy globally, our approach iteratively approximates the energies locally. On the other hand, unlike standard local optimization methods (e.g. gradient descent or projection techniques) we use non-linear submodular approximations and optimize them without leaving the domain of integer solutions. We discuss two specific LSA algorithms based on "trust region" and "auxiliary function" principles, LSA-TR and LSA-AUX. These methods obtain state-of-the-art results on a wide range of applications outperforming many standard techniques such as LBP, QPBO, and TRWS. While our paper is focused on pairwise energies, our ideas extend to higher-order problems. The code is available online (http://vision.csd.uwo.ca/code/).
△ Less
Submitted 15 April, 2014; v1 submitted 7 November, 2013;
originally announced November 2013.
-
Efficient Regularization of Squared Curvature
Authors:
Claudia Nieuwenhuis,
Eno Toeppe,
Lena Gorelick,
Olga Veksler,
Yuri Boykov
Abstract:
Curvature has received increased attention as an important alternative to length based regularization in computer vision. In contrast to length, it preserves elongated structures and fine details. Existing approaches are either inefficient, or have low angular resolution and yield results with strong block artifacts. We derive a new model for computing squared curvature based on integral geometry.…
▽ More
Curvature has received increased attention as an important alternative to length based regularization in computer vision. In contrast to length, it preserves elongated structures and fine details. Existing approaches are either inefficient, or have low angular resolution and yield results with strong block artifacts. We derive a new model for computing squared curvature based on integral geometry. The model counts responses of straight line triple cliques. The corresponding energy decomposes into submodular and supermodular pairwise potentials. We show that this energy can be efficiently minimized even for high angular resolutions using the trust region framework. Our results confirm that we obtain accurate and visually pleasing solutions without strong artifacts at reasonable run times.
△ Less
Submitted 15 April, 2014; v1 submitted 7 November, 2013;
originally announced November 2013.
-
Joint optimization of fitting & matching in multi-view reconstruction
Authors:
Hossam Isack,
Yuri Boykov
Abstract:
Many standard approaches for geometric model fitting are based on pre-matched image features. Typically, such pre-matching uses only feature appearances (e.g. SIFT) and a large number of non-unique features must be discarded in order to control the false positive rate. In contrast, we solve feature matching and multi-model fitting problems in a joint optimization framework. This paper proposes sev…
▽ More
Many standard approaches for geometric model fitting are based on pre-matched image features. Typically, such pre-matching uses only feature appearances (e.g. SIFT) and a large number of non-unique features must be discarded in order to control the false positive rate. In contrast, we solve feature matching and multi-model fitting problems in a joint optimization framework. This paper proposes several fit-&-match energy formulations based on a generalization of the assignment problem. We developed an efficient solver based on min-cost-max-flow algorithm that finds near optimal solutions. Our approach significantly increases the number of detected matches. In practice, energy-based joint fitting & matching allows to increase the distance between view-points previously restricted by robustness of local SIFT-matching and to improve the model fitting accuracy when compared to state-of-the-art multi-model fitting techniques.
△ Less
Submitted 9 April, 2014; v1 submitted 11 March, 2013;
originally announced March 2013.
-
Simplifying Energy Optimization using Partial Enumeration
Authors:
Carl Olsson,
Johannes Ulen,
Yuri Boykov,
Vladimir Kolmogorov
Abstract:
Energies with high-order non-submodular interactions have been shown to be very useful in vision due to their high modeling power. Optimization of such energies, however, is generally NP-hard. A naive approach that works for small problem instances is exhaustive search, that is, enumeration of all possible labelings of the underlying graph. We propose a general minimization approach for large grap…
▽ More
Energies with high-order non-submodular interactions have been shown to be very useful in vision due to their high modeling power. Optimization of such energies, however, is generally NP-hard. A naive approach that works for small problem instances is exhaustive search, that is, enumeration of all possible labelings of the underlying graph. We propose a general minimization approach for large graphs based on enumeration of labelings of certain small patches. This partial enumeration technique reduces complex high-order energy formulations to pairwise Constraint Satisfaction Problems with unary costs (uCSP), which can be efficiently solved using standard methods like TRW-S. Our approach outperforms a number of existing state-of-the-art algorithms on well known difficult problems (e.g. curvature regularization, stereo, deconvolution); it gives near global minimum and better speed.
Our main application of interest is curvature regularization. In the context of segmentation, our partial enumeration technique allows to evaluate curvature directly on small patches using a novel integral geometry approach.
△ Less
Submitted 8 October, 2013; v1 submitted 7 March, 2013;
originally announced March 2013.