-
Are EEG Sequences Time Series? EEG Classification with Time Series Models and Joint Subject Training
Authors:
Johannes Burchert,
Thorben Werner,
Vijaya Krishna Yalavarthi,
Diego Coello de Portugal,
Maximilian Stubbemann,
Lars Schmidt-Thieme
Abstract:
As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to deal with such data as with any other time series data. For EEG classification many models have been developed with layer types and architectures we typically do not see in time series classification. Furthermore, typically separate models for each individual subject are learned, not one model for all of them. In this paper, we systematically study the differences between EEG classification models and generic time series classification models. We describe three different model setups to deal with EEG data from different subjects: subject-specific models (most EEG literature), subject-agnostic models and subject-conditional models. In experiments on three datasets, we demonstrate that off-the-shelf time series classification models trained per subject perform close to EEG classification models, but do not quite reach the performance of domain-specific modeling. Additionally, we combine time-series models with subject embeddings to train one joint subject-conditional classifier on all subjects. The resulting models are competitive with dedicated EEG models in 2 out of 3 datasets, even outperforming all EEG methods on one of them.
Submitted 10 April, 2024;
originally announced April 2024.
-
Convergence of Some Convex Message Passing Algorithms to a Fixed Point
Authors:
Vaclav Voracek,
Tomas Werner
Abstract:
A popular approach to the MAP inference problem in graphical models is to minimize an upper bound obtained from a dual linear programming or Lagrangian relaxation by (block-)coordinate descent. This is also known as convex/convergent message passing; examples are max-sum diffusion and sequential tree-reweighted message passing (TRW-S). Convergence properties of these methods are currently not fully understood. They have been proved to converge to the set characterized by local consistency of active constraints, with unknown convergence rate; however, it was not clear if the iterates converge at all (to any point). We prove a stronger result (conjectured before but never proved): the iterates converge to a fixed point of the method. Moreover, we show that the algorithm terminates within $\mathcal{O}(1/\varepsilon)$ iterations. We first prove this for a version of coordinate descent applied to a general piecewise-affine convex objective. Then we show that several convex message passing methods are special cases of this method. Finally, we show that a slightly different version of coordinate descent can cycle.
Submitted 5 June, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Human-machine social systems
Authors:
Milena Tsvetkova,
Taha Yasseri,
Niccolo Pescetelli,
Tobias Werner
Abstract:
From fake accounts on social media and generative-AI bots such as ChatGPT to high-frequency trading algorithms on financial markets and self-driving vehicles on the streets, robots, bots, and algorithms are proliferating and permeating our communication channels, social interactions, economic transactions, and transportation arteries. Networks of multiple interdependent and interacting humans and autonomous machines constitute complex adaptive social systems where the collective outcomes cannot be simply deduced from either human or machine behavior alone. Under this paradigm, we review recent experimental, theoretical, and observational research from across a range of disciplines - robotics, human-computer interaction, web science, complexity science, computational social science, finance, economics, political science, social psychology, and sociology. We identify general dynamics and patterns in situations of competition, coordination, cooperation, contagion, and collective decision-making, and contextualize them in four prominent existing human-machine communities: high-frequency trading markets, the social media platform formerly known as Twitter, the open-collaboration encyclopedia Wikipedia, and the news aggregation and discussion community Reddit. We conclude with suggestions for the research, design, and governance of human-machine social systems, which are necessary to reduce misinformation, prevent financial crashes, improve road safety, overcome labor market disruptions, and enable a better human future.
Submitted 22 February, 2024;
originally announced February 2024.
-
Towards Comparable Active Learning
Authors:
Thorben Werner,
Johannes Burchert,
Lars Schmidt-Thieme
Abstract:
Active Learning has received significant attention in the field of machine learning for its potential in selecting the most informative samples for labeling, thereby reducing data annotation costs. However, we show that the reported lifts in recent literature generalize poorly to other domains leading to an inconclusive landscape in Active Learning research. Furthermore, we highlight overlooked problems for reproducing AL experiments that can lead to unfair comparisons and increased variance in the results. This paper addresses these issues by providing an Active Learning framework for a fair comparison of algorithms across different tasks and domains, as well as a fast and performant oracle algorithm for evaluation. To the best of our knowledge, we propose the first AL benchmark that tests algorithms in 3 major domains: Tabular, Image, and Text. We report empirical results for 6 widely used algorithms on 7 real-world and 2 synthetic datasets and aggregate them into a domain-specific ranking of AL algorithms.
Submitted 30 November, 2023;
originally announced November 2023.
-
Towards Comparable Knowledge Distillation in Semantic Image Segmentation
Authors:
Onno Niemann,
Christopher Vox,
Thorben Werner
Abstract:
Knowledge Distillation (KD) is one proposed solution to large model sizes and slow inference speed in semantic segmentation. In our research we identify 25 proposed distillation loss terms from 14 publications in the last 4 years. Unfortunately, a comparison of terms based on published results is often impossible, because of differences in training configurations. A good illustration of this problem is the comparison of two publications from 2022. Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports an increase of student mIoU of 4.54 and a final performance of 29.19, while Adaptive Perspective Distillation (APD) only improves student performance by 2.06 percentage points, but achieves a final performance of 39.25. The reason for such extreme differences is often a suboptimal choice of hyperparameters and a resulting underperformance of the student model used as reference point. In our work, we reveal problems of insufficient hyperparameter tuning by showing that distillation improvements of two widely accepted frameworks, SKD and IFVD, vanish when hyperparameters are optimized sufficiently. To improve comparability of future research in the field, we establish a solid baseline for three datasets and two student models and provide extensive information on hyperparameter tuning. We find that only two out of eight techniques can compete with our simple baseline on the ADE20K dataset.
Submitted 7 September, 2023;
originally announced September 2023.
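The two results quoted in the abstract above imply very different reference students. A minimal sketch of that arithmetic follows, assuming both reported gains are absolute mIoU points added to the undistilled student; the dictionary layout and variable names are illustrative and not taken from the paper.

```python
from decimal import Decimal

# (reported gain, reported final student mIoU), as quoted in the abstract.
reported = {
    "SSTKD": (Decimal("4.54"), Decimal("29.19")),
    "APD": (Decimal("2.06"), Decimal("39.25")),
}

# Assumption: final = baseline + gain, so baseline = final - gain.
baselines = {name: final - gain for name, (gain, final) in reported.items()}
baseline_gap = baselines["APD"] - baselines["SSTKD"]

for name, value in baselines.items():
    print(f"{name}: implied undistilled student mIoU = {value}")
print(f"gap between the two reference students = {baseline_gap}")
```

Under that assumption the implied baselines are 24.65 and 37.19, a gap of 12.54 points, which is larger than either reported distillation gain.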
-
Global quantitative robustness of regression feed-forward neural networks
Authors:
Tino Werner
Abstract:
Neural networks are an indispensable model class for many complex learning tasks. Despite the popularity and importance of neural networks and the many established techniques from the literature for stabilizing and robustifying training, the classical concepts from robust statistics have rarely been considered so far in the context of neural networks. Therefore, we adapt the notion of the regression breakdown point to regression neural networks and compute the breakdown point for different feed-forward network configurations and contamination settings. In an extensive simulation study, we compare the performance, measured by the out-of-sample loss, by a proxy of the breakdown rate and by the training steps, of non-robust and robust regression feed-forward neural networks in a plethora of different configurations. The results indeed motivate the use of robust loss functions for neural network training.
Submitted 18 November, 2022;
originally announced November 2022.
-
When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development
Authors:
Nghia Duong-Trung,
Stefan Born,
Jong Woo Kim,
Marie-Therese Schermeyer,
Katharina Paulick,
Maxim Borisyak,
Mariano Nicolas Cruz-Bournazou,
Thorben Werner,
Randolf Scholz,
Lars Schmidt-Thieme,
Peter Neubauer,
Ernesto Martinez
Abstract:
Machine learning (ML) is becoming increasingly crucial in many fields of engineering but has not yet played out its full potential in bioprocess engineering. While experimentation has been accelerated by increasing levels of lab automation, experimental planning and data modeling still largely depend on human intervention. ML can be seen as a set of tools that contribute to the automation of the whole experimental cycle, including model building and practical planning, thus allowing human experts to focus on the more demanding and overarching cognitive tasks. First, probabilistic programming is used for the autonomous building of predictive models. Second, machine learning automatically assesses alternative decisions by planning experiments to test hypotheses and conducting investigations to gather informative data that focus on model selection based on the uncertainty of model predictions. This review provides a comprehensive overview of ML-based automation in bioprocess development. On the one hand, the biotech and bioengineering community should be aware of the potential and, most importantly, the limitations of existing ML solutions for their application in biotechnology and biopharma. On the other hand, it is essential to identify the missing links to enable the easy implementation of ML and Artificial Intelligence (AI) tools in valuable solutions for the bio-community.
Submitted 1 November, 2022; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
Authors:
Julian Wörmann,
Daniel Bogdoll,
Christian Brunner,
Etienne Bührle,
Han Chen,
Evaristus Fuh Chuo,
Kostadin Cvejoski,
Ludger van Elst,
Philip Gottschall,
Stefan Griesche,
Christian Hellert,
Christian Hesels,
Sebastian Houben,
Tim Joseph,
Niklas Keil,
Johann Kelsch,
Mert Keser,
Hendrik Königshof,
Erwin Kraft,
Leonie Kreuser,
Kevin Krone,
Tobias Latka,
Denny Mattern,
Stefan Matthes,
Franz Motzkus,
et al. (27 additional authors not shown)
Abstract:
The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Even more, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.
Submitted 20 November, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Loss-guided Stability Selection
Authors:
Tino Werner
Abstract:
In modern data analysis, sparse model selection becomes inevitable once the number of predictor variables is very high. It is well-known that model selection procedures like the Lasso or Boosting tend to overfit on real data. The celebrated Stability Selection overcomes these weaknesses by aggregating models, based on subsamples of the training data, followed by choosing a stable predictor set which is usually much sparser than the predictor sets from the raw models. The standard Stability Selection is based on a global criterion, namely the per-family error rate, while additionally requiring expert knowledge to suitably configure the hyperparameters. Since model selection depends on the loss function, i.e., predictor sets selected w.r.t. some particular loss function differ from those selected w.r.t. some other loss function, we propose a Stability Selection variant which respects the chosen loss function via an additional validation step based on out-of-sample validation data, optionally enhanced with an exhaustive search strategy. Our Stability Selection variants are widely applicable and user-friendly. Moreover, our Stability Selection variants can avoid the issue of severe underfitting which affects the original Stability Selection for noisy high-dimensional data, so our priority is not to avoid false positives at all costs but to result in a sparse stable model with which one can make predictions. Experiments where we consider both regression and binary classification and where we use Boosting as model selection algorithm reveal a significant precision improvement compared to raw Boosting models while not suffering from any of the mentioned issues of the original Stability Selection.
Submitted 10 February, 2022;
originally announced February 2022.
-
Super-Reparametrizations of Weighted CSPs: Properties and Optimization Perspective
Authors:
Tomáš Dlask,
Tomáš Werner,
Simon de Givry
Abstract:
The notion of reparametrizations of Weighted CSPs (WCSPs) (also known as equivalence-preserving transformations of WCSPs) is well-known and finds its use in many algorithms to approximate or bound the optimal WCSP value. In contrast, the concept of super-reparametrizations (which are changes of the weights that keep or increase the WCSP objective for every assignment) was already proposed but never studied in detail. To fill this gap, we present a number of theoretical properties of super-reparametrizations and compare them to those of reparametrizations. Furthermore, we propose a framework for computing upper bounds on the optimal value of the (maximization version of) WCSP using super-reparametrizations. We show that it is in principle possible to employ arbitrary (under some technical conditions) constraint propagation rules to improve the bound. For arc consistency in particular, the method reduces to the known Virtual AC (VAC) algorithm. We implemented the method for singleton arc consistency (SAC) and compared it to other strong local consistencies in WCSPs on a public benchmark. The results show that the bounds obtained from SAC are superior for many instance groups.
Submitted 17 May, 2023; v1 submitted 6 January, 2022;
originally announced January 2022.
-
EvoLearner: Learning Description Logics with Evolutionary Algorithms
Authors:
Stefan Heindorf,
Lukas Blübaum,
Nick Düsterhus,
Till Werner,
Varun Nandkumar Golani,
Caglar Demir,
Axel-Cyrille Ngonga Ngomo
Abstract:
Classifying nodes in knowledge graphs is an important task, e.g., for predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. While black-box models often achieve high predictive performance, they are only post-hoc and locally explainable and do not allow the learned model to be easily enriched with domain knowledge. Towards this end, learning description logic concepts from positive and negative examples has been proposed. However, learning such concepts often takes a long time and state-of-the-art approaches provide limited support for literal data values, although they are crucial for many applications. In this paper, we propose EvoLearner - an evolutionary approach to learn concepts in ALCQ(D), which is the attributive language with complement (ALC) paired with qualified cardinality restrictions (Q) and data properties (D). We contribute a novel initialization method for the initial population: starting from positive examples, we perform biased random walks and translate them to description logic concepts. Moreover, we improve support for data properties by maximizing information gain when deciding where to split the data. We show that our approach significantly outperforms the state of the art on the benchmarking framework SML-Bench for structured machine learning. Our ablation study confirms that this is due to our novel initialization method and support for data properties.
Submitted 8 March, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Reinforcement Learning Approach to Active Learning for Image Classification
Authors:
Thorben Werner
Abstract:
Machine Learning requires large amounts of labeled data to fit a model. Many datasets are already publicly available, but relying on them confines the applications of machine learning to the domains of those public datasets. The ever-growing penetration of machine learning algorithms in new application areas requires solutions for the need for data in those new domains. This thesis works on active learning as one possible solution to reduce the amount of data that needs to be processed by hand, by processing only those datapoints that specifically benefit the training of a strong model for the task. A newly proposed framework for framing the active learning workflow as a reinforcement learning problem is adapted for image classification and a series of three experiments is conducted. Each experiment is evaluated and potential issues with the approach are outlined. Each following experiment then proposes improvements to the framework and evaluates their impact. After the last experiment, a final conclusion is drawn, unfortunately rejecting this work's hypothesis and outlining that the proposed framework at the moment is not capable of improving active learning for image classification with a trained reinforcement learning agent.
Submitted 12 August, 2021;
originally announced August 2021.
-
Intrinsic Quality Assessment of Arguments
Authors:
Henning Wachsmuth,
Till Werner
Abstract:
Several quality dimensions of natural language arguments have been investigated. Some are likely to be reflected in linguistic features (e.g., an argument's arrangement), whereas others depend on context (e.g., relevance) or topic knowledge (e.g., acceptability). In this paper, we study the intrinsic computational assessment of 15 dimensions, i.e., only learning from an argument's text. In systematic experiments with eight feature types on an existing corpus, we observe moderate but significant learning success for most dimensions. Rhetorical quality seems hardest to assess, and subjectivity features turn out strong, although length bias in the corpus impedes full validity. We also find that human assessors differ more clearly from each other than from our approach.
Submitted 23 October, 2020;
originally announced October 2020.
-
A Class of Linear Programs Solvable by Coordinate-Wise Minimization
Authors:
Tomáš Dlask,
Tomáš Werner
Abstract:
Coordinate-wise minimization is a simple popular method for large-scale optimization. Unfortunately, for general (non-differentiable) convex problems it may not find global minima. We present a class of linear programs that coordinate-wise minimization solves exactly. We show that dual LP relaxations of several well-known combinatorial optimization problems are in this class and the method finds a global minimum with sufficient accuracy in reasonable runtimes. Moreover, for extensions of these problems that no longer are in this class the method yields reasonably good suboptima. Though the presented LP relaxations can be solved by more efficient methods (such as max-flow), our results are theoretically non-trivial and can lead to new large-scale optimization algorithms in the future.
Submitted 14 September, 2020; v1 submitted 28 January, 2020;
originally announced January 2020.
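The abstract above notes that coordinate-wise minimization may not find global minima of non-differentiable convex problems. A minimal sketch of that failure mode follows, on a hand-picked convex piecewise-affine function that is not taken from the paper; the function `f`, the helper `minimize_coordinate`, and the starting point are all illustrative assumptions.

```python
def f(x, y):
    # Convex and piecewise-affine; its global minimum is f(0, 0) = 0.
    return abs(x - y) + 0.5 * abs(x + y)


def minimize_coordinate(other):
    # With the other coordinate fixed, t -> f(t, other) is convex and
    # piecewise-affine in t, so a minimizer lies at one of its
    # breakpoints, t = other or t = -other. f is symmetric in its two
    # arguments, so the same helper serves both coordinates.
    return min((other, -other), key=lambda t: f(t, other))


x, y = 1.0, 1.0
for _ in range(10):
    x = minimize_coordinate(y)  # exact minimization over x
    y = minimize_coordinate(x)  # exact minimization over y

print("coordinate-wise fixed point:", (x, y), "objective:", f(x, y))
print("global minimum objective:", f(0.0, 0.0))
```

Started at (1, 1), every exact coordinate update leaves the point unchanged with objective 1, although the global minimum value is 0. Classes of problems where this cannot happen, such as the linear programs studied in the paper, are the exception rather than the rule.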
-
Automatic Analysis of Sewer Pipes Based on Unrolled Monocular Fisheye Images
Authors:
Johannes Künzel,
Thomas Werner,
Ronja Möller,
Peter Eisert,
Jan Waschnewski,
Ralf Hilpert
Abstract:
The task of detecting and classifying damages in sewer pipes offers an important application area for computer vision algorithms. This paper describes a system that is capable of accomplishing this task solely based on low quality and severely compressed fisheye images from a pipe inspection robot. Relying on robust image features, we estimate camera poses, model the image lighting, and exploit this information to generate high quality cylindrical unwraps of the pipes' surfaces. Based on the generated images, we apply semantic labeling based on deep convolutional neural networks to detect and classify defects as well as structural elements.
Submitted 11 December, 2019;
originally announced December 2019.
-
Relative Interior Rule in Block-Coordinate Minimization
Authors:
Tomáš Werner,
Daniel Průša
Abstract:
(Block-)coordinate minimization is an iterative optimization method which in every iteration finds a global minimum of the objective over a variable or a subset of variables, while keeping the remaining variables constant. While for some problems, coordinate minimization converges to a global minimum (e.g., convex differentiable objective), for general (non-differentiable) convex problems this may not be the case. Despite this drawback, (block-)coordinate minimization can be an acceptable option for large-scale non-differentiable convex problems; an example is methods to solve the linear programming relaxation of the discrete energy minimization problem (MAP inference in graphical models). When block-coordinate minimization is applied to a general convex problem, in every iteration the minimizer over the current coordinate block need not be unique and therefore a single minimizer must be chosen. We propose that this minimizer be chosen from the relative interior of the set of all minimizers over the current block. We show that this rule is not worse, in a certain precise sense, than any other rule.
Submitted 21 October, 2019;
originally announced October 2019.
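The choice of minimizer described in the abstract above can be illustrated on a toy problem. The sketch below is not the paper's construction: the objective `f`, the closed-form `argmin_interval`, and the two selection rules are hypothetical choices made only for illustration. For a single coordinate the set of minimizers is an interval, and its midpoint is one point of its relative interior (for a degenerate interval, the single point itself).

```python
def f(x, y):
    # Convex and piecewise-affine; its global minimum is f(0, 0) = 0.
    return max(abs(x), abs(y))


def argmin_interval(other):
    # With the other coordinate fixed, t -> max(|t|, |other|) is minimized
    # by every t in the interval [-|other|, |other|].
    r = abs(other)
    return -r, r


def upper_endpoint(lo, hi):
    # A boundary choice among the minimizers.
    return hi


def midpoint(lo, hi):
    # A relative-interior choice among the minimizers.
    return (lo + hi) / 2.0


def run(rule, x=1.0, y=1.0, sweeps=5):
    for _ in range(sweeps):
        x = rule(*argmin_interval(y))
        y = rule(*argmin_interval(x))
    return x, y


stuck = run(upper_endpoint)
solved = run(midpoint)
print("boundary rule:", stuck, "objective:", f(*stuck))
print("relative-interior rule:", solved, "objective:", f(*solved))
```

On this example the boundary rule never leaves (1, 1), while the midpoint rule reaches the global minimizer (0, 0) after one sweep. This shows only one instance in which the relative-interior choice helps; the paper's statement is the more careful one that the rule is not worse than any other in a precise sense.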
-
A review on ranking problems in statistical learning
Authors:
Tino Werner
Abstract:
Ranking problems, also known as preference learning problems, define a widespread class of statistical learning problems with many applications, including fraud detection, document ranking, medicine, credit risk screening, image ranking or media memorability. In this article, we systematically review different types of instance ranking problems, i.e., ranking problems that require the prediction of an order of the response variables, and the corresponding loss functions or goodness criteria. We discuss the difficulties when trying to optimize those criteria. To give a detailed and comprehensive overview of existing machine learning techniques to solve such ranking problems, we systemize existing techniques and recapitulate the corresponding optimization problems in a unified notation. We also discuss to which of the ranking problems the respective algorithms are tailored and identify their strengths and limitations. Computational aspects and open research problems are also considered.
Submitted 16 December, 2020; v1 submitted 6 September, 2019;
originally announced September 2019.
-
On Coordinate Minimization of Convex Piecewise-Affine Functions
Authors:
Tomas Werner
Abstract:
A popular class of algorithms to optimize the dual LP relaxation of the discrete energy minimization problem (a.k.a. MAP inference in graphical models or valued constraint satisfaction) are convergent message-passing algorithms, such as max-sum diffusion, TRW-S, MPLP and SRMP. These algorithms are successful in practice, despite the fact that they are a version of coordinate minimization applied to a convex piecewise-affine function, which is not guaranteed to converge to a global minimizer. These algorithms converge only to a local minimizer, characterized by local consistency known from constraint programming. We generalize max-sum diffusion to a version of coordinate minimization applicable to an arbitrary convex piecewise-affine function, which converges to a local consistency condition. This condition can be seen as the sign relaxation of the global optimality condition.
Submitted 14 September, 2017;
originally announced September 2017.
-
Primal View on Belief Propagation
Authors:
Tomas Werner
Abstract:
It is known that fixed points of loopy belief propagation (BP) correspond to stationary points of the Bethe variational problem, where we minimize the Bethe free energy subject to normalization and marginalization constraints. Unfortunately, this does not entirely explain BP because BP is a dual rather than primal algorithm to solve the Bethe variational problem -- beliefs are infeasible before convergence. Thus, we have no better understanding of BP than as an algorithm to seek a common zero of a system of non-linear functions, not explicitly related to each other. In this theoretical paper, we show that these functions are in fact explicitly related -- they are the partial derivatives of a single function of reparameterizations. That is, BP seeks a stationary point of a single function, without any constraints. This function has a very natural form: it is a linear combination of local log-partition functions, exactly as the Bethe entropy is the same linear combination of local entropies.
Submitted 15 March, 2012;
originally announced March 2012.
-
Zero-Temperature Limit of a Convergent Algorithm to Minimize the Bethe Free Energy
Authors:
Tomas Werner
Abstract:
After the discovery that fixed points of loopy belief propagation coincide with stationary points of the Bethe free energy, several researchers proposed provably convergent algorithms to directly minimize the Bethe free energy. These algorithms were formulated only for non-zero temperature (thus finding fixed points of the sum-product algorithm) and their possible extension to zero temperature is not obvious. We present the zero-temperature limit of the double-loop algorithm by Heskes, which converges to a max-product fixed point. The inner loop of this algorithm is max-sum diffusion. Under certain conditions, the algorithm combines the complementary advantages of the max-product belief propagation and max-sum diffusion (LP relaxation): it yields good approximation of both ground states and max-marginals.
Submitted 22 December, 2011;
originally announced December 2011.