Search | arXiv e-print repository

Are EEG Sequences Time Series? EEG Classification with Time Series Models and Joint Subject Training

Authors: Johannes Burchert, Thorben Werner, Vijaya Krishna Yalavarthi, Diego Coello de Portugal, Maximilian Stubbemann, Lars Schmidt-Thieme

Abstract: As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to deal with such data as with any other time series data. For EEG classification many models have been developed with layer types and architectures we typically do not see in time series classification. Furthermore, typically separate models for e… ▽ More As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to deal with such data as with any other time series data. For EEG classification many models have been developed with layer types and architectures we typically do not see in time series classification. Furthermore, typically separate models for each individual subject are learned, not one model for all of them. In this paper, we systematically study the differences between EEG classification models and generic time series classification models. We describe three different model setups to deal with EEG data from different subjects, subject-specific models (most EEG literature), subject-agnostic models and subject-conditional models. In experiments on three datasets, we demonstrate that off-the-shelf time series classification models trained per subject perform close to EEG classification models, but that do not quite reach the performance of domain-specific modeling. Additionally, we combine time-series models with subject embeddings to train one joint subject-conditional classifier on all subjects. The resulting models are competitive with dedicated EEG models in 2 out of 3 datasets, even outperforming all EEG methods on one of them. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.07004 [pdf, ps, other]

Convergence of Some Convex Message Passing Algorithms to a Fixed Point

Authors: Vaclav Voracek, Tomas Werner

Abstract: A popular approach to the MAP inference problem in graphical models is to minimize an upper bound obtained from a dual linear programming or Lagrangian relaxation by (block-)coordinate descent. This is also known as convex/convergent message passing; examples are max-sum diffusion and sequential tree-reweighted message passing (TRW-S). Convergence properties of these methods are currently not full… ▽ More A popular approach to the MAP inference problem in graphical models is to minimize an upper bound obtained from a dual linear programming or Lagrangian relaxation by (block-)coordinate descent. This is also known as convex/convergent message passing; examples are max-sum diffusion and sequential tree-reweighted message passing (TRW-S). Convergence properties of these methods are currently not fully understood. They have been proved to converge to the set characterized by local consistency of active constraints, with unknown convergence rate; however, it was not clear if the iterates converge at all (to any point). We prove a stronger result (conjectured before but never proved): the iterates converge to a fixed point of the method. Moreover, we show that the algorithm terminates within $\mathcal{O}(1/\varepsilon)$ iterations. We first prove this for a version of coordinate descent applied to a general piecewise-affine convex objective. Then we show that several convex message passing methods are special cases of this method. Finally, we show that a slightly different version of coordinate descent can cycle. △ Less

Submitted 5 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: ICML 2024; comments are welcome

arXiv:2402.14410 [pdf, other]

Human-machine social systems

Authors: Milena Tsvetkova, Taha Yasseri, Niccolo Pescetelli, Tobias Werner

Abstract: From fake accounts on social media and generative-AI bots such as ChatGPT to high-frequency trading algorithms on financial markets and self-driving vehicles on the streets, robots, bots, and algorithms are proliferating and permeating our communication channels, social interactions, economic transactions, and transportation arteries. Networks of multiple interdependent and interacting humans and… ▽ More From fake accounts on social media and generative-AI bots such as ChatGPT to high-frequency trading algorithms on financial markets and self-driving vehicles on the streets, robots, bots, and algorithms are proliferating and permeating our communication channels, social interactions, economic transactions, and transportation arteries. Networks of multiple interdependent and interacting humans and autonomous machines constitute complex adaptive social systems where the collective outcomes cannot be simply deduced from either human or machine behavior alone. Under this paradigm, we review recent experimental, theoretical, and observational research from across a range of disciplines - robotics, human-computer interaction, web science, complexity science, computational social science, finance, economics, political science, social psychology, and sociology. We identify general dynamics and patterns in situations of competition, coordination, cooperation, contagion, and collective decision-making, and contextualize them in four prominent existing human-machine communities: high-frequency trading markets, the social media platform formerly known as Twitter, the open-collaboration encyclopedia Wikipedia, and the news aggregation and discussion community Reddit. We conclude with suggestions for the research, design, and governance of human-machine social systems, which are necessary to reduce misinformation, prevent financial crashes, improve road safety, overcome labor market disruptions, and enable a better human future. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: 44 pages, 2 figures

ACM Class: A.1; C.2.4; H.1.2; J.4; K.4.0; K.6.0

arXiv:2311.18356 [pdf, ps, other]

Towards Comparable Active Learning

Authors: Thorben Werner, Johannes Burchert, Lars Schmidt-Thieme

Abstract: Active Learning has received significant attention in the field of machine learning for its potential in selecting the most informative samples for labeling, thereby reducing data annotation costs. However, we show that the reported lifts in recent literature generalize poorly to other domains leading to an inconclusive landscape in Active Learning research. Furthermore, we highlight overlooked pr… ▽ More Active Learning has received significant attention in the field of machine learning for its potential in selecting the most informative samples for labeling, thereby reducing data annotation costs. However, we show that the reported lifts in recent literature generalize poorly to other domains leading to an inconclusive landscape in Active Learning research. Furthermore, we highlight overlooked problems for reproducing AL experiments that can lead to unfair comparisons and increased variance in the results. This paper addresses these issues by providing an Active Learning framework for a fair comparison of algorithms across different tasks and domains, as well as a fast and performant oracle algorithm for evaluation. To the best of our knowledge, we propose the first AL benchmark that tests algorithms in 3 major domains: Tabular, Image, and Text. We report empirical results for 6 widely used algorithms on 7 real-world and 2 synthetic datasets and aggregate them into a domain-specific ranking of AL algorithms. △ Less

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2309.03659 [pdf, other]

Towards Comparable Knowledge Distillation in Semantic Image Segmentation

Authors: Onno Niemann, Christopher Vox, Thorben Werner

Abstract: Knowledge Distillation (KD) is one proposed solution to large model sizes and slow inference speed in semantic segmentation. In our research we identify 25 proposed distillation loss terms from 14 publications in the last 4 years. Unfortunately, a comparison of terms based on published results is often impossible, because of differences in training configurations. A good illustration of this probl… ▽ More Knowledge Distillation (KD) is one proposed solution to large model sizes and slow inference speed in semantic segmentation. In our research we identify 25 proposed distillation loss terms from 14 publications in the last 4 years. Unfortunately, a comparison of terms based on published results is often impossible, because of differences in training configurations. A good illustration of this problem is the comparison of two publications from 2022. Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports an increase of student mIoU of 4.54 and a final performance of 29.19, while Adaptive Perspective Distillation (APD) only improves student performance by 2.06 percentage points, but achieves a final performance of 39.25. The reason for such extreme differences is often a suboptimal choice of hyperparameters and a resulting underperformance of the student model used as reference point. In our work, we reveal problems of insufficient hyperparameter tuning by showing that distillation improvements of two widely accepted frameworks, SKD and IFVD, vanish when hyperparameters are optimized sufficiently. To improve comparability of future research in the field, we establish a solid baseline for three datasets and two student models and provide extensive information on hyperparameter tuning. We find that only two out of eight techniques can compete with our simple baseline on the ADE20K dataset. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted by the ECML PKDD 2023 workshop track: Simplification, Compression, Efficiency, and Frugality for Artificial Intelligence (SCEFA). This preprint has not undergone peer review or any post-submission improvements or corrections

arXiv:2211.10124 [pdf, other]

Global quantitative robustness of regression feed-forward neural networks

Authors: Tino Werner

Abstract: Neural networks are an indispensable model class for many complex learning tasks. Despite the popularity and importance of neural networks and many different established techniques from literature for stabilization and robustification of the training, the classical concepts from robust statistics have rarely been considered so far in the context of neural networks. Therefore, we adapt the notion o… ▽ More Neural networks are an indispensable model class for many complex learning tasks. Despite the popularity and importance of neural networks and many different established techniques from literature for stabilization and robustification of the training, the classical concepts from robust statistics have rarely been considered so far in the context of neural networks. Therefore, we adapt the notion of the regression breakdown point to regression neural networks and compute the breakdown point for different feed-forward network configurations and contamination settings. In an extensive simulation study, we compare the performance, measured by the out-of-sample loss, by a proxy of the breakdown rate and by the training steps, of non-robust and robust regression feed-forward neural networks in a plethora of different configurations. The results indeed motivate to use robust loss functions for neural network training. △ Less

Submitted 18 November, 2022; originally announced November 2022.

arXiv:2209.01083 [pdf, other]

When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development

Authors: Nghia Duong-Trung, Stefan Born, Jong Woo Kim, Marie-Therese Schermeyer, Katharina Paulick, Maxim Borisyak, Mariano Nicolas Cruz-Bournazou, Thorben Werner, Randolf Scholz, Lars Schmidt-Thieme, Peter Neubauer, Ernesto Martinez

Abstract: Machine learning (ML) is becoming increasingly crucial in many fields of engineering but has not yet played out its full potential in bioprocess engineering. While experimentation has been accelerated by increasing levels of lab automation, experimental planning and data modeling are still largerly depend on human intervention. ML can be seen as a set of tools that contribute to the automation of… ▽ More Machine learning (ML) is becoming increasingly crucial in many fields of engineering but has not yet played out its full potential in bioprocess engineering. While experimentation has been accelerated by increasing levels of lab automation, experimental planning and data modeling are still largerly depend on human intervention. ML can be seen as a set of tools that contribute to the automation of the whole experimental cycle, including model building and practical planning, thus allowing human experts to focus on the more demanding and overarching cognitive tasks. First, probabilistic programming is used for the autonomous building of predictive models. Second, machine learning automatically assesses alternative decisions by planning experiments to test hypotheses and conducting investigations to gather informative data that focus on model selection based on the uncertainty of model predictions. This review provides a comprehensive overview of ML-based automation in bioprocess development. On the one hand, the biotech and bioengineering community should be aware of the potential and, most importantly, the limitation of existing ML solutions for their application in biotechnology and biopharma. On the other hand, it is essential to identify the missing links to enable the easy implementation of ML and Artificial Intelligence (AI) tools in valuable solutions for the bio-community. △ Less

Submitted 1 November, 2022; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2205.04712 [pdf, other]

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Authors: Julian Wörmann, Daniel Bogdoll, Christian Brunner, Etienne Bührle, Han Chen, Evaristus Fuh Chuo, Kostadin Cvejoski, Ludger van Elst, Philip Gottschall, Stefan Griesche, Christian Hellert, Christian Hesels, Sebastian Houben, Tim Joseph, Niklas Keil, Johann Kelsch, Mert Keser, Hendrik Königshof, Erwin Kraft, Leonie Kreuser, Kevin Krone, Tobias Latka, Denny Mattern, Stefan Matthes, Franz Motzkus , et al. (27 additional authors not shown)

Abstract: The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical con… ▽ More The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Even more, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving. △ Less

Submitted 20 November, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: 111 pages, Added section on Run-time Network Verification

arXiv:2202.04956 [pdf, other]

Loss-guided Stability Selection

Authors: Tino Werner

Abstract: In modern data analysis, sparse model selection becomes inevitable once the number of predictors variables is very high. It is well-known that model selection procedures like the Lasso or Boosting tend to overfit on real data. The celebrated Stability Selection overcomes these weaknesses by aggregating models, based on subsamples of the training data, followed by choosing a stable predictor set wh… ▽ More In modern data analysis, sparse model selection becomes inevitable once the number of predictors variables is very high. It is well-known that model selection procedures like the Lasso or Boosting tend to overfit on real data. The celebrated Stability Selection overcomes these weaknesses by aggregating models, based on subsamples of the training data, followed by choosing a stable predictor set which is usually much sparser than the predictor sets from the raw models. The standard Stability Selection is based on a global criterion, namely the per-family error rate, while additionally requiring expert knowledge to suitably configure the hyperparameters. Since model selection depends on the loss function, i.e., predictor sets selected w.r.t. some particular loss function differ from those selected w.r.t. some other loss function, we propose a Stability Selection variant which respects the chosen loss function via an additional validation step based on out-of-sample validation data, optionally enhanced with an exhaustive search strategy. Our Stability Selection variants are widely applicable and user-friendly. Moreover, our Stability Selection variants can avoid the issue of severe underfitting which affects the original Stability Selection for noisy high-dimensional data, so our priority is not to avoid false positives at all costs but to result in a sparse stable model with which one can make predictions. Experiments where we consider both regression and binary classification and where we use Boosting as model selection algorithm reveal a significant precision improvement compared to raw Boosting models while not suffering from any of the mentioned issues of the original Stability Selection. △ Less

Submitted 10 February, 2022; originally announced February 2022.

arXiv:2201.02018 [pdf, other]

doi 10.1007/s10601-023-09343-6

Super-Reparametrizations of Weighted CSPs: Properties and Optimization Perspective

Authors: Tomáš Dlask, Tomáš Werner, Simon de Givry

Abstract: The notion of reparametrizations of Weighted CSPs (WCSPs) (also known as equivalence-preserving transformations of WCSPs) is well-known and finds its use in many algorithms to approximate or bound the optimal WCSP value. In contrast, the concept of super-reparametrizations (which are changes of the weights that keep or increase the WCSP objective for every assignment) was already proposed but neve… ▽ More The notion of reparametrizations of Weighted CSPs (WCSPs) (also known as equivalence-preserving transformations of WCSPs) is well-known and finds its use in many algorithms to approximate or bound the optimal WCSP value. In contrast, the concept of super-reparametrizations (which are changes of the weights that keep or increase the WCSP objective for every assignment) was already proposed but never studied in detail. To fill this gap, we present a number of theoretical properties of super-reparametrizations and compare them to those of reparametrizations. Furthermore, we propose a framework for computing upper bounds on the optimal value of the (maximization version of) WCSP using super-reparametrizations. We show that it is in principle possible to employ arbitrary (under some technical conditions) constraint propagation rules to improve the bound. For arc consistency in particular, the method reduces to the known Virtual AC (VAC) algorithm. We implemented the method for singleton arc consistency (SAC) and compared it to other strong local consistencies in WCSPs on a public benchmark. The results show that the bounds obtained from SAC are superior for many instance groups. △ Less

Submitted 17 May, 2023; v1 submitted 6 January, 2022; originally announced January 2022.

arXiv:2111.04879 [pdf, other]

doi 10.1145/3485447.3511925

EvoLearner: Learning Description Logics with Evolutionary Algorithms

Authors: Stefan Heindorf, Lukas Blübaum, Nick Düsterhus, Till Werner, Varun Nandkumar Golani, Caglar Demir, Axel-Cyrille Ngonga Ngomo

Abstract: Classifying nodes in knowledge graphs is an important task, e.g., for predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. While black-box models often achieve high predictive performance, they are only post-hoc and locally explainable and do not allow the learned model to be easily enriched with domain knowled… ▽ More Classifying nodes in knowledge graphs is an important task, e.g., for predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. While black-box models often achieve high predictive performance, they are only post-hoc and locally explainable and do not allow the learned model to be easily enriched with domain knowledge. Towards this end, learning description logic concepts from positive and negative examples has been proposed. However, learning such concepts often takes a long time and state-of-the-art approaches provide limited support for literal data values, although they are crucial for many applications. In this paper, we propose EvoLearner - an evolutionary approach to learn concepts in ALCQ(D), which is the attributive language with complement (ALC) paired with qualified cardinality restrictions (Q) and data properties (D). We contribute a novel initialization method for the initial population: starting from positive examples, we perform biased random walks and translate them to description logic concepts. Moreover, we improve support for data properties by maximizing information gain when deciding where to split the data. We show that our approach significantly outperforms the state of the art on the benchmarking framework SML-Bench for structured machine learning. Our ablation study confirms that this is due to our novel initialization method and support for data properties. △ Less

Submitted 8 March, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: Accepted at WWW 2022

arXiv:2108.05595 [pdf, other]

Reinforcement Learning Approach to Active Learning for Image Classification

Authors: Thorben Werner

Abstract: Machine Learning requires large amounts of labeled data to fit a model. Many datasets are already publicly available, nevertheless forcing application possibilities of machine learning to the domains of those public datasets. The ever-growing penetration of machine learning algorithms in new application areas requires solutions for the need for data in those new domains. This thesis works on activ… ▽ More Machine Learning requires large amounts of labeled data to fit a model. Many datasets are already publicly available, nevertheless forcing application possibilities of machine learning to the domains of those public datasets. The ever-growing penetration of machine learning algorithms in new application areas requires solutions for the need for data in those new domains. This thesis works on active learning as one possible solution to reduce the amount of data that needs to be processed by hand, by processing only those datapoints that specifically benefit the training of a strong model for the task. A newly proposed framework for framing the active learning workflow as a reinforcement learning problem is adapted for image classification and a series of three experiments is conducted. Each experiment is evaluated and potential issues with the approach are outlined. Each following experiment then proposes improvements to the framework and evaluates their impact. After the last experiment, a final conclusion is drawn, unfortunately rejecting this work's hypothesis and outlining that the proposed framework at the moment is not capable of improving active learning for image classification with a trained reinforcement learning agent. △ Less

Submitted 12 August, 2021; originally announced August 2021.

Comments: Master Thesis

arXiv:2010.12473 [pdf, other]

Intrinsic Quality Assessment of Arguments

Authors: Henning Wachsmuth, Till Werner

Abstract: Several quality dimensions of natural language arguments have been investigated. Some are likely to be reflected in linguistic features (e.g., an argument's arrangement), whereas others depend on context (e.g., relevance) or topic knowledge (e.g., acceptability). In this paper, we study the intrinsic computational assessment of 15 dimensions, i.e., only learning from an argument's text. In systema… ▽ More Several quality dimensions of natural language arguments have been investigated. Some are likely to be reflected in linguistic features (e.g., an argument's arrangement), whereas others depend on context (e.g., relevance) or topic knowledge (e.g., acceptability). In this paper, we study the intrinsic computational assessment of 15 dimensions, i.e., only learning from an argument's text. In systematic experiments with eight feature types on an existing corpus, we observe moderate but significant learning success for most dimensions. Rhetorical quality seems hardest to assess, and subjectivity features turn out strong, although length bias in the corpus impedes full validity. We also find that human assessors differ more clearly to each other than to our approach. △ Less

Submitted 23 October, 2020; originally announced October 2020.

Comments: Accepted at COLING 2020

arXiv:2001.10467 [pdf, ps, other]

A Class of Linear Programs Solvable by Coordinate-Wise Minimization

Authors: Tomáš Dlask, Tomáš Werner

Abstract: Coordinate-wise minimization is a simple popular method for large-scale optimization. Unfortunately, for general (non-differentiable) convex problems it may not find global minima. We present a class of linear programs that coordinate-wise minimization solves exactly. We show that dual LP relaxations of several well-known combinatorial optimization problems are in this class and the method finds a… ▽ More Coordinate-wise minimization is a simple popular method for large-scale optimization. Unfortunately, for general (non-differentiable) convex problems it may not find global minima. We present a class of linear programs that coordinate-wise minimization solves exactly. We show that dual LP relaxations of several well-known combinatorial optimization problems are in this class and the method finds a global minimum with sufficient accuracy in reasonable runtimes. Moreover, for extensions of these problems that no longer are in this class the method yields reasonably good suboptima. Though the presented LP relaxations can be solved by more efficient methods (such as max-flow), our results are theoretically non-trivial and can lead to new large-scale optimization algorithms in the future. △ Less

Submitted 14 September, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: The final authenticated version is available online at https://doi.org/10.1007/978-3-030-53552-0_8

arXiv:1912.05222 [pdf, other]

doi 10.1109/WACV.2018.00223

Automatic Analysis of Sewer Pipes Based on Unrolled Monocular Fisheye Images

Authors: Johannes Künzel, Thomas Werner, Ronja Möller, Peter Eisert, Jan Waschnewski, Ralf Hilpert

Abstract: The task of detecting and classifying damages in sewer pipes offers an important application area for computer vision algorithms. This paper describes a system, which is capable of accomplishing this task solely based on low quality and severely compressed fisheye images from a pipe inspection robot. Relying on robust image features, we estimate camera poses, model the image lighting, and exploit… ▽ More The task of detecting and classifying damages in sewer pipes offers an important application area for computer vision algorithms. This paper describes a system, which is capable of accomplishing this task solely based on low quality and severely compressed fisheye images from a pipe inspection robot. Relying on robust image features, we estimate camera poses, model the image lighting, and exploit this information to generate high quality cylindrical unwraps of the pipes' surfaces.Based on the generated images, we apply semantic labeling based on deep convolutional neural networks to detect and classify defects as well as structural elements. △ Less

Submitted 11 December, 2019; originally announced December 2019.

Comments: Published in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV)

Journal ref: J. Kunzel, T. Werner, P. Eisert and J. Waschnewski, "Automatic Analysis of Sewer Pipes Based on Unrolled Monocular Fisheye Images," 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, 2018, pp. 2019-2027

arXiv:1910.09488 [pdf, ps, other]

Relative Interior Rule in Block-Coordinate Minimization

Authors: Tomáš Werner, Daniel Průša

Abstract: (Block-)coordinate minimization is an iterative optimization method which in every iteration finds a global minimum of the objective over a variable or a subset of variables, while kee** the remaining variables constant. While for some problems, coordinate minimization converges to a global minimum (e.g., convex differentiable objective), for general (non-differentiable) convex problems this may… ▽ More (Block-)coordinate minimization is an iterative optimization method which in every iteration finds a global minimum of the objective over a variable or a subset of variables, while kee** the remaining variables constant. While for some problems, coordinate minimization converges to a global minimum (e.g., convex differentiable objective), for general (non-differentiable) convex problems this may not be the case. Despite this drawback, (block-)coordinate minimization can be an acceptable option for large-scale non-differentiable convex problems; an example is methods to solve the linear programming relaxation of the discrete energy minimization problem (MAP inference in graphical models). When block-coordinate minimization is applied to a general convex problem, in every iteration the minimizer over the current coordinate block need not be unique and therefore a single minimizer must be chosen. We propose that this minimizer be chosen from the relative interior of the set of all minimizers over the current block. We show that this rule is not worse, in a certain precise sense, than any other rule. △ Less

Submitted 21 October, 2019; originally announced October 2019.

arXiv:1909.02998 [pdf, ps, other]

A review on ranking problems in statistical learning

Authors: Tino Werner

Abstract: Ranking problems, also known as preference learning problems, define a widely spread class of statistical learning problems with many applications, including fraud detection, document ranking, medicine, credit risk screening, image ranking or media memorability. In this article, we systematically review different types of instance ranking problems, i.e., ranking problems that require the predictio… ▽ More Ranking problems, also known as preference learning problems, define a widely spread class of statistical learning problems with many applications, including fraud detection, document ranking, medicine, credit risk screening, image ranking or media memorability. In this article, we systematically review different types of instance ranking problems, i.e., ranking problems that require the prediction of an order of the response variables, and the corresponding loss functions resp. goodness criteria. We discuss the difficulties when trying to optimize those criteria. As for a detailed and comprehensive overview of existing machine learning techniques to solve such ranking problems, we systemize existing techniques and recapitulate the corresponding optimization problems in a unified notation. We also discuss to which of the ranking problems the respective algorithms are tailored and identify their strengths and limitations. Computational aspects and open research problems are also considered. △ Less

Submitted 16 December, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

arXiv:1709.04989 [pdf, ps, other]

On Coordinate Minimization of Convex Piecewise-Affine Functions

Authors: Tomas Werner

Abstract: A popular class of algorithms to optimize the dual LP relaxation of the discrete energy minimization problem (a.k.a.\ MAP inference in graphical models or valued constraint satisfaction) are convergent message-passing algorithms, such as max-sum diffusion, TRW-S, MPLP and SRMP. These algorithms are successful in practice, despite the fact that they are a version of coordinate minimization applied… ▽ More A popular class of algorithms to optimize the dual LP relaxation of the discrete energy minimization problem (a.k.a.\ MAP inference in graphical models or valued constraint satisfaction) are convergent message-passing algorithms, such as max-sum diffusion, TRW-S, MPLP and SRMP. These algorithms are successful in practice, despite the fact that they are a version of coordinate minimization applied to a convex piecewise-affine function, which is not guaranteed to converge to a global minimizer. These algorithms converge only to a local minimizer, characterized by local consistency known from constraint programming. We generalize max-sum diffusion to a version of coordinate minimization applicable to an arbitrary convex piecewise-affine function, which converges to a local consistency condition. This condition can be seen as the sign relaxation of the global optimality condition. △ Less

Submitted 14 September, 2017; originally announced September 2017.

Comments: Research Report of Dept. of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague

Report number: CTU--CMP--2017--05

arXiv:1203.3526 [pdf]

Primal View on Belief Propagation

Authors: Tomas Werner

Abstract: It is known that fixed points of loopy belief propagation (BP) correspond to stationary points of the Bethe variational problem, where we minimize the Bethe free energy subject to normalization and marginalization constraints. Unfortunately, this does not entirely explain BP because BP is a dual rather than primal algorithm to solve the Bethe variational problem -- beliefs are infeasible before co… ▽ More It is known that fixed points of loopy belief propagation (BP) correspond to stationary points of the Bethe variational problem, where we minimize the Bethe free energy subject to normalization and marginalization constraints. Unfortunately, this does not entirely explain BP because BP is a dual rather than primal algorithm to solve the Bethe variational problem -- beliefs are infeasible before convergence. Thus, we have no better understanding of BP than as an algorithm to seek for a common zero of a system of non-linear functions, not explicitly related to each other. In this theoretical paper, we show that these functions are in fact explicitly related -- they are the partial derivatives of a single function of reparameterizations. That means, BP seeks for a stationary point of a single function, without any constraints. This function has a very natural form: it is a linear combination of local log-partition functions, exactly as the Bethe entropy is the same linear combination of local entropies. △ Less

Submitted 15 March, 2012; originally announced March 2012.

Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

Report number: UAI-P-2010-PG-651-657

arXiv:1112.5298 [pdf, ps, other]

Zero-Temperature Limit of a Convergent Algorithm to Minimize the Bethe Free Energy

Authors: Tomas Werner

Abstract: After the discovery that fixed points of loopy belief propagation coincide with stationary points of the Bethe free energy, several researchers proposed provably convergent algorithms to directly minimize the Bethe free energy. These algorithms were formulated only for non-zero temperature (thus finding fixed points of the sum-product algorithm) and their possible extension to zero temperature is… ▽ More After the discovery that fixed points of loopy belief propagation coincide with stationary points of the Bethe free energy, several researchers proposed provably convergent algorithms to directly minimize the Bethe free energy. These algorithms were formulated only for non-zero temperature (thus finding fixed points of the sum-product algorithm) and their possible extension to zero temperature is not obvious. We present the zero-temperature limit of the double-loop algorithm by Heskes, which converges a max-product fixed point. The inner loop of this algorithm is max-sum diffusion. Under certain conditions, the algorithm combines the complementary advantages of the max-product belief propagation and max-sum diffusion (LP relaxation): it yields good approximation of both ground states and max-marginals. △ Less

Submitted 22 December, 2011; originally announced December 2011.

Comments: Research Report

Report number: CTU--CMP--2011--14

Showing 1–20 of 20 results for author: Werner, T