Assessment of Uncertainty Quantification in Universal Differential Equations

Authors: Nina Schmid, David Fernandes del Pozo, Willem Waegeman, Jan Hasenauer

Abstract: Scientific Machine Learning is a new class of approaches that integrate physical knowledge and mechanistic models with data-driven techniques for uncovering governing equations of complex processes. Among the available approaches, Universal Differential Equations (UDEs) are used to combine prior knowledge in the form of mechanistic formulations with universal function approximators, like neural networks. Integral to the efficacy of UDEs is the joint estimation of parameters within mechanistic formulations and the universal function approximators using empirical data. The robustness and applicability of resultant models, however, hinge upon the rigorous quantification of uncertainties associated with these parameters, as well as the predictive capabilities of the overall model or its constituent components. With this work, we provide a formalisation of uncertainty quantification (UQ) for UDEs and investigate important frequentist and Bayesian methods. By analysing three synthetic examples of varying complexity, we evaluate the validity and efficiency of ensembles, variational inference and Markov chain Monte Carlo sampling as epistemic UQ methods for UDEs.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Shared last authorship between W.W. and J.H

arXiv:2402.09056 [pdf, other]

Is Epistemic Uncertainty Faithfully Represented by Evidential Deep Learning Methods?

Authors: Mira Jürgens, Nis Meinert, Viktor Bengs, Eyke Hüllermeier, Willem Waegeman

Abstract: Trustworthy ML systems should not only return accurate predictions, but also a reliable representation of their uncertainty. Bayesian methods are commonly used to quantify both aleatoric and epistemic uncertainty, but alternative approaches, such as evidential deep learning methods, have become popular in recent years. The latter group of methods in essence extends empirical risk minimization (ERM) for predicting second-order probability distributions over outcomes, from which measures of epistemic (and aleatoric) uncertainty can be extracted. This paper presents novel theoretical insights of evidential deep learning, highlighting the difficulties in optimizing second-order loss functions and interpreting the resulting epistemic uncertainty measures. With a systematic setup that covers a wide range of approaches for classification, regression and counts, it provides novel insights into issues of identifiability and convergence in second-order loss minimization, and the relative (rather than absolute) nature of epistemic uncertainty measures.

Submitted 20 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2309.08313 [pdf, other]

Conditional validity of heteroskedastic conformal regression

Authors: Nicolas Dewolf, Bernard De Baets, Willem Waegeman

Abstract: Conformal prediction, and split conformal prediction as a specific implementation, offer a distribution-free approach to estimating prediction intervals with statistical guarantees. Recent work has shown that split conformal prediction can produce state-of-the-art prediction intervals when focusing on marginal coverage, i.e. on a calibration dataset the method produces on average prediction intervals that contain the ground truth with a predefined coverage level. However, such intervals are often not adaptive, which can be problematic for regression problems with heteroskedastic noise. This paper tries to shed new light on how prediction intervals can be constructed, using methods such as normalized and Mondrian conformal prediction, in such a way that they adapt to the heteroskedasticity of the underlying process. Theoretical and experimental results are presented in which these methods are compared in a systematic way. In particular, it is shown how the conditional validity of a chosen conformal predictor can be related to (implicit) assumptions about the data-generating distribution.

Submitted 30 April, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 36 pages

arXiv:2301.12736 [pdf, ps, other]

On Second-Order Scoring Rules for Epistemic Uncertainty Quantification

Authors: Viktor Bengs, Eyke Hüllermeier, Willem Waegeman

Abstract: It is well known that accurate probabilistic predictors can be trained through empirical risk minimisation with proper scoring rules as loss functions. While such learners capture so-called aleatoric uncertainty of predictions, various machine learning methods have recently been developed with the goal to let the learner also represent its epistemic uncertainty, i.e., the uncertainty caused by a lack of knowledge and data. An emerging branch of the literature proposes the use of a second-order learner that provides predictions in terms of distributions on probability distributions. However, recent work has revealed serious theoretical shortcomings for second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: There seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. As a main mathematical tool to prove this result, we introduce the generalised notion of second-order scoring rules.

Submitted 30 January, 2023; originally announced January 2023.

MSC Class: 68T37 (Primary) 68T30 (Secondary)

arXiv:2211.04362 [pdf, other]

Hyperparameter optimization in deep multi-target prediction

Authors: Dimitrios Iliadis, Marcel Wever, Bernard De Baets, Willem Waegeman

Abstract: As a result of the ever increasing complexity of configuring and fine-tuning machine learning models, the field of automated machine learning (AutoML) has emerged over the past decade. However, software implementations like Auto-WEKA and Auto-sklearn typically focus on classical machine learning (ML) tasks such as classification and regression. Our work can be seen as the first attempt at offering a single AutoML framework for most problem settings that fall under the umbrella of multi-target prediction, which includes popular ML settings such as multi-label classification, multivariate regression, multi-task learning, dyadic prediction, matrix completion, and zero-shot learning. Automated problem selection and model configuration are achieved by extending DeepMTP, a general deep learning framework for MTP problem settings, with popular hyperparameter optimization (HPO) methods. Our extensive benchmarking across different datasets and MTP problem settings identifies cases where specific HPO methods outperform others.

Submitted 8 November, 2022; originally announced November 2022.

Comments: 17 pages, 4 figures, 1 table

arXiv:2205.10082 [pdf, other]

On the Calibration of Probabilistic Classifier Sets

Authors: Thomas Mortier, Viktor Bengs, Eyke Hüllermeier, Stijn Luca, Willem Waegeman

Abstract: Multi-class classification methods that produce sets of probabilistic classifiers, such as ensemble learning methods, are able to model aleatoric and epistemic uncertainty. Aleatoric uncertainty is then typically quantified via the Bayes error, and epistemic uncertainty via the size of the set. In this paper, we extend the notion of calibration, which is commonly used to evaluate the validity of the aleatoric uncertainty representation of a single probabilistic classifier, to assess the validity of an epistemic uncertainty representation obtained by sets of probabilistic classifiers. Broadly speaking, we call a set of probabilistic classifiers calibrated if one can find a calibrated convex combination of these classifiers. To evaluate this notion of calibration, we propose a novel nonparametric calibration test that generalizes an existing test for single probabilistic classifiers to the case of sets of probabilistic classifiers. Making use of this test, we empirically show that ensembles of deep neural networks are often not well calibrated.

Submitted 19 April, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

arXiv:2203.06676 [pdf, other]

Set-valued prediction in hierarchical classification with constrained representation complexity

Authors: Thomas Mortier, Eyke Hüllermeier, Krzysztof Dembczyński, Willem Waegeman

Abstract: Set-valued prediction is a well-known concept in multi-class classification. When a classifier is uncertain about the class label for a test instance, it can predict a set of classes instead of a single class. In this paper, we focus on hierarchical multi-class classification problems, where valid sets (typically) correspond to internal nodes of the hierarchy. We argue that this is a very strong restriction, and we propose a relaxation by introducing the notion of representation complexity for a predicted set. In combination with probabilistic classifiers, this leads to a challenging inference problem for which specific combinatorial optimization algorithms are needed. We propose three methods and evaluate them on benchmark datasets: a naïve approach that is based on matrix-vector multiplication, a reformulation as a knapsack problem with conflict graph, and a recursive tree search method. Experimental results demonstrate that the last method is computationally more efficient than the other two approaches, due to a hierarchical factorization of the conditional class distribution.

Submitted 13 March, 2022; originally announced March 2022.

arXiv:2203.06102 [pdf, other]

Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation

Authors: Viktor Bengs, Eyke Hüllermeier, Willem Waegeman

Abstract: Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner's (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.

Submitted 13 October, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

MSC Class: 68T37 (Primary) 68T30 (Secondary)

arXiv:2107.00363 [pdf, other]

doi 10.1007/s10462-022-10178-5

Valid prediction intervals for regression problems

Authors: Nicolas Dewolf, Bernard De Baets, Willem Waegeman

Abstract: Over the last few decades, various methods have been proposed for estimating prediction intervals in regression settings, including Bayesian methods, ensemble methods, direct interval estimation methods and conformal prediction methods. An important issue is the calibration of these methods: the generated prediction intervals should have a predefined coverage level, without being overly conservative. In this work, we review the above four classes of methods from a conceptual and experimental point of view. Results on benchmark data sets from various domains highlight large fluctuations in performance from one data set to another. These observations can be attributed to the violation of certain assumptions that are inherent to some classes of methods. We illustrate how conformal prediction can be used as a general calibration procedure for methods that deliver poor results without a calibration step.

Submitted 1 April, 2024; v1 submitted 1 July, 2021; originally announced July 2021.

Comments: Minor correction (bibliography and typo in Fig. 3). Thanks to Dr. María Moreno de Castro for spotting this typo

arXiv:2104.09967 [pdf, other]

Multi-target prediction for dummies using two-branch neural networks

Authors: Dimitrios Iliadis, Bernard De Baets, Willem Waegeman

Abstract: Multi-target prediction (MTP) serves as an umbrella term for machine learning tasks that concern the simultaneous prediction of multiple target variables. Classical instantiations are multi-label classification, multivariate regression, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. Despite the significant similarities, all these domains have evolved separately into distinct research areas over the last two decades. This led to the development of a plethora of highly-engineered methods, and created a substantially-high entrance barrier for machine learning practitioners that are not experts in the field. In this work we present a generic deep learning methodology that can be used for a wide range of multi-target prediction problems. We introduce a flexible multi-branch neural network architecture, partially configured via a questionnaire that helps end-users to select a suitable MTP problem setting for their needs. Experimental results for a wide range of domains illustrate that the proposed methodology manifests a competitive performance compared to methods from specific MTP domains.

Submitted 25 October, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

arXiv:1910.09457 [pdf, other]

doi 10.1007/s10994-021-05946-3

Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods

Authors: Eyke Hüllermeier, Willem Waegeman

Abstract: The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.

Submitted 16 September, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

Comments: 59 pages

arXiv:1906.08129 [pdf, other]

Efficient Set-Valued Prediction in Multi-Class Classification

Authors: Thomas Mortier, Marek Wydmuch, Krzysztof Dembczyński, Eyke Hüllermeier, Willem Waegeman

Abstract: In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between the correctness (the true class is among the candidates) and the precision (the candidates are not too many) of its prediction. We formalize this problem within a general decision-theoretic framework that unifies most of the existing work in this area. In this framework, uncertainty is quantified in terms of conditional class probabilities, and the quality of a predicted set is measured in terms of a utility function. We then address the problem of finding the Bayes-optimal prediction, i.e., the subset of class labels with highest expected utility. For this problem, which is computationally challenging as there are exponentially (in the number of classes) many predictions to choose from, we propose efficient algorithms that can be applied to a broad family of utility functions. Our theoretical results are complemented by experimental studies, in which we analyze the proposed algorithms in terms of predictive accuracy and runtime efficiency.

Submitted 27 May, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

arXiv:1809.02352 [pdf, other]

Multi-Target Prediction: A Unifying View on Problems and Methods

Authors: Willem Waegeman, Krzysztof Dembczynski, Eyke Huellermeier

Abstract: Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

Submitted 7 September, 2018; originally announced September 2018.

arXiv:1803.01575 [pdf, other]

A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression

Authors: Michiel Stock, Tapio Pahikkala, Antti Airola, Bernard De Baets, Willem Waegeman

Abstract: Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction or network inference problems. During the last decade kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify existing kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency and spectral filtering properties. Our theoretical results provide valuable insights in assessing the advantages and limitations of existing pairwise learning methods.

Submitted 5 March, 2018; originally announced March 2018.

Comments: arXiv admin note: text overlap with arXiv:1606.04275

arXiv:1606.04278 [pdf, other]

doi 10.1007/s10618-016-0456-z

Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models

Authors: Michiel Stock, Krzysztof Dembczynski, Bernard De Baets, Willem Waegeman

Abstract: Many complex multi-target prediction problems that concern large target spaces are characterised by a need for efficient prediction strategies that avoid the computation of predictions for all targets explicitly. Examples of such problems emerge in several subfields of machine learning, such as collaborative filtering, multi-label classification, dyadic prediction and biological network inference. In this article we analyse efficient and exact algorithms for computing the top-$K$ predictions in the above problem settings, using a general class of models that we refer to as separable linear relational models. We show how to use those inference algorithms, which are modifications of well-known information retrieval methods, in a variety of machine learning settings. Furthermore, we study the possibility of scoring items incompletely, while still retaining an exact top-K retrieval. Experimental results in several application domains reveal that the so-called threshold algorithm is very scalable, performing often many orders of magnitude more efficiently than the naive approach.

Submitted 14 June, 2016; originally announced June 2016.

Journal ref: Data Min Knowl Disc (2016) 30:1370-1394

arXiv:1606.04275 [pdf, other]

Efficient Pairwise Learning Using Kernel Ridge Regression: an Exact Two-Step Method

Authors: Michiel Stock, Tapio Pahikkala, Antti Airola, Bernard De Baets, Willem Waegeman

Abstract: Pairwise learning or dyadic prediction concerns the prediction of properties for pairs of objects. It can be seen as an umbrella covering various machine learning problems such as matrix completion, collaborative filtering, multi-task learning, transfer learning, network prediction and zero-shot learning. In this work we analyze kernel-based methods for pairwise learning, with a particular focus on a recently-suggested two-step method. We show that this method offers an appealing alternative for commonly-applied Kronecker-based methods that model dyads by means of pairwise feature representations and pairwise kernels. In a series of theoretical results, we establish correspondences between the two types of methods in terms of linear algebra and spectral filtering, and we analyze their statistical consistency. In addition, the two-step method allows us to establish novel algorithmic shortcuts for efficient training and validation on very large datasets. Putting those properties together, we believe that this simple, yet powerful method can become a standard tool for many problems. Extensive experimental results for a range of practical settings are reported.

Submitted 14 June, 2016; originally announced June 2016.

arXiv:1506.05950 [pdf, ps, other]

Spectral Analysis of Symmetric and Anti-Symmetric Pairwise Kernels

Authors: Tapio Pahikkala, Markus Viljanen, Antti Airola, Willem Waegeman

Abstract: We consider the problem of learning regression functions from pairwise data when there exists prior knowledge that the relation to be learned is symmetric or anti-symmetric. Such prior knowledge is commonly enforced by symmetrizing or anti-symmetrizing pairwise kernel functions. Through spectral analysis, we show that these transformations reduce the kernel's effective dimension. Further, we provide an analysis of the approximation properties of the resulting kernels, and bound the regularization bias of the kernels in terms of the corresponding bias of the original kernel.

Submitted 19 June, 2015; originally announced June 2015.

arXiv:1405.4423 [pdf, other]

A two-step learning approach for solving full and almost full cold start problems in dyadic prediction

Authors: Tapio Pahikkala, Michiel Stock, Antti Airola, Tero Aittokallio, Bernard De Baets, Willem Waegeman

Abstract: Dyadic prediction methods operate on pairs of objects (dyads), aiming to infer labels for out-of-sample dyads. We consider the full and almost full cold start problem in dyadic prediction, a setting that occurs when both objects in an out-of-sample dyad have not been observed during training, or if one of them has been observed, but very few times. A popular approach for addressing this problem is to train a model that makes predictions based on a pairwise feature representation of the dyads, or, in case of kernel methods, based on a tensor product pairwise kernel. As an alternative to such a kernel approach, we introduce a novel two-step learning algorithm that borrows ideas from the fields of pairwise learning and spectral filtering. We show theoretically that the two-step method is very closely related to the tensor product kernel approach, and experimentally that it yields a slightly better predictive performance. Moreover, unlike existing tensor product kernel methods, the two-step method allows closed-form solutions for training and parameter selection via cross-validation estimates both in the full and almost full cold start settings, making the approach much more efficient and straightforward to implement.

Submitted 17 May, 2014; originally announced May 2014.

arXiv:1405.4394 [pdf, other]

Identification of functionally related enzymes by learning-to-rank methods

Authors: Michiel Stock, Thomas Fober, Eyke Hüllermeier, Serghei Glinca, Gerhard Klebe, Tapio Pahikkala, Antti Airola, Bernard De Baets, Willem Waegeman

Abstract: Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

Submitted 17 May, 2014; originally announced May 2014.

arXiv:1310.4849 [pdf, other]

On the Bayes-optimality of F-measure maximizers

Authors: Willem Waegeman, Krzysztof Dembczynski, Arkadiusz Jachnik, Weiwei Cheng, Eyke Hüllermeier

Abstract: The F-measure, which has originally been introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.

Submitted 6 March, 2015; v1 submitted 17 October, 2013; originally announced October 2013.

Journal ref: JMLR 15 (2014) 3333-3388

arXiv:1209.4825 [pdf, ps, other]

Efficient Regularized Least-Squares Algorithms for Conditional Ranking on Relational Data

Authors: Tapio Pahikkala, Antti Airola, Michiel Stock, Bernard De Baets, Willem Waegeman

Abstract: In domains like bioinformatics, information retrieval and social network analysis, one can find learning tasks where the goal consists of inferring a ranking of objects, conditioned on a particular target object. We present a general kernel framework for learning conditional rankings from various types of relational data, where rankings can be conditioned on unseen data objects. We propose efficient algorithms for conditional ranking by optimizing squared regression and ranking loss functions. We show theoretically, that learning with the ranking loss is likely to generalize better than with the regression loss. Further, we prove that symmetry or reciprocity properties of relations can be efficiently enforced in the learned models. Experiments on synthetic and real-world data illustrate that the proposed methods deliver state-of-the-art performance in terms of predictive power and computational efficiency. Moreover, we also show empirically that incorporating symmetry or reciprocity properties can improve the generalization performance.

Submitted 8 June, 2013; v1 submitted 21 September, 2012; originally announced September 2012.

arXiv:1111.6473 [pdf, other]

doi 10.1109/TFUZZ.2012.2194151

A kernel-based framework for learning graded relations from data

Authors: Willem Waegeman, Tapio Pahikkala, Antti Airola, Tapio Salakoski, Michiel Stock, Bernard De Baets

Abstract: Driven by a large number of potential applications in areas like bioinformatics, information retrieval and social network analysis, the problem setting of inferring relations between pairs of data objects has recently been investigated quite intensively in the machine learning community. To this end, current approaches typically consider datasets containing crisp relations, so that standard classification methods can be adopted. However, relations between objects like similarities and preferences are often expressed in a graded manner in real-world applications. A general kernel-based framework for learning relations from data is introduced here. It extends existing approaches because both crisp and graded relations are considered, and it unifies existing approaches because different types of graded relations can be modeled, including symmetric and reciprocal relations. This framework establishes important links between recent developments in fuzzy set theory and machine learning. Its usefulness is demonstrated through various experiments on synthetic and real-world data.

Submitted 28 November, 2011; originally announced November 2011.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Journal ref: IEEE Transactions on Fuzzy Systems, Volume: 20, Issue: 6, Dec. 2012, pages 1090 - 1101

Showing 1–22 of 22 results for author: Waegeman, W