Search | arXiv e-print repository

From knowledge-based to data-driven modeling of fuzzy rule-based systems: A critical reflection

Abstract: This paper briefly elaborates on a development in (applied) fuzzy logic that has taken place in the last couple of decades, namely, the complementation or even replacement of the traditional knowledge-based approach to fuzzy rule-based systems design by a data-driven one. It is argued that the classical rule-based modeling paradigm is actually more amenable to the knowledge-based approach, for whi… ▽ More This paper briefly elaborates on a development in (applied) fuzzy logic that has taken place in the last couple of decades, namely, the complementation or even replacement of the traditional knowledge-based approach to fuzzy rule-based systems design by a data-driven one. It is argued that the classical rule-based modeling paradigm is actually more amenable to the knowledge-based approach, for which it has originally been conceived, while being less apt to data-driven model design. An important reason that prevents fuzzy (rule-based) systems from being leveraged in large-scale applications is the flat structure of rule bases, along with the local nature of fuzzy rules and their limited ability to express complex dependencies between variables. This motivates alternative approaches to fuzzy systems modeling, in which functional dependencies can be represented more flexibly and more compactly in terms of hierarchical structures. △ Less

Submitted 2 December, 2017; originally announced December 2017.

arXiv:1711.10207 [pdf, other]

Learning to Rank based on Analogical Reasoning

Authors: Mohsen Ahmadi Fahandar, Eyke Hüllermeier

Abstract: Object ranking or "learning to rank" is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoni… ▽ More Object ranking or "learning to rank" is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: Given objects $A,B,C,D$, if object $A$ is known to be preferred to $B$, and $C$ relates to $D$ as $A$ relates to $B$, then $C$ is (supposedly) preferred to $D$. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on first experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be specifically interesting in situations in which the objects are coming from different subdomains, and which hence require a kind of knowledge transfer. △ Less

Submitted 28 November, 2017; originally announced November 2017.

Comments: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 8 pages

arXiv:1703.00757 [pdf, other]

Predicting Rankings of Software Verification Competitions

Authors: Mike Czech, Eyke Hüllermeier, Marie-Christine Jakobs, Heike Wehrheim

Abstract: Software verification competitions, such as the annual SV-COMP, evaluate software verification tools with respect to their effectivity and efficiency. Typically, the outcome of a competition is a (possibly category-specific) ranking of the tools. For many applications, such as building portfolio solvers, it would be desirable to have an idea of the (relative) performance of verification tools on a… ▽ More Software verification competitions, such as the annual SV-COMP, evaluate software verification tools with respect to their effectivity and efficiency. Typically, the outcome of a competition is a (possibly category-specific) ranking of the tools. For many applications, such as building portfolio solvers, it would be desirable to have an idea of the (relative) performance of verification tools on a given verification task beforehand, i.e., prior to actually running all tools on the task. In this paper, we present a machine learning approach to predicting rankings of tools on verification tasks. The method builds upon so-called label ranking algorithms, which we complement with appropriate kernels providing a similarity measure for verification tasks. Our kernels employ a graph representation for software source code that mixes elements of control flow and program dependence graphs with abstract syntax trees. Using data sets from SV-COMP, we demonstrate our rank prediction technique to generalize well and achieve a rather high predictive accuracy. In particular, our method outperforms a recently proposed feature-based approach of Demyanova et al. (when applied to rank predictions). △ Less

Submitted 2 March, 2017; originally announced March 2017.

ACM Class: I.2.6; D.2.4

arXiv:1701.09007 [pdf, other]

Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151)

Authors: Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, Ke Yi

Abstract: In April 2016, a community of researchers working in the area of Principles of Data Management (PDM) joined in a workshop at the Dagstuhl Castle in Germany. The workshop was organized jointly by the Executive Committee of the ACM Symposium on Principles of Database Systems (PODS) and the Council of the International Conference on Database Theory (ICDT). The mission of this workshop was to identify… ▽ More In April 2016, a community of researchers working in the area of Principles of Data Management (PDM) joined in a workshop at the Dagstuhl Castle in Germany. The workshop was organized jointly by the Executive Committee of the ACM Symposium on Principles of Database Systems (PODS) and the Council of the International Conference on Database Theory (ICDT). The mission of this workshop was to identify and explore some of the most important research directions that have high relevance to society and to Computer Science today, and where the PDM community has the potential to make significant contributions. This report describes the family of research directions that the workshop focused on from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term. △ Less

Submitted 31 January, 2017; originally announced January 2017.

arXiv:1405.4394 [pdf, other]

Identification of functionally related enzymes by learning-to-rank methods

Authors: Michiel Stock, Thomas Fober, Eyke Hüllermeier, Serghei Glinca, Gerhard Klebe, Tapio Pahikkala, Antti Airola, Bernard De Baets, Willem Waegeman

Abstract: Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the da… ▽ More Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes. △ Less

Submitted 17 May, 2014; originally announced May 2014.

arXiv:1310.4849 [pdf, other]

On the Bayes-optimality of F-measure maximizers

Authors: Willem Waegeman, Krzysztof Dembczynski, Arkadiusz Jachnik, Weiwei Cheng, Eyke Hullermeier

Abstract: The F-measure, which has originally been introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective,… ▽ More The F-measure, which has originally been introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems. △ Less

Submitted 6 March, 2015; v1 submitted 17 October, 2013; originally announced October 2013.

Journal ref: JMLR 15 (2014) 3333-3388

arXiv:1305.0698 [pdf, other]

Learning from Imprecise and Fuzzy Observations: Data Disambiguation through Generalized Loss Minimization

Authors: Eyke Hüllermeier

Abstract: Methods for analyzing or learning from "fuzzy data" have attracted increasing attention in recent years. In many cases, however, existing methods (for precise, non-fuzzy data) are extended to the fuzzy case in an ad-hoc manner, and without carefully considering the interpretation of a fuzzy set when being used for modeling data. Distinguishing between an ontic and an epistemic interpretation of fu… ▽ More Methods for analyzing or learning from "fuzzy data" have attracted increasing attention in recent years. In many cases, however, existing methods (for precise, non-fuzzy data) are extended to the fuzzy case in an ad-hoc manner, and without carefully considering the interpretation of a fuzzy set when being used for modeling data. Distinguishing between an ontic and an epistemic interpretation of fuzzy set-valued data, and focusing on the latter, we argue that a "fuzzification" of learning algorithms based on an application of the generic extension principle is not appropriate. In fact, the extension principle fails to properly exploit the inductive bias underlying statistical and machine learning methods, although this bias, at least in principle, offers a means for "disambiguating" the fuzzy data. Alternatively, we therefore propose a method which is based on the generalization of loss functions in empirical risk minimization, and which performs model identification and data disambiguation simultaneously. Elaborating on the fuzzification of specific types of losses, we establish connections to well-known loss functions in regression and classification. We compare our approach with related methods and illustrate its use in logistic regression for binary classification. △ Less

Submitted 3 May, 2013; originally announced May 2013.

Journal ref: International Journal of Approximate Reasoning, 55(7):1519-1534, 2014

arXiv:1206.6401 [pdf]

Consistent Multilabel Ranking through Univariate Losses

Authors: Krzysztof Dembczynski, Wojciech Kotlowski, Eyke Huellermeier

Abstract: We consider the problem of rank loss minimization in the setting of multilabel classification, which is usually tackled by means of convex surrogate losses defined on pairs of labels. Very recently, this approach was put into question by a negative result showing that commonly used pairwise surrogate losses, such as exponential and logistic losses, are inconsistent. In this paper, we show a positi… ▽ More We consider the problem of rank loss minimization in the setting of multilabel classification, which is usually tackled by means of convex surrogate losses defined on pairs of labels. Very recently, this approach was put into question by a negative result showing that commonly used pairwise surrogate losses, such as exponential and logistic losses, are inconsistent. In this paper, we show a positive result which is arguably surprising in light of the previous one: the simpler univariate variants of exponential and logistic surrogates (i.e., defined on single labels) are consistent for rank loss minimization. Instead of directly proving convergence, we give a much stronger result by deriving regret bounds and convergence rates. The proposed losses suggest efficient and scalable algorithms, which are tested experimentally. △ Less

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

arXiv:1112.0508 [pdf, other]

Label Ranking with Abstention: Predicting Partial Orders by Thresholding Probability Distributions (Extended Abstract)

Authors: Weiwei Cheng, Eyke Hüllermeier

Abstract: We consider an extension of the setting of label ranking, in which the learner is allowed to make predictions in the form of partial instead of total orders. Predictions of that kind are interpreted as a partial abstention: If the learner is not sufficiently certain regarding the relative order of two alternatives, it may abstain from this decision and instead declare these alternatives as being i… ▽ More We consider an extension of the setting of label ranking, in which the learner is allowed to make predictions in the form of partial instead of total orders. Predictions of that kind are interpreted as a partial abstention: If the learner is not sufficiently certain regarding the relative order of two alternatives, it may abstain from this decision and instead declare these alternatives as being incomparable. We propose a new method for learning to predict partial orders that improves on an existing approach, both theoretically and empirically. Our method is based on the idea of thresholding the probabilities of pairwise preferences between labels as induced by a predicted (parameterized) probability distribution on the set of all rankings. △ Less

Submitted 2 December, 2011; originally announced December 2011.

Comments: 4 pages, 1 figure, appeared at NIPS 2011 Choice Models and Preference Learning workshop

Showing 101–109 of 109 results for author: Hüllermeier, E