Search | arXiv e-print repository

Statistical Multicriteria Benchmarking via the GSD-Front

Authors: Christoph Jansen, Georg Schollmeyer, Julian Rodemann, Hannah Blocher, Thomas Augustin

Abstract: Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmar… ▽ More Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmark suite. (3) The robustness of the comparisons under small deviations in the underlying assumptions should be verifiable. To address (1), we propose to compare classifiers using a generalized stochastic dominance ordering (GSD) and present the GSD-front as an information-efficient alternative to the classical Pareto-front. For (2), we propose a consistent statistical estimator for the GSD-front and construct a statistical test for whether a (potentially new) classifier lies in the GSD-front of a set of state-of-the-art classifiers. For (3), we relax our proposed test using techniques from robust statistics and imprecise probabilities. We illustrate our concepts on the benchmark suite PMLB and on the platform OpenML. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: CJ, GS,JR and HB equally contributed to this work

MSC Class: 62G05; 62G35; 62G09; 62G10 ACM Class: G.3

arXiv:2405.15294 [pdf, ps, other]

Semi-Supervised Learning guided by the Generalized Bayes Rule under Soft Revision

Authors: Stefan Dietrich, Julian Rodemann, Christoph Jansen

Abstract: We provide a theoretical and computational investigation of the Gamma-Maximin method with soft revision, which was recently proposed as a robust criterion for pseudo-label selection (PLS) in semi-supervised learning. Opposed to traditional methods for PLS we use credal sets of priors ("generalized Bayes") to represent the epistemic modeling uncertainty. These latter are then updated by the Gamma-M… ▽ More We provide a theoretical and computational investigation of the Gamma-Maximin method with soft revision, which was recently proposed as a robust criterion for pseudo-label selection (PLS) in semi-supervised learning. Opposed to traditional methods for PLS we use credal sets of priors ("generalized Bayes") to represent the epistemic modeling uncertainty. These latter are then updated by the Gamma-Maximin method with soft revision. We eventually select pseudo-labeled data that are most likely in light of the least favorable distribution from the so updated credal set. We formalize the task of finding optimal pseudo-labeled data w.r.t. the Gamma-Maximin method with soft revision as an optimization problem. A concrete implementation for the class of logistic models then allows us to compare the predictive power of the method with competing approaches. It is observed that the Gamma-Maximin method with soft revision can achieve very promising results, especially when the proportion of labeled data is low. △ Less

Submitted 4 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted at the 11th International Conference on Soft Methods in Probability and Statistics (SMPS) 2024

MSC Class: 62C12 62C10 ACM Class: I.2.6; G.3

arXiv:2312.12839 [pdf, other]

Comparing Machine Learning Algorithms by Union-Free Generic Depth

Authors: Hannah Blocher, Georg Schollmeyer, Malte Nalenz, Christoph Jansen

Abstract: We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) de… ▽ More We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we provide two examples of classifier comparisons on samples of standard benchmark data sets. Our results demonstrate promisingly the wide variety of different analysis approaches based on ufg methods. Furthermore, the examples outline that our approach differs substantially from existing benchmarking approaches, and thus adds a new perspective to the vivid debate on classifier comparison. △ Less

Submitted 21 February, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2304.09872

arXiv:2306.12803 [pdf, other]

Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement

Authors: Christoph Jansen, Georg Schollmeyer, Hannah Blocher, Julian Rodemann, Thomas Augustin

Abstract: Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are pretty common in statistics and machine learning. Nevertheless, it is still understood as an open question how to exploit the entire information encoded in them properly. We address this problem by considering an order based on (sets of) expectations of random variables mappin… ▽ More Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are pretty common in statistics and machine learning. Nevertheless, it is still understood as an open question how to exploit the entire information encoded in them properly. We address this problem by considering an order based on (sets of) expectations of random variables map** into such non-standard spaces. This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine. △ Less

Submitted 4 March, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: Accepted for the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)

MSC Class: 62G10; 62G35

arXiv:2304.09872 [pdf, other]

Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms

Authors: Hannah Blocher, Georg Schollmeyer, Christoph Jansen, Malte Nalenz

Abstract: We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies of depth functions in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-fr… ▽ More We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies of depth functions in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we analyze the distribution of different classifier performances over a sample of standard benchmark data sets. Our results promisingly demonstrate that our approach differs substantially from existing benchmarking approaches and, therefore, adds a new perspective to the vivid debate on the comparison of classifiers. △ Less

Submitted 9 February, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: Accepted to ISIPTA 2023; Forthcoming in: Proceedings of Machine Learning Research

arXiv:2303.01117 [pdf, other]

In all LikelihoodS: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning

Authors: Julian Rodemann, Christoph Jansen, Georg Schollmeyer, Thomas Augustin

Abstract: Self-training is a simple yet effective method within semi-supervised learning. The idea is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these pseudo-labeled data (PLS). In this paper, we aim at rendering PLS more robust towards the involved modeling assumptions. To this end, we propose to select pseudo-label… ▽ More Self-training is a simple yet effective method within semi-supervised learning. The idea is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these pseudo-labeled data (PLS). In this paper, we aim at rendering PLS more robust towards the involved modeling assumptions. To this end, we propose to select pseudo-labeled data that maximize a multi-objective utility function. The latter is constructed to account for different sources of uncertainty, three of which we discuss in more detail: model selection, accumulation of errors and covariate shift. In the absence of second-order information on such uncertainties, we furthermore consider the generic approach of the generalized Bayesian alpha-cut updating rule for credal sets. As a practical proof of concept, we spotlight the application of three of our robust extensions on simulated and real-world data. Results suggest that in particular robustness w.r.t. model choice can lead to substantial accuracy gains. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: 9 pages, 1 figure, under review

arXiv:2212.06832 [pdf, other]

Multi-Target Decision Making under Conditions of Severe Uncertainty

Authors: Christoph Jansen, Georg Schollmeyer, Thomas Augustin

Abstract: The quality of consequences in a decision making problem under (severe) uncertainty must often be compared among different targets (goals, objectives) simultaneously. In addition, the evaluations of a consequence's performance under the various targets often differ in their scale of measurement, classically being either purely ordinal or perfectly cardinal. In this paper, we transfer recent develo… ▽ More The quality of consequences in a decision making problem under (severe) uncertainty must often be compared among different targets (goals, objectives) simultaneously. In addition, the evaluations of a consequence's performance under the various targets often differ in their scale of measurement, classically being either purely ordinal or perfectly cardinal. In this paper, we transfer recent developments from abstract decision theory with incomplete preferential and probabilistic information to this multi-target setting and show how -- by exploiting the (potentially) partial cardinal and partial probabilistic information -- more informative orders for comparing decisions can be given than the Pareto order. We discuss some interesting properties of the proposed orders between decision options and show how they can be concretely computed by linear optimization. We conclude the paper by demonstrating our framework in an artificial (but quite real-world) example in the context of comparing algorithms under different performance measures. △ Less

Submitted 13 December, 2022; originally announced December 2022.

MSC Class: 91-10 ACM Class: G.3

arXiv:2209.01857 [pdf, other]

Statistical Comparisons of Classifiers by Generalized Stochastic Dominance

Authors: Christoph Jansen, Malte Nalenz, Georg Schollmeyer, Thomas Augustin

Abstract: Although being a crucial question for the development of machine learning algorithms, there is still no consensus on how to compare classifiers over multiple data sets with respect to several criteria. Every comparison framework is confronted with (at least) three fundamental challenges: the multiplicity of quality criteria, the multiplicity of data sets and the randomness of the selection of data… ▽ More Although being a crucial question for the development of machine learning algorithms, there is still no consensus on how to compare classifiers over multiple data sets with respect to several criteria. Every comparison framework is confronted with (at least) three fundamental challenges: the multiplicity of quality criteria, the multiplicity of data sets and the randomness of the selection of data sets. In this paper, we add a fresh view to the vivid debate by adopting recent developments in decision theory. Based on so-called preference systems, our framework ranks classifiers by a generalized concept of stochastic dominance, which powerfully circumvents the cumbersome, and often even self-contradictory, reliance on aggregates. Moreover, we show that generalized stochastic dominance can be operationalized by solving easy-to-handle linear programs and moreover statistically tested employing an adapted two-sample observation-randomization test. This yields indeed a powerful framework for the statistical comparison of classifiers over multiple data sets with respect to multiple quality criteria simultaneously. We illustrate and investigate our framework in a simulation study and with a set of standard benchmark data sets. △ Less

Submitted 5 July, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: Accepted for publication in: Journal of Machine Learning Research (JMLR)

MSC Class: 62G10; 62C05

arXiv:2110.12879 [pdf, other]

Information efficient learning of complexly structured preferences: Elicitation procedures and their application to decision making under uncertainty

Authors: Christoph Jansen, Hannah Blocher, Thomas Augustin, Georg Schollmeyer

Abstract: In this paper we propose efficient methods for elicitation of complexly structured preferences and utilize these in problems of decision making under (severe) uncertainty. Based on the general framework introduced in Jansen, Schollmeyer and Augustin (2018, Int. J. Approx. Reason), we now design elicitation procedures and algorithms that enable decision makers to reveal their underlying preference… ▽ More In this paper we propose efficient methods for elicitation of complexly structured preferences and utilize these in problems of decision making under (severe) uncertainty. Based on the general framework introduced in Jansen, Schollmeyer and Augustin (2018, Int. J. Approx. Reason), we now design elicitation procedures and algorithms that enable decision makers to reveal their underlying preference system (i.e. two relations, one encoding the ordinal, the other the cardinal part of the preferences) while having to answer as few as possible simple ranking questions. Here, two different approaches are followed. The first approach directly utilizes the collected ranking data for obtaining the ordinal part of the preferences, while their cardinal part is constructed implicitly by measuring meta data on the decision maker's consideration times. In contrast, the second approach explicitly elicits also the cardinal part of the decision maker's preference system, however, only an approximate version of it. This approximation is obtained by additionally collecting labels of preference strength during the elicitation procedure. For both approaches, we give conditions under which they produce the decision maker's true preference system and investigate how their efficiency can be improved. For the latter purpose, besides data-free approaches, we also discuss ways for effectively guiding the elicitation procedure if data from previous elicitation rounds is available. Finally, we demonstrate how the proposed elicitation methods can be utilized in problems of decision under (severe) uncertainty. Precisely, we show that under certain conditions optimal decisions can be found without fully specifying the preference system. △ Less

Submitted 1 February, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Showing 1–9 of 9 results for author: Jansen, C