Search | arXiv e-print repository

Fair Tree Classifier using Strong Demographic Parity

Authors: António Pereira Barata, Frank W. Takes, H. Jaap van den Herik, Cor J. Veenman

Abstract: When dealing with sensitive data in automated data-driven decision-making, an important concern is to learn predictors with high performance towards a class label, whilst minimising for the discrimination towards any sensitive attribute, like gender or race, induced from biased data. A few hybrid tree optimisation criteria exist that combine classification performance and fairness. Although the threshold-free ROC-AUC is the standard for measuring traditional classification model performance, current fair tree classification methods mainly optimise for a fixed threshold on both the classification task as well as the fairness metric. In this paper, we propose a compound splitting criterion which combines threshold-free (i.e., strong) demographic parity with ROC-AUC termed SCAFF -- Splitting Criterion AUC for Fairness -- and easily extends to bagged and boosted tree frameworks. Our method simultaneously leverages multiple sensitive attributes of which the values may be multicategorical or intersectional, and is tunable with respect to the unavoidable performance-fairness trade-off. In our experiments, we demonstrate how SCAFF generates models with performance and fairness with respect to binary, multicategorical, and multiple sensitive attributes.

Submitted 22 November, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

arXiv:1902.02412 [pdf, other]

doi 10.1137/1.9781611975673.35

A Bayesian Approach for Accurate Classification-Based Aggregates

Authors: Q. A. Meertens, C. G. H. Diks, H. J. van den Herik, F. W. Takes

Abstract: In this paper, we study the accuracy of values aggregated over classes predicted by a classification algorithm. The problem is that the resulting aggregates (e.g., sums of a variable) are known to be biased. The bias can be large even for highly accurate classification algorithms, in particular when dealing with class-imbalanced data. To correct this bias, the algorithm's classification error rates have to be estimated. In this estimation, two issues arise when applying existing bias correction methods. First, inaccuracies in estimating classification error rates have to be taken into account. Second, impermissible estimates, such as a negative estimate for a positive value, have to be dismissed. We show that both issues are relevant in applications where the true labels are known only for a small set of data points. We propose a novel bias correction method using Bayesian inference. The novelty of our method is that it imposes constraints on the model parameters. We show that our method solves the problem of biased classification-based aggregates as well as the two issues above, in the general setting of multi-class classification. In the empirical evaluation, using a binary classifier on a real-world dataset of company tax returns, we show that our method outperforms existing methods in terms of mean squared error.

Submitted 6 February, 2019; originally announced February 2019.

Comments: 9 pages, 5 figures, accepted conference paper, SIAM International Conference on Data Mining 2019 (SDM19)

arXiv:1805.06930 [pdf, other]

doi 10.1111/rssa.12487

A Data-Driven Supply-Side Approach for Measuring Cross-Border Internet Purchases

Authors: Q. A. Meertens, C. G. H. Diks, H. J. van den Herik, F. W. Takes

Abstract: The digital economy is a highly relevant item on the European Union's policy agenda. Cross-border internet purchases are part of the digital economy, but their total value can currently not be accurately measured or estimated. Traditional approaches based on consumer surveys or business surveys are shown to be inadequate for this purpose, due to language bias and sampling issues, respectively. We address both problems by proposing a novel approach based on supply-side data, namely tax returns. The proposed data-driven record-linkage techniques and machine learning algorithms utilize two additional open data sources: European business registers and internet data. Our main finding is that the value of total cross-border internet purchases within the European Union by Dutch consumers was over EUR 1.3 billion in 2016. This is more than 6 times as high as current estimates. Our finding motivates the implementation of the proposed methodology in other EU member states. Ultimately, it could lead to more accurate estimates of cross-border internet purchases within the entire European Union.

Submitted 17 May, 2018; originally announced May 2018.

Comments: 27 pages, 5 figures, submitted to Journal of the Royal Statistical Society, Series A (Statistics in Society)

arXiv:1711.08337 [pdf, ps, other]

doi 10.1109/TEVC.2013.2285111

Genetic Algorithms for Evolving Computer Chess Programs

Authors: Eli David, H. Jaap van den Herik, Moshe Koppel, Nathan S. Netanyahu

Abstract: This paper demonstrates the use of genetic algorithms for evolving: 1) a grandmaster-level evaluation function, and 2) a search mechanism for a chess program, the parameter values of which are initialized randomly. The evaluation function of the program is evolved by learning from databases of (human) grandmaster games. At first, the organisms are evolved to mimic the behavior of human grandmasters, and then these organisms are further improved upon by means of coevolution. The search mechanism is evolved by learning from tactical test suites. Our results show that the evolved program outperforms a two-time world computer chess champion and is at par with the other leading computer chess programs.

Submitted 21 November, 2017; originally announced November 2017.

Comments: Winner of Gold Award in 11th Annual "Humies" Awards for Human-Competitive Results. arXiv admin note: substantial text overlap with arXiv:1711.06840, arXiv:1711.06841, arXiv:1711.06839

Journal ref: IEEE Transactions on Evolutionary Computation, Vol. 18, No. 5, pp. 779-789, September 2014

arXiv:1711.06840 [pdf, ps, other]

doi 10.1145/1569901.1570100

Simulating Human Grandmasters: Evolution and Coevolution of Evaluation Functions

Authors: Eli David, H. Jaap van den Herik, Moshe Koppel, Nathan S. Netanyahu

Abstract: This paper demonstrates the use of genetic algorithms for evolving a grandmaster-level evaluation function for a chess program. This is achieved by combining supervised and unsupervised learning. In the supervised learning phase the organisms are evolved to mimic the behavior of human grandmasters, and in the unsupervised learning phase these evolved organisms are further improved upon by means of coevolution. While past attempts succeeded in creating a grandmaster-level program by mimicking the behavior of existing computer chess programs, this paper presents the first successful attempt at evolving a state-of-the-art evaluation function by learning only from databases of games played by humans. Our results demonstrate that the evolved program outperforms a two-time World Computer Chess Champion.

Submitted 18 November, 2017; originally announced November 2017.

Comments: arXiv admin note: substantial text overlap with arXiv:1711.06839, arXiv:1711.06841

Journal ref: ACM Genetic and Evolutionary Computation Conference (GECCO), pages 1483-1489, Montreal, Canada, July 2009

arXiv:1207.7079 [pdf, ps, other]

doi 10.1016/j.cpc.2013.05.008

Improving multivariate Horner schemes with Monte Carlo tree search

Authors: J. Kuipers, J. A. M. Vermaseren, A. Plaat, H. J. van den Herik

Abstract: Optimizing the cost of evaluating a polynomial is a classic problem in computer science. For polynomials in one variable, Horner's method provides a scheme for producing a computationally efficient form. For multivariate polynomials it is possible to generalize Horner's method, but this leaves freedom in the order of the variables. Traditionally, greedy schemes like most-occurring variable first are used. This simple textbook algorithm has given remarkably efficient results. Finding better algorithms has proved difficult. In trying to improve upon the greedy scheme we have implemented Monte Carlo tree search, a recent search method from the field of artificial intelligence. This results in better Horner schemes and reduces the cost of evaluating polynomials, sometimes by factors up to two.

Submitted 30 July, 2012; originally announced July 2012.

Comments: 5 pages

arXiv:cs/0611068 [pdf, ps, other]

Wikipedia: organisation from a bottom-up approach

Authors: Sander Spek, Eric Postma, H. Jaap van den Herik

Abstract: Wikipedia can be considered as an extreme form of a self-managing team, as a means of labour division. One could expect that this bottom-up approach, with the absence of top-down organisational control, would lead to chaos, but our analysis shows that this is not the case. In the Dutch Wikipedia, an integrated and coherent data structure is created, while at the same time users succeed in distributing roles by self-selection. Some users focus on an area of expertise, while others edit over the whole encyclopedic range. This constitutes our conclusion that Wikipedia, in general, is a successful example of a self-managing team.

Submitted 11 December, 2006; v1 submitted 15 November, 2006; originally announced November 2006.

Comments: Presented at the Research in Wikipedia workshop of WikiSym 2006

ACM Class: H.3.7; H.4.3

Showing 1–7 of 7 results for author: Herik, H J v d