-
Fair Tree Classifier using Strong Demographic Parity
Authors:
António Pereira Barata,
Frank W. Takes,
H. Jaap van den Herik,
Cor J. Veenman
Abstract:
When dealing with sensitive data in automated data-driven decision-making, an important concern is to learn predictors with high performance towards a class label, whilst minimising the discrimination towards any sensitive attribute, like gender or race, induced from biased data. A few hybrid tree optimisation criteria exist that combine classification performance and fairness. Although the threshold-free ROC-AUC is the standard for measuring traditional classification model performance, current fair tree classification methods mainly optimise for a fixed threshold on both the classification task and the fairness metric. In this paper, we propose a compound splitting criterion, termed SCAFF (Splitting Criterion AUC for Fairness), which combines threshold-free (i.e., strong) demographic parity with ROC-AUC and easily extends to bagged and boosted tree frameworks. Our method simultaneously leverages multiple sensitive attributes, the values of which may be multicategorical or intersectional, and is tunable with respect to the unavoidable performance-fairness trade-off. In our experiments, we demonstrate how SCAFF generates models with performance and fairness with respect to binary, multicategorical, and multiple sensitive attributes.
Submitted 22 November, 2021; v1 submitted 18 October, 2021;
originally announced October 2021.
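As an illustration only of the kind of compound, threshold-free score this abstract describes, the sketch below computes the ROC-AUC of a set of scores against the class label and against a binary sensitive attribute, then combines the two with a trade-off weight. The function names, the parameter `theta`, and the specific linear combination are assumptions made for illustration; they are not taken from the paper and should not be read as the SCAFF formula itself.

```python
def roc_auc(scores, labels):
    """Threshold-free ROC-AUC: the fraction of (positive, negative) pairs
    in which the positive example gets the higher score (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitive_auc(scores, sensitive):
    """How well the scores separate the two sensitive groups, regardless of
    direction. 0.5 means the groups are indistinguishable by score."""
    auc = roc_auc(scores, sensitive)
    return max(auc, 1 - auc)


def compound_score(scores, y, s, theta):
    """Hypothetical trade-off (an assumption, not the paper's definition):
    reward separation of the class label, penalise separation of the
    sensitive attribute, weighted by theta in [0, 1]."""
    return (1 - theta) * roc_auc(scores, y) - theta * sensitive_auc(scores, s)


scores = [0.1, 0.4, 0.35, 0.8]
y = [0, 0, 1, 1]   # class label
s = [0, 1, 0, 1]   # binary sensitive attribute
print(roc_auc(scores, y), sensitive_auc(scores, s), compound_score(scores, y, s, 0.5))
```

With `theta = 0` the score reduces to plain ROC-AUC on the class label; raising `theta` shifts the weight towards the fairness term, which is one way to picture the tunable performance-fairness trade-off the abstract mentions.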
-
A Bayesian Approach for Accurate Classification-Based Aggregates
Authors:
Q. A. Meertens,
C. G. H. Diks,
H. J. van den Herik,
F. W. Takes
Abstract:
In this paper, we study the accuracy of values aggregated over classes predicted by a classification algorithm. The problem is that the resulting aggregates (e.g., sums of a variable) are known to be biased. The bias can be large even for highly accurate classification algorithms, in particular when dealing with class-imbalanced data. To correct this bias, the algorithm's classification error rates have to be estimated. In this estimation, two issues arise when applying existing bias correction methods. First, inaccuracies in estimating classification error rates have to be taken into account. Second, impermissible estimates, such as a negative estimate for a positive value, have to be dismissed. We show that both issues are relevant in applications where the true labels are known only for a small set of data points. We propose a novel bias correction method using Bayesian inference. The novelty of our method is that it imposes constraints on the model parameters. We show that our method solves the problem of biased classification-based aggregates as well as the two issues above, in the general setting of multi-class classification. In the empirical evaluation, using a binary classifier on a real-world dataset of company tax returns, we show that our method outperforms existing methods in terms of mean squared error.
Submitted 6 February, 2019;
originally announced February 2019.
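The bias this abstract refers to can be made concrete with a small binary example. The sketch below assumes a classifier with known sensitivity and specificity and a true positive-class share `alpha`; all function and parameter names are illustrative. It also shows one classical correction, obtained by inverting the relation, and how that correction can yield an impermissible (negative) estimate. This is not the Bayesian method proposed in the paper, and it is not necessarily one of the existing methods the paper evaluates.

```python
def expected_predicted_rate(alpha, sensitivity, specificity):
    """Expected share of items predicted positive: true positives found
    plus true negatives that are misclassified as positive."""
    return alpha * sensitivity + (1 - alpha) * (1 - specificity)


def inverted_estimate(observed_rate, sensitivity, specificity):
    """Classical correction: solve the relation above for alpha.
    Nothing constrains the result to lie in [0, 1]."""
    return (observed_rate + specificity - 1) / (sensitivity + specificity - 1)


# Class-imbalanced case: 2% true positives, classifier 95% accurate per class.
observed = expected_predicted_rate(0.02, 0.95, 0.95)
print(observed)                                   # about 0.068, far above 0.02
print(inverted_estimate(observed, 0.95, 0.95))    # recovers about 0.02

# With a noisy observed rate (or misestimated error rates) the
# inversion can leave the permissible range entirely.
print(inverted_estimate(0.03, 0.95, 0.95))        # negative
```

In the imbalanced case the uncorrected aggregate overstates the positive share by more than a factor of three even though the classifier is right 95% of the time in each class, which matches the abstract's point that high accuracy does not prevent large bias.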
-
A Data-Driven Supply-Side Approach for Measuring Cross-Border Internet Purchases
Authors:
Q. A. Meertens,
C. G. H. Diks,
H. J. van den Herik,
F. W. Takes
Abstract:
The digital economy is a highly relevant item on the European Union's policy agenda. Cross-border internet purchases are part of the digital economy, but their total value cannot currently be accurately measured or estimated. Traditional approaches based on consumer surveys or business surveys are shown to be inadequate for this purpose, due to language bias and sampling issues, respectively. We address both problems by proposing a novel approach based on supply-side data, namely tax returns. The proposed data-driven record-linkage techniques and machine learning algorithms utilize two additional open data sources: European business registers and internet data. Our main finding is that the value of total cross-border internet purchases within the European Union by Dutch consumers was over EUR 1.3 billion in 2016. This is more than 6 times as high as current estimates. Our finding motivates the implementation of the proposed methodology in other EU member states. Ultimately, it could lead to more accurate estimates of cross-border internet purchases within the entire European Union.
Submitted 17 May, 2018;
originally announced May 2018.
-
Genetic Algorithms for Evolving Computer Chess Programs
Authors:
Eli David,
H. Jaap van den Herik,
Moshe Koppel,
Nathan S. Netanyahu
Abstract:
This paper demonstrates the use of genetic algorithms for evolving: 1) a grandmaster-level evaluation function, and 2) a search mechanism for a chess program, the parameter values of which are initialized randomly. The evaluation function of the program is evolved by learning from databases of (human) grandmaster games. At first, the organisms are evolved to mimic the behavior of human grandmasters, and then these organisms are further improved upon by means of coevolution. The search mechanism is evolved by learning from tactical test suites. Our results show that the evolved program outperforms a two-time world computer chess champion and is on par with the other leading computer chess programs.
Submitted 21 November, 2017;
originally announced November 2017.
-
Simulating Human Grandmasters: Evolution and Coevolution of Evaluation Functions
Authors:
Eli David,
H. Jaap van den Herik,
Moshe Koppel,
Nathan S. Netanyahu
Abstract:
This paper demonstrates the use of genetic algorithms for evolving a grandmaster-level evaluation function for a chess program. This is achieved by combining supervised and unsupervised learning. In the supervised learning phase the organisms are evolved to mimic the behavior of human grandmasters, and in the unsupervised learning phase these evolved organisms are further improved upon by means of coevolution.
While past attempts succeeded in creating a grandmaster-level program by mimicking the behavior of existing computer chess programs, this paper presents the first successful attempt at evolving a state-of-the-art evaluation function by learning only from databases of games played by humans. Our results demonstrate that the evolved program outperforms a two-time World Computer Chess Champion.
Submitted 18 November, 2017;
originally announced November 2017.
-
Improving multivariate Horner schemes with Monte Carlo tree search
Authors:
J. Kuipers,
J. A. M. Vermaseren,
A. Plaat,
H. J. van den Herik
Abstract:
Optimizing the cost of evaluating a polynomial is a classic problem in computer science. For polynomials in one variable, Horner's method provides a scheme for producing a computationally efficient form. For multivariate polynomials it is possible to generalize Horner's method, but this leaves freedom in the order of the variables. Traditionally, greedy schemes like most-occurring variable first are used. This simple textbook algorithm has given remarkably efficient results. Finding better algorithms has proved difficult. In trying to improve upon the greedy scheme we have implemented Monte Carlo tree search, a recent search method from the field of artificial intelligence. This results in better Horner schemes and reduces the cost of evaluating polynomials, sometimes by factors up to two.
Submitted 30 July, 2012;
originally announced July 2012.
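For readers unfamiliar with the univariate case this abstract starts from, the sketch below shows Horner's method next to term-by-term evaluation. It covers only polynomials in one variable; the multivariate generalisation and the variable-ordering search that the paper addresses are not shown. The function names and the coefficient ordering (highest degree first) are choices made for this example.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x using Horner's method.
    coeffs are ordered from the highest-degree term down to the constant.
    Uses one multiplication and one addition per coefficient."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def naive(coeffs, x):
    """Evaluate the same polynomial term by term, recomputing each power."""
    n = len(coeffs) - 1
    return sum(c * x ** (n - i) for i, c in enumerate(coeffs))


# 2x^3 - 6x^2 + 2x - 1 rewritten as ((2x - 6)x + 2)x - 1
coeffs = [2, -6, 2, -1]
print(horner(coeffs, 3))   # 5
print(naive(coeffs, 3))    # 5
```

With several variables, the same nesting can be applied one variable at a time, and the order in which variables are factored out changes the operation count; that ordering is the freedom the abstract describes.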
-
Wikipedia: organisation from a bottom-up approach
Authors:
Sander Spek,
Eric Postma,
H. Jaap van den Herik
Abstract:
Wikipedia can be considered as an extreme form of a self-managing team, as a means of labour division. One could expect that this bottom-up approach, with the absence of top-down organisational control, would lead to chaos, but our analysis shows that this is not the case. In the Dutch Wikipedia, an integrated and coherent data structure is created, while at the same time users succeed in distributing roles by self-selection. Some users focus on an area of expertise, while others edit over the whole encyclopedic range. This constitutes our conclusion that Wikipedia, in general, is a successful example of a self-managing team.
Submitted 11 December, 2006; v1 submitted 15 November, 2006;
originally announced November 2006.