doi 10.1145/3132847.3132938

FA*IR: A Fair Top-k Ranking Algorithm

Authors: Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, Ricardo Baeza-Yates

Abstract: In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n >> k candidates, maximizing utility (i.e., select the "best" candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected groups and is based on ensuring that the proportion of protected candidates in every prefix of the top-k ranking remains statistically above or indistinguishable from a given minimum. Utility is operationalized in two ways: (i) every candidate included in the top-k should be more qualified than every candidate not included; and (ii) for every pair of candidates in the top-k, the more qualified candidate should be ranked above the other. An efficient algorithm is presented for producing the Fair Top-k Ranking, and tested experimentally on existing datasets as well as new datasets released with this paper, showing that our approach yields small distortions with respect to rankings that maximize utility without considering fairness criteria. To the best of our knowledge, this is the first algorithm grounded in statistical tests that can mitigate biases in the representation of an under-represented group along a ranked list.

Submitted 2 July, 2018; v1 submitted 20 June, 2017; originally announced June 2017.

Comments: In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM'17). This version corrects an error in Table 4

ACM Class: H.3.3; J.1
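The ranked group fairness criterion in the FA*IR abstract above can be made concrete with a one-sided binomial test. The sketch below is an illustration under stated assumptions, not the authors' implementation: we assume a prefix of length i containing τ protected candidates passes when the binomial lower-tail probability F(τ; i, p) exceeds the significance level α, where p is the target minimum proportion; no multiple-testing correction is applied; and candidates are hypothetical (name, score) pairs. The greedy step fills each position with the best remaining candidate, unless the prefix would fall below the required protected count, in which case it takes the best remaining protected candidate.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P[X <= x] for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(x + 1))


def min_protected_table(k: int, p: float, alpha: float) -> list[int]:
    """Entry i-1 = fewest protected candidates a prefix of length i may hold
    without the one-sided binomial test (level alpha) rejecting proportion p.
    Assumption of this sketch: no correction for testing k prefixes."""
    table = []
    for i in range(1, k + 1):
        m = 0
        # Smallest m whose lower-tail probability exceeds alpha.
        while binom_cdf(m, i, p) <= alpha:
            m += 1
        table.append(m)
    return table


def fair_top_k(protected, non_protected, k, p, alpha):
    """Greedy sketch; candidates are hypothetical (name, score) pairs."""
    need = min_protected_table(k, p, alpha)
    prot = sorted(protected, key=lambda c: c[1], reverse=True)
    nonp = sorted(non_protected, key=lambda c: c[1], reverse=True)
    ranking, i_p, i_n = [], 0, 0  # i_p also counts protected already placed
    for pos in range(k):
        has_p, has_n = i_p < len(prot), i_n < len(nonp)
        if i_p < need[pos]:
            # Prefix of length pos+1 would fail the test: force a protected pick.
            if not has_p:
                raise ValueError("not enough protected candidates")
            take_protected = True
        elif has_p and has_n:
            # Constraint already met: take the higher-scored queue head.
            take_protected = prot[i_p][1] > nonp[i_n][1]
        elif has_p or has_n:
            take_protected = has_p
        else:
            raise ValueError("fewer than k candidates")
        if take_protected:
            ranking.append(prot[i_p])
            i_p += 1
        else:
            ranking.append(nonp[i_n])
            i_n += 1
    return ranking


non = [("n1", 0.9), ("n2", 0.8), ("n3", 0.7), ("n4", 0.6), ("n5", 0.5)]
pro = [("p1", 0.3), ("p2", 0.2)]
print(min_protected_table(6, 0.5, 0.1))  # [0, 0, 0, 1, 1, 1]
print([name for name, _ in fair_top_k(pro, non, 6, 0.5, 0.1)])
# ['n1', 'n2', 'n3', 'p1', 'n4', 'n5']
```

Within each group the output stays sorted by score, which is how this sketch respects utility criterion (ii) for same-group pairs. Ties in score go to the non-protected candidate; that tie-break is an arbitrary choice of the sketch.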

arXiv:1510.00552

doi 10.1007/s41060-016-0040-z

Exposing the Probabilistic Causal Structure of Discrimination

Authors: Francesco Bonchi, Sara Hajian, Bud Mishra, Daniele Ramazzotti

Abstract: Discrimination discovery from data is an important task that aims to identify patterns of illegal and unethical discriminatory activities against groups protected by law, e.g., ethnic minorities. While any legally valid proof of discrimination requires evidence of causality, the state-of-the-art methods are essentially correlation-based, even though, as is well known, correlation does not imply causation. In this paper we take a principled causal approach to the data mining problem of discrimination detection in databases. Following Suppes' probabilistic causation theory, we define a method to extract, from a dataset of historical decision records, the causal structures existing among the attributes in the data. The result is a type of constrained Bayesian network, which we dub Suppes-Bayes Causal Network (SBCN). Next, we develop a toolkit of methods based on random walks on top of the SBCN, addressing different anti-discrimination legal concepts, such as direct and indirect discrimination, group and individual discrimination, genuine requirement, and favoritism. Our experiments on real-world datasets confirm the inferential power of our approach in all these different tasks.

Submitted 8 March, 2017; v1 submitted 2 October, 2015; originally announced October 2015.
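Suppes' theory, which the SBCN abstract above builds on, calls c a prima facie cause of e when c occurs before e and raises its probability: P(e | c) > P(e | ¬c). A minimal sketch of that filtering step follows, assuming binary attributes stored as dicts of booleans and a caller-supplied temporal order over attributes (the `order` argument and all attribute names are hypothetical). It only lists candidate edges; the constrained Bayesian network fitting and the random-walk toolkit described in the abstract are not shown.

```python
def cond_prob(records, effect, cause, cause_value):
    """P(effect | cause == cause_value), or None if no record has that cause value."""
    rows = [r for r in records if r[cause] == cause_value]
    if not rows:
        return None
    return sum(1 for r in rows if r[effect]) / len(rows)


def prima_facie_edges(records, order):
    """Candidate edges c -> e where c precedes e in the assumed `order`
    and c raises the probability of e."""
    edges = []
    for i, cause in enumerate(order):
        for effect in order[i + 1:]:
            p_with = cond_prob(records, effect, cause, True)
            p_without = cond_prob(records, effect, cause, False)
            if p_with is None or p_without is None:
                continue  # P(c) = 0 or P(not c) = 0: probability raising undefined
            if p_with > p_without:
                edges.append((cause, effect))
    return edges


# Hypothetical toy decision records.
records = [
    {"female": True, "low_income": True, "denied": True},
    {"female": True, "low_income": False, "denied": True},
    {"female": False, "low_income": True, "denied": False},
    {"female": False, "low_income": False, "denied": False},
]
order = ["female", "low_income", "denied"]
print(prima_facie_edges(records, order))  # [('female', 'denied')]
```

In this toy data only `female` raises the probability of `denied` (1.0 versus 0.0); the other two pairs have equal conditional probabilities (0.5 versus 0.5) and are filtered out.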

arXiv:1306.6805

Simultaneous Discrimination Prevention and Privacy Protection in Data Publishing and Mining

Authors: Sara Hajian

Abstract: Data mining is an increasingly important technology for extracting useful knowledge hidden in large collections of data. There are, however, negative social perceptions about data mining, among them potential privacy violation and potential discrimination. Automated data collection and data mining techniques such as classification have paved the way to making automated decisions, such as loan granting/denial and insurance premium computation. If the training datasets are biased with regard to discriminatory attributes such as gender, race, or religion, discriminatory decisions may ensue. In the first part of this thesis, we tackle discrimination prevention in data mining and propose new techniques for preventing direct or indirect discrimination individually, or both at the same time. We discuss how to clean training datasets and outsourced datasets in such a way that direct and/or indirect discriminatory decision rules are converted to legitimate (non-discriminatory) classification rules. In the second part of this thesis, we argue that privacy and discrimination risks should be tackled together. We explore the relationship between privacy-preserving data mining and discrimination prevention in data mining to design holistic approaches capable of addressing both threats simultaneously during the knowledge discovery process. As part of this effort, we have investigated for the first time the problem of discrimination- and privacy-aware frequent pattern discovery, i.e., the sanitization of the collection of patterns mined from a transaction database in such a way that neither privacy-violating nor discriminatory inferences can be drawn from the released patterns. Moreover, we investigate the problem of discrimination- and privacy-aware data publishing, i.e., transforming the data, instead of patterns, in order to simultaneously fulfill privacy preservation and discrimination prevention.

Submitted 28 June, 2013; originally announced June 2013.

Comments: PhD Thesis defended on June 10, 2013, at the Department of Computer Engineering and Mathematics of Universitat Rovira i Virgili. Advisors: Josep Domingo-Ferrer and Dino Pedreschi

MSC Class: 68P15; 68P20; 68P99

ACM Class: K.4.1; H.2.8
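The thesis abstract above speaks of converting discriminatory decision rules into legitimate ones, but does not say how a rule is judged discriminatory. One measure commonly used in this literature, assumed here rather than taken from the abstract, is extended lift: for a rule A, B → C with protected itemset A and context B, elift = conf(A, B → C) / conf(B → C), and the rule counts as α-discriminatory when elift ≥ α. A minimal sketch over records stored as dicts follows; all attribute names, values, and the threshold α are hypothetical.

```python
def confidence(records, antecedent, consequent):
    """conf(X -> Y): share of records matching X that also match Y."""
    covered = [r for r in records if all(r.get(k) == v for k, v in antecedent.items())]
    if not covered:
        return None  # rule not supported by any record
    hits = sum(all(r.get(k) == v for k, v in consequent.items()) for r in covered)
    return hits / len(covered)


def elift(records, protected, context, outcome):
    """Extended lift of the rule (protected, context) -> outcome."""
    base = confidence(records, context, outcome)
    full = confidence(records, {**protected, **context}, outcome)
    if base is None or full is None or base == 0:
        return None  # undefined without support or when conf(B -> C) is 0
    return full / base


# Hypothetical toy credit decisions.
records = [
    {"sex": "F", "city": "NYC", "credit": "deny"},
    {"sex": "F", "city": "NYC", "credit": "deny"},
    {"sex": "M", "city": "NYC", "credit": "deny"},
    {"sex": "M", "city": "NYC", "credit": "grant"},
]
alpha = 1.2  # hypothetical threshold
value = elift(records, {"sex": "F"}, {"city": "NYC"}, {"credit": "deny"})
print(value, value >= alpha)  # about 1.33, True
```

Here conf(city=NYC → deny) is 0.75 while conf(sex=F, city=NYC → deny) is 1.0, so adding the protected attribute raises the denial rate by a factor of about 1.33.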

Showing 1–3 of 3 results for author: Hajian, S