Matched Pair Calibration for Ranking Fairness
Authors:
Hannah Korevaar,
Chris McConnell,
Edmund Tong,
Erik Brinkman,
Alana Shine,
Misam Abbas,
Blossom Metevier,
Sam Corbett-Davies,
Khalid El-Arini
Abstract:
We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups before computing an appropriate measure of ranking error over the set. The matching step ensures that we compare subgroup outcomes between identically scored items so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking and connect our approach to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we provide an example of applying matched pair calibration to a real-world ranking data set to demonstrate its efficacy in detecting ranking bias.
Submitted 30 November, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
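The matching-then-measuring idea in the abstract of "Matched Pair Calibration for Ranking Fairness" can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Item` fields, the two group labels "A" and "B", the score tolerance `tol`, the greedy matching rule, and the use of a mean outcome gap as the measure of ranking error.

```python
from dataclasses import dataclass


@dataclass
class Item:
    score: float    # model score used for ranking (hypothetical field)
    group: str      # subgroup label, "A" or "B" (hypothetical labels)
    outcome: float  # observed outcome, e.g. 1.0 if the item was engaged with


def matched_pairs(items, tol=1e-3):
    """Greedily pair each group-A item with an unused group-B item
    whose score lies within `tol` of it (an assumed matching rule)."""
    a = sorted((i for i in items if i.group == "A"), key=lambda i: i.score)
    b = sorted((i for i in items if i.group == "B"), key=lambda i: i.score)
    pairs, j = [], 0
    for x in a:
        # Skip B items scored too low to match x or any later A item.
        while j < len(b) and b[j].score < x.score - tol:
            j += 1
        if j < len(b) and abs(b[j].score - x.score) <= tol:
            pairs.append((x, b[j]))
            j += 1  # each B item is used at most once
    return pairs


def mean_outcome_gap(pairs):
    """Average of outcome(A) - outcome(B) over the matched pairs.
    A value far from 0 suggests equally scored items perform
    differently across the two groups."""
    if not pairs:
        raise ValueError("no matched pairs found")
    return sum(x.outcome - y.outcome for x, y in pairs) / len(pairs)


items = [
    Item(0.9, "A", 1.0), Item(0.9, "B", 0.0),
    Item(0.5, "A", 1.0), Item(0.5, "B", 1.0),
    Item(0.2, "A", 0.0),  # no group-B item with a nearby score
]
pairs = matched_pairs(items)
gap = mean_outcome_gap(pairs)
```

Because both items in a pair carry (nearly) the same score, a persistent gap cannot be attributed to the score itself, which is the intuition the abstract describes. A real analysis would also need an uncertainty estimate for the gap, which this sketch omits.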
Cumulative differences between paired samples
Authors:
Isabel Kloumann,
Hannah Korevaar,
Chris McConnell,
Mark Tygert,
Jessica Zhao
Abstract:
The simplest, most common paired samples consist of observations from two populations, with each observed response from one population corresponding to an observed response from the other population at the same value of an ordinal covariate. The pair of observed responses (one from each population) at the same value of the covariate is known as a "matched pair" (with the matching based on the value of the covariate). A graph of cumulative differences between the two populations reveals differences in responses as a function of the covariate. Indeed, the slope of the secant line connecting two points on the graph becomes the average difference over the wide interval of values of the covariate between the two points; i.e., the slope of the graph is the average difference in responses. ("Average" refers to the weighted average if the samples are weighted.) Moreover, a simple statistic known as the Kuiper metric summarizes into a single scalar the overall differences over all values of the covariate. The Kuiper metric is the absolute value of the total difference in responses between the two populations, totaled over the interval of values of the covariate for which the absolute value of the total is greatest. The total should be normalized such that it becomes the (weighted) average over all values of the covariate when the interval over which the total is taken is the entire range of the covariate (i.e., the sum for the total gets divided by the total number of observations, if the samples are unweighted, or divided by the total weight, if the samples are weighted). This cumulative approach is fully nonparametric and uniquely defined (with only one right way to construct the graphs and scalar summary statistics), unlike traditional methods such as reliability diagrams or parametric or semi-parametric regressions, which typically obscure significant differences due to their parameter settings.
Submitted 8 April, 2024; v1 submitted 18 May, 2023;
originally announced May 2023.
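The cumulative differences and the Kuiper metric described in the abstract of "Cumulative differences between paired samples" can be sketched in a few lines. This is an illustrative reading of the abstract rather than the authors' code; the function names, the argument layout, and the choice of a leading zero in the cumulative sequence are assumptions.

```python
from itertools import accumulate


def cumulative_differences(covariate, resp_a, resp_b, weights=None):
    """Return the normalized running total of weighted paired differences,
    ordered by covariate. Entry k is the total over the first k pairs
    divided by the total weight, and entry 0 is 0.0."""
    n = len(covariate)
    if weights is None:
        weights = [1.0] * n  # unweighted case: divide by the number of pairs
    order = sorted(range(n), key=lambda k: covariate[k])
    total_w = sum(weights)
    steps = [weights[k] * (resp_a[k] - resp_b[k]) / total_w for k in order]
    return [0.0] + list(accumulate(steps))


def kuiper(cum):
    """Largest absolute total difference over any contiguous interval of
    covariate values, computed as the range of the cumulative sequence."""
    return max(cum) - min(cum)


# Four matched pairs, given in arbitrary covariate order.
covariate = [3, 1, 2, 4]
resp_a = [1.0, 0.0, 1.0, 0.0]
resp_b = [0.0, 0.0, 0.0, 1.0]

cum = cumulative_differences(covariate, resp_a, resp_b)
stat = kuiper(cum)
```

The total over pairs i+1 through k equals `cum[k] - cum[i]`, so the maximum absolute total over all intervals is the maximum of the sequence minus its minimum. The last entry of `cum` is the (weighted) average difference over the whole range of the covariate, matching the normalization stated in the abstract.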