-
Synthetic Census Data Generation via Multidimensional Multiset Sum
Authors:
Cynthia Dwork,
Kristjan Greenewald,
Manish Raghavan
Abstract:
The US Decennial Census provides valuable data for both research and policy purposes. Census data are subject to a variety of disclosure avoidance techniques prior to release in order to preserve respondent confidentiality. While many are interested in studying the impacts of disclosure avoidance methods on downstream analyses, particularly with the introduction of differential privacy in the 2020…
▽ More
The US Decennial Census provides valuable data for both research and policy purposes. Census data are subject to a variety of disclosure avoidance techniques prior to release in order to preserve respondent confidentiality. While many are interested in studying the impacts of disclosure avoidance methods on downstream analyses, particularly with the introduction of differential privacy in the 2020 Decennial Census, these efforts are limited by a critical lack of data: The underlying "microdata," which serve as necessary input to disclosure avoidance methods, are kept confidential.
In this work, we aim to address this limitation by providing tools to generate synthetic microdata solely from published Census statistics, which can then be used as input to any number of disclosure avoidance algorithms for the sake of evaluation and carrying out comparisons. We define a principled distribution over microdata given published Census statistics and design algorithms to sample from this distribution. We formulate synthetic data generation in this context as a knapsack-style combinatorial optimization problem and develop novel algorithms for this setting. While the problem we study is provably hard, we show empirically that our methods work well in practice, and we offer theoretical arguments to explain our performance. Finally, we verify that the data we produce are "close" to the desired ground truth.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Equilibria, Efficiency, and Inequality in Network Formation for Hiring and Opportunity
Authors:
Cynthia Dwork,
Chris Hays,
Jon Kleinberg,
Manish Raghavan
Abstract:
Professional networks -- the social networks among people in a given line of work -- can serve as a conduit for job prospects and other opportunities. Here we propose a model for the formation of such networks and the transfer of opportunities within them. In our theoretical model, individuals strategically connect with others to maximize the probability that they receive opportunities from them.…
▽ More
Professional networks -- the social networks among people in a given line of work -- can serve as a conduit for job prospects and other opportunities. Here we propose a model for the formation of such networks and the transfer of opportunities within them. In our theoretical model, individuals strategically connect with others to maximize the probability that they receive opportunities from them. We explore how professional networks balance connectivity, where connections facilitate opportunity transfers to those who did not get them from outside sources, and congestion, where some individuals receive too many opportunities from their connections and waste some of them.
We show that strategic individuals are over-connected at equilibrium relative to a social optimum, leading to a price of anarchy for which we derive nearly tight asymptotic bounds. We also show that, at equilibrium, individuals form connections to those who provide similar benefit to them as they provide to others. Thus, our model provides a microfoundation in professional networking contexts for the fundamental sociological principle of homophily, that "similarity breeds connection," which in our setting is realized as a form of status homophily based on alignment in individual benefit. We further explore how, even if individuals are a priori equally likely to receive opportunities from outside sources, equilibria can be unequal, and we provide nearly tight bounds on how unequal they can be. Finally, we explore the ability for online platforms to intervene to improve social welfare and show that natural heuristics may result in adverse effects at equilibrium. Our simple model allows for a surprisingly rich analysis of coordination problems in professional networks and suggests many directions for further exploration.
△ Less
Submitted 27 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Human Expertise in Algorithmic Prediction
Authors:
Rohan Alur,
Manish Raghavan,
Devavrat Shah
Abstract:
We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which `look the same' to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information -- particularly subjective inform…
▽ More
We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which `look the same' to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information -- particularly subjective information -- which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex-ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.
△ Less
Submitted 22 May, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Content Moderation and the Formation of Online Communities: A Theoretical Framework
Authors:
Cynthia Dwork,
Chris Hays,
Jon Kleinberg,
Manish Raghavan
Abstract:
We study the impact of content moderation policies in online communities. In our theoretical model, a platform chooses a content moderation policy and individuals choose whether or not to participate in the community according to the fraction of user content that aligns with their preferences. The effects of content moderation, at first blush, might seem obvious: it restricts speech on a platform.…
▽ More
We study the impact of content moderation policies in online communities. In our theoretical model, a platform chooses a content moderation policy and individuals choose whether or not to participate in the community according to the fraction of user content that aligns with their preferences. The effects of content moderation, at first blush, might seem obvious: it restricts speech on a platform. However, when user participation decisions are taken into account, its effects can be more subtle $\unicode{x2013}$ and counter-intuitive. For example, our model can straightforwardly demonstrate how moderation policies may increase participation and diversify content available on the platform. In our analysis, we explore a rich set of interconnected phenomena related to content moderation in online communities. We first characterize the effectiveness of a natural class of moderation policies for creating and sustaining stable communities. Building on this, we explore how resource-limited or ideological platforms might set policies, how communities are affected by differing levels of personalization, and competition between platforms. Our model provides a vocabulary and mathematically tractable framework for analyzing platform decisions about content moderation.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
Reconciling the accuracy-diversity trade-off in recommendations
Authors:
Kenny Peng,
Manish Raghavan,
Emma Pierson,
Jon Kleinberg,
Nikhil Garg
Abstract:
In recommendation settings, there is an apparent trade-off between the goals of accuracy (to recommend items a user is most likely to want) and diversity (to recommend items representing a range of categories). As such, real-world recommender systems often explicitly incorporate diversity separately from accuracy. This approach, however, leaves a basic question unanswered: Why is there a trade-off…
▽ More
In recommendation settings, there is an apparent trade-off between the goals of accuracy (to recommend items a user is most likely to want) and diversity (to recommend items representing a range of categories). As such, real-world recommender systems often explicitly incorporate diversity separately from accuracy. This approach, however, leaves a basic question unanswered: Why is there a trade-off in the first place?
We show how the trade-off can be explained via a user's consumption constraints -- users typically only consume a few of the items they are recommended. In a stylized model we introduce, objectives that account for this constraint induce diverse recommendations, while objectives that do not account for this constraint induce homogeneous recommendations. This suggests that accuracy and diversity appear misaligned because standard accuracy metrics do not consider consumption constraints. Our model yields precise and interpretable characterizations of diversity in different settings, giving practical insights into the design of diverse recommendations.
△ Less
Submitted 27 July, 2023;
originally announced July 2023.
-
Auditing for Human Expertise
Authors:
Rohan Alur,
Loren Laine,
Darrick K. Li,
Manish Raghavan,
Devavrat Shah,
Dennis Shung
Abstract:
High-stakes prediction tasks (e.g., patient diagnosis) are often handled by trained human experts. A common source of concern about automation in these settings is that experts may exercise intuition that is difficult to model and/or have access to information (e.g., conversations with a patient) that is simply unavailable to a would-be algorithm. This raises a natural question whether human exper…
▽ More
High-stakes prediction tasks (e.g., patient diagnosis) are often handled by trained human experts. A common source of concern about automation in these settings is that experts may exercise intuition that is difficult to model and/or have access to information (e.g., conversations with a patient) that is simply unavailable to a would-be algorithm. This raises a natural question whether human experts add value which could not be captured by an algorithmic predictor. We develop a statistical framework under which we can pose this question as a natural hypothesis test. Indeed, as our framework highlights, detecting human expertise is more subtle than simply comparing the accuracy of expert predictions to those made by a particular learning algorithm. Instead, we propose a simple procedure which tests whether expert predictions are statistically independent from the outcomes of interest after conditioning on the available inputs (`features'). A rejection of our test thus suggests that human experts may add value to any algorithm trained on the available data, and has direct implications for whether human-AI `complementarity' is achievable in a given prediction task. We highlight the utility of our procedure using admissions data collected from the emergency department of a large academic hospital system, where we show that physicians' admit/discharge decisions for patients with acute gastrointestinal bleeding (AGIB) appear to be incorporating information that is not available to a standard algorithmic screening tool. This is despite the fact that the screening tool is arguably more accurate than physicians' discretionary decisions, highlighting that -- even absent normative concerns about accountability or interpretability -- accuracy is insufficient to justify algorithmic automation.
△ Less
Submitted 27 October, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection
Authors:
Chris Hays,
Zachary Schutzman,
Manish Raghavan,
Erin Walk,
Philipp Zimmer
Abstract:
Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection…
▽ More
Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near perfect performance for classification on existing datasets, suggesting bot detection is accurate, reliable and fit for use in downstream applications. We provide evidence that this is not the case and show that high performance is attributable to limitations in dataset collection and labeling rather than sophistication of the tools. Specifically, we show that simple decision rules -- shallow decision trees trained on a small number of features -- achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets. Our findings reveal that predictions are highly dependent on each dataset's collection and labeling procedures rather than fundamental differences between bots and humans. These results have important implications for both transparency in sampling and labeling procedures and potential biases in research using existing bot detection tools for pre-processing.
△ Less
Submitted 1 May, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
The Right to be an Exception to a Data-Driven Rule
Authors:
Sarah H. Cen,
Manish Raghavan
Abstract:
Data-driven tools are increasingly used to make consequential decisions. They have begun to advise employers on which job applicants to interview, judges on which defendants to grant bail, lenders on which homeowners to give loans, and more. In such settings, different data-driven rules result in different decisions. The problem is: to every data-driven rule, there are exceptions. While a data-dri…
▽ More
Data-driven tools are increasingly used to make consequential decisions. They have begun to advise employers on which job applicants to interview, judges on which defendants to grant bail, lenders on which homeowners to give loans, and more. In such settings, different data-driven rules result in different decisions. The problem is: to every data-driven rule, there are exceptions. While a data-driven rule may be appropriate for some, it may not be appropriate for all. As data-driven decisions become more common, there are cases in which it becomes necessary to protect the individuals who, through no fault of their own, are the data-driven exceptions. At the same time, it is impossible to scrutinize every one of the increasing number of data-driven decisions, begging the question: When and how should data-driven exceptions be protected?
In this piece, we argue that individuals have the right to be an exception to a data-driven rule. That is, the presumption should not be that a data-driven rule--even one with high accuracy--is suitable for an arbitrary decision-subject of interest. Rather, a decision-maker should apply the rule only if they have exercised due care and due diligence (relative to the risk of harm) in excluding the possibility that the decision-subject is an exception to the data-driven rule. In some cases, the risk of harm may be so low that only cursory consideration is required. Although applying due care and due diligence is meaningful in human-driven decision contexts, it is unclear what it means for a data-driven rule to do so. We propose that determining whether a data-driven rule is suitable for a given decision-subject requires the consideration of three factors: individualization, uncertainty, and harm. We unpack this right in detail, providing a framework for assessing data-driven rules and describing what it would mean to invoke the right in practice.
△ Less
Submitted 28 December, 2022;
originally announced December 2022.
-
The Challenge of Understanding What Users Want: Inconsistent Preferences and Engagement Optimization
Authors:
Jon Kleinberg,
Sendhil Mullainathan,
Manish Raghavan
Abstract:
Online platforms have a wealth of data, run countless experiments and use industrial-scale algorithms to optimize user experience. Despite this, many users seem to regret the time they spend on these platforms. One possible explanation is misaligned incentives: platforms are not optimizing for user happiness. We suggest the problem runs deeper, transcending the specific incentives of any particula…
▽ More
Online platforms have a wealth of data, run countless experiments and use industrial-scale algorithms to optimize user experience. Despite this, many users seem to regret the time they spend on these platforms. One possible explanation is misaligned incentives: platforms are not optimizing for user happiness. We suggest the problem runs deeper, transcending the specific incentives of any particular platform, and instead stems from a mistaken foundational assumption: To understand what users want, platforms look at what users do. Yet research has demonstrated, and personal experience affirms, that we often make choices in the moment that are inconsistent with what we actually want. In this work, we develop a model of media consumption where users have inconsistent preferences. We consider a platform which simply wants to maximize user utility, but only observes user engagement. We show how our model of users' preference inconsistencies produces phenomena that are familiar from everyday experience, but difficult to capture in traditional user interaction models. A key ingredient in our model is a formulation for how platforms determine what to show users: they optimize over a large set of potential content (the content manifold) parametrized by underlying features of the content. Whether improving engagement improves user welfare depends on the direction of movement in the content manifold: for certain directions of change, increasing engagement makes users less happy, while in other directions, increasing engagement makes users happier. We characterize the structure of content manifolds for which increasing engagement fails to increase user utility. By linking these effects to abstractions of platform design choices, our model thus creates a theoretical framework and vocabulary in which to explore interactions between design, behavioral science, and social media.
△ Less
Submitted 23 October, 2023; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Stochastic Model for Sunk Cost Bias
Authors:
Jon Kleinberg,
Sigal Oren,
Manish Raghavan,
Nadav Sklar
Abstract:
We present a novel model for capturing the behavior of an agent exhibiting sunk-cost bias in a stochastic environment. Agents exhibiting sunk-cost bias take into account the effort they have already spent on an endeavor when they evaluate whether to continue or abandon it. We model planning tasks in which an agent with this type of bias tries to reach a designated goal. Our model structures this p…
▽ More
We present a novel model for capturing the behavior of an agent exhibiting sunk-cost bias in a stochastic environment. Agents exhibiting sunk-cost bias take into account the effort they have already spent on an endeavor when they evaluate whether to continue or abandon it. We model planning tasks in which an agent with this type of bias tries to reach a designated goal. Our model structures this problem as a type of Markov decision process: loosely speaking, the agent traverses a directed acyclic graph with probabilistic transitions, paying costs for its actions as it tries to reach a target node containing a specified reward. The agent's sunk cost bias is modeled by a cost that it incurs for abandoning the traversal: if the agent decides to stop traversing the graph, it incurs a cost of $λ\cdot C_{sunk}$, where ${λ\geq 0}$ is a parameter that captures the extent of the bias and $C_{sunk}$ is the sum of costs already invested.
We analyze the behavior of two types of agents: naive agents that are unaware of their bias, and sophisticated agents that are aware of it. Since optimal (bias-free) behavior in this problem can involve abandoning the traversal before reaching the goal, the bias exhibited by these types of agents can result in sub-optimal behavior by shifting their decisions about abandonment. We show that in contrast to optimal agents, it is computationally hard to compute the optimal policy for a sophisticated agent. Our main results quantify the loss exhibited by these two types of agents with respect to an optimal agent. We present both general and topology-specific bounds.
△ Less
Submitted 21 June, 2021;
originally announced June 2021.
-
Fairness On The Ground: Applying Algorithmic Fairness Approaches to Production Systems
Authors:
Chloé Bakalar,
Renata Barreto,
Stevie Bergman,
Miranda Bogen,
Bobbie Chern,
Sam Corbett-Davies,
Melissa Hall,
Isabel Kloumann,
Michelle Lam,
Joaquin Quiñonero Candela,
Manish Raghavan,
Joshua Simons,
Jonathan Tannen,
Edmund Tong,
Kate Vredenburgh,
Jie**g Zhao
Abstract:
Many technical approaches have been proposed for ensuring that decisions made by machine learning systems are fair, but few of these proposals have been stress-tested in real-world systems. This paper presents an example of one team's approach to the challenge of applying algorithmic fairness approaches to complex production systems within the context of a large technology company. We discuss how…
▽ More
Many technical approaches have been proposed for ensuring that decisions made by machine learning systems are fair, but few of these proposals have been stress-tested in real-world systems. This paper presents an example of one team's approach to the challenge of applying algorithmic fairness approaches to complex production systems within the context of a large technology company. We discuss how we disentangle normative questions of product and policy design (like, "how should the system trade off between different stakeholders' interests and needs?") from empirical questions of system implementation (like, "is the system achieving the desired tradeoff in practice?"). We also present an approach for answering questions of the latter sort, which allows us to measure how machine learning systems and human labelers are making these tradeoffs across different relevant groups. We hope our experience integrating fairness tools and approaches into large-scale and complex production systems will be useful to other practitioners facing similar challenges, and illuminating to academics and researchers looking to better address the needs of practitioners.
△ Less
Submitted 24 March, 2021; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Algorithmic Monoculture and Social Welfare
Authors:
Jon Kleinberg,
Manish Raghavan
Abstract:
As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, lending, and other domains, concerns have been raised about the effects of algorithmic monoculture, in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the risk of severe harm from unexpected shocks. Here we s…
▽ More
As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, lending, and other domains, concerns have been raised about the effects of algorithmic monoculture, in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the risk of severe harm from unexpected shocks. Here we show that the dangers of algorithmic monoculture run much deeper, in that monocultural convergence on a single algorithm by a group of decision-making agents, even when the algorithm is more accurate for any one agent in isolation, can reduce the overall quality of the decisions being made by the full collection of agents. Unexpected shocks are therefore not needed to expose the risks of monoculture; it can hurt accuracy even under "normal" operations, and even for algorithms that are more accurate when used by only a single decision-maker. Our results rely on minimal assumptions, and involve the development of a probabilistic framework for analyzing systems that use multiple noisy estimates of a set of alternatives.
△ Less
Submitted 1 June, 2021; v1 submitted 14 January, 2021;
originally announced January 2021.
-
Bridging Machine Learning and Mechanism Design towards Algorithmic Fairness
Authors:
Jessie Finocchiaro,
Roland Maio,
Faidra Monachou,
Gourab K Patro,
Manish Raghavan,
Ana-Andreea Stoica,
Stratis Tsirtsis
Abstract:
Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance; one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choic…
▽ More
Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance; one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choice, advertising) incorporate machine-learned predictions in their pipelines, raising concerns about potential strategic behavior or constrained allocation, concerns usually tackled in the context of mechanism design. Although both machine learning and mechanism design have developed frameworks for addressing issues of fairness and equity, in some complex decision-making systems, neither framework is individually sufficient. In this paper, we develop the position that building fair decision-making systems requires overcoming these limitations which, we argue, are inherent to each field. Our ultimate objective is to build an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning. We begin to lay the ground work towards this goal by comparing the perspective each discipline takes on fair decision-making, teasing out the lessons each field has taught and can teach the other, and highlighting application domains that require a strong collaboration between these disciplines.
△ Less
Submitted 4 March, 2021; v1 submitted 11 October, 2020;
originally announced October 2020.
-
An operational architecture for privacy-by-design in public service applications
Authors:
Prashant Agrawal,
Anubhutie Singh,
Malavika Raghavan,
Subodh Sharma,
Subhashis Banerjee
Abstract:
Governments around the world are trying to build large data registries for effective delivery of a variety of public services. However, these efforts are often undermined due to serious concerns over privacy risks associated with collection and processing of personally identifiable information. While a rich set of special-purpose privacy-preserving techniques exist in computer science, they are un…
▽ More
Governments around the world are trying to build large data registries for effective delivery of a variety of public services. However, these efforts are often undermined due to serious concerns over privacy risks associated with collection and processing of personally identifiable information. While a rich set of special-purpose privacy-preserving techniques exist in computer science, they are unable to provide end-to-end protection in alignment with legal principles in the absence of an overarching operational architecture to ensure purpose limitation and protection against insider attacks. This either leads to weak privacy protection in large designs, or adoption of overly defensive strategies to protect privacy by compromising on utility.
In this paper, we present an operational architecture for privacy-by-design based on independent regulatory oversight stipulated by most data protection regimes, regulated access control, purpose limitation and data minimisation. We briefly discuss the feasibility of implementing our architecture based on existing techniques. We also present some sample case studies of privacy-preserving design sketches of challenging public service applications.
△ Less
Submitted 8 June, 2020;
originally announced June 2020.
-
Greedy Algorithm almost Dominates in Smoothed Contextual Bandits
Authors:
Manish Raghavan,
Aleksandrs Slivkins,
Jennifer Wortman Vaughan,
Zhiwei Steven Wu
Abstract:
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future. While necessary in the worst case, explicit exploration has a number of disadvantages compared to the greedy algorithm that alway…
▽ More
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future. While necessary in the worst case, explicit exploration has a number of disadvantages compared to the greedy algorithm that always "exploits" by choosing an action that currently looks optimal. We ask under what conditions inherent diversity in the data makes explicit exploration unnecessary. We build on a recent line of work on the smoothed analysis of the greedy algorithm in the linear contextual bandits model. We improve on prior results to show that a greedy approach almost matches the best possible Bayesian regret rate of any other algorithm on the same problem instance whenever the diversity conditions hold, and that this regret is at most $\tilde O(T^{1/3})$.
△ Less
Submitted 27 December, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons
Authors:
Solon Barocas,
Andrew D. Selbst,
Manish Raghavan
Abstract:
Counterfactual explanations are gaining prominence within technical, legal, and business circles as a way to explain the decisions of a machine learning model. These explanations share a trait with the long-established "principal reason" explanations required by U.S. credit laws: they both explain a decision by highlighting a set of features deemed most relevant--and withholding others.
These "f…
▽ More
Counterfactual explanations are gaining prominence within technical, legal, and business circles as a way to explain the decisions of a machine learning model. These explanations share a trait with the long-established "principal reason" explanations required by U.S. credit laws: they both explain a decision by highlighting a set of features deemed most relevant--and withholding others.
These "feature-highlighting explanations" have several desirable properties: They place no constraints on model complexity, do not require model disclosure, detail what needed to be different to achieve a different decision, and seem to automate compliance with the law. But they are far more complex and subjective than they appear.
In this paper, we demonstrate that the utility of feature-highlighting explanations relies on a number of easily overlooked assumptions: that the recommended change in feature values clearly maps to real-world actions, that features can be made commensurate by looking only at the distribution of the training data, that features are only relevant to the decision at hand, and that the underlying model is stable over time, monotonic, and limited to binary outcomes.
We then explore several consequences of acknowledging and attempting to address these assumptions, including a paradox in the way that feature-highlighting explanations aim to respect autonomy, the unchecked power that feature-highlighting explanations grant decision makers, and a tension between making these explanations useful and the need to keep the model hidden.
While new research suggests several ways that feature-highlighting explanations can work around some of the problems that we identify, the disconnect between features in the model and actions in the real world--and the subjective choices necessary to compensate for this--must be understood before these techniques can be usefully implemented.
△ Less
Submitted 10 December, 2019;
originally announced December 2019.
-
Roles for Computing in Social Change
Authors:
Rediet Abebe,
Solon Barocas,
Jon Kleinberg,
Karen Levy,
Manish Raghavan,
David G. Robinson
Abstract:
A recent normative turn in computer science has brought concerns about fairness, bias, and accountability to the core of the field. Yet recent scholarship has warned that much of this technical work treats problematic features of the status quo as fixed, and fails to address deeper patterns of injustice and inequality. While acknowledging these critiques, we posit that computational research has v…
▽ More
A recent normative turn in computer science has brought concerns about fairness, bias, and accountability to the core of the field. Yet recent scholarship has warned that much of this technical work treats problematic features of the status quo as fixed, and fails to address deeper patterns of injustice and inequality. While acknowledging these critiques, we posit that computational research has valuable roles to play in addressing social problems -- roles whose value can be recognized even from a perspective that aspires toward fundamental social change. In this paper, we articulate four such roles, through an analysis that considers the opportunities as well as the significant risks inherent in such work. Computing research can serve as a diagnostic, hel** us to understand and measure social problems with precision and clarity. As a formalizer, computing shapes how social problems are explicitly defined --- changing how those problems, and possible responses to them, are understood. Computing serves as rebuttal when it illuminates the boundaries of what is possible through technical means. And computing acts as synecdoche when it makes long-standing social problems newly salient in the public eye. We offer these paths forward as modalities that leverage the particular strengths of computational work in the service of social change, without overclaiming computing's capacity to solve social problems on its own.
△ Less
Submitted 9 July, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Mitigating Bias in Algorithmic Hiring: Evaluating Claims and Practices
Authors:
Manish Raghavan,
Solon Barocas,
Jon Kleinberg,
Karen Levy
Abstract:
There has been rapidly growing interest in the use of algorithms in hiring, especially as a means to address or mitigate bias. Yet, to date, little is known about how these methods are used in practice. How are algorithmic assessments built, validated, and examined for bias? In this work, we document and analyze the claims and practices of companies offering algorithms for employment assessment. I…
▽ More
There has been rapidly growing interest in the use of algorithms in hiring, especially as a means to address or mitigate bias. Yet, to date, little is known about how these methods are used in practice. How are algorithmic assessments built, validated, and examined for bias? In this work, we document and analyze the claims and practices of companies offering algorithms for employment assessment. In particular, we identify vendors of algorithmic pre-employment assessments (i.e., algorithms to screen candidates), document what they have disclosed about their development and validation procedures, and evaluate their practices, focusing particularly on efforts to detect and mitigate bias. Our analysis considers both technical and legal perspectives. Technically, we consider the various choices vendors make regarding data collection and prediction targets, and explore the risks and trade-offs that these choices pose. We also discuss how algorithmic de-biasing techniques interface with, and create challenges for, antidiscrimination law.
△ Less
Submitted 6 December, 2019; v1 submitted 21 June, 2019;
originally announced June 2019.
-
Hiring Under Uncertainty
Authors:
Manish Raghavan,
Manish Purohit,
Sreenivas Gollupadi
Abstract:
In this paper we introduce the hiring under uncertainty problem to model the questions faced by hiring committees in large enterprises and universities alike. Given a set of $n$ eligible candidates, the decision maker needs to choose the sequence of candidates to make offers so as to hire the $k$ best candidates. However, candidates may choose to reject an offer (for instance, due to a competing o…
▽ More
In this paper we introduce the hiring under uncertainty problem to model the questions faced by hiring committees in large enterprises and universities alike. Given a set of $n$ eligible candidates, the decision maker needs to choose the sequence of candidates to make offers so as to hire the $k$ best candidates. However, candidates may choose to reject an offer (for instance, due to a competing offer) and the decision maker has a time limit by which all positions must be filled. Given an estimate of the probabilities of acceptance for each candidate, the hiring under uncertainty problem is to design a strategy of making offers so that the total expected value of all candidates hired by the time limit is maximized. We provide a 2-approximation algorithm for the setting where offers must be made in sequence, an 8-approximation when offers may be made in parallel, and a 10-approximation for the more general stochastic knapsack setting with finite probes.
△ Less
Submitted 7 May, 2019;
originally announced May 2019.
-
How Do Classifiers Induce Agents To Invest Effort Strategically?
Authors:
Jon Kleinberg,
Manish Raghavan
Abstract:
Algorithms are often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort in order to change the outcomes they receive, and we give a tight characterization of when such agents can be…
▽ More
Algorithms are often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort in order to change the outcomes they receive, and we give a tight characterization of when such agents can be incentivized to invest specified forms of effort into improving their outcomes as opposed to "gaming" the classifier. We show that whenever any "reasonable" mechanism can do so, a simple linear mechanism suffices.
△ Less
Submitted 31 July, 2019; v1 submitted 13 July, 2018;
originally announced July 2018.
-
The Externalities of Exploration and How Data Diversity Helps Exploitation
Authors:
Manish Raghavan,
Aleksandrs Slivkins,
Jennifer Wortman Vaughan,
Zhiwei Steven Wu
Abstract:
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users for information that will lead to better decisions in the future. Recently, concerns have been raised about whether the process of exploration could be viewed as unfair, placing too much burden on certain ind…
▽ More
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users for information that will lead to better decisions in the future. Recently, concerns have been raised about whether the process of exploration could be viewed as unfair, placing too much burden on certain individuals or groups. Motivated by these concerns, we initiate the study of the externalities of exploration - the undesirable side effects that the presence of one party may impose on another - under the linear contextual bandits model. We introduce the notion of a group externality, measuring the extent to which the presence of one population of users impacts the rewards of another. We show that this impact can in some cases be negative, and that, in a certain sense, no algorithm can avoid it. We then study externalities at the individual level, interpreting the act of exploration as an externality imposed on the current user of a system by future users. This drives us to ask under what conditions inherent diversity in the data makes explicit exploration unnecessary. We build on a recent line of work on the smoothed analysis of the greedy algorithm that always chooses the action that currently looks optimal, improving on prior results to show that a greedy approach almost matches the best possible Bayesian regret rate of any other algorithm on the same problem instance whenever the diversity conditions hold, and that this regret is at most $\tilde{O}(T^{1/3})$. Returning to group-level effects, we show that under the same conditions, negative group externalities essentially vanish under the greedy algorithm. Together, our results uncover a sharp contrast between the high externalities that exist in the worst case, and the ability to remove all externalities if the data is sufficiently diverse.
△ Less
Submitted 2 July, 2018; v1 submitted 1 June, 2018;
originally announced June 2018.
-
Map** the Invocation Structure of Online Political Interaction
Authors:
Manish Raghavan,
Ashton Anderson,
Jon Kleinberg
Abstract:
The surge in political information, discourse, and interaction has been one of the most important developments in social media over the past several years. There is rich structure in the interaction among different viewpoints on the ideological spectrum. However, we still have only a limited analytical vocabulary for expressing the ways in which these viewpoints interact.
In this paper, we devel…
▽ More
The surge in political information, discourse, and interaction has been one of the most important developments in social media over the past several years. There is rich structure in the interaction among different viewpoints on the ideological spectrum. However, we still have only a limited analytical vocabulary for expressing the ways in which these viewpoints interact.
In this paper, we develop network-based methods that operate on the ways in which users share content; we construct \emph{invocation graphs} on Web domains showing the extent to which pages from one domain are invoked by users to reply to posts containing pages from other domains. When we locate the domains on a political spectrum induced from the data, we obtain an embedded graph showing how these interaction links span different distances on the spectrum. The structure of this embedded network, and its evolution over time, helps us derive macro-level insights about how political interaction unfolded through 2016, leading up to the US Presidential election. In particular, we find that the domains invoked in replies spanned increasing distances on the spectrum over the months approaching the election, and that there was clear asymmetry between the left-to-right and right-to-left patterns of linkage.
△ Less
Submitted 26 February, 2018;
originally announced February 2018.
-
Selection Problems in the Presence of Implicit Bias
Authors:
Jon Kleinberg,
Manish Raghavan
Abstract:
Over the past two decades, the notion of implicit bias has come to serve as an important component in our understanding of discrimination in activities such as hiring, promotion, and school admissions. Research on implicit bias posits that when people evaluate others -- for example, in a hiring context -- their unconscious biases about membership in particular groups can have an effect on their de…
▽ More
Over the past two decades, the notion of implicit bias has come to serve as an important component in our understanding of discrimination in activities such as hiring, promotion, and school admissions. Research on implicit bias posits that when people evaluate others -- for example, in a hiring context -- their unconscious biases about membership in particular groups can have an effect on their decision-making, even when they have no deliberate intention to discriminate against members of these groups. A growing body of experimental work has pointed to the effect that implicit bias can have in producing adverse outcomes.
Here we propose a theoretical model for studying the effects of implicit bias on selection decisions, and a way of analyzing possible procedural remedies for implicit bias within this model. A canonical situation represented by our model is a hiring setting: a recruiting committee is trying to choose a set of finalists to interview among the applicants for a job, evaluating these applicants based on their future potential, but their estimates of potential are skewed by implicit bias against members of one group. In this model, we show that measures such as the Rooney Rule, a requirement that at least one of the finalists be chosen from the affected group, can not only improve the representation of this affected group, but also lead to higher payoffs in absolute terms for the organization performing the recruiting. However, identifying the conditions under which such measures can lead to improved payoffs involves subtle trade-offs between the extent of the bias and the underlying distribution of applicant characteristics, leading to novel theoretical questions about order statistics in the presence of probabilistic side information.
△ Less
Submitted 4 January, 2018;
originally announced January 2018.
-
On Fairness and Calibration
Authors:
Geoff Pleiss,
Manish Raghavan,
Felix Wu,
Jon Kleinberg,
Kilian Q. Weinberger
Abstract:
The machine learning community has become increasingly concerned with the potential for bias and discrimination in predictive models. This has motivated a growing line of work on what it means for a classification procedure to be "fair." In this paper, we investigate the tension between minimizing error disparity across different population groups while maintaining calibrated probability estimates…
▽ More
The machine learning community has become increasingly concerned with the potential for bias and discrimination in predictive models. This has motivated a growing line of work on what it means for a classification procedure to be "fair." In this paper, we investigate the tension between minimizing error disparity across different population groups while maintaining calibrated probability estimates. We show that calibration is compatible only with a single error constraint (i.e. equal false-negatives rates across groups), and show that any algorithm that satisfies this relaxation is no better than randomizing a percentage of predictions for an existing classifier. These unsettling findings, which extend and generalize existing results, are empirically confirmed on several datasets.
△ Less
Submitted 3 November, 2017; v1 submitted 6 September, 2017;
originally announced September 2017.
-
Planning with Multiple Biases
Authors:
Jon Kleinberg,
Sigal Oren,
Manish Raghavan
Abstract:
Recent work has considered theoretical models for the behavior of agents with specific behavioral biases: rather than making decisions that optimize a given payoff function, the agent behaves inefficiently because its decisions suffer from an underlying bias. These approaches have generally considered an agent who experiences a single behavioral bias, studying the effect of this bias on the outcom…
▽ More
Recent work has considered theoretical models for the behavior of agents with specific behavioral biases: rather than making decisions that optimize a given payoff function, the agent behaves inefficiently because its decisions suffer from an underlying bias. These approaches have generally considered an agent who experiences a single behavioral bias, studying the effect of this bias on the outcome.
In general, however, decision-making can and will be affected by multiple biases operating at the same time. How do multiple biases interact to produce the overall outcome? Here we consider decisions in the presence of a pair of biases exhibiting an intuitively natural interaction: present bias -- the tendency to value costs incurred in the present too highly -- and sunk-cost bias -- the tendency to incorporate costs experienced in the past into one's plans for the future.
We propose a theoretical model for planning with this pair of biases, and we show how certain natural behavioral phenomena can arise in our model only when agents exhibit both biases. As part of our model we differentiate between agents that are aware of their biases (sophisticated) and agents that are unaware of them (naive). Interestingly, we show that the interaction between the two biases is quite complex: in some cases, they mitigate each other's effects while in other cases they might amplify each other. We obtain a number of further results as well, including the fact that the planning problem in our model for an agent experiencing and aware of both biases is computationally hard in general, though tractable under more relaxed assumptions.
△ Less
Submitted 4 June, 2017;
originally announced June 2017.
-
Inherent Trade-Offs in the Fair Determination of Risk Scores
Authors:
Jon Kleinberg,
Sendhil Mullainathan,
Manish Raghavan
Abstract:
Recent discussion in the public sphere about algorithmic classification has involved tension between competing notions of what it means for a probabilistic classification to be fair to different groups. We formalize three fairness conditions that lie at the heart of these debates, and we prove that except in highly constrained special cases, there is no method that can satisfy these three conditio…
▽ More
Recent discussion in the public sphere about algorithmic classification has involved tension between competing notions of what it means for a probabilistic classification to be fair to different groups. We formalize three fairness conditions that lie at the heart of these debates, and we prove that except in highly constrained special cases, there is no method that can satisfy these three conditions simultaneously. Moreover, even satisfying all three conditions approximately requires that the data lie in an approximate version of one of the constrained special cases identified by our theorem. These results suggest some of the ways in which key notions of fairness are incompatible with each other, and hence provide a framework for thinking about the trade-offs between them.
△ Less
Submitted 17 November, 2016; v1 submitted 19 September, 2016;
originally announced September 2016.
-
Planning Problems for Sophisticated Agents with Present Bias
Authors:
Jon Kleinberg,
Sigal Oren,
Manish Raghavan
Abstract:
Present bias, the tendency to weigh costs and benefits incurred in the present too heavily, is one of the most widespread human behavioral biases. It has also been the subject of extensive study in the behavioral economics literature. While the simplest models assume that the agents are naive, reasoning about the future without taking their bias into account, there is considerable evidence that pe…
▽ More
Present bias, the tendency to weigh costs and benefits incurred in the present too heavily, is one of the most widespread human behavioral biases. It has also been the subject of extensive study in the behavioral economics literature. While the simplest models assume that the agents are naive, reasoning about the future without taking their bias into account, there is considerable evidence that people often behave in ways that are sophisticated with respect to present bias, making plans based on the belief that they will be present-biased in the future. For example, committing to a course of action to reduce future opportunities for procrastination or overconsumption are instances of sophisticated behavior in everyday life.
Models of sophisticated behavior have lacked an underlying formalism that allows one to reason over the full space of multi-step tasks that a sophisticated agent might face. This has made it correspondingly difficult to make comparative or worst-case statements about the performance of sophisticated agents in arbitrary scenarios. In this paper, we incorporate the notion of sophistication into a graph-theoretic model that we used in recent work for modeling naive agents. This new synthesis of two formalisms - sophistication and graph-theoretic planning - uncovers a rich structure that wasn't apparent in the earlier behavioral economics work on this problem.
In particular, our graph-theoretic model makes two kinds of new results possible. First, we give tight worst-case bounds on the performance of sophisticated agents in arbitrary multi-step tasks relative to the optimal plan. Second, the flexibility of our formalism makes it possible to identify new phenomena that had not been seen in prior literature: these include a surprising non-monotonic property in the use of rewards to motivate sophisticated agents and a framework for reasoning about commitment devices.
△ Less
Submitted 27 March, 2016;
originally announced March 2016.