-
An Abundance of Katherines: The Game Theory of Baby Naming
Authors:
Katy Blumer,
Kate Donahue,
Katie Fritz,
Kate Ivanovich,
Katherine Lee,
Katie Luo,
Cathy Meng,
Katie Van Koevering
Abstract:
In this paper, we study the highly competitive arena of baby naming. Through making several Extremely Reasonable Assumptions (namely, that parents are myopic, perfectly knowledgeable agents who pick a name based solely on its uniqueness), we create a model which is not only tractable and clean, but also perfectly captures the real world. We then extend our investigation with numerical experiments, as well as analysis of large language model tools. We conclude by discussing avenues for future research.
Submitted 1 April, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
Impact of Decentralized Learning on Player Utilities in Stackelberg Games
Authors:
Kate Donahue,
Nicole Immorlica,
Meena Jagadeesan,
Brendan Lucier,
Aleksandrs Slivkins
Abstract:
When deployed in the world, a learning agent such as a recommender system or a chatbot often repeatedly interacts with another learning agent (such as a user) over time. In many such two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned. To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective. We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks (such as Stackelberg equilibrium payoffs) result in worst-case linear regret for at least one player. To better capture these systems, we construct a relaxed regret benchmark that is tolerant to small learning errors by agents. We show that standard learning algorithms fail to provide sublinear regret, and we develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks. We further design relaxed environments under which faster learning ($O(\sqrt{T})$) is possible. Altogether, our results take a step towards assessing how two-agent interactions in sequential and decentralized learning environments affect the utility of both agents.
Submitted 21 June, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
-
When Are Two Lists Better than One?: Benefits and Harms in Joint Decision-making
Authors:
Kate Donahue,
Sreenivas Gollapudi,
Kostas Kollias
Abstract:
Historically, much of machine learning research has focused on the performance of the algorithm alone, but recently more attention has been focused on optimizing joint human-algorithm performance. Here, we analyze a specific type of human-algorithm collaboration where the algorithm has access to a set of $n$ items, and presents a subset of size $k$ to the human, who selects a final item from among those $k$. This scenario could model content recommendation, route planning, or any type of labeling task. Because both the human and algorithm have imperfect, noisy information about the true ordering of items, the key question is: which value of $k$ maximizes the probability that the best item will be ultimately selected? For $k=1$, performance is optimized by the algorithm acting alone, and for $k=n$ it is optimized by the human acting alone. Surprisingly, we show that for multiple noise models, it is optimal to set $k \in [2, n-1]$ - that is, there are strict benefits to collaborating, even when the human and algorithm have equal accuracy separately. We demonstrate this theoretically for the Mallows model and experimentally for the Random Utilities models of noisy permutations. However, we show this pattern is reversed when the human is anchored on the algorithm's presented ordering - the joint system always has strictly worse performance. We extend these results to the case where the human and algorithm differ in their accuracy levels, showing that there always exist regimes where a more accurate agent would strictly benefit from collaborating with a less accurate one, but these regimes are asymmetric between the human and the algorithm's accuracy.
Submitted 26 February, 2024; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Private Blotto: Viewpoint Competition with Polarized Agents
Authors:
Kate Donahue,
Jon Kleinberg
Abstract:
Colonel Blotto games are one of the oldest settings in game theory, originally proposed over a century ago in Borel 1921. However, they were originally designed to model two centrally-controlled armies competing over zero-sum "fronts", a specific scenario with limited modern-day application. In this work, we propose and study Private Blotto games, a variant connected to crowdsourcing and social media. One key difference in Private Blotto is that individual agents act independently, without being coordinated by a central "Colonel". This model naturally arises from scenarios such as activist groups competing over multiple issues, partisan fund-raisers competing over elections in multiple states, or politically-biased social media users labeling news articles as misinformation. In this work, we completely characterize the Nash Stability of the Private Blotto game. Specifically, we show that the outcome function has a critical impact on the outcome of the game: we study whether a front is won by majority rule (median outcome) or a smoother outcome taking into account all agents (mean outcome). We study how this impacts the amount of "misallocated effort", or agents whose choices don't influence the final outcome. In general, mean outcome ensures that, if a stable arrangement exists, agents are close to evenly spaced across fronts, minimizing misallocated effort. However, mean outcome functions also have chaotic patterns as to when stable arrangements do and do not exist. For median outcome, we exactly characterize when a stable arrangement exists, but show that this outcome function frequently results in extremely unbalanced allocation of agents across fronts.
Submitted 26 February, 2024; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Human-Algorithm Collaboration: Achieving Complementarity and Avoiding Unfairness
Authors:
Kate Donahue,
Alexandra Chouldechova,
Krishnaram Kenthapadi
Abstract:
Much of machine learning research focuses on predictive accuracy: given a task, create a machine learning model (or algorithm) that maximizes accuracy. In many settings, however, the final prediction or decision of a system is under the control of a human, who uses an algorithm's output along with their own personal expertise in order to produce a combined prediction. One ultimate goal of such collaborative systems is "complementarity": that is, to produce lower loss (equivalently, greater payoff or utility) than either the human or algorithm alone. However, experimental results have shown that even in carefully-designed systems, complementary performance can be elusive. Our work provides three key contributions. First, we provide a theoretical framework for modeling simple human-algorithm systems and demonstrate that multiple prior analyses can be expressed within it. Next, we use this model to prove conditions where complementarity is impossible, and give constructive examples of where complementarity is achievable. Finally, we discuss the implications of our findings, especially with respect to the fairness of a classifier. In sum, these results deepen our understanding of key factors influencing the combined performance of human-algorithm systems, giving insight into how algorithmic tools can best be designed for collaborative environments.
Submitted 1 June, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Models of fairness in federated learning
Authors:
Kate Donahue,
Jon Kleinberg
Abstract:
In many real-world situations, data is distributed across multiple self-interested agents. These agents can collaborate to build a machine learning model based on data from multiple agents, potentially reducing the error each experiences. However, sharing models in this way raises questions of fairness: to what extent can the error experienced by one agent be significantly lower than the error experienced by another agent in the same coalition? In this work, we consider two notions of fairness that each may be appropriate in different circumstances: "egalitarian fairness" (which aims to bound how dissimilar error rates can be) and "proportional fairness" (which aims to reward players for contributing more data). We similarly consider two common methods of model aggregation, one where a single model is created for all agents (uniform), and one where an individualized model is created for each agent. For egalitarian fairness, we obtain a tight multiplicative bound on how widely error rates can diverge between agents collaborating (which holds for both aggregation methods). For proportional fairness, we show that the individualized aggregation method always gives a small player error that is upper bounded by proportionality. For uniform aggregation, we show that this upper bound is guaranteed for any individually rational coalition (where no player wishes to leave to do local learning).
Submitted 25 February, 2023; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Optimality and Stability in Federated Learning: A Game-theoretic Approach
Authors:
Kate Donahue,
Jon Kleinberg
Abstract:
Federated learning is a distributed learning paradigm where multiple agents, each only with access to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also provide certain guarantees around social good properties such as total error. One branch of this research has taken a game-theoretic approach, and in particular, prior work has viewed federated learning as a hedonic game, where error-minimizing players arrange themselves into federating coalitions. This past work proves the existence of stable coalition partitions, but leaves open a wide range of questions, including how far from optimal these stable solutions are. In this work, we motivate and define a notion of optimality given by the average error rates among federating agents (players). First, we provide and prove the correctness of an efficient algorithm to calculate an optimal (error minimizing) arrangement of players. Next, we analyze the relationship between the stability and optimality of an arrangement. First, we show that for some regions of parameter space, all stable arrangements are optimal (Price of Anarchy equal to 1). However, we show this is not true for all settings: there exist examples of stable arrangements with higher cost than optimal (Price of Anarchy greater than 1). Finally, we give the first constant-factor bound on the performance gap between stability and optimality, proving that the total error of the worst stable solution can be no higher than 9 times the total error of an optimal solution (Price of Anarchy bound of 9).
Submitted 17 June, 2021;
originally announced June 2021.
-
Better Together? How Externalities of Size Complicate Notions of Solidarity and Actuarial Fairness
Authors:
Kate Donahue,
Solon Barocas
Abstract:
Consider a cost-sharing game with players of different contribution to the total cost: an example might be an insurance company calculating premiums for a population of mixed-risk individuals. Two natural and competing notions of fairness might be to a) charge each individual the same price or b) charge each individual according to the cost that they bring to the pool. In the insurance literature, these general approaches are referred to as "solidarity" and "actuarial fairness" and are commonly viewed as opposites. However, in insurance (and many other natural settings), the cost-sharing game also exhibits "externalities of size": all else being equal, larger groups have lower average cost. In the insurance case, we analyze a model with externalities of size due to a reduction in the variability of losses. We explore how this complicates traditional understandings of fairness, drawing on literature in cooperative game theory.
First, we explore solidarity: we show that it is possible for both groups (high and low risk) to strictly benefit by joining an insurance pool where costs are evenly split, as opposed to being in separate risk pools. We build on this by producing a pricing scheme that maximally subsidizes the high risk group, while maintaining an incentive for lower risk people to stay in the insurance pool. Next, we demonstrate that with this new model, the price charged to each individual has to depend on the risk of other participants, making naive actuarial fairness inefficient. Furthermore, we prove that stable pricing schemes must be ones where players have the anti-social incentive of desiring riskier partners, contradicting motivations for using actuarial fairness. Finally, we describe how these results relate to debates about fairness in machine learning and potential avenues for future research.
Submitted 1 December, 2021; v1 submitted 27 February, 2021;
originally announced March 2021.
-
Model-sharing Games: Analyzing Federated Learning Under Voluntary Participation
Authors:
Kate Donahue,
Jon Kleinberg
Abstract:
Federated learning is a setting where agents, each with access to their own data source, combine models from local data to create a global model. If agents are drawing their data from different distributions, though, federated learning might produce a biased global model that is not optimal for each agent. This means that agents face a fundamental question: should they choose the global model or their local model? We show how this situation can be naturally analyzed through the framework of coalitional game theory.
We propose the following game: there are heterogeneous players with different model parameters governing their data distribution and different amounts of data they have noisily drawn from their own distribution. Each player's goal is to obtain a model with minimal expected mean squared error (MSE) on their own distribution. They have a choice of fitting a model based solely on their own data, or combining their learned parameters with those of some subset of the other players. Combining models reduces the variance component of their error through access to more data, but increases the bias because of the heterogeneity of distributions.
Here, we derive exact expected MSE values for problems in linear regression and mean estimation. We then analyze the resulting game in the framework of hedonic game theory; we study how players might divide into coalitions, where each set of players within a coalition jointly construct model(s). We analyze three methods of federation, modeling differing degrees of customization. In uniform federation, the agents collectively produce a single model. In coarse-grained federation, each agent can weight the global model together with their local model. In fine-grained federation, each agent can flexibly combine models from all other agents in the federation. For each method, we analyze the stable partitions of players into coalitions.
Submitted 17 December, 2020; v1 submitted 1 October, 2020;
originally announced October 2020.
-
Brain volume: An important determinant of functional outcome after acute ischemic stroke
Authors:
Markus D. Schirmer,
Kathleen L. Donahue,
Marco J. Nardin,
Adrian V. Dalca,
Anne-Katrin Giese,
Mark R. Etherton,
Steven J. T. Mocking,
Elissa C. McIntosh,
John W. Cole,
Lukas Holmegaard,
Katarina Jood,
Jordi Jimenez-Conde,
Steven J. Kittner,
Robin Lemmens,
James F. Meschia,
Jonathan Rosand,
Jaume Roquer,
Tatjana Rundek,
Ralph L. Sacco MD,
Reinhold Schmidt,
Pankaj Sharma,
Agnieszka Slowik,
Tara M. Stanne,
Achala Vagal,
Johan Wasselius
, et al. (16 additional authors not shown)
Abstract:
Objective: To determine whether brain volume is associated with functional outcome after acute ischemic stroke (AIS).
Methods: We analyzed cross-sectional data of the multi-site, international hospital-based MRI-GENetics Interface Exploration (MRI-GENIE) study (July 1, 2014 - March 16, 2019) with clinical brain magnetic resonance imaging (MRI) obtained on admission for index stroke and functional outcome assessment. Post-stroke outcome was determined using the modified Rankin Scale (mRS) score (0-6; 0: asymptomatic; 6: death) recorded between 60 and 190 days after stroke. Demographics and other clinical variables including acute stroke severity (measured as National Institutes of Health Stroke Scale score), vascular risk factors, and etiologic stroke subtypes (Causative Classification of Stroke) were recorded during index admission.
Results: Utilizing the data from 912 acute ischemic stroke (AIS) patients (65+/-15 years of age, 58% male, 57% history of smoking, and 65% hypertensive) in a generalized linear model, brain volume (per 155.1 cm^3) was associated with age (beta -0.3 (per 14.4 years)), male sex (beta 1.0) and prior stroke (beta -0.2). In the multivariable outcome model, brain volume was an independent predictor of mRS (beta -0.233), with reduced odds of worse long-term functional outcomes (OR: 0.8, 95% CI 0.7-0.9) in those with larger brain volumes.
Conclusions: Larger brain volume quantified on clinical MRI of AIS patients at time of stroke purports a protective mechanism. The role of brain volume as a prognostic, protective biomarker has the potential to forge new areas of research and advance current knowledge of mechanisms of post-stroke recovery.
Submitted 16 July, 2020;
originally announced July 2020.
-
Multi-atlas image registration of clinical data with automated quality assessment using ventricle segmentation
Authors:
Florian Dubost,
Marleen de Bruijne,
Marco Nardin,
Adrian V. Dalca,
Kathleen L. Donahue,
Anne-Katrin Giese,
Mark R. Etherton,
Ona Wu,
Marius de Groot,
Wiro Niessen,
Meike Vernooij,
Natalia S. Rost,
Markus D. Schirmer
Abstract:
Registration is a core component of many imaging pipelines. In case of clinical scans, with lower resolution and sometimes substantial motion artifacts, registration can produce poor results. Visual assessment of registration quality in large clinical datasets is inefficient. In this work, we propose to automatically assess the quality of registration to an atlas in clinical FLAIR MRI scans of the brain. The method consists of automatically segmenting the ventricles of a given scan using a neural network, and comparing the segmentation to the atlas' ventricles propagated to image space. We used the proposed method to improve clinical image registration to a general atlas by computing multiple registrations and then selecting the registration that yielded the highest ventricle overlap. Methods were evaluated in a single-site dataset of more than 1000 scans, as well as a multi-center dataset comprising 142 clinical scans from 12 sites. The automated ventricle segmentation reached a Dice coefficient with manual annotations of 0.89 in the single-site dataset, and 0.83 in the multi-center dataset. Registration via age-specific atlases could improve ventricle overlap compared to a direct registration to the general atlas (Dice similarity coefficient increase up to 0.15). Experiments also showed that selecting scans with the registration quality assessment method could improve the quality of average maps of white matter hyperintensity burden, instead of using all scans for the computation of the white matter hyperintensity map. In this work, we demonstrated the utility of an automated tool for assessing image registration quality in clinical scans. This image quality assessment step could ultimately assist in the translation of automated neuroimaging pipelines to the clinic.
Submitted 26 December, 2019; v1 submitted 1 July, 2019;
originally announced July 2019.
-
Fairness and Utilization in Allocating Resources with Uncertain Demand
Authors:
Kate Donahue,
Jon Kleinberg
Abstract:
Resource allocation problems are a fundamental domain in which to evaluate the fairness properties of algorithms. The trade-offs between fairness and utilization have a long history in this domain. A recent line of work has considered fairness questions for resource allocation when the demands for the resource are distributed across multiple groups and drawn from probability distributions. In such cases, a natural fairness requirement is that individuals from different groups should have (approximately) equal probabilities of receiving the resource. A largely open question in this area has been to bound the gap between the maximum possible utilization of the resource and the maximum possible utilization subject to this fairness condition.
Here, we obtain some of the first provable upper bounds on this gap. We obtain an upper bound for arbitrary distributions, as well as much stronger upper bounds for specific families of distributions that are typically used to model levels of demand. In particular, we find - somewhat surprisingly - that there are natural families of distributions (including Exponential and Weibull) for which the gap is non-existent: it is possible to simultaneously achieve maximum utilization and the given notion of fairness. Finally, we show that for power-law distributions, there is a non-trivial gap between the solutions, but this gap can be bounded by a constant factor independent of the parameters of the distribution.
Submitted 5 December, 2019; v1 submitted 21 June, 2019;
originally announced June 2019.
-
Healthcare IT: Is your Information at Risk?
Authors:
Kimmarie Donahue,
Shawon Rahman
Abstract:
Healthcare Information Technology (IT) has made great advances over the past few years, and while these advances have enabled healthcare professionals to provide higher quality healthcare to a larger number of individuals, they also provide the criminal element more opportunities to access sensitive information, such as patient protected health information (PHI) and Personal Identification Information (PII). Having an Information Assurance (IA) program that allows for the protection of information and information systems and ensures the organization is in compliance with all required regulations, laws and directives is essential. While most organizations have such a policy in place, often it is inadequate to ensure the proper protection to prevent security breaches. The increase of data breaches in the last few years demonstrates the importance of an effective IA program. To ensure an effective IA policy, the policy must manage the operational risk, including identifying risks, assessment and mitigation of identified risks and ongoing monitoring to ensure compliance.
Submitted 30 November, 2015;
originally announced December 2015.
-
Analysis and simulation of the operation of a Kelvin probe
Authors:
Robert D. Reasenberg,
Kathleen P. Donahue,
James D. Phillips
Abstract:
Experiments that measure extremely small gravitational forces are often hampered by the presence of non-gravitational forces that can neither be calculated nor separately measured. Among these spurious forces is electrostatic attraction between a test mass and its surroundings due to the presence of spatially varying surface potential known as the "patch effect." In order to make surfaces with small surface potential variation, it is necessary to be able to measure it. A Kelvin probe (KP) measures contact potential difference (CPD), using the time-varying capacitance between the sample and a vibrating tip that is biased with a backing potential. Assuming that the tip remains constant, this measures the sample's surface potential variation. We examine the operation of the KP from the perspective of parameter estimation in the presence of noise. We show that, when the CPD is estimated from measurements at two separate backing potentials, the standard deviation of the optimal estimate depends on the total observing time. Further, the observing time may be unevenly divided between the two backing potentials, provided the values of those potentials are correspondingly set. We simulate a two-stage KP data analysis, including a sub-optimal estimator with advantages for real-time operation. Based on the real-time version, we present a novel approach to stabilizing the average distance of the tip from the sample. We also present the results of a series of covariance analyses that validate and bound the applicability of the suboptimal estimator, make a comparison with the results of an optimal estimator and guide the user. We discuss the application of the KP to the LISA and to a test of the weak equivalence principle.
Submitted 3 June, 2013;
originally announced June 2013.