-
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Authors:
Vincent Conitzer,
Rachel Freedman,
Jobst Heitzig,
Wesley H. Holliday,
Bob M. Jacobs,
Nathan Lambert,
Milan Mossé,
Eric Pacuit,
Stuart Russell,
Hailey Schoelkopf,
Emanuel Tewolde,
William S. Zwicker
Abstract:
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as hel** to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level prin…
▽ More
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as hel** to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
△ Less
Submitted 4 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Learning to Manipulate under Limited Information
Authors:
Wesley H. Holliday,
Alexander Kristoffersen,
Eric Pacuit
Abstract:
By classic results in social choice theory, any reasonable preferential voting method sometimes gives individuals an incentive to report an insincere preference. The extent to which different voting methods are more or less resistant to such strategic manipulation has become a key consideration for comparing voting methods. Here we measure resistance to manipulation by whether neural networks of v…
▽ More
By classic results in social choice theory, any reasonable preferential voting method sometimes gives individuals an incentive to report an insincere preference. The extent to which different voting methods are more or less resistant to such strategic manipulation has become a key consideration for comparing voting methods. Here we measure resistance to manipulation by whether neural networks of varying sizes can learn to profitably manipulate a given voting method in expectation, given different types of limited information about how other voters will vote. We trained over 70,000 neural networks of 26 sizes to manipulate against 8 different voting methods, under 6 types of limited information, in committee-sized elections with 5-21 voters and 3-6 candidates. We find that some voting methods, such as Borda, are highly manipulable by networks with limited information, while others, such as Instant Runoff, are not, despite being quite profitably manipulated by an ideal manipulator with full information. For the two probability models for elections that we use, the overall least manipulable of the 8 methods we study are Condorcet methods, namely Minimax and Split Cycle.
△ Less
Submitted 15 April, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
An extension of May's Theorem to three alternatives: axiomatizing Minimax voting
Authors:
Wesley H. Holliday,
Eric Pacuit
Abstract:
May's Theorem [K. O. May, Econometrica 20 (1952) 680-684] characterizes majority voting on two alternatives as the unique preferential voting method satisfying several simple axioms. Here we show that by adding some desirable axioms to May's axioms, we can uniquely determine how to vote on three alternatives (setting aside tiebreaking). In particular, we add two axioms stating that the voting meth…
▽ More
May's Theorem [K. O. May, Econometrica 20 (1952) 680-684] characterizes majority voting on two alternatives as the unique preferential voting method satisfying several simple axioms. Here we show that by adding some desirable axioms to May's axioms, we can uniquely determine how to vote on three alternatives (setting aside tiebreaking). In particular, we add two axioms stating that the voting method should mitigate spoiler effects and avoid the so-called strong no show paradox. We prove a theorem stating that any preferential voting method satisfying our enlarged set of axioms, which includes some weak homogeneity and preservation axioms, must choose from among the Minimax winners in all three-alternative elections. When applied to more than three alternatives, our axioms also distinguish Minimax from other known voting methods that coincide with or refine Minimax for three alternatives.
△ Less
Submitted 2 July, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
An Axiomatic Characterization of Split Cycle
Authors:
Yifeng Ding,
Wesley H. Holliday,
Eric Pacuit
Abstract:
A number of rules for resolving majority cycles in elections have been proposed in the literature. Recently, Holliday and Pacuit (Journal of Theoretical Politics 33 (2021) 475-524) axiomatically characterized the class of rules refined by one such cycle-resolving rule, dubbed Split Cycle: in each majority cycle, discard the majority preferences with the smallest majority margin. They showed that a…
▽ More
A number of rules for resolving majority cycles in elections have been proposed in the literature. Recently, Holliday and Pacuit (Journal of Theoretical Politics 33 (2021) 475-524) axiomatically characterized the class of rules refined by one such cycle-resolving rule, dubbed Split Cycle: in each majority cycle, discard the majority preferences with the smallest majority margin. They showed that any rule satisfying five standard axioms plus a weakening of Arrow's Independence of Irrelevant Alternatives (IIA), called Coherent IIA, is refined by Split Cycle. In this paper, we go further and show that Split Cycle is the only rule satisfying the axioms of Holliday and Pacuit together with two additional axioms, which characterize the class of rules that refine Split Cycle: Coherent Defeat and Positive Involvement in Defeat. Coherent Defeat states that any majority preference not occurring in a cycle is retained, while Positive Involvement in Defeat is closely related to the well-known axiom of Positive Involvement (as in J. Perez, Social Choice and Welfare 18 (2001) 601-616). We characterize Split Cycle not only as a collective choice rule but also as a social choice correspondence, over both profiles of linear ballots and profiles of ballots allowing ties.
△ Less
Submitted 28 June, 2024; v1 submitted 22 October, 2022;
originally announced October 2022.
-
Impossibility theorems involving weakenings of expansion consistency and resoluteness in voting
Authors:
Wesley H. Holliday,
Chase Norman,
Eric Pacuit,
Saam Zahedian
Abstract:
A fundamental principle of individual rational choice is Sen's $γ$ axiom, also known as expansion consistency, stating that any alternative chosen from each of two menus must be chosen from the union of the menus. Expansion consistency can also be formulated in the setting of social choice. In voting theory, it states that any candidate chosen from two fields of candidates must be chosen from the…
▽ More
A fundamental principle of individual rational choice is Sen's $γ$ axiom, also known as expansion consistency, stating that any alternative chosen from each of two menus must be chosen from the union of the menus. Expansion consistency can also be formulated in the setting of social choice. In voting theory, it states that any candidate chosen from two fields of candidates must be chosen from the combined field of candidates. An important special case of the axiom is binary expansion consistency, which states that any candidate chosen from an initial field of candidates and chosen in a head-to-head match with a new candidate must also be chosen when the new candidate is added to the field, thereby ruling out spoiler effects. In this paper, we study the tension between this weakening of expansion consistency and weakenings of resoluteness, an axiom demanding the choice of a single candidate in any election. As is well known, resoluteness is inconsistent with basic fairness conditions on social choice, namely anonymity and neutrality. Here we prove that even significant weakenings of resoluteness, which are consistent with anonymity and neutrality, are inconsistent with binary expansion consistency. The proofs make use of SAT solving, with the correctness of a SAT encoding formally verified in the Lean Theorem Prover, as well as a strategy for generalizing impossibility theorems obtained for special types of voting methods (namely majoritarian and pairwise voting methods) to impossibility theorems for arbitrary voting methods. This proof strategy may be of independent interest for its potential applicability to other impossibility theorems in social choice.
△ Less
Submitted 26 March, 2023; v1 submitted 14 August, 2022;
originally announced August 2022.
-
Voting Theory in the Lean Theorem Prover
Authors:
Wesley H. Holliday,
Chase Norman,
Eric Pacuit
Abstract:
There is a long tradition of fruitful interaction between logic and social choice theory. In recent years, much of this interaction has focused on computer-aided methods such as SAT solving and interactive theorem proving. In this paper, we report on the development of a framework for formalizing voting theory in the Lean theorem prover, which we have applied to verify properties of a recently stu…
▽ More
There is a long tradition of fruitful interaction between logic and social choice theory. In recent years, much of this interaction has focused on computer-aided methods such as SAT solving and interactive theorem proving. In this paper, we report on the development of a framework for formalizing voting theory in the Lean theorem prover, which we have applied to verify properties of a recently studied voting method. While previous applications of interactive theorem proving to social choice (using Isabelle/HOL and Mizar) have focused on the verification of impossibility theorems, we aim to cover a variety of results ranging from impossibility theorems to the verification of properties of specific voting methods (e.g., Condorcet consistency, independence of clones, etc.). In order to formalize voting theoretic axioms concerning adding or removing candidates and voters, we work in a variable-election setting whose formalization makes use of dependent types in Lean.
△ Less
Submitted 15 October, 2021;
originally announced October 2021.
-
Stable Voting
Authors:
Wesley H. Holliday,
Eric Pacuit
Abstract:
We propose a new single-winner voting system using ranked ballots: Stable Voting. The motivating principle of Stable Voting is that if a candidate A would win without another candidate B in the election, and A beats B in a head-to-head majority comparison, then A should still win in the election with B included (unless there is another candidate A' who has the same kind of claim to winning, in whi…
▽ More
We propose a new single-winner voting system using ranked ballots: Stable Voting. The motivating principle of Stable Voting is that if a candidate A would win without another candidate B in the election, and A beats B in a head-to-head majority comparison, then A should still win in the election with B included (unless there is another candidate A' who has the same kind of claim to winning, in which case a tiebreaker may choose between such candidates). We call this principle Stability for Winners (with Tiebreaking). Stable Voting satisfies this principle while also having a remarkable ability to avoid tied outcomes in elections even with small numbers of voters.
△ Less
Submitted 11 February, 2023; v1 submitted 1 August, 2021;
originally announced August 2021.
-
Measuring Violations of Positive Involvement in Voting
Authors:
Wesley H. Holliday,
Eric Pacuit
Abstract:
In the context of computational social choice, we study voting methods that assign a set of winners to each profile of voter preferences. A voting method satisfies the property of positive involvement (PI) if for any election in which a candidate x would be among the winners, adding another voter to the election who ranks x first does not cause x to lose. Surprisingly, a number of standard voting…
▽ More
In the context of computational social choice, we study voting methods that assign a set of winners to each profile of voter preferences. A voting method satisfies the property of positive involvement (PI) if for any election in which a candidate x would be among the winners, adding another voter to the election who ranks x first does not cause x to lose. Surprisingly, a number of standard voting methods violate this natural property. In this paper, we investigate different ways of measuring the extent to which a voting method violates PI, using computer simulations. We consider the probability (under different probability models for preferences) of PI violations in randomly drawn profiles vs. profile-coalition pairs (involving coalitions of different sizes). We argue that in order to choose between a voting method that satisfies PI and one that does not, we should consider the probability of PI violation conditional on the voting methods choosing different winners. We should also relativize the probability of PI violation to what we call voter potency, the probability that a voter causes a candidate to lose. Although absolute frequencies of PI violations may be low, after this conditioning and relativization, we see that under certain voting methods that violate PI, much of a voter's potency is turned against them - in particular, against their desire to see their favorite candidate elected.
△ Less
Submitted 21 June, 2021;
originally announced June 2021.
-
Axioms for Defeat in Democratic Elections
Authors:
Wesley H. Holliday,
Eric Pacuit
Abstract:
We propose six axioms concerning when one candidate should defeat another in a democratic election involving two or more candidates. Five of the axioms are widely satisfied by known voting procedures. The sixth axiom is a weakening of Kenneth Arrow's famous condition of the Independence of Irrelevant Alternatives (IIA). We call this weakening Coherent IIA. We prove that the five axioms plus Cohere…
▽ More
We propose six axioms concerning when one candidate should defeat another in a democratic election involving two or more candidates. Five of the axioms are widely satisfied by known voting procedures. The sixth axiom is a weakening of Kenneth Arrow's famous condition of the Independence of Irrelevant Alternatives (IIA). We call this weakening Coherent IIA. We prove that the five axioms plus Coherent IIA single out a method of determining defeats studied in our recent work: Split Cycle. In particular, Split Cycle provides the most resolute definition of defeat among any satisfying the six axioms for democratic defeat. In addition, we analyze how Split Cycle escapes Arrow's Impossibility Theorem and related impossibility results.
△ Less
Submitted 12 October, 2023; v1 submitted 15 August, 2020;
originally announced August 2020.
-
Intention as Commitment toward Time
Authors:
Marc van Zee,
Dragan Doder,
Leendert van der Torre,
Mehdi Dastani,
Thomas Icard,
Eric Pacuit
Abstract:
In this paper we address the interplay among intention, time, and belief in dynamic environments. The first contribution is a logic for reasoning about intention, time and belief, in which assumptions of intentions are represented by preconditions of intended actions. Intentions and beliefs are coherent as long as these assumptions are not violated, i.e. as long as intended actions can be performe…
▽ More
In this paper we address the interplay among intention, time, and belief in dynamic environments. The first contribution is a logic for reasoning about intention, time and belief, in which assumptions of intentions are represented by preconditions of intended actions. Intentions and beliefs are coherent as long as these assumptions are not violated, i.e. as long as intended actions can be performed such that their preconditions hold as well. The second contribution is the formalization of what-if scenarios: what happens with intentions and beliefs if a new (possibly conflicting) intention is adopted, or a new fact is learned? An agent is committed to its intended actions as long as its belief-intention database is coherent. We conceptualize intention as commitment toward time and we develop AGM-based postulates for the iterated revision of belief-intention databases, and we prove a Katsuno-Mendelzon-style representation theorem.
△ Less
Submitted 17 April, 2020;
originally announced April 2020.
-
Split Cycle: A New Condorcet Consistent Voting Method Independent of Clones and Immune to Spoilers
Authors:
Wesley H. Holliday,
Eric Pacuit
Abstract:
We propose a Condorcet consistent voting method that we call Split Cycle. Split Cycle belongs to the small family of known voting methods satisfying the anti-vote-splitting criterion of independence of clones. In this family, only Split Cycle satisfies a new criterion we call immunity to spoilers, which concerns adding candidates to elections, as well as the known criteria of positive involvement…
▽ More
We propose a Condorcet consistent voting method that we call Split Cycle. Split Cycle belongs to the small family of known voting methods satisfying the anti-vote-splitting criterion of independence of clones. In this family, only Split Cycle satisfies a new criterion we call immunity to spoilers, which concerns adding candidates to elections, as well as the known criteria of positive involvement and negative involvement, which concern adding voters to elections. Thus, in contrast to other clone-independent methods, Split Cycle mitigates both "spoiler effects" and "strong no show paradoxes."
△ Less
Submitted 28 November, 2023; v1 submitted 5 April, 2020;
originally announced April 2020.
-
Strategic Voting Under Uncertainty About the Voting Method
Authors:
Wesley H. Holliday,
Eric Pacuit
Abstract:
Much of the theoretical work on strategic voting makes strong assumptions about what voters know about the voting situation. A strategizing voter is typically assumed to know how other voters will vote and to know the rules of the voting method. A growing body of literature explores strategic voting when there is uncertainty about how others will vote. In this paper, we study strategic voting when…
▽ More
Much of the theoretical work on strategic voting makes strong assumptions about what voters know about the voting situation. A strategizing voter is typically assumed to know how other voters will vote and to know the rules of the voting method. A growing body of literature explores strategic voting when there is uncertainty about how others will vote. In this paper, we study strategic voting when there is uncertainty about the voting method. We introduce three notions of manipulability for a set of voting methods: sure, safe, and expected manipulability. With the help of a computer program, we identify voting scenarios in which uncertainty about the voting method may reduce or even eliminate a voter's incentive to misrepresent her preferences. Thus, it may be in the interest of an election designer who wishes to reduce strategic voting to leave voters uncertain about which of several reasonable voting methods will be used to determine the winners of an election.
△ Less
Submitted 21 July, 2019;
originally announced July 2019.
-
When is an Example a Counterexample?
Authors:
Eric Pacuit,
Arthur Paul Pedersen,
Jan-Willem Romeijn
Abstract:
In this extended abstract, we carefully examine a purported counterexample to a postulate of iterated belief revision. We suggest that the example is better seen as a failure to apply the theory of belief revision in sufficient detail. The main contribution is conceptual aiming at the literature on the philosophical foundations of the AGM theory of belief revision [1]. Our discussion is centered a…
▽ More
In this extended abstract, we carefully examine a purported counterexample to a postulate of iterated belief revision. We suggest that the example is better seen as a failure to apply the theory of belief revision in sufficient detail. The main contribution is conceptual aiming at the literature on the philosophical foundations of the AGM theory of belief revision [1]. Our discussion is centered around the observation that it is often unclear whether a specific example is a "genuine" counterexample to an abstract theory or a misapplication of that theory to a concrete case.
△ Less
Submitted 23 October, 2013;
originally announced October 2013.
-
Evidence and plausibility in neighborhood structures
Authors:
Johan van Benthem,
David Fernández-Duque,
Eric Pacuit
Abstract:
The intuitive notion of evidence has both semantic and syntactic features. In this paper, we develop an {\em evidence logic} for epistemic agents faced with possibly contradictory evidence from different sources. The logic is based on a neighborhood semantics, where a neighborhood $N$ indicates that the agent has reason to believe that the true state of the world lies in $N$. Further notions of re…
▽ More
The intuitive notion of evidence has both semantic and syntactic features. In this paper, we develop an {\em evidence logic} for epistemic agents faced with possibly contradictory evidence from different sources. The logic is based on a neighborhood semantics, where a neighborhood $N$ indicates that the agent has reason to believe that the true state of the world lies in $N$. Further notions of relative plausibility between worlds and beliefs based on the latter ordering are then defined in terms of this evidence structure, yielding our intended models for evidence-based beliefs. In addition, we also consider a second more general flavor, where belief and plausibility are modeled using additional primitive relations, and we prove a representation theorem showing that each such general model is a $p$-morphic image of an intended one. This semantics invites a number of natural special cases, depending on how uniform we make the evidence sets, and how coherent their total structure. We give a structural study of the resulting `uniform' and `flat' models. Our main result are sound and complete axiomatizations for the logics of all four major model classes with respect to the modal language of evidence, belief and safe belief. We conclude with an outlook toward logics for the dynamics of changing evidence, and the resulting language extensions and connections with logics of plausibility change.
△ Less
Submitted 4 July, 2013;
originally announced July 2013.
-
Neighbourhood Structures: Bisimilarity and Basic Model Theory
Authors:
Helle Hvid Hansen,
Clemens Kupke,
Eric Pacuit
Abstract:
Neighbourhood structures are the standard semantic tool used to reason about non-normal modal logics. The logic of all neighbourhood models is called classical modal logic. In coalgebraic terms, a neighbourhood frame is a coalgebra for the contravariant powerset functor composed with itself, denoted by 2^2. We use this coalgebraic modelling to derive notions of equivalence between neighbourhood…
▽ More
Neighbourhood structures are the standard semantic tool used to reason about non-normal modal logics. The logic of all neighbourhood models is called classical modal logic. In coalgebraic terms, a neighbourhood frame is a coalgebra for the contravariant powerset functor composed with itself, denoted by 2^2. We use this coalgebraic modelling to derive notions of equivalence between neighbourhood structures. 2^2-bisimilarity and behavioural equivalence are well known coalgebraic concepts, and they are distinct, since 2^2 does not preserve weak pullbacks. We introduce a third, intermediate notion whose witnessing relations we call precocongruences (based on pushouts). We give back-and-forth style characterisations for 2^2-bisimulations and precocongruences, we show that on a single coalgebra, precocongruences capture behavioural equivalence, and that between neighbourhood structures, precocongruences are a better approximation of behavioural equivalence than 2^2-bisimulations. We also introduce a notion of modal saturation for neighbourhood models, and investigate its relationship with definability and image-finiteness. We prove a Hennessy-Milner theorem for modally saturated and for image-finite neighbourhood models. Our main results are an analogue of Van Benthem's characterisation theorem and a model-theoretic proof of Craig interpolation for classical modal logic.
△ Less
Submitted 20 April, 2009; v1 submitted 28 January, 2009;
originally announced January 2009.