Search | arXiv e-print repository

Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback

Authors: Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mossé, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker

Abstract: Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.

Submitted 4 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 15 pages, 4 figures

MSC Class: 68T01; 68T50; 91B14; 91B12 ACM Class: I.2.0; I.2.7; K.4.2; I.2.m; J.4

arXiv:2401.16412 [pdf, other]

Learning to Manipulate under Limited Information

Authors: Wesley H. Holliday, Alexander Kristoffersen, Eric Pacuit

Abstract: By classic results in social choice theory, any reasonable preferential voting method sometimes gives individuals an incentive to report an insincere preference. The extent to which different voting methods are more or less resistant to such strategic manipulation has become a key consideration for comparing voting methods. Here we measure resistance to manipulation by whether neural networks of varying sizes can learn to profitably manipulate a given voting method in expectation, given different types of limited information about how other voters will vote. We trained over 70,000 neural networks of 26 sizes to manipulate against 8 different voting methods, under 6 types of limited information, in committee-sized elections with 5-21 voters and 3-6 candidates. We find that some voting methods, such as Borda, are highly manipulable by networks with limited information, while others, such as Instant Runoff, are not, despite being quite profitably manipulated by an ideal manipulator with full information. For the two probability models for elections that we use, the overall least manipulable of the 8 methods we study are Condorcet methods, namely Minimax and Split Cycle.

Submitted 15 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: Appears at the 1st Workshop on Social Choice and Learning Algorithms (SCaLA 2024) held at the 23rd International Conference on Autonomous Agents and Multiagent Systems, organized by B. Armstrong, R. Fairstein, N. Mattei, and Z. Terzopoulou, May 6-7, 2024, Auckland, New Zealand

MSC Class: 91B12; 91B14; 91B10; 68T07 ACM Class: I.2.6; I.2.11

arXiv:2312.14256 [pdf, ps, other]

An extension of May's Theorem to three alternatives: axiomatizing Minimax voting

Authors: Wesley H. Holliday, Eric Pacuit

Abstract: May's Theorem [K. O. May, Econometrica 20 (1952) 680-684] characterizes majority voting on two alternatives as the unique preferential voting method satisfying several simple axioms. Here we show that by adding some desirable axioms to May's axioms, we can uniquely determine how to vote on three alternatives (setting aside tiebreaking). In particular, we add two axioms stating that the voting method should mitigate spoiler effects and avoid the so-called strong no show paradox. We prove a theorem stating that any preferential voting method satisfying our enlarged set of axioms, which includes some weak homogeneity and preservation axioms, must choose from among the Minimax winners in all three-alternative elections. When applied to more than three alternatives, our axioms also distinguish Minimax from other known voting methods that coincide with or refine Minimax for three alternatives.

Submitted 2 July, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Generalized main theorem and added Section 5 and Appendix A

MSC Class: 91B12; 91B14; 91B10 ACM Class: I.2.11
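
The abstract above refers to Minimax winners without defining them. As a minimal sketch, assuming the usual margin-based definition (a candidate's score is its largest pairwise loss margin, and the winners are the candidates with the smallest such score) and assuming every ballot is a strict ranking of all candidates, the computation might look like this. The function names and the sample profile are illustrative and are not taken from the paper.

```python
from itertools import combinations

def margins(ballots, candidates):
    """margin[a][b] = (# voters ranking a above b) - (# voters ranking b above a)."""
    m = {a: {b: 0 for b in candidates} for a in candidates}
    for ranking in ballots:
        position = {c: i for i, c in enumerate(ranking)}
        for a, b in combinations(candidates, 2):
            if position[a] < position[b]:
                m[a][b] += 1
                m[b][a] -= 1
            else:
                m[b][a] += 1
                m[a][b] -= 1
    return m

def minimax_winners(ballots, candidates):
    """Candidates whose worst head-to-head result is least bad (assumed definition)."""
    m = margins(ballots, candidates)
    worst_loss = {a: max(m[b][a] for b in candidates if b != a) for a in candidates}
    best = min(worst_loss.values())
    return sorted(c for c in candidates if worst_loss[c] == best)

# Illustrative profile with a majority cycle: a beats b, b beats c, c beats a.
ballots = [("a", "b", "c")] * 4 + [("b", "c", "a")] * 3 + [("c", "a", "b")] * 2
print(minimax_winners(ballots, ["a", "b", "c"]))
```

In this profile every candidate loses once, and a has the narrowest loss (to c, by a margin of 1), so the sketch selects a.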

arXiv:2210.12503 [pdf, other]

An Axiomatic Characterization of Split Cycle

Authors: Yifeng Ding, Wesley H. Holliday, Eric Pacuit

Abstract: A number of rules for resolving majority cycles in elections have been proposed in the literature. Recently, Holliday and Pacuit (Journal of Theoretical Politics 33 (2021) 475-524) axiomatically characterized the class of rules refined by one such cycle-resolving rule, dubbed Split Cycle: in each majority cycle, discard the majority preferences with the smallest majority margin. They showed that any rule satisfying five standard axioms plus a weakening of Arrow's Independence of Irrelevant Alternatives (IIA), called Coherent IIA, is refined by Split Cycle. In this paper, we go further and show that Split Cycle is the only rule satisfying the axioms of Holliday and Pacuit together with two additional axioms, which characterize the class of rules that refine Split Cycle: Coherent Defeat and Positive Involvement in Defeat. Coherent Defeat states that any majority preference not occurring in a cycle is retained, while Positive Involvement in Defeat is closely related to the well-known axiom of Positive Involvement (as in J. Perez, Social Choice and Welfare 18 (2001) 601-616). We characterize Split Cycle not only as a collective choice rule but also as a social choice correspondence, over both profiles of linear ballots and profiles of ballots allowing ties.

Submitted 28 June, 2024; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: Forthcoming in Social Choice and Welfare

MSC Class: 91B12; 91B14; 91B10 ACM Class: I.2.11
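
The rule quoted in the abstract above ("in each majority cycle, discard the majority preferences with the smallest majority margin") can be sketched in code. The version below is a hedged illustration, not the authors' implementation: it assumes strict rankings of all candidates, and it uses the observation that a majority edge from a to b with margin k is weakest in some cycle exactly when b can get back to a along edges of margin at least k. All names and the sample profile are hypothetical.

```python
from collections import deque
from itertools import combinations

def margins(ballots, candidates):
    """margin[a][b] = (# voters ranking a above b) - (# voters ranking b above a)."""
    m = {a: {b: 0 for b in candidates} for a in candidates}
    for ranking in ballots:
        position = {c: i for i, c in enumerate(ranking)}
        for a, b in combinations(candidates, 2):
            if position[a] < position[b]:
                m[a][b] += 1
                m[b][a] -= 1
            else:
                m[b][a] += 1
                m[a][b] -= 1
    return m

def split_cycle_winners(ballots, candidates):
    m = margins(ballots, candidates)

    def reachable(src, dst, threshold):
        # Breadth-first search along majority edges with margin >= threshold.
        seen, queue = {src}, deque([src])
        while queue:
            x = queue.popleft()
            if x == dst:
                return True
            for y in candidates:
                if y not in seen and m[x][y] >= threshold:
                    seen.add(y)
                    queue.append(y)
        return False

    defeated = set()
    for a in candidates:
        for b in candidates:
            # a defeats b if a's majority over b survives every cycle it lies on.
            if a != b and m[a][b] > 0 and not reachable(b, a, m[a][b]):
                defeated.add(b)
    return sorted(c for c in candidates if c not in defeated)

# Illustrative cycle: a beats b by 3, b beats c by 5, c beats a by 1.
ballots = [("a", "b", "c")] * 4 + [("b", "c", "a")] * 3 + [("c", "a", "b")] * 2
print(split_cycle_winners(ballots, ["a", "b", "c"]))
```

Here the weakest edge of the only cycle is c over a (margin 1), so it is discarded; a is then undefeated while b and c remain defeated.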

arXiv:2208.06907 [pdf, ps, other]

Impossibility theorems involving weakenings of expansion consistency and resoluteness in voting

Authors: Wesley H. Holliday, Chase Norman, Eric Pacuit, Saam Zahedian

Abstract: A fundamental principle of individual rational choice is Sen's $γ$ axiom, also known as expansion consistency, stating that any alternative chosen from each of two menus must be chosen from the union of the menus. Expansion consistency can also be formulated in the setting of social choice. In voting theory, it states that any candidate chosen from two fields of candidates must be chosen from the combined field of candidates. An important special case of the axiom is binary expansion consistency, which states that any candidate chosen from an initial field of candidates and chosen in a head-to-head match with a new candidate must also be chosen when the new candidate is added to the field, thereby ruling out spoiler effects. In this paper, we study the tension between this weakening of expansion consistency and weakenings of resoluteness, an axiom demanding the choice of a single candidate in any election. As is well known, resoluteness is inconsistent with basic fairness conditions on social choice, namely anonymity and neutrality. Here we prove that even significant weakenings of resoluteness, which are consistent with anonymity and neutrality, are inconsistent with binary expansion consistency. The proofs make use of SAT solving, with the correctness of a SAT encoding formally verified in the Lean Theorem Prover, as well as a strategy for generalizing impossibility theorems obtained for special types of voting methods (namely majoritarian and pairwise voting methods) to impossibility theorems for arbitrary voting methods. This proof strategy may be of independent interest for its potential applicability to other impossibility theorems in social choice.

Submitted 26 March, 2023; v1 submitted 14 August, 2022; originally announced August 2022.

Comments: Forthcoming in Mathematical Analyses of Decisions, Voting, and Games, eds. M. A. Jones, D. McCune, and J. Wilson, Contemporary Mathematics, American Mathematical Society, 2023

MSC Class: 91B12; 91B14 ACM Class: J.4
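
Expansion consistency, as stated in the abstract above for individual choice, can be checked by brute force on small sets of alternatives. The sketch below is only an illustration of the axiom in the individual-choice setting; it does not reproduce the paper's SAT-based method. It assumes a choice function maps a non-empty menu (a frozenset) to a non-empty subset of it, and the two example choosers are hypothetical.

```python
from itertools import combinations

def gamma_violations(choice, alternatives):
    """Return (menu_a, menu_b, dropped) triples witnessing failures of expansion consistency."""
    menus = [frozenset(s)
             for size in range(1, len(alternatives) + 1)
             for s in combinations(alternatives, size)]
    violations = []
    for menu_a, menu_b in combinations(menus, 2):
        chosen_from_both = choice(menu_a) & choice(menu_b)
        # Anything chosen from both menus must also be chosen from their union.
        dropped = chosen_from_both - choice(menu_a | menu_b)
        if dropped:
            violations.append((menu_a, menu_b, dropped))
    return violations

def first_alphabetically(menu):
    # Hypothetical chooser that maximizes a fixed ranking (alphabetical order).
    return frozenset({min(menu)})

def fickle(menu):
    # Hypothetical chooser: first alphabetically on small menus, last on larger ones.
    return frozenset({min(menu)}) if len(menu) <= 2 else frozenset({max(menu)})

alternatives = ["a", "b", "c"]
print(len(gamma_violations(first_alphabetically, alternatives)))
print(len(gamma_violations(fickle, alternatives)) > 0)
```

The second chooser picks a from {a, b} and from {a, c} but not from {a, b, c}, which is exactly the pattern the axiom rules out.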

arXiv:2110.08453 [pdf, ps, other]

Voting Theory in the Lean Theorem Prover

Authors: Wesley H. Holliday, Chase Norman, Eric Pacuit

Abstract: There is a long tradition of fruitful interaction between logic and social choice theory. In recent years, much of this interaction has focused on computer-aided methods such as SAT solving and interactive theorem proving. In this paper, we report on the development of a framework for formalizing voting theory in the Lean theorem prover, which we have applied to verify properties of a recently studied voting method. While previous applications of interactive theorem proving to social choice (using Isabelle/HOL and Mizar) have focused on the verification of impossibility theorems, we aim to cover a variety of results ranging from impossibility theorems to the verification of properties of specific voting methods (e.g., Condorcet consistency, independence of clones, etc.). In order to formalize voting theoretic axioms concerning adding or removing candidates and voters, we work in a variable-election setting whose formalization makes use of dependent types in Lean.

Submitted 15 October, 2021; originally announced October 2021.

Comments: Postprint of the paper in Proceedings of the Eighth International Conference on Logic, Rationality and Interaction (Springer) with two typos fixed

MSC Class: 68V15; 03B70; 91B12; 91B14 ACM Class: I.2.3

arXiv:2108.00542 [pdf, other]

Stable Voting

Authors: Wesley H. Holliday, Eric Pacuit

Abstract: We propose a new single-winner voting system using ranked ballots: Stable Voting. The motivating principle of Stable Voting is that if a candidate A would win without another candidate B in the election, and A beats B in a head-to-head majority comparison, then A should still win in the election with B included (unless there is another candidate A' who has the same kind of claim to winning, in which case a tiebreaker may choose between such candidates). We call this principle Stability for Winners (with Tiebreaking). Stable Voting satisfies this principle while also having a remarkable ability to avoid tied outcomes in elections even with small numbers of voters.

Submitted 11 February, 2023; v1 submitted 1 August, 2021; originally announced August 2021.

Comments: Added proof of Proposition 1. Forthcoming in Constitutional Political Economy

MSC Class: 91B12; 91B14; 91B10 ACM Class: I.2.11

arXiv:2106.11502 [pdf, other]

doi 10.4204/EPTCS.335.17

Measuring Violations of Positive Involvement in Voting

Authors: Wesley H. Holliday, Eric Pacuit

Abstract: In the context of computational social choice, we study voting methods that assign a set of winners to each profile of voter preferences. A voting method satisfies the property of positive involvement (PI) if for any election in which a candidate x would be among the winners, adding another voter to the election who ranks x first does not cause x to lose. Surprisingly, a number of standard voting methods violate this natural property. In this paper, we investigate different ways of measuring the extent to which a voting method violates PI, using computer simulations. We consider the probability (under different probability models for preferences) of PI violations in randomly drawn profiles vs. profile-coalition pairs (involving coalitions of different sizes). We argue that in order to choose between a voting method that satisfies PI and one that does not, we should consider the probability of PI violation conditional on the voting methods choosing different winners. We should also relativize the probability of PI violation to what we call voter potency, the probability that a voter causes a candidate to lose. Although absolute frequencies of PI violations may be low, after this conditioning and relativization, we see that under certain voting methods that violate PI, much of a voter's potency is turned against them - in particular, against their desire to see their favorite candidate elected.

Submitted 21 June, 2021; originally announced June 2021.

Comments: In Proceedings TARK 2021, arXiv:2106.10886

ACM Class: I.2.11

Journal ref: EPTCS 335, 2021, pp. 189-209
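
The definition of positive involvement given in the abstract above lends itself to a direct single-voter check. The sketch below is an assumption-laden illustration rather than the authors' simulation code: it takes any function mapping a list of strict rankings to a list of winners, adds each possible ballot that ranks a current winner x first, and reports the cases where x stops winning. Plurality is used only as a simple stand-in method; the names and profile are hypothetical.

```python
from collections import Counter
from itertools import permutations

def plurality_winners(ballots, candidates):
    """Candidates with the most first-place votes (illustrative voting method)."""
    tally = Counter(ranking[0] for ranking in ballots)
    top = max(tally[c] for c in candidates)
    return sorted(c for c in candidates if tally[c] == top)

def pi_violations(method, ballots, candidates):
    """List (x, new_ballot) pairs where adding a voter who ranks winner x first makes x lose."""
    violations = []
    for x in method(ballots, candidates):
        others = [c for c in candidates if c != x]
        for rest in permutations(others):
            new_ballot = (x, *rest)
            if x not in method(ballots + [new_ballot], candidates):
                violations.append((x, new_ballot))
    return violations

ballots = [("a", "b", "c")] * 4 + [("b", "c", "a")] * 3 + [("c", "a", "b")] * 2
print(pi_violations(plurality_winners, ballots, ["a", "b", "c"]))
```

An empty result for one profile says nothing about a method in general; the paper's measurements range over many randomly drawn profiles and over coalitions of more than one voter.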

arXiv:2008.08451 [pdf, ps, other]

Axioms for Defeat in Democratic Elections

Authors: Wesley H. Holliday, Eric Pacuit

Abstract: We propose six axioms concerning when one candidate should defeat another in a democratic election involving two or more candidates. Five of the axioms are widely satisfied by known voting procedures. The sixth axiom is a weakening of Kenneth Arrow's famous condition of the Independence of Irrelevant Alternatives (IIA). We call this weakening Coherent IIA. We prove that the five axioms plus Coherent IIA single out a method of determining defeats studied in our recent work: Split Cycle. In particular, Split Cycle provides the most resolute definition of defeat among any satisfying the six axioms for democratic defeat. In addition, we analyze how Split Cycle escapes Arrow's Impossibility Theorem and related impossibility results.

Submitted 12 October, 2023; v1 submitted 15 August, 2020; originally announced August 2020.

Comments: Added missing edges in first graph in Remark 4.10

MSC Class: 91B12; 91B14; 91B10 ACM Class: I.2.11

Journal ref: Journal of Theoretical Politics, Vol. 33(4), 475-524, 2021

arXiv:2004.08144 [pdf, other]

doi 10.1016/j.artint.2020.103270

Intention as Commitment toward Time

Authors: Marc van Zee, Dragan Doder, Leendert van der Torre, Mehdi Dastani, Thomas Icard, Eric Pacuit

Abstract: In this paper we address the interplay among intention, time, and belief in dynamic environments. The first contribution is a logic for reasoning about intention, time and belief, in which assumptions of intentions are represented by preconditions of intended actions. Intentions and beliefs are coherent as long as these assumptions are not violated, i.e. as long as intended actions can be performed such that their preconditions hold as well. The second contribution is the formalization of what-if scenarios: what happens with intentions and beliefs if a new (possibly conflicting) intention is adopted, or a new fact is learned? An agent is committed to its intended actions as long as its belief-intention database is coherent. We conceptualize intention as commitment toward time and we develop AGM-based postulates for the iterated revision of belief-intention databases, and we prove a Katsuno-Mendelzon-style representation theorem.

Submitted 17 April, 2020; originally announced April 2020.

Comments: 83 pages, 4 figures, Artificial Intelligence journal pre-print

Journal ref: Artificial Intelligence, Volume 283, June 2020, 103270

arXiv:2004.02350 [pdf, other]

doi 10.1007/s11127-023-01042-3

Split Cycle: A New Condorcet Consistent Voting Method Independent of Clones and Immune to Spoilers

Authors: Wesley H. Holliday, Eric Pacuit

Abstract: We propose a Condorcet consistent voting method that we call Split Cycle. Split Cycle belongs to the small family of known voting methods satisfying the anti-vote-splitting criterion of independence of clones. In this family, only Split Cycle satisfies a new criterion we call immunity to spoilers, which concerns adding candidates to elections, as well as the known criteria of positive involvement and negative involvement, which concern adding voters to elections. Thus, in contrast to other clone-independent methods, Split Cycle mitigates both "spoiler effects" and "strong no show paradoxes."

Submitted 28 November, 2023; v1 submitted 5 April, 2020; originally announced April 2020.

Comments: Final version with tables and graphs updated with additional data

MSC Class: 91B12; 91B14; 91B10 ACM Class: I.2.11

Journal ref: Public Choice, Vol. 197, 1-62, 2023

arXiv:1907.09110 [pdf, other]

doi 10.4204/EPTCS.297.17

Strategic Voting Under Uncertainty About the Voting Method

Authors: Wesley H. Holliday, Eric Pacuit

Abstract: Much of the theoretical work on strategic voting makes strong assumptions about what voters know about the voting situation. A strategizing voter is typically assumed to know how other voters will vote and to know the rules of the voting method. A growing body of literature explores strategic voting when there is uncertainty about how others will vote. In this paper, we study strategic voting when there is uncertainty about the voting method. We introduce three notions of manipulability for a set of voting methods: sure, safe, and expected manipulability. With the help of a computer program, we identify voting scenarios in which uncertainty about the voting method may reduce or even eliminate a voter's incentive to misrepresent her preferences. Thus, it may be in the interest of an election designer who wishes to reduce strategic voting to leave voters uncertain about which of several reasonable voting methods will be used to determine the winners of an election.

Submitted 21 July, 2019; originally announced July 2019.

Comments: In Proceedings TARK 2019, arXiv:1907.08335

Journal ref: EPTCS 297, 2019, pp. 252-272

arXiv:1310.6432 [pdf]

When is an Example a Counterexample?

Authors: Eric Pacuit, Arthur Paul Pedersen, Jan-Willem Romeijn

Abstract: In this extended abstract, we carefully examine a purported counterexample to a postulate of iterated belief revision. We suggest that the example is better seen as a failure to apply the theory of belief revision in sufficient detail. The main contribution is conceptual, aiming at the literature on the philosophical foundations of the AGM theory of belief revision [1]. Our discussion is centered around the observation that it is often unclear whether a specific example is a "genuine" counterexample to an abstract theory or a misapplication of that theory to a concrete case.

Submitted 23 October, 2013; originally announced October 2013.

Comments: 10 pages, Contributed talk at TARK 2013 (arXiv:1310.6382) http://www.tark.org

Report number: TARK/2013/p156

arXiv:1307.1277 [pdf, ps, other]

Evidence and plausibility in neighborhood structures

Authors: Johan van Benthem, David Fernández-Duque, Eric Pacuit

Abstract: The intuitive notion of evidence has both semantic and syntactic features. In this paper, we develop an evidence logic for epistemic agents faced with possibly contradictory evidence from different sources. The logic is based on a neighborhood semantics, where a neighborhood $N$ indicates that the agent has reason to believe that the true state of the world lies in $N$. Further notions of relative plausibility between worlds and beliefs based on the latter ordering are then defined in terms of this evidence structure, yielding our intended models for evidence-based beliefs. In addition, we also consider a second more general flavor, where belief and plausibility are modeled using additional primitive relations, and we prove a representation theorem showing that each such general model is a $p$-morphic image of an intended one. This semantics invites a number of natural special cases, depending on how uniform we make the evidence sets, and how coherent their total structure. We give a structural study of the resulting 'uniform' and 'flat' models. Our main results are sound and complete axiomatizations for the logics of all four major model classes with respect to the modal language of evidence, belief and safe belief. We conclude with an outlook toward logics for the dynamics of changing evidence, and the resulting language extensions and connections with logics of plausibility change.

Submitted 4 July, 2013; originally announced July 2013.

arXiv:0901.4430 [pdf, other]

doi 10.2168/LMCS-5(2:2)2009

Neighbourhood Structures: Bisimilarity and Basic Model Theory

Authors: Helle Hvid Hansen, Clemens Kupke, Eric Pacuit

Abstract: Neighbourhood structures are the standard semantic tool used to reason about non-normal modal logics. The logic of all neighbourhood models is called classical modal logic. In coalgebraic terms, a neighbourhood frame is a coalgebra for the contravariant powerset functor composed with itself, denoted by 2^2. We use this coalgebraic modelling to derive notions of equivalence between neighbourhood structures. 2^2-bisimilarity and behavioural equivalence are well known coalgebraic concepts, and they are distinct, since 2^2 does not preserve weak pullbacks. We introduce a third, intermediate notion whose witnessing relations we call precocongruences (based on pushouts). We give back-and-forth style characterisations for 2^2-bisimulations and precocongruences, we show that on a single coalgebra, precocongruences capture behavioural equivalence, and that between neighbourhood structures, precocongruences are a better approximation of behavioural equivalence than 2^2-bisimulations. We also introduce a notion of modal saturation for neighbourhood models, and investigate its relationship with definability and image-finiteness. We prove a Hennessy-Milner theorem for modally saturated and for image-finite neighbourhood models. Our main results are an analogue of Van Benthem's characterisation theorem and a model-theoretic proof of Craig interpolation for classical modal logic.

Submitted 20 April, 2009; v1 submitted 28 January, 2009; originally announced January 2009.

Comments: uses LMCS.cls (included), 2 figures (both ps and pdf)

ACM Class: F.1.1; F.3.2; F.4.1; I.2.4

Journal ref: Logical Methods in Computer Science, Volume 5, Issue 2 (April 9, 2009) lmcs:1167

Showing 1–15 of 15 results for author: Pacuit, E