Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024), May 6 – 10, 2024, Auckland, New Zealand. N. Alechina, V. Dignum, M. Dastani, J.S. Sichman (eds.)

Affiliations: Zhejiang University, Hangzhou, China; Zhejiang University & ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China; University of Nottingham, Nottingham, United Kingdom; Zhejiang University, Hangzhou, China; Zhejiang University, Hangzhou, China

Stability of Weighted Majority Voting under Estimated Weights

Shaojie Bai, Dongxia Wang, Tim Muller, Peng Cheng, and Jiming Chen
Abstract.

Weighted Majority Voting (WMV) is a well-known decision-making rule. The weights of sources are determined by the probabilities that sources provide accurate information (trustworthiness). However, in reality, the trustworthiness is usually not a known quantity to the decision maker, who has to rely on an estimate called trust. An algorithm that computes trust is called unbiased when it does not systematically overestimate or underestimate the trustworthiness. To formally analyze the uncertainty that such unbiased trust values bring to the decision process, we introduce and analyze two important properties of WMV: Stability of Correctness and Stability of Optimality. Stability of Correctness measures the difference between the decision accuracy that the decision maker believes he can achieve and the accuracy he actually achieves. We prove that Stability of Correctness holds absolutely for WMV: the difference is $0$. Stability of Optimality measures the difference between the actual accuracy of decisions made using trust values and that of decisions made using trustworthiness values. We find a relatively tight upper bound on the Stability of Optimality, meaning that, although using (unbiased) trust values is suboptimal compared to using the true trustworthiness values, the difference is small. Meanwhile, a counter-intuitive observation is that while distributions of trustworthiness influence the Stability of Optimality, the number of sources barely influences it. We also provide an overview of how sensitive decision accuracy is to the changes in trust and trustworthiness.

Key words and phrases:
Weighted Majority Voting, Trust, Stability of Decision Making
* Corresponding author

1. Introduction

Crowd wisdom plays a fundamental role in helping make better decisions in many scenarios, e.g., hiring workers for labeling tasks in crowdsourcing Luo (2023), aggregating classifiers for prediction in ensemble learning Kotary et al. (2023), asking for opinions of reliability in online rating systems Carbo and Molina (2023); Ge et al. (2023), etc. Decisions are derived by aggregating the information or feedback from a collection of sources, the quality of which can vary. The feedback can be inaccurate due to lack of expertise, mistakes or malice, e.g., low-quality labeling for machine learning, fake ratings introduced by sellers to promote their reputation, etc.

Among the aggregation mechanisms, Weighted Majority Voting has long been a popular one. Basically, each source supports an option and is assigned a weight. WMV chooses the option that is supported by sources with the maximal total weight. WMV has been used in a variety of domains ranging from voting Nitzan and Paroush (1982), crowdsourcing Dawid and Skene (1979), classification Littlestone and Warmuth (1994) to trust systems Yu et al. (2004) and even distributed systems Tong and Kain (1991). In different contexts, the weight of a source can have different meanings. For instance, in determining a collective choice that is widely acceptable to individuals with diverse preferences Sen (1977), WMV is used for preference aggregation and the weight means the importance of an individual. We are more interested in the scenarios where there is a notion of correctness (or accuracy) of decisions, and the weight of a source depends on how trustworthy it is in providing the feedback that corresponds to the correct decision. This quality is denoted as trustworthiness and is usually modelled as a probability value. Examples include the aggregation of crowd-sourced labels Li and Yu (2014), crowd-sensed navigation data James (2020), or the outputs of multiple classifiers Manino et al. (2019b), etc.

While WMV is proven to be optimal when source trustworthiness is given Nitzan and Paroush (1982)\footnote{Independent provision of feedback is also required for the optimality of WMV Muller et al. (2020); Nitzan and Paroush (1982).}, in practice, decision makers have to resort to an estimation or a belief (denoted as trust) that may not equal the actual values of trustworthiness\footnote{Note that we use “trust” and “trustworthiness” to differentiate between what a decision maker trusts or estimates as the probability of a source suggesting correctly (regardless of whether he believes in the value or is aware that it is just an estimate) and the actual probability value (refer to the difference between their definitions in Walter (2008)).}. Deviation in estimation may decrease decision accuracy. There is plenty of work on improving the estimation of source trustworthiness by learning from historical data (e.g., direct observations or indirect evidence), with the principle that more data increases the confidence in the estimation Berend and Kontorovich (2013); Wu and Yang (2016). Several approaches even treat the belief about source trustworthiness as its actual value Mazzetto et al. (2021); Maystre et al. (2021). However, no algorithm always produces perfect trust values. It is worth studying how the quality of the trust values impacts the decision quality of WMV. Is WMV able to maintain a tolerable level of decision incorrectness when the inaccuracy in the estimation is bounded, i.e., does it have a certain level of stability w.r.t. inaccurate estimation?

In this paper, we propose a formal analysis of the stability properties of WMV. Firstly, we study how sensitive the decision accuracy is to changes in source trust and trustworthiness, with both arguments taking fixed values. We find that, unsurprisingly, decision accuracy decreases as the deviation of trust from trustworthiness increases, and a sufficiently small deviation barely influences the accuracy. Besides, compared with overestimating, underestimating trustworthiness is usually less harmful to the decision accuracy. Secondly, we study the influence of trust and trustworthiness in a statistical way. A decision maker may sometimes overestimate source trustworthiness and sometimes underestimate it, while the estimate remains correct in expectation, i.e., the estimation is unbiased\footnote{Generally, the estimation error always exists, but it is relatively small and can be zero on average with sufficient data Berend and Kontorovich (2013); Freedman (1963).}. We define two types of stability based on such unbiased estimation: Stability of Correctness and Stability of Optimality. Stability of Correctness concerns whether the decision accuracy a decision maker believes he achieves (i.e., the accuracy he computes with trust) equals what he actually achieves (i.e., the accuracy computed with trustworthiness). We prove that whatever distribution source trustworthiness follows, as long as the estimation is unbiased, a decision maker gets the accuracy as if the trustworthiness were known, i.e., absolute stability. This means that the shape and variance of the trustworthiness distribution are irrelevant to the Stability of Correctness.

Stability of Optimality concerns whether the decisions made based on unbiased trust values are as good as those made based on trustworthiness. Considering that trustworthiness is usually unknown, Stability of Optimality measures the gap between the practical situation where the decision maker decides with trust, and the ideal situation where (magically) he has access to the actual trustworthiness. We prove that Stability of Optimality does not hold for WMV, but the degradation in decision accuracy caused by incorrect but (on average) unbiased trust is relatively tightly bounded. That is, decision accuracy with unbiased trust will not be too far off the theoretically determined value. Moreover, unlike Stability of Correctness, the distribution of trustworthiness influences the upper bound of that accuracy gap, and also determines how good the accuracy can be in the ideal situation, namely where trustworthiness is given. Last but not least, while it is usually perceived that more sources improve accuracy, we observe, counterintuitively, that the number of sources has little influence on the accuracy gap.

The rest of this paper is organized as follows. In Section 2 the related work is presented. In Section 3 we introduce a formal framework to study the WMV decision rule. In Section 4 we present how trust and trustworthiness influence the decision accuracy of WMV. In Section 5 we analyze the two types of stability. Numerical analysis is also performed where needed to illustrate the theoretical results.

2. Related Work

The Weighted Majority Voting rule has been studied in several domains, e.g., decision theory, voting theory, and management science, and has found various applications. We focus on the scenarios where the weight of a source or “voter” is determined by how trustworthy it is in suggesting the correct decision. Some approaches utilizing WMV assume source trustworthiness is given Nitzan and Paroush (1982); Berend and Kontorovich (2015), although in practice it is usually unknown. Plenty of work focuses on modeling and learning source trustworthiness from observation and interaction history Zeynalvand et al. (2021); Wu et al. (2023); Ge et al. (2023). Some researchers model trust as a probability value. To obtain trust, they either rely on frequency estimation by counting the number of right decisions Berend and Kontorovich (2013), or solve an optimization problem based on their models by minimizing the decision error rate Rekatsinas et al. (2017) or maximizing the likelihood Dong et al. (2015); Manino et al. (2019a); Meir et al. (2023). Moreover, model-checking-based methods are also applied in quantifying the probability of trust in individual agents, representing the agent’s own beliefs Drawel et al. (2020); Bentahar et al. (2022); Drawel et al. (2022); Telang et al. (2023). Besides, trust can also be modeled as a random variable. Bayesian models have been proposed and applied to this problem by Raykar et al. (2010); Sardana et al. (2018); Guo (2023), combining the prior knowledge and the observations to infer the trustworthiness. Expectation Maximization-based methods are also proposed to estimate source trustworthiness and the correct decision at the same time, via iterative updating Dawid and Skene (1979); Zhang et al. (2016).

Such learned trust is sometimes treated as an estimation of the source trustworthiness with the deviation considered Gao et al. (2016); Wu et al. (2021a), and sometimes treated as equivalent to trustworthiness, namely as the probability of a source suggesting correctly, and further used to evaluate decision accuracy Littlestone and Warmuth (1994); Guan et al. (2018); Martín-Morató and Mesaros (2023). However, trust essentially represents the belief of a decision maker about the source quality, which may deviate from the actual probability. Hence, he may not achieve the decision accuracy claimed on the basis of trust.

Besides the efforts in modeling and learning trustworthiness, there exists work that theoretically analyzes how trustworthiness and trust would influence decision accuracy, which is most relevant to ours. Given trustworthiness, the decision accuracy of WMV is analyzed in Berend and Kontorovich (2015) without considering the learning process of trustworthiness. On the other hand, some other work takes the learning process into consideration. To measure the estimation quality, the decision accuracy bounds for learning algorithms have been proposed through PAC techniques in Lacasse et al. (2006); Germain et al. (2015); Wu et al. (2021b). Considering trust is derived from finite samples, some researchers then study precise characterizations of the relationship between the decision accuracy and the sample size in Gao et al. (2016). More recently, tighter bounds for decision accuracy under arbitrary estimation are provided, ignoring particular assumptions for trustworthiness  Manino et al. (2019b). Unfortunately, none of them have analyzed the relationship between the estimate error and the decision accuracy of WMV in a quantitative way.

3. Preliminaries

In this section, we outline a formal framework to support our study of the stability of the Weighted Majority Voting decision rule. Note that capital letters represent random variables, and lowercase letters represent non-random variables. Bold letters represent a vector of multiple variables, and non-bold letters represent single variables.

Consider a decision-making scenario in which a decision maker is faced with multiple possible decisions $\mathcal{O}=\{o_{1},\dots,o_{K}\}$, only one of which is correct. The random variable $O$ determines which of the options is actually correct, e.g., $O=o_{1}$ if $o_{1}$ is the correct decision. The decision maker receives feedback from a set of sources, $\mathcal{S}=\{s_{1},\dots,s_{n}\}$. The random variable $F_{i}$, with $f_{i}$ denoting an outcome, represents the feedback of source $s_{i}$. The feedback may or may not correspond to the correct decision. For WMV, we assume\footnote{In more general decision-making scenarios, the options of feedback and those of decisions are not necessarily equal and may take a many-to-one mapping.} a one-to-one correspondence between the feedback that suggests the correct decision and the correct decision itself, and denote $F_{i}=O$ iff $f_{i}$ suggests correctly, $F_{i}\in\mathcal{O}$.
Feedback of all the sources is represented by the random variable $\bm{F}=(F_{1},\dots,F_{n})$, with $\bm{f}=(f_{1},\dots,f_{n})$ denoting an outcome, and its sample space is denoted as $\mathcal{F}$, i.e., $\bm{f}\in\mathcal{F}$. A decision mechanism is a function $\mathcal{D}:\mathcal{F}\to\mathcal{O}$. The quantity that the decision maker wants to maximize is the probability of making the correct decision (which we shorthand as decision accuracy or decision correctness throughout the paper): $\mathbb{P}(\mathcal{D}(\bm{F})=O)$.

We define $\Upsilon_{i}$ as a $\{-1,1\}$-indicator random variable of whether source $s_{i}$ suggests the correct decision and $\upsilon_{i}$ as one of its outcomes: $\Upsilon_{i}=1$ if $F_{i}=O$ and $\Upsilon_{i}=-1$ if $F_{i}\neq O$. For the indicator variables of all the sources, i.e., $\bm{\Upsilon}=(\Upsilon_{1},\dots,\Upsilon_{n})$, one of its samples is an indicator vector, i.e., $\bm{\upsilon}=(\upsilon_{1},\dots,\upsilon_{n})$, $\bm{\upsilon}\in\mathcal{T}$, where $\mathcal{T}$ denotes the sample space of $\bm{\Upsilon}$. Let $-\bm{\upsilon}=(-\upsilon_{1},\dots,-\upsilon_{n})$ denote the opposite indicator vector of $\bm{\upsilon}$, where all source indicators are flipped. The set of all the possible feedback under $\bm{\upsilon}$ is denoted as $\mathcal{F}_{\bm{\upsilon}}$.

We use the following running example in this section to demonstrate the relevant concepts.

Example 1.

There are three sources $\mathcal{S}=\{s_{1},s_{2},s_{3}\}$. If $\mathcal{O}=\{A,B\}$, $O=B$ and the indicator vector $\bm{\upsilon}=(1,1,-1)$, then $\bm{f}=(B,B,A)$ and $\mathcal{F}_{\bm{\upsilon}}=\{(B,B,A),(A,A,B)\}$.
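
As a minimal sketch of how the feedback set of an indicator vector can be enumerated, the following Python snippet rebuilds the set stated in Example 1. The function name `feedback_set` is hypothetical, and the reading that the set ranges over every candidate correct option is an assumption, chosen because it reproduces the two feedback vectors listed in the example.

```python
from itertools import product


def feedback_set(options, indicator):
    """Enumerate every feedback vector compatible with an indicator vector.

    Assumption (not stated formally in the text): the set ranges over each
    option in turn being the correct one.
    """
    result = set()
    for correct in options:
        wrong = [o for o in options if o != correct]
        # A source with indicator 1 reports the correct option;
        # a source with indicator -1 reports any one of the wrong options.
        choices = [[correct] if u == 1 else wrong for u in indicator]
        result.update(product(*choices))
    return result


example = feedback_set(["A", "B"], (1, 1, -1))
print(sorted(example))
```

With two options, each indicator vector maps to exactly one feedback vector per candidate correct option; with more options, a source that reports incorrectly can do so in several ways, so the set grows.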

When decision “correctness” is a concern, Weighted Majority Voting usually considers how probable it is that each source suggests the correct decision. For source $i$, let $\mathbb{P}(F_{i}=O)=p_{i}$ and $\bm{p}=(p_{1},\dots,p_{n})$. Hence $\mathbb{P}(\Upsilon_{i}=1)=p_{i}$. We refer to $p_{i}$ as the trustworthiness of source $s_{i}$. In practice, the trustworthiness of a source is usually unknown to a decision maker, and an estimation is used instead, denoted as $\hat{p}_{i}$, with $\hat{\bm{p}}=(\hat{p}_{1},\dots,\hat{p}_{n})$. We call the value $\hat{p}_{i}$ trust, which represents the subjective estimation or belief of the decision maker regarding how probable it is that a source suggests correctly.
There exist multiple ways to compute $\hat{p}_{i}$, e.g., counting the frequency of making correct decisions, or Bayesian learning methods based on prior interaction data. In the literature, the trustworthiness of a source can have different meanings, e.g., honesty of an agent in a rating system Muller et al. (2020), competency of a voter Condorcet (1785), reliability of a worker in crowdsourcing Dawid and Skene (1979), correctness of a sensor in crowdsensing Moslem et al. (2012), etc. Whatever the meaning, $p_{i}$ represents an intrinsic quality, namely how probable it is that the source reports correctly, while $\hat{p}_{i}$ represents how the decision maker thinks of or estimates that probability Walter (2008). We assume sources independently provide feedback, hence $\mathbb{P}(\bm{\Upsilon}=\bm{\upsilon})=\prod_{i:\upsilon_{i}=1}p_{i}\cdot\prod_{i:\upsilon_{i}=-1}(1-p_{i})$.

In Example 1, suppose $\bm{p}=(0.6,0.6,0.7)$; the estimation of $\bm{p}$ by a decision maker may be inaccurate, e.g., $\hat{\bm{p}}=(0.6,0.7,0.8)$.
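
The product formula for the probability of an indicator vector under independent sources can be sketched in a few lines of Python. The function name `indicator_probability` and the chosen indicator vector $(1,1,-1)$ are illustrative assumptions; the two probability vectors are the trustworthiness and trust values given for Example 1.

```python
def indicator_probability(p, upsilon):
    """Probability of an indicator vector under independent sources."""
    prob = 1.0
    for p_i, u_i in zip(p, upsilon):
        # Multiply by p_i for a correct source, by (1 - p_i) otherwise.
        prob *= p_i if u_i == 1 else 1.0 - p_i
    return prob


p = (0.6, 0.6, 0.7)      # trustworthiness from the example
p_hat = (0.6, 0.7, 0.8)  # trust (the estimate) from the example
upsilon = (1, 1, -1)     # illustrative indicator vector

actual = indicator_probability(p, upsilon)        # 0.6 * 0.6 * 0.3
believed = indicator_probability(p_hat, upsilon)  # 0.6 * 0.7 * 0.2
print(actual, believed)
```

The same indicator vector has probability of about 0.108 under the trustworthiness values but about 0.084 when the trust values are plugged into the same product, which illustrates how an inaccurate estimate distorts the probabilities the decision maker works with.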

Below, we introduce the Weighted Majority Voting (WMV) decision scheme. It can be treated as an extension of the more commonly known Majority Voting decision scheme. The difference is that Majority Voting treats all sources equally, while WMV assigns sources different weights. The weight of a source is usually determined by how trustworthy its feedback is. Formally:

Definition 0 (Weighted Majority Voting $\mathcal{D}_{W}$).

Given a set of $n$ sources $\mathcal{S}$, their trustworthiness $\bm{p}$ and independent feedback $\bm{f}$, $\mathcal{D}_{W}$ makes decisions via the function Muller et al. (2020); Nitzan and Paroush (1982):

\begin{equation}
\mathcal{D}_{W}(\bm{f})=\text{argmax}_{o\in\mathcal{O}}\left(\sum_{i:f_{i}=o}w_{i}\right) \tag{1}
\end{equation}

where $f_{i}\in\mathcal{O}$, $w_{i}=\log(\nicefrac{p_{i}}{1-p_{i}})$ with $p_{i}\geq 0.5$.

To give an instance, consider Example 1 and suppose $\bm{p}=(0.6,0.6,0.9)$, $\mathcal{O}=\{A,B\}$, $O=B$ and $\bm{\upsilon}=(1,1,-1)$, so that $\bm{f}=(B,B,A)$. With base-10 logarithms, $w_{1}\approx 0.18$, $w_{2}\approx 0.18$, $w_{3}\approx 0.95$. Since $w_{1}+w_{2}<w_{3}$, $\mathcal{D}_{W}(\bm{f})=A$.
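
The decision rule in Equation 1 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `wmv_decide` is hypothetical, base-10 logarithms are assumed because they reproduce the weight of about 0.18 quoted for $p_{i}=0.6$, and ties are broken in favour of the option seen first, which the text does not specify.

```python
import math
from collections import defaultdict


def wmv_decide(feedback, p):
    """Return the option with the largest total log-odds weight."""
    totals = defaultdict(float)
    for f_i, p_i in zip(feedback, p):
        # Weight of a source: log-odds of its probability of being correct
        # (base 10 is an assumption made to match the quoted weights).
        totals[f_i] += math.log10(p_i / (1.0 - p_i))
    # Ties go to the option encountered first (an arbitrary choice).
    return max(totals, key=totals.get)


p = (0.6, 0.6, 0.9)
weights = [round(math.log10(p_i / (1.0 - p_i)), 2) for p_i in p]
decision = wmv_decide(("B", "B", "A"), p)
print(weights, decision)
```

In this instance the single highly trustworthy source outweighs the two weaker sources combined, so the rule selects A even though B is the correct option.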

Here $p_{i}\geq 0.5$ and the log weight function are well-known for classical WMV in the literature Nitzan and Paroush (1982); Grofman et al. (1983), where trust and trustworthiness are not distinguished. The assumption $p_{i}\geq 0.5$ means that sources with $p_{i}<0.5$ are ignored. For a source with $p_{i}<0.5$, a decision maker may assign negative weight to its feedback, or simply reverse the vote of the source (e.g., replacing the reported option A with C). But if malicious sources become aware of either operation, they can push the decision to a wrong one by reporting correctly, purposely reducing the chance of the correct option being selected. Therefore, it is in the interest of the decision maker to ignore such sources.

It has been shown in the literature that the decision accuracy of WMV is determined by the indicator vectors where it always decides correctly (an example is where all sources report correctly). For such indicator vectors, whether a decision is correct is not influenced by the feedback of the sources that suggest incorrectly. As a counterexample, consider Example 1. Suppose $\mathcal{O}=\{A,B,C\}$, $\bm{p}=(0.70,0.65,0.65)$ and $\bm{\upsilon}=(1,-1,-1)$ (only $s_{1}$ reports correctly). We get $(w_{1},w_{2},w_{3})\approx(0.37,0.27,0.27)$. Both the feedback $\bm{f}=(A,B,C)$ and $\bm{f}^{\prime}=(A,C,C)$ are possible under $\bm{\upsilon}$ (both belong to $\mathcal{F}_{\bm{\upsilon}}$). However, WMV decides correctly by choosing $A$ under $\bm{f}$ and decides incorrectly by choosing $C$ under $\bm{f}^{\prime}$. Here, whether WMV decides correctly depends on what the incorrect feedback is.
Given the same $\bm{p}$, for $\bm{\upsilon}^{\prime}=(1,1,-1)$, it can be seen that whatever $s_{3}$ reports, WMV always decides correctly by trusting $s_{1},s_{2}$.
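
The distinction between indicator vectors where WMV always decides correctly and those where the outcome depends on the incorrect feedback can be checked by brute force. The sketch below is illustrative only: the helper names `wmv_decide` and `always_correct` are hypothetical, base-10 log weights are assumed because they match the weights quoted above, and the correct option is taken to be A.

```python
import math
from itertools import product


def wmv_decide(feedback, p):
    """Return the option with the largest total log-odds weight."""
    totals = {}
    for f_i, p_i in zip(feedback, p):
        totals[f_i] = totals.get(f_i, 0.0) + math.log10(p_i / (1.0 - p_i))
    return max(totals, key=totals.get)


def always_correct(options, correct, indicator, p):
    """Check whether WMV picks `correct` for every feedback vector in which
    sources with indicator 1 report `correct` and the rest report any
    wrong option."""
    wrong = [o for o in options if o != correct]
    choices = [[correct] if u == 1 else wrong for u in indicator]
    return all(wmv_decide(f, p) == correct for f in product(*choices))


options = ["A", "B", "C"]
p = (0.70, 0.65, 0.65)
print(always_correct(options, "A", (1, -1, -1), p))  # False
print(always_correct(options, "A", (1, 1, -1), p))   # True
```

The first call fails because the two incorrect sources can agree on the same wrong option and jointly outweigh the first source; the second succeeds because the two correct sources outweigh whatever the third source reports.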

Let $\mathbb{D}_{W}(\bm{p})$ denote the set of all the indicator vectors where WMV always decides correctly when using $\bm{p}$, namely $\mathbb{D}_{W}(\bm{p})=\{\bm{\upsilon}|\mathcal{D}_{W}(\bm{f})=O,\bm{f}\in\mathcal{F}_{\bm{\upsilon}}\}$. It has been proven that $\mathbb{D}_{W}(\bm{p})=\{\bm{\upsilon}|\mathbb{P}(\bm{\upsilon})\geq\mathbb{P}(-\bm{\upsilon})\}$ and the decision accuracy of $\mathcal{D}_{W}$ is (see Nitzan and Paroush (1982); Berend and Kontorovich (2015); Muller et al. (2020)):

\begin{align}
\mathbb{P}(\mathcal{D}_{W}(\bm{F})=O) &= \sum_{\bm{\upsilon}:\bm{\upsilon}\in\mathbb{D}_{W}(\bm{p})}\mathbb{P}(\bm{\upsilon}) \tag{2}\\
&= \sum_{\bm{\upsilon}:\mathbb{P}(\bm{\upsilon})\geq\mathbb{P}(-\bm{\upsilon})}\left(\prod_{i:\upsilon_{i}=1}p_{i}\cdot\prod_{i:\upsilon_{i}=-1}(1-p_{i})\right) \notag
\end{align}

Equation 2 indicates that the accuracy of WMV is determined by the probabilities of indicator vectors, which depend on the trustworthiness values of the sources.

In Example 1, if $\bm{p}=(0.6,0.6,0.9)$, then $\mathbb{D}_{W}(\bm{p})=\{(1,1,1),(-1,1,1),(1,-1,1),(-1,-1,1)\}$ and the decision accuracy is 0.9 (i.e., WMV decides correctly as long as source $s_{3}$ reports correctly).
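
Equation 2 can be evaluated by brute-force enumeration for small numbers of sources. The following sketch, with hypothetical function names, reproduces the set of indicator vectors and the accuracy stated for Example 1. Ties, where both a vector and its opposite satisfy the inequality, are not treated specially here.

```python
from itertools import product


def indicator_probability(p, upsilon):
    """Probability of an indicator vector under independent sources."""
    prob = 1.0
    for p_i, u_i in zip(p, upsilon):
        prob *= p_i if u_i == 1 else 1.0 - p_i
    return prob


def always_correct_vectors(p):
    """Indicator vectors at least as probable as their opposite."""
    vectors = []
    for upsilon in product((1, -1), repeat=len(p)):
        opposite = tuple(-u for u in upsilon)
        if indicator_probability(p, upsilon) >= indicator_probability(p, opposite):
            vectors.append(upsilon)
    return vectors


def wmv_accuracy(p):
    """Sum the probabilities of the vectors selected above (Equation 2)."""
    return sum(indicator_probability(p, u) for u in always_correct_vectors(p))


p = (0.6, 0.6, 0.9)
vectors = always_correct_vectors(p)
accuracy = wmv_accuracy(p)
print(vectors)
print(round(accuracy, 6))
```

The enumeration visits all $2^{n}$ indicator vectors, so it is only practical for a handful of sources; it is meant as a check on the worked example rather than as an efficient method.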

WMV has been proved to be optimal when trustworthiness $\bm{p}$ and the log weight function are used for decision making and the sources are independent in providing feedback Nitzan and Paroush (1982).

In practice, when trustworthiness is unknown, the weight assigned to each source depends on the trust, that is, $w_{i}=\log\bigl(\hat{p}_{i}/(1-\hat{p}_{i})\bigr)$. In addition, the decision maker computes the probabilities of indicator vectors from trust values; we use the subscripted notation $\mathbb{P}_{\hat{\bm{p}}}$ to distinguish these from the actual probabilities: $\mathbb{P}_{\hat{\bm{p}}}(\bm{\Upsilon}=\bm{\upsilon})=\prod_{i:\upsilon_{i}=1}\hat{p}_{i}\cdot\prod_{i:\upsilon_{i}=-1}(1-\hat{p}_{i})$.
With $\mathbb{P}(\bm{\upsilon})$ replaced by $\mathbb{P}_{\hat{\bm{p}}}(\bm{\upsilon})$, the decisions are correct exactly for those indicator vectors that the decision maker considers at least as probable as their opposites, namely $\mathbb{D}_{W}(\hat{\bm{p}})=\{\bm{\upsilon}\mid\mathbb{P}_{\hat{\bm{p}}}(\bm{\upsilon})\geq\mathbb{P}_{\hat{\bm{p}}}(-\bm{\upsilon})\}$. As a result, $\mathbb{D}_{W}(\hat{\bm{p}})$ and $\mathbb{D}_{W}(\bm{p})$ may differ. In Example 1, if $\bm{p}=(0.6,0.6,0.9)$ and $\hat{\bm{p}}=(0.8,0.6,0.8)$, then $(-1,-1,1)\in\mathbb{D}_{W}(\bm{p})$ while its opposite indicator vector $(1,1,-1)\in\mathbb{D}_{W}(\hat{\bm{p}})$. This may result in a different decision accuracy.
We introduce $\omega(\hat{\bm{p}},\bm{p})$ to make this distinction explicit:

\[
\mathbb{P}(\mathcal{D}_{W}(\bm{F})=O) \triangleq \omega(\hat{\bm{p}},\bm{p}) = \sum_{\bm{\upsilon}\in\mathbb{D}_{W}(\hat{\bm{p}})} \mathbb{P}(\bm{\upsilon}) \tag{3}
\]

The first parameter of the function $\omega(\cdot,\cdot)$ represents the value used for decision making, and the second parameter represents the value used to compute the probability of deciding correctly. For $\omega(\hat{\bm{p}},\bm{p})$, decisions are made using trust values $\hat{\bm{p}}$, while the decision accuracy that the decision maker actually obtains still depends on source trustworthiness, which challenges the optimality of WMV.

Generally, both parameters of $\omega(\cdot,\cdot)$ can be either trust or trustworthiness, and we assume that the parameter (either trust or trustworthiness) used for decision making is at least 0.5. Trust values are, by definition, known to the decision maker. It is therefore reasonable to apply this assumption to trust, which amounts to ignoring sources with trust below 0.5. For trustworthiness, we assume it is at least 0.5 only when it is used to decide (e.g., in Section 4.1); otherwise, its value ranges over $(0,1)$ (e.g., in Sections 4.2, 4.3 and 5).

Depending on whether each parameter is trust or trustworthiness, we obtain different notions of decision accuracy. The quantity $\omega(\bm{p},\bm{p})$ denotes the “ideal” decision accuracy, where the decision maker knows the trustworthiness values and uses them both to decide and to compute accuracy. The quantity $\omega(\hat{\bm{p}},\bm{p})$ denotes the “practical” decision accuracy, where the decision maker decides with the trust values $\hat{\bm{p}}$, but the accuracy he actually achieves depends on trustworthiness. The quantity $\omega(\hat{\bm{p}},\hat{\bm{p}})$ denotes the “perceived” decision accuracy that the decision maker thinks he can obtain (deciding and computing accuracy with trust), while the actual accuracy may not equal $\omega(\hat{\bm{p}},\hat{\bm{p}})$.
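The three quantities can be compared numerically on Example 1 with $\bm{p}=(0.6,0.6,0.9)$ and $\hat{\bm{p}}=(0.8,0.6,0.8)$. The sketch below is a direct brute-force reading of Equation 3; the helper names are illustrative, the parameter used for deciding is assumed to lie strictly inside $(0,1)$, and the example is chosen so that no ties occur.

```python
from itertools import product
from math import log, prod


def vector_prob(v, p):
    """Probability of indicator vector v (1 = correct, -1 = wrong) under p."""
    return prod(pi if vi == 1 else 1 - pi for vi, pi in zip(v, p))


def decision_set(p_dec):
    """D_W(p_dec): vectors judged at least as probable as their opposite."""
    w = [log(q / (1 - q)) for q in p_dec]
    return {v for v in product((1, -1), repeat=len(p_dec))
            if sum(vi * wi for vi, wi in zip(v, w)) >= 0}


def omega(p_dec, p_true):
    """Equation 3: decide with p_dec, measure accuracy with p_true."""
    return sum(vector_prob(v, p_true) for v in decision_set(p_dec))


p = (0.6, 0.6, 0.9)        # trustworthiness
p_hat = (0.8, 0.6, 0.8)    # trust

print(decision_set(p) - decision_set(p_hat))   # {(-1, -1, 1)}
print(round(omega(p, p), 3))                   # ideal:     0.9
print(round(omega(p_hat, p), 3))               # practical: 0.792
print(round(omega(p_hat, p_hat), 3))           # perceived: 0.832
```

In this instance the decision maker believes the accuracy is 0.832, actually obtains 0.792, and would obtain 0.9 if the trustworthiness values were known.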

4. Parameter Sensitivity

In this section, we analyze how changes in the values of trust and trustworthiness influence the decision accuracy, i.e., the correctness of WMV. We consider three cases: 1) how the decision accuracy changes when trustworthiness and trust change simultaneously; 2) how the decision accuracy changes with trustworthiness when trust remains constant; 3) how the decision accuracy changes with trust when trustworthiness remains constant. If the changes have relatively little effect on correctness, then we can say that WMV is not very sensitive to the parameters. Sensitivity relates to stability, and the analysis in this section provides several important insights for the analysis in the next section.

We also carry out numerical analysis based on the setting of the running Example 2 below to further illustrate the theory.

Example 2.

There are four sources $\mathcal{S}=\{s_{1},s_{2},s_{3},s_{4}\}$, and their trustworthiness values are $\bm{p}=(0.8,0.75,0.7,0.6)$, respectively.

4.1. Direct Sensitivity Analysis

Here, we analyze the case where the parameters used for making decisions and those used for computing accuracy are equal. There are two different rationales for doing this, but the mathematics is identical for both. First, the decision maker may be given the actual trustworthiness values to make decisions. Second, we may want to analyze the sensitivity of the beliefs of the decision maker: assume the decision maker only knows trust values, and uses them to compute their belief about how probable it is that a decision is correct.

For simplicity, we use trustworthiness everywhere, but the analysis remains unchanged when using trust instead (simply put a hat on every $p$ and $\bm{p}$). Observe that if the trustworthiness of only one source varies, then the decision accuracy is a piecewise linear, non-decreasing, convex function. In Figure 1(a), we depict Example 2, with each plot taking the trustworthiness of one source as the variable.

Lemma 2.

Let $f(p_{i})=\omega(\bm{p},\bm{p})$, where $p_{j}$ is constant for $j\neq i$. The function $f(p_{i})$ is a piecewise linear non-decreasing convex function.

Sketch of Proof.

For a fixed set $\mathbb{D}_{W}(\bm{p})$, the probability of a correct decision can be written as $p_{i}\cdot x+(1-p_{i})\cdot y$, where $x-y\geq 0$, and the coefficient $x-y$ increases as $p_{i}$ increases. Each set $\mathbb{D}_{W}(\bm{p})$ thus corresponds to a non-decreasing line segment. ∎

Generally, the more trustworthy a source is, the better the decision accuracy, and the faster it improves. Figure 1(a) illustrates Lemma 2. The accuracy of WMV is determined by comparing $\mathbb{P}(\bm{\upsilon})$ and $\mathbb{P}(-\bm{\upsilon})$ for all indicator vectors. The order between $\mathbb{P}(\bm{\upsilon})$ and $\mathbb{P}(-\bm{\upsilon})$ either stays the same or flips, depending on the value of $p_{i}$ and how much it changes. Intuitively, this results in the piecewise characteristic of $f(p_{i})$. Moreover, we can vary the trustworthiness values of multiple (or even all) sources. If we vary $k$ trustworthiness values, then we get a $k$-dimensional piecewise surface. In Figure 1(b), we depict our running example with $p_{1}$ and $p_{2}$ on the two axes, and $p_{3}$ and $p_{4}$ remaining constant. The surface looks like a collection of intersecting planes, but in fact each fragment is described by a polynomial rather than a linear function.
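The shape claimed for $f(p_{i})$ can be checked numerically on Example 2. The sketch below varies only $p_{1}$ on a uniform grid and tests monotonicity and convexity through first and second differences. It is a sanity check under stated assumptions rather than a proof: the helper names are illustrative, and the grid step of 0.01 is chosen so that no grid point coincides with a breakpoint where two opposite indicator vectors tie.

```python
from itertools import product
from math import log, prod


def vector_prob(v, p):
    """Probability of indicator vector v (1 = correct, -1 = wrong) under p."""
    return prod(pi if vi == 1 else 1 - pi for vi, pi in zip(v, p))


def omega(p_dec, p_true):
    """Decide with log-odds weights from p_dec, measure accuracy with p_true."""
    w = [log(q / (1 - q)) for q in p_dec]
    return sum(vector_prob(v, p_true)
               for v in product((1, -1), repeat=len(p_dec))
               if sum(vi * wi for vi, wi in zip(v, w)) >= 0)


def f_source1(x, rest=(0.75, 0.7, 0.6)):
    """f(p_1) = omega(p, p) with p_2, p_3, p_4 fixed as in Example 2."""
    p = (x,) + rest
    return omega(p, p)


grid = [k / 100 for k in range(50, 100)]
values = [f_source1(x) for x in grid]

# First differences >= 0 (non-decreasing), second differences >= 0 (convex).
non_decreasing = all(b >= a - 1e-12 for a, b in zip(values, values[1:]))
convex = all(values[k - 1] - 2 * values[k] + values[k + 1] >= -1e-9
             for k in range(1, len(values) - 1))
print(non_decreasing, convex)
print(round(f_source1(0.8), 3))   # accuracy at Example 2's own p_1: 0.845
```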

Figure 1. Sensitivity of $\mathcal{D}_{W}$ to $\bm{p}$ when $\bm{p}=\hat{\bm{p}}$: (a) one source $p_{i}$; (b) two sources $p_{1}$, $p_{2}$.

In Lemma 2, we assume that the trustworthiness values of all the other sources remain constant and that the sources are independent. However, sources can collude to influence decisions, or one can update the trust values of multiple sources at a time. For such situations, we assume the trustworthiness values of multiple sources are always equal, meaning they do not vary independently.

Lemma 3.

In the special case of identical sources, let $f(p)=\omega(\bm{p},\bm{p})$, where $p_{1}=\dots=p_{m}=p$, $m\leq n$, and $p_{j}$ is constant for $j>m$. The function $f(p)$ is a piecewise non-decreasing function, and it is concave (linear or strictly concave) in each segment.

Sketch of Proof.

The computation of correctness can be written as a sum: $\omega(\bm{p},\bm{p})=\sum_{i}g_{i}(p)$. Each $g_{i}(p)$ is piecewise non-decreasing and concave in each segment. Thus, the same properties hold for $\omega(\bm{p},\bm{p})$. ∎

Note that in Lemma 3, if $m=n$, meaning all the sources are identical, WMV becomes the classical Majority Voting decision rule and $f(p)$ becomes a concave monotonically increasing function Boland (1989).
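The reduction to Majority Voting for $m=n$ can be illustrated with a small check. The sketch below compares the brute-force accuracy $\omega(\bm{p},\bm{p})$ for $n=5$ identical sources with the binomial tail probability that more than half of the sources are correct, and tests concavity on a grid. The names are illustrative; an odd $n$ and $p>0.5$ are assumed so that no ties arise.

```python
from itertools import product
from math import comb, log, prod


def omega(p_dec, p_true):
    """Decide with log-odds weights from p_dec, measure accuracy with p_true."""
    w = [log(q / (1 - q)) for q in p_dec]
    return sum(prod(pi if vi == 1 else 1 - pi for vi, pi in zip(v, p_true))
               for v in product((1, -1), repeat=len(p_dec))
               if sum(vi * wi for vi, wi in zip(v, w)) >= 0)


def majority_accuracy(n, p):
    """Probability that more than n/2 of n independent sources are correct."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


n = 5
grid = [k / 20 for k in range(11, 20)]             # 0.55, 0.60, ..., 0.95
wmv = [omega((p,) * n, (p,) * n) for p in grid]
mv = [majority_accuracy(n, p) for p in grid]

same_rule = all(abs(a - b) < 1e-12 for a, b in zip(wmv, mv))
increasing = all(b > a for a, b in zip(wmv, wmv[1:]))
concave = all(wmv[k - 1] - 2 * wmv[k] + wmv[k + 1] <= 1e-12
              for k in range(1, len(wmv) - 1))
print(same_rule, increasing, concave)
```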

Figure 2(a) shows how the decision accuracy $\omega(\bm{p},\bm{p})$ changes with varying $p$ and $m$, where there are originally $2$ sources with the same trustworthiness $\mathit{rest\_p}=0.7$, and then $m$ identical sources with $p\geq 0.5$ join, so that $n=m+2$. For a given $m$, $f(p)$ increases piecewise with $p$, and within each segment it is concave and increasing. For a given $p$, $f(p)$ increases monotonically with $m$. In Figure 2(b), we fix $n=10$, $m=6$ and vary the trustworthiness $p$ of the $m$ identical sources and the trustworthiness $\mathit{rest\_p}$ of the remaining sources. This figure illustrates that, even though the remaining sources are in the minority, the higher $\mathit{rest\_p}$ is, the less sensitive the decision accuracy is to $p$. It also demonstrates that when the trustworthiness of multiple sources is updated in a particular way, the variation of the decision accuracy can be captured and described.

Figure 2. Accuracy of $\mathcal{D}_{W}$ with $m$ sources being identical: (a) $\mathit{rest\_p}=0.7$; (b) $n=10$.

4.2. Trustworthiness Sensitivity Analysis

Next, we analyze the cases where trustworthiness and trust are not identical. The decisions are made based on the trust values $\hat{\bm{p}}$, while the probability of each indicator vector is determined by $\bm{p}$. The probability of deciding correctly is $\omega(\hat{\bm{p}},\bm{p})$. In this section, trustworthiness varies while the trust values stay fixed. Recalling Equation 3, this means that decisions remain unchanged for given feedback (as the set $\mathbb{D}_{W}(\hat{\bm{p}})$ remains unchanged), while the decision accuracy $\omega(\hat{\bm{p}},\bm{p})$ may change with trustworthiness.

If only one parameter $p_{i}$ varies in $\omega(\hat{\bm{p}},\bm{p})$, then the resulting decision accuracy is a non-decreasing linear function, which follows trivially from the proof of Lemma 2. In fact, the line corresponds to one of the line segments of the piecewise linear graph from the previous section, as depicted in Figure 3(a). As before, we can also have multiple variables. The surface obtained is non-decreasing and polynomial, and corresponds to one of the fragments of the graph discussed in the previous section. A 2D example is depicted in Figure 3(b). The result shows that the decision accuracy is given by a single continuously differentiable function, rather than a piecewise function with different expressions on different segments.
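To make the single-line behaviour concrete, the sketch below fixes the trust values at Example 2's values (an assumption made only for illustration) and varies the true $p_{1}$. Because $\mathbb{D}_{W}(\hat{\bm{p}})$ is fixed, the accuracy has the form $p_{1}\cdot x+(1-p_{1})\cdot y$; for this configuration, enumerating the eight vectors in $\mathbb{D}_{W}(\hat{\bm{p}})$ by hand gives $x=0.925$ and $y=0.525$, i.e. slope $0.4$. Helper names are illustrative.

```python
from itertools import product
from math import log, prod


def omega(p_dec, p_true):
    """Decide with log-odds weights from p_dec, measure accuracy with p_true."""
    w = [log(q / (1 - q)) for q in p_dec]
    return sum(prod(pi if vi == 1 else 1 - pi for vi, pi in zip(v, p_true))
               for v in product((1, -1), repeat=len(p_dec))
               if sum(vi * wi for vi, wi in zip(v, w)) >= 0)


p_hat = (0.8, 0.75, 0.7, 0.6)   # trust, held fixed


def practical(p1):
    """omega(p_hat, p) when only the true trustworthiness of s_1 varies."""
    return omega(p_hat, (p1,) + p_hat[1:])


lo, mid, hi = practical(0.1), practical(0.5), practical(0.9)
slope = (hi - lo) / 0.8
print(round(slope, 6))                  # 0.4
print(abs(mid - (lo + hi) / 2) < 1e-12)  # True: midpoint lies on the line
```

Note that the true $p_{1}$ may range over the whole interval here, including values below 0.5, because only the trust values determine the weights.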

Figure 3. Sensitivity of $\bm{p}$ with $\hat{\bm{p}}$ fixed: (a) one source $p_{i}$; (b) two sources $p_{1}$, $p_{2}$.

4.3. Trust Sensitivity Analysis

Alternatively, we can take trust to be variable and trustworthiness to be fixed. In contrast to the previous case, the actual probability $\mathbb{P}(\bm{\upsilon})$ of each indicator vector now remains unchanged, but the decisions may change (as $\mathbb{P}_{\hat{\bm{p}}}(\bm{\upsilon})$ and, accordingly, $\mathbb{D}_{W}(\hat{\bm{p}})$ may change). We can then analyze what happens if a trust value used for decision making moves away from the actual trustworthiness in either direction.

First, consider the case where we vary the trust value of only one source in $\omega(\hat{\bm{p}},\bm{p})$. We get a unimodal discontinuous staircase function, which is non-decreasing when $\hat{p}_{i}<p_{i}$ and non-increasing when $\hat{p}_{i}>p_{i}$. Figure 4(a) depicts Example 2 with one variable. Second, when the trust values of multiple sources are variable, the resulting surface consists of flat fragments at different heights, with the height increasing with proximity to the point $\hat{\bm{p}}=\bm{p}$. Figure 4(b) depicts Example 2 with $\hat{p}_{1}$ and $\hat{p}_{2}$ being the variables. Generally:

Lemma 4.

Let $f(\hat{\bm{p}})=\omega(\hat{\bm{p}},\bm{p})$, where the trustworthiness $\bm{p}$ is constant. The function $f(\hat{\bm{p}})$ is a discontinuous staircase function consisting of flat plateaus. Decision accuracy reaches its maximum at the plateau containing the point $\hat{\bm{p}}=\bm{p}$.

Sketch of Proof.

The probability that a decision is correct depends on $\bm{p}$, which is constant. Changing $\hat{\bm{p}}$ does not affect the probability that a decision is correct until it reaches a point where it changes the actual decision away from the optimum. At that point, there is a discontinuous step to a new plateau. ∎

An insight is that nearby points are more likely to be on the same plateau. In other words, there is typically a region of trust values around the trustworthiness values within which the accuracy is unchanged, so small estimation errors are unlikely to affect the accuracy. However, it is possible that a certain trustworthiness $\bm{p}$ lies exactly on a border (or corner) of a plateau, meaning that even a tiny difference between trustworthiness and trust can lead to a step change in correctness. The positive news is that the plateaus directly bordering the one containing $\bm{p}$ are still more often correct than the ones further away.
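The plateau structure can be listed explicitly for Example 2 by sweeping $\hat{p}_{1}$ while keeping $\bm{p}$ and the other trust values fixed at the trustworthiness values. The sketch below is a brute-force illustration with illustrative helper names; the 0.01 grid is an assumption chosen so that no grid point falls exactly on a plateau border.

```python
from itertools import product
from math import log, prod


def omega(p_dec, p_true):
    """Decide with log-odds weights from p_dec, measure accuracy with p_true."""
    w = [log(q / (1 - q)) for q in p_dec]
    return sum(prod(pi if vi == 1 else 1 - pi for vi, pi in zip(v, p_true))
               for v in product((1, -1), repeat=len(p_dec))
               if sum(vi * wi for vi, wi in zip(v, w)) >= 0)


p = (0.8, 0.75, 0.7, 0.6)   # trustworthiness, held fixed

# Sweep the trust value of s_1 and record each new plateau height once.
heights = []
for k in range(51, 100):
    p_hat = (k / 100,) + p[1:]
    h = round(omega(p_hat, p), 9)
    if not heights or heights[-1] != h:
        heights.append(h)

print(heights)                                  # five plateau heights
print(max(heights) == round(omega(p, p), 9))    # True: best plateau holds p
```

For this sweep the heights rise to the plateau containing $\hat{p}_{1}=p_{1}=0.8$ and fall afterwards, which matches the unimodal staircase described above.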

Besides, while both underestimation and overestimation cause wrong judgments on $\mathbb{P}(\bm{\upsilon})$ vs. $\mathbb{P}(-\bm{\upsilon})$, the numerical results (Figure 4) suggest that overestimation may result in worse accuracy degradation than underestimation. Our intuition is that a source with a high $p$ value tends to have a lot of sway on the vote, so any inaccuracies will be noticeable, whereas a source with a low $p$ value tends to matter only in cases where the vote is tight, and thus any inaccuracies tend to matter less. From a micro perspective, the underlying reason might be that overestimating a trustworthiness value causes the difference $\left|\mathbb{P}(\bm{\upsilon})-\mathbb{P}(-\bm{\upsilon})\right|$ to be overestimated as well, while underestimating it causes the difference to be underestimated. Therefore, when estimation error is inevitable, it is better to underestimate trustworthiness.

Figure 4. Sensitivity of $\hat{\bm{p}}$ with $\bm{p}$ fixed: (a) one source $\hat{p}_{i}$; (b) two sources $\hat{p}_{1}$, $\hat{p}_{2}$.

5. Stability

The results of the Parameter Sensitivity section are unsurprising. Increasing trustworthiness typically increases correctness, and cannot decrease correctness. Hence, if a source is believed to be correct with probability $\hat{p}$, while its trustworthiness is $p<\hat{p}$, then the actual correctness achieved is lower than what the decision maker believes: $\omega(\hat{\bm{p}},\bm{p})<\omega(\hat{\bm{p}},\hat{\bm{p}})$. The reverse holds when $p>\hat{p}$. Our suspicion is that these two effects cancel each other out if the algorithm that establishes trust is not biased, on average, towards being overly trusting or overly suspicious. We call this property Stability of Correctness, and prove that it holds absolutely for WMV.

A better procedure for obtaining trust returns values closer to the trustworthiness values, with little variance, meaning that what is believed about the sources is close to the ground truth. When the procedure is unbiased, its quality does not affect the Stability of Correctness at all, which may initially seem counter-intuitive. However, another property, Stability of Optimality, captures the idea that even unbiased but poor trust values still result in worse performance of WMV. We prove that Stability of Optimality does not hold absolutely, but that the drop in performance is bounded.

Beforehand, we need to formally define what we mean by an algorithm or procedure to establish trust values, and by it being unbiased (on average).

5.1. Parameter Distributions

We introduce random variables for our parameters. For trustworthiness, $P_{i}$ is a random variable with outcome $p_{i}\in[0,1]$, and $\bm{P}$ is a joint random variable with outcome $\bm{p}=(p_{1},\dots,p_{n})$. Similarly, for trust, $\hat{P}_{i}$ is a random variable with outcome $\hat{p}_{i}\in[0,1]$, and $\bm{\hat{P}}$ is a joint random variable with outcome $\hat{\bm{p}}=(\hat{p}_{1},\dots,\hat{p}_{n})$. The uncertainty in source trustworthiness may be due to a lack of behavioral consistency or of experience, so that the sources cannot provide feedback of stable quality. On the other hand, inadequate interaction with sources or inaccurate modeling by the decision maker may lead to uncertain trust estimates.

Weighted Majority Voting requires a weight for each source, which is determined by $\hat{p}_{i}$ (the outcome of $\hat{P}_{i}$). Practical usage of WMV, therefore, requires some algorithm to arrive at values for $\hat{\bm{p}}$. Depending on the quality of the algorithm, there is a degree of correlation between trust and trustworthiness, i.e., between $\bm{\hat{P}}$ and $\bm{P}$. We consider the procedure that produces the trust values $\hat{\bm{p}}$ to be unbiased when the expectation of trustworthiness equals the trust value: $\mathbb{E}(\bm{P})=\hat{\bm{p}}$. Hence, if an unbiased trust value $\hat{p}_{i}$ is $0.7$, then the trustworthiness $P_{i}$ can sometimes be greater or smaller than $0.7$. Note that this is a reasonable assumption for various machine learning-based procedures, and for Bayesian learning in particular. In reality, we cannot guarantee that any machine learning method is completely free of such bias, but the unbiased case is interesting to study, and we expect any residual bias to be fairly small if the algorithm is configured using sufficient empirical data.

We extend our definition of $\omega$ to accept random variables as parameters. In that case, the output of $\omega$ is a distribution over accuracy. The expectation of this decision accuracy is:

\[
\mathbb{E}(\omega(\bm{\hat{P}},\bm{P})) = \sum_{\hat{\bm{p}},\bm{p}} \mathbb{P}(\bm{\hat{P}}=\hat{\bm{p}},\bm{P}=\bm{p})\,\omega(\hat{\bm{p}},\bm{p}) \tag{4}
\]

Besides, there can be an ideal situation where the decision maker “magically” knows the actual trustworthiness variable (i.e., $\bm{P}$), and can use it to make decisions. The expected probability of making a correct decision is:

\[
\mathbb{E}(\omega(\bm{P},\bm{P})) = \sum_{\bm{p}} \mathbb{P}(\bm{P}=\bm{p})\,\omega(\bm{p},\bm{p}) \tag{5}
\]

5.2. Stability of Correctness

In this section, we do not care what the distribution of $\bm{P}$ actually looks like, as long as $\mathbb{E}(\bm{P})=\hat{\bm{p}}$, meaning the trust values used for decision making are unbiased. The main result is that, in this case, the decision accuracy that the decision maker believes $\mathcal{D}_{W}$ achieves equals the probability that the decision is actually correct. This is an important positive result that supports the idea of using WMV in practice: decision makers are not delusional about the correctness of their decisions. Formally, we state the property of Stability of Correctness (SoC) as:

Theorem 5.

Stability of Correctness (SoC): For WMV, if $\hat{\bm{p}}=\mathbb{E}(\bm{P})$, then $\mathbb{E}(\omega(\hat{\bm{p}},\bm{P}))-\omega(\hat{\bm{p}},\hat{\bm{p}})=0$.

Sketch of Proof.

It follows from the fact, noted in Section 4.2, that the set of indicator vectors $\mathbb{D}_{W}(\hat{\bm{p}})$ on which decisions are correct remains unchanged when $\hat{\bm{p}}$ is unchanged, together with the fact that each $P_{i}$ is independently distributed. ∎
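The two facts used in the sketch combine as follows. This is a reconstruction from Equation 3, linearity of expectation, and the stated independence of the $P_{i}$, offered as a reading aid rather than as the authors' full proof.

```latex
\begin{align*}
\mathbb{E}\bigl(\omega(\hat{\bm{p}},\bm{P})\bigr)
  &= \sum_{\bm{\upsilon}\in\mathbb{D}_{W}(\hat{\bm{p}})}
     \mathbb{E}\Bigl(\prod_{i:\upsilon_{i}=1} P_{i}\cdot\prod_{i:\upsilon_{i}=-1}(1-P_{i})\Bigr)
     && \text{($\mathbb{D}_{W}(\hat{\bm{p}})$ does not depend on $\bm{P}$)} \\
  &= \sum_{\bm{\upsilon}\in\mathbb{D}_{W}(\hat{\bm{p}})}
     \prod_{i:\upsilon_{i}=1}\mathbb{E}(P_{i})\cdot\prod_{i:\upsilon_{i}=-1}\bigl(1-\mathbb{E}(P_{i})\bigr)
     && \text{(independence of the $P_{i}$)} \\
  &= \sum_{\bm{\upsilon}\in\mathbb{D}_{W}(\hat{\bm{p}})}
     \prod_{i:\upsilon_{i}=1}\hat{p}_{i}\cdot\prod_{i:\upsilon_{i}=-1}(1-\hat{p}_{i})
     && \text{(unbiasedness, $\mathbb{E}(\bm{P})=\hat{\bm{p}}$)} \\
  &= \omega(\hat{\bm{p}},\hat{\bm{p}}).
\end{align*}
```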

We show the results of two Monte Carlo simulations with $100{,}000$ runs over Example 2 to demonstrate the effect of distribution variance on the expected correctness of an unbiased estimate. In Figure 5(a), we depict $\omega(\hat{\bm{p}},\hat{\bm{p}})$ and $\mathbb{E}(\omega(\hat{\bm{p}},\bm{P}))$, where trustworthiness $\bm{P}$ follows a Beta distribution with expected value equal to the trust $\hat{\bm{p}}$ (unbiased) and a variance given on the $x$-axis. This figure shows that the variance of the trustworthiness $\bm{P}$ has no effect on the correctness on average, which confirms our theorem. In contrast, in Figure 5(b), we depict $\omega(\bm{p},\bm{p})$ and $\mathbb{E}(\omega(\bm{\hat{P}},\bm{p}))$, letting the trust be the random variable, distributed around trustworthiness with increasing variance. Unsurprisingly, this figure shows that a more dispersed trust distribution leads to lower average correctness $\mathbb{E}(\omega(\bm{\hat{P}},\bm{p}))$, since the trust is more likely to be far from the trustworthiness, which degrades accuracy. Furthermore, $\mathbb{E}(\omega(\bm{\hat{P}},\bm{p}))$ can never exceed $\omega(\bm{p},\bm{p})$, in line with the conclusion of Section 4.3. In the next section, we will study this case further.
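A reduced version of the first simulation can be sketched as follows. It draws each $P_{i}$ independently from a Beta distribution whose mean is the trust value, and compares the sample mean of $\omega(\hat{\bm{p}},\bm{P})$ with $\omega(\hat{\bm{p}},\hat{\bm{p}})$. The Beta parameterization by mean and variance, the run count of 20,000, the seed, and all helper names are assumptions of this sketch and not details taken from the paper.

```python
import random
from itertools import product
from math import log, prod


def vector_prob(v, p):
    """Probability of indicator vector v (1 = correct, -1 = wrong) under p."""
    return prod(pi if vi == 1 else 1 - pi for vi, pi in zip(v, p))


def decision_set(p_dec):
    """Vectors judged at least as probable as their opposite under p_dec."""
    w = [log(q / (1 - q)) for q in p_dec]
    return [v for v in product((1, -1), repeat=len(p_dec))
            if sum(vi * wi for vi, wi in zip(v, w)) >= 0]


def beta_sample(rng, mean, var):
    """Draw from Beta with the given mean and variance (var < mean*(1-mean))."""
    k = mean * (1 - mean) / var - 1
    return rng.betavariate(mean * k, (1 - mean) * k)


def expected_practical_accuracy(p_hat, var, runs, seed=0):
    """Monte Carlo estimate of E(omega(p_hat, P)) with E(P) = p_hat."""
    rng = random.Random(seed)
    d = decision_set(p_hat)              # fixed, since p_hat is fixed
    total = 0.0
    for _ in range(runs):
        p = [beta_sample(rng, m, var) for m in p_hat]
        total += sum(vector_prob(v, p) for v in d)
    return total / runs


p_hat = (0.8, 0.75, 0.7, 0.6)
perceived = sum(vector_prob(v, p_hat) for v in decision_set(p_hat))
for var in (0.005, 0.02):
    estimate = expected_practical_accuracy(p_hat, var, runs=20_000)
    print(var, round(perceived, 3), round(estimate, 3))
```

Both estimates should agree with the perceived accuracy up to sampling noise, regardless of the variance used.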

Figure 5. Effect of variance on Stability of Correctness. (a) $\hat{\bm{p}}$ fixed, $\bm{P}$ probabilistic. (b) $\bm{p}$ fixed, $\hat{\bm{P}}$ probabilistic.
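The experiment behind Figure 5(a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes log-odds weights $w_i=\log(\hat{p}_i/(1-\hat{p}_i))$, fair-coin tie-breaking, a hypothetical three-source trust vector standing in for Example 2, and fewer runs than reported above. The names `omega` and `beta_sample` are chosen here for illustration.

```python
import itertools
import math
import random


def omega(trust, trustworthiness):
    """Exact accuracy of WMV weighted by `trust` when source i is
    actually correct with probability `trustworthiness[i]`."""
    # Assumption: log-odds weights; trust values must lie strictly in (0, 1).
    weights = [math.log(t / (1 - t)) for t in trust]
    total = 0.0
    # votes[i] == 1 means source i reports the true state.
    for votes in itertools.product((1, 0), repeat=len(trust)):
        prob, score = 1.0, 0.0
        for w, p, v in zip(weights, trustworthiness, votes):
            prob *= p if v else 1 - p
            score += w if v else -w
        if score > 1e-12:
            total += prob
        elif score >= -1e-12:
            total += 0.5 * prob  # assumption: ties are broken by a fair coin
    return total


def beta_sample(rng, mean, var):
    """Draw from the Beta distribution with the given mean and variance
    (requires var < mean * (1 - mean))."""
    k = mean * (1 - mean) / var - 1
    return rng.betavariate(mean * k, (1 - mean) * k)


rng = random.Random(0)
p_hat = (0.75, 0.70, 0.65)  # hypothetical trust values, not Example 2
believed = omega(p_hat, p_hat)
runs = 20_000
actual = {}
for var in (0.001, 0.004, 0.008):
    samples = (omega(p_hat, [beta_sample(rng, m, var) for m in p_hat])
               for _ in range(runs))
    actual[var] = sum(samples) / runs
    print(f"variance={var}: believed={believed:.4f}, actual={actual[var]:.4f}")
```

Because `omega(trust, .)` is linear in each coordinate of its second argument, the averaged accuracy should match the believed accuracy up to sampling noise at every variance level.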

5.3. Stability of Optimality

Although the shape (and variance) of the trustworthiness distribution is irrelevant for Stability of Correctness, intuitively a distribution with less variance should still be better for the decision maker. To capture this idea, we introduce another stability property in this section: Stability of Optimality (SoO). It asks whether decisions made with only the trust values available are as good as those made with the trustworthiness values revealed. We formally capture this gap with the definition below:

\begin{equation}
SoO(\bm{P})=\mathbb{E}(\omega(\bm{P},\bm{P}))-\mathbb{E}(\omega(\hat{\bm{p}},\bm{P})) \tag{6}
\end{equation}

In other words, it measures how much the decision accuracy could be improved if the trustworthiness values were available, compared with deciding on trust alone. Note that, when $\hat{\bm{p}}=\mathbb{E}(\bm{P})$, an equivalent formulation (via Theorem 5) is $\mathbb{E}(\omega(\bm{P},\bm{P}))-\omega(\hat{\bm{p}},\hat{\bm{p}})$.

To analyze Stability of Optimality formally, we introduce some definitions. Assume the trustworthiness $p_{i}$ is bounded in some range, i.e. $\forall i,\ a_{i}\leq p_{i}\leq b_{i}$. Denote the value space of $\bm{p}$ as the Hypercube $\mathbb{H}$, so that $\bm{p}\in\mathbb{H}$. The set of vertices of the Hypercube is denoted as the Vertex Space $\mathbb{Q}$, where for each vertex $\bm{q}\in\mathbb{Q}$, $q_{i}$ is either $a_{i}$ or $b_{i}$. Within the hypercube, the distribution of $\bm{P}$ with expectation $\hat{\bm{p}}$ can be arbitrary. We call the distribution of $\bm{P}$ that is supported on the vertex space $\mathbb{Q}$, with $\mathbb{P}(P_{i}=a_{i})=\frac{b_{i}-\hat{p}_{i}}{b_{i}-a_{i}}$ and $\mathbb{P}(P_{i}=b_{i})=\frac{\hat{p}_{i}-a_{i}}{b_{i}-a_{i}}$, the extreme distribution.
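As a quick check of this definition (a worked computation added for illustration, using only the symbols defined above), the extreme distribution keeps the required expectation $\hat{p}_{i}$ while placing all probability mass on the two endpoints:

```latex
\begin{align*}
\mathbb{E}(P_i)
  &= a_i\cdot\frac{b_i-\hat{p}_i}{b_i-a_i} + b_i\cdot\frac{\hat{p}_i-a_i}{b_i-a_i}
   = \frac{\hat{p}_i\,(b_i-a_i)}{b_i-a_i}
   = \hat{p}_i, \\
\mathrm{Var}(P_i)
  &= (a_i-\hat{p}_i)^2\cdot\frac{b_i-\hat{p}_i}{b_i-a_i}
   + (b_i-\hat{p}_i)^2\cdot\frac{\hat{p}_i-a_i}{b_i-a_i}
   = (\hat{p}_i-a_i)(b_i-\hat{p}_i).
\end{align*}
```

The variance $(\hat{p}_{i}-a_{i})(b_{i}-\hat{p}_{i})$ is the largest that any distribution on $[a_{i},b_{i}]$ with mean $\hat{p}_{i}$ can have, which is why the distribution is called extreme.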

When trustworthiness is revealed for decision-making, we observe that a high variance in trustworthiness is good for accuracy, and the extreme distribution especially so. That is, when a source is more trustworthy than average, increasing its weight enhances overall decision accuracy. Conversely, when a source is less trustworthy than average, it can degrade decision quality to some extent, but the impact is mitigated by reducing the weight of this source. In other words, it is better to have a source with a $50\%$ chance of $P=0.9$ and a $50\%$ chance of $P=0.5$ than a source with $p=0.7$. Formally,

Lemma 6.

Take random variables $\bm{P}$ defined in a Hypercube with $\mathbb{E}(\bm{P})=\hat{\bm{p}}$. The expected correctness $\mathbb{E}(\omega(\bm{P},\bm{P}))$ is bounded above by its value under the extreme distribution:

\begin{equation}
\mathbb{E}(\omega(\bm{P},\bm{P}))\leq\sum_{\bm{q}\in\mathbb{Q}}\left(\omega(\bm{q},\bm{q})\prod_{i=1}^{n}\mathbb{P}(P_{i}=q_{i})\right) \tag{7}
\end{equation}
Sketch of Proof.

Per Lemma 2, $\omega(\bm{p},\bm{p})$ is convex in any single dimension, so the extreme distribution maximizes $\mathbb{E}(\omega(\bm{P},\bm{P}))$ in that dimension. The lemma follows by independence of the trustworthiness variables. ∎
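The intuition behind the lemma (a source that is $0.9$ or $0.5$ with equal chance versus a source fixed at $0.7$) can be checked numerically. The sketch below is illustrative only: it assumes log-odds weights, fair-coin tie-breaking, and two additional hypothetical sources with trustworthiness $0.7$; `omega` is a helper name chosen here.

```python
import itertools
import math


def omega(trust, trustworthiness):
    """Exact accuracy of WMV weighted by `trust` when source i is
    actually correct with probability `trustworthiness[i]`."""
    # Assumption: log-odds weights; trust values must lie strictly in (0, 1).
    weights = [math.log(t / (1 - t)) for t in trust]
    total = 0.0
    for votes in itertools.product((1, 0), repeat=len(trust)):
        prob, score = 1.0, 0.0
        for w, p, v in zip(weights, trustworthiness, votes):
            prob *= p if v else 1 - p
            score += w if v else -w
        if score > 1e-12:
            total += prob
        elif score >= -1e-12:
            total += 0.5 * prob  # assumption: ties are broken by a fair coin
    return total


others = (0.7, 0.7)  # hypothetical remaining sources

# Source 1 fixed at 0.7, trustworthiness known to the decision maker.
fixed = omega((0.7, *others), (0.7, *others))

# Source 1 is 0.9 or 0.5 with equal probability, and the realized value is
# revealed, so the weights adapt to it.
mixed = 0.5 * omega((0.9, *others), (0.9, *others)) \
      + 0.5 * omega((0.5, *others), (0.5, *others))

print(f"fixed 0.7: {fixed:.3f}, 50/50 mix of 0.9 and 0.5: {mixed:.3f}")
```

Under these assumptions the fixed source gives an accuracy of about $0.784$ and the mixture about $0.8$: when the source turns out to be $0.9$ it dominates the vote, and when it turns out to be $0.5$ its weight drops to zero and the other two sources decide.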

Lemma 6 demonstrates that the decision accuracy is bounded (not always 100%), even in the ideal situation where trustworthiness is given, and it is determined by the distribution of trustworthiness. This is intuitive as more trustworthy sources should lead to better decisions. Further, if trustworthiness is a constant rather than a random variable, Lemma 6 still holds. That is:

Corollary 7.

For any point $\bm{p}$ in the Hypercube, $\omega(\bm{p},\bm{p})$ is bounded by a linear combination of the correctness of the vertices of the hypercube.

\begin{equation}
\omega(\bm{p},\bm{p})\leq\frac{1}{\prod_{i=1}^{n}(b_{i}-a_{i})}\sum_{\bm{q}\in\mathbb{Q}}\omega(\bm{q},\bm{q})\prod_{i:q_{i}=a_{i}}(b_{i}-p_{i})\prod_{i:q_{i}=b_{i}}(p_{i}-a_{i}) \tag{8}
\end{equation}
Proof.

Let $\mathbb{P}(\bm{P}=\bm{p})=1$ in Lemma 6. ∎
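The vertex bound of the corollary can be evaluated directly for small $n$. The following sketch is a numerical illustration under assumptions, not part of the proof: log-odds weights, fair-coin tie-breaking, and a hypothetical hypercube $[0.6,0.9]^{3}$ with an interior point; `omega` and `vertex_bound` are names introduced here.

```python
import itertools
import math


def omega(trust, trustworthiness):
    """Exact accuracy of WMV weighted by `trust` when source i is
    actually correct with probability `trustworthiness[i]`."""
    # Assumption: log-odds weights; trust values must lie strictly in (0, 1).
    weights = [math.log(t / (1 - t)) for t in trust]
    total = 0.0
    for votes in itertools.product((1, 0), repeat=len(trust)):
        prob, score = 1.0, 0.0
        for w, p, v in zip(weights, trustworthiness, votes):
            prob *= p if v else 1 - p
            score += w if v else -w
        if score > 1e-12:
            total += prob
        elif score >= -1e-12:
            total += 0.5 * prob  # assumption: ties are broken by a fair coin
    return total


def vertex_bound(p, a, b):
    """Right-hand side of the corollary: a weighted sum of the
    correctness at the hypercube vertices."""
    total = 0.0
    for upper in itertools.product((False, True), repeat=len(p)):
        q = [b[i] if u else a[i] for i, u in enumerate(upper)]
        coeff = 1.0
        for i, u in enumerate(upper):
            coeff *= ((p[i] - a[i]) if u else (b[i] - p[i])) / (b[i] - a[i])
        total += coeff * omega(q, q)
    return total


a = (0.6, 0.6, 0.6)    # hypothetical lower bounds
b = (0.9, 0.9, 0.9)    # hypothetical upper bounds
p = (0.8, 0.7, 0.65)   # hypothetical point inside the hypercube
lhs, rhs = omega(p, p), vertex_bound(p, a, b)
print(f"omega(p, p) = {lhs:.4f} <= vertex bound = {rhs:.4f}")
```

The coefficients in `vertex_bound` are non-negative and sum to one, so the bound is a convex combination of the vertex correctness values.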

Figure 6. Effect of Variance on Stability of Optimality. (a) $\hat{p}$ fixed, $P$ probabilistic. (b) Example of Beta Distribution.

Stability of Optimality does not strictly hold, as the gap (Equation 6) is typically non-zero. We prove upper bounds on the gap, which go to $0$ as the distribution of trustworthiness becomes tighter. Let $\delta$ quantify the size (half-width) of the support of the distribution.

Theorem 8.

Stability of Optimality: If $\hat{\bm{p}}=\mathbb{E}(\bm{P})$ and all $P_{i}$ have support $[\hat{p}_{i}-\delta_{i},\hat{p}_{i}+\delta_{i}]$, then

\begin{equation}
SoO(\bm{P})\leq\left(1-\omega(\hat{\bm{p}},\hat{\bm{p}})\right)\cdot\left(1-\prod_{i=1}^{n}\left(1-\frac{1}{2}\cdot\frac{\delta_{i}}{1-\hat{p}_{i}}\right)\right) \tag{9}
\end{equation}

A weaker but more intuitive bound is also derived using the Bernoulli Inequality,

\begin{equation}
SoO(\bm{P})\leq\frac{1-\omega(\hat{\bm{p}},\hat{\bm{p}})}{2}\sum_{i=1}^{n}\frac{\delta_{i}}{1-\hat{p}_{i}} \tag{10}
\end{equation}
Sketch of Proof.

Via Lemma 6, we know that the extreme distribution maximizes $\mathbb{E}(\omega(\bm{P},\bm{P}))$, with a maximum that depends on the correctness of the vertices. Via Corollary 7, upper bounds for the correctness of the vertices can be obtained that rely only on $\omega(\hat{\bm{p}},\hat{\bm{p}})$. With some algebra, both bounds (9) and (10) can be obtained. ∎
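The step from bound (9) to the weaker bound (10) can be spelled out as follows. This is a sketch of the Bernoulli-type argument mentioned above, with the shorthand $x_{i}$ introduced here for convenience:

```latex
\begin{align*}
x_i &:= \frac{1}{2}\cdot\frac{\delta_i}{1-\hat{p}_i}\in\left[0,\tfrac{1}{2}\right]
  \quad\text{since } \hat{p}_i+\delta_i\leq 1, \\
\prod_{i=1}^{n}(1-x_i) &\geq 1-\sum_{i=1}^{n}x_i
  \quad\text{(generalized Bernoulli inequality for } x_i\in[0,1]\text{)}, \\
1-\prod_{i=1}^{n}(1-x_i) &\leq \sum_{i=1}^{n}x_i
  = \frac{1}{2}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat{p}_i}.
\end{align*}
```

Multiplying both sides of the last line by $1-\omega(\hat{\bm{p}},\hat{\bm{p}})\geq 0$ turns the right-hand side of (9) into that of (10).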

While there is a gap between making decisions based on unbiased trust and based on trustworthiness, Theorem 8 proves that this gap is bounded by a relatively small threshold, implying that relying on unbiased trust does not reduce the decision quality much. The upper bound depends on the distribution of trustworthiness and converges to zero as the spread of that distribution shrinks.
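To give a feel for the magnitudes involved, the sketch below evaluates bounds (9) and (10) and compares them with the gap under the extreme distribution on $[\hat{p}_{i}-\delta_{i},\hat{p}_{i}+\delta_{i}]$. It is a minimal illustration under assumptions: log-odds weights, fair-coin tie-breaking, and hypothetical values for $\hat{\bm{p}}$ and $\delta$; the function names are chosen here.

```python
import itertools
import math


def omega(trust, trustworthiness):
    """Exact accuracy of WMV weighted by `trust` when source i is
    actually correct with probability `trustworthiness[i]`."""
    # Assumption: log-odds weights; trust values must lie strictly in (0, 1).
    weights = [math.log(t / (1 - t)) for t in trust]
    total = 0.0
    for votes in itertools.product((1, 0), repeat=len(trust)):
        prob, score = 1.0, 0.0
        for w, p, v in zip(weights, trustworthiness, votes):
            prob *= p if v else 1 - p
            score += w if v else -w
        if score > 1e-12:
            total += prob
        elif score >= -1e-12:
            total += 0.5 * prob  # assumption: ties are broken by a fair coin
    return total


def soo_bounds(p_hat, delta):
    """Return the right-hand sides of bounds (9) and (10)."""
    slack = 1 - omega(p_hat, p_hat)
    ratios = [d / (2 * (1 - p)) for p, d in zip(p_hat, delta)]
    tight = slack * (1 - math.prod(1 - r for r in ratios))
    loose = slack * sum(ratios)
    return tight, loose


def soo_extreme(p_hat, delta):
    """SoO when each P_i equals p_hat_i - delta_i or p_hat_i + delta_i
    with probability 1/2 (the extreme distribution for this support)."""
    n = len(p_hat)
    expected = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        q = [p + s * d for p, s, d in zip(p_hat, signs, delta)]
        expected += omega(q, q) / 2 ** n
    # For unbiased trust, E(omega(p_hat, P)) equals omega(p_hat, p_hat).
    return expected - omega(p_hat, p_hat)


p_hat = (0.9, 0.7, 0.7)      # hypothetical trust values
delta = (0.05, 0.05, 0.05)   # hypothetical support half-widths
gap = soo_extreme(p_hat, delta)
tight, loose = soo_bounds(p_hat, delta)
print(f"SoO (extreme) = {gap:.4f}, bound (9) = {tight:.4f}, bound (10) = {loose:.4f}")
```

The extreme distribution is used because, by Lemma 6, it is the case in which the gap is largest for a given support.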

To illustrate the effect of distribution variance on $SoO(\bm{P})$, we provide a Monte Carlo simulation with $100{,}000$ runs over Example 2. In Figure 6(a), we measure $\mathbb{E}(\omega(\bm{P},\bm{P}))$ and $\mathbb{E}(\omega(\hat{\bm{p}},\bm{P}))$, where $\hat{\bm{p}}$ is constant and $\bm{P}$ follows a Beta distribution with increasing variance. It shows that the larger the variance is, the larger $SoO(\bm{P})$ is, which validates the result of Lemma 6. To put the magnitude of the variance in context, Figure 6(b) provides examples of Beta-distributed trustworthiness with particular variances.
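A reduced version of this simulation might look as follows. It is a sketch rather than the authors' setup: it assumes log-odds weights, fair-coin tie-breaking, hypothetical trust values instead of Example 2, fewer runs, and samples clipped away from $0$ and $1$ so that the weights stay finite.

```python
import itertools
import math
import random


def omega(trust, trustworthiness):
    """Exact accuracy of WMV weighted by `trust` when source i is
    actually correct with probability `trustworthiness[i]`."""
    # Assumption: log-odds weights; trust values must lie strictly in (0, 1).
    weights = [math.log(t / (1 - t)) for t in trust]
    total = 0.0
    for votes in itertools.product((1, 0), repeat=len(trust)):
        prob, score = 1.0, 0.0
        for w, p, v in zip(weights, trustworthiness, votes):
            prob *= p if v else 1 - p
            score += w if v else -w
        if score > 1e-12:
            total += prob
        elif score >= -1e-12:
            total += 0.5 * prob  # assumption: ties are broken by a fair coin
    return total


def beta_sample(rng, mean, var):
    """Beta draw with the given mean and variance, clipped to (0, 1)."""
    k = mean * (1 - mean) / var - 1
    x = rng.betavariate(mean * k, (1 - mean) * k)
    return min(max(x, 1e-9), 1 - 1e-9)


def estimate_soo(rng, p_hat, var, runs=20_000):
    """Monte Carlo estimate of E(omega(P, P)) - E(omega(p_hat, P))."""
    gap = 0.0
    for _ in range(runs):
        p = [beta_sample(rng, m, var) for m in p_hat]
        gap += omega(p, p) - omega(p_hat, p)
    return gap / runs


rng = random.Random(0)
p_hat = (0.75, 0.70, 0.65)  # hypothetical trust values, not Example 2
soo_low = estimate_soo(rng, p_hat, var=0.001)
soo_high = estimate_soo(rng, p_hat, var=0.008)
print(f"SoO at variance 0.001: {soo_low:.4f}; at variance 0.008: {soo_high:.4f}")
```

Each sampled gap is non-negative because weighting by the realized trustworthiness is never worse than weighting by the trust, so only its size changes with the variance.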

Figure 7. Parameter Analysis on Stability of Optimality. (a) Effect of $\delta$. (b) Effect of $n$. (c) Effect of single $\hat{p}_{1}$. (d) Effect of identical $\hat{\bm{p}}$.

In Figure 7, we provide a parameter analysis with numerical experiments to demonstrate how the parameters influence $SoO(\bm{P})$ and the bounds. Example 2 is used in Figures 7(a) and 7(c); Figures 7(b) and 7(d) use an adapted setting in which the sources have identical $\hat{p}=0.7$. By default, $\delta=0.05$ for all sources.

Figure 7(a) shows that as $\delta$ decreases, $SoO(\bm{P})$ and its bounds also decrease to zero. There is a linear bound on the effect of $\delta$ (via the weaker bound). This implies that the uncertainty level of sources plays a significant role in determining the Stability of Optimality. In Figure 7(b), as the number $n$ of identical sources ($\hat{p}=0.7$) increases, $SoO(\bm{P})$ and the bounds change little, which suggests that the number of sources has little influence on the accuracy gap $SoO(\bm{P})$ (i.e., the gap in accuracy between decision making using unbiased trust and using trustworthiness). This means that the number of sources may barely influence how valuable it is to know the sources’ combined trustworthiness.
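For identical sources, bound (9) is cheap to tabulate against $n$. The sketch below evaluates only the bound, not $SoO(\bm{P})$ itself, and assumes odd $n$ (so that WMV with equal weights is simple majority without ties) together with the defaults $\hat{p}=0.7$ and $\delta=0.05$ stated above; the function names are chosen here.

```python
import math


def majority_accuracy(n, p):
    """Accuracy of simple majority over n identical independent sources
    (n odd), i.e. WMV with equal weights."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


def bound9_identical(n, p_hat=0.7, delta=0.05):
    """Right-hand side of bound (9) when all sources share p_hat and delta."""
    ratio = delta / (2 * (1 - p_hat))
    return (1 - majority_accuracy(n, p_hat)) * (1 - (1 - ratio) ** n)


bounds = {n: bound9_identical(n) for n in range(1, 22, 2)}
for n, value in bounds.items():
    print(f"n={n:2d}: bound (9) = {value:.4f}")
```

The two factors of the bound move in opposite directions as $n$ grows: $1-\omega(\hat{\bm{p}},\hat{\bm{p}})$ shrinks while $1-\prod_{i}(1-x_{i})$ grows, which is consistent with the bound staying within a narrow range.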

In Figure 7(c), only $\hat{p}_{1}$ varies; $SoO(\bm{P})$ remains low throughout and is a piecewise function with local maxima. As studied in Section 4.1, the local maxima result from the piecewise nature of the effect of trustworthiness on decision accuracy. In Figure 7(d), where all $\hat{p}$ are equal and increase to $1$ simultaneously, $SoO(\bm{P})$ decreases almost to $0$. This makes sense because, as the trustworthiness increases, the decision accuracy increases more slowly due to the concavity of WMV with identical sources (see Lemma 3).

To conclude, the Stability of Optimality gap $SoO(\bm{P})$ is somewhat sensitive to the parameter $\delta$, which describes the range and variance of the sources' trustworthiness, but not sensitive to the number of sources and other parameters. Overall, the optimality of WMV has a high degree of stability, meaning $SoO(\bm{P})$ tends to be close to $0$.

6. Conclusion and Future Work

The common dependence on an estimate (trust) of source trustworthiness raises the need to analyze whether WMV is stable, meaning that the decision inaccuracy remains tolerable when the difference between trust and trustworthiness is bounded.

We first analyzed how sensitive WMV is to changes in trust and trustworthiness. We found that a small deviation between trust and trustworthiness does not affect accuracy, and that underestimation usually harms less than overestimation. We then introduced two statistical properties of WMV, Stability of Correctness and Stability of Optimality. Assuming that on average the estimation procedure has no bias towards over- or underestimation, we proved that Stability of Correctness holds absolutely, regardless of which estimation procedure is used or how well it estimates. This guarantees that relying on an unbiased estimate of source trustworthiness, which is common in practice, is safe. Moreover, the amount of inefficiency introduced by relying on an estimate instead of the trustworthiness itself is limited, as we prove a linear bound on Stability of Optimality. The proposed formal framework and the two types of stability properties can be generalized to analyze other types of decision mechanisms or scenarios (e.g., where sources are dependent).

For future work, beyond the boundedness assumption, it would be valuable to explore a more precise characterization of the impact of the trustworthiness distribution on SoO in the unbiased setting. It is also worth studying the stability of WMV in a more general case, namely when trust is a biased estimate of trustworthiness. Some researchers have found that some sources, although assigned weights, have no influence on the decision result Allouche et al. (2021); Bowen (2009). In other words, more estimation error may be tolerated on such sources.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62106223 and Grant 62293511.

References

  • Allouche et al. (2021) Tahar Allouche, Bruno Escoffier, Stefano Moretti, and Meltem Öztürk. 2021. Social ranking manipulability for the cp-majority, Banzhaf and lexicographic excellence solutions. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 17–23.
  • Bentahar et al. (2022) Jamal Bentahar, Nagat Drawel, and Abdeladim Sadiki. 2022. Quantitative group trust: A two-stage verification approach. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. 100–108.
  • Berend and Kontorovich (2013) Daniel Berend and Aryeh Kontorovich. 2013. A sharp estimate of the binomial mean absolute deviation with applications. Statistics & Probability Letters 83, 4 (2013), 1254–1259. https://doi.org/10.1016/j.spl.2013.01.023
  • Berend and Kontorovich (2015) Daniel Berend and Aryeh Kontorovich. 2015. A finite sample analysis of the Naive Bayes classifier. J. Mach. Learn. Res. 16, 1 (2015), 1519–1545.
  • Boland (1989) Philip J. Boland. 1989. Majority Systems and the Condorcet Jury Theorem. Journal of the Royal Statistical Society: Series D (The Statistician) 38, 3 (1989), 181–189.
  • Bowen (2009) Larry Bowen. 2009. Weighted voting systems.
  • Carbo and Molina (2023) Javier Carbo and Jose M Molina. 2023. Promoting cooperation of agents through aggregation of services in trust models. Knowledge-Based Systems 277 (2023), 110804.
  • Condorcet (1785) marquis de Condorcet, Jean-Antoine-Nicolas de Caritat. 1785. Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie royale. 1743–1794 pages.
  • Dawid and Skene (1979) Alexander Philip Dawid and Allan M Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics) 28, 1 (1979), 20–28.
  • Dong et al. (2015) Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, and Wei Zhang. 2015. Knowledge-based trust: estimating the trustworthiness of web sources. Proceedings of the VLDB Endowment 8, 9 (2015), 938–949.
  • Drawel et al. (2022) Nagat Drawel, Jamal Bentahar, Amine Laarej, and Gaith Rjoub. 2022. Formal verification of group and propagated trust in multi-agent systems. Autonomous Agents and Multi-Agent Systems 36, 1 (2022), 19.
  • Drawel et al. (2020) Nagat Drawel, Hongyang Qu, Jamal Bentahar, and Elhadi Shakshuki. 2020. Specification and automatic verification of trust-based multi-agent systems. Future Generation Computer Systems 107 (2020), 1047–1060.
  • Freedman (1963) David A Freedman. 1963. On the asymptotic behavior of Bayes’ estimates in the discrete case. The Annals of Mathematical Statistics 34, 4 (1963), 1386–1403.
  • Gao et al. (2016) Chao Gao, Yu Lu, and Dengyong Zhou. 2016. Exact exponent in optimal rates for crowdsourcing. In International Conference on Machine Learning. PMLR, 603–611.
  • Ge et al. (2023) Yan Ge, Jun Ma, Li Zhang, ** Lu. 2023. Trustworthiness-aware knowledge graph representation for recommendation. Knowledge-Based Systems 278 (2023), 110865.
  • Germain et al. (2015) Pascal Germain, Alexandre Lacasse, Francois Laviolette, Mario March, and Jean-Francis Roy. 2015. Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm. Journal of Machine Learning Research 16, 26 (2015), 787–860.
  • Grofman et al. (1983) Bernard Grofman, Guillermo Owen, and Scott L Feld. 1983. Thirteen theorems in search of the truth. Theory and decision 15, 3 (1983), 261–278.
  • Guan et al. (2018) Melody Guan, Varun Gulshan, Andrew Dai, and Geoffrey Hinton. 2018. Who Said What: Modeling Individual Labelers Improves Classification. Proceedings of the AAAI Conference on Artificial Intelligence 32, 1 (Apr. 2018).
  • Guo (2023) Zhaori Guo. 2023. Multi-Advisor Dynamic Decision Making. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems. 2949–2951.
  • James (2020) JQ James. 2020. Sybil attack identification for crowdsourced navigation: A self-supervised deep learning approach. IEEE Transactions on Intelligent Transportation Systems 22, 7 (2020), 4622–4634.
  • Kotary et al. (2023) James Kotary, Vincenzo Di Vito, and Ferdinando Fioretto. 2023. Differentiable model selection for ensemble learning. In Proceedings of the Fifteen International Joint Conference on Artificial Intelligence, IJCAI-23.
  • Lacasse et al. (2006) Alexandre Lacasse, François Laviolette, Mario Marchand, Pascal Germain, and Nicolas Usunier. 2006. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In NIPS. 769–776.
  • Li and Yu (2014) Hongwei Li and Bin Yu. 2014. Error rate bounds and iterative weighted majority voting for crowdsourcing. arXiv preprint arXiv:1411.4086 (2014).
  • Littlestone and Warmuth (1994) N. Littlestone and M.K. Warmuth. 1994. The Weighted Majority Algorithm. Information and Computation 108, 2 (1994), 212–261.
  • Luo (2023) Yuan Luo. 2023. Incentivizing Sequential Crowdsourcing Systems. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems. 2697–2699.
  • Manino et al. (2019a) Edoardo Manino, Long Tran-Thanh, and Nicholas Jennings. 2019a. Streaming Bayesian inference for crowdsourced classification. Advances in Neural Information Processing Systems 32 (2019), 12782–12792.
  • Manino et al. (2019b) Edoardo Manino, Long Tran-Thanh, and Nicholas R Jennings. 2019b. On the efficiency of data collection for multiple Naïve Bayes classifiers. Artificial Intelligence 275 (2019), 356–378.
  • Martín-Morató and Mesaros (2023) Irene Martín-Morató and Annamaria Mesaros. 2023. Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 (2023), 902–914.
  • Maystre et al. (2021) Lucas Maystre, Nagarjuna Kumarappan, Judith Bütepage, and Mounia Lalmas. 2021. Collaborative Classification from Noisy Labels. In International Conference on Artificial Intelligence and Statistics. PMLR, 1639–1647.
  • Mazzetto et al. (2021) Alessio Mazzetto, Dylan Sam, Andrew Park, Eli Upfal, and Stephen Bach. 2021. Semi-supervised aggregation of dependent weak supervision sources with performance guarantees. In International Conference on Artificial Intelligence and Statistics. PMLR, 3196–3204.
  • Meir et al. (2023) Reshef Meir, Ofra Amir, Omer Ben-Porat, Tsviel Ben Shabat, Gal Cohensius, and Lirong Xia. 2023. Frustratingly easy truth discovery. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 6074–6083.
  • Moslem et al. (2012) Bassam Moslem, Mohamad Diab, Mohamad Khalil, and Catherine Marque. 2012. Combining data fusion with multiresolution analysis for improving the classification accuracy of uterine EMG signals. EURASIP Journal on Advances in Signal Processing 2012, 1 (2012), 1–9.
  • Muller et al. (2020) Tim Muller, Dongxia Wang, and Jun Sun. 2020. Provably Robust Decisions based on Potentially Malicious Sources of Information. In 2020 IEEE 33rd Computer Security Foundations Symposium (CSF). IEEE, 411–424.
  • Nitzan and Paroush (1982) Shmuel Nitzan and Jacob Paroush. 1982. Optimal decision rules in uncertain dichotomous choice situations. International Economic Review (1982), 289–297.
  • Raykar et al. (2010) Vikas C Raykar, Shipeng Yu, Linda H Zhao, Gerardo Hermosillo Valadez, Charles Florin, Luca Bogoni, and Linda Moy. 2010. Learning from crowds. Journal of Machine Learning Research 11, 4 (2010).
  • Rekatsinas et al. (2017) Theodoros Rekatsinas, Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran, and Christopher Ré. 2017. Slimfast: Guaranteed results for data fusion and source reliability. In Proceedings of the 2017 ACM International Conference on Management of Data. 1399–1414.
  • Sardana et al. (2018) Noel Sardana, Robin Cohen, Jie Zhang, and Shuo Chen. 2018. A Bayesian multiagent trust model for social networks. IEEE Transactions on Computational Social Systems 5, 4 (2018), 995–1008.
  • Sen (1977) Amartya Sen. 1977. Social choice theory: A re-examination. Econometrica: journal of the Econometric Society (1977), 53–89.
  • Telang et al. (2023) Pankaj Telang, Munindar P Singh, and Neil Yorke-Smith. 2023. Maintenance commitments: Conception, semantics, and coherence. Artificial Intelligence 324 (2023), 103993.
  • Tong and Kain (1991) Zhijun Tong and Richard Y Kain. 1991. Vote assignments in weighted voting mechanisms. IEEE Trans. Comput. 40, 05 (1991), 664–667.
  • Walter (2008) Elizabeth Walter. 2008. Cambridge advanced learner’s dictionary. Cambridge university press.
  • Wu et al. (2023) Gongqing Wu, Xingrui Zhuo, Xianyu Bao, Xuegang Hu, Richang Hong, and Xindong Wu. 2023. Crowdsourcing Truth Inference via Reliability-Driven Multi-View Graph Embedding. ACM Transactions on Knowledge Discovery from Data 17, 5 (2023), 1–26.
  • Wu and Yang (2016) Yihong Wu and Pengkun Yang. 2016. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory 62, 6 (2016), 3702–3720.
  • Wu et al. (2021a) Yi-Shan Wu, Andres Masegosa, Stephan Lorenzen, Christian Igel, and Yevgeny Seldin. 2021a. Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote. Advances in Neural Information Processing Systems 34 (2021).
  • Wu et al. (2021b) Yi-Shan Wu, Andres Masegosa, Stephan Lorenzen, Christian Igel, and Yevgeny Seldin. 2021b. Chebyshev-Cantelli PAC-Bayes-Bennett inequality for the weighted majority vote. Advances in Neural Information Processing Systems 34 (2021), 12625–12636.
  • Yu et al. (2004) Bin Yu, Munindar P Singh, and Katia Sycara. 2004. Developing trust in large-scale peer-to-peer systems. In IEEE First Symposium on Multi-Agent Security and Survivability, 2004. IEEE, 1–10.
  • Zeynalvand et al. (2021) Leonit Zeynalvand, Tie Luo, Ewa Andrejczuk, Dusit Niyato, Sin G. Teo, and Jie Zhang. 2021. A Blockchain-Enabled Quantitative Approach to Trust and Reputation Management with Sparse Evidence. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (Virtual Event, United Kingdom) (AAMAS ’21). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 1707–1708.
  • Zhang et al. (2016) Yuchen Zhang, Xi Chen, Dengyong Zhou, and Michael I Jordan. 2016. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. The Journal of Machine Learning Research 17, 1 (2016), 3537–3580.