Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024), May 6 – 10, 2024, Auckland, New Zealand. N. Alechina, V. Dignum, M. Dastani, J.S. Sichman (eds.). Author affiliations: Zhejiang University, Hangzhou, China; Zhejiang University & ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China; University of Nottingham, Nottingham, United Kingdom; Zhejiang University, Hangzhou, China; Zhejiang University, Hangzhou, China.
Stability of Weighted Majority Voting under Estimated Weights
Abstract.
Weighted Majority Voting (WMV) is a well-known decision-making rule. The weights of sources are determined by the probabilities that sources provide accurate information (trustworthiness). However, in reality, the trustworthiness is usually not a known quantity to the decision maker – they have to rely on an estimate called trust. An algorithm that computes trust is called unbiased when it does not systematically overestimate or underestimate the trustworthiness. To formally analyze the uncertainty that such unbiased trust values bring to the decision process, we introduce and analyze two important properties of WMV: Stability of Correctness and Stability of Optimality. Stability of Correctness measures the difference between the decision accuracy that the decision maker believes he can achieve and the accuracy he actually achieves. We prove Stability of Correctness absolutely holds for WMV – the difference is zero. Stability of Optimality measures the difference between the actual accuracy of decisions made using trust values, and those made using trustworthiness values. We find a relatively tight upper bound on the Stability of Optimality, meaning that, although using (unbiased) trust values is suboptimal compared to using the true trustworthiness values, the difference is small. Meanwhile, a counter-intuitive observation is that while distributions of trustworthiness influence the Stability of Optimality, the number of sources barely influences it. We also provide an overview of how sensitive decision accuracy is to the changes in trust and trustworthiness.
Key words and phrases:
Weighted Majority Voting, Trust, Stability of Decision Making
1. Introduction
Crowd wisdom has been playing a fundamental role in helping make better decisions in many scenarios, e.g., hiring workers for labeling tasks in crowdsourcing Luo (2023), aggregating classifiers for prediction in ensemble learning Kotary et al. (2023), asking for opinions of reliability in online rating systems Carbo and Molina (2023); Ge et al. (2023), etc. Decisions are derived based on aggregating the information or feedback from a collection of sources, the quality of which can be variable. It can be inaccurate due to lack of expertise, mistakes or malice, e.g., low-quality labeling for machine learning, fake ratings introduced by sellers to promote their reputation, etc.
Among the aggregation mechanisms, Weighted Majority Voting has long been a popular one. Basically, each source supports an option and is assigned a weight. WMV chooses the feedback option that is supported by sources with the maximal total weight. WMV has been used in a variety of domains ranging from voting Nitzan and Paroush (1982), crowdsourcing Dawid and Skene (1979), and classification Littlestone and Warmuth (1994) to trust systems Yu et al. (2004) and even distributed systems Tong and Kain (1991). In different contexts, the weight of a source can have different meanings. For instance, in determining a collective choice that is widely acceptable to individuals with diverse preferences Sen (1977), WMV is used for preference aggregation and the weight means the importance of an individual. We are more interested in the scenarios where there is a notion of correctness (or accuracy) of decisions, and the weight of a source depends on how trustworthy it is in providing the feedback that corresponds to the correct decision, denoted as trustworthiness, which is usually modelled as a probability value. Examples include the aggregation of crowd-sourced labels Li and Yu (2014), crowd-sensed navigation data James (2020), or the outputs of multiple classifiers Manino et al. (2019b), etc.
While WMV is proven to be optimal when source trustworthiness is given Nitzan and Paroush (1982)¹, in practice, decision makers have to resort to an estimate or a belief (denoted as trust) that may not equal the actual value of trustworthiness². Deviation in estimation may decrease decision accuracy. Considerable effort has gone into improving the estimation of source trustworthiness by learning from historical data (e.g., direct observations or indirect evidence), with the principle that more data increases the confidence in the estimation Berend and Kontorovich (2013); Wu and Yang (2016). Several approaches even treat the belief about source trustworthiness as its actual value Mazzetto et al. (2021); Maystre et al. (2021). However, no algorithm always produces perfect trust values. It is worth studying how the quality of the trust values impacts the decision quality of WMV. Is WMV able to maintain a tolerable level of decision incorrectness when the inaccuracy in the estimation is bounded, that is, does it have a certain level of stability w.r.t. inaccurate estimation?
¹ Independent feedback from the sources is also required for the optimality of WMV Muller et al. (2020); Nitzan and Paroush (1982).
² Note that we use “trust” and “trustworthiness” to differentiate between what a decision maker trusts or estimates as the probability of a source’s suggesting correctly (regardless of whether he believes in the value or is aware that it is just an estimate), and the actual probability value (refer to the difference between their definitions in Walter (2008)).
In this paper, we propose a formal analysis of the stability properties of WMV. Firstly, we study how sensitive the decision accuracy is to changes in source trust and trustworthiness, with both arguments taking fixed values. We find that, unsurprisingly, decision accuracy decreases as the deviation of trust from trustworthiness increases, and a sufficiently small deviation barely influences the accuracy. Besides, compared with overestimating, underestimating trustworthiness is usually less harmful to the decision accuracy. Secondly, we study the influence of trust and trustworthiness in a statistical way. Considering that a decision maker may sometimes overestimate source trustworthiness and sometimes underestimate it, we assume that the expectation remains correct, i.e., the estimation is unbiased³. We define two types of stability based on such unbiased estimation: Stability of Correctness and Stability of Optimality. Stability of Correctness reasons whether the decision accuracy a decision maker believes he achieves (i.e., the accuracy he computes with trust) equals what he actually achieves (i.e., the accuracy computed with trustworthiness). We prove that whatever distribution source trustworthiness follows, as long as the estimation is unbiased, a decision maker gets the accuracy as if the trustworthiness is known – absolute stability. This means that the shape and variance of the trustworthiness distribution are irrelevant to the Stability of Correctness.
³ Generally, the estimation error always exists, but it is relatively small and can be zero on average with sufficient data Berend and Kontorovich (2013); Freedman (1963).
Stability of Optimality reasons whether the decisions made based on unbiased trust values are as good as those made based on trustworthiness. Considering trustworthiness is usually unknown, Stability of Optimality measures the gap between the practical situation where the decision maker decides with trust, and the situation where (magically) he has access to the actual trustworthiness. We prove that Stability of Optimality does not hold for WMV, but the degradation in decision accuracy caused by the incorrect but (on average) unbiased trust is relatively tightly bounded. That is, decision accuracy with unbiased trust will not be too far off the theoretically determined value. Moreover, unlike Stability of Correctness, the distribution of trustworthiness influences the upper bound of that accuracy gap, and also determines how high the accuracy can be in the ideal situation, namely where trustworthiness is given. Last but not least, while it may usually be perceived that more sources improve accuracy, we observe, counterintuitively, that the number of sources has little influence on the accuracy gap.
The rest of this paper is organized as follows. In Section 2 the related work is presented. In Section 3 we introduce a formal framework to study the WMV decision rule. In Section 4 we present how trust and trustworthiness influence the decision accuracy of WMV. In Section 5 we analyze the two types of stability. Numerical analysis is also performed where needed to illustrate the theoretical results.
2. Related Work
The Weighted Majority Voting rule has been studied in several domains, e.g., decision theory, voting theory, and management science, and has found various applications. We focus on the scenarios where the weight of a source or “voter” is determined by how trustworthy it is in suggesting the correct decision. Some approaches utilizing WMV assume source trustworthiness is given Nitzan and Paroush (1982); Berend and Kontorovich (2015), although in practice it is usually unknown. Plenty of work focuses on modeling and learning source trustworthiness from observation and interaction history Zeynalvand et al. (2021); Wu et al. (2023); Ge et al. (2023). Some researchers model trust as a probability value. To get trust, they either rely on frequency estimation by counting the number of right decisions Berend and Kontorovich (2013), or solve an optimization problem based on their models by minimizing the decision error rate Rekatsinas et al. (2017) or maximizing the likelihood Dong et al. (2015); Manino et al. (2019a); Meir et al. (2023). Moreover, model-checking-based methods are also applied in quantifying the probability of trust in individual agents, representing the agent’s own beliefs Drawel et al. (2020); Bentahar et al. (2022); Drawel et al. (2022); Telang et al. (2023). Besides, trust can also be modeled as a random variable. Bayesian models have been proposed and applied to this problem by Raykar et al. (2010); Sardana et al. (2018); Guo (2023), combining prior knowledge and observations to infer the trustworthiness. Expectation Maximization-based methods are also proposed to estimate source trustworthiness and the correct decision at the same time, via iterative updating Dawid and Skene (1979); Zhang et al. (2016).
Such learned trust is sometimes treated as an estimate of the source trustworthiness with the deviation considered Gao et al. (2016); Wu et al. (2021a), and sometimes treated as equivalent to trustworthiness, namely as the probability of a source suggesting correctly, and further used to evaluate decision accuracy Littlestone and Warmuth (1994); Guan et al. (2018); Martín-Morató and Mesaros (2023). However, trust essentially represents the belief of a decision maker about the source quality, which may deviate from the actual probability, so he may not obtain the decision accuracy claimed on the basis of trust.
Besides the efforts in modeling and learning trustworthiness, there exists work that theoretically analyzes how trustworthiness and trust influence decision accuracy, which is most relevant to ours. Given trustworthiness, the decision accuracy of WMV is analyzed in Berend and Kontorovich (2015) without considering the learning process of trustworthiness. On the other hand, some other work takes the learning process into consideration. To measure the estimation quality, decision accuracy bounds for learning algorithms have been proposed through PAC techniques in Lacasse et al. (2006); Germain et al. (2015); Wu et al. (2021b). Considering that trust is derived from finite samples, some researchers study precise characterizations of the relationship between the decision accuracy and the sample size in Gao et al. (2016). More recently, tighter bounds for decision accuracy under arbitrary estimation are provided, without particular assumptions on trustworthiness Manino et al. (2019b). Unfortunately, none of them has analyzed the relationship between the estimation error and the decision accuracy of WMV in a quantitative way.
3. Preliminaries
In this section, we outline a formal framework to support our study of the stability of the Weighted Majority Voting decision rule. Note that capital letters represent random variables, and lowercase letters represent non-random variables. Bold letters represent a vector of multiple variables, and non-bold letters represent single variables.
Consider a decision-making scenario in which a decision maker is faced with multiple possible decisions, only one of which is correct. The random variable determines which of the options is actually correct, e.g., if is the correct decision. The decision maker receives feedback from a set of sources, . The random variable , with denoting an outcome, represents the feedback of source . The feedback may or may not correspond to the correct decision. For WMV, we assume⁴ a one-to-one correspondence between the feedback that suggests the correct decision and the correct decision itself, and denote iff suggests correctly, . Feedback of all the sources is represented by random variable , with denoting an outcome, and its sample space is defined as . A decision mechanism is a function: . The quantity that the decision maker wants to maximize is the probability of making the correct decisions (which we shorthand as decision accuracy or decision correctness throughout the paper): .
⁴ For more general decision-making scenarios, the options of feedback and those of decisions may not necessarily be equal and may follow a many-to-one mapping.
We define as a -indicator random variable of whether source suggests the correct decision and as one of its outcomes: if and if . For the indicator variables of all the sources i.e., , one of its samples is an indicator vector i.e., . denote the sample space of . Let denote the opposite indicator vector of where source indicators are flipped. The set of all the possible feedback under is denoted as .
We use the following running example in this section to demonstrate the relevant concepts.
Example 1.
There are three sources . If , and the indicator vector , then and .
When decision “correctness” is a concern, Weighted Majority Voting usually considers how probable it is that each source suggests the correct decision. For source , let and . Hence . We refer to as the trustworthiness of source . In practice, the trustworthiness of a source is usually unknown to a decision maker, and an estimate is used instead, denoted as , with . We call the value trust, which represents the subjective estimation or belief of the decision maker regarding how probable it is that a source suggests correctly. There exist multiple ways to compute , e.g., counting the frequency of making correct decisions, or Bayesian learning methods based on prior interaction data. In the literature, the trustworthiness of a source can have different meanings, e.g., honesty of an agent in a rating system Muller et al. (2020), competency of a voter Condorcet (1785), reliability of a worker in crowdsourcing Dawid and Skene (1979), correctness of a sensor in crowdsensing Moslem et al. (2012), etc. Whatever the meaning, represents an intrinsic quality, namely how probable it is that the source reports correctly, while represents how the decision maker thinks of or estimates that probability Walter (2008). We assume sources independently provide feedback, hence .
In Example 1, suppose , the estimation of by a decision maker may be inaccurate: .
Below, we introduce the Weighted Majority Voting (WMV) decision scheme. It can be treated as an extension of the more commonly known Majority Voting decision scheme. The difference is that Majority Voting treats all sources equally, while WMV assigns sources different weights. The weight of a source is usually determined by how trustworthy its feedback is. Formally:
Definition 1 (Weighted Majority Voting).
Given a set of sources , their trustworthiness and independent feedback , makes decisions via the function Muller et al. (2020); Nitzan and Paroush (1982):
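$$ f(\mathbf{x}) = \arg\max_{o} \sum_{i:\, x_i = o} w_i \tag{1} $$
where $w_i = \log\frac{p_i}{1-p_i}$, with $p_i \geq 0.5$.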
To give an instance, consider Example 1, suppose , , and , then . . since , .
Here and the log weight function are well-known for classical WMV in the literature Nitzan and Paroush (1982); Grofman et al. (1983), where trust and trustworthiness are not distinguished. The assumption $p_i \geq 0.5$ means that sources with $p_i < 0.5$ are ignored. For a source with $p_i < 0.5$, a decision maker may assign a negative weight to its feedback, or he can simply reverse the vote of the source (e.g., replacing the reported option A with C). But if malicious sources become aware of either operation, they can push the decision to a wrong one by reporting correctly, purposely reducing the chance of the correct option being selected. Therefore, it is in the interest of the decision maker to ignore such sources.
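The decision rule defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the names `log_odds_weight` and `wmv_decide`, the option labels, and the probability values are hypothetical, probabilities are assumed to be strictly below 1, and ties are resolved alphabetically purely for the sake of the sketch.

```python
import math
from collections import defaultdict


def log_odds_weight(p):
    """Log weight log(p / (1 - p)) for a source with 0.5 <= p < 1.

    Sources with p < 0.5 receive weight 0, i.e. they are ignored.
    """
    if p < 0.5:
        return 0.0
    return math.log(p / (1.0 - p))


def wmv_decide(feedback, probs):
    """Return the option supported by the largest total weight."""
    totals = defaultdict(float)
    for option, p in zip(feedback, probs):
        totals[option] += log_odds_weight(p)
    # Iterating over sorted options makes ties resolve alphabetically
    # (an arbitrary choice made only for this sketch).
    return max(sorted(totals), key=totals.get)


# Hypothetical example: one strong source against two weaker ones.
print(wmv_decide(["A", "B", "B"], [0.9, 0.6, 0.6]))  # A
print(wmv_decide(["A", "B", "B"], [0.6, 0.6, 0.6]))  # B
```

In the first call the single source with probability 0.9 outweighs the two sources with probability 0.6; in the second call all weights are equal and the rule reduces to plain Majority Voting.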
It has been shown in the literature that the decision accuracy of WMV is determined by the indicator vectors where it always decides correctly (an example is where all sources report correctly). For such indicator vectors, whether a decision is correct is not influenced by the feedback of the sources that suggest incorrectly. To give an opposite example, consider Example 1. Suppose , and (only reports correctly). We get . Both the feedback and are possible under (both belong to ). However, WMV decides correctly by choosing under and decides incorrectly by choosing under . Whether WMV decides correctly is thus influenced by what the incorrect feedback is. Given the same , for , it can be seen that whatever reports, WMV can always decide correctly by trusting .
Let denote the set of all the indicator vectors where WMV always decides correctly when using , namely . It has been proven that and the decision accuracy of is (Refer to Nitzan and Paroush (1982); Berend and Kontorovich (2015); Muller et al. (2020)):
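$$ A(\mathbf{p}) = \sum_{\mathbf{v} \in V_{\mathbf{p}}} P(\mathbf{v}) = \sum_{\mathbf{v} \in V_{\mathbf{p}}} \prod_{i=1}^{n} p_i^{v_i} (1-p_i)^{1-v_i} \tag{2} $$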
Equation 2 indicates that the accuracy of WMV is determined by the probabilities of indicator vectors, which depend on the trustworthiness values of the sources.
In Example 1, if , then and the decision accuracy is 0.9 (i.e., as long as source reports correctly).
WMV has been proved to be optimal when trustworthiness and the log weight function are used for decision making and the sources are independent in providing feedback Nitzan and Paroush (1982).
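The accuracy computation described above can be illustrated by brute-force enumeration of indicator vectors. The sketch below assumes independent sources and treats an indicator vector as "always correct" when it is strictly more probable than its opposite; the function names, the strict treatment of ties, and the probability values are assumptions of this example, not taken from the paper.

```python
from itertools import product
from math import prod


def vector_prob(v, probs):
    """Probability of indicator vector v under independent sources."""
    return prod(p if bit else 1.0 - p for bit, p in zip(v, probs))


def always_correct_set(probs):
    """Indicator vectors strictly more probable than their opposite."""
    result = []
    for v in product((0, 1), repeat=len(probs)):
        opposite = tuple(1 - bit for bit in v)
        if vector_prob(v, probs) > vector_prob(opposite, probs):
            result.append(v)
    return result


def accuracy(probs):
    """Sum of the probabilities of the always-correct indicator vectors."""
    return sum(vector_prob(v, probs) for v in always_correct_set(probs))


p = (0.9, 0.6, 0.6)  # hypothetical trustworthiness values
print(always_correct_set(p))
print(round(accuracy(p), 6))  # 0.9
```

With these hypothetical values the first source dominates: the always-correct vectors are exactly those in which the first source reports correctly, so the accuracy equals that source's probability of 0.9.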
In practice, when trustworthiness is unknown, the weight assigned to each source depends on the trust, that is, . Besides, the decision maker computes the probabilities of indicator vectors with trust values, which we use the subscript to distinguish from their actual probabilities: . With replaced by , the decisions would always be correct for those indicator vectors which the decision maker thinks are more probable than their opposite, namely . As a result, and may be different. In Example 1, if and , then while its opposite indicator vector . This may result in different decision accuracy. We introduce to distinguish:
$$ A(\hat{\mathbf{p}}, \mathbf{p}) = \sum_{\mathbf{v} \in V_{\hat{\mathbf{p}}}} P(\mathbf{v}) \tag{3} $$
The first parameter of the function represents the value used for decision making, and the second parameter represents the value used to compute the probability of deciding correctly. For , decisions are made using trust values , while the decision accuracy that the decision maker actually obtains still depends on source trustworthiness, which challenges the optimality of WMV.
Generally, both parameters of can be either trust or trustworthiness, and we assume that the parameter (either trust or trustworthiness) used for decision-making is at least 0.5. Trust values are, by definition, known to the decision-maker. Therefore, it is reasonable to apply the assumption to trust, meaning that sources with trust below 0.5 are ignored. For trustworthiness, we assume it is at least 0.5 only when it is used to decide (e.g., in Section 4.1); otherwise, its value ranges over $[0, 1]$ (e.g., in Sections 4.2, 4.3 and 5).
Depending on what we equip the parameters with, trust or trustworthiness, we will obtain different meanings for decision accuracy as follows. The quantity denotes the “ideal” decision accuracy, where the decision maker knows and uses the trustworthiness values to decide and compute. The quantity denotes the “practical” decision accuracy, where the decision maker decides with the trust values , but the accuracy he actually achieves depends on trustworthiness. The quantity denotes the “perceived” decision accuracy that the decision maker thinks he can obtain (decides and computes accuracy with trust), while the actual accuracy may not equal .
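The three quantities above differ only in which values drive the decisions and which values weight the outcomes. A minimal sketch, assuming independent sources and strict comparison between an indicator vector and its opposite; the function names and all numeric values are hypothetical:

```python
from itertools import product
from math import prod


def vector_prob(v, probs):
    """Probability of indicator vector v under independent sources."""
    return prod(p if bit else 1.0 - p for bit, p in zip(v, probs))


def accuracy(decide_with, compute_with):
    """Two-parameter accuracy.

    decide_with selects the indicator vectors on which WMV is taken to
    decide correctly; compute_with gives the probability of each of them.
    """
    total = 0.0
    for v in product((0, 1), repeat=len(decide_with)):
        opposite = tuple(1 - bit for bit in v)
        if vector_prob(v, decide_with) > vector_prob(opposite, decide_with):
            total += vector_prob(v, compute_with)
    return total


p = (0.9, 0.6, 0.6)  # hypothetical trustworthiness
t = (0.6, 0.8, 0.8)  # hypothetical (inaccurate) trust

ideal = accuracy(p, p)      # decide and compute with trustworthiness
practical = accuracy(t, p)  # decide with trust, compute with trustworthiness
perceived = accuracy(t, t)  # decide and compute with trust

print(round(ideal, 3), round(practical, 3), round(perceived, 3))  # 0.9 0.792 0.832
```

In this hypothetical case the trust values make the rule behave like simple majority, so the practical accuracy (0.792) falls below both the ideal accuracy (0.9) and what the decision maker perceives (0.832).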
4. Parameter Sensitivity
In this section, we analyze how changes in the values of trust and trustworthiness influence the decision accuracy or the correctness of WMV. We consider three cases: 1) how the decision accuracy changes when the trustworthiness and trust change simultaneously; 2) how the decision accuracy changes with trustworthiness when trust remains constant; 3) how the decision accuracy changes with trust when trustworthiness remains constant. If the changes show relatively little effect on the correctness, then we can say that WMV is not very sensitive to the parameters. Sensitivity relates to stability, and the analysis in this section provides several important insights for the analysis in the next section.
We also carry out numerical analysis based on the setting in the following running Example 2 to further illustrate the theoretical results.
Example 2.
There are four sources and their trustworthiness values are respectively.
4.1. Direct Sensitivity Analysis
Here, we analyze the case where the parameters used for making decisions and that for computing accuracy are equal. There are two different rationales for doing this, but the mathematics is identical for both. First, consider that the decision maker is given the actual trustworthiness values to make decisions. Second, consider analyzing the sensitivity of the beliefs of the decision maker. Assume the decision maker only knows trust values, and uses them to compute their belief about how probable a decision is correct.
For simplicity, we use trustworthiness everywhere, but the analysis remains unchanged when using trust instead (simply put a hat on all ’s and ’s). Observe that if trustworthiness of only source varies, then decision accuracy would appear to be a piecewise linear non-decreasing convex function. In Figure 1(a), we depict Example 2, with each plot representing a source trustworthiness variable.
Lemma 2.
Let , where is constant for . The function is a piecewise linear non-decreasing convex function.
Sketch of Proof.
The computation for correctness of a decision can be characterized as , where and this coefficient increases with increasing. Each decision based on corresponding represents a non-decreasing line. ∎
Generally, the more trustworthy a source is, the better the decision becomes and the faster it improves. Figure 1(a) illustrates Lemma 2. The accuracy of WMV is determined by comparing the and for all the indicator vectors. The relation between and either remains or changes, depending on the value of and how much it changes. Intuitively, this results in the piecewise characteristics of . Moreover, we can vary trustworthiness values of multiple (or even all) sources. If we vary trustworthiness, then we get an -dimensional piecewise surface. In Figure 1(b), we depict our running example with and on the two axes, and and remaining constant. The surface appears like a collection of intersecting planes, but in fact, each piece of the graph is described by a polynomial rather than a linear function.
![Refer to caption](x1.png)
![Refer to caption](x2.png)
In Lemma 2, we assume that trustworthiness of all the sources remains constant and independent. However, sources can collude to influence decisions, or one can update the trust values of multiple sources at a time. For such situations, we assume the trustworthiness values of multiple sources are consistently equal, meaning they are not independent.
Lemma 3.
In the special case of the identical sources, let , where , and are constant, . The function is a piecewise non-decreasing function, and it is concave (linear or strictly concave) in each segment.
Sketch of Proof.
The computation for correctness can be characterized as a summation: . For each , it meets piecewise non-decreasing property, and concave property in each segment. Thus, also holds. ∎
Note that in Lemma 3, if , meaning all the sources are identical, WMV becomes the classical Majority Voting decision rule and becomes a concave monotonically increasing function Boland (1989).
Figure 2(a) shows how the decision accuracy changes with varying and values where there are originally sources with the same , then identical sources join with their and . Given value, increases piecewise with , and within each segment it is concavely increasing. Given value, increases monotonically with . In Figure 2(b), we fix and vary the trustworthiness of the identical sources and of the remaining sources . This figure illustrates that even when the remaining sources are in the minority, the higher is, the less sensitive the decision is to . Besides, it demonstrates that when the trustworthiness of multiple sources updates in a particular way, the variation of the decision accuracy can be captured and described.
![Refer to caption](x3.png)
![Refer to caption](x4.png)
4.2. Trustworthiness Sensitivity Analysis
Next, we analyze the cases where trustworthiness and trust are not identical. The decisions are made based on the trust values , while the probability of each indicator vector is determined by . The probability of deciding correctly is . In this section, trustworthiness varies with the trust values fixed. Recalling Equation 3, this means that decisions remain unchanged for given feedback (as the set remains unchanged), while decision accuracy may change with trustworthiness.
If only one parameter varies in , then the resulting decision accuracy is a non-decreasing function, which follows trivially from the proof of Lemma 2. In fact, the line corresponds to one of the line segments from the piecewise linear graph from the previous section as depicted in Figure 3(a). Besides, we can have multiple variables as before. The surface obtained is non-decreasing and polynomial. The surface corresponds to one of the fragments from the graph discussed in the previous section. A 2D example is depicted in Figure 3(b). The result shows that the decision accuracy is a single continuously differentiable function, rather than a piecewise function with different expressions in different segments.
![Refer to caption](x5.png)
![Refer to caption](x6.png)
4.3. Trust Sensitivity Analysis
Alternatively, we can take trust to be variable and trustworthiness to be fixed. Different from before, the actual probability of each indicator vector () now remains unchanged, but the decisions may change (as and accordingly may change). Then we can analyze what happens if a trust value used for decision making moves away from the actual trustworthiness in either direction.
First, consider the case where we vary the trust value of only one source in . We get a uni-modal discontinuous staircase function, which is non-decreasing when and non-increasing when . Figure 4(a) depicts Example 2 with one variable. Second, when trust values of multiple sources are variable, the resulting surface consists of flat fragments at different heights, with an increasing height with proximity to the point . Figure 4(b) depicts Example 2 with and being the variables. Generally:
Lemma 4.
Let , where the trustworthiness is constant. The function is a discontinuous staircase function consisting of flat plateaus. Decision accuracy reaches the maximum at the plateau containing the point .
Sketch of Proof.
The probability that a decision is correct depends on , which is constant. Changing does not affect the probability that a decision is correct, until it reaches a point where it changes the actual decision away from the optimum. Then, there is a discontinuous step to a new plateau. ∎
An insight is that the nearby points are more likely to be on the same plateau. In other words, there is an area of trust values around the trustworthiness values, meaning small estimation deviation may be unlikely to affect the accuracy. However, it is possible that a certain trustworthiness is exactly at a border (or corner) of a plateau, meaning that even a tiny difference between trustworthiness and trust can lead to a staircase difference in correctness. The positive news is that the plateaus directly bordering the one containing are still more often correct than the ones further away.
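The plateau structure can be observed numerically by sweeping the trust value of a single source while the trustworthiness stays fixed. The following sketch makes several assumptions that are not part of the paper: four independent sources with hypothetical trustworthiness values, a grid of trust values in steps of 0.01, and strict comparison between an indicator vector and its opposite.

```python
from itertools import product
from math import prod


def vector_prob(v, probs):
    """Probability of indicator vector v under independent sources."""
    return prod(p if bit else 1.0 - p for bit, p in zip(v, probs))


def accuracy(decide_with, compute_with):
    """Decide with the first argument, evaluate with the second."""
    total = 0.0
    for v in product((0, 1), repeat=len(decide_with)):
        opposite = tuple(1 - bit for bit in v)
        if vector_prob(v, decide_with) > vector_prob(opposite, decide_with):
            total += vector_prob(v, compute_with)
    return total


p = (0.9, 0.75, 0.7, 0.6)  # hypothetical trustworthiness values
ideal = accuracy(p, p)

# Vary only the trust in the first source; the others are estimated exactly.
sweep = []
for k in range(50, 100):
    t1 = k / 100
    trust = (t1,) + p[1:]
    sweep.append((t1, accuracy(trust, p)))

plateaus = sorted({round(a, 12) for _, a in sweep})
print("distinct accuracy levels:", plateaus)
print("accuracy at the true value:", round(ideal, 12))
```

Only a handful of distinct accuracy levels appear across the 50 grid points, and none of them exceeds the level reached when the trust value equals the trustworthiness.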
Besides, while both underestimation and overestimation cause wrong judgment on vs. , the numerical results (Figure 4) suggest that overestimation may cause worse accuracy degradation than underestimation. Our intuition is that, if there is a high -valued source, then that source tends to have a lot of sway on the vote, so any inaccuracies will be noticeable, whereas a low -valued source tends to only matter in cases where the vote is tight, and thus any inaccuracies tend to matter less. From a micro perspective, the underlying reason might be that overestimation of a trustworthiness value makes the difference also overestimated, while for underestimation, the difference would be underestimated. Therefore, when estimation error is inevitable, it is better to underestimate trustworthiness.
![Refer to caption](x7.png)
![Refer to caption](x8.png)
5. Stability
The results of the Parameter Sensitivity Section are unsurprising. Increasing trustworthiness typically increases correctness, and cannot decrease correctness. Hence, if a source is believed to decide correctly with probability , while its trustworthiness , then the actual correctness achieved is lower than what the decision maker believes: . Vice versa when . Our suspicion is that these two effects cancel each other out, if the algorithm that establishes trust is not biased towards overly trusting or being suspicious on average. We call this property Stability of Correctness, and prove it absolutely holds for WMV.
A better procedure to obtain trust returns values closer to the trustworthiness values, with little variance, meaning what is believed about the sources is close to the ground truth. The quality of the procedure does not affect the Stability of Correctness at all when it is unbiased, which may initially seem counter-intuitive. However, another property, Stability of Optimality, captures the idea that even when the procedure is unbiased, poor trust values still result in worse performance of WMV. We prove that Stability of Optimality does not hold absolutely, but that the drop in performance is bounded.
Beforehand, we need to formally define what we mean by an algorithm or procedure to establish trust values, and by it being unbiased (on average).
5.1. Parameter Distributions
We introduce random variables for our parameters. For trustworthiness: is a random variable with outcome , and is a joint random variable with outcome . Similarly, for trust: is a random variable with outcome , and is a joint random variable with outcome . Uncertainty in source trustworthiness may stem from inconsistent behavior or a lack of experience, so that sources cannot provide feedback of stable quality. Uncertainty in trust estimation, on the other hand, may stem from inadequate interaction with the sources or inaccurate modeling by the decision maker.
Weighted Majority Voting requires a weight for each source, which is determined by (the outcome of ). Any practical use of WMV must therefore rely on some algorithm to arrive at values for . Depending on the quality of the algorithm, there is a degree of correlation between trust and trustworthiness: and . We call the procedure that produces the trust values unbiased when the expectation of trustworthiness equals the trust value: . Hence, if an unbiased trust value is , then the trustworthiness can sometimes be greater or smaller than . Note that this is a reasonable assumption for various machine-learning-based procedures, and for Bayesian learning in particular. In reality, we cannot guarantee that any machine learning method is completely free of such bias, but the unbiased case is interesting to study, and we expect any residual bias to be fairly small if the algorithm is configured using sufficient empirical data.
We extend our definition of to accept random variables as parameters. In that case, the output of is a distribution over accuracy. The expectation of such decision accuracy is:
(4) |
In addition, consider the ideal situation where the decision maker "magically" knows the actual trustworthiness variable (i.e., ) and can use it to make decisions. The expected probability of making a correct decision is then:
(5) |
5.2. Stability of Correctness
In this section, we do not care what the distribution of actually looks like, as long as , meaning that the trust values used for decision making are unbiased. The main result is that, in this case, the decision accuracy that the decision maker believes to achieve equals the probability that the decision is actually correct. This is an important positive result that supports the use of WMV in practice: decision makers are not deluded about the correctness of their decisions. Formally, we define the property of Stability of Correctness (SoC) as:
Theorem 5.
Stability of Correctness (SoC): For WMV, if , then .
Sketch of Proof.
It follows from the fact in Section 4.2 that the indicator vector set where decisions are supposed to be correct remains unchanged, when is unchanged. Also consider the fact that each is independently distributed. ∎
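The cancellation can be checked exactly on a small instance. The sketch below is illustrative only and rests on assumptions that are not taken from the text: weights are the log-odds of the trust values, ties are broken by a fair coin, and each trustworthiness variable independently takes one of two values placed symmetrically around its trust value. The helper names `correctness` and `expected_correctness` are hypothetical.

```python
import itertools
import math


def correctness(p, q):
    """Probability that WMV decides correctly when source i is correct with
    probability p[i] and is weighted by the log-odds of its trust q[i]
    (assumed weighting scheme)."""
    weights = [math.log(qi / (1.0 - qi)) for qi in q]
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(p)):
        # x[i] == 1 means that source i reports the correct outcome.
        score = sum(w if xi else -w for w, xi in zip(weights, x))
        prob = math.prod(pi if xi else 1.0 - pi for pi, xi in zip(p, x))
        if score > 1e-12:
            total += prob
        elif abs(score) <= 1e-12:
            total += 0.5 * prob  # assumed tie-break: fair coin
    return total


def expected_correctness(q, delta):
    """Exact expectation of correctness(P, q) when each P_i independently
    equals q_i - delta or q_i + delta with probability 1/2, so E[P_i] = q_i."""
    outcomes = list(itertools.product(*[(qi - delta, qi + delta) for qi in q]))
    return sum(correctness(p, q) for p in outcomes) / len(outcomes)


q = (0.6, 0.7, 0.75)                     # unbiased trust values (illustrative)
believed = correctness(q, q)             # accuracy the decision maker believes in
actual = expected_correctness(q, 0.1)    # accuracy actually achieved on average
print(believed, actual)
```

With the weights fixed by the trust values, the set of indicator vectors that yield a correct decision is fixed, so the accuracy is a sum of products with at most one factor per source. Independence then lets the expectation pass through each factor, which is why the two printed numbers agree up to floating-point error.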
We show the results of two Monte Carlo simulations with runs over Example 2 to demonstrate the effect of distribution variance on the expected correctness of an unbiased estimate. In Figure 5(a), we depict and , where trustworthiness follows a Beta distribution with expected value equal to trust (unbiased) and a variance set by the x-axis. This figure shows that the variance of the trustworthiness has no effect on the correctness on average, which confirms our theorem. In contrast, in Figure 5(b), we depict and , where we instead let trust be the random variable, distributed around trustworthiness with increasing variance. Unsurprisingly, this figure shows that a more dispersed trust distribution leads to lower average correctness, since the trust is more likely to be far from the trustworthiness, which degrades accuracy. Furthermore, can never exceed , in line with the conclusion of Section 4.3. In the next section, we will study this case further.
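To make the sampling step concrete, the following sketch shows one way to draw trustworthiness values from a Beta distribution with a prescribed mean (the trust value) and variance. It uses the standard mean/variance parameterization of the Beta distribution; the function names, the seed, and the example values are our own illustrative assumptions, not the settings used for the figures.

```python
import random


def beta_params(mean, variance):
    """Return (alpha, beta) of the Beta distribution with the given mean and
    variance. Requires 0 < variance < mean * (1 - mean)."""
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    nu = mean * (1.0 - mean) / variance - 1.0  # nu = alpha + beta
    return mean * nu, (1.0 - mean) * nu


def sample_trustworthiness(trust, variance, runs, seed=0):
    """Draw `runs` trustworthiness values whose expectation equals `trust`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    alpha, beta = beta_params(trust, variance)
    return [rng.betavariate(alpha, beta) for _ in range(runs)]


alpha, beta = beta_params(0.8, 0.01)
samples = sample_trustworthiness(0.8, 0.01, runs=10000)
print(alpha, beta, sum(samples) / len(samples))
```

Sweeping `variance` while holding `trust` fixed reproduces the kind of unbiased setting described above: the individual samples scatter more widely, but their mean stays at the trust value.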
![Refer to caption](x9.png)
![Refer to caption](x10.png)
5.3. Stability of Optimality
Although the shape (and variance) of the trustworthiness distribution is irrelevant for Stability of Correctness, intuitively a distribution with less variance should still be better for the decision maker. We introduce another stability property in this section to capture this idea: Stability of Optimality (SoO). Informally, it asks whether decisions made with only the trust values available are as good as those made with the trustworthiness variables revealed. We formally capture this gap with the definition below:
(6) |
In other words, the gap measures how much the decision accuracy could be improved, compared with deciding on trust alone, if the trustworthiness values were available. Note that an equivalent (via Theorem 5) formulation is: , when .
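As a hedged illustration of this gap, the sketch below computes it exactly for a toy instance: the expected accuracy when weights are derived from the revealed trustworthiness, minus the expected accuracy when weights are derived from the unbiased trust values. The log-odds weighting, the fair-coin tie-break, the symmetric two-point trustworthiness distribution, and the names `correctness` and `soo_gap` are all assumptions made for the example.

```python
import itertools
import math


def correctness(p, q):
    """Probability of a correct WMV decision when source i is correct with
    probability p[i] and weighted by the log-odds of q[i] (assumed weights).
    Every q[i] must lie strictly between 0 and 1."""
    weights = [math.log(qi / (1.0 - qi)) for qi in q]
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(p)):
        score = sum(w if xi else -w for w, xi in zip(weights, x))
        prob = math.prod(pi if xi else 1.0 - pi for pi, xi in zip(p, x))
        if score > 1e-12:
            total += prob
        elif abs(score) <= 1e-12:
            total += 0.5 * prob  # assumed tie-break: fair coin
    return total


def soo_gap(q, delta):
    """Exact gap when each P_i independently equals q_i - delta or
    q_i + delta with probability 1/2 (an unbiased two-point distribution)."""
    outcomes = list(itertools.product(*[(qi - delta, qi + delta) for qi in q]))
    with_trustworthiness = sum(correctness(p, p) for p in outcomes)
    with_trust = sum(correctness(p, q) for p in outcomes)
    return (with_trustworthiness - with_trust) / len(outcomes)


q = (0.6, 0.7, 0.75)  # illustrative trust values
for delta in (0.0, 0.05, 0.1):
    print(delta, soo_gap(q, delta))
```

The gap is zero when the distribution collapses onto the trust values (`delta = 0.0`) and cannot be negative, because weighting by the true trustworthiness is at least as good as any other weighting for the same sources.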
To analyze Stability of Optimality formally, we introduce some definitions. Assume the trustworthiness is bounded in some range, e.g. . Denote the value space of as Hypercube , . The set of vertices of the Hypercube is denoted as Vertex Space , where each vertex and is either or . Within the hypercube, the distribution of with expectation can be arbitrary. We call an extreme distribution for random variables in the vertex space of the hypercube, where , .
When trustworthiness is revealed for decision-making, we observe that a high variance in trustworthiness is good for accuracy, and that the extreme distribution is the most favorable case. When a source is more trustworthy than average, increasing its weight enhances overall decision accuracy. Conversely, when a source is less trustworthy than average, it can degrade decision quality to some extent, but the impact is mitigated by reducing the weight of this source. In other words, it is better to have a chance for a source with and for , than a source with . Formally,
Lemma 6.
Take random variables defined in a Hypercube with . The correctness of is bounded by the extreme distribution:
(7) |
Sketch of Proof.
Per Lemma 2, is convex in one dimension, and the extreme distribution maximizes in that dimension. The Lemma follows by independence of the trustworthiness variables. ∎
Lemma 6 demonstrates that the decision accuracy is bounded (not always 100%), even in the ideal situation where trustworthiness is given, and it is determined by the distribution of trustworthiness. This is intuitive as more trustworthy sources should lead to better decisions. Further, if trustworthiness is a constant rather than a random variable, Lemma 6 still holds. That is:
Corollary 7.
For any point in the Hypercube, is bounded by a linear combination of the correctness of the vertices of the hypercube.
(8) |
Proof.
Let in Lemma 6. ∎
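The corollary can be checked numerically on a toy instance. In the sketch below, the coefficients of the linear combination are taken to be the probabilities of the extreme distribution whose expectation is the chosen point; the log-odds weighting, the fair-coin tie-break, the per-dimension bounds `lo` and `hi`, and the names `correctness` and `vertex_bound` are assumptions made for illustration.

```python
import itertools
import math


def correctness(p, q):
    """Probability of a correct WMV decision when source i is correct with
    probability p[i] and weighted by the log-odds of q[i] (assumed weights).
    Every q[i] must lie strictly between 0 and 1."""
    weights = [math.log(qi / (1.0 - qi)) for qi in q]
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(p)):
        score = sum(w if xi else -w for w, xi in zip(weights, x))
        prob = math.prod(pi if xi else 1.0 - pi for pi, xi in zip(p, x))
        if score > 1e-12:
            total += prob
        elif abs(score) <= 1e-12:
            total += 0.5 * prob  # assumed tie-break: fair coin
    return total


def vertex_bound(p, lo, hi):
    """Expected correctness under the extreme distribution with mean p:
    coordinate i sits at hi[i] with probability (p[i] - lo[i]) / (hi[i] - lo[i])
    and at lo[i] otherwise, independently across sources."""
    total = 0.0
    for choice in itertools.product((0, 1), repeat=len(p)):
        vertex = [h if c else l for c, l, h in zip(choice, lo, hi)]
        prob = math.prod(
            (pi - l) / (h - l) if c else (h - pi) / (h - l)
            for c, pi, l, h in zip(choice, p, lo, hi)
        )
        total += prob * correctness(vertex, vertex)
    return total


p = (0.6, 0.7, 0.75)       # a point inside the hypercube (illustrative)
lo = (0.55, 0.55, 0.55)    # lower corner of the hypercube
hi = (0.95, 0.95, 0.95)    # upper corner of the hypercube
print(correctness(p, p), vertex_bound(p, lo, hi))
```

At a vertex the bound is attained with equality, since the extreme distribution then puts all of its mass on that vertex.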
![Refer to caption](x11.png)
![Refer to caption](x12.png)
Stability of Optimality does not strictly hold, as the gap (Equation 6) is typically non-zero. We prove upper bounds on the gap, which go to zero as the distribution of trustworthiness becomes tighter. Let quantify the size of the support of the distribution.
Theorem 8.
Stability of Optimality: If and all have support , then
(9) |
A weaker but more intuitive bound can also be derived using Bernoulli's inequality:
(10) |
Sketch of Proof.
While there is a gap between making decisions based on unbiased trust and based on trustworthiness, Theorem 8 proves that this gap is bounded by a relatively small threshold, implying that using unbiased trust does not reduce the decision quality by much. The upper bound is influenced by the distribution of trustworthiness, and converges to zero as the variance of that distribution decreases.
To illustrate the effect of distribution variance on , we provide a Monte Carlo simulation with runs over Example 2. In Figure 6(a), we measure and , where is constant, and follows a Beta distribution with increasing variance. The figure shows that the larger the variance is, the larger is, which is consistent with Lemma 6. To put the magnitude of the variance in context, Figure 6(b) provides examples of trustworthiness following Beta distributions with given variances.
![Refer to caption](x13.png)
![Refer to caption](x14.png)
![Refer to caption](x15.png)
![Refer to caption](x16.png)
In Figure 7, we provide a parameter analysis with numerical experiments to demonstrate how the parameters influence and the bounds. Example 2 is used in Figures 7(a) and 7(c); Figures 7(b) and 7(d) require some adaptation, with all sources having identical . In addition, is the default for all the sources.
Figure 7(a) shows that as decreases, and its bounds also decrease to zero. There is a linear bound on the effect of (via the weaker bound). This implies that the uncertainty level of sources plays a significant role in determining the Stability of Optimality. In Figure 7(b), as the number of identical sources increases (), and the bounds change little, which suggests that the number of sources has little influence on the accuracy gap (i.e., the gap in accuracy between decision making using unbiased trust and using trustworthiness). This means that the number of sources may barely influence how valuable it is to know the sources’ combined trustworthiness.
In Figure 7(c), only is variable; the figure shows that always remains low and is a piecewise function with local maxima. As studied in Section 4.1, these local maxima result from the piecewise nature of the effect of trustworthiness on decision accuracy. In Figure 7(d), where all are equal and increase to simultaneously, almost decreases to . This makes sense because, as trustworthiness increases, the decision accuracy increases more slowly due to the concavity of WMV with identical sources (see Lemma 3).
To conclude, the gap of Stability of Optimality is somewhat sensitive to the parameter , which describes the range and variance of source trustworthiness, but not sensitive to the number of sources or the other parameters. Overall, the optimality of WMV has a high degree of stability, meaning tends to be close to .
6. Conclusion and Future Work
The common dependence on an estimate (trust) of source trustworthiness raises the need to analyze whether WMV is stable, meaning that the loss in decision accuracy remains tolerable as long as the difference between trust and trustworthiness is bounded.
We first analyzed how sensitive WMV is to changes in trust and trustworthiness. We found that a small deviation between trust and trustworthiness does not affect accuracy, and that underestimation usually does less harm than overestimation. We then introduced two statistical properties of WMV, Stability of Correctness and Stability of Optimality. Assuming that on average the estimation procedure has no bias towards over- or underestimating, we proved that Stability of Correctness holds absolutely, regardless of which estimation procedure is used or how well it estimates. This guarantees that relying on an unbiased estimate of source trustworthiness, which is common in practice, is safe. Moreover, the amount of inefficiency introduced by relying on an estimate instead of the trustworthiness itself is limited, as we proved a linear bound on Stability of Optimality. The proposed formal framework and the two types of stability properties can be generalized to analyze other types of decision mechanisms or scenarios (e.g., where sources are dependent).
For future work, beyond the boundedness assumption, it would be valuable to explore a more precise characterization of the impact of the trustworthiness distribution on SoO in the unbiased setting. It is also worth studying the stability of WMV in a more general case, namely when trust is a biased estimate of trustworthiness. Some researchers have found that some sources, although assigned weights, have no influence on the decision result Allouche et al. (2021); Bowen (2009). In other words, a larger estimation error may be tolerable for such sources.
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62106223 and 62293511.
References
- Allouche et al. (2021) Tahar Allouche, Bruno Escoffier, Stefano Moretti, and Meltem Öztürk. 2021. Social ranking manipulability for the cp-majority, Banzhaf and lexicographic excellence solutions. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 17–23.
- Bentahar et al. (2022) Jamal Bentahar, Nagat Drawel, and Abdeladim Sadiki. 2022. Quantitative group trust: A two-stage verification approach. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. 100–108.
- Berend and Kontorovich (2013) Daniel Berend and Aryeh Kontorovich. 2013. A sharp estimate of the binomial mean absolute deviation with applications. Statistics & Probability Letters 83, 4 (2013), 1254–1259. https://doi.org/10.1016/j.spl.2013.01.023
- Berend and Kontorovich (2015) Daniel Berend and Aryeh Kontorovich. 2015. A finite sample analysis of the Naive Bayes classifier. J. Mach. Learn. Res. 16, 1 (2015), 1519–1545.
- Boland (1989) Philip J. Boland. 1989. Majority Systems and the Condorcet Jury Theorem. Journal of the Royal Statistical Society: Series D (The Statistician) 38, 3 (1989), 181–189.
- Bowen (2009) Larry Bowen. 2009. Weighted voting systems.
- Carbo and Molina (2023) Javier Carbo and Jose M Molina. 2023. Promoting cooperation of agents through aggregation of services in trust models. Knowledge-Based Systems 277 (2023), 110804.
- Condorcet (1785) marquis de Condorcet, Jean-Antoine-Nicolas de Caritat. 1785. Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie royale. 1743–1794 pages.
- Dawid and Skene (1979) Alexander Philip Dawid and Allan M Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics) 28, 1 (1979), 20–28.
- Dong et al. (2015) Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, and Wei Zhang. 2015. Knowledge-based trust: estimating the trustworthiness of web sources. Proceedings of the VLDB Endowment 8, 9 (2015), 938–949.
- Drawel et al. (2022) Nagat Drawel, Jamal Bentahar, Amine Laarej, and Gaith Rjoub. 2022. Formal verification of group and propagated trust in multi-agent systems. Autonomous Agents and Multi-Agent Systems 36, 1 (2022), 19.
- Drawel et al. (2020) Nagat Drawel, Hongyang Qu, Jamal Bentahar, and Elhadi Shakshuki. 2020. Specification and automatic verification of trust-based multi-agent systems. Future Generation Computer Systems 107 (2020), 1047–1060.
- Freedman (1963) David A Freedman. 1963. On the asymptotic behavior of Bayes’ estimates in the discrete case. The Annals of Mathematical Statistics 34, 4 (1963), 1386–1403.
- Gao et al. (2016) Chao Gao, Yu Lu, and Dengyong Zhou. 2016. Exact exponent in optimal rates for crowdsourcing. In International Conference on Machine Learning. PMLR, 603–611.
- Ge et al. (2023) Yan Ge, Jun Ma, Li Zhang, ** Lu. 2023. Trustworthiness-aware knowledge graph representation for recommendation. Knowledge-Based Systems 278 (2023), 110865.
- Germain et al. (2015) Pascal Germain, Alexandre Lacasse, Francois Laviolette, Mario March, and Jean-Francis Roy. 2015. Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm. Journal of Machine Learning Research 16, 26 (2015), 787–860.
- Grofman et al. (1983) Bernard Grofman, Guillermo Owen, and Scott L Feld. 1983. Thirteen theorems in search of the truth. Theory and decision 15, 3 (1983), 261–278.
- Guan et al. (2018) Melody Guan, Varun Gulshan, Andrew Dai, and Geoffrey Hinton. 2018. Who Said What: Modeling Individual Labelers Improves Classification. Proceedings of the AAAI Conference on Artificial Intelligence 32, 1 (Apr. 2018).
- Guo (2023) Zhaori Guo. 2023. Multi-Advisor Dynamic Decision Making. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems. 2949–2951.
- James (2020) JQ James. 2020. Sybil attack identification for crowdsourced navigation: A self-supervised deep learning approach. IEEE Transactions on Intelligent Transportation Systems 22, 7 (2020), 4622–4634.
- Kotary et al. (2023) James Kotary, Vincenzo Di Vito, and Ferdinando Fioretto. 2023. Differentiable model selection for ensemble learning. In Proceedings of the Fifteen International Joint Conference on Artificial Intelligence, IJCAI-23.
- Lacasse et al. (2006) Alexandre Lacasse, François Laviolette, Mario Marchand, Pascal Germain, and Nicolas Usunier. 2006. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In NIPS. 769–776.
- Li and Yu (2014) Hongwei Li and Bin Yu. 2014. Error rate bounds and iterative weighted majority voting for crowdsourcing. arXiv preprint arXiv:1411.4086 (2014).
- Littlestone and Warmuth (1994) N. Littlestone and M.K. Warmuth. 1994. The Weighted Majority Algorithm. Information and Computation 108, 2 (1994), 212–261.
- Luo (2023) Yuan Luo. 2023. Incentivizing Sequential Crowdsourcing Systems. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems. 2697–2699.
- Manino et al. (2019a) Edoardo Manino, Long Tran-Thanh, and Nicholas Jennings. 2019a. Streaming Bayesian inference for crowdsourced classification. Advances in Neural Information Processing Systems 32 (2019), 12782–12792.
- Manino et al. (2019b) Edoardo Manino, Long Tran-Thanh, and Nicholas R Jennings. 2019b. On the efficiency of data collection for multiple Naïve Bayes classifiers. Artificial Intelligence 275 (2019), 356–378.
- Martín-Morató and Mesaros (2023) Irene Martín-Morató and Annamaria Mesaros. 2023. Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 (2023), 902–914.
- Maystre et al. (2021) Lucas Maystre, Nagarjuna Kumarappan, Judith Bütepage, and Mounia Lalmas. 2021. Collaborative Classification from Noisy Labels. In International Conference on Artificial Intelligence and Statistics. PMLR, 1639–1647.
- Mazzetto et al. (2021) Alessio Mazzetto, Dylan Sam, Andrew Park, Eli Upfal, and Stephen Bach. 2021. Semi-supervised aggregation of dependent weak supervision sources with performance guarantees. In International Conference on Artificial Intelligence and Statistics. PMLR, 3196–3204.
- Meir et al. (2023) Reshef Meir, Ofra Amir, Omer Ben-Porat, Tsviel Ben Shabat, Gal Cohensius, and Lirong Xia. 2023. Frustratingly easy truth discovery. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 6074–6083.
- Moslem et al. (2012) Bassam Moslem, Mohamad Diab, Mohamad Khalil, and Catherine Marque. 2012. Combining data fusion with multiresolution analysis for improving the classification accuracy of uterine EMG signals. EURASIP Journal on Advances in Signal Processing 2012, 1 (2012), 1–9.
- Muller et al. (2020) Tim Muller, Dongxia Wang, and Jun Sun. 2020. Provably Robust Decisions based on Potentially Malicious Sources of Information. In 2020 IEEE 33rd Computer Security Foundations Symposium (CSF). IEEE, 411–424.
- Nitzan and Paroush (1982) Shmuel Nitzan and Jacob Paroush. 1982. Optimal decision rules in uncertain dichotomous choice situations. International Economic Review (1982), 289–297.
- Raykar et al. (2010) Vikas C Raykar, Shipeng Yu, Linda H Zhao, Gerardo Hermosillo Valadez, Charles Florin, Luca Bogoni, and Linda Moy. 2010. Learning from crowds. Journal of Machine Learning Research 11, 4 (2010).
- Rekatsinas et al. (2017) Theodoros Rekatsinas, Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran, and Christopher Ré. 2017. Slimfast: Guaranteed results for data fusion and source reliability. In Proceedings of the 2017 ACM International Conference on Management of Data. 1399–1414.
- Sardana et al. (2018) Noel Sardana, Robin Cohen, Jie Zhang, and Shuo Chen. 2018. A Bayesian multiagent trust model for social networks. IEEE Transactions on Computational Social Systems 5, 4 (2018), 995–1008.
- Sen (1977) Amartya Sen. 1977. Social choice theory: A re-examination. Econometrica: journal of the Econometric Society (1977), 53–89.
- Telang et al. (2023) Pankaj Telang, Munindar P Singh, and Neil Yorke-Smith. 2023. Maintenance commitments: Conception, semantics, and coherence. Artificial Intelligence 324 (2023), 103993.
- Tong and Kain (1991) Zhijun Tong and Richard Y Kain. 1991. Vote assignments in weighted voting mechanisms. IEEE Trans. Comput. 40, 05 (1991), 664–667.
- Walter (2008) Elizabeth Walter. 2008. Cambridge advanced learner’s dictionary. Cambridge university press.
- Wu et al. (2023) Gongqing Wu, Xingrui Zhuo, Xianyu Bao, Xuegang Hu, Richang Hong, and Xindong Wu. 2023. Crowdsourcing Truth Inference via Reliability-Driven Multi-View Graph Embedding. ACM Transactions on Knowledge Discovery from Data 17, 5 (2023), 1–26.
- Wu and Yang (2016) Yihong Wu and Pengkun Yang. 2016. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory 62, 6 (2016), 3702–3720.
- Wu et al. (2021a) Yi-Shan Wu, Andres Masegosa, Stephan Lorenzen, Christian Igel, and Yevgeny Seldin. 2021a. Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote. Advances in Neural Information Processing Systems 34 (2021).
- Wu et al. (2021b) Yi-Shan Wu, Andres Masegosa, Stephan Lorenzen, Christian Igel, and Yevgeny Seldin. 2021b. Chebyshev-Cantelli PAC-Bayes-Bennett inequality for the weighted majority vote. Advances in Neural Information Processing Systems 34 (2021), 12625–12636.
- Yu et al. (2004) Bin Yu, Munindar P Singh, and Katia Sycara. 2004. Developing trust in large-scale peer-to-peer systems. In IEEE First Symposium on Multi-Agent Security and Survivability, 2004. IEEE, 1–10.
- Zeynalvand et al. (2021) Leonit Zeynalvand, Tie Luo, Ewa Andrejczuk, Dusit Niyato, Sin G. Teo, and Jie Zhang. 2021. A Blockchain-Enabled Quantitative Approach to Trust and Reputation Management with Sparse Evidence. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (Virtual Event, United Kingdom) (AAMAS ’21). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 1707–1708.
- Zhang et al. (2016) Yuchen Zhang, Xi Chen, Dengyong Zhou, and Michael I Jordan. 2016. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. The Journal of Machine Learning Research 17, 1 (2016), 3537–3580.