License: arXiv.org perpetual non-exclusive license
arXiv:2404.05272v1 [cs.AI] 08 Apr 2024

Constructing Data Transaction Chains Based on Opportunity Cost Exploration

Jie Liu, Tao Feng, Yan Jiang, Peizheng Wang, ✉Chao Wu, Zhejiang University
Abstract.

Data trading is increasingly gaining attention. However, the inherent replicability and privacy concerns of data make it challenging to directly apply traditional trading theories to data markets. This paper compares data trading markets with traditional ones, focusing particularly on how the replicability and privacy of data impact data markets. We discuss how data’s replicability fundamentally alters the concept of opportunity cost in traditional microeconomics within the context of data markets. Additionally, we explore how to leverage this change to maximize benefits without compromising data privacy. This paper outlines the constraints for data circulation within the privacy domain chain and presents a model that maximizes data’s value under these constraints. Specific application scenarios are provided, and experiments demonstrate the solvability of this model.


1. Introduction

Over the past few decades, the pricing and trading of data, together with the associated trading markets, have developed rapidly. For relevant research results, see (Zhang et al., 2023; Pei, 2020).

The design of market mechanisms for data trading has recently become a popular research direction, with a growing number of studies in this area. Unlike goods in traditional markets, data can be replicated at low cost, and its privacy needs to be protected. As a result, traditional trading mechanisms cannot be transplanted directly onto data transactions. In (Agarwal et al., 2019), the authors introduce an effective data trading mechanism that specifies strategies for matching buyers and sellers as well as how the market interacts. They also present a robust-to-replication algorithm, ensuring that replication of the traded data does not affect its market pricing. In (Chen et al., 2022), the authors introduce another data trading mechanism to prevent the devaluation of the seller’s data after replication. In this method, the seller first presents a portion of the data to the buyer as a sample, along with a selling price. The buyer then uses Bayesian inference with prior knowledge to predict the accuracy and quality of the data, and decides whether to purchase the data at the offered price. In terms of privacy protection, (Amiri et al., 2023) presents a trading mechanism that not only protects data privacy but also allows the seller to demonstrate the quality of the data to the buyer. In that work, the quality of the data depends on the relevance of the sold data to the buyer’s needs. The mechanism incorporates a traditional privacy protection method, namely principal component analysis; see also (Mangoubi et al., 2022; Leake et al., 2021).

The valuation of data points, which involves assessing and ranking the importance of each data point within a dataset, is also a crucial aspect of data trading. The most classic data valuation method is the Shapley value from traditional game theory (Jia et al., 2019; Ghorbani and Zou, 2019): each data point is viewed as a cooperating member, and its importance is evaluated based on the accuracy of the trained model. We refer to the Shapley value method in Section 5. In (Wang et al., 2020), the authors propose a valuation algorithm that combines the Shapley value with federated learning. In this case, federated learning ensures data privacy on the client servers, while the Shapley value calculation guarantees fairness in data valuation.

We consider the transaction process on a data chain. In (Karlaš et al., 2022), the authors present a data valuation algorithm for assessing the value of each data point at different positions in the data processing chain. The advantage of this algorithm is that the authors no longer focus on a single model training scenario, but rather separate the modeling process for discussion. In (Yu et al., 2023), the authors model the transaction process between nodes as a Markov decision process and use reinforcement learning algorithms for multiple transactions to find the optimal data pricing mechanism.

Our contribution. There has been fruitful research on the design of data trading market mechanisms and on the valuation of data and models. However, most of it overlooks how data replicability and privacy make the data trading market differ from traditional markets, and how, given these differences, to construct models that maximize the benefits of the supply-side nodes. We address two main aspects in this paper. First, the scope within which data can be traded is limited by privacy, and we take the data trading path to be chain-like. Second, owing to replicability, two options are not mutually exclusive for each node on the data chain: selling its trained model to the market, and selling the data itself to the downstream node, which then trains its own model for sale to the market. This makes the opportunity cost of data transactions differ from the opportunity cost in classic microeconomic theory. To the best of our knowledge, we are the first to discuss the opportunity cost in data transactions.

2. Motivation

In microeconomic theory, opportunity cost is a prevalent notion. It is the value that is given up in order to obtain the higher value of the selected opportunity; see (Buchanan, 1991).

We first consider the opportunity cost in a traditional industry. The seller has several options for making a profit, such as selling the product directly to the market, or reprocessing the product to obtain a higher unit price and then selling it to the market, along with other options within legal and other limitations; see Figure 1. A rational seller always chooses the option with the highest total revenue, and the next best alternative, which is not chosen, represents the corresponding opportunity cost.

Figure 1. Traditional Industrial Scene

In data transactions, such a trade-off naturally exists. However, unlike traditional scenarios, data has two distinctive characteristics: replicability and privacy. Replicability means that some transaction options are not mutually exclusive. For instance, a seller can sell the data to others and train a model itself at the same time. Selling data to others results in multiple models based on the same dataset subsequently being sold in the market, which leads to competition. The seller therefore chooses between two options: directly training models for sale, or selling the data to others while simultaneously training models for sale; see Figure 2. The option not chosen represents the corresponding opportunity cost.

Privacy determines that it is also necessary to consider the privacy protection of the data in the transaction. For instance, it is not allowed to sell the data to the market directly even if it generates more revenue, since it leads to data privacy leakage.

Figure 2. Data Trading Market Scene
Remark 2.1.

In this paper, we assume the privacy of the data must not be compromised in any way. However, in practical scenarios, such as in communication contexts, this condition may not be entirely achievable. In (Shokri et al., 2012), the authors present a strategy for customers to obtain more accurate services by disclosing partial location privacy to the service provider. Moreover, the customers’ disclosure of one-unit privacy in exchange for a one-unit improvement in service is referred to as the shadow price of service.

Remark 2.2.

In data transactions, the data can be traded directly within the scope of privacy permissions, or it can be used to train models that are then traded. The circulation of data and the circulation of models thus coexist, which makes it hard to distinguish the supply side from the demand side. In this paper, for a fixed dataset $D$, we define the supply side as the field within which data is allowed to circulate directly, and the demand side as the field that can only purchase the trained models.

3. Modeling

In this paper, we focus on designing a mechanism for a data transaction chain to maximize the total revenue; see Figure 3.

Figure 3. Data Trading Process (Dataset D cannot be directly sold to the market)

Let $\mathbb{S}: s_1 \to s_2 \to \cdots \to s_n$ be a data transaction chain in which, initially, the node $s_1$ possesses a dataset $D$. The notations we use are shown in Table 1.

Table 1. Notations
Notation          Meaning
$\mathbb{S}$      Data transaction chain
$s_i$             The $i$-th node
$T_i$             The model trained by $s_i$
$r_i$             Sales volume of $T_i$ in the market
$p_i$             Unit price of $T_i$
$c_i$             Cost for training $T_i$
$\tilde{p}_i$     Price at which the node $s_i$ sells $D$ to $s_{i+1}$
$\delta_i$¹       Discount factor of $T_i$

¹ The motivation for considering a discount factor is that another model from the next node $s_{i+1}$, based on the same dataset, leads to competition. Also, the discount factor corresponding to the terminal node is $1$.

There are two options for $s_1$ to make a profit: training model $T_1$ and selling it to the market, and selling dataset $D$ to $s_2$. If the transaction does not happen, the revenue of node $s_1$ is $r_1 p_1 - c_1$. If the transaction happens, the revenue is $\tilde{p}_1 + \delta_1 r_1 p_1 - c_1$. The transaction happens only if the node $s_1$ obtains at least as much revenue as without it, i.e.,

$$r_1 p_1 - c_1 \leq \tilde{p}_1 + \delta_1 r_1 p_1 - c_1.$$

From $s_2$’s perspective, the trading between $s_1$ and $s_2$ happens if and only if the revenue surpasses the cost, i.e.,

$$r_2 p_2 - \tilde{p}_1 - c_2 \geq 0.$$

The transaction between $s_1$ and $s_2$ happens if and only if the solution set of $\tilde{p}_1$ is non-empty, i.e.,

$$r_1 p_1 - \delta_1 r_1 p_1 \leq r_2 p_2 - c_2.$$
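The step from the two individual conditions to the combined one can be spelled out. The derivation below only rearranges the seller-side and buyer-side inequalities stated above; the interval notation is introduced here for illustration.

```latex
\begin{align*}
r_1 p_1 - c_1 \le \tilde{p}_1 + \delta_1 r_1 p_1 - c_1
  &\iff \tilde{p}_1 \ge (1-\delta_1)\, r_1 p_1, \\
r_2 p_2 - \tilde{p}_1 - c_2 \ge 0
  &\iff \tilde{p}_1 \le r_2 p_2 - c_2, \\
\bigl[(1-\delta_1)\, r_1 p_1,\; r_2 p_2 - c_2\bigr] \neq \emptyset
  &\iff r_1 p_1 - \delta_1 r_1 p_1 \le r_2 p_2 - c_2 .
\end{align*}
```

Any price $\tilde{p}_1$ in this interval is acceptable to both nodes; the training cost $c_1$ cancels on the seller's side and therefore does not appear in the final condition.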

Similarly, the trading between $s_i$ and $s_{i+1}$ happens if and only if, from $s_i$’s perspective, the revenue with trading is at least the revenue without trading, i.e.,

$$r_i p_i - c_i - \tilde{p}_{i-1} \leq \tilde{p}_i + \delta_i r_i p_i - c_i - \tilde{p}_{i-1}$$

and, from $s_{i+1}$’s perspective, the revenue surpasses the cost, i.e.,

$$r_{i+1} p_{i+1} - c_{i+1} - \tilde{p}_i \geq 0.$$

In this case, the transaction between $s_i$ and $s_{i+1}$ happens if and only if the solution set of $\tilde{p}_i$ is non-empty, i.e.,

$$r_i p_i - \delta_i r_i p_i \leq r_{i+1} p_{i+1} - c_{i+1}. \tag{1}$$
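As an illustration of condition (1), the following minimal sketch computes the range of data prices acceptable to both nodes. The function name `price_interval` and the numbers in the usage lines are hypothetical and chosen only for illustration; the bounds are the seller-side and buyer-side inequalities above, rearranged for the data price.

```python
from typing import Optional, Tuple


def price_interval(r_i: float, p_i: float, delta_i: float,
                   r_next: float, p_next: float, c_next: float
                   ) -> Optional[Tuple[float, float]]:
    """Return (lower, upper) bounds on the data price, or None if no price works.

    Hypothetical helper: the bounds follow from the two per-node inequalities.
    """
    # Seller s_i: the data price must offset the model revenue lost to competition.
    lower = (1.0 - delta_i) * r_i * p_i
    # Buyer s_{i+1}: model revenue must cover the data price plus the training cost.
    upper = r_next * p_next - c_next
    # Condition (1) is exactly the statement lower <= upper.
    return (lower, upper) if lower <= upper else None


# Illustrative numbers only (not taken from the paper).
print(price_interval(10.0, 5.0, 0.75, 8.0, 4.0, 12.0))  # (12.5, 20.0)
print(price_interval(10.0, 5.0, 0.75, 8.0, 4.0, 25.0))  # None
```

In the second call the downstream training cost is too high, so the interval is empty and the transaction does not take place.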
Remark 3.1.

We give an alternative perspective on this mechanism. From the inter-node transaction’s perspective, (1) provides a necessary and sufficient condition for this transaction to take place. Moreover, after rearranging (1), we have

$$c_{i+1} \leq r_{i+1} p_{i+1} + \delta_i r_i p_i - r_i p_i.$$

This shows that, from the data transaction chain’s perspective, the necessary and sufficient condition for dataset $D$ to flow along this arrow is that the difference in profit between trading and not trading covers the downstream node’s model training cost.

Now we give the model. Our goal is to maximize the total revenue subject to the constraints on $c_i$ ($i = 1, 2, \ldots, n$). We also add the constraints that the trade between $s_i$ and $s_{i+1}$ happens. We have

$$
\begin{aligned}
\max\ & \sum_{i=1}^{n} r_i \delta_i p_i \\
\text{s.t.}\ & r_i p_i - \delta_i r_i p_i \leq r_{i+1} p_{i+1} - c_{i+1}, \quad i = 1, 2, \ldots, n-1, \\
& \delta_n = 1, \\
& \text{constraints of } c_i\text{'s}.
\end{aligned}
\tag{2}
$$
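To make model (2) concrete, the sketch below evaluates its objective and checks its link constraints for a given chain. The names `total_revenue` and `chain_is_feasible` and the sample numbers are assumptions made for illustration; the unspecified "constraints of $c_i$'s" are deliberately left out, since the model does not fix their form at this point.

```python
from typing import Sequence


def total_revenue(r: Sequence[float], p: Sequence[float],
                  delta: Sequence[float]) -> float:
    """Objective of model (2): sum over nodes of r_i * delta_i * p_i."""
    return sum(ri * di * pi for ri, di, pi in zip(r, delta, p))


def chain_is_feasible(r: Sequence[float], p: Sequence[float],
                      c: Sequence[float], delta: Sequence[float]) -> bool:
    """Check delta_n = 1 and the link constraint for every consecutive pair."""
    if delta[-1] != 1:
        return False
    for i in range(len(r) - 1):
        lost = r[i] * p[i] - delta[i] * r[i] * p[i]  # revenue s_i loses to competition
        gain = r[i + 1] * p[i + 1] - c[i + 1]        # net model revenue of s_{i+1}
        if lost > gain:
            return False
    return True


# Illustrative three-node chain (numbers are not from the paper).
r, p = [10.0, 8.0, 7.0], [5.0, 4.0, 3.0]
c, delta = [3.0, 12.0, 4.0], [0.75, 0.5, 1.0]
print(total_revenue(r, p, delta))         # 74.5
print(chain_is_feasible(r, p, c, delta))  # True
```

Raising the last training cost from 4 to 10 breaks the second link constraint, so the same chain would no longer be feasible.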
Remark 3.2.

In the previous context, we assume that the training of models in downstream nodes depends on the training results of upstream nodes, meaning that all nodes involved in the data transaction train on the data. However, another scenario is possible in real-world settings: upstream nodes can choose to sell data directly to downstream nodes, and downstream model training relies solely on the data itself rather than on the training results of upstream nodes. In this case, an upstream node can assess whether, if it also sells the data to the downstream node, the revenue from selling its model still covers the training cost. The corresponding program is

$$
\begin{aligned}
\max\ & \sum_{i=1}^{n} r_i \delta_i p_i \\
\text{s.t.}\ & r_i p_i - c_i - \max\{\delta_i r_i p_i - c_i, 0\} \leq r_{i+1} p_{i+1} - c_{i+1}, \quad i = 1, 2, \ldots, n-1, \\
& \delta_n = 1, \\
& \text{constraints of } c_i\text{'s}.
\end{aligned}
$$
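The effect of the $\max\{\cdot, 0\}$ term in this variant can be seen in a small sketch. The helper names `upstream_loss` and `link_is_feasible` and the numbers are hypothetical; the code only evaluates the left-hand side of the constraint above and compares it with the right-hand side.

```python
def upstream_loss(r_i: float, p_i: float, c_i: float, delta_i: float) -> float:
    """Left-hand side of the variant constraint for node s_i.

    Standalone profit minus the best model profit s_i can still earn after
    selling the data (it trains only if that remains profitable).
    """
    profit_after_sale = max(delta_i * r_i * p_i - c_i, 0.0)
    return r_i * p_i - c_i - profit_after_sale


def link_is_feasible(r_i: float, p_i: float, c_i: float, delta_i: float,
                     r_next: float, p_next: float, c_next: float) -> bool:
    """True if the downstream net revenue covers the upstream loss."""
    return upstream_loss(r_i, p_i, c_i, delta_i) <= r_next * p_next - c_next


# Illustrative numbers only (not taken from the paper).
print(upstream_loss(10.0, 5.0, 3.0, 0.75))   # 12.5: training stays profitable
print(upstream_loss(10.0, 5.0, 40.0, 0.75))  # 10.0: training is abandoned
```

When training remains profitable after the sale, the loss reduces to $(1-\delta_i) r_i p_i$, the left-hand side of (1); when it does not, the upstream node gives up its entire standalone profit $r_i p_i - c_i$.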
Remark 3.3.

In some cases, $r_i$ can be expressed as a function of the cost $c_i$, since both $r_i$ and $c_i$ are related to the accuracy $a_i$ of the model $T_i$. We define

$$r_i = f_i(E[a_i \mid c_i]).$$

Also, by Bayesian inference, we have

$$\Pr(a_i \mid c_i) = \frac{\Pr(a_i, c_i)}{\Pr(c_i)} = \frac{\Pr(c_i \mid a_i)\,\mu_i(a_i)}{\sum_{a_i} \Pr(c_i \mid a_i)\,\mu_i(a_i)}.$$

This leads to

$$E[a_i \mid c_i] = \sum_{a_i} a_i \frac{\Pr(c_i \mid a_i)\,\mu_i(a_i)}{\sum_{a_i} \Pr(c_i \mid a_i)\,\mu_i(a_i)},$$

where $\mu_i(a_i)$ is the accuracy profile of the node $s_i$, and $\Pr(c_i \mid a_i)$ and $\mu_i(a_i)$ are prior knowledge. Then we have

$$r_i = f_i\left(\sum_{a_i} a_i \frac{\Pr(c_i \mid a_i)\,\mu_i(a_i)}{\sum_{a_i} \Pr(c_i \mid a_i)\,\mu_i(a_i)}\right).$$

The method we use is adapted from (Chen et al., 2022).
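The posterior expectation in Remark 3.3 can be computed directly when the accuracy takes finitely many values. The sketch below assumes a discrete accuracy grid and a fixed observed cost; the function name `expected_accuracy` and the sample prior and likelihood values are hypothetical, and the sales function $f_i$ is left out because the remark does not specify it.

```python
from typing import Sequence


def expected_accuracy(accuracies: Sequence[float],
                      prior: Sequence[float],
                      likelihood: Sequence[float]) -> float:
    """Posterior mean E[a | c] on a discrete accuracy grid.

    prior[j]      -- mu(a_j), the accuracy profile of the node
    likelihood[j] -- Pr(c | a_j) for the observed cost c
    """
    # Unnormalized posterior weights Pr(c | a_j) * mu(a_j).
    weights = [lik * mu for lik, mu in zip(likelihood, prior)]
    evidence = sum(weights)  # Pr(c), the normalizing constant
    if evidence == 0:
        raise ValueError("the observed cost has zero probability under the prior")
    return sum(a * w for a, w in zip(accuracies, weights)) / evidence


# Illustrative values only (not taken from the paper).
print(expected_accuracy([0.5, 0.75, 1.0], [0.25, 0.5, 0.25], [0.5, 0.25, 0.5]))  # 0.75
```

Applying a chosen $f_i$ to the returned value then gives the sales volume $r_i$ for that cost.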

4. Optimization

In Section 3, the general data chain optimization model is given. In this section, we consider two simple examples, where $r_i$ is a linear function of $c_i$, and the cost constraints can be expressed in the form of matrix inequalities, i.e., $AC \leq G$, where $A$ is an $a \times n$ matrix, $C = (c_1, c_2, \cdots, c_n)^T$ and $G = (g_1, g_2, \cdots, g_a)^T$. We define $r_i = k_i c_i - b_i$, where $k_i, b_i \geq 0$.

4.1. Linear programming and shadow price

In this case, (2) can be written as

$$
\begin{aligned}
\max\ & \sum_{i=1}^{n} (k_i c_i - b_i)\,\delta_i p_i \\
\text{s.t.}\ & AC \leq G, \\
& (k_i c_i - b_i) p_i - \delta_i (k_i c_i - b_i) p_i \leq (k_{i+1} c_{i+1} - b_{i+1}) p_{i+1} - c_{i+1}, \\
& k_i c_i - b_i \geq 0, \\
& c_i \geq 0, \\
& \delta_n = 1.
\end{aligned}
$$
Remark 4.1.

We consider the case where $b_i = 0$; the corresponding dual linear program is

$$
\begin{aligned}
\min\ & \sum_{i=1}^{a} y_i g_i \\
& Y \begin{pmatrix} A \\ \Delta \end{pmatrix} \ge \begin{pmatrix} P \\ 0 \end{pmatrix} \\
& y_i \ge 0 \\
& \delta_n = 1
\end{aligned}
$$

where $Y = (y_1, y_2, \cdots, y_{a+n-1})$, $P = (k_1 \delta_1 p_1, k_2 \delta_2 p_2, \cdots, k_n \delta_n p_n)$ and

$$
\Delta = \begin{pmatrix}
(\delta_1 - 1)k_1 p_1 & (k_2 p_2 - 1) & 0 & \cdots & 0 & 0 \\
0 & (\delta_2 - 1)k_2 p_2 & (k_3 p_3 - 1) & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (\delta_{n-1} - 1)k_{n-1} p_{n-1} & (k_n p_n - 1)
\end{pmatrix}.
$$

This dual linear program minimizes the overall model training cost while guaranteeing the profit of each node. The shadow price of a node is the cost required to increase that node's profit by one unit.
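To make the shadow-price interpretation concrete, the following is a minimal sketch, not the authors' implementation, that solves a toy two-node instance of the linear program above with $b_i = 0$. All numbers (`k`, `p`, `delta`, the single resource row $A = (1\ 1)$ and the bound `budget`) are illustrative assumptions, and `solve_2d_lp`, `chain_lp` and `resource_shadow_price` are hypothetical helper names. The sketch enumerates the vertices of the feasible polygon, which works only for two variables and a bounded feasible set, and then estimates the shadow price of the resource bound by a finite difference.

```python
from itertools import combinations


def solve_2d_lp(obj, rows, rhs, tol=1e-9):
    """Maximize obj . c subject to rows . c <= rhs for two variables.

    Brute-force vertex enumeration: intersect every pair of constraint
    lines and keep the best feasible intersection point.
    Assumes the feasible set is bounded and non-empty.
    """
    best_val, best_pt = None, None
    for (a1, b1), (a2, b2) in combinations(zip(rows, rhs), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:  # parallel lines, no intersection point
            continue
        # Cramer's rule for the 2x2 system a1 . c = b1, a2 . c = b2
        x = (b1 * a2[1] - a1[1] * b2) / det
        y = (a1[0] * b2 - b1 * a2[0]) / det
        if all(r[0] * x + r[1] * y <= h + tol for r, h in zip(rows, rhs)):
            val = obj[0] * x + obj[1] * y
            if best_val is None or val > best_val:
                best_val, best_pt = val, (x, y)
    return best_val, best_pt


def chain_lp(k, p, delta, budget):
    """Two-node instance of the LP with b_i = 0 (toy data, assumed)."""
    obj = [k[i] * delta[i] * p[i] for i in range(2)]  # k_i * delta_i * p_i
    rows = [
        [1.0, 1.0],  # resource row: c_1 + c_2 <= budget (assumed A = (1 1))
        # chain constraint: (1 - delta_1) k_1 p_1 c_1 <= (k_2 p_2 - 1) c_2
        [(1 - delta[0]) * k[0] * p[0], -(k[1] * p[1] - 1)],
        [-1.0, 0.0],  # c_1 >= 0
        [0.0, -1.0],  # c_2 >= 0
    ]
    rhs = [budget, 0.0, 0.0, 0.0]
    return solve_2d_lp(obj, rows, rhs)


def resource_shadow_price(k, p, delta, budget, step=1.0):
    """Finite-difference sensitivity of the optimum to the resource bound."""
    base, _ = chain_lp(k, p, delta, budget)
    bumped, _ = chain_lp(k, p, delta, budget + step)
    return (bumped - base) / step
```

With the assumed data `k = [4.0, 1.5]`, `p = [1.0, 1.0]`, `delta = [0.5, 1.0]` and `budget = 10.0`, the optimum is attained where the resource bound and the chain constraint are both active. For a nondegenerate linear program, this finite-difference sensitivity coincides with the dual variable attached to the resource row.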

4.2. Non-linear programming and Lagrange duality

We consider the program in Remark 3.2, namely

$$
\begin{aligned}
\max\ & \sum_{i=1}^{n} (k_i c_i - b_i)\,\delta_i p_i \\
& AC \le G \\
& (k_i c_i - b_i)p_i - c_i - \max\{\delta_i (k_i c_i - b_i)p_i - c_i,\ 0\} \le (k_{i+1} c_{i+1} - b_{i+1})p_{i+1} - c_{i+1} \\
& k_i c_i - b_i \ge 0 \\
& c_i \ge 0 \\
& \delta_n = 1.
\end{aligned}
\tag{3}
$$

This optimization model is not convex, so we use Lagrange duality to find a lower bound on its optimal value. We consider an equivalent form of model (3). Recalling that $r_i = k_i c_i - b_i$, we have

$$
\begin{aligned}
\max\ & \sum_{i=1}^{n} r_i \delta_i p_i \\
& A' R \le G' \\
& r_i p_i - \frac{r_i + b_i}{k_i} - \max\left\{ r_i \delta_i p_i - \frac{r_i + b_i}{k_i},\ 0 \right\} \le r_{i+1} p_{i+1} - \frac{r_{i+1} + b_{i+1}}{k_{i+1}} \\
& r_i \ge 0 \\
& \delta_n = 1
\end{aligned}
$$

where $R = (r_1, r_2, \cdots, r_n)^T$, $A' = A \cdot \text{diag}\left(\frac{1}{k_1}, \frac{1}{k_2}, \cdots, \frac{1}{k_n}\right)$ and $G' = G - A'(b_1, b_2, \cdots, b_n)^T$.
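The definitions of $A'$ and $G'$ follow from a change of variables that the text leaves implicit. As a sketch, assuming every $k_i \neq 0$ and writing $B = (b_1, \cdots, b_n)^T$ (a symbol introduced here only for brevity), the steps are:

```latex
\begin{aligned}
r_i = k_i c_i - b_i \;&\Longleftrightarrow\; c_i = \frac{r_i + b_i}{k_i},
\qquad C = \operatorname{diag}\!\left(\tfrac{1}{k_1}, \cdots, \tfrac{1}{k_n}\right)(R + B), \\
AC \le G \;&\Longleftrightarrow\; A'R + A'B \le G \;\Longleftrightarrow\; A'R \le G - A'B = G', \\
(k_i c_i - b_i)p_i - c_i \;&=\; r_i p_i - \frac{r_i + b_i}{k_i}.
\end{aligned}
```

The constraint $c_i \ge 0$ no longer appears explicitly in the transformed model. It is implied by $r_i \ge 0$ when $b_i \ge 0$ and $k_i > 0$, which is an assumption of this sketch rather than a statement taken from the text.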

The corresponding Lagrange function is

$$
\begin{aligned}
L(R,\Lambda) ={}& \sum_{i} r_i \delta_i p_i + (\lambda_1, \lambda_2, \cdots, \lambda_a)(A'R - G') \\
& + \sum_{i=1}^{n} \lambda_{a+i} \left( r_i p_i - \frac{r_i + b_i}{k_i} - \max\left\{ r_i \delta_i p_i - \frac{r_i + b_i}{k_i},\ 0 \right\} - r_{i+1} p_{i+1} + \frac{r_{i+1} + b_{i+1}}{k_{i+1}} \right) \\
& - \sum_{i=1}^{n} \lambda_{a+n+i} r_i.
\end{aligned}
$$

We define $\Lambda = (\lambda_1, \lambda_2, \cdots, \lambda_{a+2n})$ and

$$
g(\Lambda) = \inf_{R \in \mathbb{R}^n} L(R,\Lambda).
$$

The domain of $L(R,\Lambda)$ can be divided into $2^{n-1}$ parts. To define each part, let $N = \{1, 2, \ldots, n-1\}$ and let $S$ be a subset of $N$; the part associated with $S$ is

$$
U_S = \left\{ (r_1, r_2, \cdots, r_n) \in \mathbb{R}^n \;\middle|\; r_i \delta_i p_i - \frac{r_i + b_i}{k_i} \ge 0 \ \text{for}\ i \in S, \quad r_i \delta_i p_i - \frac{r_i + b_i}{k_i} < 0 \ \text{for}\ i \in N \setminus S \right\}.
$$

Then, in the domain $U_S$, $L(R,\Lambda)$ can be rewritten as

$$
\begin{aligned}
L(R,\Lambda) ={}& \sum_{i} r_i \delta_i p_i + (\lambda_1, \lambda_2, \cdots, \lambda_a)(A'R - G') \\
& + \sum_{i \in S} \lambda_{a+i} \left( r_i p_i - r_i \delta_i p_i - r_{i+1} p_{i+1} + \frac{r_{i+1} + b_{i+1}}{k_{i+1}} \right) \\
& + \sum_{i \in N \setminus S} \lambda_{a+i} \left( r_i p_i - \frac{r_i + b_i}{k_i} - r_{i+1} p_{i+1} + \frac{r_{i+1} + b_{i+1}}{k_{i+1}} \right) \\
& - \sum_{i=1}^{n} \lambda_{a+n+i} r_i.
\end{aligned}
$$
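The case split above can be checked numerically. The sketch below is an illustration rather than part of the original model code: it finds the subset $S$ whose region $U_S$ contains a given point $R$, enumerates all $2^{n-1}$ subsets, and evaluates the left-hand side of the chain constraint both with the $\max$ term and in its piecewise form. The function names are hypothetical, and indices in $S$ are 1-based to match the text.

```python
from itertools import combinations


def region_of(r, b, k, p, delta):
    """Return S: all i in N = {1, ..., n-1} with
    r_i * delta_i * p_i - (r_i + b_i) / k_i >= 0."""
    n = len(r)
    return frozenset(
        i + 1
        for i in range(n - 1)
        if r[i] * delta[i] * p[i] - (r[i] + b[i]) / k[i] >= 0
    )


def all_regions(n):
    """All 2**(n-1) subsets S of N = {1, ..., n-1}."""
    N = range(1, n)
    return [frozenset(s) for size in range(n) for s in combinations(N, size)]


def chain_lhs_max(i, r, b, k, p, delta):
    """Left-hand side of the chain constraint for node i, max form."""
    j = i - 1  # convert the 1-based index to a 0-based list position
    c = (r[j] + b[j]) / k[j]
    return r[j] * p[j] - c - max(r[j] * delta[j] * p[j] - c, 0.0)


def chain_lhs_piecewise(i, S, r, b, k, p, delta):
    """Same quantity after the max has been resolved on the region U_S."""
    j = i - 1
    if i in S:  # max term equals r_i * delta_i * p_i - (r_i + b_i) / k_i
        return r[j] * p[j] - r[j] * delta[j] * p[j]
    return r[j] * p[j] - (r[j] + b[j]) / k[j]  # max term equals 0
```

For any point $R$, calling `chain_lhs_piecewise` with `S = region_of(...)` should agree with `chain_lhs_max` up to rounding, which is the identity used to pass from the general Lagrange function to its form on $U_S$.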
Proposition 4.2.

In the domain $U_S$, the Lagrange dual function of $L(R,\Lambda)$ is

$$
g(\Lambda) = \begin{cases}
-\sum_{i=1}^{a} \lambda_i g_i - \sum_{i \in N \setminus S} \lambda_{a+i} \frac{b_i}{k_i} + \sum_{i=2}^{n} \lambda_{a+i-1} \frac{b_i}{k_i}, & \Lambda \in D_S \\
-\infty, & \text{otherwise}
\end{cases}
$$

where

$$
\begin{aligned}
D_S ={}& \bigcap_{i \in S} \left\{ \Lambda \;\middle|\; \delta_i p_i + \sum_{j=1}^{a} \lambda_j a_{ji} + \lambda_{a+i}(1 - \delta_i)p_i + \lambda_{a+i-1}\left(\frac{1}{k_i} - p_i\right) - \lambda_{a+n+i} = 0 \right\} \\
& \cap \bigcap_{i \in N \setminus S} \left\{ \Lambda \;\middle|\; \delta_i p_i + \sum_{j=1}^{a} \lambda_j a_{ji} + \lambda_{a+i}\left(p_i - \frac{1}{k_i}\right) + \lambda_{a+i-1}\left(\frac{1}{k_i} - p_i\right) - \lambda_{a+n+i} = 0 \right\} \\
& \cap \left\{ \Lambda \;\middle|\; \delta_n p_n + \sum_{j=1}^{a} \lambda_j a_{jn} + \lambda_{a+n-1}\left(\frac{1}{k_n} - p_n\right) - \lambda_{a+2n} = 0 \right\}.
\end{aligned}
$$
Proof.

We have

$$
\begin{aligned}
L(R,\Lambda) ={}& \sum_{i} r_i \delta_i p_i + (\lambda_1, \lambda_2, \cdots, \lambda_a)(A'R - G') \\
& + \sum_{i \in S} \lambda_{a+i} (r_i p_i - r_i \delta_i p_i) + \sum_{i \in N \setminus S} \lambda_{a+i} \left( r_i p_i - \frac{r_i + b_i}{k_i} \right) + \sum_{i=2}^{n} \lambda_{a+i-1} \left( \frac{r_i + b_i}{k_i} - r_i p_i \right) \\
& - \sum_{i=1}^{n} \lambda_{a+n+i} r_i.
\end{aligned}
$$

If $i \in S$, the corresponding coefficient of $r_i$ is

$$
\delta_i p_i + \sum_{j=1}^{a} \lambda_j a_{ji} + \lambda_{a+i}(1 - \delta_i)p_i + \lambda_{a+i-1}\left(\frac{1}{k_i} - p_i\right) - \lambda_{a+n+i}.
$$

If $i \in N \setminus S$, the corresponding coefficient of $r_i$ is

$$
\delta_i p_i + \sum_{j=1}^{a} \lambda_j a_{ji} + \lambda_{a+i}\left(p_i - \frac{1}{k_i}\right) + \lambda_{a+i-1}\left(\frac{1}{k_i} - p_i\right) - \lambda_{a+n+i}.
$$

If i=n𝑖𝑛i=nitalic_i = italic_n, the corresponding coefficient of risubscript𝑟𝑖r_{i}italic_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is

δnpn+aj=1λjajn+λa+n1(1knpn)λa+2n.subscript𝛿𝑛subscript𝑝𝑛𝑗1superscript𝑎subscript𝜆𝑗subscript𝑎𝑗𝑛subscript𝜆𝑎𝑛11subscript𝑘𝑛subscript𝑝𝑛subscript𝜆𝑎2𝑛\delta_{n}p_{n}+\underset{j=1}{\stackrel{{\scriptstyle a}}{{\sum}}}\lambda_{j}% a_{jn}+\lambda_{a+n-1}(\frac{1}{k_{n}}-p_{n})-\lambda_{a+2n}.italic_δ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + start_UNDERACCENT italic_j = 1 end_UNDERACCENT start_ARG start_RELOP SUPERSCRIPTOP start_ARG ∑ end_ARG start_ARG italic_a end_ARG end_RELOP end_ARG italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_j italic_n end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT italic_a + italic_n - 1 end_POSTSUBSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_k start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG - italic_p start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) - italic_λ start_POSTSUBSCRIPT italic_a + 2 italic_n end_POSTSUBSCRIPT .

The constant term in L(R,Λ)𝐿𝑅ΛL(R,\Lambda)italic_L ( italic_R , roman_Λ ) is

ai=1λigiiN\Sλa+ibiki+ni=2λa+i1biki𝑖1superscript𝑎subscript𝜆𝑖subscript𝑔𝑖𝑖\𝑁𝑆subscript𝜆𝑎𝑖subscript𝑏𝑖subscript𝑘𝑖𝑖2superscript𝑛subscript𝜆𝑎𝑖1subscript𝑏𝑖subscript𝑘𝑖-\underset{i=1}{\stackrel{{\scriptstyle a}}{{\sum}}}\lambda_{i}g_{i}-\underset% {i\in N\backslash S}{\sum}\lambda_{a+i}\frac{b_{i}}{k_{i}}+\underset{i=2}{% \stackrel{{\scriptstyle n}}{{\sum}}}\lambda_{a+i-1}\frac{b_{i}}{k_{i}}- start_UNDERACCENT italic_i = 1 end_UNDERACCENT start_ARG start_RELOP SUPERSCRIPTOP start_ARG ∑ end_ARG start_ARG italic_a end_ARG end_RELOP end_ARG italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - start_UNDERACCENT italic_i ∈ italic_N \ italic_S end_UNDERACCENT start_ARG ∑ end_ARG italic_λ start_POSTSUBSCRIPT italic_a + italic_i end_POSTSUBSCRIPT divide start_ARG italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG italic_k start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG + start_UNDERACCENT italic_i = 2 end_UNDERACCENT start_ARG start_RELOP SUPERSCRIPTOP start_ARG ∑ end_ARG start_ARG italic_n end_ARG end_RELOP end_ARG italic_λ start_POSTSUBSCRIPT italic_a + italic_i - 1 end_POSTSUBSCRIPT divide start_ARG italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG italic_k start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG

and this proposition is proved. ∎

5. Experiments

5.1. Method

We consider the data transaction chain $\mathbb{S}: s_1 \to s_2 \to \cdots \to s_n$, where initially the node $s_1$ possesses the dataset $D$. In this section, each node trains a model to calculate the Shapley value of each data point in $D$ and sells it to the market.

We recall the data valuation using the Shapley value, see also (Jia et al., 2019). Given a dataset $D$ consisting of $m$ data points, let $U(S)$ ($S \subset D$) be a utility function that reflects the data value of $S$. The Shapley value of a data point $j \in D$ can be written as

\[
\phi^j = \frac{1}{m} \sum_{S \subset D \backslash \{j\}} \frac{1}{\binom{m-1}{|S|}} \left( U(S \cup \{j\}) - U(S) \right)
\]

or

\[
\phi^j = \frac{1}{m!} \sum_{\pi \in \Pi(D)} \left[ U(P_j^{\pi} \cup \{j\}) - U(P_j^{\pi}) \right]
\]

where $P_j^{\pi}$ is the set of members which precede member $j$ in a permutation $\pi \in \Pi(D)$.
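To make the two equivalent forms concrete, the following minimal sketch evaluates both on a three-point toy dataset. The function names, the weights, and the utility `toy_utility` are hypothetical choices made for illustration and do not come from the experiments in this paper.

```python
from itertools import combinations, permutations
from math import comb, factorial


def shapley_subsets(points, utility):
    """Subset form: phi^j = (1/m) * sum over S of [U(S + j) - U(S)] / C(m-1, |S|)."""
    m = len(points)
    phi = {}
    for j in points:
        rest = [x for x in points if x != j]
        total = 0.0
        for size in range(m):
            for subset in combinations(rest, size):
                s = frozenset(subset)
                total += (utility(s | {j}) - utility(s)) / comb(m - 1, size)
        phi[j] = total / m
    return phi


def shapley_permutations(points, utility):
    """Permutation form: average marginal contribution over all m! orderings."""
    m = len(points)
    phi = {j: 0.0 for j in points}
    for pi in permutations(points):
        prefix = frozenset()
        for j in pi:
            phi[j] += utility(prefix | {j}) - utility(prefix)
            prefix = prefix | {j}
    return {j: v / factorial(m) for j, v in phi.items()}


# Hypothetical utility: sum of weights, plus a bonus of 1 when points 0 and 1 co-occur.
WEIGHTS = {0: 1.0, 1: 2.0, 2: 3.0}


def toy_utility(s):
    return sum(WEIGHTS[x] for x in s) + (1.0 if {0, 1} <= s else 0.0)


exact_a = shapley_subsets([0, 1, 2], toy_utility)
exact_b = shapley_permutations([0, 1, 2], toy_utility)
```

For this toy utility both functions return 1.5, 2.5, and 3.0 for points 0, 1, and 2, which sum to $U(D) = 7$. Both loops grow exponentially (or factorially) in $m$, which is why exact evaluation is only practical for very small datasets.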

Note that computing the exact Shapley value requires training on all possible combinations of data points and calculating their marginal contributions. Assume that there are $m$ data points in the dataset $D$, and define a partition $(m_1, m_2, \cdots, m_n)$ where $\sum_{i=1}^{n} m_i = m$. We assume that the node $s_i$ can train a model with a dataset of size at most $\sum_{k=1}^{i} m_k$. This leads to Algorithm 1. In this algorithm, we use Monte Carlo sampling to approximate the Shapley value, and each node computes a corresponding value vector $\hat{\phi}_i = (\hat{\phi}_i^1, \hat{\phi}_i^2, \cdots, \hat{\phi}_i^m)$, $i = 1, 2, \ldots, n$, for the data points. However, the precision of the results from the intermediate nodes is lower than the precision of the results from the terminal node. We define the accuracy of each node as $\|\hat{\phi}_i - \hat{\phi}_n\|_2$ ($i = 1, 2, \ldots, n$), so a smaller value indicates closer agreement with the terminal node. Also, we specify the unit price of each model based on the corresponding accuracy.

After pricing the models, we present the linear programming optimization

\[
\begin{aligned}
\max\ & \sum_{j=1}^{i} (k_j c_j - b_j)\,\delta_j p_j \\
\text{s.t.}\ & \frac{2c_i}{\left(\sum_{j=1}^{i-1} m_j + 1 + \sum_{j=1}^{i} m_j\right) m_i} \le \frac{2c_{i-1}}{\left(\sum_{j=1}^{i-2} m_j + 1 + \sum_{j=1}^{i-1} m_j\right) m_{i-1}} \le \cdots \le \frac{2c_1}{(1 + m_1)\, m_1} \\
& \sum_{l=1}^{i} c_l \le C \\
& (k_j c_j - b_j)\,p_j - \delta_j (k_j c_j - b_j)\,p_j \le (k_{j+1} c_{j+1} - b_{j+1})\,p_{j+1} - c_{j+1} \\
& k_j c_j - b_j \ge 0 \\
& c_j \ge 0, \quad j = 1, 2, \ldots, i \\
& \delta_i = 1
\end{aligned}
\tag{4}
\]

and Algorithm 2.

Input: $m$, $n$, partition $m_1, m_2, \cdots, m_n$ where $\sum_{i=1}^{n} m_i = m$.
Output: Accuracy $a_i$ corresponding to the model $T_i$ trained by the node $s_i$.
$\hat{\phi}_i^j = 0$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$;
repeat
    Sample a permutation $\sigma$; $M = 0$;
    for $i = 1, 2, \ldots, n$ do
        for $k = 1, 2, \ldots, i-1$ do
            $\hat{\phi}_i = \hat{\phi}_i + \hat{\phi}_k$ (collect the results from former nodes);
        end for
        for $j = M+1, M+2, \ldots, M+m_i$ do
            $\hat{\phi}_i^{\sigma[j]} = \hat{\phi}_i^{\sigma[j]} + \left( U(P^{\sigma}_{\sigma[j]} \cup \{\sigma[j]\}) - U(P^{\sigma}_{\sigma[j]}) \right)$;
        end for
        $M = M + m_i$;
    end for
until Convergence criteria met;
for $i = 1, 2, \ldots, n-1$ do
    Calculate $a_i = \|\hat{\phi}_i - \hat{\phi}_n\|_2$;
end for
ALGORITHM 1 Pricing
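One possible reading of Algorithm 1 can be sketched in Python as follows. This is an interpretation under stated assumptions, not the authors' implementation: node $s_i$ is taken to reuse the marginal contributions of all earlier nodes for each sampled permutation, the convergence criterion is replaced by a fixed number of rounds, and the estimates are averaged over the rounds (the pseudocode leaves the normalization implicit). All function names and the additive utility are hypothetical.

```python
import random
from math import sqrt


def chain_monte_carlo_shapley(m, partition, utility, rounds, seed=0):
    """Return one estimate vector per node. Node i covers the first
    partition[0] + ... + partition[i] positions of every sampled permutation."""
    assert sum(partition) == m
    rng = random.Random(seed)
    n = len(partition)
    phi_hat = [[0.0] * m for _ in range(n)]
    for _ in range(rounds):
        sigma = list(range(m))
        rng.shuffle(sigma)
        prefix = set()
        prev_u = utility(frozenset(prefix))
        pos = 0  # plays the role of M in Algorithm 1
        for i in range(n):
            for _step in range(partition[i]):
                point = sigma[pos]
                prefix.add(point)
                cur_u = utility(frozenset(prefix))
                marginal = cur_u - prev_u
                prev_u = cur_u
                # Node i and every later node reuse this marginal contribution.
                for later in range(i, n):
                    phi_hat[later][point] += marginal
                pos += 1
    # Assumption: average over the sampled permutations.
    return [[v / rounds for v in row] for row in phi_hat]


def accuracy(phi_hat):
    """a_i = || phi_hat_i - phi_hat_n ||_2 for every node i."""
    last = phi_hat[-1]
    return [sqrt(sum((x - y) ** 2 for x, y in zip(row, last))) for row in phi_hat]


# Hypothetical additive utility on m = 6 points split across n = 3 nodes.
VALUES = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def toy_utility(s):
    return sum(VALUES[x] for x in s)


estimates = chain_monte_carlo_shapley(6, [2, 2, 2], toy_utility, rounds=200)
errors = accuracy(estimates)
```

With an additive utility the terminal node recovers each point's value exactly, and the distance to the terminal node shrinks along the chain, which mirrors the qualitative pattern reported for the accuracy vectors in the experiments.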
Input: $p_i$, $\delta_i$, $k_i$, $b_i$, $i = 1, 2, \ldots, n$.
Output: Total revenue $Re$ of chain $\mathbb{S}$.
$Re = 0$;
for $i = 1, 2, \ldots, n$ do
    if the constraint set of (4) is empty then
        break;
    end if
    Solve (4) and return $re_i$;
    $Re = \max\{Re, re_i\}$;
end for
ALGORITHM 2 Linear programming
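The constraints of (4) and the outer loop of Algorithm 2 can be illustrated with a small checker. The sketch below is not an LP solver: it only tests whether a given candidate cost vector satisfies the constraints of (4) for a chain truncated at node $i$ and evaluates the objective there. The function name, the three-node chain, and the hand-picked candidates are hypothetical, and the prices are rounded to 100 for simplicity.

```python
def check_and_revenue(c, m, p, delta, k, b, C, tol=1e-9):
    """Check a candidate cost vector c against the constraints of (4) for the
    chain ending at node i = len(c); return (feasible, objective value)."""
    i = len(c)
    sales = [k[j] * c[j] - b[j] for j in range(i)]  # k_j c_j - b_j
    # Ratios 2 c_j / ((lam_{j-1} + 1 + lam_j) m_j) with lam_j = m_1 + ... + m_j.
    ratios, lam_prev = [], 0
    for j in range(i):
        lam = lam_prev + m[j]
        ratios.append(2 * c[j] / ((lam_prev + 1 + lam) * m[j]))
        lam_prev = lam
    feasible = (
        # The ratio chain must be non-increasing from node 1 to node i.
        all(ratios[j + 1] <= ratios[j] + tol for j in range(i - 1))
        # Budget constraint.
        and sum(c) <= C + tol
        # Opportunity-cost constraint between consecutive nodes.
        and all(
            sales[j] * p[j] - delta[j] * sales[j] * p[j]
            <= sales[j + 1] * p[j + 1] - c[j + 1] + tol
            for j in range(i - 1)
        )
        and all(s >= -tol for s in sales)
        and all(x >= -tol for x in c)
        # The last node of the truncated chain acts as the terminal node.
        and abs(delta[i - 1] - 1.0) <= tol
    )
    revenue = sum(sales[j] * delta[j] * p[j] for j in range(i))
    return feasible, revenue


# Hypothetical 3-node chain with 20 points per node, k_i = 1 and b_i = 0.
m_sizes = [20, 20, 20]
best = 0.0
for i in range(1, 4):  # outer loop of Algorithm 2
    candidate = [1.0] * i  # hand-picked candidate, not an LP optimum
    ok, rev = check_and_revenue(
        candidate, m_sizes[:i], [100.0] * i,
        [0.9] * (i - 1) + [1.0], [1.0] * i, [0.0] * i, C=100.0,
    )
    if ok:
        best = max(best, rev)
```

In practice the checker would be replaced by an actual LP solver that maximizes the objective over all feasible $c$; the loop structure over $i$ and the running maximum stay the same.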

5.2. Experimental setup and results

We use the Wine (Aeberhard et al., 1994), Cancer (Street et al., 1993), and Adult (Kohavi, 1996) datasets. The size of each dataset and the corresponding partition are shown in Table 2.

Table 2. Accuracy
| Datasets | $m$ | Partitions | Accuracy |
|---|---|---|---|
| Wine | 80 | $4 \times 20$ | $(13.23, 5.28, 1.34, 0) \times 10^{-6}$ |
| Cancer | 280 | $7 \times 40$ | $(18.37, 10.48, 5.77, 3.02, 1.40, 0.46, 0) \times 10^{-7}$ |
| Adult | 1000 | $100 \times 10$ | $(92.20, 49.03, 29.47, 18.20, 11.09, 6.46, 3.38, 1.43, 0.36, 0) \times 10^{-9}$ |

Referring to the accuracy $a_i$ of the trained model $T_i$ in Algorithm 1, we set the price of each model to $p_i = 100 - a_i$. Also, we use $k_i = 1$ and $b_i = 0$ ($i = 1, 2, \ldots, n$), with $\delta_i = 0.9$ if the node $s_i$ is not the terminal node and $\delta_i = 1$ if it is. We use $C = 100, 1000, 10000$ for the Wine, Cancer, and Adult datasets, respectively. Based on these parameters, we obtain the results in Table 3.

Table 3. Results
| Datasets | End nodes | Each node | Total revenue |
|---|---|---|---|
| Wine | 4 | $(0.41, 16.80, 33.20, 49.59)$ | 9375.50 |
| Cancer | 7 | $(0.59, 48.01, 95.44, 142.86, 190.28, 237.70, 285.12)$ | 90690.73 |
| Adult | 10 | $(1.11, 223.09, 445.06, 667.04, 889.01, 1110.99, 1332.96, 1554.94, 1776.91)$ | 865357.98 |

Remark 5.1.

We consider (4) because our primary concern is to ensure that a solution exists, i.e., that the constraint set is non-empty. Additionally, in all three experiments the maximum total profit is attained when the dataset flows all the way to the terminal node. Whether this holds in arbitrary scenarios requires further research. We therefore propose a hypothesis: the longer the dataset is traded along the data chain, the greater the total profit.

5.3. Error analysis

The goal of this subsection is to compare the errors of the models at different nodes for the same number of rounds $T$, i.e., the number of permutations sampled in Algorithm 1. We first recall the Bennett inequality.

Lemma 5.2 (Bennett inequality (Bennett, 1962)).

Let $x_1, x_2, \cdots, x_p$ be independent random variables with $Var(x_i - E(x_i)) = \xi_i^2$ and $|x_i - E(x_i)| \le r$. Then we have

\[
Pr\left( \left| \sum_{i=1}^{p} (x_i - E(x_i)) \right| \ge \epsilon \right) \le \exp\left( -\frac{\xi^2}{r^2}\, h\left( \frac{r\epsilon}{\xi^2} \right) \right)
\]

where $\xi^2 = \sum_{i=1}^{p} \xi_i^2$ and $h(x) = (1+x)\ln(1+x) - x$.
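As a quick numerical illustration, the right-hand side of the lemma can be evaluated directly. The sketch below is a plain transcription of the stated formula; the function names and the example variances are hypothetical.

```python
from math import exp, log


def bennett_h(x):
    """h(x) = (1 + x) ln(1 + x) - x, defined for x > -1."""
    return (1.0 + x) * log(1.0 + x) - x


def bennett_bound(variances, r, eps):
    """Right-hand side exp(-(xi^2 / r^2) * h(r * eps / xi^2)),
    where xi^2 is the sum of the individual variances."""
    xi2 = sum(variances)
    return exp(-(xi2 / r ** 2) * bennett_h(r * eps / xi2))


# Hypothetical example: 100 variables, each with variance 0.25 and |x_i - E(x_i)| <= 1.
bound_small_eps = bennett_bound([0.25] * 100, r=1.0, eps=10.0)
bound_large_eps = bennett_bound([0.25] * 100, r=1.0, eps=20.0)
```

Since $h$ is increasing on $[0, \infty)$, the bound decreases as the deviation $\epsilon$ grows, so `bound_large_eps` is smaller than `bound_small_eps`.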

By abuse of notation, we define $\lambda_i = \sum_{k=1}^{i} m_k$. As node $s_i$ can select up to $\lambda_i$ data points for model training, the strategy for node $s_i$ to calculate the Shapley value is to first select $\lambda_i$ points from the dataset, and then compute the marginal contribution for each data point. Note that when the number of data points is less than $\lambda_{i-1}$, we directly reuse the computation results of node $s_{i-1}$ to save computational costs. Let $\hat{\phi}_i = (\hat{\phi}_i^1, \hat{\phi}_i^2, \cdots, \hat{\phi}_i^m)$ be the corresponding approximation value. A similar approximation algorithm is mentioned in (Liu et al., 2023), and we refer to the error analysis computation therein. Also, for the sample value $x$ of the marginal contribution from data point $j$, we assume $|x - \phi^j| \le y_j$.

Let $\Delta$ be the indicator of whether the data point $j$ has been chosen or not, i.e.,

\[
Pr(\Delta = 1) = \frac{\lambda_i}{m}, \quad Pr(\Delta = 0) = 1 - \frac{\lambda_i}{m}.
\]

Let $\phi = (\phi^1, \phi^2, \cdots, \phi^m)$. Using Bennett's inequality and the error analysis in (Liu et al., 2023), we have

\[
Pr\left( |\phi^j - \hat{\phi}_i^j| \ge \epsilon \right) \le \exp\left( -\left( 2\frac{\lambda_i}{m} - \left(\frac{\lambda_i}{m}\right)^2 \right) T\, h\left( \frac{\epsilon}{\left( 2\frac{\lambda_i}{m} - \left(\frac{\lambda_i}{m}\right)^2 \right) y_j} \right) \right)
\]

and

\[
Pr\left( \|\phi - \hat{\phi}_i\|_2 \ge \epsilon \right) \le \sum_{j=1}^{m} Pr\left( |\phi^j - \hat{\phi}_i^j| \ge \frac{\epsilon}{\sqrt{m}} \right).
\]
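The bound on the $\ell_2$ error follows from the coordinate-wise bound by a standard union-bound argument. A short sketch of that step, using only the symbols defined in this subsection:

```latex
\begin{align*}
\|\phi - \hat{\phi}_i\|_2 \ge \epsilon
  &\;\Longrightarrow\; \sum_{j=1}^{m} |\phi^j - \hat{\phi}_i^j|^2 \ge \epsilon^2 \\
  &\;\Longrightarrow\; \exists\, j:\ |\phi^j - \hat{\phi}_i^j|^2 \ge \frac{\epsilon^2}{m}
   \;\Longleftrightarrow\; |\phi^j - \hat{\phi}_i^j| \ge \frac{\epsilon}{\sqrt{m}}, \\
Pr\left( \|\phi - \hat{\phi}_i\|_2 \ge \epsilon \right)
  &\le Pr\left( \bigcup_{j=1}^{m} \left\{ |\phi^j - \hat{\phi}_i^j| \ge \frac{\epsilon}{\sqrt{m}} \right\} \right)
   \le \sum_{j=1}^{m} Pr\left( |\phi^j - \hat{\phi}_i^j| \ge \frac{\epsilon}{\sqrt{m}} \right).
\end{align*}
```

The second implication holds because, if every squared coordinate error were below $\epsilon^2 / m$, the sum of the $m$ terms would be below $\epsilon^2$.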

6. Conclusion

This paper compares the data transaction market with traditional transaction markets, whose differences stem from the replicability and privacy of data, focusing especially on the differences in opportunity costs. We introduce a chain-like data transaction scenario and a linear programming model based on opportunity cost comparison. In the experimental section, we present a data trading scenario based on computing and trading the Shapley value of each data point, together with the solution and an error analysis of the models at the nodes.

7. Limitations and Outlook

This paper has three limitations, which could be further studied. Firstly, we assume that the sales volume of a model in the market is a linear function of the corresponding training cost. However, in real-world scenarios, the relationship between the two is more complex. Whether the resulting model corresponds to a convex optimization problem, and how it can be solved, need to be further studied.

Secondly, we provide a chain-like data trading mechanism. However, in real-world scenarios, downstream nodes often receive datasets from different upstream nodes for model training, such as in federated learning. Therefore, how our constructed data trading process integrates with the federated learning scenario is also a question.

Thirdly, in data trading, we always assume that data is completely traded. However, downstream nodes can also choose to purchase partial data. For instance, we can refer to the algorithm design in (Yu et al., 2023) to frame the trade between two nodes as a Markov decision process, utilizing reinforcement learning algorithms to find the optimal data trading ratio and strategy.

References

  • Aeberhard et al. (1994) Stefan Aeberhard, Danny Coomans, and Olivier de Vel. 1994. Comparative analysis of statistical pattern recognition methods in high dimensional settings. Pattern Recognition 27, 8 (1994), 1065–1077. https://doi.org/10.1016/0031-3203(94)90145-7
  • Agarwal et al. (2019) Anish Agarwal, Munther Dahleh, and Tuhin Sarkar. 2019. A marketplace for data: An algorithmic solution. In Proceedings of the 2019 ACM Conference on Economics and Computation. 701–726.
  • Amiri et al. (2023) Mohammad Mohammadi Amiri, Frederic Berdoz, and Ramesh Raskar. 2023. Fundamentals of task-agnostic data valuation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 9226–9234.
  • Bennett (1962) George Bennett. 1962. Probability Inequalities for the Sum of Independent Random Variables. J. Amer. Statist. Assoc. 57, 297 (1962), 33–45. https://doi.org/10.1080/01621459.1962.10482149
  • Buchanan (1991) James M. Buchanan. 1991. Opportunity Cost. Palgrave Macmillan UK, London, 520–525. https://doi.org/10.1007/978-1-349-21315-3_69
  • Chen et al. (2022) Junjie Chen, Minming Li, and Haifeng Xu. 2022. Selling data to a machine learner: Pricing via costly signaling. In International Conference on Machine Learning. PMLR, 3336–3359.
  • Ghorbani and Zou (2019) Amirata Ghorbani and James Zou. 2019. Data Shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning. PMLR, 2242–2251.
  • Jia et al. (2019) Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, and Costas J Spanos. 2019. Towards efficient data valuation based on the Shapley value. In The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 1167–1176.
  • Karlaš et al. (2022) Bojan Karlaš, David Dao, Matteo Interlandi, Bo Li, Sebastian Schelter, Wentao Wu, and Ce Zhang. 2022. Data debugging with Shapley importance over end-to-end machine learning pipelines. arXiv preprint arXiv:2204.11131 (2022).
  • Kohavi (1996) Ron Kohavi. 1996. Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (Portland, Oregon) (KDD’96). AAAI Press, 202–207.
  • Leake et al. (2021) Jonathan Leake, Colin McSwiggen, and Nisheeth K Vishnoi. 2021. Sampling matrices from Harish-Chandra–Itzykson–Zuber densities with applications to Quantum inference and differential privacy. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing. 1384–1397.
  • Liu et al. (2023) Jie Liu, Peizheng Wang, and Chao Wu. 2023. Data valuation: The partial ordinal Shapley value for machine learning. arXiv preprint arXiv:2305.01660 (2023).
  • Mangoubi et al. (2022) Oren Mangoubi, Yikai Wu, Satyen Kale, Abhradeep Thakurta, and Nisheeth K Vishnoi. 2022. Private Matrix Approximation and Geometry of Unitary Orbits. In Conference on Learning Theory. PMLR, 3547–3588.
  • Pei (2020) Jian Pei. 2020. A survey on data pricing: from economics to data science. IEEE Transactions on Knowledge and Data Engineering 34, 10 (2020), 4586–4608.
  • Shokri et al. (2012) Reza Shokri, George Theodorakopoulos, Carmela Troncoso, Jean-Pierre Hubaux, and Jean-Yves Le Boudec. 2012. Protecting location privacy: optimal strategy against localization attacks. In Proceedings of the 2012 ACM Conference on Computer and Communications Security. 617–627.
  • Street et al. (1993) William Nick Street, William H. Wolberg, and Olvi L. Mangasarian. 1993. Nuclear feature extraction for breast tumor diagnosis. In Electronic imaging. https://api.semanticscholar.org/CorpusID:14922543
  • Wang et al. (2020) Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, and Dawn Song. 2020. A principled approach to data valuation for federated learning. Federated Learning: Privacy and Incentive (2020), 153–167.
  • Yu et al. (2023) Yi Yu, Shengyue Yao, Juanjuan Li, Fei-Yue Wang, and Yilun Lin. 2023. SWDPM: A Social Welfare-Optimized Data Pricing Mechanism. arXiv preprint arXiv:2305.06357 (2023).
  • Zhang et al. (2023) Mengxiao Zhang, Fernando Beltrán, and Jiamou Liu. 2023. A Survey of Data Pricing for Data Marketplaces. IEEE Transactions on Big Data (2023).