User Welfare Optimization in Recommender Systems with Competing Content Creators

Fan Yao (1), Yiming Liao (2), Mingzhe Wu (1), Chuanhao Li (4), Yan Zhu (5), James Yang (2), Qifan Wang (2), Haifeng Xu (3), Hongning Wang (1)
(1) Department of Computer Science, University of Virginia, USA
(2) Meta, USA
(3) Department of Computer Science, University of Chicago, USA
(4) Department of Statistics and Data Science, Yale University, USA
(5) Google, USA
Abstract

Driven by the new economic opportunities created by the creator economy, an increasing number of content creators rely on and compete for revenue generated from online content recommendation platforms. This burgeoning competition reshapes the dynamics of content distribution and profoundly impacts long-term user welfare on the platform. However, the absence of a comprehensive picture of global user preference distribution often traps the competition, especially the creators, in states that yield sub-optimal user welfare. To encourage creators to best serve a broad user population with relevant content, it becomes the platform’s responsibility to leverage its information advantage regarding user preference distribution to accurately signal creators. In this study, we perform system-side user welfare optimization under a competitive game setting among content creators. We propose an algorithmic solution for the platform, which dynamically computes a sequence of weights for each user based on their satisfaction of the recommended content. These weights are then utilized to design mechanisms that adjust the recommendation policy or the post-recommendation rewards, thereby influencing creators’ content production strategies. To validate the effectiveness of our proposed method, we report our findings from a series of experiments, including: 1. a proof-of-concept negative example illustrating how creators’ strategies converge towards sub-optimal states without platform intervention; 2. offline experiments employing our proposed intervention mechanisms on diverse datasets; and 3. results from a three-week online experiment conducted on a leading short-video recommendation platform.

1 Introduction

Online content recommendation platforms have evolved into an indispensable component of our daily lives (Bobadilla et al., 2013). These platforms play a pivotal role in assisting their users in navigating the vast ocean of content generated by revenue-seeking creators, including various social media platforms (e.g., Facebook, Instagram), streaming services (e.g., YouTube, TikTok), and many more. One of the primary functions of these recommendation platforms is to advance user welfare, defined as the overall volume and quality of interactions between users and content. This metric is widely regarded as a fundamental indicator of the well-being of an online ecosystem and is also closely tied to the platform’s revenue.

After decades of effort in relevance-driven matching between users and content, industry practitioners and researchers have reached the consensus that user welfare optimization cannot be achieved through myopic approaches that merely aim to elicit and predict user preferences (Qian and Jain, 2022; Boutilier et al., 2023; Zhan et al., 2021; Mladenov et al., 2020; Yao et al., 2022a, b; Dean and Morgenstern, 2022; Biyik et al., 2023). One primary reason is that any matching strategy has a profound impact on content creators’ beliefs about users’ demand and consequently on their reactions, i.e., what to produce next, leading to a shift in the distribution of content available for recommendation. This influence pathway is unfortunately overlooked in existing recommendation algorithm design; therefore, there is a great need for a robust recommendation strategy that accounts for creators’ strategic responses and the resultant content dynamics. It is imperative for the platform to encourage creators to generate content that continuously contributes to the overall health of the ecosystem.

Typically, creators’ well-being is intricately linked to the exposure of their content and the economic incentives they accrue from the platform, compelling them to continuously strive for maximized benefits (Glotfelter, 2019; Hodgson, 2021). This dynamic creates a competitive environment that leads to intriguing phenomena in terms of welfare guarantees at equilibrium (Fleder and Hosanagar, 2009; Jagadeesan et al., 2022; Zhu et al., 2023). For instance, Yao et al. (2023a) introduced a game theoretical framework to investigate competition dynamics among content creators. Their research revealed that social welfare loss can be attributed to factors such as the degree of exploration in users’ decision making and the span of recommendation slots. As indicated by many previous studies, the platform suffers from sub-optimal social welfare and thus undermines long-term revenue when content distribution lacks necessary diversity to cater to various users’ preferences. This issue is also observed in empirical studies, where content creators often exhibit a tendency to chase trends (Holmbom, 2015; Nandagiri and Philip, 2018). In essence, creators tend to produce content that arouses the interests of the majority user group, owing to the group’s high visibility and the creators’ myopic creation strategies (Yao et al., 2023a; Jagadeesan et al., 2022). However, it is our contention that the platform should not simply blame creators for their perceived selfishness and myopia. This is because creators do not possess a holistic view of the demand distribution, i.e., user preferences. Instead, it is the platform’s responsibility to disseminate knowledge about user demand to creators. By doing so, creators can make better informed decisions that mutually benefit their own interest and enhance user welfare (and hence platform’s revenue).

In this study, we extend the Content Creator Competition ($C^3$) framework introduced by Yao et al. (2023a, b) to model the dynamics of competition among content creators. We relax the behavioral assumptions about creators’ updating strategies in the original framework and explore how the platform can design mechanisms to optimize user welfare accordingly. Our key idea is to direct creators’ attention towards currently under-served users, by manipulating creators’ received utilities with respect to the cumulative user satisfaction with the recommended content. We present a series of approaches to implement the interventions with theoretical justifications.

To validate the effectiveness of our approach, we conducted offline experiments using both synthetic data and the MovieLens dataset, and demonstrated how our mechanism improves user welfare over time under a creator response simulator. Additionally, we deployed an online experiment on a leading short-video recommendation platform over a span of three weeks and observed statistically significant and positive results in terms of overall user engagement and content diversity. Our model and online experiments offer valuable insights into the design of incentive-aware recommender platforms. To summarize, our contributions are as follows:

  1. We formalize the user welfare optimization problem in a competitive content creation environment and identify the primary cause for potential sub-optimal outcomes: the information asymmetry between content creators and the platform.

  2. We propose a dynamic user importance reweighting approach with theoretical justifications for optimizing user welfare and three implementation schemes which can be applied to various practical scenarios.

  3. We demonstrate the effectiveness of our solution with both offline simulations and online testing on real traffic.

2 Related Work

The characterization and optimization of long-term dynamics on content platforms involving strategic content creators has garnered increasing attention from both theoretical (Ben-Porat and Tennenholtz, 2017, 2018; Yao et al., 2023b, a; Zhu et al., 2023; Hu et al., 2023; Immorlica et al., 2024; Hron et al., 2022; Jagadeesan et al., 2022; Dean et al., 2024; Xu et al., 2024; Yao et al., 2024) and empirical (Mladenov et al., 2020; Prasad et al., 2023) perspectives. Seminal works from Ben-Porat and Tennenholtz (2017, 2018) introduced a game theoretical setting to model interactions between content creators and users, and proposed the Shapley mediator to ensure the existence of a pure Nash Equilibrium (Nash Jr, 1950).

Recently, Yao et al. (2023a) demonstrated that, due to creators’ competition, the user welfare loss under a top-$K$ recommender system can be upper-bounded by $O(\frac{1}{\log K})$. This finding suggests that the platform can improve user welfare by providing more recommendations. Building on this, the authors further proposed a category of mechanisms for the platform to ensure a stable equilibrium and developed a computational solution to identify the optimal mechanism for social welfare optimization (Yao et al., 2023b). Additionally, Zhu et al. (2023) introduced an online learning method to jointly optimize recommendation policy and payment contracts for creators to maximize accumulated utility. Hu et al. (2023) designed a learning algorithm to incentivize the creation of high-quality content. However, all these studies rely on strong behavioral assumptions about content creators, e.g., they can perform no-regret learning (Yao et al., 2023a), or have oracle access to their utility functions (Yao et al., 2023b; Ben-Porat and Tennenholtz, 2017, 2018), so that the Nash equilibrium is achievable. Our work bridges this gap by developing a system-side solution that optimizes user welfare even when creators are not able to achieve Nash equilibria.

On the empirical side, Mladenov et al. (2020) explored a scenario where content creators may leave the platform if their user engagement falls below a threshold. The study optimized social welfare by solving a constrained matching problem. In a similar spirit, Prasad et al. (2023) introduced a sequential prompting policy aimed at optimizing user welfare in equilibrium. The optimal policy was determined through mixed integer programming. The solutions were reported to be effective under specific behavioral assumptions or environmental contexts, e.g., the platform can send prompts to creators as additional signals. However, the platforms are often constrained in their ability to influence the ecosystem. They may primarily rely on monetary incentives to motivate creators and have limited flexibility to manipulate factors beyond matching strategies and post-matching rewards. Our solution addresses this broader range of scenarios, making it applicable, for example, when creators are highly responsive to monetary incentives, and the platform’s influence is primarily exerted through adjustments to matching probabilities and post-matching rewards.

3 The Modeling of Content Creation Competition

In this section, we formulate the competition among content creators (i.e., players) as a strategic game, which will serve as an environment for the subsequent mechanism design problem. At a high level, each creator’s utility is determined by the platform’s matching strategy and the post-matching reward function. Creators adhere to simple, local update principles to sequentially alter their strategies, resulting in a dynamic content distribution on the platform. The primary objective of the platform is to optimize the cumulative user welfare by designing its matching strategy and post-matching reward function. Our strategic game setup builds upon and extends the framework of the Content Creator Competition ($C^3$) game introduced by Yao et al. (2023a, b). For simplicity of nomenclature, we retain the name $C^3$ and refer to our game as $C^3_{\text{ext}}$, i.e., an extension of $C^3$. Formally, a $C^3_{\text{ext}}$ instance is defined by the tuple $\big(\mathcal{X}, \{\mathcal{S}_i\}_{i=1}^{n}, \sigma, \beta, K, R(\cdot)\big)$, which we explain in detail below.

  1. Basic setups: a user distribution $\mathcal{X}$ with finite support $\{\bm{x}_j \in \mathbb{R}^d\}_{j=1}^{m}$, and a set of content creators denoted by $[n] = \{1, \cdots, n\}$. Each creator $i$ can take an action $\bm{s}_i$, often referred to as a pure strategy in the game-theoretic literature, from an action set $\mathcal{S}_i \subset \mathbb{R}^d$. $\bm{s}_i$ can be understood as the embedding of the content that creator $i$ will produce. Without loss of generality, we assume the $L_2$ norms of any $\bm{x}$ and $\bm{s}_i$ are upper bounded by 1.

  2. Relevance function: the relevance function $\sigma(\bm{s}, \bm{x}): \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}_{\geq 0}$ measures the relevance score between a user $\bm{x} \sim \mathcal{X}$ and content $\bm{s}$. Without loss of generality, we normalize $\sigma$ to $[0, 1]$, where $1$ suggests perfect matching. We focus on modeling the strategic behavior of creators and thus abstract away the estimation of $\sigma$ (footnote 1: we assume $\sigma$ is learned from offline data and $\sigma(\bm{s}_i, \bm{x})$ is an unbiased estimate of user $\bm{x}$’s satisfaction when exposed to $\bm{s}_i$). For simplicity, we use $\sigma_{i,\bm{x}}$ to denote $\sigma(\bm{s}_i, \bm{x})$ when the joint strategy profile $\bm{s} = (\bm{s}_1, \cdots, \bm{s}_n) \in \mathcal{S}$ and the user profile $\bm{x}$ are clear from the context.

  3. Matching function: Given any user $\bm{x} \in \mathcal{X}$ and when each creator commits to a strategy $\bm{s}_i$, the platform retrieves the top-$K$ ranked content in terms of the relevance scores $\{\sigma_{i,\bm{x}}\}_{i=1}^{n}$ and matches one of them to $\bm{x}$. Specifically, let $\{\sigma_{l(1),\bm{x}} \geq \cdots \geq \sigma_{l(n),\bm{x}}\}$ be a permutation of $\{\sigma_{i,\bm{x}}\}_{i=1}^{n}$. We assume that the platform picks $\bm{s}_{\bm{x}} \in L_{\bm{x}}(K; \bm{s}) \triangleq \{\sigma_{l(i),\bm{x}}\}_{i=1}^{K}$ using a softmax distribution with temperature $\beta \geq 0$ (footnote 2: the formulation in (Yao et al., 2023a) also assumes the platform retrieves the top-$K$ content for each user, but lets the user choose one according to the Random Utility model. The resulting matching probability shares the same form as in Eq. (1), but differs in that $\beta$ in our setting is a parameter controlled by the platform, whereas it is the user decision noise in (Yao et al., 2023a)), i.e.,

    $$P_i(\bm{s}, \bm{x}) \triangleq \mathrm{Prob}[\bm{s}_{\bm{x}} = \bm{s}_{l(i)}] \propto \exp[\beta^{-1} \sigma_{l(i),\bm{x}}], \quad 1 \leq i \leq K.$$ (1)

    A small $\beta$ makes the matching strategy more deterministic, and $\beta \rightarrow \infty$ corresponds to random matching.

  4. User utility and welfare: When user $\bm{x}$ is matched with $\bm{s}$, the user’s perceived utility is given by a function $\pi(\bm{s}, \bm{x})$. The user welfare $W(\bm{s})$ is thus defined as the total expected utility resulting from the matching,

    $$W(\bm{s}) = \mathbb{E}_{\bm{x} \sim \mathcal{X}}[\pi(\bm{s}_{\bm{x}}, \bm{x})].$$ (2)

    To simplify the technical discussions, we assume the learned relevance function $\sigma$ is an unbiased estimate of $\pi$, and therefore $W(\bm{s})$ can be simplified to

    $$W(\bm{s}) = \mathbb{E}_{\bm{x} \sim \mathcal{X}}[\sigma(\bm{s}_{\bm{x}}, \bm{x})].$$ (3)

    However, our proposed solution works for the general welfare function defined in Eq (2).

  5. Creator utility: For creator $i$, her utility is given by

    $$u_i(\bm{s}) = \mathbb{E}_{\bm{x} \in \mathcal{X}}\left[R(\bm{s}_i, \bm{x}) \cdot P_i(\bm{s}, \bm{x})\right],$$ (4)

    where $R(\bm{s}_i, \bm{x})$ is the system-provided reward for this matching.

    Natural choices of $R$ include $R(\bm{s}_i, \bm{x})$ being proportional to the user’s perceived utility, or simply setting $R(\bm{s}_i, \bm{x}) = 1$ (i.e., rewarding creators by the amount of traffic). These two choices yield, respectively,

    $$u_i(\bm{s}) = \mathbb{E}_{\bm{x} \in \mathcal{X}}\left[\sigma(\bm{s}_i, \bm{x}) \cdot P_i(\bm{s}, \bm{x})\right],$$ (5)
    $$u_i(\bm{s}) = \mathbb{E}_{\bm{x} \in \mathcal{X}}\left[P_i(\bm{s}, \bm{x})\right].$$ (6)

    Throughout the paper we adopt Eq (5) as the platform’s default choice, as it is demonstrated in (Yao et al., 2023a) that rewarding creators by user utility enjoys a better welfare guarantee than rewarding them by traffic.
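The components listed above can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names, the clipped inner-product relevance, the tie-breaking rule inside the top-$K$ selection, and the toy strategies are all assumptions made for the example. Here `P[i, j]` is indexed by creator, i.e., it stores the probability that creator `i` is matched to user `j`.

```python
import numpy as np

def relevance(S, X):
    """Illustrative relevance in [0, 1]: clipped inner product (an assumption)."""
    return np.clip(S @ X.T, 0.0, 1.0)            # shape (n creators, m users)

def matching_probs(sigma, K, beta):
    """Eq. (1): softmax with temperature beta over each user's top-K creators."""
    n, m = sigma.shape
    P = np.zeros((n, m))
    for j in range(m):
        # Creators l(1), ..., l(K); ties broken by creator index (an assumption).
        top = np.argsort(-sigma[:, j], kind="stable")[:K]
        logits = sigma[top, j] / beta
        w = np.exp(logits - logits.max())        # numerically stabilised softmax
        P[top, j] = w / w.sum()
    return P

def user_welfare(sigma, P, user_probs):
    """Eq. (3): expected relevance of the content matched to a random user."""
    return float(user_probs @ (sigma * P).sum(axis=0))

def creator_utilities(sigma, P, user_probs, reward="relevance"):
    """Eq. (5) when reward == "relevance"; Eq. (6) when reward == "traffic"."""
    R = sigma if reward == "relevance" else np.ones_like(sigma)
    return (R * P) @ user_probs

# Toy instance (hypothetical): three creators, two equally likely users.
S = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
X = np.array([[1.0, 0.0], [0.0, 1.0]])
user_probs = np.array([0.5, 0.5])

sigma = relevance(S, X)
P = matching_probs(sigma, K=2, beta=0.5)
print("welfare:", user_welfare(sigma, P, user_probs))
print("utilities (Eq. 5):", creator_utilities(sigma, P, user_probs))
print("utilities (Eq. 6):", creator_utilities(sigma, P, user_probs, "traffic"))
```

Under the reward in Eq (5), the creators' utilities sum to the user welfare in Eq (3), which is easy to check with this sketch. Re-running `matching_probs` with a very small or very large `beta` also shows the two regimes described for Eq (1): near-deterministic matching and near-uniform matching over the top-$K$ set.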

The most well-established concept for characterizing a game’s outcome is the pure Nash equilibrium (PNE) (Nash Jr, 1950). At a PNE, any possible deviation from a player’s current strategy would not increase her utility conditioned on other players’ strategies. Under some mild assumptions, we can prove that the PNE of our $C^3_{\text{ext}}$ game exists and is unique, as stated in the following theorem.

Theorem 1

Any $C^3_{\text{ext}}$ game with $K = n$ has a unique pure Nash equilibrium (PNE) under the utility function (6) if $\sigma(\cdot)$ is sufficiently smooth and concave and each creator has a convex strategy set.

Theorem 1 guarantees the existence of a unique PNE and thus theoretically allows the platform to establish a stable outcome. However, we do not generalize this result or study its properties further, for two reasons. First, $K = n$ rarely holds in practice because no system presents the entire collection of content to each user. When $K < n$, the existence of a PNE becomes challenging to establish, due to the discontinuity of the utility functions caused by the top-$K$ ranking operator in the matching process. Second, even when a PNE does exist, creators are not guaranteed to reach it through sequential updates, nor does its existence imply improved user welfare. In fact, as we will demonstrate in Section 4.1, even in a simple environment with a unique PNE, natural updating dynamics among creators fail to converge to the PNE and result in sub-optimal user welfare.
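The discontinuity introduced by the top-$K$ operator can be seen in a tiny numerical sketch. The setup below is hypothetical (one user, two creators, traffic reward as in Eq (6), ties broken by creator index) and is only meant to illustrate the phenomenon, not to reproduce any result from the paper.

```python
import numpy as np

def traffic_utility(sigma_1, sigma_2, K, beta):
    """Probability that creator 1 is matched to a single user facing two creators."""
    scores = np.array([sigma_1, sigma_2])
    top = np.argsort(-scores, kind="stable")[:K]   # top-K creators for this user
    w = np.exp(scores[top] / beta)                 # softmax weights as in Eq. (1)
    probs = dict(zip(top.tolist(), (w / w.sum()).tolist()))
    return probs.get(0, 0.0)                       # 0.0 if creator 1 is not retrieved

# K < n: an arbitrarily small change in relevance flips the utility.
print(traffic_utility(0.499, 0.5, K=1, beta=1.0))
print(traffic_utility(0.501, 0.5, K=1, beta=1.0))

# K = n: the same change moves the utility only slightly.
print(traffic_utility(0.499, 0.5, K=2, beta=1.0))
print(traffic_utility(0.501, 0.5, K=2, beta=1.0))
```

With $K = 1 < n$, creator 1's utility jumps as soon as her relevance crosses that of her competitor, whereas with $K = n$ the softmax in Eq (1) varies smoothly with the relevance scores.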

Therefore, we focus on a more practical solution concept called local Nash equilibrium (LNE). While a PNE requires that no player wants to deviate to any other strategy in the entire space, an LNE merely stipulates that players are satisfied with their strategies within a local region. Its formal definition is given as follows.

Definition 1

A profile of creator strategies $\{\bm{s}^*_i\}_{i=1}^{n}$ forms a local Nash equilibrium (LNE) if, for every creator $i$, there exists an open set $\mathcal{S}^0_i \subset \mathcal{S}_i$ containing $\bm{s}^*_i$ such that $\bm{s}^*_i$ is a best response strategy within $\mathcal{S}^0_i$; formally,

$$u_i(\bm{s}^*_i, \bm{s}^*_{-i}) \geq u_i(\bm{s}_i, \bm{s}^*_{-i}) \quad \text{for every } \bm{s}_i \in \mathcal{S}^0_i.$$ (7)

We argue that LNE offers a more intuitive and practical solution concept for consideration due to two observations. First, the strategic evolution of content creation is often deeply intertwined with creators’ historical decisions (Kajander, 2019). This correlation stems from content generation being anchored in domain-specific expertise and accumulated experiences, which are inherently stable attributes. As a result, the produced content usually demonstrates path dependency, posing significant challenges for creators in implementing drastic modifications. Second, creators are typically constrained by a lack of comprehensive insights into their utility functions due to a limited understanding of the user demographic and the distribution of user preferences. Given these constraints, creators are likely to resort to incremental adjustments for strategy update.

Hence, we focus on the setting where creators engage in a repeated play of $C^3_{\text{ext}}$ and employ a local search rule termed local better response (LBR) update to improve their strategies. The details of LBR are presented in Algorithm 2 in Appendix A. LBR characterizes two fundamental properties of content creation: 1. it relies solely on point estimates of the utility function; and 2. it only incurs local changes at each update. At each step, a creator who decides to update her strategy first generates an exploration direction $\bm{g}_i$ and then evaluates whether adjusting her strategy in this direction results in a higher utility. If so, she updates her strategy along $\bm{g}_i$ with step size $\eta$; otherwise, she maintains her current strategy. This procedure closely emulates real-world scenarios where creators strive to optimize their utilities while having merely black-box access to the utility functions. In practice, finding a clear direction that guarantees improved utility can be a challenging and, at times, unrealistic task. Consequently, we model their strategy evolution as an iterative process of trial and error. By definition, when LBR converges in $C^3_{\text{ext}}$, it must converge to an LNE. Our primary interest lies in understanding how the platform can devise a dynamic rewarding or matching principle that maximizes cumulative user welfare within a given time period.
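A single LBR trial, as described in the paragraph above, might look like the following sketch. It is written from the prose description only, not from Algorithm 2 itself: the function name `lbr_step`, the choice of a uniformly random unit direction for $\bm{g}_i$, and the toy quadratic utility are assumptions, and any projection back onto a constrained action set $\mathcal{S}_i$ is omitted.

```python
import numpy as np

def lbr_step(s_i, utility, eta, rng):
    """One local-better-response trial for a single creator.

    s_i:     current strategy (1-D array).
    utility: black-box callable returning creator i's utility for a candidate
             strategy, with the other creators' strategies held fixed.
    eta:     step size of the local move.
    rng:     numpy random Generator used to draw the exploration direction.
    """
    g = rng.normal(size=s_i.shape)
    g /= np.linalg.norm(g)            # random unit direction g_i (an assumption)
    candidate = s_i + eta * g
    # Move only if the point estimate of the utility improves; otherwise stay.
    if utility(candidate) > utility(s_i):
        return candidate
    return s_i

# Toy usage with a hypothetical single-peaked utility.
rng = np.random.default_rng(0)
target = np.array([0.5, 0.5])
toy_utility = lambda s: -float(np.sum((s - target) ** 2))

s = np.zeros(2)
for _ in range(200):
    s = lbr_step(s, toy_utility, eta=0.05, rng=rng)
print("final strategy:", s)
```

By construction the creator's utility never decreases from one trial to the next, and each accepted move changes the strategy by exactly `eta` in norm, which matches the two properties listed above: only point evaluations of the utility are used, and every update is local.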

4 Intervention Mechanism Design

In this section, we introduce new intervention mechanisms designed to optimize user welfare. These mechanisms are intended for the platform to influence creators’ perceived utilities, thereby guiding the evolution of their strategies toward more desirable outcomes. We first establish the need for platform-driven mechanism design by illustrating how sub-optimal results can arise in a simplified example without any intervention. Subsequently, we delve into the specifics of our proposed methods.

Figure 1: Visualization of creators’ evolving strategies. Left: no intervention, right: platform decreases the weight of the center user by half. Creators’ strategies are marked with different colors, and the arrows start from initial strategies and point to the last-iterate strategies.

4.1 The Necessity of Intervention

We start with a simple illustrative example to show how the competition among creators could result in quite inferior user welfare in Cext3subscriptsuperscript𝐶3extC^{3}_{\text{ext}}italic_C start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ext end_POSTSUBSCRIPT when creators employ local update dynamics specified in Algorithm 2. This example exhibits a stark contrast to the sound welfare guarantee for no-regret learning (Belmega et al., 2018) equipped creators in (Yao et al., 2023a). Consider a Cext3subscriptsuperscript𝐶3extC^{3}_{\text{ext}}italic_C start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ext end_POSTSUBSCRIPT instance (𝒳,{𝒮i}i=1n,σ,β,K,R())𝒳superscriptsubscriptsubscript𝒮𝑖𝑖1𝑛𝜎𝛽𝐾𝑅(\mathcal{X},\{\mathcal{S}_{i}\}_{i=1}^{n},\sigma,\beta,K,R(\cdot))( caligraphic_X , { caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT , italic_σ , italic_β , italic_K , italic_R ( ⋅ ) ) described below. The user population 𝒳𝒳\mathcal{X}caligraphic_X is evenly distributed over the finite set {𝒙j}j=15={(0,0),(1,0),(0,1),(1,0),(0,1)}superscriptsubscriptsubscript𝒙𝑗𝑗150010011001\{\bm{x}_{j}\}_{j=1}^{5}=\{(0,0),(1,0),(0,1),(-1,0),(0,-1)\}{ bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT = { ( 0 , 0 ) , ( 1 , 0 ) , ( 0 , 1 ) , ( - 1 , 0 ) , ( 0 , - 1 ) } and there are n=5𝑛5n=5italic_n = 5 content creators, each with action set 𝒮i=2subscript𝒮𝑖superscript2\mathcal{S}_{i}=\mathbb{R}^{2}caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. 
The reward function is defined as $R(\bm{s}_{i},\bm{x})=\sigma(\bm{s}_{i},\bm{x})=\max\{2-\|\bm{s}_{i}-\bm{x}\|_{2},0\}$ and $\beta=10, K=3$. It is evident that the user welfare defined in Eq (3) is maximized when each creator precisely targets a single user, i.e., $\bm{s}_{i}=\bm{x}_{i}, 1\leq i\leq 5$, which also represents the PNE of this game. However, as we will illustrate through simulations, creators’ strategies neither converge to the PNE nor optimize the user welfare under the LBR dynamics when the platform does not intervene.
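As a concrete illustration, the toy instance above can be written down in a few lines of Python. This is only a minimal sketch: the names `USERS`, `relevance`, and `score_matrix` are ours and do not come from the paper, and the snippet evaluates the relevance scores only, not the welfare in Eq (3).

```python
import numpy as np

# The five users of the toy instance: one central user and four peripheral ones.
USERS = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)


def relevance(s, x):
    """sigma(s, x) = max(2 - ||s - x||_2, 0), the relevance (and reward) function."""
    diff = np.asarray(s, dtype=float) - np.asarray(x, dtype=float)
    return max(2.0 - float(np.linalg.norm(diff)), 0.0)


def score_matrix(strategies, users=USERS):
    """Entry [i, j] is sigma(s_i, x_j) for creator i and user j."""
    return np.array([[relevance(s, x) for x in users] for s in strategies])


# Profile in which creator i targets user i exactly (s_i = x_i).
scores = score_matrix(USERS)
best_per_user = scores.max(axis=0)  # best relevance available to each user
```

Under this profile every user has access to content at the maximal relevance of 2. A creator sitting at the center still has relevance 1 to each peripheral user, which hints at why the center is an attractive position.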

First, we examine what happens when the platform takes no action to guide the creators. The left panel of Figure 1 visualizes the trajectories of strategy evolution in our constructed environment. Initially, creators’ strategies are randomly distributed in the region between $\bm{x}_{1}$ and $\bm{x}_{2}$. Over time, $\bm{x}_{2}$ and $\bm{x}_{3}$ are exclusively occupied by one creator each, while $\bm{x}_{1}$ has two creators competing for it. The remaining creator chooses not to target either $\bm{x}_{4}$ or $\bm{x}_{5}$ and hovers around the region between $\bm{x}_{4}$ and $\bm{x}_{5}$, leaving both $\bm{x}_{4}$ and $\bm{x}_{5}$ unsatisfied.

From the observed strategy evolution paths, we can deduce how this sub-optimal situation arises. Initially, creators move in different directions: two creators quickly converge to $\bm{x}_{2}$ and $\bm{x}_{3}$, while the remaining three compete for the attention of the central user $\bm{x}_{1}$. However, after this point, no creator has a strong incentive to move closer to $\bm{x}_{4}$ or $\bm{x}_{5}$, as the marginal utility gained from getting closer to $\bm{x}_{4}$ or $\bm{x}_{5}$ does not compensate for the loss incurred by moving away from $\bm{x}_{1}$. Consequently, two creators decide to remain around $\bm{x}_{1}$ and one creator settles in a region between $\bm{x}_{4}$ and $\bm{x}_{5}$.

The above observations highlight the pivotal role played by the central user $\bm{x}_{1}$ in the occurrence of sub-optimal results. Since $\bm{x}_{1}$ is close to other users in the embedding space, targeting $\bm{x}_{1}$ becomes a popular and safe choice for creators. It secures a fraction of attention from $\bm{x}_{1}$ without completely sacrificing the utility gained from other user groups. Thus, users like $\bm{x}_{1}$ act as “popular states” when creators dynamically adjust their strategies. Whenever a creator is located near $\bm{x}_{1}$, they are likely to be trapped and reluctant to explore potentially better strategies. Consequently, such “popular” users end up attracting more creators, leaving other users unattended.

One immediate solution for the platform is to identify and reduce the impact of these “popular” users. For instance, the platform can halve the utility gained from the central user $\bm{x}_{1}$ for each creator. This simple mechanism works effectively in this example, as illustrated in the right panel of Figure 1. Initially, there are still three creators converging to $\bm{x}_{1}$. However, due to the reduced reward from $\bm{x}_{1}$, two creators find it less profitable to stay, driving them to deviate towards $\bm{x}_{4}$ and $\bm{x}_{5}$. By assigning a different importance weight to each user, the platform can reshape each creator’s utility landscape and thereby influence their local-search-based dynamics.

4.2 Platform’s Intervention Mechanisms

The observations above motivate our design of intervention mechanisms that can be employed by the platform to influence creators’ perceived utilities. These mechanisms lay the foundation for the adaptive optimization methods we will delve into later. As a reminder, as defined in Eq (4), a creator’s expected utility from a specific user $\bm{x}$ is influenced by two key factors: the probability of creator $i$ being matched with user $\bm{x}$, denoted as $P_{i}(\bm{s},\bm{x})$, and the post-matching reward assigned by the platform, denoted as $R(\bm{s}_{i},\bm{x})$. The default choice of the platform is to set the reward function $R(\bm{s}_{i},\bm{x})=\sigma(\bm{s}_{i},\bm{x})$ as in Eq (5) and the matching probability function $P_{i}(\bm{s},\bm{x})$ as the softmax over the top-$K$ ranked content $\bm{s}_{i}$ as demonstrated in Eq (1).
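The two factors above can be sketched in code. Since Eq (1) and Eq (4) are not reproduced in this section, the snippet below rests on assumptions: it takes the softmax logits to be $\sigma/\beta$ (so that a larger $\beta$ gives a flatter distribution, in line with the later discussion of SMT) and assumes a uniform user distribution. The function names are hypothetical.

```python
import numpy as np


def matching_prob(scores, beta, K):
    """Top-K softmax matching probabilities (assumed form of Eq (1)).

    scores[i, j] = sigma(s_i, x_j). Returns P with P[i, j] = probability that
    user j is matched with creator i. Logits are assumed to be sigma / beta.
    """
    n, m = scores.shape
    K = min(K, n)
    P = np.zeros((n, m))
    for j in range(m):
        top = np.argsort(-scores[:, j], kind="stable")[:K]  # K most relevant creators
        logits = scores[top, j] / beta
        logits = logits - logits.max()  # shift for numerical stability
        weights = np.exp(logits)
        P[top, j] = weights / weights.sum()
    return P


def creator_utilities(scores, rewards, beta, K):
    """u_i = E_x[R(s_i, x) * P_i(s, x)] under a uniform user distribution."""
    P = matching_prob(scores, beta, K)
    return (rewards * P).mean(axis=1)


# Hypothetical example: three creators, two users, default reward R = sigma.
scores = np.array([[2.0, 1.0], [1.0, 2.0], [0.5, 0.5]])
P = matching_prob(scores, beta=10.0, K=2)
u = creator_utilities(scores, rewards=scores, beta=10.0, K=2)
```

In this example the third creator is never among the top-2 candidates for either user, so it receives zero matching probability and zero utility.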

In the example provided in Section 4.1, the primary factor leading to sub-optimal welfare is the presence of popular user groups that attract excessive creator attention, leaving minority user groups unnoticed by creators. To enhance overall user welfare, it is crucial for the platform to guide creators’ attention toward these overlooked user groups by re-emphasizing their significance. In this way, creators who were previously unaware of these user groups or found them less lucrative may consider adjusting their strategies to align more closely with those users’ preferences. To achieve this objective, we introduce and study three different approaches for modifying $R(\bm{s}_{i},\bm{x})$ and $P_{i}(\bm{s},\bm{x})$, namely User Importance Reweighting (UIR), Soft Matching Truncation (SMT), and Hard Matching Truncation (HMT). These three mechanisms share a common underlying principle, but they are designed to operate under different scenarios, taking into account potential constraints faced by a platform.

User Importance Reweighting (UIR)

The most straightforward approach is UIR,

$$u_{i}(\bm{s}_{i},\bm{s}_{-i})=\mathbb{E}_{\bm{x}\in\mathcal{X}}[w(\bm{x})\cdot R(\bm{s}_{i},\bm{x})\cdot P_{i}(\bm{s},\bm{x})], \tag{8}$$

where the platform simply adjusts the post-matching rewards for creators based on the measured importance of each user. Specifically, if the platform believes a user has been under-served under the current content distribution, it raises the reward for creators whose content is consumed by such a user. Intuitively, this sends a message to creators that “if you shift your content towards such users, you will get a higher marginal reward compared to sticking to your current content.” As a result, the platform can carefully design the user weights such that a reasonable number of creators can be successfully incentivized to serve the targeted users.
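A small numerical sketch of Eq (8) may help. It assumes a uniform user distribution and takes the reward and matching-probability matrices as given; all numbers below are hypothetical and chosen only to show how a reduced weight on one user changes creators’ utilities.

```python
import numpy as np


def uir_utilities(rewards, match_prob, w):
    """Eq (8) under a uniform user distribution.

    rewards[i, j]    = R(s_i, x_j)
    match_prob[i, j] = P_i(s, x_j)
    w[j]             = importance weight of user x_j
    """
    return (rewards * match_prob * w[None, :]).mean(axis=1)


# Hypothetical numbers: two creators, two users; user 0 is the "popular" one.
rewards = np.array([[2.0, 1.0], [2.0, 1.0]])
match_prob = np.array([[0.5, 0.5], [0.5, 0.5]])

baseline = uir_utilities(rewards, match_prob, np.array([1.0, 1.0]))
halved = uir_utilities(rewards, match_prob, np.array([0.5, 1.0]))
```

With unit weights, user 0 contributes twice as much as user 1 to each creator’s utility; after halving $w(\bm{x}_0)$, the two users contribute equally, so chasing the popular user is no longer the more rewarding option.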

Soft Matching Truncation (SMT) and Hard Matching Truncation (HMT)

Both SMT and HMT function in a similar manner to UIR, but use the weight $w(\bm{x})$ to manipulate the matching probability rather than the post-matching reward. Recall that the probabilistic matching function $P$ is characterized by two parameters: the truncation number $K$ (which, in practice, corresponds to the total number of recommendation candidates retrieved for ranking) and the temperature $\beta$ (which can be viewed as a measure of the exploration strength in the ranking model). When the platform needs to signal the importance of a specific user $\bm{x}$, it enhances $\bm{x}$’s visibility among creators, increasing the chance that creators who were previously unaware of $\bm{x}$ start realizing the potential benefits of catering to $\bm{x}$. This can be achieved by increasing either $\beta$ or $K$: increasing $\beta$ flattens the distribution of $\bm{x}$’s matches among the top-$K$ candidates, while increasing $K$ enlarges the pool of creators exposed to $\bm{x}$. Therefore, both of them augment the expected number of creators exposed to $\bm{x}$. Since $K$ imposes a rigid threshold on the number of creators exposed to $\bm{x}$, while $\beta$ offers a more flexible threshold, we refer to them as Hard Matching Truncation (HMT) and Soft Matching Truncation (SMT), respectively; Eq (9) gives the creator utility under SMT and Eq (10) under HMT:

$$u_{i}(\bm{s}_{i},\bm{s}_{-i})=\mathbb{E}_{\bm{x}\in\mathcal{X}}[R(\bm{s}_{i},\bm{x})\cdot P_{i}(\bm{s},\bm{x};\beta(w(\bm{x})),K)], \tag{9}$$
$$u_{i}(\bm{s}_{i},\bm{s}_{-i})=\mathbb{E}_{\bm{x}\in\mathcal{X}}[R(\bm{s}_{i},\bm{x})\cdot P_{i}(\bm{s},\bm{x};\beta,K(w(\bm{x})))]. \tag{10}$$

We remark that UIR is more suitable when the platform possesses the flexibility to design payment incentives for creators. However, if the platform has limited control over payment, for example due to budget constraints, SMT or HMT can be employed, as they only require minor adjustments to the matching function. The specific choices of the increasing functions $\beta(\cdot)$ and $K(\cdot)$ are flexible, and we leave them to the experiments.
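For intuition only, the sketch below shows one possible pair of increasing maps $\beta(\cdot)$ and $K(\cdot)$. These particular functional forms, the default values, and the names `beta_of_w` and `k_of_w` are assumptions made for illustration; they are not the choices used in the paper’s experiments.

```python
import math


def beta_of_w(w, beta0=0.1):
    """Hypothetical increasing map for SMT: temperature grows linearly with w."""
    return beta0 * w


def k_of_w(w, K0=20, n_creators=200):
    """Hypothetical increasing map for HMT: truncation number grows with w,
    kept between 1 and the number of creators."""
    return max(1, min(n_creators, math.ceil(K0 * w)))


# A user with weight 1.5 gets a flatter softmax (SMT) or a larger candidate pool (HMT).
beta_boosted = beta_of_w(1.5)
k_boosted = k_of_w(1.5)
```

Any monotonically increasing maps would serve the same purpose: an under-served user with a large weight is exposed to more creators, either softly through $\beta$ or rigidly through $K$.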

4.3 Welfare Optimization through Adaptive Reweighting

To implement our proposed intervention mechanisms, we need to compute the corresponding user-specific weighting functions, namely $w(\cdot)$, $\beta(\cdot)$, and $K(\cdot)$. In this section we use UIR as an example to illustrate our method and let the user distribution $\mathcal{X}$ be a uniform distribution over its support $\{\bm{x}_{1},\cdots,\bm{x}_{m}\}$, so that $w(\cdot)$ can be parameterized by a vector $\bm{w}\in\mathbb{R}^{m}_{\geq 0}$. When the platform commits to an intervention mechanism $\bm{w}$, the content creators’ strategic updates according to LBR (i.e., Algorithm 2) will lead their joint strategy to an LNE $\bm{s}^{*}$, which determines the content distribution and the total user welfare $W$. Therefore, the task of finding the optimal $\bm{w}$ maximizing $W$ under $C^{3}_{\text{ext}}$ can be formulated as the following bi-level optimization problem:

$$\max_{\bm{w}\in\mathbb{R}_{\geq 0}^{m}} \quad W(\bm{s}^{*}(\bm{w})) \tag{11}$$
$$\text{s.t.} \quad \bm{s}^{*}(\bm{w})\text{ is an LNE of }C^{3}_{\text{ext}}. \tag{12}$$

We adopt the formulation in Eq (11) simply for presentation purposes, as the constraint in Eq (12) is not well-defined due to the non-uniqueness of the LNE of $C^{3}_{\text{ext}}$ in general. When we tackle the problem in Eq (11), we either employ LBR to simulate an $\bm{s}^{*}(\bm{w})$ in offline experiments, or directly observe $\bm{s}^{*}(\bm{w})$ from the creators’ actual responses over a period of time in online experiments. A straightforward approach to solving Eq (11) is to use an iterative method to dynamically adjust $\bm{w}$, and the main challenge is to pin down an improving direction of $\bm{w}$. Ideally, we can apply first-order optimization if an estimate of the gradient $\frac{dW}{d\bm{w}}$ is available. However, the interplay between $\bm{w}$ and $\bm{s}^{*}(\bm{w})$ is generally intractable to analyze and we have to resort to heuristic methods. To get an intuitive idea about an improving direction of $\bm{w}$, we consider a stylized setting where the user population is perfectly separated and the relevance function is given by the dot product $\sigma(\bm{s},\bm{x})=\bm{s}^{\top}\bm{x}$. In such a structured environment, the following theorem reveals a useful principle for finding an improving direction of $\bm{w}$.

Theorem 2

When the number of creators $n$ is large enough and the user population $\mathcal{X}$ is a uniform distribution over an orthogonal basis in $\mathbb{R}^{d}$, updating $\bm{w}$ with the following formula guarantees an improvement in $W$ defined in Eq (3):

$$w^{\prime}_{j}=w_{j}\cdot e^{-\eta\bar{\pi}(\bm{x}_{j})},\quad\forall j\in[m], \tag{13}$$

where $\eta$ is a small scalar denoting the learning rate, and $\bar{\pi}(\bm{x}_{j})$ is the expected utility of user $\bm{x}_{j}$ at $\bm{s}^{*}(\bm{w})$.

By the definition in Eq (4), rescaling all $w_{j}$ by a common constant does not alter the nature of the problem in Eq (11). Therefore, the insight conveyed by Eq (13) is clear: if a user enjoys a high expected utility under the current content distribution, the platform should reduce her weight when rewarding creators. Conversely, if a user’s expected utility is relatively low, the platform needs to highlight her significance to motivate a larger set of creators to develop content that caters to the needs of this user. Although Eq (13) is derived from a significantly simplified user distribution, we leverage it as a foundational element of our adaptive reweighting algorithm and demonstrate in our experiments that this simple heuristic works well for real user distributions.
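Because only relative weights matter, it is instructive to look at how Eq (13) changes the ratio between the weights of any two users $j$ and $k$. The short derivation below follows directly from Eq (13); the numerical values mentioned afterwards are hypothetical.

```latex
\frac{w'_j}{w'_k}
  = \frac{w_j\, e^{-\eta\bar{\pi}(\bm{x}_j)}}{w_k\, e^{-\eta\bar{\pi}(\bm{x}_k)}}
  = \frac{w_j}{w_k}\cdot e^{-\eta\left(\bar{\pi}(\bm{x}_j)-\bar{\pi}(\bm{x}_k)\right)}.
```

The ratio shrinks exactly when user $j$ currently enjoys a higher expected utility than user $k$. For example, with $\eta=0.1$, $\bar{\pi}(\bm{x}_j)=1.5$ and $\bar{\pi}(\bm{x}_k)=0.5$, the ratio is multiplied by $e^{-0.1}\approx 0.905$.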

Next, we formally introduce our proposed adaptive reweighting algorithm for optimizing the intervention mechanism $\bm{w}$. Each user $\bm{x}$ is initially assigned a unit weight $w^{(0)}(\bm{x})=1$. During subsequent iterations, the platform continuously monitors the average utility of user $\bm{x}$, denoted as $\bar{\pi}(\bm{x})$, within a specified time window, and updates $\bm{w}$ according to Eq (14), where $\alpha>0$ is a tunable parameter. This adjustment process follows the structure of the multiplicative weight update meta-algorithm (Arora et al., 2012).

$$w^{(i+1)}(\bm{x})\propto w^{(i)}(\bm{x})\cdot\exp(-\alpha\bar{\pi}(\bm{x})). \tag{14}$$

In practice, we can choose the user utility function $\pi(\bm{x})$ as the metric used for defining the user welfare function Eq (2). Up to this point, our discussion has primarily focused on the assumption that $\pi(\bm{x};\bm{s})\propto\sigma(\bm{s}_{i},\bm{x})$. However, it is important to highlight that $\pi$ in Eq (14) can also take alternative forms to optimize empirical performance. For instance, it can be a function of any numerical measurement related to user satisfaction (e.g., click-through rate). To reduce the dimension of the user weight vector and enhance the robustness of weight updates, we recommend that algorithm designers pre-cluster users into $L$ groups based on their static features so that users within the same group maintain identical weights. The platform’s intervention strategy is thus parameterized by an $L$-dimensional vector, $\bm{w}=(w_{1},\cdots,w_{L})$, with each entry denoting the weight assigned to the corresponding user group.

For a fixed time horizon $T$ in which the platform plans to perform intervention, the platform divides the horizon into $E$ epochs, each with an equal length of $M$ (i.e., $T=EM$). At the start of each epoch $e$, the platform commits to a weight vector $\bm{w}^{(e)}$ and deploys it through one of the intervention mechanisms UIR, SMT or HMT. After that, the platform observes and records the sequence of creators’ strategic responses, denoted as $\{\bm{s}^{(e,i)}\}_{i=1}^{M}$, from the online environment. Subsequently, the algorithm estimates the average user utility $\bar{\pi}_{l}$ for each group $l$. It then employs the values in $\{\bar{\pi}_{l}\}_{l=1}^{L}$ to update the weights at the beginning of the $(e+1)$-th epoch using Eq (14). To prevent $\bm{w}$ from growing or declining excessively, after each update we first normalize and then clip its values within a predetermined interval $[w_{\min},w_{\max}]$. The formal description of this process is presented in Algorithm 1.
The implementation details for the deployment of UIR, SMT and HMT in Line 4 are deferred to Appendix B.

Algorithm 1 Adaptive Reweighting
1:  Input: Number of epochs $E$, epoch length $M$, initial strategy profile $\bm{s}^{(0)}$, learning rate $\eta$, temperature parameter $\alpha$, user groups $(G_{1},\cdots,G_{L})$, clipping constants $w_{\min},w_{\max}$.
2:  Initialization: Initial weight $\bm{w}^{(0)}=(w_{1}^{(0)},\cdots,w_{L}^{(0)})$.
3:  for $e=0$ to $E$ do
4:     Deploy the weight $\bm{w}^{(e)}$ using UIR (Eq (8)), SMT (Eq (9)) or HMT (Eq (10)).
5:     Observe creators’ strategy sequence $\{\bm{s}^{(e,i)}\}_{i=1}^{M}$.
6:     Compute the average user utility for each group
$$\bar{\pi}_{l}=\frac{1}{M|G_{l}|}\sum_{\bm{x}\in G_{l}}\sum_{i=1}^{M}\pi(\bm{x};\bm{s}^{(e,i)}).$$
7:     Update $w^{(e+\frac{1}{3})}_{l}=w^{(e)}_{l}\cdot\exp(-\alpha\bar{\pi}_{l}),\ l\in[L]$.
8:     Normalize $w^{(e+\frac{2}{3})}_{l}=L\cdot w^{(e+\frac{1}{3})}_{l}/\sum_{j=1}^{L}w^{(e+\frac{1}{3})}_{j},\ l\in[L]$.
9:     Clip $\bm{w}^{(e+1)}=\text{Clip}(\bm{w}^{(e+\frac{2}{3})},w_{\min},w_{\max})$.
10:     Set $\bm{s}^{(e+1)}=\bm{s}^{(e,M)}$.
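The weight-update loop of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration rather than the deployed implementation: the callback `observe_group_utilities` is a hypothetical stand-in for Lines 4 to 6 (deploying the weights, observing creators, and averaging utilities per group), the weights start from unit values, and the loop runs exactly `E` epochs.

```python
import numpy as np


def adaptive_reweighting(observe_group_utilities, L, E, alpha, w_min, w_max):
    """Sketch of the weight updates in Algorithm 1.

    observe_group_utilities(w) is a hypothetical callback standing in for
    Lines 4-6: it deploys the weights w for one epoch and returns the vector
    of average group utilities (pi_bar_1, ..., pi_bar_L).
    """
    w = np.ones(L)  # unit initial weights
    history = [w.copy()]
    for _ in range(E):
        pi_bar = np.asarray(observe_group_utilities(w), dtype=float)
        w = w * np.exp(-alpha * pi_bar)  # Line 7: multiplicative update
        w = L * w / w.sum()              # Line 8: normalize so the weights sum to L
        w = np.clip(w, w_min, w_max)     # Line 9: clip to [w_min, w_max]
        history.append(w.copy())
    return w, history


# Hypothetical environment in which group utilities do not react to the weights.
pi_fixed = np.array([0.9, 0.5, 0.1])
w_final, w_history = adaptive_reweighting(
    lambda w: pi_fixed, L=3, E=1, alpha=1.0, w_min=0.1, w_max=10.0
)
```

After one epoch, the best-served group receives the smallest weight and the worst-served group the largest. In a real deployment the callback would return utilities that respond to the deployed weights, which is what drives the welfare improvement.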

5 Experiments

In this section, we evaluate our proposed intervention mechanisms on both offline datasets and an online environment on a leading short-video recommendation platform in the industry.

5.1 Experiments on Offline Data

We conduct simulations on $C^{3}_{\text{ext}}$ game instances constructed from synthetic data and the MovieLens-1m dataset (Harper and Konstan, 2015). In the following, we first introduce the specification of these two simulation environments and then report the results.

5.1.1 Synthetic environment

For the synthetic environment, we first construct the user population as follows: we fix an embedding dimension $d=5$ and independently sample $10$ cluster centers, denoted as $\{\bm{c}_{1},\cdots,\bm{c}_{10}\}$, from the unit sphere $\mathbb{S}^{d-1}$. For each center $\bm{c}_{i}$, we generate users belonging to cluster $i$ by independently sampling from a Gaussian distribution $\mathcal{N}(\bm{c}_{i},0.5^{2}I_{d})$. The sizes of the $10$ user clusters are given by the vector $\bm{z}=10\times(100,50,20,10,10,5,2,1,1,1)$. In this manner, we generate a population $\mathcal{X}$ of size $m=2000$. The number of creators is set to $n=200$, and each action set $\mathcal{S}_{i}$ is set to the unit ball in $\mathbb{R}^{d}$. The user utility and relevance score function are set to $\pi(\bm{s},\bm{x})=\sigma(\bm{s},\bm{x})=\max\{1-\|\bm{s}-\bm{x}\|/3,0\}$.
We set (β,K)𝛽𝐾(\beta,K)( italic_β , italic_K ) to (0.1,20)0.120(0.1,20)( 0.1 , 20 ) by default. Such synthetic datasets characterize a class of clustered user preference distributions (e.g., majority vs., minority user groups).
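The construction above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the function names (`sample_unit_sphere`, `build_population`, `utility`) and the seed are our own assumptions and are not taken from the authors' code.

```python
import math
import random

def sample_unit_sphere(d, rng):
    """Uniform point on the unit sphere S^{d-1}, via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def build_population(d=5, sizes=None, sigma=0.5, seed=0):
    """Return (users, labels, centers) for the clustered synthetic population."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    if sizes is None:
        sizes = [10 * z for z in (100, 50, 20, 10, 10, 5, 2, 1, 1, 1)]
    centers = [sample_unit_sphere(d, rng) for _ in sizes]
    users, labels = [], []
    for k, (center, size) in enumerate(zip(centers, sizes)):
        for _ in range(size):
            # x ~ N(c_k, sigma^2 I_d): independent Gaussian noise per coordinate
            users.append([c + rng.gauss(0.0, sigma) for c in center])
            labels.append(k)
    return users, labels, centers

def utility(s, x):
    """pi(s, x) = sigma(s, x) = max{1 - ||s - x|| / 3, 0}."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, x)))
    return max(1.0 - dist / 3.0, 0.0)

users, labels, centers = build_population()
```

With the default sizes, the largest cluster holds half of the 2000 users, which is what makes it attractive as a common starting point for creators.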

On the creators’ side, we initialize their strategies close to the center of the largest user group. This environment models a situation where creators chase popular trends by exclusively producing content tailored to the taste of the largest user group. We aim to investigate whether our proposed mechanisms can help the platform escape from such sub-optimal states.

Figure 2: Performance of UIR, SMT and HMT on the synthetic dataset against the no-intervention baseline. Panels: (a) user welfare evolving curve; (b) avg. group weights; (c) avg. group utilities. Results are averaged over 10 independently sampled synthetic environments, with one-sigma error bars. $x$-axis: group sizes divided by 10.

Figure 3: Performance of UIR, SMT and HMT on the MovieLens-1m dataset against the no-intervention baseline. Panels: (a) user welfare evolving curve; (b) avg. group weights; (c) avg. group utilities. Results are averaged over 10 independent simulations, with 0.2-sigma error bars.

5.1.2 Environment constructed from MovieLens-1m

We use deep matrix factorization (Fan and Cheng, 2018) to train user and movie embeddings (with dimension set to $32$) by fitting the observed ratings in the range of 1 to 5. To ensure the quality of the trained embeddings, we performed 5-fold cross-validation and obtained an average RMSE of $0.739$ on the test sets. Then, with the same hyper-parameter settings, we train the user/item embeddings on the complete dataset.

We select active users with more than 200 ratings, resulting in a population $\mathcal{X}$ comprising $1578$ users. We set the number of creators to $20$, with each creator’s action set $\mathcal{S}_i$ consisting of $1000$ different movies. All $\{\mathcal{S}_i\}$ share a common part, namely the 700 most popular movies by number of ratings received, and each $\mathcal{S}_i$ also has a private part of 300 randomly sampled movies. Our choice of the user utility and matching score functions is $\pi(\bm{s},\bm{x})=\sigma(\bm{s},\bm{x})=\bm{s}^{\top}\bm{x}$, normalized to the range $[0,1]$. Additionally, we set $(\beta,K)=(0.1,20)$ and initialize creators’ strategies to the most preferred movie among all users (i.e., the movie that enjoys the highest average rating among $\mathcal{X}$).
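A minimal sketch of how such action sets could be assembled is given below. It rests on two assumptions that the text does not spell out: the private part is drawn from movies outside the shared 700 (so that each set contains 1000 distinct movies), and the normalization to $[0,1]$ is a min-max rescaling. The helper names are hypothetical.

```python
import random

def build_action_sets(movies_by_popularity, n_creators=20, n_shared=700,
                      n_private=300, seed=0):
    """Each creator gets the shared popular movies plus a private random sample."""
    rng = random.Random(seed)  # hypothetical seed
    shared = movies_by_popularity[:n_shared]
    rest = movies_by_popularity[n_shared:]
    # Assumption: private movies are sampled without replacement from the
    # non-shared movies, so every action set holds 1000 distinct items.
    return [shared + rng.sample(rest, n_private) for _ in range(n_creators)]

def min_max_normalize(scores):
    """Assumed normalization: rescale raw inner-product scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(v - lo) / (hi - lo) for v in scores]

# Movie ids 0..2999 stand in for movies sorted by number of ratings.
action_sets = build_action_sets(list(range(3000)))
```

Because 70% of every action set is shared, creators compete head-on over popular movies, while the private 30% gives each of them a few niche options.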

5.1.3 Configurations of adaptive reweighting algorithm and intervention mechanisms

For the adaptive reweighting algorithm, we set the epoch length $M=5$ and the simulation time horizon $T=3000$ for both environments. During each time step within an epoch, we simulate creators’ responses by letting each of them update her strategy once using Algorithm 2, in a random order. Creators’ learning rate is set to $\eta=0.2$. On the platform side, we use $K$-means clustering to determine user groups and set the number of clusters to $20$ for the synthetic environment and $15$ for the MovieLens environment. Note that we do not use the ground-truth clustering of users set in the simulation, since in practice the system does not have exact knowledge of the user distribution. In addition, we set the temperature parameter $\alpha=0.5$ for the first half of the time period and reduce it to $0.1$ for the remaining period. The clipping constants are set to $(w_{\min},w_{\max})=(0.2,5.0)$, and the mappings used in SMT and HMT are set to $\beta(\bm{x})=\beta\cdot w(\bm{x})$ and $K(\bm{x})=\lceil K\cdot w(\bm{x})\rceil$.
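To make these settings concrete, the following sketch encodes the clipping constants, the two mappings, and the two-phase $\alpha$ schedule. It assumes that clipping is applied to the weight before the mapping and that the schedule switches exactly at $T/2$; both details, as well as the function names, are our assumptions rather than statements from the paper.

```python
import math

W_MIN, W_MAX = 0.2, 5.0   # clipping constants (w_min, w_max)
BETA, K = 0.1, 20         # default matching temperature and recall capacity

def clip_weight(w, w_min=W_MIN, w_max=W_MAX):
    """Clip a user weight into [w_min, w_max]."""
    return min(max(w, w_min), w_max)

def smt_temperature(w, beta=BETA):
    """SMT mapping: beta(x) = beta * w(x)."""
    return beta * clip_weight(w)

def hmt_capacity(w, k=K):
    """HMT mapping: K(x) = ceil(K * w(x))."""
    return math.ceil(k * clip_weight(w))

def alpha_schedule(t, horizon=3000):
    """alpha = 0.5 in the first half of the horizon, 0.1 afterwards (assumed switch at T/2)."""
    return 0.5 if t < horizon // 2 else 0.1
```

Under these defaults a user at the upper clipping bound is matched against up to 100 candidates, whereas a user at the lower bound sees only the top 4.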

5.1.4 Results

Figure 2(a) illustrates the user welfare resulting from creators’ evolving strategies under the three intervention mechanisms (UIR, SMT, and HMT), compared to the baseline (no platform intervention). Over time, all three mechanisms consistently outperform the baseline. In the baseline (shown in blue), the welfare plateaus quickly and remains stagnant. Conversely, the welfare curves under the other mechanisms exhibit “double-ascent” patterns: initially they also plateau, but eventually they begin to rise again and surpass the baseline. This is because, without the platform’s intervention, creators tend to remain in sub-optimal equilibria, as illustrated in Section 4.1. Our proposed mechanisms, however, gradually accumulate user group weights, which, when significant enough, encourage creators to explore unattended user groups, leading to increased welfare. Among the three mechanisms, HMT demonstrates the most substantial gain with the least variance. UIR, while showing a lower marginal gain, maintains stability with minimal variance. SMT, which achieves a moderate gain, exhibits higher variance, suggesting that directly manipulating the matching temperature may be overly aggressive.

Figure 2(b) shows the learned group weights at the last iteration of the simulation. As it demonstrates, all three mechanisms emphasize small groups over larger ones. This outcome aligns with our expectation: on one hand, larger user groups are more likely to “trap” unnecessarily many creators and thus should be deprioritized; on the other hand, increasing the weights of niche user groups also improves their chances of being discovered by more creators.

Figure 2(c) breaks down the average utilities across user groups. The blue dashed line (i.e., the no-intervention baseline) exhibits a positive correlation between averaged group utility and group size, mirroring real-world observations. The orange bars show that UIR strikes a balance by improving the utility of niche groups while slightly trading off utility in larger groups. HMT achieves a remarkable Pareto improvement across all groups, as indicated by the red bars. However, SMT’s gains come at the cost of even greater skewness in the average utility distribution across groups.

To summarize, all three mechanisms show promising improvements in overall user welfare, but the nature of their gains differs, introducing considerations for the platform. When conditions allow, HMT is the top choice due to its strong performance, stability, and fairness. For platforms that prioritize fairness and stability, UIR is also a viable option. However, SMT, despite improving overall welfare, may suffer from potential drawbacks such as instability and fairness issues. In-depth analysis of the merits and limitations of these mechanisms remains a topic for future research.

The results in the MovieLens environment align with the insights from the synthetic environment (refer to Figure 3). However, it’s worth noting that the trends in learned group weights and realized group utilities do not always align with group size, which is expected in real-world data where unattended user groups may not necessarily have small sizes. Nevertheless, our proposed mechanisms continue to improve overall welfare by identifying and prioritizing these groups.

5.2 Online Experiments

We conducted online evaluations over 3 weeks on one of the world’s leading short-video content creation and recommendation platforms (referred to as the “platform” hereafter due to the anonymity requirement). We observe that the platform’s intervention can indeed influence creator behavior because, on average, there is a positive correlation between the delivery volume and content creation volume for each topic (the Pearson correlation is 0.2, and there are hundreds of topics in total). In this experiment, we employed the “like-through-rate” (LTR) as the user utility function. LTR is calculated as the ratio of total likes to the number of impressions of a specific short video. We chose LTR because it not only reflects user satisfaction but also offers a straightforward and easily interpretable signal for content creators to assess their content’s perceived quality. We deliberately selected the HMT mechanism for testing, driven not only by its strong performance against the baseline and other mechanisms in our offline experiments, but also by its ease of integration into production: HMT only requires changing the number of candidate content items retrieved for different users within the deployed relevance-based ranking model.

5.2.1 Experiment Setups

We list the experiment setups below.

User clustering: We utilized explicit user characteristics, namely demographics (including country and gender) and level of activeness (including video consumption volume and watch time). This approach led to the creation of over 10,000 user groups; we retained groups that had a sufficient number of users, resulting in hundreds of user groups.

Cluster weight update: We implemented a daily weight updating cadence. Each day, we assessed the satisfaction of every user group by calculating the relative change of LTR over its average in the previous two days. Subsequently, we recalculated the user weights in accordance with the method outlined in Algorithm 1.

A/B test configurations: To evaluate changes in both user and creator behavior, we employed a symmetric A/B test setup on the platform, consisting of an experiment arm and a control arm. At the beginning, for each arm, we randomly paired 3% of creators with 3% of users from the entire platform. Under this setup, users within each arm exclusively received content created by creators within the same arm, and content created by these creators was exclusively exposed to users within the same arm throughout the testing period. This stringent separation prevents any cross-group treatment leakage and maintains a closed feedback loop within each arm. In our online experiment, we ran these two arms for 3 weeks: a control arm adhering to the existing production setup and a test arm where we applied our proposed mechanism, HMT.

HMT specifics: We implemented HMT during the cold-start content retrieval phase, which pertains to content that was created within the past few days and has not yet garnered a predefined number of impressions. Specifically, within the platform’s production pipeline, we integrated an audience matching stage to retrieve cold-start content. During this stage, content is exclusively delivered to the most suitable user candidates based on relevance scores generated by a pre-trained model. In the existing production setup, a fixed relevance score percentile of 99% is uniformly applied to all users. This means that every user is only matched with the top 1% of cold-start content in terms of relevance scores, to ensure a high level of personalization. When tuning the percentile, we typically observe a trade-off between overall user satisfaction and the volume of cold-start content. In our experiment, we leveraged HMT to adjust this threshold for different user groups, anticipating improvements in both of these metrics. Consequently, user groups with higher weights were granted a higher chance to be selected by content creators, while those with lower weights were deprioritized. The mapping from the group weight $w$ to the relevance score percentile used for cold-start content retrieval was designed as a piecewise constant function, with details specified in Table 1.

Table 1: Mapping $g$ in HMT

| Weight | $<1.0$ | $<1.19$ | $<1.79$ | $<2.13$ | $<2.36$ | $<2.68$ | $\geq 2.68$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Percentile | 0.99 | 0.95 | 0.90 | 0.85 | 0.75 | 0.7 | 0.1 |
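As an illustration of the setup above, the sketch below encodes the Table 1 lookup and the daily satisfaction signal (the relative change of LTR over its average in the previous two days). We read the columns of Table 1 as consecutive half-open intervals, e.g. weights in $[1.0, 1.19)$ map to the 0.95 percentile; this reading and the function names are our assumptions. The weight update itself (Algorithm 1) is not reproduced here.

```python
from bisect import bisect_right

# Upper bounds of the weight intervals and the percentile used in each interval.
WEIGHT_UPPER_BOUNDS = [1.0, 1.19, 1.79, 2.13, 2.36, 2.68]
PERCENTILES = [0.99, 0.95, 0.90, 0.85, 0.75, 0.70, 0.10]

def percentile_for_weight(w):
    """Piecewise constant map from a group weight to a relevance score percentile."""
    return PERCENTILES[bisect_right(WEIGHT_UPPER_BOUNDS, w)]

def relative_ltr_change(ltr_today, ltr_prev_two_days):
    """Relative change of today's LTR over the average of the previous two days."""
    baseline = sum(ltr_prev_two_days) / len(ltr_prev_two_days)
    return (ltr_today - baseline) / baseline
```

A group with weight below 1.0 keeps the production default of 0.99, i.e. it is matched only with the top 1% of cold-start content, whereas a group with weight at or above 2.68 is opened up to the top 90%.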

5.2.2 Results

Positive results were obtained in three key aspects.

User-side engagement: The core utility metric LTR increased by 1.13% and the total impression number of cold-start content increased by 0.76%, leading to a 3.7% increase in impressions for fresh content created within 2 hours. These improvements are statistically significant and demonstrate increased user welfare while enhancing the freshness and diversity of content. The gains in both user satisfaction and the volume of cold-start content indicate that HMT influenced many creators to produce more targeted content that benefits niche user groups. Table 2 provides a breakdown of performance improvement per user group. We indexed all groups in descending order by their sizes and divided them into four columns, with each column constituting approximately 25% of the total traffic during the experiment period. As shown, smaller groups enjoyed a higher gain in terms of LTR, which echoes the observations in offline results. The gain in cold-start content impression volume shows an opposite trend. This is because the absolute number of cold-start impressions for larger user groups was smaller as the distribution of relevance scores in these groups was more skewed, resulting in a larger relative gain in this metric.

Content diversity: The average number of consumed topics per user during the experimental period increased by 0.71%, and this increase is also statistically significant.

Creator-side engagement: For popular creators (those with more than 1000 followers), the number of daily active users (Creator DAU) increased by an average of 0.17%, while for the remaining creators, the gain is 0.06%. Additionally, there is a promising increasing trend in Creator DAU for popular creators over the three weeks of the experiment: the increases over the first, second, and third weeks are -0.2%, 0.24%, and 0.48%. This suggests that the three-week duration of the experiment may have been too short to influence the majority of creators to respond accordingly, and more time may be needed to fully observe the positive feedback from creators.

Table 2: Gains per User Group

| User Groups | 1-5 | 6-20 | 21-74 | 75+ | TOTAL |
| --- | --- | --- | --- | --- | --- |
| LTR | +0.43% | +1.40% | +0.75% | +1.36% | **+1.13%** |
| Impression | +2.64% | +0.62% | +1.42% | +0.11% | **+0.76%** |

6 Conclusion

In this study, we tackle the user welfare optimization challenge faced by online content recommendation platforms through the lens of mechanism design. We identified myopic strategy updates among creators caused by their limited information access as the culprit of sub-optimal welfare and introduced platform interventions to address this issue. Our three proposed mechanisms, based on adaptive user importance reweighting, enable platforms to convey global user preference information, reshape creators’ perceived utilities, and influence their behaviors. Empirical experiments in both offline and online environments demonstrated the effectiveness of our approach, highlighting its potential for practical impact.

For future work, there remains an intriguing need for a comprehensive understanding of the merits and limitations of UIR, SMT, and HMT to aid practitioners in selecting the most suitable mechanism for real-world applications. It is also important to address practical constraints when applying the developed mechanisms. For instance, can we find ways to jointly optimize user welfare and platform costs? Can the mechanism explicitly ensure fairness on the user side and producer side? Deeper insights into these questions hold the potential to greatly impact the rapidly evolving online content landscape and industry practices.


References

  • Arora et al. (2012) Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of computing, 8(1):121–164, 2012.
  • Belmega et al. (2018) E Veronica Belmega, Panayotis Mertikopoulos, Romain Negrel, and Luca Sanguinetti. Online convex optimization and no-regret learning: Algorithms, guarantees and applications. arXiv preprint arXiv:1804.04529, 2018.
  • Ben-Porat and Tennenholtz (2017) Omer Ben-Porat and Moshe Tennenholtz. Shapley facility location games. In International Conference on Web and Internet Economics, pages 58–73. Springer, 2017.
  • Ben-Porat and Tennenholtz (2018) Omer Ben-Porat and Moshe Tennenholtz. A game-theoretic approach to recommendation systems with strategic content providers. Advances in Neural Information Processing Systems, 31, 2018.
  • Biyik et al. (2023) Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-wei Hsu, Mohammad Ghavamzadeh, and Craig Boutilier. Preference elicitation with soft attributes in interactive recommendation. arXiv preprint arXiv:2311.02085, 2023.
  • Bobadilla et al. (2013) Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutiérrez. Recommender systems survey. Knowledge-based systems, 46:109–132, 2013.
  • Boutilier et al. (2023) Craig Boutilier, Martin Mladenov, and Guy Tennenholtz. Modeling recommender ecosystems: Research challenges at the intersection of mechanism design, reinforcement learning and generative models. arXiv preprint arXiv:2309.06375, 2023.
  • Bravo et al. (2018) Mario Bravo, David Leslie, and Panayotis Mertikopoulos. Bandit learning in concave n-person games. Advances in Neural Information Processing Systems, 31, 2018.
  • Dean and Morgenstern (2022) Sarah Dean and Jamie Morgenstern. Preference dynamics under personalized recommendations. In Proceedings of the 23rd ACM Conference on Economics and Computation, pages 795–816, 2022.
  • Dean et al. (2024) Sarah Dean, Evan Dong, Meena Jagadeesan, and Liu Leqi. Recommender systems as dynamical systems: Interactions with viewers and creators. In Workshop on Recommendation Ecosystems: Modeling, Optimization and Incentive Design, 2024.
  • Fan and Cheng (2018) Jicong Fan and Jieyu Cheng. Matrix completion by deep matrix factorization. Neural Networks, 98:34–41, 2018.
  • Fleder and Hosanagar (2009) Daniel Fleder and Kartik Hosanagar. Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity. Management science, 55(5):697–712, 2009.
  • Glotfelter (2019) Angela Glotfelter. Algorithmic circulation: how content creators navigate the effects of algorithms on their work. Computers and composition, 54:102521, 2019.
  • Harper and Konstan (2015) F Maxwell Harper and Joseph A Konstan. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis), 5(4):1–19, 2015.
  • Hodgson (2021) Thomas Hodgson. Spotify and the democratisation of music. Popular Music, 40(1):1–17, 2021.
  • Holmbom (2015) Mattias Holmbom. The youtuber: A qualitative study of popular content creators, 2015.
  • Hron et al. (2022) Jiri Hron, Karl Krauth, Michael I Jordan, Niki Kilbertus, and Sarah Dean. Modeling content creator incentives on algorithm-curated platforms. arXiv preprint arXiv:2206.13102, 2022.
  • Hu et al. (2023) Xinyan Hu, Meena Jagadeesan, Michael I Jordan, and Jacob Steinhardt. Incentivizing high-quality content in online recommender systems. arXiv preprint arXiv:2306.07479, 2023.
  • Immorlica et al. (2024) Nicole Immorlica, Meena Jagadeesan, and Brendan Lucier. Clickbait vs. quality: How engagement-based optimization shapes the content landscape in online platforms. In Proceedings of the ACM Web Conference 2024, 2024.
  • Jagadeesan et al. (2022) Meena Jagadeesan, Nikhil Garg, and Jacob Steinhardt. Supply-side equilibria in recommender systems. arXiv preprint arXiv:2206.13489, 2022.
  • Kajander (2019) Hanna Kajander. Challenges of a content creator in the era of digital marketing. 2019.
  • Krantz and Parks (2002) Steven George Krantz and Harold R Parks. The implicit function theorem: history, theory, and applications. Springer Science & Business Media, 2002.
  • Mladenov et al. (2020) Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, and Craig Boutilier. Optimizing long-term social welfare in recommender systems: A constrained matching approach. In International Conference on Machine Learning, pages 6987–6998. PMLR, 2020.
  • Nandagiri and Philip (2018) Vaibhavi Nandagiri and Leena Philip. Impact of influencers from instagram and youtube on their followers. International Journal of Multidisciplinary Research and Modern Education, 4(1):61–65, 2018.
  • Nash Jr (1950) John F Nash Jr. Equilibrium points in n-person games. Proceedings of the national academy of sciences, 36(1):48–49, 1950.
  • Prasad et al. (2023) Siddharth Prasad, Martin Mladenov, and Craig Boutilier. Content prompting: Modeling content provider dynamics to improve user welfare in recommender ecosystems. arXiv preprint arXiv:2309.00940, 2023.
  • Qian and Jain (2022) Kun Qian and Sanjay Jain. Digital content creation: An analysis of the impact of recommendation systems. Available at SSRN 4311562, 2022.
  • Rosen (1965) J Ben Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica: Journal of the Econometric Society, pages 520–534, 1965.
  • Xu et al. (2024) Renzhe Xu, Haotian Wang, Xingxuan Zhang, Bo Li, and Peng Cui. Ppa-game: Characterizing and learning competitive dynamics among online content creators. arXiv preprint arXiv:2403.15524, 2024.
  • Yao et al. (2022a) Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. Learning from a learning user for optimal recommendations. In International Conference on Machine Learning, pages 25382–25406. PMLR, 2022a.
  • Yao et al. (2022b) Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. Learning the optimal recommendation from explorative users. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 9457–9465, 2022b.
  • Yao et al. (2023a) Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. How bad is top-$k$ recommendation under competing content creators? In International Conference on Machine Learning. PMLR, 2023a.
  • Yao et al. (2023b) Fan Yao, Chuanhao Li, Karthik Abinav Sankararaman, Yiming Liao, Yan Zhu, Qifan Wang, Hongning Wang, and Haifeng Xu. Rethinking incentives in recommender systems: Are monotone rewards always beneficial? arXiv preprint arXiv:2306.07893, 2023b.
  • Yao et al. (2024) Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. Human vs. generative ai in content creation competition: Symbiosis or conflict? arXiv preprint arXiv:2402.15467, 2024.
  • Zhan et al. (2021) Ruohan Zhan, Konstantina Christakopoulou, Ya Le, Jayden Ooi, Martin Mladenov, Alex Beutel, Craig Boutilier, Ed Chi, and Minmin Chen. Towards content provider aware recommender systems: A simulation study on the interplay between user and provider utilities. In Proceedings of the Web Conference 2021, pages 3872–3883, 2021.
  • Zhu et al. (2023) Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, and Michael I Jordan. Online learning in a creator economy. arXiv preprint arXiv:2305.11381, 2023.

A Details of content creators’ strategy update dynamics

The Local Better Response (LBR) procedure described in Algorithm 2 captures the evolution of creators’ strategies in a snapshot, and characterizes two fundamental properties of content creation: 1. it relies solely on point estimations of the utility function (Line 3); and 2. it only incurs local changes at each update (Line 4). At each step, a creator who decides to update her strategy first generates an exploration direction $\bm{g}_i$ (Line 2); she then evaluates whether adjusting her strategy in this direction results in a higher utility. If so, she updates her strategy along $\bm{g}_i$ with step size $\eta$; otherwise, she maintains her current strategy.

Algorithm 2 closely emulates real-world scenarios where creators strive to optimize their utilities while having merely black-box access to the utility functions. In practice, finding a clear direction that guarantees improved utility can be a challenging and, at times, unrealistic task. Consequently, we model their strategy evolution as an iterative process of trial and error. By definition, when LBR converges in $C^3_{\text{ext}}$, it must converge to an LNE. Our primary interest lies in understanding how the platform can devise a dynamic rewarding or matching principle that maximizes cumulative user welfare within a given time period.

Algorithm 2 (LBR) Local Better Response update at time step $t$
1:  Input: Learning rate $\eta$; a $C^3_{\text{ext}}$ instance including the utility function and strategy set $(u_i(\bm{s}),\mathcal{S}_i)$ of creator $i$; the joint strategy profile $\bm{s}^{(t)}=(\bm{s}_1^{(t)},\cdots,\bm{s}_n^{(t)})$ at the current step $t$.
2:  Generate a random direction $\bm{g}_i\in\mathbb{S}^{d}$.
3:  if $u_i(\bm{s}_i^{(t)}+\eta\bm{g}_i,\bm{s}_{-i}^{(t)})\geq u_i(\bm{s}^{(t)})$ then
4:     $\bm{s}_i^{(t+\frac{1}{2})}=\bm{s}_i^{(t)}+\eta\bm{g}_i$.
5:     Find $\bm{s}_i^{(t+1)}$ as the projection of $\bm{s}_i^{(t+\frac{1}{2})}$ onto $\mathcal{S}_i$.
6:  else
7:     $\bm{s}_i^{(t+1)}=\bm{s}_i^{(t)}$
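A minimal Python sketch of one LBR update is given below. It assumes the action set is the unit ball (as in the synthetic environment), treats the utility as a black-box callable `utility_i(profile)`, and draws the exploration direction as a random unit vector in $\mathbb{R}^d$; the function names and the toy utility in the usage lines are hypothetical.

```python
import math
import random

def random_direction(d, rng):
    """Random unit-norm exploration direction g_i (Line 2)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def project_to_unit_ball(s):
    """Euclidean projection onto the unit ball (assumed action set)."""
    norm = math.sqrt(sum(c * c for c in s))
    return list(s) if norm <= 1.0 else [c / norm for c in s]

def lbr_step(i, profile, utility_i, eta, rng):
    """One Local Better Response update for creator i; returns her new strategy."""
    s_i = profile[i]
    g = random_direction(len(s_i), rng)
    candidate = [a + eta * b for a, b in zip(s_i, g)]
    trial = profile[:i] + [candidate] + profile[i + 1:]
    # Line 3: a single point evaluation of the black-box utility.
    if utility_i(trial) >= utility_i(profile):
        return project_to_unit_ball(candidate)   # Lines 4-5
    return s_i                                   # Line 7

# Toy usage: one creator whose utility is the negative distance to a fixed target.
rng = random.Random(0)
target = [0.5, 0.0]

def toy_utility(profile):
    return -math.dist(profile[0], target)

profile = [[0.0, 0.0]]
for _ in range(200):
    profile[0] = lbr_step(0, profile, toy_utility, eta=0.2, rng=rng)
```

Since a move is accepted only when the probed point is at least as good as the current one, the creator never needs gradient information, which matches the trial-and-error behavior the procedure is meant to model.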

B Implementation Details of UIR, SMT and HMT Mechanisms

The following sub-routine, denoted as Algorithm 3, outlines how the platform deploys the weights obtained from Line 8, Algorithm 1 as an intervention mechanism in Line 4. In Algorithm 3, the weight vector $\bm{w}$ is directly employed to modify the reward or payment associated with each creator-user interaction.

Algorithm 3 UIR Intervention
  Input: Default recall capacity $K$, matching temperature $\beta$.
  for each user request $\bm{x}$ do
     Compute the relevance scores $\{\sigma(\bm{s}_i,\bm{x})\}_{i=1}^{n}$.
     Retrieve the top-$K$ ranked content list $\{\bm{s}_{l(1)},\cdots,\bm{s}_{l(K)}\}$ based on relevance scores and randomly sample one element according to $\text{Softmax}(\{\beta^{-1}\sigma(\bm{s}_{l(i)},\bm{x})\}_{i=1}^{K})$.
     For the user’s choice $\bm{s}_i$, adjust creator $i$’s default reward (payment) from $R(\bm{s}_i,\bm{x})$ to $w(\bm{x})R(\bm{s}_i,\bm{x})$.
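The UIR loop body can be sketched as follows for a single user request. The relevance scores, default rewards, and user weight are passed in as plain numbers; the function names are hypothetical and the sketch is not the production implementation.

```python
import math
import random

def top_k_softmax_probs(scores, k, beta):
    """Softmax of scores / beta over the top-k creators; zero probability elsewhere."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract the max for numerical stability
    expo = {i: math.exp((scores[i] - m) / beta) for i in top}
    z = sum(expo.values())
    probs = [0.0] * len(scores)
    for i in top:
        probs[i] = expo[i] / z
    return probs

def uir_serve(scores, rewards, weight, k, beta, rng):
    """Match one user request and return (chosen creator, reweighted reward)."""
    probs = top_k_softmax_probs(scores, k, beta)
    chosen = rng.choices(range(len(scores)), weights=probs, k=1)[0]
    # UIR leaves the matching untouched and only rescales the reward by w(x).
    return chosen, weight * rewards[chosen]

chosen, payment = uir_serve(
    scores=[0.9, 0.1, 0.8], rewards=[1.0, 1.0, 1.0],
    weight=2.5, k=2, beta=0.1, rng=random.Random(0),
)
```

The matching distribution is the same as without intervention; only the creator's payoff changes, which is why UIR acts on incentives rather than on what users see.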

In the case of SMT or HMT intervention types, the platform requires a function to map $w(\bm{x})$ to $\beta(\bm{x})$ or $K(\bm{x})$. This mapping can be implemented as a piecewise constant function and determined empirically. The specifics of this process are given in Algorithms 4 and 5.

Algorithm 4 SMT Intervention
  Input: Default recall capacity $K$, matching temperature $\beta$, $f:\mathbb{R}_{+}\rightarrow\mathbb{R}_{+}$.
  for each user request $\bm{x}$ do
     Compute the relevance scores $\{\sigma(\bm{s}_{i},\bm{x})\}_{i=1}^{n}$.
     Retrieve the top-$K$ ranked content list $\{\bm{s}_{l(1)},\cdots,\bm{s}_{l(K)}\}$ based on relevance scores and randomly sample one element according to $\text{Softmax}(\{\beta(\bm{x})^{-1}\sigma(\bm{s}_{l(i)},\bm{x})\}_{i=1}^{K})$, where $\beta(\bm{x})=f(w(\bm{x}))$.
Algorithm 5 HMT Intervention
  Input: Default recall capacity $K$, matching temperature $\beta$, $g:\mathbb{R}_{+}\rightarrow\mathbb{N}_{+}$.
  for each user request $\bm{x}$ do
     Compute the relevance scores $\{\sigma(\bm{s}_{i},\bm{x})\}_{i=1}^{n}$.
     Retrieve the top-$K(\bm{x})$ ranked content list $\{\bm{s}_{l(1)},\cdots,\bm{s}_{l(K(\bm{x}))}\}$ based on relevance scores and randomly sample one element according to $\text{Softmax}(\{\beta^{-1}\sigma(\bm{s}_{l(i)},\bm{x})\}_{i=1}^{K(\bm{x})})$, where $K(\bm{x})=g(w(\bm{x}))$.
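
The two algorithms above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the platform's implementation: the helper names `piecewise_constant` and `recommend`, the breakpoints, and the step values are all hypothetical, and in practice the maps $f$ and $g$ would be determined empirically.

```python
import numpy as np

def piecewise_constant(breakpoints, values):
    """Return h(w) = values[k], where k is the number of breakpoints <= w.

    Hypothetical helper: breakpoints must be sorted in increasing order and
    len(values) must equal len(breakpoints) + 1.
    """
    breakpoints = np.asarray(breakpoints, dtype=float)
    assert len(values) == len(breakpoints) + 1

    def h(w):
        return values[int(np.searchsorted(breakpoints, w, side="right"))]

    return h

def recommend(scores, w_x, K, beta, f=None, g=None, rng=None):
    """Sample one item index for a request whose user weight is w_x.

    If f is given (SMT), the temperature is beta(x) = f(w(x)).
    If g is given (HMT), the recall capacity is K(x) = g(w(x)).
    Otherwise the defaults K and beta are used.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    beta_x = f(w_x) if f is not None else beta
    K_x = min(g(w_x) if g is not None else K, len(scores))
    top = np.argsort(-scores)[:K_x]        # indices l(1), ..., l(K(x))
    logits = scores[top] / beta_x
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(top, p=probs)), top, probs
```

With the hypothetical map `g = piecewise_constant([1.0], [2, 4])`, a call such as `recommend(scores, w_x, K=3, beta=1.0, g=g)` restricts sampling to the two highest-scoring items whenever $w(\bm{x})<1$ and to the top four otherwise. Passing `f` instead leaves the candidate set unchanged and only rescales the softmax temperature.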

C Proof of Theorem 1

We restate Theorem 1 below with a more rigorous characterization and then give its detailed proof.

Theorem 3

Any $C^{3}_{\text{ext}}$ game with $K=n$ has a unique pure Nash equilibrium (PNE) if each creator’s strategy set $\mathcal{S}_{i}$ is convex and $\sigma(\cdot,\bm{x})$ is twice-differentiable and satisfies

\[
\mathbb{E}_{\bm{x}\sim\mathcal{X}}\left[\frac{\partial^{2}\sigma}{\partial\bm{s}_{i}^{2}}+\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)^{\top}\right]\preceq 0,\quad\forall i\in[n].
\tag{15}
\]
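
A remark that may help in reading condition (15); it is an interpretation added here and not part of the original statement. By the chain rule, the bracketed matrix in (15) is the Hessian of $\exp(\sigma)$ rescaled by the positive factor $\exp(\sigma)$:

\[
\frac{\partial^{2}}{\partial\bm{s}_{i}^{2}}\exp\big(\sigma(\bm{s}_{i},\bm{x})\big)
=\exp\big(\sigma(\bm{s}_{i},\bm{x})\big)\left[\frac{\partial^{2}\sigma}{\partial\bm{s}_{i}^{2}}+\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)^{\top}\right].
\]

Consequently, if $\exp(\sigma(\cdot,\bm{x}))$ is concave in $\bm{s}_{i}$ for every $\bm{x}$, the bracket is negative semi-definite pointwise, and condition (15) follows by taking the expectation over $\bm{x}$.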

Proof 

We prove that, under the stated conditions, the $C^{3}_{\text{ext}}$ game is a strictly monotone game (Rosen, 1965) and thus possesses a unique PNE. According to Appendix A in Bravo et al. (2018), a sufficient condition for strict monotonicity of any $n$-person game $\mathcal{G}$ is that the action sets are convex and the Hessian $[H^{\mathcal{G}}_{ij}]$ of $\mathcal{G}$ is negative definite, where the Hessian is defined as

\[
H_{ij}(\bm{s})=\frac{1}{2}\nabla_{j}\nabla_{i}u_{i}(\bm{s})+\frac{1}{2}\nabla_{i}\nabla_{j}u_{j}(\bm{s})^{\top}.
\]

For the $C^{3}_{\text{ext}}$ game, the convexity of the strategy sets holds by assumption. Next we establish the required property of the game’s Hessian matrix, with the associated utility function

\[
u_{i}(\bm{s})=\mathbb{E}_{\bm{x}\sim\mathcal{X}}\left[\frac{\exp(\sigma(\bm{s}_{i},\bm{x}))}{\sum_{l=1}^{n}\exp(\sigma(\bm{s}_{l},\bm{x}))}\right].
\tag{16}
\]

Without loss of generality, let $\beta=1$. Denoting $A_{i}=\exp(\sigma(\bm{s}_{i},\bm{x}))$ and $M=A_{1}+\cdots+A_{n}$, we have

\begin{align*}
H_{ii} &= -\mathbb{E}_{\bm{x}\sim\mathcal{X}}\Big\{\Big[-\frac{\partial^{2}\sigma}{\partial\bm{s}_{i}^{2}}-\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)^{\top}\Big]A_{i}(M-A_{i})\cdot\frac{1}{M^{2}} \\
&\qquad\qquad\qquad +\frac{2}{M}\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)^{\top}A_{i}^{2}(M-A_{i})\cdot\frac{1}{M^{2}}\Big\} \\
&= -\mathbb{E}_{\bm{x}\sim\mathcal{X}}\Big\{\Big[-\frac{\partial^{2}\sigma}{\partial\bm{s}_{i}^{2}}-\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)^{\top}\Big(1-\frac{A_{i}}{M}\Big)\Big]\cdot A_{i}(M-A_{i})\frac{1}{M^{2}}\Big\} \\
&\quad -\mathbb{E}_{\bm{x}\sim\mathcal{X}}\Big\{\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)^{\top}A_{i}^{2}(M-A_{i})\cdot\frac{1}{M^{3}}\Big\} \\
&\triangleq -\mathbb{E}_{\bm{x}\sim\mathcal{X}}\left[H_{ii}^{(0)}(\bm{s},\bm{x})\right]-\mathbb{E}_{\bm{x}\sim\mathcal{X}}\left[H_{ii}^{(1)}(\bm{s},\bm{x})\frac{1}{M^{3}}\right], \\
H_{ij} &= -\mathbb{E}_{\bm{x}\sim\mathcal{X}}\Big\{\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{j}}\Big)^{\top}A_{i}A_{j}(M-A_{i}-A_{j})\cdot\frac{1}{M^{3}}\Big\} \\
&\triangleq -\mathbb{E}_{\bm{x}\sim\mathcal{X}}\left[H_{ij}^{(1)}(\bm{s},\bm{x})\frac{1}{M^{3}}\right].
\end{align*}

Next we show that for any $\bm{x}$ and $\bm{s}$, the block matrix $[H^{(1)}_{ij}]$ is always positive semi-definite (PSD). For simplicity, let

\begin{align*}
&\bm{y}_{i}=A_{i}\frac{\partial\sigma}{\partial\bm{s}_{i}}\in\mathbb{R}^{d\times 1},\quad \bm{y}=[\bm{y}_{1};\dots;\bm{y}_{n}]\in\mathbb{R}^{dn\times 1}, \\
&\bm{z}=[A_{1}\bm{y}_{1};\dots;A_{n}\bm{y}_{n}]\in\mathbb{R}^{dn\times 1},
\end{align*}

we obtain

\begin{align*}
[H^{(1)}_{ij}]
&= \left[\begin{array}{cccc}
\bm{y}_{1}\bm{y}_{1}^{\top}(M-A_{1}) & \bm{y}_{1}\bm{y}_{2}^{\top}(M-A_{1}-A_{2}) & \cdots & \bm{y}_{1}\bm{y}_{n}^{\top}(M-A_{1}-A_{n}) \\
\bm{y}_{2}\bm{y}_{1}^{\top}(M-A_{2}-A_{1}) & \bm{y}_{2}\bm{y}_{2}^{\top}(M-A_{2}) & \cdots & \bm{y}_{2}\bm{y}_{n}^{\top}(M-A_{2}-A_{n}) \\
\vdots & \vdots & \ddots & \vdots \\
\bm{y}_{n}\bm{y}_{1}^{\top}(M-A_{n}-A_{1}) & \bm{y}_{n}\bm{y}_{2}^{\top}(M-A_{n}-A_{2}) & \cdots & \bm{y}_{n}\bm{y}_{n}^{\top}(M-A_{n})
\end{array}\right] \\
&= M\bm{y}\bm{y}^{\top}-\bm{y}\bm{z}^{\top}-\bm{z}\bm{y}^{\top}+\text{diag}(A_{1}\bm{y}_{1}\bm{y}_{1}^{\top},\dots,A_{n}\bm{y}_{n}\bm{y}_{n}^{\top}) \\
&= \frac{1}{M}\cdot(M\bm{y}-\bm{z})(M\bm{y}-\bm{z})^{\top}+\text{diag}(A_{1}\bm{y}_{1}\bm{y}_{1}^{\top},\dots,A_{n}\bm{y}_{n}\bm{y}_{n}^{\top})-\frac{1}{M}\bm{z}\bm{z}^{\top} \\
&\succeq \text{diag}(A_{1}\bm{y}_{1}\bm{y}_{1}^{\top},\dots,A_{n}\bm{y}_{n}\bm{y}_{n}^{\top})-\frac{1}{M}\bm{z}\bm{z}^{\top},
\end{align*}

where the last step drops the rank-one PSD term $\frac{1}{M}(M\bm{y}-\bm{z})(M\bm{y}-\bm{z})^{\top}$.

Therefore, it suffices to prove that the matrix

\[
\tilde{H}=M\,\text{diag}(A_{1}\bm{y}_{1}\bm{y}_{1}^{\top},\dots,A_{n}\bm{y}_{n}\bm{y}_{n}^{\top})-\bm{z}\bm{z}^{\top}
\]

is PSD. For any $\bm{t}=[\bm{t}_{1};\cdots;\bm{t}_{n}]\in\mathbb{R}^{dn\times 1}$ where $\bm{t}_{i}\in\mathbb{R}^{d}$, we can verify that

\begin{align*}
\bm{t}^{\top}\tilde{H}\bm{t}
&= M\sum_{i=1}^{n}A_{i}\bm{t}_{i}^{\top}\bm{y}_{i}\bm{y}_{i}^{\top}\bm{t}_{i}-\bm{t}^{\top}\bm{z}\bm{z}^{\top}\bm{t} \\
&= \Big(\sum_{i=1}^{n}A_{i}\Big)\Big(\sum_{i=1}^{n}A_{i}\bm{t}_{i}^{\top}\bm{y}_{i}\bm{y}_{i}^{\top}\bm{t}_{i}\Big)-\bm{t}^{\top}\bm{z}\bm{z}^{\top}\bm{t} \\
&= \sum_{1\leq i<j\leq n}A_{i}A_{j}(\bm{y}_{i}^{\top}\bm{t}_{i}-\bm{y}_{j}^{\top}\bm{t}_{j})^{2}\geq 0.
\end{align*}
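
The algebra above can be sanity-checked numerically. The following sketch is not part of the proof: it draws random positive $A_{i}$ and random vectors standing in for the gradients $\partial\sigma/\partial\bm{s}_{i}$ (the sizes `n`, `d` and the random seed are arbitrary assumptions), assembles $[H^{(1)}_{ij}]$ block by block, and compares it with the decomposition and the quadratic-form identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                                # hypothetical problem sizes
A = np.exp(rng.normal(size=n))             # A_i = exp(sigma(s_i, x)) > 0
grads = rng.normal(size=(n, d))            # stand-ins for d sigma / d s_i
M = A.sum()

Y = A[:, None] * grads                     # row i is y_i = A_i * grad_i
y = Y.reshape(-1)                          # stacked y in R^{dn}
z = (A[:, None] * Y).reshape(-1)           # stacked z = [A_1 y_1; ...; A_n y_n]

# Assemble [H^{(1)}_{ij}] block by block from its definition.
H1 = np.zeros((n * d, n * d))
for i in range(n):
    for j in range(n):
        coef = M - A[i] if i == j else M - A[i] - A[j]
        H1[i * d:(i + 1) * d, j * d:(j + 1) * d] = coef * np.outer(Y[i], Y[j])

# Block-diagonal matrix diag(A_1 y_1 y_1^T, ..., A_n y_n y_n^T).
D = np.zeros_like(H1)
for i in range(n):
    D[i * d:(i + 1) * d, i * d:(i + 1) * d] = A[i] * np.outer(Y[i], Y[i])

decomposition = M * np.outer(y, y) - np.outer(y, z) - np.outer(z, y) + D
H_tilde = M * D - np.outer(z, z)

# Quadratic form t^T H_tilde t versus the pairwise sum of squares.
t = rng.normal(size=n * d)
c = np.einsum("ij,ij->i", Y, t.reshape(n, d))   # c_i = y_i^T t_i
pair_sum = sum(A[i] * A[j] * (c[i] - c[j]) ** 2
               for i in range(n) for j in range(i + 1, n))

decomposition_gap = np.abs(H1 - decomposition).max()
quadratic_gap = abs(t @ H_tilde @ t - pair_sum)
min_eig_H_tilde = np.linalg.eigvalsh(H_tilde).min()
min_eig_H1 = np.linalg.eigvalsh(H1).min()
```

Up to floating-point error, `decomposition_gap` and `quadratic_gap` should vanish and both minimum eigenvalues should be non-negative, in line with the PSD claim.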

Therefore, the block matrix $[H^{(1)}_{ij}]$ is always PSD for any $\bm{x}$ and $\bm{s}$. A sufficient condition for $[H_{ij}^{\mathcal{G}}]$ to be negative definite is thus $H_{ii}^{(0)}$ being positive definite (PD), i.e., $H_{ii}^{(0)}(\bm{s},\bm{x})\succ 0,\forall\bm{s},\bm{x}$. It remains to show that

\[
\mathbb{E}_{\bm{x}\sim\mathcal{X}}\left[\Big[-\frac{\partial^{2}\sigma}{\partial\bm{s}_{i}^{2}}-\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)^{\top}\Big(1-\frac{A_{i}}{M}\Big)\Big]\cdot A_{i}(M-A_{i})\frac{1}{M^{2}}\right]\succ 0.
\tag{17}
\]

A sufficient condition for Eq. (17) to hold is

\[
\mathbb{E}_{\bm{x}\sim\mathcal{X}}\left[-\frac{\partial^{2}\sigma}{\partial\bm{s}_{i}^{2}}-\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)\Big(\frac{\partial\sigma}{\partial\bm{s}_{i}}\Big)^{\top}\right]\succeq 0,
\tag{18}
\]

which completes the proof.

 

D Proof of Theorem 2

Proof  Since the utility functions of $C^{3}_{\text{ext}}$ are twice differentiable, any LNE $\bm{s}$ of $C^{3}_{\text{ext}}$, which by definition satisfies

\[
\bm{s}_{i}=\arg\max_{\bm{z}_{i}\in B(\bm{s}_{i},\delta)}u_{i}(\bm{z}_{i},\bm{s}_{-i};\bm{w}),
\tag{19}
\]

must also satisfy the first-order condition $\frac{\partial u_{i}}{\partial\bm{s}_{i}}\Big|_{\bm{s}=(\bm{s}_{i},\bm{s}_{-i})}=0$. If we let

$$
F(\bm{s},\bm{w})=\left(\frac{\partial u_{1}(\bm{s};\bm{w})}{\partial\bm{s}_{1}},\cdots,\frac{\partial u_{n}(\bm{s};\bm{w})}{\partial\bm{s}_{n}}\right):\mathbb{R}^{dn}\times\mathbb{R}^{m}_{\geq 0}\rightarrow\mathbb{R}^{dn} \tag{20}
$$

be a vector-valued function, then the constraint (12) can be rewritten as

$$
F(\bm{s}^{*}(\bm{w}),\bm{w})=0. \tag{21}
$$

From the implicit function theorem (Krantz and Parks, 2002), the derivative of $\bm{s}^{*}$ w.r.t. $\bm{w}$ can be written as

$$
\frac{d\bm{s}}{d\bm{w}}=-\left(\frac{\partial F}{\partial\bm{s}}\right)^{-1}\cdot\frac{\partial F}{\partial\bm{w}},
$$

where $\left[\frac{\partial F}{\partial\bm{s}}\right]_{nd\times nd}$ and $\left[\frac{\partial F}{\partial\bm{w}}\right]_{nd\times m}$ are the Jacobian matrices, and

$$
\frac{dW}{d\bm{w}}=\frac{dW}{d\bm{s}}\cdot\frac{d\bm{s}}{d\bm{w}}=-\frac{dW}{d\bm{s}}\cdot\left(\frac{\partial F}{\partial\bm{s}}\right)^{-1}\cdot\frac{\partial F}{\partial\bm{w}}, \tag{22}
$$

where $\left(\frac{dW}{d\bm{s}}\right)_{1\times nd}$ is the partial derivative of $W$ w.r.t. $\bm{s}$.
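The implicit-function step behind Eq (22) can be checked numerically on a small stand-in problem. The sketch below does not use the paper's $F$: the map `F`, the matrices `A_mat` and `B_mat`, and the Newton solver are hypothetical choices made only to illustrate that $-(\partial F/\partial\bm{s})^{-1}\,\partial F/\partial\bm{w}$ matches a finite-difference estimate of $d\bm{s}^{*}/d\bm{w}$.

```python
import numpy as np

# Hypothetical smooth stand-in for the stationarity map F(s, w) = 0.
A_mat = np.array([[2.0, 0.3], [0.3, 1.5]])
B_mat = np.array([[1.0, 0.0, 0.5], [0.2, 1.0, 0.0]])

def F(s, w):
    return A_mat @ s + np.tanh(s) - B_mat @ w

def dF_ds(s):
    # Jacobian of F with respect to s.
    return A_mat + np.diag(1.0 - np.tanh(s) ** 2)

def solve_s(w, iters=50):
    # Newton's method for the root s*(w) of F(., w).
    s = np.zeros(A_mat.shape[0])
    for _ in range(iters):
        s = s - np.linalg.solve(dF_ds(s), F(s, w))
    return s

w = np.array([0.4, -0.1, 0.7])
s_star = solve_s(w)

# Implicit function theorem: ds/dw = -(dF/ds)^{-1} dF/dw, with dF/dw = -B_mat here.
J_ift = -np.linalg.solve(dF_ds(s_star), -B_mat)

# Central finite differences of s*(w) for comparison.
eps = 1e-6
J_fd = np.zeros_like(J_ift)
for j in range(w.size):
    dw = np.zeros_like(w)
    dw[j] = eps
    J_fd[:, j] = (solve_s(w + dw) - solve_s(w - dw)) / (2 * eps)

print(np.max(np.abs(J_ift - J_fd)))
```

Multiplying a row vector $dW/d\bm{s}$ by `J_ift` then gives the chain-rule product on the right-hand side of Eq (22).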

Since $w_{j}\geq 0$, we apply a change of variable and write each $w_{j}$ as $e^{w_{j}}$ instead. Next, we calculate each term on the RHS of (22) to obtain an estimate of the gradient of the welfare objective $W$ with respect to the user weight vector $\bm{w}$. Without loss of generality, we let the user distribution $\mathcal{X}$ be the uniform distribution on the unit basis $\{\bm{e}_{1},\cdots,\bm{e}_{d}\}$ with $m=d$. The user welfare function and the utility functions given in Eq (6) then read

$$
W(\bm{s})=\frac{1}{m}\sum_{j=1}^{d}\sum_{i=1}^{n}\bm{s}_{i}^{\top}\bm{x}_{j}\cdot\frac{\exp[\beta^{-1}\bm{s}_{i}^{\top}\bm{x}_{j}]}{\sum_{k=1}^{n}\exp[\beta^{-1}\bm{s}_{k}^{\top}\bm{x}_{j}]}, \tag{23}
$$

$$
u_{i}(\bm{s}_{i},\bm{s}_{-i})=\frac{1}{m}\sum_{j=1}^{d}e^{w_{j}}\cdot\frac{\exp[\beta^{-1}\bm{s}_{i}^{\top}\bm{x}_{j}]}{\sum_{k=1}^{n}\exp[\beta^{-1}\bm{s}_{k}^{\top}\bm{x}_{j}]},\quad i\in[n]. \tag{24}
$$

If we denote $A_{ij}=\exp[\beta^{-1}\bm{s}_{i}^{\top}\bm{x}_{j}]$ and $M_{j}=\sum_{k=1}^{n}\exp[\beta^{-1}\bm{s}_{k}^{\top}\bm{x}_{j}]$, then $\frac{A_{ij}}{M_{j}}=P_{i}(\bm{s},\bm{x}_{j})$ is exactly the probability of matching content $\bm{s}_{i}$ to user $\bm{x}_{j}$.
Under the assumption that $n$ is sufficiently large, $\frac{A_{ij}}{M_{j}}=o(1)$ for every $i$, and we therefore ignore higher-order infinitesimal terms such as $\frac{A_{ij}^{2}}{M_{j}^{2}}$ and $\frac{A_{kj}A_{ij}}{M_{j}^{2}}$ in the following derivation.
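To make the quantities $A_{ij}$, $M_{j}$, and $P_{i}(\bm{s},\bm{x}_{j})$ concrete, the following minimal sketch evaluates Eqs (23) and (24) with NumPy. The function names, the random strategies, and the values of `n`, `d`, and `beta` are illustrative assumptions, not settings from the paper; users are taken to be the unit basis vectors as in the text.

```python
import numpy as np

def matching_probs(S, X, beta):
    # S: (n, d) creator strategies; X: (m, d) user vectors.
    # Returns P with P[i, j] = A_ij / M_j.
    logits = S @ X.T / beta
    logits = logits - logits.max(axis=0, keepdims=True)  # stabilise exp; the ratio is unchanged
    A = np.exp(logits)
    return A / A.sum(axis=0, keepdims=True)

def welfare(S, X, beta):
    # Eq (23): average over users of the probability-weighted score s_i^T x_j.
    P = matching_probs(S, X, beta)
    return np.mean(np.sum((S @ X.T) * P, axis=0))

def utilities(S, X, beta, w):
    # Eq (24): u_i = (1/m) * sum_j exp(w_j) * P[i, j].
    P = matching_probs(S, X, beta)
    return (P * np.exp(w)[None, :]).mean(axis=1)

rng = np.random.default_rng(0)
n, d, beta = 50, 3, 0.5          # illustrative sizes only
X = np.eye(d)                    # users on the unit basis, so m = d
S = rng.uniform(0.0, 1.0, size=(n, d))
w = np.zeros(d)

P = matching_probs(S, X, beta)
print(P.max(), welfare(S, X, beta), utilities(S, X, beta, w).sum())
```

Each column of `P` sums to one, so with many creators of comparable quality every individual entry $A_{ij}/M_{j}$ is small, which is the regime the approximation above relies on.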

\begin{align}
\frac{dW}{d\bm{s}_{i}}={}&\frac{1}{m}\sum_{j=1}^{d}\bm{x}_{j}\left[\frac{A_{ij}}{M_{j}}+\beta^{-1}\bm{s}_{i}^{\top}\bm{x}_{j}\left(\frac{A_{ij}}{M_{j}}-\frac{A_{ij}^{2}}{M_{j}^{2}}\right)\right] \notag\\
&-\frac{1}{m}\sum_{j=1}^{d}\bm{x}_{j}\left[\sum_{k\neq i}\beta^{-1}\bm{s}_{k}^{\top}\bm{x}_{j}\frac{A_{kj}A_{ij}}{M_{j}^{2}}\right] \notag\\
\approx{}&\frac{1}{m}\sum_{j=1}^{d}\bm{x}_{j}\frac{A_{ij}}{M_{j}}\left(1+\beta^{-1}\bm{s}_{i}^{\top}\bm{x}_{j}\right),\quad i\in[n], \tag{25}
\end{align}

where $\bar{\pi}(\bm{x}_{j})\triangleq\sum_{k=1}^{n}\bm{s}_{k}^{\top}\bm{x}_{j}\frac{A_{kj}}{M_{j}}$.
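As a sanity check on the exact, pre-approximation expression in Eq (25), its two sums can be regrouped into the compact form $\frac{1}{m}\sum_{j}\bm{x}_{j}\frac{A_{ij}}{M_{j}}\big[1+\beta^{-1}\big(\bm{s}_{i}^{\top}\bm{x}_{j}-\bar{\pi}(\bm{x}_{j})\big)\big]$. The sketch below compares this form against central finite differences of $W$. The function names, problem sizes, seed, and $\beta$ are hypothetical choices for illustration only.

```python
import numpy as np

def probs(S, X, beta):
    # P[i, j] = A_ij / M_j (softmax over creators for each user).
    logits = S @ X.T / beta
    logits = logits - logits.max(axis=0, keepdims=True)
    A = np.exp(logits)
    return A / A.sum(axis=0, keepdims=True)

def welfare(S, X, beta):
    # Eq (23).
    return np.mean(np.sum((S @ X.T) * probs(S, X, beta), axis=0))

def welfare_grad_exact(S, X, beta):
    # Exact dW/ds_i (first two lines of Eq (25)), regrouped using pi_bar.
    scores = S @ X.T                              # scores[i, j] = s_i^T x_j
    P = probs(S, X, beta)
    pi_bar = np.sum(scores * P, axis=0)           # pi_bar[j]
    coeff = P * (1.0 + (scores - pi_bar[None, :]) / beta)
    return coeff @ X / X.shape[0]                 # row i is dW/ds_i

rng = np.random.default_rng(1)
n, d, beta = 5, 3, 0.5
X = np.eye(d)
S = rng.uniform(0.0, 1.0, size=(n, d))

G = welfare_grad_exact(S, X, beta)

# Central finite differences, one coordinate of S at a time.
eps = 1e-6
G_fd = np.zeros_like(S)
for i in range(n):
    for c in range(d):
        dS = np.zeros_like(S)
        dS[i, c] = eps
        G_fd[i, c] = (welfare(S + dS, X, beta) - welfare(S - dS, X, beta)) / (2 * eps)

print(np.max(np.abs(G - G_fd)))
```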

Next we calculate each term in the RHS of Eq (22). The $i$-th block of $F(\bm{s},\bm{w})$ is a $d$-dimensional vector given by

\begin{align}
F(\bm{s},\bm{w})_{i}={}&\frac{1}{m}\sum_{j=1}^{d}\bm{x}_{j}\left[\beta^{-1}e^{w_{j}}\left(\frac{A_{ij}}{M_{j}}-\frac{A_{ij}^{2}}{M_{j}^{2}}\right)\right] \notag\\
\approx{}&\frac{1}{m}\sum_{j=1}^{d}\bm{x}_{j}\frac{A_{ij}}{M_{j}}\beta^{-1}e^{w_{j}},\quad i\in[n], \tag{26}
\end{align}

the $(i,j)$-th block of $\frac{\partial F}{\partial\bm{w}}$ is a $d$-dimensional vector given by

\begin{align}
\left[\frac{\partial F}{\partial\bm{w}}\right]_{ij}={}&\frac{1}{m}\bm{x}_{j}\beta^{-1}e^{w_{j}}\left(\frac{A_{ij}}{M_{j}}-\frac{A_{ij}^{2}}{M_{j}^{2}}\right) \notag\\
\approx{}&\frac{1}{m}\bm{x}_{j}\beta^{-1}\frac{A_{ij}}{M_{j}}e^{w_{j}},\quad i\in[n],\ j\in[d]. \tag{27}
\end{align}

Since $\{\bm{x}_{j}\}_{j=1}^{d}$ form an orthogonal basis, the off-diagonal blocks of the matrix $\frac{\partial F}{\partial\bm{s}}$ are all zero matrices, and the $i$-th diagonal block of $\frac{\partial F}{\partial\bm{s}}$ is given by

\begin{align}
\left[\frac{\partial F}{\partial\bm{s}}\right]_{ii}&=\frac{1}{m\beta^{2}}\sum_{j=1}^{d}e^{w_{j}}\left[\bm{x}_{j}\bm{x}_{j}^{\top}\left(\frac{A_{ij}}{M_{j}}-\frac{3A_{ij}^{2}}{M_{j}^{2}}+\frac{2A_{ij}^{3}}{M_{j}^{3}}\right)\right] \tag{28}\\
&\approx\frac{1}{m\beta^{2}}\sum_{j=1}^{d}e^{w_{j}}\left[\bm{x}_{j}\bm{x}_{j}^{\top}\frac{A_{ij}}{M_{j}}\right],\quad i\in[n]. \notag
\end{align}

Therefore, we can derive an approximation of $\frac{dW}{d\bm{w}}$ as follows:

\begin{align}
\frac{dW}{dw_{j}}&=-\frac{dW}{d\bm{s}}\cdot\left(\frac{\partial F}{\partial\bm{s}}\right)^{-1}\cdot\left(\frac{\partial F}{\partial\bm{w}}\right)_{j} \notag\\
&\approx-\sum_{i=1}^{n}\Bigg\{\frac{1}{m}\sum_{k=1}^{d}\bm{x}^{\top}_{k}\frac{A_{ik}}{M_{k}}\left(1+\beta^{-1}\bm{s}_{i}^{\top}\bm{x}_{k}\right) \notag\\
&\qquad\cdot m\beta^{2}\,\text{diag}^{-1}\left(e^{w_{1}}A_{i1}/M_{1},\cdots,e^{w_{d}}A_{id}/M_{d}\right)\cdot\frac{1}{m}\bm{x}_{j}\beta^{-1}\frac{A_{ij}}{M_{j}}e^{w_{j}}\Bigg\} \notag\\
&=-\frac{\beta^{2}}{m}\sum_{i=1}^{n}e^{-w_{j}}\left(1+\beta^{-1}\bm{s}_{i}^{\top}\bm{x}_{j}\right)\beta^{-1}\frac{A_{ij}}{M_{j}}e^{w_{j}} \notag\\
&\approx-\frac{1}{m}\sum_{i=1}^{n}\bm{s}_{i}^{\top}\bm{x}_{j}\frac{A_{ij}}{M_{j}} \tag{29}\\
&=-\frac{1}{m}\sum_{i=1}^{n}\pi(\bm{s}_{i},\bm{x}_{j})P_{i}(\bm{s},\bm{x}_{j}) \notag\\
&=-\frac{1}{m}\mathbb{E}_{\bm{s}}[\pi(\bm{x}_{j})], \tag{30}
\end{align}

where (29) holds because $\beta^{-1}\gg 1$.

Therefore, Eq (30) suggests that the following update rule

$$
e^{w^{\prime}_{j}}=e^{w_{j}}\cdot e^{-\eta\bar{\pi}(\bm{x}_{j})} \tag{31}
$$

aligns with the gradient direction of $W(\bm{w})$, which yields Eq (13).
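A single step of the update in Eq (31) can be sketched as follows. This is only an illustration under simplifying assumptions: creator strategies `S` are held fixed during the step (in the game they would respond to the new weights), and the function names, step size `eta`, and toy strategies are hypothetical rather than taken from the paper.

```python
import numpy as np

def avg_satisfaction(S, X, beta):
    # pi_bar[j] = sum_k (s_k^T x_j) * A_kj / M_j.
    scores = S @ X.T
    logits = scores / beta
    logits = logits - logits.max(axis=0, keepdims=True)  # stabilise exp; the ratio is unchanged
    A = np.exp(logits)
    P = A / A.sum(axis=0, keepdims=True)
    return np.sum(scores * P, axis=0)

def update_log_weights(w, S, X, beta, eta):
    # Eq (31) in log space: w'_j = w_j - eta * pi_bar(x_j),
    # i.e. exp(w'_j) = exp(w_j) * exp(-eta * pi_bar(x_j)).
    return w - eta * avg_satisfaction(S, X, beta)

# Toy instance: two users on the unit basis; all creators cater mostly to user 0.
X = np.eye(2)
S = np.array([[1.0, 0.1], [0.9, 0.2], [0.8, 0.1]])
beta, eta = 0.1, 0.5
w = np.zeros(2)

pi_bar = avg_satisfaction(S, X, beta)
w_new = update_log_weights(w, S, X, beta, eta)
print(pi_bar, np.exp(w_new))
```

In this toy instance the less satisfied user (user 1) ends up with the larger multiplicative weight, which is the qualitative behaviour the update rule is meant to produce.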