Light-weight End-to-End Graph Interest Network for CTR Prediction in E-commerce Search

Pai Peng, Quanxiang Jia, Ziqiang Zhou, Shuang Hong, and Zichong Xiao (Shanghai, China)
(2018)
Abstract.

Click-through rate (CTR) prediction has an essential impact on user experience and revenue in e-commerce search. Because search queries are often ambiguous and incomplete, CTR models must mine users' search intentions from rich historical behavior data. With the development of deep learning, a series of works has been proposed to model user interests, bringing significant improvements in model performance. Attention-based methods summarize user behaviors into a comprehensive interest representation according to their relationship with the target. Graph-based methods exploit graph structure extracted from user behaviors and other information to aid embedding learning. However, most previous graph-based methods face deployment and performance challenges in large-scale e-commerce search systems. First, they usually require a separate graph engine for graph storage and sampling, which makes it hard to jointly train graph embedding with CTR prediction and requires more implementation effort. Second, they mainly target recommendation scenarios, so their graph structures depend heavily on the sequential information of items in user behaviors, ignoring the sequential signal of queries and query-item correlation. In both our practice and our experiments, this extra information brings notable improvement because of the query-dependent nature of e-commerce search.

In this paper, we propose a new approach named Light-weight End-to-End Graph Interest Network (EGIN) to effectively mine users' search interests and tackle these challenges. (i) EGIN uses the correlation and sequential information of queries and items from the search system to build a heterogeneous graph for better CTR prediction in e-commerce search. (ii) EGIN's graph embedding learning shares the same training input as CTR prediction and is jointly trained with it, making the end-to-end framework easy to deploy in large-scale search systems. EGIN is composed of three parts: a query-item heterogeneous graph, light-weight graph sampling, and a multi-interest network. The query-item heterogeneous graph efficiently captures the correlation and sequential information of queries and items through the proposed light-weight graph sampling. The multi-interest network uses the graph embedding to capture various similarity relationships between queries and items to enhance the final CTR prediction. We conduct extensive experiments on both public and industrial datasets to demonstrate the effectiveness of EGIN. At the same time, the training cost of graph learning is relatively low compared with the main CTR prediction task, ensuring efficiency in practical applications. Our code will be publicly available.

sponsored search, click-through rate prediction, graph neural network, behavior sequence
Copyright: acmcopyright. Journal year: 2023. DOI: XXXXXXX.XXXXXXX. Conference: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval; July 23-27, 2023; Taipei, Taiwan. Price: 15.00. ISBN: 978-1-4503-XXXX-X/18/06.
CCS Concepts: Information systems → Sponsored search advertising; Information systems → Recommender systems; Mathematics of computing → Graph algorithms.

1. Introduction

In sponsored search, click-through rate (CTR) prediction is vital for improving user experience and total revenue. The accuracy of CTR prediction depends on the system's ability to understand users' current search intentions and historical interests. The ambiguity of search queries and the consistency of user interests encourage search systems to exploit user behavior in increasingly elaborate ways.

With the development of search and recommender systems, user behavior sequences have been examined in different ways to mine user interest. Recently, a series of works modeling latent user interest from historical behaviors has used various deep neural network architectures, including CNN (Tang and Wang, 2018; Yuan et al., 2019), RNN (Feng et al., 2019; Zhou et al., 2018), Transformer (Chen et al., 2019; Min et al., 2022), Capsule (Li et al., 2019b), etc. Besides, graph neural networks (Perozzi et al., 2014; Defferrard et al., 2016; Wang et al., 2018; Li et al., 2019a; Wang et al., 2019) have become a popular method for embedding learning to aid CTR prediction in search and recommendation systems and have achieved great success. The learned graph embedding is usually consumed by the CTR prediction network to combine graph learning with the final CTR objective.

Whether they follow a two-stage or an end-to-end approach, these graph-based methods still require first constructing a graph and then sampling from it with a graph engine. We refer to this as the Construct&Sample paradigm. In a large-scale e-commerce system with billions of items and samples, updating and retrieving the graph with short delay is challenging, and this becomes the bottleneck for end-to-end joint training of graph embedding with the CTR prediction task.

In addition, the graph-based methods mentioned above mainly focus on the sequential and attribute information of items during graph construction, ignoring the sequential information of queries and query-item correlation, both of which improve search CTR prediction quality in our industrial practice.

We propose a new approach called Light-weight End-to-End Graph Interest Network (EGIN) to solve these problems. First, the resource consumption and time delay of the Construct&Sample paradigm are alleviated by a light-weight graph sampling method that manipulates the CTR task input instead of relying on a graph engine. Second, EGIN uses a query-item heterogeneous graph for search CTR prediction.

The main contributions of this paper are as follows.

  • We introduce our light-weight graph sampling that shares the same training input with the CTR prediction task without physically storing the graph structure. End-to-end joint training of graph embedding and CTR prediction is implemented without reorganizing complicated graph data or depending on the graph engine. Our method can be effortlessly integrated into other CTR prediction networks.

  • We propose the query-item heterogeneous graph to model query-item correlation in graph structure, which improves CTR prediction performance in search scenarios compared with the item-only version. We also design the multi-interest network to exploit query-item correlation provided by graph learning toward a better understanding of user interest.

  • We conduct extensive experiments on public and industrial datasets to demonstrate the effectiveness of the proposed EGIN framework. An online A/B test verifies the performance of the proposed approach in production. We also discuss the influence of different training techniques and data management.

2. Related Work

2.1. CTR Prediction

CTR prediction has attracted research attention for many years because of its vital role in search and recommendation systems and its ability to substantially improve the revenue of online applications. Given the high sparsity of input features, a series of works has been proposed to capture feature interactions. Factorization Machines (FM) (Rendle, 2010) represent each feature with a low-dimensional vector and learn second-order feature interactions through inner products. Wide&Deep (Cheng et al., 2016) jointly trains a wide linear part and a deep MLP to enhance both memorization and generalization. DeepFM (Guo et al., 2017) integrates factorization machines and deep neural networks to learn both low- and high-order feature interactions. xDeepFM (Lian et al., 2018) proposes a Compressed Interaction Network (CIN) to model high-order feature interactions explicitly, alongside a conventional DNN. Deep & Cross Network (DCN) (Wang et al., 2017) crosses the representation at each layer with the original feature embedding to learn higher-order feature representations.

Deep learning-based methods have also achieved great success in mining user interest to aid CTR prediction. Deep Interest Network (DIN) (Zhou et al., 2018) develops an attention-based method that assigns different weights to historical items according to their relationship with the target item. Deep Interest Evolution Network (DIEN) (Zhou et al., 2019) further uses an RNN to model the evolution of user interest by taking sequential information into account. Deep Session Interest Network (DSIN) (Feng et al., 2019) leverages Bi-LSTM and self-attention layers to capture users' inter-session and intra-session interests. MIMN (Pi et al., 2019) tackles long sequential user behavior modeling by decoupling user interest modeling from the rest of the framework and designing a User Interest Center (UIC) that records new behaviors incrementally. Behavior Sequence Transformer (BST) (Chen et al., 2019) uses a Transformer to capture the underlying sequential signals of user behavior for better recommendation.

2.2. Graph Neural Networks for Search and Recommendation

Several graph embedding methods have been introduced into CTR prediction because of their strong potential for modeling graph information in search and recommendation. EGES (Wang et al., 2018) builds an item graph from click sequences, generates node sequences on it with DeepWalk-style random walks (Perozzi et al., 2014), and learns graph embeddings with a Skip-Gram model; the learned node representations are then consumed by the CTR prediction network. Graph Intention Network (GIN) (Li et al., 2019a) uses user behavior sequences to build a co-occurrence item network and applies graph diffusion and aggregation to enrich the node representations of historical clicks, overcoming behavior sparsity and weak generalization. KGAT (Wang et al., 2019) combines the user-item graph with a knowledge graph into a collaborative knowledge graph and then applies graph convolution to obtain graph representations. Heterogeneous Graph Attention Network (HGAT) (Linmei et al., 2019) uses semantic-level and node-level attention to discriminate the importance of neighbor nodes and node types. DG-ENN (Guo et al., 2021) uses an attribute graph and a collaborative graph to refine embeddings and alleviates the feature and behavior sparsity problems. However, applying these methods directly to CTR prediction in an end-to-end manner requires a separate graph engine to store and sample the graph for embedding learning. Otherwise, a two-stage approach that first learns graph embeddings and then consumes them in CTR prediction yields graph representations that are not optimized for the final objective. Moreover, these methods focus on recommendation and ignore the potential of capturing interactions between items and queries in heterogeneous graphs. To address these challenges, we propose a novel approach for highly efficient end-to-end graph learning in e-commerce search scenarios.

3. The Proposed Approach

Figure 1. The framework of the proposed EGIN model. We use a unified input for both graph learning and the CTR prediction network. The query-item heterogeneous graph is constructed from the user behavior sequence, and embedding learning is conducted on its edges. We build i2i edges between items within a distance of 2 in the click sequence and build q2i/q2q pairs under time and category constraints. The same behavior sequence is fed to the CTR prediction network, where multiple similarity relationships are computed from the graph embedding.

In this section, we introduce the Light-weight End-to-End Graph Interest Network (EGIN) in detail. As shown in Figure 1, EGIN is composed of two parts: the query-item heterogeneous graph on the left and the multi-interest network on the right. The two parts are jointly optimized in an end-to-end manner and share the same input extracted from search impression logs. Equation 1 presents our joint training objective.

3.1. Query-item Heterogeneous Graph

Figure 2. Query-item heterogeneous graph structure. The user behavior sequence contains a query sequence and a click sequence ordered in time. First, neighboring items are connected with a window size of 2. Then queries are segmented into sessions based on their semantic similarity and occurrence time, and queries within the same session are all connected. Finally, each query is connected to the nearby items within the time window that satisfy the categorical constraints with the query, capturing query-item correlation.

In previous graph-based CTR prediction work, users' historical behaviors are well explored to represent personal interest. In e-commerce search scenarios, queries explicitly describe interest evolution and current search intention. However, previous work mainly focuses on the semantic knowledge conveyed by queries, ignoring their sequential information and their correlation with items.

To jointly model signals of click sequence and query sequence extracted from user behavior, we introduce our query-item heterogeneous graph as shown in Figure 2. With query and item as nodes, our graph architecture contains three kinds of edges: item2item, query2query and query2item:

item2item. Following previous work, we use the user click sequence to capture the item2item relationship based on co-occurrence, as in Skip-Gram (Mikolov et al., 2013a, b). For example, in Figure 2, items are connected according to the window size. Note that instead of splitting the click sequence into sessions by time frame or user interest, we use the original click sequence to learn the item2item relationship.

A user's click sequence usually forms several clusters, where items within a cluster are often similar to each other. In contrast, items in different clusters often belong to quite different categories, reflecting shifts in the user's interest. Given these two properties, the click sequence provides information about both intra-cluster similarity and cross-cluster transition probability.

query2query. We use users' historical query sequences to build the query2query relationship. Similar to (Feng et al., 2019), the query sequence is segmented into multiple sessions based on semantic similarity and a time constraint, which prevents edges between dissimilar queries. Queries within the same session are treated as similar and are connected with each other. In Figure 2, the queries form three sessions, and each session forms a complete subgraph.
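The session rule is only described qualitatively, so the following is a minimal sketch under stated assumptions: each query arrives as a hypothetical `(query_id, timestamp, embedding)` tuple, a query joins the current session only if it is close in time to the previous query and semantically similar to it, and `max_gap` and `min_sim` are placeholder thresholds not given in the source. Each resulting session is then turned into a complete subgraph with edges in both directions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment_sessions(queries, max_gap=1800.0, min_sim=0.5):
    """Split a time-ordered list of (query_id, timestamp, embedding) into sessions.

    A new session starts when the time gap to the previous query exceeds
    `max_gap` seconds or their cosine similarity falls below `min_sim`.
    Both thresholds are hypothetical placeholders."""
    sessions = []
    for query in queries:
        if sessions:
            _, prev_ts, prev_emb = sessions[-1][-1]
            _, ts, emb = query
            if ts - prev_ts <= max_gap and cosine(emb, prev_emb) >= min_sim:
                sessions[-1].append(query)
                continue
        sessions.append([query])
    return sessions


def q2q_edges(sessions):
    """Connect every pair of distinct queries within a session (a complete subgraph)."""
    edges = set()
    for session in sessions:
        ids = [query_id for query_id, _, _ in session]
        for a, b in combinations(ids, 2):
            if a != b:
                edges.add((a, b))
                edges.add((b, a))
    return edges
```

Comparing each query only with its predecessor keeps segmentation linear in the sequence length; comparing against a session centroid would be an equally plausible variant.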

query2item. After capturing the sequential information of items and queries separately, our graph preserves the relationship between queries and items, which plays an important role in modern search systems, so that their embeddings are learned jointly. For each query in a user's behavior sequence, the clicked items within a time window before or after the query are filtered by category constraints to obtain its relevant items. We then connect each query with its relevant items. In Figure 2, each query and its nearby items share a color, but only the relevant items are connected with the corresponding query.

3.2. Light-weight Subgraph Sampling

When adopting graph-based methods in CTR prediction, previous work usually constructs graphs from global or historical information and stores the whole graph physically. During embedding learning, subgraphs are sampled from the whole graph to fit in memory. We refer to this as the Construct&Sample paradigm. In industrial scenarios with billions of items and samples, updating and retrieving graphs with short delay is challenging, and this becomes the bottleneck of end-to-end joint training of graph embedding and the CTR prediction task.

Towards a light-weight subgraph sampling method, we first posit an underlying graph, called the implicit graph, that represents current global user interest. Conceptually, the implicit graph is formed by merging every user's mini-graph. Previous graph construction methods based on accumulated data are gradual approximations of the implicit graph. In this paper, we treat a user behavior sequence, or a slightly manipulated version of it, as a subgraph sample of the implicit graph, thereby avoiding physically storing and sampling the graph.

Algorithm 1 lists the pseudo-code of the light-weight graph sampling and learning method. Given users' click sequences and query sequences from impression log data, we first build edges between co-occurring items according to the window size. Then query2query and query2item edges are built with the BuildEdge method described in Algorithm 2, which relates two entities according to time and category constraints. Finally, our training objective over the graph embedding is computed from all the edges. With billions of items and queries in our dataset, we adopt negative sampling with a softmax loss to approximately optimize the graph embedding:

$$L_{G}=-\log\frac{\exp(e_{a}\cdot e_{p})}{\exp(e_{a}\cdot e_{p})+\sum_{e_{n}\in N}\exp(e_{a}\cdot e_{n})}$$
Algorithm 1 Pseudo-code of graph embedding learning
Input: user click sequence $C$; user query sequence $Q$; window size $w$
Output: loss $L_G$ over the graph embedding
    Initialize an empty edge set $E \leftarrow \emptyset$ and $L_G \leftarrow 0$
    for $c_j \in C$ do
        for $c_k \in C[j-w : j+w]$ with $k \neq j$ do
            $E \leftarrow E \cup \{(c_j, c_k), (c_k, c_j)\}$
        end for
    end for
    $E \leftarrow E \cup \mathrm{BuildEdge}(Q, Q) \cup \mathrm{BuildEdge}(Q, C)$
    for $edge \in E$ do
        $(p_1, p_2) \leftarrow edge$
        $N \leftarrow \mathrm{NegSample}(\mathrm{typeof}(p_2))$
        $L_G \leftarrow L_G + \mathrm{SoftmaxLoss}(p_1, p_2, N)$
    end for
Algorithm 2 Pseudo-code of BuildEdge
Input: user click/query sequences $S_1$, $S_2$; time mapping $\mathrm{TIME}(\cdot)$; category mapping $\mathrm{CAT}(\cdot)$; time span $T$
Output: built edge set $E$
    Initialize an empty edge set $E \leftarrow \emptyset$
    for $i_1 \in S_1$ do
        for $i_2 \in S_2$ do
            if $0 < \mathrm{TIME}(i_2) - \mathrm{TIME}(i_1) < T$ then
                if $\mathrm{CAT}(i_1) \cap \mathrm{CAT}(i_2) \neq \emptyset$ then
                    $E \leftarrow E \cup \{(i_1, i_2), (i_2, i_1)\}$
                end if
            end if
        end for
    end for
    return $E$
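The edge-construction part of Algorithms 1 and 2 can be sketched in plain Python as follows. This is an illustration under assumptions, not the production pipeline: each behavior event is a hypothetical `(node_id, timestamp, categories)` tuple, so the `TIME()` and `CAT()` mappings become tuple fields; the click window covers positions `j - w` to `j + w` excluding `j` itself; and the time test follows Algorithm 2 literally, requiring the second entity to occur strictly after the first and less than `span` later.

```python
def item_window_edges(clicks, w):
    """i2i edges from Algorithm 1: link each click to every click within distance w."""
    edges = set()
    for j, (c_j, _, _) in enumerate(clicks):
        for k in range(max(0, j - w), min(len(clicks), j + w + 1)):
            if k != j:
                c_k = clicks[k][0]
                edges.add((c_j, c_k))
                edges.add((c_k, c_j))
    return edges


def build_edge(s1, s2, span):
    """BuildEdge from Algorithm 2: link e1 in s1 and e2 in s2 when e2 happens
    strictly after e1 but less than `span` later, and their category sets overlap."""
    edges = set()
    for id1, t1, cats1 in s1:
        for id2, t2, cats2 in s2:
            if 0 < t2 - t1 < span and cats1 & cats2:
                edges.add((id1, id2))
                edges.add((id2, id1))
    return edges


def build_graph(clicks, queries, w, span):
    """Edge set E of Algorithm 1: i2i window edges plus q2q and q2i edges."""
    return (item_window_edges(clicks, w)
            | build_edge(queries, queries, span)
            | build_edge(queries, clicks, span))
```

Because every edge is derived from one user's sequence inside the training sample, nothing has to be stored or looked up in a separate graph store, which is the point of the light-weight sampling.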

Here $e_a$ denotes the anchor, $e_p$ the positive sample, and $N$ the set of negative samples. The graph learning loss $L_G$ can be split into $L_{i2i}$, $L_{q2q}$, and $L_{q2i}$ according to the kind of edge from which the positive pair is extracted.
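As a minimal sketch of the per-edge loss $L_G$, assume embeddings are plain lists of floats and that the negatives for the edge have already been drawn. The name `softmax_loss` mirrors `SoftmaxLoss` in Algorithm 1, but taking embeddings rather than node IDs is a simplification made here. Subtracting the largest logit before exponentiating (the log-sum-exp trick) leaves the value unchanged while avoiding overflow.

```python
import math


def dot(u, v):
    """Inner product e_a . e_x used as the logit."""
    return sum(a * b for a, b in zip(u, v))


def softmax_loss(anchor, positive, negatives):
    """Per-edge graph loss L_G = -log(exp(a.p) / (exp(a.p) + sum_n exp(a.n))).

    anchor, positive: embeddings e_a and e_p; negatives: embeddings e_n in N."""
    logits = [dot(anchor, positive)] + [dot(anchor, n) for n in negatives]
    m = max(logits)  # shift by the max logit so exp() cannot overflow
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

A positive that is better aligned with the anchor than the negatives yields a smaller loss, and with no negatives the loss is exactly zero.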

Model performance improves as the number of negative samples grows. In practice, we use 100 negative samples as a compromise between performance and efficiency. To support sampling for both queries and items on our streaming machine learning platform, we dynamically maintain a query queue and an item queue, each of size 1 million, for cross-batch negative sampling.
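One type-specific negative pool (one instance for queries, one for items) might look like the sketch below. The source only gives the queue size and the sample count; the ring-buffer layout, uniform sampling with replacement, and rejection of the positive ID are assumptions, and `NegativeQueue` is a hypothetical name.

```python
import random


class NegativeQueue:
    """Fixed-size FIFO pool of recently seen IDs of one node type (query or item)."""

    def __init__(self, capacity=1_000_000):
        self.capacity = capacity
        self.slots = []      # ring-buffer storage, grows until `capacity`
        self.next_slot = 0   # slot overwritten next once the buffer is full

    def push_batch(self, ids):
        """Enqueue the IDs seen in the current mini-batch, evicting the oldest ones."""
        for node_id in ids:
            if len(self.slots) < self.capacity:
                self.slots.append(node_id)
            else:
                self.slots[self.next_slot] = node_id
            self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, n, exclude=None, rng=random, max_tries_per_draw=10):
        """Draw up to n negatives uniformly with replacement, rejecting `exclude`."""
        negatives = []
        tries = 0
        while len(negatives) < n and self.slots and tries < n * max_tries_per_draw:
            tries += 1
            candidate = self.slots[rng.randrange(len(self.slots))]
            if candidate != exclude:
                negatives.append(candidate)
        return negatives
```

Under these assumptions, `NegSample(typeof(p_2))` in Algorithm 1 would correspond to calling `sample(100, exclude=p_2)` on the queue that matches the node type of `p_2`. A list-backed ring buffer keeps random access O(1), which matters at a capacity of one million.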

3.3. Multi-interest Network for CTR Prediction

Powered by the query-item heterogeneous graph, which jointly learns query and item embeddings, we design a CTR prediction network, the multi-interest network, that uses various user interest measurements based on the i2i, q2q, and q2i relationships. The network is composed of an item interest unit, a query interest unit, a query-item compatibility unit, and a CTR prediction layer.

Item interest unit. Based on the graph embedding, we first compute the cosine similarity between the target item $i_t$ and the historical items $I=\{i_j\}, j=1,2,\cdots,n$:

$$sim(i_{t},i_{j})=\frac{e_{i_{t}}^{T}e_{i_{j}}}{\|e_{i_{t}}\|\cdot\|e_{i_{j}}\|}$$

Then, we use top-k retrieval to get the k historical items with the largest similarity scores:

$$i'_{1},i'_{2},\cdots,i'_{k}=\mathrm{topk}(sim(i_{t},I))$$

We adopt the equal-width interval binning method (Dougherty et al., 1995) to transform $sim(i_t, i'_k)$ into a categorical feature and obtain its embedding. To capture the order information of the top-k items, we add a positional embedding, determined by each top-k item's original index, to the binning embedding to obtain the item representations:

$$e_{i'_{k}}=\mathrm{binning}(sim(i_{t},i'_{k}))+\mathrm{positional}(i'_{k})$$

Finally, we concatenate k item representations to form the i2i feature, which is then fed into the CTR prediction network.

$$f_{i2i}=\mathrm{concate}(e_{i'_{1}},e_{i'_{2}},\cdots,e_{i'_{k}})$$

We summarize the process described in this unit as the function $\mathrm{SimExtract}(target, sequence)$, which gives a comprehensive representation of the similarity between $target$ and $sequence$. Therefore, the i2i feature can also be written as:

$$f_{i2i}=\mathrm{SimExtract}(i_{t},I)$$
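The SimExtract pipeline (cosine similarity, top-k retrieval, equal-width binning, positional embedding, concatenation) can be sketched in plain Python. Assumptions, not details from the source: similarities are binned over the range [-1, 1]; `bin_table` and `pos_table` are hypothetical embedding tables (lists of vectors) learned elsewhere; and the output is zero-padded to k vectors when the sequence is shorter than k. Because the graph loss scores query-item pairs by dot product, queries and items live in a common embedding space, so the same function can produce $f_{i2i}$, $f_{q2q}$, and $f_{q2i}$.

```python
import math


def cosine(u, v):
    """Cosine similarity sim(a, b) between two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def equal_width_bin(x, num_bins, low=-1.0, high=1.0):
    """Index of the equal-width bucket of [low, high] that contains x."""
    width = (high - low) / num_bins
    index = int((x - low) / width)
    return min(max(index, 0), num_bins - 1)  # clamp, so x == high lands in the last bin


def sim_extract(target, sequence, k, bin_table, pos_table):
    """Sketch of SimExtract(target, sequence) built from hypothetical tables.

    bin_table[b] is the embedding of similarity bucket b and pos_table[j] the
    positional embedding of original sequence index j."""
    sims = [cosine(target, e) for e in sequence]
    # top-k retrieval: indices of the k most similar sequence elements
    top = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:k]
    dim = len(bin_table[0])
    feature = []
    for j in top:
        binned = bin_table[equal_width_bin(sims[j], len(bin_table))]
        feature.extend(b + p for b, p in zip(binned, pos_table[j]))  # binning + positional
    feature.extend([0.0] * (k * dim - len(feature)))  # zero-pad short sequences (assumption)
    return feature
```

Note that only binned similarities and positions enter the feature, not the raw embeddings of the retrieved items; the fixed-length output makes it straightforward to concatenate with other features.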

Query interest unit. In addition to the item2item relationship, which indicates whether the target item is similar to the historical items, we apply the same logic to the query sequence to describe the relationships between queries, which reveal users' explicit search interests.

Following the item interest unit, given the current query $q_t$ and the query sequence $Q=\{q_k\}, k=1,2,\cdots,m$, we obtain the q2q feature through top-k retrieval, binning, and positional embedding:

$$f_{q2q}=\mathrm{SimExtract}(q_{t},Q)$$

Query-item compatibility unit. Relationships between the current query and historical items supplement the purely item-based or query-based interest representations and further exploit the query-item correlation captured by the heterogeneous graph structure. We call this relationship query-item compatibility because it describes how well the current query matches the items the user is interested in.

Here we adopt the same method to produce the compatibility information given the current query $q_t$ and the historical click sequence $I$:

$$f_{q2i}=\mathrm{SimExtract}(q_{t},I)$$

CTR prediction layer. We concatenate the aforementioned features $f_{i2i}$, $f_{q2q}$, $f_{q2i}$ and other features $f_o$, then feed them into an MLP for the final CTR prediction:

$$\mathrm{pctr}=\mathrm{sigmoid}(\mathrm{MLP}(\mathrm{concate}(f_{i2i},f_{q2q},f_{q2i},f_{o})))$$

The objective function of the multi-interest network is the cross entropy loss function as follows:

$$L_{CTR}=-\frac{1}{N}\sum_{i=1}^{N}\left[y_{i}\log(\mathrm{pctr}_{i})+(1-y_{i})\log(1-\mathrm{pctr}_{i})\right]$$

where $N$ is the total number of samples, $y_i$ is the ground-truth label of the $i$-th sample, and $\mathrm{pctr}_i$ is EGIN's CTR prediction for the $i$-th sample.
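The CTR prediction layer and its loss can be sketched in a few lines of NumPy. This is a minimal forward-pass illustration that assumes the four feature groups are already dense `(N, d)` arrays; the MLP shape (ReLU hidden layers, one output unit) and the clipping constant are assumptions, since they are not specified here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp(x, layers):
    """Feed-forward net given as [(W, b), ...]: ReLU on hidden layers, linear output of size 1."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x[:, 0]

def predict_ctr(f_i2i, f_q2q, f_q2i, f_o, layers):
    """pctr = sigmoid(MLP(concat(f_i2i, f_q2q, f_q2i, f_o)))."""
    x = np.concatenate([f_i2i, f_q2q, f_q2i, f_o], axis=1)  # (N, D)
    return sigmoid(mlp(x, layers))

def ctr_loss(pctr, y, eps=1e-7):
    """Mean binary cross-entropy L_CTR over the N samples."""
    p = np.clip(pctr, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

With all-zero weights every prediction is 0.5 and the loss equals $\log 2$, which is a convenient sanity check.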

3.4. End-to-End Joint Training

In most previous works, graph embedding learning takes a different input format from the CTR prediction task. These methods also usually rely on graph engines to store and sample graphs. Both problems make end-to-end joint training of the two tasks difficult. In the previous sections, we introduced two key features of EGIN: (i) light-weight graph sampling and (ii) a unified input for graph learning and CTR prediction. Thanks to these two properties, stream-style end-to-end training is straightforward to implement in both offline and online scenarios.

As shown in Figure 1, both tasks receive the same raw input sampled from search impression logs, and the training objectives of the different parts of EGIN are optimized jointly:

$$L = L_{CTR} + \alpha L_{i2i} + \beta L_{q2q} + \gamma L_{q2i} \qquad (1)$$
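The effect of Eq. (1), namely that one embedding table receives gradients from both the CTR loss and the graph losses, can be illustrated with a toy NumPy sketch. The CTR head (a linear logit on the target item's embedding) and the graph loss (logistic loss on the dot product of positive or negative node pairs) are simplified placeholders, not EGIN's actual networks, and a single graph term weighted by `alpha` stands in for the three weighted terms.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, y):
    """Mean binary cross-entropy."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def joint_loss_and_grads(emb, w, ctr_ids, ctr_y, u, v, pair_y, alpha):
    """Toy version of L = L_CTR + alpha * L_graph with ONE shared embedding table.

    emb:            (V, d) embedding table read by both tasks
    w:              (d,)   weights of a toy CTR head, logit = <emb[item], w>
    ctr_ids, ctr_y: target item ids and click labels for the CTR task
    u, v, pair_y:   node-id pairs and 1/0 labels (positive / negative) for the graph task
    Returns (loss, dL/d emb, dL/d w).
    """
    g_emb = np.zeros_like(emb)

    # CTR task: logistic regression on the target item's embedding.
    x = emb[ctr_ids]
    p = sigmoid(x @ w)
    d_ctr = (p - ctr_y) / len(ctr_ids)          # dL_CTR / d logit
    g_w = x.T @ d_ctr
    np.add.at(g_emb, ctr_ids, d_ctr[:, None] * w[None, :])

    # Graph task: logistic loss on the dot product of each node pair.
    q = sigmoid(np.sum(emb[u] * emb[v], axis=1))
    d_g = alpha * (q - pair_y) / len(u)         # alpha * dL_graph / d logit
    np.add.at(g_emb, u, d_g[:, None] * emb[v])  # np.add.at accumulates repeated ids
    np.add.at(g_emb, v, d_g[:, None] * emb[u])

    loss = bce(p, ctr_y) + alpha * bce(q, pair_y)
    return loss, g_emb, g_w
```

A training loop would subtract `lr * g_emb` and `lr * g_w` after each call; in practice an autodiff framework computes these gradients, and the manual version only makes visible that both tasks update the same rows of `emb`.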

Compared with two-stage methods that first train graph embeddings and then use them in CTR prediction, the end-to-end framework has several advantages. It avoids frequently saving and loading graph embeddings, which better guarantees consistency between the two tasks. Because gradients are backpropagated from both tasks, the graph embedding is trained jointly, closing the gap between graph learning and the final CTR prediction objective.

4. Experiments

In this section, we conduct extensive experiments to demonstrate the effectiveness of the proposed method. First, we evaluate the performance of different methods on public and industrial datasets. Second, we discuss our training details and their influence. Third, we analyze the learned graph embeddings through visualization and statistics. Finally, we report our online A/B test results.

4.1. Datasets

Model comparisons are conducted on a commonly used public Taobao dataset (Zhu et al., 2018) and an industrial dataset collected from our e-commerce system. Table 1 shows the statistics of both datasets.

Taobao Dataset (https://tianchi.aliyun.com/dataset/649) is a collection of user behaviors from Taobao's recommender system. It contains four types of user behavior: click, purchase, adding to cart, and adding to wishlist. We take each user's click behaviors and sort them by time to construct the behavior sequence. Assuming a user has $T$ behaviors, we use the first $T-1$ clicked items as features to predict whether the user will click the $T$-th item. The behavior sequence is truncated at length 200.
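The Taobao sample construction described above can be sketched as follows. The sketch assumes raw records of the form `(user_id, item_id, behavior, timestamp)` with clicks encoded as `"pv"` (an assumption about the raw file; change `click_behavior` if it differs), and that truncation keeps the most recent 200 items. Only the positive (last-click) example is built; how non-click examples are generated is not described here.

```python
from collections import defaultdict

def build_samples(events, click_behavior="pv", max_len=200):
    """Build (user, history, target) samples from raw behavior records.

    events: iterable of (user_id, item_id, behavior, timestamp) tuples.
    click_behavior: label of the click behavior in the raw log (assumed "pv").
    """
    clicks = defaultdict(list)
    for user, item, behavior, ts in events:
        if behavior == click_behavior:        # keep click behaviors only
            clicks[user].append((ts, item))
    samples = []
    for user, seq in clicks.items():
        seq.sort(key=lambda e: e[0])          # chronological order
        items = [item for _, item in seq]
        if len(items) < 2:                    # need at least one history item
            continue
        history, target = items[:-1], items[-1]  # first T-1 clicks predict the T-th
        samples.append((user, history[-max_len:], target))  # keep most recent max_len
    return samples
```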

Industrial Dataset is collected from the online search system of our e-commerce platform, one of the largest in the world. Samples are constructed from the impression logs of the search result page. Each instance contains the user's historical click sequence and search query sequence with timestamps, as well as other user-side and item-side features, and is labeled as click or non-click. These rich features allow us to build the query-item heterogeneous graph from the collected industrial dataset. The dataset consists of training samples from the past 25 days and test samples from the following day, a classic setting for industrial modeling. The click and query sequences are truncated at length 100.
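The day-based split and sequence truncation can be expressed as a short sketch. The `(day, sample)` record layout and keeping the most recent entries when truncating are assumptions; the text only states the 25-day/1-day split and the length limit of 100.

```python
from datetime import date, timedelta

def split_by_day(records, test_day: date, train_days: int = 25):
    """Train on the `train_days` days before `test_day`, test on `test_day`.

    records: iterable of (day, sample) pairs, where `day` is a datetime.date.
    """
    start = test_day - timedelta(days=train_days)
    records = list(records)
    train = [s for d, s in records if start <= d < test_day]
    test = [s for d, s in records if d == test_day]
    return train, test

def truncate(seq, max_len=100):
    """Keep the most recent `max_len` entries of a time-ordered sequence."""
    return seq[-max_len:]
```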

Table 1. Statistics of the datasets used in this paper

| Dataset | Samples | Items | Queries |
|---|---|---|---|
| Taobao | 100 million | 4.16 million | N/A |
| Industrial | 11.2 billion | 473 million | 228 million |

4.2. Compared Methods

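We compare EGIN with mainstream CTR prediction methods of three types: pooling-based, attention-based, and graph-based. AUC is adopted as the performance metric, as it reflects the ranking ability of the model. RelaImpr in Tables 2 and 3 is the relative AUC improvement over DNN-cross, computed as in DIN (Zhou et al., 2018): $\mathrm{RelaImpr} = \left(\frac{\mathrm{AUC}(\text{model}) - 0.5}{\mathrm{AUC}(\text{DNN-cross}) - 0.5} - 1\right) \times 100\%$. All methods use the same training and test data.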
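For reference, AUC and the RelaImpr column of Tables 2 and 3 can be computed as in the sketch below. The AUC uses the rank-sum (Mann-Whitney) formulation with average ranks for ties; `relaimpr` follows DIN's definition relative to a base model, which reproduces the reported values when DNN-cross is the base (e.g., 0.8833 vs. 0.8715 gives 3.18%). Function names are illustrative.

```python
import numpy as np

def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation; ties get average ranks."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        i = j + 1
    n_pos = int((labels == 1).sum())
    n_neg = len(labels) - n_pos
    return float((ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def relaimpr(auc_model, auc_base):
    """Relative AUC improvement over a base model, in percent (DIN's RelaImpr)."""
    return ((auc_model - 0.5) / (auc_base - 0.5) - 1.0) * 100.0
```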

EGIN uses the following settings in the experiments. The window size for graph construction is set to 2, and top-10 retrieval is used in the multi-interest network. The negative sampling queue holds 1 million entries, and the number of negative samples is set to 100. All embeddings in both the graph and the CTR prediction network have a dimension of 10.
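The window size and the negative sampling queue can be illustrated with a hypothetical sketch: `window_pairs` emits co-occurrence pairs within a distance of `WINDOW` in a time-ordered sequence of entity IDs (items or queries), and `NegativeQueue` keeps a fixed-capacity FIFO pool of recently seen IDs from which negatives are drawn. How EGIN actually forms edges and draws negatives is described earlier in the paper; the FIFO eviction, uniform sampling with replacement, and all names here are assumptions.

```python
import random

WINDOW = 2               # window size used for graph construction
QUEUE_SIZE = 1_000_000   # capacity of the negative sampling queue
NUM_NEG = 100            # number of negative samples

def window_pairs(seq, window=WINDOW):
    """Undirected pairs (seq[i], seq[j]) with 0 < j - i <= window."""
    pairs = []
    for i, a in enumerate(seq):
        for j in range(i + 1, min(i + window + 1, len(seq))):
            pairs.append((a, seq[j]))
    return pairs

class NegativeQueue:
    """Fixed-capacity FIFO pool (ring buffer): once full, the oldest id is overwritten."""

    def __init__(self, capacity=QUEUE_SIZE, seed=0):
        self.capacity = capacity
        self.buf = []
        self.head = 0  # slot holding the oldest id once the buffer is full
        self.rng = random.Random(seed)

    def push(self, ids):
        for x in ids:
            if len(self.buf) < self.capacity:
                self.buf.append(x)
            else:
                self.buf[self.head] = x
                self.head = (self.head + 1) % self.capacity

    def sample(self, k=NUM_NEG):
        # Uniform sampling with replacement; O(1) per draw on a Python list.
        return self.rng.choices(self.buf, k=k)
```

A ring buffer is used instead of `collections.deque` because random indexing into the middle of a deque is linear-time, which matters at a capacity of 1 million.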

  • DNN. Sum pooling is applied to the user behavior sequence to summarize the user's historical interests. The result is concatenated with target item features, user features, and query features, and fed into an MLP to produce the CTR prediction.

  • DNN-cross. Compared with DNN, the Cartesian product between historical items and the target item is used to represent i2i relationships, and this second-order feature is fed into the sum-pooling layer to generate the user interest representation. We call this variant "DNN-cross"; by default, the model comparisons below use it as the baseline.

  • DIN (Zhou et al., 2018). DIN introduces an attention mechanism to assign different weights to historical items based on their relationships with the target item to learn the representation of user interests adaptively.

  • BST (Chen et al., 2019). BST uses a Transformer to capture the underlying sequential signals in user behavior sequences for better recommendation.

  • EGES (Wang et al., 2018). EGES constructs an item graph from user behaviors and then applies DeepWalk to learn item representations. To alleviate sparsity and cold-start problems, side information is incorporated into the graph to enhance the embedding procedure. We feed the graph embeddings produced by EGES into our CTR prediction network to evaluate its performance.
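The difference between the two pooling-based baselines can be sketched as follows. `sum_pool` is the DNN interest representation; `dnn_cross_pool` builds one crossed feature per (historical item, target item) pair and sum-pools their embeddings. Hashing each pair into a fixed number of buckets is an assumption made here for illustration; the text does not say how the Cartesian-product feature IDs are encoded. Concatenation with user, query, and target features and the final MLP are omitted.

```python
import numpy as np

def sum_pool(item_table, hist_ids):
    """DNN: user interest = sum of the embeddings of historical items."""
    return item_table[np.asarray(hist_ids)].sum(axis=0)

def cross_ids(hist_ids, target_id, num_buckets):
    """Hypothetical encoding of the (historical item, target item) Cartesian
    product: hash each pair into one of `num_buckets` feature ids."""
    return np.array([hash((int(h), int(target_id))) % num_buckets for h in hist_ids])

def dnn_cross_pool(cross_table, hist_ids, target_id):
    """DNN-cross: sum-pool the embeddings of the second-order (i2i) cross features."""
    return cross_table[cross_ids(hist_ids, target_id, len(cross_table))].sum(axis=0)
```

Python's hash of a tuple of integers is deterministic across runs, so the same pair always maps to the same bucket.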

4.3. Results on Public Dataset

Table 2. Results on the Taobao dataset

| Method | AUC | RelaImpr |
|---|---|---|
| DNN | 0.8709 | -0.16% |
| DNN-cross | 0.8715 | 0.00% |
| DIN | 0.8833 | 3.18% |
| BST | 0.9040 | 8.75% |
| EGIN (ours) | **0.9184** | **12.62%** |

Because the Taobao dataset comes from a recommendation scenario and contains no queries, we adapt our framework by keeping only item nodes and edges in the query-item heterogeneous graph and removing query-related features from the multi-interest network.

Table 2 shows that the proposed framework outperforms its competitors on this commonly used CTR prediction benchmark, achieving a clearly better AUC than both pooling-based and attention-based baselines (0.9184 vs. 0.9040 for the strongest baseline, BST). The result also suggests that our approach is a general CTR prediction framework that can be applied to recommendation scenarios.

4.4. Results on Industrial Dataset

Table 3. Results on the industrial dataset

| Category | Method | AUC | RelaImpr |
|---|---|---|---|
| Pooling-based | DNN | 0.7011 | -0.30% |
| Pooling-based | DNN-cross | 0.7017 | 0.00% |
| Attention-based | DIN | 0.7022 | 0.25% |
| Attention-based | BST | 0.7042 | 1.24% |
| Graph-based | EGES | 0.7079 | 3.07% |
| Graph-based | EGIN (ours) | **0.7108** | **4.51%** |

Table 3 shows the results on our industrial dataset. The proposed approach achieves a clear performance gain over the pooling-based DNN methods. The comparison with the attention-based DIN and BST shows that incorporating graph learning provides extra information to the CTR prediction model and improves its performance. Compared with EGES, another graph-based method that uses item co-occurrence and side information, our end-to-end framework achieves a higher AUC (0.7108 vs. 0.7079), which supports the effectiveness of our query-item heterogeneous graph. Note that graph learning adds only a small computational cost on top of the main CTR prediction task.

4.5. Ablation Study

Table 4. Ablation study

| Method | AUC | AUC Diff (%) |
|---|---|---|
| EGIN | 0.7108 | - |
| EGIN w/o graph | 0.7032 | -0.76% |
| EGIN w/o query | 0.7081 | -0.27% |
| EGIN w/o pos_emb | 0.7087 | -0.21% |

The results of the ablation study on the industrial dataset are shown in Table 4. EGIN w/o graph removes the graph model and uses embeddings produced by the CTR prediction network instead of the jointly learned graph embeddings. The resulting AUC drop (from 0.7108 to 0.7032) indicates that the graph model provides substantial extra information to the CTR network. EGIN w/o query restricts the graph structure to the historical click sequence, which lowers AUC to 0.7081; this implies that the query-item heterogeneous graph enhances CTR prediction by exploiting the information provided by queries. EGIN w/o pos_emb removes the positional embedding from the multi-interest network, so order information is lost during top-k retrieval. The resulting AUC drop of 0.21% indicates that this component contributes to the final performance.

4.6. Graph Input Data Management

4.6.1. Subsampling of Frequent Entities

Because our graph model shares the same input format as the CTR prediction network, it is also exposed to highly frequent items and queries. In our industrial dataset, item frequencies can vary over a range of $10^4$. In our general practice, subsampling of frequent entities helps improve performance in various search-related tasks. To verify the influence of this technique, we conduct a series of experiments on our industrial dataset with different subsampling functions and subsampling targets (e.g., the original behavior sequence or the negative sample pool). In these experiments, subsampling yields a CTR AUC of 0.7109, compared with 0.7108 without subsampling. The performance gap between the different settings is within $10^{-4}$, which is negligible.
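The specific subsampling functions tried are not listed here. As one common choice, the sketch below applies the frequent-word subsampling rule from word2vec (Mikolov et al., 2013b), keeping an entity with probability $\min(1, \sqrt{t / f(w)})$ where $f(w)$ is its relative frequency, to a behavior sequence, and builds a count$^{0.75}$ smoothed distribution that could be used when drawing from a negative sample pool. The threshold `t = 1e-5` and all function names are illustrative assumptions.

```python
import numpy as np

def keep_probability(rel_freq, t=1e-5):
    """word2vec-style subsampling: keep prob = min(1, sqrt(t / f)), f = relative frequency."""
    f = np.asarray(rel_freq, dtype=float)
    return np.minimum(1.0, np.sqrt(t / f))

def subsample(seq, rel_freq, t=1e-5, rng=None):
    """Randomly drop frequent entities from a behavior sequence (order preserved)."""
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(len(seq)) < keep_probability([rel_freq[x] for x in seq], t)
    return [x for x, k in zip(seq, keep) if k]

def negative_distribution(counts, power=0.75):
    """Smoothed unigram distribution for drawing negatives: p(w) proportional to count(w)**power."""
    c = np.asarray(counts, dtype=float) ** power
    return c / c.sum()
```

For example, an entity with relative frequency $10^{-3}$ is kept with probability 0.1 when $t = 10^{-5}$, while entities rarer than $t$ are always kept.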

4.6.2. Combination of Categorically Restricted Data

Figure 3. CTR prediction performance on the industrial dataset for different mixtures of user click sequence and seeds sequence as input to graph learning. While the click sequence contains more inter-category transitions, the seeds sequence concentrates more on items matching the search intention. A mixture of 80% click sequence and 20% seeds sequence achieves the best result in our experiment.

In our search system, a variant of the user click sequence is available, which we call the seeds sequence. The seeds sequence is obtained by filtering out items in the user behavior sequence whose categories differ from the predicted category intention of the current query. While the click sequence offers more information about inter-category transitions of user interest, the seeds sequence concentrates more on intra-category similarity. In our query-item heterogeneous graph, the item2item relationship is expected to capture both types of information. Therefore, we test different mixture rates of the click sequence and the seeds sequence when building the item2item relationship. As shown in Figure 3, combining 80% click sequence with 20% seeds sequence achieves the best CTR AUC on the industrial dataset. We use this proportion in all other experiments.
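The seeds filter and the 80/20 mixture can be sketched as below. The granularity at which the two sequences are mixed is not specified; this hypothetical sketch assumes a per-sample choice, using the click sequence with probability 0.8 and the seeds sequence otherwise, and assumes item categories and the query's predicted category are available as plain lookups.

```python
import random

def seeds_sequence(click_seq, item_category, query_category):
    """Keep clicked items whose category matches the query's predicted category."""
    return [i for i in click_seq if item_category.get(i) == query_category]

def pick_graph_sequence(click_seq, seeds_seq, click_ratio=0.8, rng=random):
    """Hypothetical per-sample mixing: return the click sequence with probability
    `click_ratio`, otherwise the seeds sequence, as input to item2item edges."""
    return click_seq if rng.random() < click_ratio else seeds_seq
```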

4.7. Analysis of EGIN Embedding

Figure 4. Graph embedding similarity matrix of several randomly selected item pairs from our industrial dataset.

Table 5. Average cosine similarity of EGIN, EGES, and DNN embeddings

| Average similarity | EGIN emb | EGES emb | DNN emb |
|---|---|---|---|
| Intra-category | 0.6392 | 0.2973 | 0.0121 |
| Inter-category | 0.0061 | 0.0121 | 0.0076 |

To analyze the embeddings learned by EGIN, we randomly choose four pairs of items from the industrial dataset and examine their similarity. As shown in Figure 4, the similarity within each pair is markedly higher than between items from different pairs. Although their items belong to different sub-categories, the pairs (female t-shirt, female pants) and (spicy fish, pickles) have similarities of 0.214 and 0.440, respectively, suggesting that the embeddings learned by EGIN capture various kinds of similarity beyond sub-category membership.

In addition, we randomly sample 1 million intra-category and inter-category item pairs from our industrial dataset and compute the average cosine similarity of the embeddings produced by each method. Table 5 shows that for EGIN the intra-category similarity (0.6392) is much higher than the inter-category similarity (0.0061), whereas for DNN embeddings the two are nearly indistinguishable (0.0121 vs. 0.0076). Our graph embedding therefore appears more capable of capturing relationships between similar items than both EGES and DNN embeddings.
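The statistic in Table 5 can be computed along the lines of the sketch below: sample random item pairs, keep those that share a category (intra) or do not (inter), and average the cosine similarity of their embeddings. Rejection sampling and the function name are illustrative assumptions; at the scale of 1 million pairs a vectorized sampler would be preferable.

```python
import numpy as np

def average_pair_similarity(emb, category, num_pairs, same_category, seed=0):
    """Mean cosine similarity over random item pairs that share (or do not share)
    a category. emb: (n, d) array; category: (n,) array of category ids.
    Loops forever if no pair of the requested type exists."""
    rng = np.random.default_rng(seed)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # row-normalize once
    sims = []
    while len(sims) < num_pairs:
        a, b = rng.integers(0, len(emb), size=2)
        if a == b or (category[a] == category[b]) != same_category:
            continue  # reject pairs that are not of the requested type
        sims.append(float(unit[a] @ unit[b]))
    return float(np.mean(sims))
```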

4.8. Online A/B Test

We conduct online experiments in an A/B testing framework to further evaluate EGIN. The evaluation metric is the click-through rate (CTR) on the search result page of our e-commerce platform. Over a 14-day experiment, EGIN achieves a 2.76% CTR gain over the production model, demonstrating the practical value of the proposed approach.

5. Conclusion

In this paper, we propose a novel approach named EGIN for CTR prediction in e-commerce search scenarios. End-to-end joint training of graph embedding and CTR prediction is implemented in a light-weight framework, without reorganizing graph data or depending on graph engines. Our query-item heterogeneous graph exploits the impression log data of search systems to take query-item correlation and sequential information into account, improving CTR prediction performance. The multi-interest network for CTR prediction is designed to make comprehensive use of the information carried by the item and query embeddings produced by the graph neural network. Offline experiments on public and industrial datasets demonstrate that EGIN outperforms attention-based and graph-based competitors. The online A/B test result demonstrates the practical value of the proposed approach.

Our work is an initial step towards a highly efficient heterogeneous graph learning method that captures various kinds of similarity information for CTR prediction in e-commerce search. The proposed framework has a low computational cost and no dependency on a graph engine, which makes it easy to implement in large-scale systems. Moreover, our method can be generalized to other search and recommendation systems, where it can serve as a plug-in for CTR prediction by providing high-quality embeddings. In the future, we will integrate other types of nodes into the heterogeneous graph to enrich its structure and further improve performance. We will also explore better ways to integrate graph embedding information into the CTR prediction network.

References

  • Chen et al. (2019) Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, and Wenwu Ou. 2019. Behavior sequence transformer for e-commerce recommendation in alibaba. In Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data. 1–4.
  • Cheng et al. (2016) Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (Boston, MA, USA) (DLRS 2016). Association for Computing Machinery, New York, NY, USA, 7–10. https://doi.org/10.1145/2988450.2988454
  • Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems (Barcelona, Spain) (NIPS’16). Curran Associates Inc., Red Hook, NY, USA, 3844–3852.
  • Dougherty et al. (1995) James Dougherty, Ron Kohavi, and Mehran Sahami. 1995. Supervised and unsupervised discretization of continuous features. In Machine learning proceedings 1995. Elsevier, 194–202.
  • Feng et al. (2019) Yufei Feng, Fuyu Lv, Weichen Shen, Menghan Wang, Fei Sun, Yu Zhu, and Keping Yang. 2019. Deep Session Interest Network for Click-Through Rate Prediction. CoRR abs/1905.06482 (2019). arXiv:1905.06482 https://arxiv.org/abs/1905.06482
  • Guo et al. (2017) Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. CoRR abs/1703.04247 (2017). arXiv:1703.04247 https://arxiv.org/abs/1703.04247
  • Guo et al. (2021) Wei Guo, Rong Su, Renhao Tan, Huifeng Guo, Yingxue Zhang, Zhirong Liu, Ruiming Tang, and Xiuqiang He. 2021. Dual Graph Enhanced Embedding Neural Network for CTR Prediction. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (Virtual Event, Singapore) (KDD ’21). Association for Computing Machinery, New York, NY, USA, 496–504. https://doi.org/10.1145/3447548.3467384
  • Li et al. (2019b) Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Huan Zhao, Pipei Huang, Guoliang Kang, Qiwei Chen, Wei Li, and Dik Lun Lee. 2019b. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM ’19). Association for Computing Machinery, New York, NY, USA, 2615–2623. https://doi.org/10.1145/3357384.3357814
  • Li et al. (2019a) Feng Li, Zhenrui Chen, Pengjie Wang, Yi Ren, Di Zhang, and Xiaoyu Zhu. 2019a. Graph intention network for click-through rate prediction in sponsored search. In Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval. 961–964.
  • Lian et al. (2018) Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and Guangzhong Sun. 2018. XDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (London, United Kingdom) (KDD ’18). Association for Computing Machinery, New York, NY, USA, 1754–1763. https://doi.org/10.1145/3219819.3220023
  • Linmei et al. (2019) Hu Linmei, Tianchi Yang, Chuan Shi, Houye Ji, and Xiaoli Li. 2019. Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 4821–4830. https://doi.org/10.18653/v1/D19-1488
  • Mikolov et al. (2013a) Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
  • Mikolov et al. (2013b) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems 26 (2013).
  • Min et al. (2022) Erxue Min, Yu Rong, Tingyang Xu, Yatao Bian, Da Luo, Kangyi Lin, Junzhou Huang, Sophia Ananiadou, and Peilin Zhao. 2022. Neighbour Interaction Based Click-Through Rate Prediction via Graph-Masked Transformer. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (Madrid, Spain) (SIGIR ’22). Association for Computing Machinery, New York, NY, USA, 353–362. https://doi.org/10.1145/3477495.3532031
  • Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 701–710.
  • Pi et al. (2019) Qi Pi, Weijie Bian, Guorui Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
  • Rendle (2010) Steffen Rendle. 2010. Factorization Machines. In 2010 IEEE International Conference on Data Mining. 995–1000. https://doi.org/10.1109/ICDM.2010.127
  • Tang and Wang (2018) Jiaxi Tang and Ke Wang. 2018. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (Marina Del Rey, CA, USA) (WSDM ’18). Association for Computing Machinery, New York, NY, USA, 565–573. https://doi.org/10.1145/3159652.3159656
  • Wang et al. (2018) Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 839–848.
  • Wang et al. (2017) Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & Cross Network for Ad Click Predictions. In Proceedings of the ADKDD’17 (Halifax, NS, Canada) (ADKDD’17). Association for Computing Machinery, New York, NY, USA, Article 12, 7 pages. https://doi.org/10.1145/3124749.3124754
  • Wang et al. (2019) Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge Graph Attention Network for Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD ’19). Association for Computing Machinery, New York, NY, USA, 950–958. https://doi.org/10.1145/3292500.3330989
  • Yuan et al. (2019) Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose, and Xiangnan He. 2019. A Simple Convolutional Generative Network for Next Item Recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC, Australia) (WSDM ’19). Association for Computing Machinery, New York, NY, USA, 582–590. https://doi.org/10.1145/3289600.3290975
  • Zhou et al. (2019) Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33. 5941–5948.
  • Zhou et al. (2018) Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1059–1068.
  • Zhu et al. (2018) Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, and Kun Gai. 2018. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM.