Multi-intent-aware Session-based Recommendation

Min** Choi (Sungkyunkwan University, Republic of Korea), Hye-young Kim (Sungkyunkwan University, Republic of Korea), Hyunsouk Cho (Ajou University, Republic of Korea), and Jongwuk Lee (Sungkyunkwan University, Republic of Korea)

Session-based recommendation (SBR) aims to predict the next item a user will interact with during an ongoing session. Most existing SBR models focus on designing sophisticated neural encoders to learn a session representation that captures the relationships among session items. However, they tend to focus on the last item, neglecting the diverse user intents that may exist within a session. This limitation leads to significant performance drops, especially for longer sessions. To address this issue, we propose a novel SBR model, called Multi-intent-aware Session-based Recommendation Model (MiaSRec). It adopts frequency embedding vectors, which indicate how often each item occurs in the session, to enhance the information about repeated items. MiaSRec represents various user intents by deriving multiple session representations centered on each item and dynamically selecting the important ones. Extensive experimental results show that MiaSRec outperforms existing state-of-the-art SBR models on six datasets, particularly those with longer average session lengths, achieving gains of up to 6.27% and 24.56% for MRR@20 and Recall@20, respectively. Our code is available at**530/MiaSRec.

session-based recommendation; multiple intents
Journal year: 2024. Copyright: rights retained. Conference: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), July 14–18, 2024, Washington, DC, USA. DOI: 10.1145/3626772.3657928. ISBN: 979-8-4007-0431-4/24/07. CCS: Information systems → Recommender systems.

1. Introduction

Figure 1. A session example with multiple user intents, such as travel, fashion, sun protection, and photo. Dotted rectangles represent items related to each user intent, and solid rectangles represent recommendations for each user intent.

Session-based recommendation (SBR) (Jannach et al., 2017; Wang et al., 2022) aims to learn hidden user preferences in a session and provide personalized items for each user. A session refers to a sequence of user-item interactions over time, e.g., consecutive clicks on multiple products during a transaction. It is particularly effective for anonymous or first-time users in web applications like e-commerce and streaming services, e.g., Amazon, YouTube, Netflix, and Spotify (Linden et al., 2003; Covington et al., 2016; Gomez-Uribe and Hunt, 2016; Chen et al., 2018). SBR inherently suffers from extreme data sparsity because it only deals with user actions during an ongoing session, making it challenging to capture dynamic and intricate item correlations.

Existing SBR models (Hidasi et al., 2016a; Li et al., 2017; Hidasi and Karatzoglou, 2018; Wu et al., 2019; Pan et al., 2020; Wang et al., 2020; Kang and McAuley, 2018; Yuan et al., 2021) have primarily focused on extracting a single representation from a session to capture and express user preferences. They mainly aimed to model a session consisting of multiple items using various neural-based session encoders, including recurrent neural networks (RNNs) (Hidasi et al., 2016a; Li et al., 2017; Hidasi and Karatzoglou, 2018), graph neural networks (GNNs) (Wu et al., 2019; Pan et al., 2020; Wang et al., 2020), or Transformers (Kang and McAuley, 2018; Yuan et al., 2021). However, despite their advanced encoder designs, modeling only a single representation cannot express multiple user intents.

Figure 1 illustrates the importance of using multiple user intents. While the user may be interested in photo when focusing on the last item “camera”, looking at the entire session suggests that the user will click items about travel. Considering the other items in the session, fashion or sun protection also align with different user intents. In this scenario, it is more appropriate to recommend a top-$N$ item list with appropriate multiple intents, e.g., (“travel bag”, “sneakers”, and “photo frame”). On the other hand, in some sessions, not all items are important. For example, in Figure 1, “speaker” is less relevant to the other items for capturing user intents.

Recently, some SBR models, such as MSGIFSR (Guo et al., 2022a) and Atten-Mixer (Zhang et al., 2023), have attempted to capture multiple user intents, focusing primarily on the last few consecutive items to represent diverse user intents. However, they cannot accurately capture user intent if the last item is less important or noisy. Besides, some studies (Wang et al., 2019; Tan et al., 2021; Chen et al., 2021; Zhang et al., 2022) have attempted to identify multiple user interests over a long user-item history. They employ a fixed number of user interests and extract the same number of interests for all users. Since the number of user interests varies by user, this may miss some interests or include unnecessary ones. We identify two challenges in modeling multiple user intents in a session: (i) how to fully capture the multiple user intents inherent in each session and (ii) how to filter out the unimportant ones among them.

To address these issues, we propose a novel SBR model, called Multi-intent-aware Session-based Recommendation Model (MiaSRec), as shown in Figure 2. First, MiaSRec encodes the session items with position and frequency embeddings, reflecting sequential information and repeat patterns. Second, it employs a self-attention mechanism and a high-way network to derive different user intents in the session. Third, it adaptively extracts diverse user intents in the session. Lastly, MiaSRec decodes multiple session representations into item distributions and aggregates them using pooling functions. Despite its simplicity, extensive experiments demonstrate that MiaSRec outperforms existing SBR models on six benchmark datasets. Notably, MiaSRec achieves significant gains for longer sessions ($\geq 10$) with multiple user intents, up to 13.51% in Recall@20.

2. Proposed Model

Figure 2. The model architecture of MiaSRec.

2.1. Session-based Recommendation

Let $\mathcal{V}=\{v_{1},\dots,v_{n}\}$ denote a set of $n$ unique items, e.g., products and songs. An arbitrary session $s=(v_{t_{1}},\dots,v_{t_{|s|}})$ represents a sequence of $|s|$ items that a user interacts with, e.g., clicks, views, and purchases. Here, $t_{i}$ indicates the index of the $i$-th item in the session. Given a session, the goal of SBR is to predict the next item $v_{t_{|s|+1}}$ that the user is likely to consume. The SBR model takes a session as an input and returns a top-$N$ item list to recommend.

2.2. Model Architecture

In this section, we present a novel SBR model called MiaSRec, which aims to address (i) how to represent multiple user intents in the session and (ii) how to prune out unnecessary user intents.

2.2.1. Embedding Layer

We first embed each session item $v_{t_{i}}$ into item embedding vector $\mathbf{v}_{t_{i}}\in\mathbb{R}^{d}$, and generate the mean item embedding $\mathbf{v}_{m}=\frac{1}{|s|}\sum_{i=1}^{|s|}\mathbf{v}_{t_{i}}$, capturing the global session information. To better capture the importance of each item in a session, we incorporate the absolute position embedding vector $\mathbf{a}_{i}\in\mathbb{R}^{d}$ to distinguish sequential order and the frequency embedding vector $\mathbf{r}_{f_{i}}\in\mathbb{R}^{d}$ to express the importance of repeated items in a session. Here, $f_{i}$ denotes the frequency of the $i$-th item in a session. In Figure 2, given session $s=(v_{1},v_{3},v_{2},v_{3})$, the frequency is $(1,2,1,2)$. Finally, the item, position, and frequency embedding vectors are combined as the model input $\mathbf{x}_{i}$.

(1) $\mathbf{x}_{i}=\mathbf{v}_{t_{i}}+\mathbf{a}_{i}+\mathbf{r}_{f_{i}}\ \text{for}\ i\in\{1,\dots,|s|,m\}.$

Note that we sort session items in reverse order as in (Wang et al., 2020), so the last item $v_{t_{|s|}}$ always corresponds to the first positional embedding $\mathbf{a}_{1}$. We randomly initialize the learnable parameters of the position and frequency embeddings.
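As an illustration of the two auxiliary inputs, the following Python sketch derives the frequency and position indices for a session. The function name `frequency_and_position_ids` is hypothetical, and the sketch assumes that the reverse ordering can be realized by numbering positions from the end of the session instead of physically reversing the item list. How the mean token $m$ is indexed is not covered here.

```python
from collections import Counter


def frequency_and_position_ids(session):
    """Return (frequency ids, position ids) for a session given as a list of item ids."""
    counts = Counter(session)
    # f_i: how many times the i-th item occurs anywhere in the session.
    freq_ids = [counts[item] for item in session]
    # Positions are counted from the end, so the last item gets position 1 (a_1).
    n = len(session)
    pos_ids = [n - i for i in range(n)]
    return freq_ids, pos_ids


# The example session from Figure 2: s = (v1, v3, v2, v3).
freq_ids, pos_ids = frequency_and_position_ids(["v1", "v3", "v2", "v3"])
print(freq_ids)  # [1, 2, 1, 2]
print(pos_ids)   # [4, 3, 2, 1]
```

Each index would then select a row from a learnable embedding table, and the three looked-up vectors are summed element-wise as in Eq. (1).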

2.2.2. Multi-intent Representation

We employ the self-attention mechanism (Vaswani et al., 2017) to capture the complex relationships among session items. Using the bi-directional self-attention layer $\text{Self-attention}(\cdot)$, we generate multiple contextualized representations of the session information associated with each item as follows:

(2) $\mathbf{c}_{1},\dots,\mathbf{c}_{|s|},\mathbf{c}_{m}=\text{Self-attention}([\mathbf{x}_{1},\dots,\mathbf{x}_{|s|},\mathbf{x}_{m}]).$

To ensure that multiple contextualized representations do not become similar and to better reflect different user intent, we leverage the high-way network (Pan et al., 2020) by emphasizing item embeddings. Specifically, we combine contextualized representation $\mathbf{c}_{i}$ and item embedding $\mathbf{v}_{i}$ to derive user intent representation $\mathbf{o}_{i}\in\mathbb{R}^{d}$ that takes into account both the overall information of the session and the information of each item.

(3) $\mathbf{o}_{i}=\mathbf{g}\odot\mathbf{v}_{i}+(1-\mathbf{g})\odot\mathbf{c}_{i}\ \text{for}\ i\in\{1,\dots,|s|,m\},\quad\text{where}\ \mathbf{g}=\sigma(\mathbf{W_{g}}[\mathbf{v}_{i};\mathbf{c}_{i}]^{\top}),$

where $\mathbf{W_{g}}\in\mathbb{R}^{d\times 2d}$ is a learnable weight matrix, $\mathbf{g}\in\mathbb{R}^{d}$ is a gating vector, and $\sigma(\cdot)$ is the sigmoid function.
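The gating in Eq. (3) can be sketched in a few lines of plain Python. This is a minimal illustration on lists of floats, not the authors' implementation; the names `sigmoid` and `highway_gate` and the toy weight matrix are assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def highway_gate(v, c, W_g):
    """Compute o = g * v + (1 - g) * c with g = sigmoid(W_g [v; c]).

    v, c: lists of length d. W_g: d rows, each of length 2d.
    """
    concat = v + c  # list concatenation builds the 2d-dimensional vector [v; c]
    g = [sigmoid(sum(w * x for w, x in zip(row, concat))) for row in W_g]
    return [gi * vi + (1.0 - gi) * ci for gi, vi, ci in zip(g, v, c)]


# With an all-zero W_g every gate equals 0.5, so o is the midpoint of v and c.
v = [1.0, 0.0]
c = [0.0, 2.0]
W_zero = [[0.0] * 4 for _ in range(2)]
print(highway_gate(v, c, W_zero))  # [0.5, 1.0]
```

A gate value close to 1 in some dimension keeps the item embedding there, while a value close to 0 keeps the contextualized representation, which is how the gate can keep the per-item representations from collapsing toward one another.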

2.2.3. Intent Selection

We employ multiple session representations to fully exploit the potential of each session item. However, not all session items may be necessary, and some may be noisy. To extract essential user intents in a session, we calculate the importance of multiple representations and remove unimportant ones. We utilize a sparse transformation $\alpha$-entmax (Peters et al., 2019; Yuan et al., 2021), assigning a zero probability to unimportant representations.

(4) $\alpha\text{-entmax}(\mathbf{z})=\operatorname*{argmax}_{\mathbf{p}\in\Delta^{l}}\ \mathbf{p}^{\top}\mathbf{z}+H^{T}_{\alpha}(\mathbf{p}),$

where $H^{T}_{\alpha}(\mathbf{p})=\frac{1}{\alpha(\alpha-1)}\sum_{j}(\mathbf{p}_{j}-\mathbf{p}_{j}^{\alpha})$ if $\alpha\neq 1$, else $H^{T}_{1}(\mathbf{p})=-\sum_{j}\mathbf{p}_{j}\log\mathbf{p}_{j}$. (Footnote 1: $\Delta^{l}:=\{\mathbf{p}\in\mathbb{R}^{l}:\mathbf{p}\geq 0,\|\mathbf{p}\|_{1}=1\}$ denotes the $l$-probability simplex.) Depending on $\alpha$, 1-entmax and 2-entmax are equivalent to softmax and sparsemax (Martins and Astudillo, 2016), respectively. A larger $\alpha$ value generates a more sparse probability distribution, and we set $\alpha$ as 1.5 empirically. We extract session representation $\mathbf{h}_{i}\in\mathbb{R}^{d}$ by masking unnecessary user intents.

(5) $\boldsymbol{\gamma}=\alpha\text{-entmax}(\mathbf{w}\cdot[\mathbf{o}_{1};\dots;\mathbf{o}_{|s|};\mathbf{o}_{m}]^{\top}),\quad\{\mathbf{h}_{1},\dots,\mathbf{h}_{k}\}=\{\gamma_{i}\mathbf{o}_{i}\mid\gamma_{i}>0,\ i\in\{1,\dots,|s|,m\}\},$

where $\mathbf{w}\in\mathbb{R}^{d}$ is a learnable parameter and $\boldsymbol{\gamma}\in\mathbb{R}^{|s|+1}$ is the importance weight vector for each item. $k$ is the number of non-zero elements in $\boldsymbol{\gamma}$. Unlike previous multiple-representation models (Zhang et al., 2022; Chen et al., 2021; Guo et al., 2022a; Zhang et al., 2023), which utilize a fixed number of representations regardless of the session, MiaSRec dynamically selects multiple session representations for a given session, up to the number of session items.
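The selection step in Eq. (5) can be sketched in plain Python. The sketch relies on the closed form of $\alpha$-entmax given by Peters et al. (2019), $p_{i}=[(\alpha-1)z_{i}-\tau]_{+}^{1/(\alpha-1)}$ with the threshold $\tau$ chosen so that the entries sum to one, and it finds $\tau$ by bisection. Whether MiaSRec uses bisection or an exact sorting-based solver is not stated in this section, so treat the solver choice as an assumption; the names `entmax_bisect` and `select_intents` are illustrative.

```python
def entmax_bisect(z, alpha=1.5, n_iter=60):
    """Approximate alpha-entmax (alpha > 1) of a list of scores by bisection on tau."""
    if alpha <= 1.0:
        raise ValueError("this sketch only handles alpha > 1")
    s = [(alpha - 1.0) * zi for zi in z]
    power = 1.0 / (alpha - 1.0)

    def mass(tau):
        return sum(max(si - tau, 0.0) ** power for si in s)

    # mass(max(s) - 1) >= 1 and mass(max(s)) = 0, so tau lies in this interval.
    lo, hi = max(s) - 1.0, max(s)
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if mass(mid) >= 1.0:
            lo = mid
        else:
            hi = mid
    p = [max(si - lo, 0.0) ** power for si in s]
    total = sum(p)
    return [pi / total for pi in p]  # renormalize away the residual bisection error


def select_intents(O, w, alpha=1.5):
    """Keep gamma_i * o_i only for intents whose entmax weight gamma_i is positive."""
    scores = [sum(wi * oi for wi, oi in zip(w, o)) for o in O]
    gamma = entmax_bisect(scores, alpha)
    return [[g * x for x in o] for g, o in zip(gamma, O) if g > 0.0]


gamma = entmax_bisect([1.0, 0.0, -4.0], alpha=1.5)
print(gamma)  # the third weight is exactly 0.0, so that intent would be dropped
```

Unlike softmax, which assigns every candidate a strictly positive weight, the clipping at zero is what lets the number of surviving representations $k$ vary from session to session.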

2.2.4. Multi-intent Aggregation

The multi-intent aggregation process of MiaSRec is divided into two parts: (i) decoding item distributions from multiple session representations and (ii) aggregating the distributions for the final recommendation.

To decode each session representation into the item distribution, we employ cosine similarity with the item embedding matrix, i.e., dot product with L2-normalization. For simplicity, we reuse the item embedding look-up table $\mathbf{V}\in\mathbb{R}^{n\times d}$, so the number of model parameters does not increase.

(6) $\hat{\mathbf{y}}_{1},\dots,\hat{\mathbf{y}}_{k}=[\tilde{\mathbf{h}}_{1}\tilde{\mathbf{V}}^{\top},\dots,\tilde{\mathbf{h}}_{k}\tilde{\mathbf{V}}^{\top}],$

where $\tilde{\mathbf{h}}_{i}$ and $\tilde{\mathbf{V}}$ are the normalized session vector and the normalized item embedding matrix, respectively. $\hat{\mathbf{y}}_{i}\in\mathbb{R}^{n}$ represents an item distribution decoded by the session vector $\mathbf{h}_{i}$.

To aggregate multiple item distributions, we adopt max-pooling and average-pooling functions. Max-pooling maintains the principal features for multiple intents, and average-pooling captures consistent intent over the session.

(7) $\hat{\mathbf{y}}=\beta\,\text{MaxPool}(\hat{\mathbf{y}}_{1},\dots,\hat{\mathbf{y}}_{k})+(1-\beta)\,\text{AvgPool}(\hat{\mathbf{y}}_{1},\dots,\hat{\mathbf{y}}_{k}),$

where $\hat{\mathbf{y}}\in\mathbb{R}^{n}$ is the final aggregated item distribution, and $\beta$ is a combination hyperparameter. $\text{MaxPool}(\cdot)$ and $\text{AvgPool}(\cdot)$ represent the max- and average-pooling functions that fetch the maximum (or average) value for each dimension from multiple vectors.
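Equations (6) and (7) together can be sketched as follows, using plain Python lists. The helper names `l2_normalize`, `decode`, and `aggregate` and the toy embeddings are assumptions made for illustration only.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def decode(h, V):
    """Cosine similarity between one session representation h and every item row of V."""
    h_n = l2_normalize(h)
    return [sum(a * b for a, b in zip(h_n, l2_normalize(v))) for v in V]


def aggregate(score_lists, beta):
    """Per item: beta * max over intents + (1 - beta) * mean over intents."""
    k = len(score_lists)
    n_items = len(score_lists[0])
    out = []
    for j in range(n_items):
        column = [scores[j] for scores in score_lists]
        out.append(beta * max(column) + (1.0 - beta) * sum(column) / k)
    return out


V = [[1.0, 0.0], [0.0, 1.0]]  # two toy item embeddings
H = [[2.0, 0.0], [0.0, 3.0]]  # two selected intents, each aligned with one item
print(aggregate([decode(h, V) for h in H], beta=0.5))  # [0.75, 0.75]
```

In this toy case each item is strongly supported by exactly one intent: the max-pooling term keeps that support at full strength, while the average-pooling term dilutes it across both intents. One consequence of the normalization, under this reading of Eq. (6), is that the positive scale $\gamma_{i}$ on $\mathbf{h}_{i}$ does not change the decoded scores; it only decides which intents survive.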

Lastly, we formulate a cross-entropy loss function for training.

(8) $L(\mathbf{y},\hat{\mathbf{y}})=-\sum_{j=1}^{n}\mathbf{y}(j)\log\left(\frac{\exp(\hat{\mathbf{y}}(j)/\tau)}{\sum_{i}\exp(\hat{\mathbf{y}}(i)/\tau)}\right),$

where $\mathbf{y}\in\mathbb{R}^{n}$ is a one-hot vector of the target item, i.e., $\mathbf{y}(j)=1$ if the $j$-th item is the target item; otherwise, $\mathbf{y}(j)=0$. Here, $\tau$ is the hyperparameter to control the temperature (Hinton et al., 2015) for better convergence (Gupta et al., 2019).
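Because the target is one-hot, Eq. (8) reduces to the negative log-probability of the target item under a temperature-scaled softmax. A minimal sketch, with the hypothetical function name `temperature_cross_entropy` and a numerically stable log-sum-exp, might look like this:

```python
import math


def temperature_cross_entropy(y_hat, target, tau):
    """Cross-entropy of a one-hot target under softmax(y_hat / tau)."""
    logits = [score / tau for score in y_hat]
    m = max(logits)  # subtract the maximum before exponentiating for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


scores = [1.0, 0.0]  # cosine-style scores; item 0 is the target
print(temperature_cross_entropy(scores, target=0, tau=1.0))
print(temperature_cross_entropy(scores, target=0, tau=0.1))  # smaller: sharper softmax
```

Since the aggregated scores are built from cosine similarities, they lie in a narrow range, and a temperature below 1 widens the gaps between logits before the softmax is applied.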

3. Experiments

3.1. Experimental Setup

Table 1. Statistics of the various benchmark datasets. AvgLen indicates the average length of entire sessions.
Dataset        # Interactions   # Sessions   # Items   AvgLen
Diginetica            786,582      204,532    42,862     4.12
Retailrocket          871,637      321,032    51,428     6.40
Yoochoose           1,434,349      470,477    19,690     4.64
Tmall                 427,797       66,909    37,367    10.62
Dressipi            4,305,641      943,658    18,059     6.47
LastFM              3,510,163      325,543    38,616     8.16

Table 2. Performance comparison for MiaSRec and baseline models. Imp. indicates how much better MiaSRec is than the best baseline model. The best model is marked bold and the second best model is underlined. Significant differences ($\rho<0.01$) between the best baseline model and MiaSRec are reported with $\dagger$.
Dataset       Metric Baselines (12 columns)                                                   MiaSRec    Imp.
Diginetica    R@20   49.86 48.01 51.11 50.60 52.06 48.70 52.89 46.45 51.22 51.59 49.84 53.20    53.54   0.65%
Diginetica    M@20   17.20 16.60 18.21 17.28 18.25 16.96 18.53 16.10 18.35 18.47 17.07 18.37    19.47   5.04%
Retailrocket  R@20   59.70 58.01 60.70 57.43 61.13 56.56 61.77 55.11 61.56 61.65 59.49 63.04    63.37   0.26%
Retailrocket  M@20   35.71 36.01 38.18 35.39 38.68 36.82 38.49 34.15 38.16 38.10 36.25 38.42    39.23   1.41%
Yoochoose     R@20   63.64 62.28 63.50 61.60 63.73 62.78 64.64 57.50 63.13 62.67 63.73 65.20    65.37   0.26%
Yoochoose     M@20   28.66 28.36 29.06 27.97 29.23 28.84 28.25 25.07 28.29 28.00 29.32 30.02    30.74   2.39%
Tmall         R@20   35.80 33.47 40.39 39.71 42.82 32.59 44.91 35.66 42.40 41.56 38.76 35.39    55.94  24.56%
Tmall         M@20   25.08 24.75 29.48 24.16 30.85 24.19 31.59 22.41 28.43 28.56 28.52 22.19    33.57   6.27%
Dressipi      R@20   37.18 36.10 38.19 38.35 37.77 37.71 38.14 38.18 39.60 39.15 37.75 38.43    42.26   6.73%
Dressipi      M@20   14.31 14.51 15.34 15.05 15.13 14.73 15.54 15.46 16.07 15.92 15.24 15.90    16.70   3.92%
LastFM        R@20   20.53 21.80 22.50 22.72 22.47 22.31 22.75 22.17 22.13 23.02 22.93 22.73    25.85  12.32%
LastFM        M@20    6.22  8.70  8.79  7.66  7.93  8.80  7.83  7.57  7.83  8.50  8.74  8.20     9.95  13.06%

Datasets. We conduct extensive experiments on six real-world datasets collected from e-commerce and music streaming services: Diginetica, Retailrocket, Yoochoose, Tmall, Dressipi, and LastFM. We adopt Tmall because it has been widely used in previous studies (Hou et al., 2022; Xia et al., 2021; Han et al., 2022), even though its timestamps are in units of days, not in minutes or seconds. For data pre-processing, we follow the conventional procedure (Ludewig and Jannach, 2018; Ludewig et al., 2021; Li et al., 2017; Wu et al., 2019). We discard the sessions with a single item and the items that occur fewer than five times across all sessions. We split the data chronologically into training, validation, and test sets in an 8:1:1 ratio. Table 1 summarizes detailed statistics on all benchmark datasets.
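The filtering and splitting described above can be sketched as follows. This is only one plausible reading: the order of the two filters (rare items first, then short sessions), the single filtering pass, and the use of one timestamp per session are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def preprocess(sessions, min_item_count=5, min_session_len=2):
    """Drop items seen fewer than min_item_count times, then sessions that became too short."""
    counts = Counter(item for session in sessions for item in session)
    filtered = [[item for item in session if counts[item] >= min_item_count]
                for session in sessions]
    return [session for session in filtered if len(session) >= min_session_len]


def chronological_split(sessions_with_time, ratios=(0.8, 0.1, 0.1)):
    """Split (timestamp, session) pairs into train/valid/test by ascending timestamp."""
    ordered = [session for _, session in
               sorted(sessions_with_time, key=lambda pair: pair[0])]
    n_train = int(len(ordered) * ratios[0])
    n_valid = int(len(ordered) * ratios[1])
    return (ordered[:n_train],
            ordered[n_train:n_train + n_valid],
            ordered[n_train + n_valid:])


raw = [["a", "b"]] * 5 + [["a", "c"], ["d"]]
print(len(preprocess(raw)))  # 5: items "c" and "d" are rare, so two sessions drop out
```

Splitting by time rather than at random keeps every test session later than the training sessions, which mirrors how a deployed recommender only ever sees past interactions.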

Baselines. We compare MiaSRec with the following SBR models: SASRec (Kang and McAuley, 2018), SR-GNN (Wu et al., 2019), NISER+ (Gupta et al., 2019), SGNN-HN (Pan et al., 2020), DSAN (Yuan et al., 2021), LESSR (Chen and Wong, 2020), and CORE (Hou et al., 2022). We also compare with the following multiple-representation models: SINE (Tan et al., 2021), ComiRec (Chen et al., 2021), Re4 (Zhang et al., 2022), Atten-mixer (A-mixer) (Zhang et al., 2023), and MSGIFSR (Guo et al., 2022a). We do not consider SBR models that use additional information, e.g., temporal information (Guo et al., 2022b; Shen et al., 2021) or content-based features (Hidasi et al., 2016b; Zhu et al., 2020; Chen et al., 2022; Li et al., 2022).

Evaluation protocol and metrics. Following the common protocol for evaluating SBR models (Li et al., 2017; Wu et al., 2019), we adopt the iterative revealing scheme, which iteratively exposes an item from a session to the model. We adopt Recall (R@20) and Mean Reciprocal Rank (M@20) to quantify the prediction accuracy of the next single item. All experimental results are averaged over three runs with different seeds, and we conduct the significance test using a paired t-test.
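The protocol and the two metrics can be sketched as below. The reading of "iterative revealing" used here, in which every item after the first is predicted from the items before it, is an assumption, and both function names are hypothetical.

```python
def revealed_prefixes(session):
    """Return (prefix, target) pairs: each item after the first is a prediction target."""
    return [(session[:i], session[i]) for i in range(1, len(session))]


def recall_and_mrr_at_k(ranked_items, target, k=20):
    """Recall@k is 1 if the target is in the top k; MRR@k is 1 / rank, or 0 if it is missed."""
    top_k = ranked_items[:k]
    if target in top_k:
        return 1.0, 1.0 / (top_k.index(target) + 1)
    return 0.0, 0.0


print(revealed_prefixes([1, 2, 3]))          # [([1], 2), ([1, 2], 3)]
print(recall_and_mrr_at_k([5, 7, 9], 7))     # (1.0, 0.5)
```

Averaging both values over all (prefix, target) pairs in the test set would give the R@20 and M@20 figures; Recall only checks whether the target appears in the list, while MRR also rewards placing it near the top.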

Implementation details. For reproducibility, we implement MiaSRec and the other baseline models on the open-source recommendation system library RecBole (Zhao et al., 2021, 2022). We optimize all baselines using the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.001. We set the embedding dimension to 100 and the max session length to 50. We stop the training if the validation MRR@20 decreases for three consecutive epochs, and we report the performance on the test set using the models that show the highest performance on the validation set. For all methods, we set the batch size to 1024. For MiaSRec, we set $\alpha$ to 1.5 for $\alpha$-entmax and tune the temperature $\tau$ among {0.01, 0.05, 0.07, 0.1, 0.5, 1} and the dropout ratio $\delta$ among {0, 0.1, 0.2, 0.3, 0.4, 0.5}. We search $\beta$ from 0 to 1 in 0.1 increments. We follow the original papers’ settings for other hyperparameters of baseline models, but if not available, we thoroughly tune them.

3.2. Experimental Results

Overall comparison. Table 2 reports the performance comparison between MiaSRec and the baseline models. (i) MiaSRec shows the best performance on all datasets, with improvements of up to 24.56% in R@20 over the best baseline. In particular, MiaSRec shows substantial improvements in Recall on datasets with longer average session lengths (e.g., Tmall and LastFM). (ii) Multiple representation models, such as MSGIFSR (Guo et al., 2022a), ComiRec (Chen et al., 2021) and Re4 (Zhang et al., 2022), tend to show higher accuracy than single representation models. This implies that various intents can exist in a session, and it is necessary to capture them.
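The Imp. column is consistent with a relative improvement over the best baseline. The formula below is inferred from the caption of Table 2 rather than stated in the text, and it reproduces the two Tmall entries; the function name is hypothetical.

```python
def relative_improvement(ours, best_baseline):
    """Relative gain of our score over the best baseline, in percent."""
    return 100.0 * (ours - best_baseline) / best_baseline


# Tmall rows of Table 2: the best baseline scores are 44.91 (R@20) and 31.59 (M@20).
print(round(relative_improvement(55.94, 44.91), 2))  # 24.56
print(round(relative_improvement(33.57, 31.59), 2))  # 6.27
```

Because the gain is relative, the same absolute difference yields a larger percentage on datasets where all models score lower, which is worth keeping in mind when comparing Imp. values across datasets.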

(a) Diginetica (b) Retailrocket
Figure 3. Performance comparison of SBR models over varying session lengths. Sessions are divided into six groups depending on session length.

Effect of session length. Figure 3 illustrates the accuracy of SBR models as the session length varies. (i) The accuracy of all models decreases as the session length increases since user intent can vary. For example, in Diginetica, CORE (Hou et al., 2022) shows 13.2% performance drop for long sessions ($|s|\geq 10$) compared to short sessions ($|s|<5$). (ii) MiaSRec shows the highest performance in most cases, regardless of session length, and a comparatively modest performance drop, which indicates that MiaSRec effectively captures various user intents. Particularly, for long sessions ($|s|\geq 10$), it shows significant improvements over CORE, e.g., 6.03% for R@20 in Diginetica.

Table 3. Ablation study for MiaSRec. “PE” and “FE” mean position and frequency embedding. “mean” indicates only using the mean vector ($\mathbf{o}_{m}$), and “last $k$” indicates selecting the last $k$ representations as session representations.
Model                                  Diginetica      Retailrocket    Yoochoose
                                       R@20    M@20    R@20    M@20    R@20    M@20
MiaSRec                                53.54   19.47   63.37   39.23   65.37   30.74
Variants for embedding layers
w/o PE ($\mathbf{a}_{i}$)              51.36   18.15   61.06   37.51   61.13   26.51
w/o FE ($\mathbf{r}_{f_{i}}$)          53.48   19.23   63.28   38.92   65.15   29.91
Variants for intent selection
mean ($\mathbf{o}_{m}$)                52.73   18.66   61.70   38.25   64.71   29.84
last 1 ($\mathbf{o}_{|s|}$)            52.34   18.37   61.90   37.79   64.10   30.05
last 3 ($\mathbf{o}_{|s|-2:|s|}$)      53.08   19.20   62.07   38.68   64.77   30.34
last 5 ($\mathbf{o}_{|s|-4:|s|}$)      53.38   19.29   63.01   38.90   65.13   30.51

Ablation study. Table 3 shows the ablation study of MiaSRec for additional embeddings and multi-intent selection. (i) Both position and frequency embeddings contribute to performance. Removing position embeddings causes the larger drop (e.g., R@20 on Yoochoose falls from 65.37 to 61.13), while removing frequency embeddings causes a smaller but consistent drop across all datasets and metrics. This indicates that reflecting the importance of each item through sequential and occurrence information is effective in improving performance. (ii) It is always better to use multiple representations than a single representation. In particular, MiaSRec shows up to 2.71% improvements in R@20 compared to the single-representation variants using the mean vector ($\mathbf{o}_m$) and the last item vector ($\mathbf{o}_{|s|}$). (iii) The intent selection method of MiaSRec is more effective than heuristic multiple-representation variants. It outperforms the variants that use the representations of the last few items, suggesting the importance of selecting representations dynamically over the whole session.
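The relative improvements quoted for the single-representation variants can be checked directly against the R@20 columns of Table 3. The sketch below copies those values from the table; the helper name `relative_gain` is ours.

```python
def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


# R@20 values copied from Table 3.
full_model = {"Diginetica": 53.54, "Retailrocket": 63.37, "Yoochoose": 65.37}
single_variants = {
    "mean": {"Diginetica": 52.73, "Retailrocket": 61.70, "Yoochoose": 64.71},
    "last 1": {"Diginetica": 52.34, "Retailrocket": 61.90, "Yoochoose": 64.10},
}

# Gain of the full model over each single-representation variant per dataset.
gains = {
    (variant, dataset): round(relative_gain(full_model[dataset], score), 2)
    for variant, scores in single_variants.items()
    for dataset, score in scores.items()
}

best_key = max(gains, key=gains.get)
best_gain = gains[best_key]
print(best_key, best_gain)
```

The largest gain comes out to the reported 2.71%, obtained against the mean variant on Retailrocket.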

4. Conclusion

This paper proposed a novel SBR model, MiaSRec, which exploits various user intents in a session. Unlike previous SBR models that use only a single session representation, MiaSRec captures a variety of intents by deriving multiple representations, one centered on each session item, and dynamically selects the more important ones using the intent selection layer. It then decodes and aggregates the selected representations to provide recommendations that reflect the various intents. Extensive experiments showed that MiaSRec outperformed twelve baseline models on six benchmark datasets.


This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant and National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019-0-00421, 2022-0-00680-003, IITP-2024-2020-0-01821, and NRF-2018R1A5A1060031).


  • Chen et al. (2018) Ching-Wei Chen, Paul Lamere, Markus Schedl, and Hamed Zamani. 2018. Recsys challenge 2018: automatic music playlist continuation. In RecSys. 527–528.
  • Chen et al. (2021) Gaode Chen, Xinghua Zhang, Yanyan Zhao, Cong Xue, and Ji Xiang. 2021. Exploring Periodicity and Interactivity in Multi-Interest Framework for Sequential Recommendation. In IJCAI. 1426–1433.
  • Chen et al. (2022) Jinpeng Chen, Yuan Cao, Fan Zhang, Pengfei Sun, and Kaimin Wei. 2022. Sequential Intention-aware Recommender based on User Interaction Graph. In ICMR. 118–126.
  • Chen and Wong (2020) Tianwen Chen and Raymond Chi-Wing Wong. 2020. Handling Information Loss of Graph Neural Networks for Session-based Recommendation. In KDD. 1172–1180.
  • Covington et al. (2016) Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In RecSys. 191–198.
  • Gomez-Uribe and Hunt (2016) Carlos Alberto Gomez-Uribe and Neil Hunt. 2016. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Trans. Manag. Inf. Syst. 6, 4 (2016), 13:1–13:19.
  • Guo et al. (2022a) Jiayan Guo, Yaming Yang, Xiangchen Song, Yuan Zhang, Yujing Wang, Jing Bai, and Yan Zhang. 2022a. Learning Multi-granularity Consecutive User Intent Unit for Session-based Recommendation. In WSDM. 343–352.
  • Guo et al. (2022b) Jiayan Guo, Peiyan Zhang, Chaozhuo Li, Xing Xie, Yan Zhang, and Sunghun Kim. 2022b. Evolutionary Preference Learning via Graph Nested GRU ODE for Session-based Recommendation. In CIKM. 624–634.
  • Gupta et al. (2019) Priyanka Gupta, Diksha Garg, Pankaj Malhotra, Lovekesh Vig, and Gautam M. Shroff. 2019. NISER: Normalized Item and Session Representations with Graph Neural Networks. CoRR (2019).
  • Han et al. (2022) Qilong Han, Chi Zhang, Rui Chen, Riwei Lai, Hongtao Song, and Li Li. 2022. Multi-Faceted Global Item Relation Learning for Session-Based Recommendation. In SIGIR. 1705–1715.
  • Hidasi and Karatzoglou (2018) Balázs Hidasi and Alexandros Karatzoglou. 2018. Recurrent Neural Networks with Top-k Gains for Session-based Recommendations. In CIKM. 843–852.
  • Hidasi et al. (2016a) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016a. Session-based Recommendations with Recurrent Neural Networks. In ICLR.
  • Hidasi et al. (2016b) Balázs Hidasi, Massimo Quadrana, Alexandros Karatzoglou, and Domonkos Tikk. 2016b. Parallel Recurrent Neural Network Architectures for Feature-rich Session-based Recommendations. In RecSys. 241–248.
  • Hinton et al. (2015) Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network. CoRR (2015).
  • Hou et al. (2022) Yupeng Hou, Binbin Hu, Zhiqiang Zhang, and Wayne Xin Zhao. 2022. CORE: Simple and Effective Session-based Recommendation within Consistent Representation Space. In SIGIR. 1796–1801.
  • Jannach et al. (2017) Dietmar Jannach, Malte Ludewig, and Lukas Lerche. 2017. Session-based item recommendation in e-commerce: on short-term intents, reminders, trends and discounts. User Model. User Adapt. Interact. 27, 3-5 (2017), 351–392.
  • Kang and McAuley (2018) Wang-Cheng Kang and Julian J. McAuley. 2018. Self-Attentive Sequential Recommendation. In ICDM. 197–206.
  • Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR.
  • Li et al. (2022) Haoyang Li, Xin Wang, Ziwei Zhang, Jianxin Ma, Peng Cui, and Wenwu Zhu. 2022. Intention-Aware Sequential Recommendation With Structured Intent Transition. IEEE Trans. Knowl. Data Eng. 34, 11 (2022), 5403–5414.
  • Li et al. (2017) Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. In CIKM. 1419–1428.
  • Linden et al. (2003) Greg Linden, Brent Smith, and Jeremy York. 2003. Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Comput. 7, 1 (2003), 76–80.
  • Ludewig and Jannach (2018) Malte Ludewig and Dietmar Jannach. 2018. Evaluation of session-based recommendation algorithms. User Model. User Adapt. Interact. 28, 4-5 (2018), 331–390.
  • Ludewig et al. (2021) Malte Ludewig, Noemi Mauro, Sara Latifi, and Dietmar Jannach. 2021. Empirical analysis of session-based recommendation algorithms. User Model. User Adapt. Interact. 31, 1 (2021), 149–181.
  • Martins and Astudillo (2016) André F. T. Martins and Ramón Fernandez Astudillo. 2016. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. In ICML (JMLR Workshop and Conference Proceedings, Vol. 48). 1614–1623.
  • Pan et al. (2020) Zhiqiang Pan, Fei Cai, Wanyu Chen, Honghui Chen, and Maarten de Rijke. 2020. Star Graph Neural Networks for Session-based Recommendation. In CIKM. 1195–1204.
  • Peters et al. (2019) Ben Peters, Vlad Niculae, and André F. T. Martins. 2019. Sparse Sequence-to-Sequence Models. In ACL. 1504–1519.
  • Shen et al. (2021) Qi Shen, Shixuan Zhu, Yitong Pang, Yiming Zhang, and Zhihua Wei. 2021. Temporal aware Multi-Interest Graph Neural Network For Session-based Recommendation. CoRR (2021).
  • Tan et al. (2021) Qiaoyu Tan, Jianwei Zhang, Jiangchao Yao, Ninghao Liu, Jingren Zhou, Hongxia Yang, and Xia Hu. 2021. Sparse-Interest Network for Sequential Recommendation. In WSDM. 598–606.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NeurIPS. 5998–6008.
  • Wang et al. (2022) Shoujin Wang, Longbing Cao, Yan Wang, Quan Z. Sheng, Mehmet A. Orgun, and Defu Lian. 2022. A Survey on Session-based Recommender Systems. ACM Comput. Surv. 54, 7 (2022), 154:1–154:38.
  • Wang et al. (2019) Shoujin Wang, Liang Hu, Yan Wang, Quan Z. Sheng, Mehmet A. Orgun, and Longbing Cao. 2019. Modeling Multi-Purpose Sessions for Next-Item Recommendations via Mixture-Channel Purpose Routing Networks. In IJCAI. 3771–3777.
  • Wang et al. (2020) Ziyang Wang, Wei Wei, Gao Cong, Xiao-Li Li, Xianling Mao, and Minghui Qiu. 2020. Global Context Enhanced Graph Neural Networks for Session-based Recommendation. In SIGIR. 169–178.
  • Wu et al. (2019) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-Based Recommendation with Graph Neural Networks. In AAAI. 346–353.
  • Xia et al. (2021) Xin Xia, Hongzhi Yin, Junliang Yu, Yingxia Shao, and Lizhen Cui. 2021. Self-Supervised Graph Co-Training for Session-based Recommendation. In CIKM. 2180–2190.
  • Yuan et al. (2021) Jiahao Yuan, Zihan Song, Mingyou Sun, Xiaoling Wang, and Wayne Xin Zhao. 2021. Dual Sparse Attention Network For Session-based Recommendation. In AAAI. 4635–4643.
  • Zhang et al. (2023) Peiyan Zhang, Jiayan Guo, Chaozhuo Li, Yueqi Xie, Jaeboum Kim, Yan Zhang, Xing Xie, Haohan Wang, and Sunghun Kim. 2023. Efficiently Leveraging Multi-level User Intent for Session-based Recommendation via Atten-Mixer Network. In WSDM. 168–176.
  • Zhang et al. (2022) Shengyu Zhang, Lingxiao Yang, Dong Yao, Yujie Lu, Fuli Feng, Zhou Zhao, Tat-Seng Chua, and Fei Wu. 2022. Re4: Learning to Re-contrast, Re-attend, Re-construct for Multi-interest Recommendation. In WWW. 2216–2226.
  • Zhao et al. (2022) Wayne Xin Zhao, Yupeng Hou, Xingyu Pan, Chen Yang, Zeyu Zhang, Zihan Lin, Jingsen Zhang, Shuqing Bian, Jiakai Tang, Wenqi Sun, Yushuo Chen, Lanling Xu, Gaowei Zhang, Zhen Tian, Changxin Tian, Shanlei Mu, Xinyan Fan, Xu Chen, and Ji-Rong Wen. 2022. RecBole 2.0: Towards a More Up-to-Date Recommendation Library. In CIKM. 4722–4726.
  • Zhao et al. (2021) Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2021. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. In CIKM. 4653–4664.
  • Zhu et al. (2020) Nengjun Zhu, Jian Cao, Yanchi Liu, Yang Yang, Haochao Ying, and Hui Xiong. 2020. Sequential Modeling of Hierarchical User Intention and Preference for Next-item Recommendation. In WSDM. 807–815.