License: arXiv.org perpetual non-exclusive license
arXiv:2404.00261v1 [cs.IR] 30 Mar 2024

A Simple Yet Effective Approach for Diversified Session-Based Recommendation

Qing Yin, Hui Fang
Shanghai University of Finance and Economics
Shanghai, China
[email protected], [email protected]
Zhu Sun
Institute of High Performance Computing; Centre for Frontier AI Research, A*STAR
Singapore
[email protected]
Yew-Soon Ong
A*STAR Centre for Frontier AI Research; Nanyang Technological University
Singapore
[email protected]
Abstract

Session-based recommender systems (SBRSs) have become extremely popular owing to their core capability of capturing short-term and dynamic user preferences. However, most SBRSs primarily maximize recommendation accuracy while ignoring users’ minor preferences, which leads to filter bubbles in the long run. The handful of works devoted to improving diversity depend on unique model designs and calibrated loss functions, which cannot be easily adapted to existing accuracy-oriented SBRSs. It is thus worthwhile to come up with a simple yet effective design that can be used as a plugin to help existing SBRSs generate a more diversified list while preserving recommendation accuracy. To this end, we propose an end-to-end framework applicable to existing representative (accuracy-oriented) SBRSs, called diversified category-aware attentive SBRS (DCA-SBRS), to boost recommendation diversity. It consists of two novel designs: a model-agnostic diversity-oriented loss function and a non-invasive category-aware attention mechanism. Extensive experiments on three datasets show that our framework helps existing SBRSs achieve extraordinary performance in terms of recommendation diversity (e.g., an average 74.1% increase in ILD@10) and comprehensive performance (e.g., an average 52.3% lift in F-score@10), without significantly deteriorating recommendation accuracy compared to state-of-the-art accuracy-oriented SBRSs. The source code can be obtained via github.com/qyin863/DCA-SBRS.

Keywords recommender systems, session-based recommendation, diversification, diversified recommendation

1 Introduction

Session-based recommender systems (SBRSs) have gained significant attention because they provide more timely and accurate recommendations by incorporating short-term and dynamic user preferences [fang2020deep, wang2021survey]. To enhance recommendation accuracy, existing SBRSs utilize sophisticated models such as deep neural networks that capture short-term preferences from the most recent session. For instance, GRU4Rec [hidasi2015session] employs gated recurrent units (GRU) to learn a session’s sequential behaviors. Furthermore, attention mechanisms are introduced to capture main-purpose (intent) preferences, as in NARM [li2017neural] and STAMP [LiuZMZ18]. Moreover, graph neural networks (GNNs) are utilized to learn more complex item relationships (e.g., SR-GNN [WuT0WXT19], GC-SAN [XuZLSXZFZ19], and GCE-GNN [wang2020global]). In the above state-of-the-art (SOTA) SBRSs, attention mechanisms are used together with RNNs or GNNs to improve recommendation performance [wang2021survey].

However, the aforementioned SOTA (accuracy-oriented) SBRSs gradually overemphasize dominant interests and weaken minor ones [steck2018calibrated], thus leading to a filter bubble [nguyen2014exploring, khenissi2020theoretical] over time. As such, diversified recommender systems (RSs) have been proposed to recommend more diverse lists (e.g., with items covering many categories). Diversified works in traditional recommendation fall into three major categories: post-processing heuristic methods [carbonell1998use, steck2018calibrated], determinantal point process (DPP) methods [chen_fast_2018, wu2019pd, gan2020enhancing], and end-to-end learning methods [zheng2021dgcn, liang2021enhancing]. However, to the best of our knowledge, there are only three representative diversified SBRSs, namely MCPRN [Wang0WSOC19], ComiRec [Cen2020ControllableMF], and IDSR [chen2020improving]. Both MCPRN and ComiRec design multiple channels rather than one major channel to learn multiple purposes in a session, so that recommendations strive to satisfy these purposes instead of capturing only the main purpose, as representative accuracy-oriented works (e.g., NARM) do. Following the above multiple-purpose assumption, IDSR also jointly incorporates both item relevance and diversity into the prediction score and loss function.

Figure 1: NARM vs NARM+MTL. Note: +MTL denotes the variant of NARM via leveraging item categories as input and adopting the common multi-task learning framework.

To conclude, existing studies on diversified SBRSs mainly suffer from two challenges. (1) As previous studies show, diversified SBRSs carefully calibrate model variants, such as multiple channels and unique diversity-oriented losses (objectives) fitted to special diversity modules. However, such diversified designs cannot be easily adopted by existing representative accuracy-oriented SOTA SBRSs. Thus, the first research challenge is how to come up with simple yet effective designs (such as a loss function) that improve the diversity performance of SOTA accuracy-oriented SBRSs. (2) Previous diversified works mostly fail to obtain accuracy comparable to that of representative accuracy-oriented SBRSs, since in most cases improved diversity comes at the cost of a certain level of accuracy. To mitigate this adverse effect, side information such as item category is generally introduced to help better learn user preferences [zhao2018categorical, sun2019research, liu2021noninvasive]. However, for representative accuracy-oriented SBRSs, we surprisingly find that simply concatenating an item ID with its category information as the input and adopting the common multi-task learning framework, as in SBRS+MTL [zhao2018categorical], cannot considerably improve recommendation performance and may even degrade accuracy metrics (see Figure 1). Our second challenge is therefore to find a solution that helps maintain recommendation accuracy for diversified SBRSs by better exploiting category information.

Towards the aforementioned two issues, we propose a simple yet effective end-to-end Diversified Category-aware Attentive framework, called DCA-SBRS, that can be easily instantiated with existing representative accuracy-oriented SBRSs to help them generate a more diversified recommendation list without significantly sacrificing their accuracy. Given the widespread adoption and efficacy of attention mechanisms in existing state-of-the-art accuracy-oriented SBRSs [wang2021survey, under2023], we extend our approach by incorporating category information into the attention mechanism. Specifically, DCA-SBRS is composed of two specially designed parts: (1) a Model-agnostic Diversity-oriented Loss (MDL) function, which works with an accuracy-oriented loss (e.g., cross-entropy loss) and exploits items’ category attribute and the item scores estimated by the given SBRS; and (2) a Non-invasive Category-aware Attention (NCA) mechanism, which, inspired by NOVA [liu2021noninvasive], utilizes category information in a non-invasive way instead of directly fusing it, and acts as directional guidance (an attention signal) for more accurate session-based recommendation. The main contributions of this work are summarized as follows:

  • We propose a simple yet effective diversity-oriented loss function that can be used as a model-agnostic and individual plugin to deep neural accuracy-oriented SBRSs to improve their diversity performance, mitigating the technical gap between accuracy-oriented and diversified SBRSs.

  • We transfer the non-invasive idea from NOVA [liu2021noninvasive] into the common attention mechanism used in SOTA accuracy-oriented SBRSs (e.g., NARM and GCE-GNN) to capture more accurate preference by utilizing category information in a non-invasive way, so as to efficiently help maintain recommendation accuracy.

  • We conduct extensive experiments on three real-world datasets, in terms of accuracy, diversity, and comprehensive performance (jointly considering accuracy and diversity), to demonstrate the effectiveness of our DCA-SBRS framework. Experimental results show that our framework helps SOTA SBRSs achieve extraordinary performance in terms of diversity and comprehensive performance (e.g., average increases of 74.1% and 52.3% on ILD@10 and F-score@10, respectively), without significantly deteriorating recommendation accuracy in contrast with SOTA diversified SBRSs (e.g., an average decrease of only 1.6% in accuracy regarding NDCG@10 but a 138% increase in diversity regarding ILD@10 on Diginetica). Additionally, we fairly analyze the limitations of the standard comprehensive measure and offer alternative solutions.

2 Related work

Our study is related to two major areas: session-based recommendation, and diversified recommendation.

2.1 Session-Based Recommendation

Approaches to SBRSs can be divided into two groups: conventional non-neural methods and deep neural ones. Typical conventional techniques include but are not limited to Item-KNN [sarwar2001item], BPR-MF [rendle2009bpr], and FPMC [rendle2010factorizing]. For example, FPMC combines Matrix Factorization (MF) with a Markov Chain (MC) to better deal with dependencies between items in a sequence. However, these methods generally struggle to model item relationships in comparatively longer sequences. In contrast, deep neural networks can better deal with much longer sequences and thus generate more effective recommendations [tan2016improved, hidasi2018recurrent]. For example, GRU4Rec [hidasi2015session] and its variants [tan2016improved, hidasi2018recurrent] apply GRU to capture the long-term dependency in a sequence. NARM [li2017neural] further adopts an attention mechanism to assess the similarity between previous items and the last item in every session, and the hidden states are then averaged with these weights to obtain the main-purpose session representation. STAMP [LiuZMZ18] models both users’ general interests and current interests using attentive nets and basic multi-layer perceptrons (MLPs) instead of RNNs.

However, the above techniques only model one-way transitions between successive items, ignoring transitions between contexts (i.e., other items in the session) [qiu2019rethinking]. Recently, GNNs have been employed to mitigate the research gap [yu2020tagnn]. For instance, SR-GNN [WuT0WXT19] and GC-SAN [XuZLSXZFZ19] import GNNs to generate more accurate item embedding vectors based on the current session graph built for each session. Besides the current session graph, GCE-GNN [wang2020global] also explores item relationships in the global session graph.

It is worth noting that, the above conventional and deep neural SBRSs are all accuracy-oriented approaches that fail to consider diversity (i.e., non-diversified). Given that RSs have an iterative or closed feedback loop, this may result in filter bubbles [nguyen2014exploring, khenissi2020theoretical].

2.2 Diversified Recommendation

Towards individual diversity in traditional RSs, inspired by dissimilarity score in Maximal Marginal Relevance (MMR) [carbonell1998use], some studies [agrawal2009diversifying, santos2010exploiting] define diversification on explicit aspects (categories) or sub-queries. Besides, DPP is utilized [chen_fast_2018, kulesza2012determinantal, wu2019pd, gan2020enhancing] to provide a better relevance-diversity trade-off in recommendation as it can score sets of items collectively and consider negative correlations between various items. The aforementioned studies are two-stage ones which re-rank items accounting for diversity in the second stage. In traditional RS, there are only several end-to-end studies [zheng2021dgcn, liang2021enhancing] which simultaneously optimize diversity and accuracy.
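To make the two-stage idea concrete, the following is a minimal sketch of MMR-style greedy re-ranking. It assumes, purely for illustration, that two items are fully similar when they share a category and dissimilar otherwise; the function name `mmr_rerank`, the `trade_off` parameter, and the toy scores are hypothetical and are not taken from any of the cited works.

```python
def mmr_rerank(scores, categories, n, trade_off=0.5):
    """Greedy MMR-style re-ranking (second stage of a two-stage pipeline).

    At each step, pick the candidate maximising
        trade_off * relevance - (1 - trade_off) * max similarity to picked items.
    Similarity is 1.0 for items sharing a category, else 0.0 (an assumption).
    """
    selected = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < n:
        def mmr(i):
            max_sim = max(
                (1.0 if categories[i] == categories[j] else 0.0 for j in selected),
                default=0.0,
            )
            return trade_off * scores[i] - (1.0 - trade_off) * max_sim
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy first-stage output (hypothetical): relevance scores and item categories.
scores = [0.9, 0.8, 0.7, 0.3]
categories = [0, 0, 0, 1]
diverse_top2 = mmr_rerank(scores, categories, n=2, trade_off=0.5)
accuracy_top2 = mmr_rerank(scores, categories, n=2, trade_off=1.0)
```

With `trade_off=1.0` the procedure reduces to plain relevance ranking, whereas a smaller value lets a lower-scored item from an unseen category displace a near-duplicate. Because the re-ranking happens after the model has been trained, the model itself never sees the diversity signal, which is the gap end-to-end methods aim to close.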

Figure 2: An Overview of Our Proposed DCA-SBRS.

To the best of our knowledge, there are only three diversified (and also end-to-end) works for session-based recommendation: MCPRN [Wang0WSOC19], ComiRec [Cen2020ControllableMF], and IDSR [chen2020improving]. Specifically, MCPRN uses mixture-channel purpose routing networks to guide multi-purpose learning, while ComiRec explores two methods as multi-interest extraction modules (i.e., the dynamic routing and self-attentive methods). Thus, MCPRN and ComiRec use multiple session representations to capture user preferences, which can implicitly satisfy user needs. In contrast, IDSR delivers end-to-end recommendation under the guidance of the intent-aware diversity promoting (IDP) loss and explicitly creates set diversity. A “trade-off hyper-parameter” (in IDSR) is adopted to balance recommendation relevance and diversity.

To summarize, the diversified designs in these three works cannot be easily adapted to existing representative accuracy-oriented SBRSs. Besides, given the widely held “trade-off” relationship, these studies fail to obtain satisfying recommendation accuracy (as can also be observed in Tables 4-6).

3 Our DCA-SBRS Framework

In this section, we firstly formulate our research problem, and then introduce the two components in the proposed framework in detail.

3.1 Problem Statement and Model Overview

Let $\mathcal{X}=\{x_{1},x_{2},\cdots,x_{m}\}$ be the set of all items and $\mathcal{C}=\{c_{1},c_{2},\cdots,c_{n}\}$ be the set of all categories. Each anonymous session, denoted by $S=[x_{1}^{s},x_{2}^{s},\cdots,x_{t}^{s}]$, consists of item IDs in chronological order (i.e., items clicked by a user), where $x_{i}^{s}$ denotes the $i$-th item clicked within session $S$. Additionally, our framework uses the category attribute of items (i.e., $c_{i}^{s}$ denotes the corresponding category of $x_{i}^{s}$) to guide the session representation learning for better item prediction.
Given a session $S$, the objective of our session-based recommendation is to recommend a Top-$N$ item list that is both diversified and accurate, denoted as $y=[y_{1}^{s},y_{2}^{s},\cdots,y_{N}^{s}]$, for next-item prediction.

To address the problem, we propose a Diversified Category-aware Attentive framework, named DCA-SBRS, which can be instantiated with SOTA accuracy-oriented SBRSs to improve the diversity performance of the corresponding SBRS while preserving its recommendation accuracy. It mainly consists of two novel components: 1) a Model-agnostic Diversity-oriented Loss function (MDL, $L_{div}$), working with an accuracy-oriented loss (e.g., cross-entropy loss $L_{acc}$), which is built on items’ category attribute and the item scores estimated by the SBRS. It helps existing SOTA accuracy-oriented SBRSs produce more diverse recommendation lists; 2) a Non-invasive Category-aware Attention (NCA) mechanism, which utilizes category information as directional guidance and replaces the normal attention mechanism widely used in existing SBRSs. Since there exists a widely known “trade-off” relationship between recommendation accuracy and diversity [chen2020improving], this design partially alleviates the adverse effect of the diversity objective on recommendation accuracy.

Figure 2 presents the architecture of our DCA-SBRS framework, depicting how the MDL and NCA components are installed on top of the general encoder-decoder framework and the common attention mechanism of a SOTA SBRS, NARM [li2017neural]. Without loss of generality, as shown in Figure 2, let the encoder-decoder framework denote the architecture of SOTA SBRSs, where the encoder encodes the session representation and the decoder estimates item scores for generating recommendations. The similarity layer projects the session representation into the item space and then produces a Top-$N$ recommendation list. We next present the two components in detail.
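As a rough illustration of the decoder and similarity layer described above, the sketch below scores every item against a session representation, normalises the scores with softmax, and keeps the Top-$N$ items. The inner-product similarity, the function names, and the toy embeddings are assumptions made for this example; individual SBRSs may use a different similarity function.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_n(session_repr, item_embeddings, n):
    """Score each item by its inner product with the session representation
    (an assumed similarity), normalise with softmax, and return the indices
    of the n highest-scoring items together with all item probabilities."""
    raw = [sum(a * b for a, b in zip(session_repr, emb)) for emb in item_embeddings]
    probs = softmax(raw)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:n], probs

# Toy example (hypothetical values): 4 items with 2-dimensional embeddings.
item_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
session_repr = [2.0, 1.0]
top_items, probs = top_n(session_repr, item_embeddings, n=2)
```

The softmax-normalised item probabilities returned here are the quantities that a diversity-oriented loss can reuse, since they already sum to one over all items.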

3.2 Model-agnostic Diversity-oriented Loss

Figure 3: The Unbalanced Grouping Induced by the Category (the symbol ‘×’ denotes the outliers with a mass of involved items).

The goal of this module is to enhance diversity performance by acting as a model-agnostic plugin to accuracy-oriented SBRSs. Non-diversified SBRSs typically predict relevance scores of items by capturing preferences from item sequences. For simplicity, we leverage the obtained relevance scores as the foundation of this module and increase recommendation diversity by penalizing monotonous recommendation lists (RLs; e.g., a Top-$N$ recommended list in which most items belong to the same category). To this end, as shown in Figure 2, the model-agnostic diversity-oriented loss ($L_{div}$) is designed to help existing SBRSs achieve end-to-end learning. Specifically, we define it using the entropy of the estimated category distribution $\widehat{P}_{c}$ in a recommended list, given by,

$$L_{div}=-\mathrm{H}(\widehat{P}_{c}), \tag{1}$$

where $\mathrm{H}(\widehat{P}_{c})=-\sum_{j}\widehat{P}_{c_{j}}\log_{2}\widehat{P}_{c_{j}}$ ($c_{j}\in\mathcal{C}$) denotes the information entropy. A larger $\mathrm{H}(\widehat{P}_{c})$ indicates that the recommended list is likely to be more diverse from the category perspective. Its negative value can therefore be regarded as a penalty on recommended lists with low diversity. Intuitively, a reasonable $\widehat{P}_{c_{i}}$ ($c_{i}\in\mathcal{C}$) in a recommendation list should satisfy the following two characteristics:

  • In proportion to the number of items from the category $c_{i}$: In real-world datasets, the grouping induced by a categorical attribute can be very unbalanced [zhao2018categorical]. For better understanding, we select two datasets (Diginetica and Retailrocket) and show the number of items belonging to each category using box plots in Figure 3. As can be observed, the outliers in the box plots indicate that some categories contain a large group of items while others contain only a few. A category with a larger group of items is more likely to appear in the RL when personalized preference is not considered.

  • In proportion to relevance scores of items: Regarding personalized preference, representative SBRSs recommend Top-$N$ items by ranking the predicted scores given session $S$. As a result, items with much higher scores are more likely to appear in the RL along with their corresponding categories.

Considering that common accuracy-oriented SBRSs only output predicted item scores without a special module capturing category scores, we simulate the category distribution in the RL, which can well satisfy the above two characteristics as below,

$$\widehat{P}_{c_{i}}=\sum_{c(x_{j})=c_{i}}\widehat{P}_{x_{j}}, \tag{2}$$

where $\widehat{P}_{x_{j}}$ denotes the predicted personalized preference score of item $x_{j}$ obtained by the given SOTA SBRS ($\sum_{j}\widehat{P}_{x_{j}}=1$, using the softmax function over all items). We sum the scores of items from category $c_{i}$ as the occurrence probability of category $c_{i}$ so as to consider both the number of items in $c_{i}$ and the personalized preference $\widehat{P}_{x_{j}}$. Then, $L_{div}$ combined with the original accuracy-oriented loss $L_{acc}$ (e.g., the cross-entropy of the prediction results [li2017neural, wang2020global]) is the final loss function for model training,

$$L=L_{acc}+\lambda L_{div}, \tag{3}$$

where $\lambda$ controls the importance of our proposed MDL.
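The forward computation of Eqs. (1)-(3) can be sketched in a few lines. This is a minimal illustration in plain Python rather than the authors' implementation: the function names, the integer category encoding, and the toy scores are assumptions, and an actual training setup would compute the same quantities inside an automatic-differentiation framework so that gradients flow back into the SBRS.

```python
import math

def softmax(scores):
    """Numerically stable softmax; the outputs play the role of P_hat_{x_j}."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def category_distribution(item_probs, item_category, num_categories):
    """Eq. (2): P_hat_{c_i} is the sum of P_hat_{x_j} over items in category c_i."""
    p_cat = [0.0] * num_categories
    for p, c in zip(item_probs, item_category):
        p_cat[c] += p
    return p_cat

def diversity_loss(item_probs, item_category, num_categories):
    """Eq. (1): L_div = -H(P_hat_c) with base-2 entropy.
    Categories with zero probability contribute nothing (0 * log 0 := 0)."""
    p_cat = category_distribution(item_probs, item_category, num_categories)
    entropy = -sum(p * math.log2(p) for p in p_cat if p > 0.0)
    return -entropy

def total_loss(raw_scores, target_item, item_category, num_categories, lam):
    """Eq. (3): L = L_acc + lam * L_div, using cross-entropy as L_acc."""
    probs = softmax(raw_scores)
    l_acc = -math.log(probs[target_item])
    l_div = diversity_loss(probs, item_category, num_categories)
    return l_acc + lam * l_div, l_acc, l_div

# Toy catalogue (hypothetical): items 0-1 belong to category 0, items 2-3 to category 1.
item_category = [0, 0, 1, 1]
balanced = total_loss([0.0, 0.0, 0.0, 0.0], 0, item_category, 2, lam=0.5)
peaked = total_loss([5.0, 5.0, 0.0, 0.0], 0, item_category, 2, lam=0.5)
```

In the balanced case the two categories each receive probability 0.5, so the entropy is one bit and $L_{div}=-1$, its minimum for two categories. In the peaked case almost all probability mass falls on category 0, the entropy approaches zero, and $L_{div}$ rises towards 0, which is the penalty on monotonous lists described above.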

Table 1: The Category-aware Attentive Signal Extension of Representative SBRSs (the symbols used in these functions are aligned with the ones used in the original papers).

| Model | Attention Signal | Category-aware Attention Signal |
|---|---|---|
| NARM | $\alpha_{tj}=\mathbf{v}^{T}\sigma(\mathbf{A}_{1}\mathbf{h}_{t}+\mathbf{A}_{2}\mathbf{h}_{j})$ | $\alpha_{tj}=\mathbf{v}^{T}\sigma(\mathbf{A}_{1}(\mathbf{h}_{t}+\mathbf{c}_{t}^{s})+\mathbf{A}_{2}(\mathbf{h}_{j}+\mathbf{c}_{j}^{s}))$ |
| | $\mathbf{c}_{t}^{l}=\sum_{j=1}^{t}\alpha_{tj}\mathbf{h}_{j}$ | $\mathbf{c}_{t}^{l}=\sum_{j=1}^{t}\alpha_{tj}\mathbf{h}_{j}$ |
| STAMP | $\mathbf{m}_{s}=\frac{1}{t}\sum_{i=1}^{t}\mathbf{x}_{i}$ | $\mathbf{m}_{s}=\frac{1}{t}\sum_{i=1}^{t}(\mathbf{x}_{i}+\mathbf{c}_{i})$ |
| | $\alpha_{i}=\mathbf{W}_{0}\sigma(\mathbf{W}_{1}\mathbf{x}_{i}+\mathbf{W}_{2}\mathbf{x}_{t}+\mathbf{W}_{3}\mathbf{m}_{s}+\mathbf{b}_{a})$ | $\alpha_{i}=\mathbf{W}_{0}\sigma(\mathbf{W}_{1}(\mathbf{x}_{i}+\mathbf{c}_{i})+\mathbf{W}_{2}(\mathbf{x}_{t}+\mathbf{c}_{t})+\mathbf{W}_{3}\mathbf{m}_{s}+\mathbf{b}_{a})$ |
| | $\mathbf{m}_{a}=\sum_{i=1}^{t}\alpha_{i}\mathbf{x}_{i}$ | $\mathbf{m}_{a}=\sum_{i=1}^{t}\alpha_{i}\mathbf{x}_{i}$ |
| GCE-GNN | $\mathbf{z}_{i}=\tanh(\mathbf{W}_{3}[\mathbf{h}_{v_{i}^{s}}^{\prime}\Vert\mathbf{p}_{l-i+1}]+\mathbf{b}_{3})$ | $\mathbf{z}_{i}=\tanh(\mathbf{W}_{3}[\mathbf{h}_{v_{i}^{s}}^{\prime}\Vert\mathbf{p}_{l-i+1}\Vert\mathbf{c}_{i}^{s}]+\mathbf{b}_{3})$ |
| | $\mathbf{s}^{\prime}=\frac{1}{l}\sum_{i=1}^{l}\mathbf{h}_{v_{i}^{s}}^{\prime}$ | $\mathbf{s}^{\prime}=\frac{1}{l}\sum_{i=1}^{l}(\mathbf{h}_{v_{i}^{s}}^{\prime}+\mathbf{c}_{l}^{s})$ |
| | $\beta_{i}=\mathbf{q}_{2}^{\top}\sigma(\mathbf{W}_{4}\mathbf{z}_{i}+\mathbf{W}_{5}\mathbf{s}^{\prime}+\mathbf{b}_{4})$ | $\beta_{i}=\mathbf{q}_{2}^{\top}\sigma(\mathbf{W}_{4}\mathbf{z}_{i}+\mathbf{W}_{5}\mathbf{s}^{\prime}+\mathbf{b}_{4})$ |
| | $\mathbf{S}=\sum_{i=1}^{l}\beta_{i}\mathbf{h}_{v_{i}^{s}}^{\prime}$ | $\mathbf{S}=\sum_{i=1}^{l}\beta_{i}\mathbf{h}_{v_{i}^{s}}^{\prime}$ |

3.3 Non-invasive Category-aware Attention

There exists a widely known trade-off between recommendation accuracy and diversity [chen2020improving]. Consequently, the plugged-in diversity loss (in MDL) will likely degrade the recommendation accuracy of accuracy-oriented SBRSs. To address this issue, we exploit category information to enhance preference learning.

As shown in Figure 1, invasive fusion (e.g., simply concatenating item embeddings with the corresponding category embeddings as input) might not considerably improve recommendation accuracy. Therefore, considering that attention mechanisms are widely adopted by SOTA accuracy-oriented SBRSs, we transfer the non-invasive idea from NOVA [liu2021noninvasive] into the common attention mechanism. Specifically, as shown in Figure 2, the encoder, which employs the same deep learning technique as the underlying SBRS (e.g., RNN [li2017neural], MLP [LiuZMZ18], or GNN [WuT0WXT19, wang2020global]), first converts session $S=[x_1^s, x_2^s, \cdots, x_t^s]$ into a set of high-dimensional hidden states $\mathbf{h}=[\mathbf{h}_1, \mathbf{h}_2, \cdots, \mathbf{h}_t]$. These hidden states are then combined in a weighted sum, using the attention signal output by the common attention mechanism at time $t$ (denoted as $\boldsymbol{\alpha}_t=\{\alpha_{t1}, \ldots, \alpha_{tt}\}$), to obtain the current session representation decoded at time $t$ (denoted as $\mathbf{s}_t$).

The category-aware extensions for SOTA SBRSs with attention mechanism (i.e., NARM, STAMP, and GCE-GNN) are described in detail in Table 1, where the symbols follow the original papers and their detailed explanations are therefore omitted here. Note that $\mathbf{c}_j^s$ is the category embedding vector of item $x_j^s$ in session $S=[x_1^s, x_2^s, \cdots, x_t^s]$.

Here, we use NARM [li2017neural] as an example to further elaborate our NCA. In NARM, the attention signal $\alpha_{tj}$ is computed as the correlation between the final hidden state $\mathbf{h}_t$ and the hidden state of the $j$-th item, $\mathbf{h}_j$,

$$\alpha_{tj} = q(\mathbf{h}_t, \mathbf{h}_j) = \mathbf{v}^{\top} \sigma(\mathbf{A}_1 \mathbf{h}_t + \mathbf{A}_2 \mathbf{h}_j),$$ (4)

where $\sigma$ is an activation function (e.g., the sigmoid function) and the matrices $\mathbf{A}_1$ and $\mathbf{A}_2$ transform the respective hidden states into a latent space. Correspondingly, our NCA mechanism further uses the category attribute as directional guidance while keeping the hidden states unaltered in their own vector space. Specifically, NCA uses the category attribute to update the attention signal as:

$$\begin{aligned} \alpha_{tj} &= q(\mathbf{h}_t \oplus \mathbf{c}_t^s, \mathbf{h}_j \oplus \mathbf{c}_j^s) \\ &= \mathbf{v}^{\top} \sigma(\mathbf{A}_1(\mathbf{h}_t + \mathbf{c}_t^s) + \mathbf{A}_2(\mathbf{h}_j + \mathbf{c}_j^s)), \end{aligned}$$ (5)

where $\mathbf{c}_j^s$ is the category embedding vector of item $x_j^s$ and $\oplus$ denotes element-wise addition. Note that in this paper we use the simplest fusor, ‘addition’, which directly adds the hidden states and the category embedding vectors. It can also be replaced by other fusors, such as ‘concatenation’ or ‘gating’ [liu2021noninvasive].

To conclude, by doing this, we have successfully exploited category information in a non-invasive way to help generate attention signals, with the goal of maintaining the recommendation accuracy.
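The category-aware attention of Equation 5 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the released implementation: the helper names (`matvec`, `attention_score`, `nca_session_repr`) are hypothetical, vectors are plain lists, and the attention weights are used unnormalized. The point it shows is that category embeddings enter only the attention scores, while the weighted sum is still taken over the original hidden states.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(u, v):
    """Element-wise addition, i.e. the 'addition' fusor."""
    return [a + b for a, b in zip(u, v)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def attention_score(v, A1, A2, query, key):
    """q(query, key) = v^T sigma(A1 query + A2 key), as in Equation 4."""
    hidden = sigmoid(add(matvec(A1, query), matvec(A2, key)))
    return sum(a * b for a, b in zip(v, hidden))

def nca_session_repr(h, c, v, A1, A2):
    """Category-aware attention (Equation 5).

    h: hidden states h_1..h_t, c: category embeddings c_1..c_t (same size).
    Categories are fused into the query and keys only; the returned session
    representation is a weighted sum of the untouched hidden states.
    """
    query = add(h[-1], c[-1])
    alphas = [attention_score(v, A1, A2, query, add(h_j, c_j))
              for h_j, c_j in zip(h, c)]
    dim = len(h[0])
    s_t = [sum(a * h_j[k] for a, h_j in zip(alphas, h)) for k in range(dim)]
    return alphas, s_t
```

With all-zero category embeddings the function reduces to the plain attention of Equation 4, which is a convenient sanity check when plugging the mechanism into an existing model.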

3.4 Discussion: Simple yet Effective Approach

Our DCA-SBRS framework can serve as a plugin for SOTA accuracy-oriented SBRSs, improving their diversity through the MDL module while striving to maintain their recommendation accuracy through the NCA mechanism. Both the MDL module and the NCA mechanism can be easily plugged into existing SOTA accuracy-oriented SBRSs to further promote their diversity, a step towards more trustworthy recommender systems [ge2022survey, wang2022trustworthy]. Extensive experimental results in Section 5 verify that our approach can help SOTA SBRSs (i.e., NARM, STAMP, and GCE-GNN) obtain extraordinary performance in terms of recommendation diversity and comprehensive performance (considering both accuracy and diversity).

Besides, our approach is much more lightweight (simple yet effective) than existing diversified recommender systems: (1) in contrast to the (two-stage) re-ranking methods (e.g., MMR [carbonell1998use]), MDL achieves end-to-end learning, that is, it simultaneously optimizes the accuracy and diversity objectives; (2) unlike other diversified SBRSs (e.g., IDSR [chen2020improving]), which rely on specifically calibrated diversity-aware components with a substantial number of extra parameters, our MDL module is a model-agnostic plugin that uses only the estimated relevance scores of items from any existing SOTA SBRS together with the category information, and thus requires few extra parameters and is comparable in efficiency to the corresponding SBRS; and (3) both MMR [carbonell1998use] and IDSR [chen2020improving] employ a greedy iterative inference algorithm to generate the final Top-$N$ recommended lists, whereas our DCA-SBRS framework directly generates a recommended list of the Top-$N$ items with the highest final scores, implying that our approach is more computationally efficient at inference. This is also empirically verified in Table 7.
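The inference-time difference in point (3) can be illustrated with a small sketch. It contrasts a one-shot Top-$N$ selection with a greedy re-ranking in the textbook MMR form. The function names, the binary same-category similarity, and the trade-off value `lam` are assumptions made for this illustration; they are not the exact NARM+MMR or IDSR configurations evaluated in this paper.

```python
def top_n_direct(final_scores, n):
    """One-shot selection: sort items by final score, keep the first n."""
    return sorted(final_scores, key=final_scores.get, reverse=True)[:n]

def top_n_greedy_mmr(relevance, category, n, lam=0.5):
    """Greedy re-ranking: every step rescans all remaining candidates."""
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < n:
        def mmr(item):
            # Assumed similarity: 1 if the item shares a category with an
            # already selected item, 0 otherwise.
            max_sim = max(
                (1.0 if category[item] == category[s] else 0.0
                 for s in selected),
                default=0.0,
            )
            return lam * relevance[item] - (1.0 - lam) * max_sim
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

relevance = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
category = {"a": 1, "b": 1, "c": 2, "d": 3}
print(top_n_direct(relevance, 3))                # ['a', 'b', 'c']
print(top_n_greedy_mmr(relevance, category, 3))  # ['a', 'c', 'd']
```

The direct variant needs a single sort over the scores, whereas the greedy variant re-scores the remaining candidates against the selected items at each of the $N$ steps.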

Table 2: Statistics of Datasets (Note: # train and # test are the numbers of sessions before the sequence splitting preprocess; avg. len. denotes the average session length; DS is the diversity score defined in Section 4.3; and RR [repeat2019] is the repeat ratio, indicating the ratio of repeated items within a session).
Dataset Diginetica Retailrocket Tmall
# interactions 993,483 1,040,796 1,505,683
# train 186,670 283,446 188,756
# test 18,101 11,718 51,894
# items 43,097 45,831 96,182
# categories 995 871 822
avg. len. 4.8504 3.5262 6.0775
train DS 0.3741 0.4646 0.6575
test DS 0.3721 0.4893 0.6278
train RR 0.1301 0.2488 0
test RR 0.1317 0.2370 0

4 Experimental Settings

In this section, we introduce the selection of datasets, baselines, and evaluation metrics. The specifics of dataset preprocessing and partitioning, as well as the hyper-parameter settings for our methods and other baselines, are also provided. The source code and datasets are available online (https://github.com/qyin863/DCA-SBRS).

4.1 Datasets and Preprocessing

For our experiments, we carefully select three representative public e-commerce datasets with item category information, i.e., Diginetica (https://competitions.codalab.org/competitions/11161#learn_the_details-overview), Retailrocket (https://www.kaggle.com/retailrocket/ecommerce-dataset), and Tmall (https://tianchi.aliyun.com/dataset/dataDetail?dataId=42), following [wang2021survey, li2017neural, wang2020global].

  • Diginetica, from CIKM Cup 2016, contains user sessions taken from the records of an e-commerce search engine, each identified by its own ‘SessionId’. We only use the data with the behavior type ‘view’.

  • Retailrocket collects users’ interactions on an e-commerce website over a period of 4.5 months. We select interactions with the behavior type ‘view’, and a new session is created when the user’s idle time exceeds 30 minutes following [luo2020collaborative].

  • Tmall, from the IJCAI-15 competition, includes anonymous Tmall shopping logs. We adopt interactions with the behavior types ‘buy’ and ‘view’, and partition user history into sessions by day following [ludewig2018evaluation]. We sample $1/16$ of the sessions, inspired by the Yoochoose fractions [li2017neural].
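The idle-time rule used for Retailrocket can be sketched as follows. This is a minimal illustration that assumes the events of a single user are given as `(timestamp_in_seconds, item_id)` pairs; the function name `split_sessions` and this input layout are hypothetical and not taken from the released code.

```python
def split_sessions(events, idle_seconds=30 * 60):
    """Split one user's events into sessions.

    A new session starts whenever the gap to the previous event
    exceeds `idle_seconds` (30 minutes by default).
    """
    sessions = []
    last_ts = None
    for ts, item in sorted(events, key=lambda e: e[0]):
        if last_ts is None or ts - last_ts > idle_seconds:
            sessions.append([])
        sessions[-1].append(item)
        last_ts = ts
    return sessions

events = [(0, "a"), (600, "b"), (2401, "c"), (4201, "d")]
print(split_sessions(events))  # [['a', 'b'], ['c', 'd']]
```

In the example, the gap between `b` and `c` is 1801 seconds and therefore opens a new session, while the gap of exactly 1800 seconds between `c` and `d` does not.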

For data preprocessing, following [li2017neural, LiuZMZ18, WuT0WXT19], we filter out sessions of length $1$ and items occurring fewer than $5$ times. We then use the most recent data (i.e., the last week) as the test set and the earlier sessions as the training set. The validation set contains the final week of data from the training set. Additionally, we drop items that appear in the test set but not in the training set. The statistics of the three datasets after preprocessing are shown in Table 2. A sequence splitting preprocess, which generates $n-1$ sub-sequences $([i_1], i_2)$, $([i_1, i_2], i_3)$, $\dots$, $([i_1, \dots, i_{n-1}], i_n)$ for a session sequence $S=[i_1, i_2, \dots, i_n]$, is required if a recommendation model is not trained in a session-parallel manner [hidasi2015session].
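The filtering and sequence splitting steps can be sketched as follows. The function names are hypothetical, and the order of the two filters (rare items first, then short sessions) is an assumption made for this illustration, since the text does not fix it.

```python
from collections import Counter

def filter_sessions(sessions, min_len=2, min_item_count=5):
    """Drop items seen fewer than `min_item_count` times, then drop
    sessions shorter than `min_len` (assumed order of the two filters)."""
    counts = Counter(item for s in sessions for item in s)
    kept = []
    for s in sessions:
        s = [i for i in s if counts[i] >= min_item_count]
        if len(s) >= min_len:
            kept.append(s)
    return kept

def split_sequence(session):
    """Turn [i1, ..., in] into the n-1 pairs ([i1], i2), ..., ([i1..i(n-1)], in)."""
    return [(session[:k], session[k]) for k in range(1, len(session))]

print(split_sequence([1, 2, 3, 4]))
# [([1], 2), ([1, 2], 3), ([1, 2, 3], 4)]
```

Each pair consists of a session prefix and the next item to predict, so a session of length $n$ yields $n-1$ training samples.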

4.2 Baseline Models

To explore the recommendation performance on accuracy and diversity, following [wang2021survey, wang2020global, chen2020improving], we select three categories of popular and representative baseline models for session-based recommendation, including traditional methods, deep neural methods with attention mechanism (as they are chosen as the basic predictors in our proposed framework), and deep diversified methods.

1. Traditional Methods.

  • Item-KNN [sarwar2001item] measures cosine similarity of every two items regarding sessions in the training data. It recommends items for a session that are most similar to the last item.

  • BPR-MF [rendle2009bpr] performs Matrix Factorization (MF) with a pairwise ranking loss. Particularly, the session feature vector is averaged over all items in the session.

2. Deep Neural Methods with Attention Mechanism.

  • NARM [li2017neural] is an RNN-based model with an attention mechanism, which combines the last hidden vector and the main purpose from the hidden states as the final representation to produce recommendations.

  • STAMP [LiuZMZ18] applies attention layers on item representations directly and captures the user’s long-term preference as well as short-term interest from the session context.

  • GCE-GNN [wang2020global] constructs both the local (current session) and global (all sessions) graphs to obtain session- and global-level item embeddings. Then, before the soft attention, it incorporates the reversed position information into the item embedding.

3. Deep Diversified Methods.

  • MCPRN [Wang0WSOC19] models users’ multiple purposes of the session, rather than only one purpose in common SBRSs. Furthermore, it combines the above various learned purposes by the target-aware attention to get the final representation. As stated in the original paper, MCPRN can boost both accuracy and diversity.

  • NARM+MMR [chen2020improving] is a two-stage approach which in the second stage uses MMR [carbonell1998use] and a greedy algorithm to re-rank items provided by NARM in terms of relevance scores in the first stage.

  • IDSR [chen2020improving] is the first end-to-end deep neural network for SBRSs that takes both diversity and accuracy into account. The hyper-parameter $\lambda$ is used to balance the relevance score and diversification score.

4.3 Evaluation Metrics

We adopt the following metrics related to accuracy, diversity, and both to conduct a thorough evaluation. Higher metric values indicate better performance. For accuracy, we select HR (Hit Rate), MRR (Mean Reciprocal Rank), and NDCG (Normalized Discounted Cumulative Gain), following state-of-the-art works [li2017neural, WuT0WXT19, wang2020global]. Specifically, HR indicates whether the Top-$N$ Recommended List (abbreviated as RL, where $N$ is the length of the RL) contains the target item; MRR and NDCG both measure the hit position and reward ranking the target item higher in the recommended list. For diversity, we choose the widely used ILD (Intra-List Distance) [Cen2020ControllableMF, chen2020improving], Entropy [Wang0WSOC19, zheng2021dgcn], and Diversity Score [liang2021enhancing] as the evaluation metrics. In particular, ILD measures the average distance between each pair of items in the recommended list,

$$\text{ILD} = \frac{\sum_{(i,j) \in RL} d_{ij}}{|RL| \times (|RL|-1)},$$ (6)

where $d_{ij}$ represents the Euclidean distance between the embeddings (e.g., one-hot encodings) of the categories that items $i$ and $j$ belong to.
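A minimal sketch of Equation 6 is given below, assuming one-hot category encodings, so that two items from different categories are at Euclidean distance $\sqrt{2}$ and two items from the same category are at distance $0$. The function names are hypothetical, and the sum is taken over ordered pairs $(i, j)$ with $i \neq j$, which is one reading of the equation that makes the denominator $|RL| \times (|RL|-1)$ an exact pair count.

```python
import math
from itertools import permutations

def one_hot_distance(cat_a, cat_b):
    """Euclidean distance between the one-hot encodings of two categories."""
    return 0.0 if cat_a == cat_b else math.sqrt(2.0)

def ild(recommended_categories):
    """Intra-List Distance of a recommended list, given each item's category."""
    n = len(recommended_categories)
    if n < 2:
        return 0.0
    total = sum(one_hot_distance(a, b)
                for a, b in permutations(recommended_categories, 2))
    return total / (n * (n - 1))

print(ild(["shoes", "shoes", "shoes"]))  # 0.0
print(ild(["shoes", "bags"]))            # sqrt(2), about 1.4142
```

Under this encoding, ILD is bounded by $\sqrt{2}$, which is reached when all recommended items belong to different categories.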

Entropy measures the entropy of the item category distribution in the recommended list, and Diversity Score (abbreviated as DS) is the number of interacted/recommended categories divided by the number of interacted/recommended items. Additionally, we use F-score [hu2017diversifying], the harmonic mean of HR and ILD, as an aggregate indicator capturing both accuracy and diversity.
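The three remaining measures can be sketched as follows. The function names are hypothetical, the natural logarithm is an assumption (the log base is not stated here), and the way F-score is aggregated over test sessions is likewise not specified in this section, so the sketch only shows the harmonic mean of two given values.

```python
import math
from collections import Counter

def category_entropy(categories):
    """Entropy of the category distribution in a list (natural log assumed)."""
    n = len(categories)
    if n == 0:
        return 0.0
    counts = Counter(categories)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def diversity_score(categories):
    """Number of distinct categories divided by the number of items."""
    if not categories:
        return 0.0
    return len(set(categories)) / len(categories)

def f_score(hr, ild_value):
    """Harmonic mean of an accuracy value (HR) and a diversity value (ILD)."""
    if hr + ild_value == 0:
        return 0.0
    return 2 * hr * ild_value / (hr + ild_value)

cats = ["shoes", "shoes", "bags", "hats"]
print(diversity_score(cats))  # 0.75
```

Because the harmonic mean is dominated by the smaller of its two arguments, a model cannot obtain a high F-score by excelling on only one of accuracy and diversity.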

4.4 Hyper-parameter Settings

For a fair comparison, we use the Bayesian TPE [bergstra2011algorithms] of the Hyperopt framework (https://github.com/hyperopt/hyperopt) to tune the hyper-parameters of all methods according to their performance on the validation set (i.e., the last week of the training set). Compared to grid and random search, TPE has proven to be a more intelligent and effective technique, especially for deep methods, which have more hyper-parameters [sun2020we]. We have implemented all methods in the PyTorch framework, except for IDSR, for which we adopt the official code (https://bitbucket.org/WanyuChen/idsr/) with its own early-stopping mechanism. For all methods, Adam is used as the optimizer; the dimension of item embedding is searched in the range $[100, 300]$ with a step of 50; the learning rate is searched in $\{0.001, 0.005, 0.01, 0.05\}$; the mini-batch size is searched in $\{64, 128, 256, 512\}$; and the number of epochs is searched in the range $[10, 40]$ with a step of 5. Exceptions are made for GCE-GNN, where we set both the dimension of item embedding and the mini-batch size to $100$ (consistent with the original paper) due to memory limitations, and for MCPRN, where we set the mini-batch size to $50$. For IDSR, we search $\lambda$, which balances the importance of relevance and diversification scores, in $\{0.2, 0.5, 0.8\}$ on every dataset. Moreover, for NARM+MMR, we set the multiplier of the diversification score in MMR to $\lambda = 5 \times 10^{-6}$, so as to avoid a significant decrease (e.g., a decline of more than 20%) in accuracy in comparison with NARM. The detailed best hyper-parameter settings are shown in Table 3.
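To make the shared search ranges concrete, the sketch below expands them into explicit value lists and counts the resulting configurations. The dictionary keys follow the hyper-parameter names of Table 3; the variable names `search_space` and `n_configs` are hypothetical, and the snippet only enumerates the space rather than running TPE.

```python
# Shared search ranges described in the text, written out explicitly.
search_space = {
    "item_embedding_dim": list(range(100, 301, 50)),  # [100, 300], step 50
    "lr": [0.001, 0.005, 0.01, 0.05],
    "batch_size": [64, 128, 256, 512],
    "epochs": list(range(10, 41, 5)),                 # [10, 40], step 5
}

# Size of the full grid that an exhaustive search would have to cover.
n_configs = 1
for values in search_space.values():
    n_configs *= len(values)

print(search_space["item_embedding_dim"])  # [100, 150, 200, 250, 300]
print(n_configs)                           # 560
```

Even this shared part of the space contains 560 combinations before model-specific hyper-parameters are added, which is one practical reason to prefer a guided search such as TPE over an exhaustive grid.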

Table 3: The Optimal Hyper-parameter Settings by Bayesian TPE of Hyperopt.

Model | Hyper-parameter | Digi* | Retail* | Tmall | Searching Space | Description
Item-KNN | -alpha | 0.9270 | 0.7100 | 0.8514 | $\mathcal{U}(0.1, 1)$ | Balance for normalizing items’ supports
BPR-MF | -item_*_dim | 300 | 100 | 200 | [min=100, max=300, step=50] | the dimension of item embedding
 | -lr | 0.01 | 0.01 | 0.001 | [0.001, 0.005, 0.01, 0.05] | learning rate
 | -batch_size | 64 | 64 | 512 | [64, 128, 256, 512] | the size for mini-batch
 | -epochs | 20 | 20 | 40 | [min=10, max=40, step=5] | the number of epochs
NARM | -item_*_dim | 200 | 100 | 250 | [min=100, max=300, step=50] |
 | -lr | 0.001 | 0.001 | 0.005 | [0.001, 0.005, 0.01, 0.05] |
 | -batch_size | 512 | 512 | 256 | [64, 128, 256, 512] |
 | -epochs | 35 | 40 | 25 | [min=10, max=40, step=5] |
 | -hidden_size | 50 | 150 | 150 | [min=50, max=200, step=50] | the dimension of latent vector
 | -n_layers | 1 | 1 | 1 | [1, 2, 3] | the number of layers in RNN
DCA-NARM | -item_*_dim | 100 | 100 | 200 | |
 | -lr | 0.001 | 0.001 | 0.005 | |
 | -batch_size | 512 | 512 | 256 | |
 | -epochs | 20 | 15 | 20 | |
 | -hidden_size | 200 | 100 | 150 | |
 | -n_layers | 1 | 2 | 1 | |
STAMP | -item_*_dim | 100 | 100 | 150 | [min=100, max=300, step=50] |
 | -lr | 0.001 | 0.001 | 0.01 | [0.001, 0.005, 0.01, 0.05] |
 | -batch_size | 128 | 512 | 256 | [64, 128, 256, 512] |
 | -epochs | 35 | 20 | 35 | [min=10, max=40, step=5] |
DCA-STAMP | -item_*_dim | 100 | 200 | 200 | |
 | -lr | 0.001 | 0.001 | 0.01 | |
 | -batch_size | 256 | 512 | 512 | |
 | -epochs | 35 | 15 | 40 | |
GCE-GNN | -item_*_dim | 250 | 100 | 100 | [100] |
 | -lr | 0.001 | 0.001 | 0.005 | [0.001, 0.005] |
 | -batch_size | 128 | 100 | 100 | [100] |
 | -epochs | 10 | 30 | 20 | [min=10, max=30, step=5] |
 | -n_iter | 1 | 1 | 2 | [1, 2] | the number of hop
 | -dropout_gcn | 0.4 | 0.4 | 0.2 | [0, 0.2, 0.4, 0.6, 0.8] | dropout rate
 | -dropout_local | 0.5 | 0.0 | 0.0 | [0, 0.5] | dropout rate
DCA-GCEGNN | -item_*_dim | 100 | 100 | 100 | |
 | -lr | 0.001 | 0.001 | 0.005 | |
 | -batch_size | 100 | 100 | 100 | |
 | -epochs | 30 | 20 | 20 | |
 | -n_iter | 2 | 2 | 2 | |
 | -dropout_gcn | 0.0 | 0.2 | 0.4 | |
 | -dropout_local | 0.0 | 0.0 | 0.0 | |
MCPRN | -item_*_dim | 150 | 150 | 100 | [min=100, max=200, step=50] | dimension of item embedding/latent vector
 | -lr | 0.005 | 0.005 | 0.005 | [0.005, 0.01, 0.05] |
 | -batch_size | 256 | 50 | 50 | [50] |
 | -epochs | 15 | 30 | 25 | [min=10, max=40, step=5] |
 | -tau | 1 | 0.01 | 0.01 | [0.01, 0.1, 1, 10] | temperature parameter in softmax
 | -purposes | 1 | 4 | 1 | [1, 2, 3, 4] | the number of channels
Remark: 1. Digi* represents Diginetica, Retail* represents Retailrocket, and item_*_dim represents item_embedding_dim. 2. A hyper-parameter description is omitted if it has already been given above. Additionally, each DCA-SBRS shares the searching space of its underlying SBRS, which is therefore omitted. 3. Due to memory limits, item_*_dim and batch_size are set to 100 (the original setting) in GCE-GNN, and batch_size is set to 50 in MCPRN except on Digi*. 4. IDSR uses its own official TensorFlow code with early-stopping. We tune $\lambda_e \in [0.1, 1]$ and set it to 1 for four datasets. Besides, we tune the trade-off hyper-parameter $\lambda$ in $\{0.2, 0.5, 0.8\}$, aiming at competitive accuracy, and set it to $0.8, 0.5, 0.8$ for the three datasets, respectively.

5 Experimental Results

In this section, we evaluate the performance of DCA-SBRS on the three selected real-world datasets to verify its superiority (in comparison with other SOTA methods) and the effectiveness of its respective modules. Additionally, we analyze the shortcomings of the standard comprehensive measurement to measure both accuracy and diversity (i.e., F-score), and provide remedies accordingly.

5.1 Overall Comparisons

Tables 4-6 exhibit the experimental results of the chosen baselines on the three real-world datasets, where the best result for each metric is highlighted in boldface and the runner-up is underlined; the row ‘Improvements’ indicates the average relative enhancements achieved by our DCA-SBRSs over the corresponding SBRSs on each metric, as defined in Equation 7. Note that the reported performance of each model in the tables is averaged over 5 runs with the best hyper-parameter settings.

$$\text{Improvements} = \frac{\frac{\text{DCA-NARM}-\text{NARM}}{\text{NARM}} + \frac{\text{DCA-STAMP}-\text{STAMP}}{\text{STAMP}} + \frac{\text{DCA-GCEGNN}-\text{GCE-GNN}}{\text{GCE-GNN}}}{3}$$ (7)
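As a worked instance of Equation 7, the sketch below recomputes the ‘Improvements’ entry for ILD@10 on Diginetica from the per-model values reported in Table 4. The function and variable names are hypothetical.

```python
def average_relative_improvement(pairs):
    """Mean of (dca - base) / base over (dca_value, base_value) pairs."""
    return sum((dca - base) / base for dca, base in pairs) / len(pairs)

# ILD@10 on Diginetica, taken from Table 4.
ild10_diginetica = [
    (0.4115, 0.1811),  # DCA-NARM vs. NARM
    (0.5750, 0.2704),  # DCA-STAMP vs. STAMP
    (0.3090, 0.1124),  # DCA-GCEGNN vs. GCE-GNN
]

improvement = average_relative_improvement(ild10_diginetica)
print(f"{improvement:.1%}")  # 138.3%
```

The result agrees, after rounding, with the 138% reported in the ‘Improvements’ row of Table 4.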
Table 4: Model Performance on Diginetica. Best result is highlighted in boldface and the runner-up is underlined; ‘Improvements’ indicates the average relative enhancements achieved by our DCA-SBRSs over the corresponding SBRSs as Equation 7.
Model NDCG@10 NDCG@20 MRR@10 MRR@20 HR@10 HR@20 ILD@10 ILD@20 Entropy@10 Entropy@20 DS@10 DS@20 F-score@10 F-score@20
Item-KNN 0.1313 0.1438 0.0999 0.1036 0.2343 0.2814 0.1653 0.2247 0.2852 0.4353 0.1562 0.1376 0.0375 0.0635
BPR-MF 0.0799 0.0954 0.0618 0.0661 0.1397 0.2012 0.5334 0.5799 0.9490 1.2148 0.2871 0.2159 0.0676 0.1061
NARM 0.3191 0.3468 0.2578 0.2654 0.5162 0.6256 0.1811 0.2519 0.3047 0.5037 0.1575 0.1182 0.0921 0.1645
STAMP 0.3143 0.3385 0.2558 0.2624 0.5018 0.5973 0.2704 0.3923 0.4781 0.8410 0.1977 0.1783 0.1381 0.2491
GCE-GNN $\mathbf{0.3458}$ $\mathbf{0.3723}$ $\mathbf{0.2876}$ $\mathbf{0.2950}$ $\mathbf{0.5324}$ $\mathbf{0.6373}$ 0.1124 0.1623 0.1825 0.3096 0.1328 0.0892 0.0627 0.1145
MCPRN 0.2321 0.2610 0.1858 0.1938 0.3829 0.4972 0.2671 0.3394 0.4651 0.7106 0.1935 0.1556 0.1100 0.1867
NARM+MMR 0.2626 0.2896 0.2092 0.2167 0.4354 0.5420 0.3484 0.4574 0.6157 0.9691 0.2234 0.1909 0.1401 0.2443
IDSR($\lambda=0.8$) 0.2681 0.2958 0.2140 0.2217 0.4438 0.5532 0.4105 0.4635 0.7464 1.0110 0.2593 0.2090 0.1814 0.2688
DCA-NARM 0.3226 0.3435 0.2641 0.2699 0.5099 0.5920 0.4115 0.6791 0.7698 1.6254 0.2691 0.3399 0.2022 0.4017
DCA-STAMP 0.3067 0.3237 0.2529 0.2577 0.4779 0.5444 $\mathbf{0.5750}$ $\mathbf{0.8713}$ $\mathbf{1.1009}$ $\mathbf{2.1447}$ $\mathbf{0.3489}$ $\mathbf{0.4418}$ $\mathbf{0.2693}$ $\mathbf{0.4632}$
DCA-GCEGNN 0.3342 0.3554 0.2813 0.2872 0.5032 0.5868 0.3090 0.5419 0.5844 1.2960 0.2304 0.2836 0.1426 0.3172
Improvements -1.56% -3.29% -0.29% -0.91% -3.82% -7.38% 138% 175% 168% 232% 73.6% 184% 114% 136%
Table 5: Model Performance on Retailrocket. Best result is highlighted in boldface and the runner-up is underlined; ‘Improvements’ indicates the average relative enhancements achieved by our DCA-SBRSs over the corresponding SBRSs as Equation 7.
Model NDCG@10 NDCG@20 MRR@10 MRR@20 HR@10 HR@20 ILD@10 ILD@20 Entropy@10 Entropy@20 DS@10 DS@20 F-score@10 F-score@20
Item-KNN 0.1558 0.1634 0.1267 0.1289 0.2491 0.2777 0.6868 0.7954 1.2871 1.7206 0.3749 0.3822 0.1491 0.1979
BPR-MF 0.1244 0.1369 0.1037 0.1072 0.1915 0.2407 0.8106 0.8599 1.5023 1.8863 0.4077 0.3183 0.1391 0.1899
NARM 0.3625 0.3815 0.3138 0.3190 0.5181 0.5928 0.4860 0.5885 0.8698 1.2658 0.2767 0.2369 0.2475 0.3507
STAMP 0.3516 0.3688 0.3068 0.3115 0.4945 0.5624 0.5313 0.6563 0.9769 1.4613 0.3046 0.2739 0.2530 0.3642
GCE-GNN $\mathbf{0.3917}$ $\mathbf{0.4107}$ $\mathbf{0.3426}$ $\mathbf{0.3478}$ $\mathbf{0.5481}$ $\mathbf{0.6229}$ 0.3701 0.4525 0.6312 0.9139 0.2207 0.1744 0.2143 0.3044
MCPRN 0.2363 0.2501 0.2085 0.2123 0.3252 0.3799 0.7664 0.8432 1.4931 2.0162 0.4322 0.3852 0.2293 0.2930
NARM+MMR 0.3234 0.3413 0.2785 0.2834 0.4669 0.5375 0.6247 0.7436 1.1543 1.6684 0.3424 0.3073 0.2764 0.3863
IDSR($\lambda=0.5$) 0.2863 0.3116 0.2526 0.2596 0.3998 0.4996 $\mathbf{1.1929}$ 1.0939 $\mathbf{2.4573}$ 2.7566 $\mathbf{0.6794}$ 0.5506 $\mathbf{0.4274}$ 0.5093
DCA-NARM 0.3654 0.3804 0.3200 0.3241 0.5099 0.5688 0.7181 0.9328 1.3801 2.3049 0.4074 0.4618 0.3544 0.5053
DCA-STAMP 0.3362 0.3471 0.2929 0.2960 0.4726 0.5155 0.9061 $\mathbf{1.1276}$ 1.8147 $\mathbf{2.9613}$ 0.5230 $\mathbf{0.6133}$ 0.3994 $\mathbf{0.5257}$
DCA-GCEGNN 0.3826 0.3985 0.3364 0.3408 0.5293 0.5921 0.5970 0.7813 1.1258 1.8713 0.3461 0.3754 0.3103 0.4533
Improvements -1.97% -3.05% -1.45% -1.80% -3.15% -5.78% 59.9% 67.7% 74.3% 96.5% 58.6% 111% 48.6% 45.8%
Table 6: Model Performance on Tmall. Best result is highlighted in boldface and the runner-up is underlined; ‘Improvements’ indicates the average relative enhancements achieved by our DCA-SBRSs over the corresponding SBRSs as Equation 7.
Model\Metric NDCG MRR HR ILD Entropy DS F-score
@10 @20 @10 @20 @10 @20 @10 @20 @10 @20 @10 @20 @10 @20
Item-KNN $\mathbf{0.0321}$ $\mathbf{0.0349}$ $\mathbf{0.0251}$ $\mathbf{0.0259}$ 0.0551 0.0655 0.8888 0.9593 1.6790 2.0452 0.4546 0.4219 0.0442 0.0573
BPR-MF 0.0096 0.0119 0.0069 0.0075 0.0186 0.0279 0.9963 1.0350 1.8716 2.3219 0.4852 0.3805 0.0168 0.0259
NARM 0.0244 0.0306 0.0174 0.0191 0.0476 0.0720 0.9453 1.0085 1.7760 2.2625 0.4689 0.3778 0.0386 0.0642
STAMP 0.0171 0.0215 0.0121 0.0133 0.0336 0.0511 1.0449 1.0959 2.0375 2.5806 0.5428 0.4494 0.0292 0.0481
GCE-GNN 0.0282 0.0355 0.0187 0.0207 $\mathbf{0.0594}$ $\mathbf{0.0886}$ 0.8571 0.9326 1.5691 2.0340 0.4161 0.3345 0.0443 0.0744
MCPRN 0.0110 0.0142 0.0075 0.0084 0.0225 0.0354 1.0661 1.1042 2.1139 2.6437 0.5686 0.4679 0.0193 0.0326
NARM+MMR 0.0198 0.0249 0.0141 0.0154 0.0386 0.0592 1.0116 1.0634 1.9386 2.4437 0.5124 0.4155 0.0331 0.0548
IDSR($\lambda=0.8$) 0.0083 0.0114 0.0054 0.0063 0.0179 0.0303 $\mathbf{1.3175}$ 1.2969 2.8725 3.4530 0.8108 0.6773 0.0192 0.0327
DCA-NARM 0.0145 0.0171 0.0106 0.0113 0.0272 0.0374 1.3096 $\mathbf{1.3466}$ $\mathbf{2.9308}$ $\mathbf{3.8986}$ $\mathbf{0.8548}$ $\mathbf{0.8566}$ 0.0274 0.0402
DCA-STAMP 0.0164 0.0192 0.0122 0.0129 0.0304 0.0414 1.2720 1.3274 2.7719 3.7395 0.7919 0.7962 0.0311 0.0453
DCA-GCEGNN 0.0259 0.0329 0.0165 0.0185 0.0566 0.0843 0.9647 1.0464 1.8368 2.4230 0.4894 0.4205 $\mathbf{0.0467}$ $\mathbf{0.0777}$
Improvements -17.6% -20.7% -16.7% -18.16% -19.03% -24.0% 24.3% 22.3% 39.4% 45.4% 48.6% 76.5% -5.70% -12.9%

5.1.1 Performance on Recommendation Accuracy

The accuracy of all approaches is measured via NDCG@$N$, MRR@$N$, and HR@$N$ ($N=\{10,20\}$) in Tables 4-6, from which several observations can be made. 1) Among traditional methods, Item-KNN outperforms BPR-MF on all three datasets. Both are generally outperformed by the deep neural approaches, except for Item-KNN on Tmall. 2) Compared with our proposed framework, the existing accuracy-oriented SBRSs rank first, as their neural networks learn more precise item embeddings and their attention mechanisms filter out noise. Among them, GCE-GNN outperforms the other methods on all three datasets, which demonstrates the expressive power of combining the local (current-session) graph with the global session graph. 3) The accuracy of these SBRSs decreases slightly under our DCA framework, with a few exceptions such as DCA-NARM vs. NARM on Diginetica and Retailrocket. However, the drop (e.g., $1.6\%$ and $2\%$ on average w.r.t. NDCG@10 on Diginetica and Retailrocket, respectively) is tolerable given the significant gains in diversity and comprehensive metrics, which are elaborated below. 4) Deep diversified SBRSs generally perform better than traditional methods but worse than the accuracy-oriented deep methods, owing to their special designs for gaining higher diversity. Compared with NARM, the accuracy of NARM+MMR drops significantly on all three datasets. It is worth noting that our DCA-SBRSs hold a clear advantage over the deep diversified methods; for instance, on Tmall the HR@10 of our DCA-SBRSs (0.0272 to 0.0566) is roughly twice that of IDSR (0.0179) on average.

5.1.2 Performance on Recommendation Diversity

The diversity of all methods is measured via ILD@$N$, Entropy@$N$, and DS@$N$ ($N=\{10,20\}$) in Tables 4-6. Three major findings can be noted. 1) Existing SBRSs benefit significantly from our proposed DCA framework. For instance, the average relative improvements in ILD@10 achieved by our DCA-SBRSs over the corresponding SBRSs (e.g., DCA-NARM vs. NARM) reach $138\%$, $59.9\%$, and $24.3\%$ on Diginetica, Retailrocket, and Tmall, respectively. Besides, some of our DCA-SBRSs outperform all other methods (including the deep diversified models) on most diversity metrics, e.g., DCA-STAMP on Diginetica and DCA-NARM on Tmall. 2) Among diversified models, IDSR is more diverse than MCPRN on all three datasets. Meanwhile, these diversified models all beat the existing accuracy-oriented SBRSs (except MCPRN vs. STAMP on Diginetica), indicating their efficacy in gaining better diversity. 3) Existing accuracy-oriented SBRSs perform worst because they ignore the demand for diversity. Among them, STAMP performs best on all three datasets. Moreover, traditional methods (led by BPR-MF), though surpassed by these accuracy-oriented SBRSs in recommendation accuracy, perform slightly better when it comes to diversity.
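The ‘Improvements’ rows can be sanity-checked with a few lines of Python. The sketch below assumes that Equation 7 (defined outside this section) is the plain mean of the per-model relative changes between each DCA-SBRS and its backbone; the function name `average_relative_improvement` is ours and purely illustrative. Under that assumption, the Diginetica values copied from Table 4 give the reported ILD@10 and NDCG@10 entries to the printed precision.

```python
def average_relative_improvement(baseline, enhanced):
    """Mean of (enhanced - baseline) / baseline over paired models, in percent.

    Assumption: this mirrors Equation 7, whose exact form is not shown here.
    """
    if baseline.keys() != enhanced.keys():
        raise ValueError("baseline and enhanced must cover the same models")
    changes = [(enhanced[m] - baseline[m]) / baseline[m] for m in baseline]
    return 100.0 * sum(changes) / len(changes)


# ILD@10 on Diginetica (Table 4): backbone vs. its DCA variant.
ild_base = {"NARM": 0.1811, "STAMP": 0.2704, "GCE-GNN": 0.1124}
ild_dca = {"NARM": 0.4115, "STAMP": 0.5750, "GCE-GNN": 0.3090}

# NDCG@10 on Diginetica (Table 4).
ndcg_base = {"NARM": 0.3191, "STAMP": 0.3143, "GCE-GNN": 0.3458}
ndcg_dca = {"NARM": 0.3226, "STAMP": 0.3067, "GCE-GNN": 0.3342}

ild_gain = average_relative_improvement(ild_base, ild_dca)
ndcg_gain = average_relative_improvement(ndcg_base, ndcg_dca)
print(f"ILD@10: {ild_gain:.1f}%  NDCG@10: {ndcg_gain:.2f}%")
```

Averaging relative changes per model (rather than comparing averaged metric values) means a backbone with a very low starting value, such as GCE-GNN on ILD@10, contributes a large share of the reported improvement.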

5.1.3 Comprehensive Performance

To comprehensively assess performance from both the accuracy and diversity perspectives, we further compare all methods in terms of F-score@$N$ ($N=\{10,20\}$) in Tables 4-6, and several interesting findings emerge. 1) Our proposed DCA-SBRSs generally perform the best among all methods. In particular, it is encouraging that some of our DCA-SBRSs defeat diversified models in terms of both accuracy and diversity (e.g., DCA-STAMP on Diginetica, DCA-NARM on Tmall, and DCA-NARM vs. NARM+MMR on Diginetica and Retailrocket). Additionally, our framework also outperforms accuracy-oriented SBRSs, with significant gains in diversity and only minor drops in accuracy. 2) Among deep diversified models, IDSR achieves both better accuracy and diversity than MCPRN and NARM+MMR on Diginetica and Retailrocket, demonstrating the superiority of IDSR over MCPRN. 3) Typically, traditional methods perform worse than accuracy-oriented SBRSs. Comparing accuracy-oriented SBRSs with diversified SBRSs, the former perform better on Tmall but worse on Diginetica. This is mainly caused by the calculation of the F-score (the harmonic mean of HR and ILD). Because datasets differ in their characteristics (e.g., distribution), the HR and ILD values achieved on different datasets may vary considerably. For instance, in Tables 4 and 6 the ILD values of the deep models are generally lower than their HR values on Diginetica, while the opposite holds on Tmall. Therefore, the model achieving the best result w.r.t. the weaker metric (e.g., HR on Tmall) gains an advantage in the comprehensive performance, i.e., F-score.

Interestingly, we notice that all methods perform worse regarding the recommendation accuracy whilst better w.r.t. diversity on Tmall compared with the other two datasets. This might be caused by the unique data distribution of Tmall, i.e., lower RR and higher DS in Table 2. Nevertheless, our proposed DCA still exceeds other diversified SBRSs, showing the stability of our DCA.

Table 7: Computational Time Comparison.
Model\Time Training(/epoch) Inference
Digi* Retail* Tmall Digi* Retail* Tmall
NARM 49s 68s 156s 560s 137s 2678s
NARM+MMR 49s 68s 156s 5173s 2082s 6075s
MCPRN 244s 3003s 1138s 2325s 1689s 3167s
IDSR 1486s 1646s 4604s 62s 29s 1928s
DCA-NARM 127s 159s 418s 8s 4s 134s
Note: Diginetica and Retailrocket are shortened as Digi* and Retail*.

5.1.4 Performance on Time Complexity

Following the discussion in Section 3.4, we empirically verify the efficiency of our lightweight DCA. To this end, we record the training and inference time of representative methods, including NARM, NARM+MMR, MCPRN, IDSR, and our DCA-NARM, on the three datasets, as shown in Table 7. Two major findings are noted. 1) MMR is a re-ranking (two-stage) method that performs a greedy, diversity-promoting search on top of the NARM trained in the first stage. NARM+MMR therefore shares NARM's training time but has a substantially longer inference time than NARM. By contrast, our DCA-NARM is learned end-to-end and avoids greedy search at inference, and is thus faster than NARM+MMR. 2) Unlike other diversified SBRSs (i.e., IDSR and MCPRN), which rely on specifically calibrated diversity-aware components, our DCA framework is efficient in both the training and inference stages because it introduces only a limited number of additional parameters.
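To illustrate why a greedy re-ranking stage adds inference cost, the following is a minimal sketch of the standard Maximal Marginal Relevance procedure. It is not the exact NARM+MMR configuration used in our experiments: the function `mmr_rerank`, the `trade_off` parameter, the toy scores, and the category-match similarity are all illustrative assumptions.

```python
def mmr_rerank(scores, similarity, k, trade_off=0.5):
    """Greedy MMR: repeatedly pick the candidate that maximizes
    trade_off * relevance - (1 - trade_off) * (max similarity to picked items).

    scores: dict mapping item -> relevance score from the trained predictor.
    similarity: function (item, item) -> similarity value.
    """
    candidates = set(scores)
    selected = []
    while candidates and len(selected) < k:
        def mmr_value(item):
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * scores[item] - (1 - trade_off) * redundancy

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=mmr_value)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: two similar high-scoring items and one different item.
scores = {"a": 0.9, "b": 0.85, "c": 0.5}
category = {"a": "shoes", "b": "shoes", "c": "hats"}


def same_category(x, y):
    return 1.0 if category[x] == category[y] else 0.0


reranked = mmr_rerank(scores, same_category, k=2, trade_off=0.5)
print(reranked)
```

Each of the `k` selection steps rescans every remaining candidate and compares it with all items already selected, and this loop runs per session at inference time. An end-to-end model only needs a single forward pass followed by a top-$N$ selection.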

5.1.5 Adaptation on F-score

We now discuss the drawbacks of the current comprehensive metric (F-score [hu2017diversifying]) and provide remedies accordingly. First, because HR and ILD are on different scales, the weaker metric may easily dominate the final comprehensive performance, particularly on Tmall in Table 6. It is therefore necessary to map the two metrics into the same range before calculating the F-score. Alternatively, we may replace ILD with DS (Diversity Score [liang2021enhancing]) in the F-score, since HR and DS both lie in the range $[0,1]$. Second, a clear decline in accuracy is generally not acceptable in real-world recommendation scenarios. According to Tables 4-6, diversified models suffer apparent drops in accuracy in exchange for their significant improvements in diversity. However, their comprehensive performance (i.e., F-score) is not the worst, and is even the best on Retailrocket (Table 5). That is to say, the current comprehensive metric does not match what real-world applications actually expect. As such, we propose a generalized comprehensive metric $\text{F}_{\beta}(\text{ACCuracy}, \text{DIVersity})$ to address these issues, as below:

$$\text{F}_{\beta}(\text{ACC}, \text{DIV}) = \frac{(1+\beta^{2})\,\text{ACC}\times\text{DIV}}{\beta^{2}\,\text{ACC}+\text{DIV}}, \tag{8}$$

where $\beta>0$. Accordingly, the F-score [hu2017diversifying] can be regarded as a special case, i.e., $\text{F}_{1}(\text{HR}, \text{ILD})$. For a consistent range of ACC and DIV, we recommend $\text{F}_{\beta}(\text{HR}, \text{DS})$. Additionally, if accuracy is prioritized over diversity, we suggest $\beta<1$, e.g., $\text{F}_{0.5}(\text{HR}, \text{DS})$, to put more emphasis on accuracy, since gaining diversity without taking accuracy into account is less meaningful in real-world applications. Note that with the proposed $\text{F}_{\beta}(\text{ACC}, \text{DIV})$, our DCA-SBRSs rank first thanks to their satisfactory accuracy and superior diversity, as shown in Table 8. Specifically, on Retailrocket with $N=10$, the ranking of our DCA-SBRSs improves, while diversified models (e.g., IDSR) drop in ranking when $\beta$ changes from $1$ to $0.5$ because of their inferior accuracy.
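As a small illustration of Equation 8, the sketch below evaluates $\text{F}_{\beta}$ for a single (ACC, DIV) pair. The function name `f_beta`, the zero-denominator guard, and the example values are illustrative assumptions; how per-session scores are aggregated into the table entries is not specified in this section, so the code should not be expected to reproduce Table 8 from the averaged HR and DS values.

```python
def f_beta(acc, div, beta=1.0):
    """Generalized comprehensive metric of Equation 8 for one (ACC, DIV) pair."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * acc + div
    if denominator == 0:
        # Assumption: define the score as 0 when both inputs are 0.
        return 0.0
    return (1 + beta ** 2) * acc * div / denominator


# Hypothetical models: one accurate but monotonous, one diverse but inaccurate.
accurate_model = (0.6, 0.2)  # (ACC, DIV)
diverse_model = (0.2, 0.6)

for beta in (1.0, 0.5):
    a = f_beta(*accurate_model, beta=beta)
    d = f_beta(*diverse_model, beta=beta)
    print(f"beta={beta}: accurate={a:.4f}, diverse={d:.4f}")
```

With $\beta=1$ the two hypothetical models tie, because the harmonic mean is symmetric in its arguments. With $\beta=0.5$ the accurate model scores higher, which is the intended effect of choosing $\beta<1$.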

Table 8: F-score vs. Adapted F-score on Retailrocket. Best result is highlighted in boldface and the runner-up is underlined; ‘Improvements’ indicates the average relative enhancements achieved by our DCA-SBRSs over the corresponding SBRSs as Equation 7.

Model\Metric F-score F$_{0.5}$(HR,ILD) F$_{0.5}$(HR,DS)
@10 @20 @10 @20 @10 @20
Item-KNN 0.1491 0.1979 0.1618 0.2093 0.1542 0.1714
BPR-MF 0.1391 0.1899 0.1452 0.1980 0.1287 0.1418
NARM 0.2475 0.3507 0.2782 0.3960 0.2888 0.2938
STAMP 0.2530 0.3642 0.2790 0.4008 0.2859 0.3024
GCE-GNN 0.2143 0.3044 0.2439 0.3544 0.2820 0.2681
MCPRN 0.2293 0.2930 0.2393 0.3059 0.2193 0.2349
NARM+MMR 0.2764 0.3863 0.3013 0.4174 0.2871 0.3110
IDSR($\lambda=0.5$) $\mathbf{0.4274}$ 0.5093 0.4095 0.5011 0.3573 0.4142
DCA-NARM 0.3544 0.5053 0.3787 $\mathbf{0.5180}$ 0.3447 0.4138
DCA-STAMP 0.3994 $\mathbf{0.5257}$ $\mathbf{0.4122}$ 0.5128 $\mathbf{0.3606}$ $\mathbf{0.4319}$
DCA-GCEGNN 0.3103 0.4533 0.3391 0.4838 0.3269 0.3761
Improvements 48.6% 45.8% 41.0% 31.8% 20.5% 41.3%

5.2 The Impact of Essential Modules

5.2.1 Impact of Model-agnostic Diversified Loss (abbr. MDL)

Our proposed MDL in Equation 1 aims to improve the diversity of accuracy-oriented SBRSs as an end-to-end plugin by penalizing monotonous RLs with low diversity. In Figure 4, we compare the accuracy-oriented SBRSs (labeled as ‘SBRSs’) with the corresponding variants supplemented solely with our MDL (labeled as ‘SBRSs+MDL’) w.r.t. ILD@10. By adding our MDL, the diversity of all baseline SBRSs improves significantly across the three datasets. Specifically, on Diginetica, Retailrocket, and Tmall, the average relative improvements are $100\%$, $56.7\%$, and $30.3\%$, respectively. Besides, among the three selected baselines (NARM, STAMP, and GCE-GNN), MDL improves NARM the most (i.e., $71.46\%$).

It is worth noting that, for simplicity, we set $\lambda=1$ in Equation 3. To analyze the effect of MDL in a fine-grained manner, we select NARM as the basic predictor and vary $\lambda$ from 0 to 1 in steps of 0.1. Figure 5 depicts how accuracy (i.e., NDCG and HR), diversity (i.e., ILD), and comprehensive performance (i.e., F-score) vary with $\lambda$ on the three datasets. (For ease of presentation, on Tmall we display ‘ILD minus one’ (i.e., ILD-1) so that all metrics are on a comparable scale without changing the overall trend.) As shown, accuracy decreases slightly as $\lambda$ increases on all three datasets, whereas diversity improves significantly on all datasets, showcasing the effectiveness of our MDL. As for comprehensive performance, the F-score climbs as $\lambda$ varies from 0 to 1 on Diginetica and Retailrocket, whereas it declines slightly on Tmall; a possible explanation is given in Section 5.1.3. Overall, gradually increasing $\lambda$ lowers recommendation accuracy and raises diversity, which indicates the necessity of fine-tuning $\lambda$ to achieve more satisfactory performance.

Figure 4: The Impact of MDL on Diversity w.r.t. ILD@10.
Figure 5: The Impact of MDL for NARM+MDL with $N=10$.

5.2.2 Impact of Non-invasive Category-aware Attention (abbr. NCA)

As indicated in Section 5.2.1, the recommendation accuracy of baseline SBRSs may drop slightly when integrating our MDL. To ease this issue, we propose category-aware attention (i.e., NCA), which brings category information into the attention mechanisms that are pervasive in SBRSs, with the goal of assisting item prediction. This differs from simply concatenating category information to the input of SBRSs. For verification, we compare accuracy-oriented SBRSs (labeled as ‘SBRSs’) with the corresponding variants obtained by simply substituting their attention mechanism with our category-aware attention (labeled as ‘SBRSs+NCA’) on accuracy (i.e., NDCG@10), as depicted in Figure 6. In general, replacing the attention mechanism with our NCA improves the accuracy of SBRSs. Specifically, NCA helps NARM and GCE-GNN enhance their accuracy on all datasets. STAMP shows a similar trend on Tmall; on the other two datasets, however, the accuracy of STAMP+NCA does not improve. This is perhaps due to the straightforward design of STAMP, which employs item embeddings directly rather than hidden states from RNNs or GNNs (as in NARM and GCE-GNN). As a result, STAMP+NCA simply sums item embeddings and the relevant category embeddings before computing attention scores, which may introduce extra noise that interferes with the final item prediction.
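The STAMP+NCA case described above (summing item and category embeddings before scoring) can be sketched as follows. This is a deliberately simplified stand-in, not the attention network of STAMP or of our released code: it assumes dot-product scoring against a given query vector and assumes that the attention weights are applied to the item embeddings only. The function names and toy vectors are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def category_aware_attention(item_embs, cat_embs, query):
    """Score each position on (item + category) embeddings, then pool item embeddings.

    Assumptions: dot-product scoring and pooling over item embeddings only.
    """
    keys = [[i + c for i, c in zip(item, cat)]
            for item, cat in zip(item_embs, cat_embs)]
    weights = softmax([dot(key, query) for key in keys])
    dim = len(item_embs[0])
    session = [sum(w * item[d] for w, item in zip(weights, item_embs))
               for d in range(dim)]
    return weights, session


# Hypothetical two-item session with 2-dimensional embeddings.
items = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
no_category = [[0.0, 0.0], [0.0, 0.0]]
with_category = [[0.0, 0.0], [1.0, 0.0]]

plain_weights, _ = category_aware_attention(items, no_category, query)
cat_weights, cat_session = category_aware_attention(items, with_category, query)
print(plain_weights, cat_weights, cat_session)
```

In this toy case the category embedding of the second item raises its attention score until both items are weighted equally, which shows how category information shifts attention without altering the item embeddings that are pooled. When the added category vectors are not informative, the same shift acts as noise on the attention weights.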

Our DCA framework helps existing accuracy-oriented SBRSs achieve extraordinary gains in diversity and comprehensive performance while maintaining accuracy, even though SBRSs+NCA do not improve accuracy for every SBRS on every dataset, as shown in Figure 6 (this may be caused by different dataset characteristics or by the designs of the baseline predictors). In other words, the efficacy of our proposed framework does not rely on NCA alone.

Figure 6: The Impact of NCA on Accuracy w.r.t. NDCG@10.

5.3 Discussion on our proposed DCA framework

Our proposed Diversified Category-aware Attentive (DCA) framework comprises two key components: a model-agnostic diversity-oriented loss function and a non-invasive category-aware attention mechanism. To evaluate the efficacy of the DCA framework, we selected three deep neural methods with attention mechanisms as their backbone, as detailed in Section 4.2 and Section 5. Notably, these methods all rank among the top five SBRSs in terms of accuracy [under2023]. In the session-based evaluation survey [under2023], it is evident that all of the top-performing SBRSs in accuracy leverage attention mechanisms.

However, our DCA framework is not limited to attention-based models. Even when the original SBRS does not use an attention mechanism, this component can be seamlessly integrated to enhance the session representation. Specifically, we adopt GRU4Rec [hidasi2015session], an RNN-based SBRS without an attention mechanism, as the backbone model to showcase the effectiveness of our DCA framework in this setting. As illustrated in Figure LABEL:fig:gru_dca, we compare GRU4Rec with two variants, GRU4Rec with an attention mechanism and DCA-GRU4Rec, in terms of accuracy, diversity, and comprehensive performance. In summary, GRU4Rec with an attention mechanism outperforms the baseline GRU4Rec in accuracy but lags in diversity. Our DCA-GRU4Rec, on the other hand, achieves accuracy similar to that of GRU4Rec with an attention mechanism while significantly enhancing diversity and delivering satisfactory overall performance. This substantiates the effectiveness of our DCA framework when applied to backbone models without attention mechanisms.

In conclusion, our DCA framework is highly versatile and can be seamlessly integrated into common SBRSs, whether they incorporate attention mechanisms or not, consistently showcasing its effectiveness in enhancing recommendation system performance.