License: CC BY 4.0
arXiv:2403.15075v1 [cs.IR] 22 Mar 2024

Bilateral Unsymmetrical Graph Contrastive Learning for Recommendation
*Corresponding author

1st Jiaheng Yu
School of Computer Science
Wuhan University
Wuhan, China
[email protected]

2nd **g Li*
School of Computer Science
Wuhan University
Wuhan, China
[email protected]

3rd Yue He
School of Computer Science
Wuhan University
Wuhan, China
[email protected]

4th Kai Zhu
School of Computer Science
Wuhan University
Wuhan, China
[email protected]

5th Shuyi Zhang
School of Computer Science
Wuhan University
Wuhan, China
[email protected]

6th Wen Hu
School of Artificial Intelligence
Wuchang University of Technology
Wuhan, China
[email protected]

Abstract

Recent methods apply graph contrastive learning to graph-structured user-item interaction data for collaborative filtering and have demonstrated their efficacy in recommendation tasks. However, they ignore that the relation density of nodes differs between the user side and the item side, so the same graph structure suits the two kinds of nodes differently after multi-hop graph interaction calculation, which keeps existing models from achieving ideal results. To solve this issue, we propose a novel framework for recommendation tasks called Bilateral Unsymmetrical Graph Contrastive Learning (BusGCL), which considers the bilateral unsymmetry of user-item node relation density to reason better over sliced user and item graphs through bilateral slicing contrastive training. In particular, because the aggregation ability of the hypergraph-based graph convolutional network (GCN) in mining implicit similarities is better suited to user nodes, the embeddings generated from three different modules (hypergraph-based GCN, GCN, and perturbed GCN) are each sliced into two subviews by the user side and the item side, and are selectively combined into subview pairs bilaterally based on the characteristics of the inter-node relation structure. Furthermore, to align the distribution of user and item embeddings after aggregation, a dispersing loss is leveraged to adjust the mutual distance between all embeddings and maintain learning ability. Comprehensive experiments on two public datasets demonstrate the superiority of BusGCL over various recommendation methods. Other models can adopt our bilateral slicing contrastive learning to enhance recommendation performance without incurring extra expense.

Index Terms:
Recommendation System, Hypergraph, Graph Contrastive Learning

I Introduction

Recommendation systems have found widespread application in diverse domains, including online retail platforms [6], social networking applications, and online multimedia websites [2], where they help users navigate the overwhelming amount of information on the internet and discover items that align with their preferences. However, recommendation remains difficult because of the distinct structure of the data and the extreme sparsity of user-item interactions.


Figure 1: Illustration of the difference in relation density between bilateral nodes. (a) shows that user nodes often have denser inter-node relationships than item nodes. (b) visualizes, by t-SNE, the different distributions of embeddings generated from a 1-layer LightGCN [5] on the Yelp dataset. (c) counts the number of 2-hop neighbours, normalized to the same total number.




Graph convolutional network (GCN) based methods for recommendation [8, 4] take collaborative filtering as the fundamental architecture, which reduces the dimensionality of users and items by projecting observed interactions onto a low-dimensional space for representation [1]. Hypergraphs go further and show potential in accurately representing more implicit high-order relationships within graph data [16]. Hypergraph-based recommenders [16, 15] utilize hyperedges to encapsulate implicit high-order collaborative effects within user-item graphs. Recently, the self-supervised Graph Contrastive Learning (GCL) paradigm has shown strong effectiveness in resisting data sparsity. Subsequently, several recommendation models that fuse hypergraph structures with GCL [16, 15, 25] have been proposed and have led to promising developments in recommendation. However, these symmetrical GCL-based models generally overlook the differences in inter-node patterns between the user and item perspectives on the interaction information, which can lead to differences in the probability distribution of the learned embeddings.

The difference in relation density of nodes between the user side and the item side is shown in Fig. 1(a): user nodes often have denser inter-node relationships than item nodes and are more inclined to form groups of similar nodes. For example, Fig. 1(b) visualizes, by t-SNE, the distributions of user and item embeddings generated from a 1-layer LightGCN [5] on Yelp [20]. Comparing the two plots, the user embeddings are more cohesive, with clearer group boundaries, which means that similar users generally have closer relationships. Prior to the 2-hop calculation, Fig. 1(c) counts the 2-hop neighbours of user nodes and item nodes, with the statistics normalized to the same total number. It shows that item nodes are concentrated at the left end, representing fewer complex relationships, while user nodes are relatively balanced. Different relational density thus brings different degrees of aggregation in graph structures after multi-hop interaction calculation. In view of this, applying identical or highly similar graph structures to both the user side and the item side, without differentiated methods, does nothing to address this situation and limits improvement in recommendation performance.

To tackle this limitation, we propose a novel framework for recommendation systems, namely Bilateral Unsymmetrical Graph Contrastive Learning (BusGCL), which considers the bilateral unsymmetry of user-item node relation density to reason better over sliced user and item graphs through bilateral slicing contrastive learning. A multi-structure graph model extracts the sliced view of users with a hypergraph approach and that of items with a perturbing-based GCN method, which allows the construction of more effective and expressive contrastive views. In theory, the hyperedges of hypergraphs tend to aggregate nodes with similar relation patterns, which is more suitable for user-side nodes with widespread inter-node similarity, while the features of item-side nodes are generally scattered. Thus, we adopt a GCN with random noise perturbation to capture collaborative information on the item side and generate contrastive views. Furthermore, to mitigate the over-smoothing issue aggravated by the noise perturbation, we design a dispersing loss to balance it, thereby maintaining the ability of nodes to refine collaborative relationship information during training. In summary, the contribution of this work is threefold:

  • We enhance the recommendation system by exploiting the bilateral unsymmetry of node density between the user side and the item side, and propose bilateral slicing contrastive learning, which generates user and item subviews through different GCNs to obtain better results.

  • We propose BusGCL, a multi-structure graph framework that takes the characteristics of different GCNs into account. BusGCL provides guidance for other recommendation methods on utilizing hypergraphs in user-side aggregation.

  • We design a dispersing loss to alleviate the over-smoothing issue aggravated by GCN and to refine bilateral slicing contrastive training. Experimental results on different datasets, where BusGCL outperforms the compared methods, illustrate the effectiveness of our model.

II Related Work

II-A Recommendation methods

The fundamental premise underpinning numerous collaborative filtering models [7, 11, 12] is to make recommendations for the target user by finding other users who are similar to the target user or other items that are similar to the target item. Recent recommenders based on collaborative filtering fall into three types. Graph convolutional network based recommenders. NGCF [13] is a graph-based collaborative filtering method that integrates features of second-order interactions into the messages during the message-passing process. LightGCN [5] designs a lightweight graph convolution that keeps only neighborhood aggregation as a component, for training efficiency and generalization ability. Hypergraph-based recommenders. HCCF [16] enhances GNN with hypergraph learning of global dependencies and employs cross-view contrastive learning to capture both local and global collaborative relationships simultaneously. SHT [15] introduces the transformer architecture into hypergraph recommendation to improve recommendation performance. Self-supervised learning enhanced recommenders. SLRec [18] incorporates contrastive learning between features to regularize two augmented embeddings, in order to enhance the effectiveness of data augmentation based recommendation. SGL [14] generates contrastive views in three different ways (node dropout, edge dropout, and random walk) to enhance recommendation performance through contrastive learning. Recently, SimGCL [22] refines the graph augmentation procedure within contrastive learning by directly adding random noise whose values lie on a hypersphere.

II-B Graph Contrastive Learning in Recommendation

Contrastive learning, which constructs positive and negative sample pairs through differences between views, has attracted widespread attention in computer vision [24] and provides a self-supervised solution to the data sparsity problem [21, 23]. Inspired by contrastive learning, S3-Rec [27] first employs random masks on attributes, sequences, and items, thereby generating sequence augmentations for the pre-training of sequential models through mutual information maximization with contrastive learning. Beyond the application of dropout, CL4Rec [17] suggests the reordering and cropping of item segments for sequential data augmentation.

In addition to addressing the data sparsity problem, CLRec [26] has theoretically demonstrated that contrastive learning can also alleviate the exposure bias present in recommendations, and improve the depth of matching with respect to fairness and efficiency. SGL [14] employs node/edge dropout techniques, coupled with random walk methods, to generate positive instances. HCCF [16] uses hypergraphs to generate contrastive views for learning high-order collaborative signals in the interaction graph and achieves notable success. However, these methods seldom consider the difference in relation density of nodes between the user side and the item side. They compute the loss after GCN aggregation on the whole embedding, which contains differently distributed user and item embeddings, and this keeps the models from achieving better results.

III Methodology


Figure 2: Illustration of the BusGCL framework. An adjacency matrix, which represents the user-item interaction graph, passes through a Multi-structurally Graph Model that contains three variants of GCNs to form three embedding matrices. The three embedding matrices are then sliced by user side and item side for bilateral slicing contrastive learning before recommendation predictions.

III-A Overview

For the mathematical preliminaries, we represent the sets of users and items as $\mathcal{U}=\{u_1,u_2,\cdots,u_I\}$ $(|\mathcal{U}|=I)$ and $\mathcal{V}=\{v_1,v_2,\cdots,v_J\}$ $(|\mathcal{V}|=J)$, respectively. The interaction adjacency matrix $\mathcal{A}\in\mathbb{R}^{I\times J}$ stores the interaction history between users $\mathcal{U}$ and the corresponding items $\mathcal{V}$. Each entry $\mathcal{A}_{i,j}$ of $\mathcal{A}$ is set to $1$ when there exists an interaction between user $u_i$ and item $v_j$, and $\mathcal{A}_{i,j}=0$ otherwise.
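As a minimal illustration of this data structure, the following Python sketch builds the binary interaction matrix from a list of (user, item) index pairs. The function name `build_interaction_matrix` and the toy interaction log are hypothetical and not taken from the paper; a dense NumPy array is used only for readability, and a sparse format would likely be preferable for real datasets.

```python
import numpy as np

def build_interaction_matrix(pairs, num_users, num_items):
    """Return the I x J binary matrix A with A[i, j] = 1 iff user i interacted with item j."""
    A = np.zeros((num_users, num_items), dtype=np.float64)
    for i, j in pairs:
        A[i, j] = 1.0
    return A

# Hypothetical toy log of (user index, item index) interactions.
toy_pairs = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (2, 3)]
A = build_interaction_matrix(toy_pairs, num_users=3, num_items=4)
```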

The overall architecture of BusGCL is depicted in Figure 2. First, we obtain user-item interaction representations from our Multi-structurally Graph Model. When the user-item interaction graph is input to the model, an adjacency matrix $\mathcal{A}$ is constructed for aggregation, followed by the extraction of representations via three variants of GCNs to form three embedding matrices $\bm{E}^{(G)},\bm{E}^{(P)},\bm{E}^{(H)}$. Then, we use bilateral slicing contrastive learning to improve recommendation. These embedding matrices are sliced by user side and item side and recombined into two unsymmetrical subview pairs for contrastive learning. The structure of the embeddings is then constrained by the dispersing loss, culminating in the generation of recommendation predictions.

III-B A Multi-structurally Graph Model for Recommendation

The Multi-structurally Graph Model consists of two steps: obtaining the base adjacency matrix and multi-structural graph reasoning.

Obtaining the base adjacency matrix. First, we normalize the adjacency matrix $\mathcal{A}$ introduced above, which encapsulates the interaction relation between users and items, as follows:

$$\bar{\mathcal{A}}=\bm{D}_{(u)}^{-1/2}\cdot\mathcal{A}\cdot\bm{D}_{(v)}^{-1/2},\qquad \bar{\mathcal{A}}_{i,j}=\cfrac{\mathcal{A}_{i,j}}{\sqrt{|\mathcal{N}_{i}|\cdot|\mathcal{N}_{j}|}}, \tag{1}$$

where $\bm{D}_{(u)}\in\mathbb{R}^{I\times I}$ and $\bm{D}_{(v)}\in\mathbb{R}^{J\times J}$ are the degree matrices of users and items, respectively. $\mathcal{N}_{i}$ denotes the neighbouring item nodes of user $u_{i}$, and $\mathcal{N}_{j}$ analogously denotes the neighbouring user nodes of item $v_{j}$. Then, to encode the pattern information of user-item interactions, we follow the conventional collaborative filtering paradigm and project the graph structure into a $d$-dimensional latent space. For user $u_{i}$ and item $v_{j}$, we establish vectors $\bm{e}_{i}$ and $\bm{e}_{j}$ in $\mathbb{R}^{d}$ as embeddings, and stack these embeddings into the matrices $\bm{E}^{(u)}\in\mathbb{R}^{I\times d}$ and $\bm{E}^{(v)}\in\mathbb{R}^{J\times d}$, respectively.
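The normalization can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the helper name `normalize_adjacency`, the toy matrix, and the choice to give isolated (degree-zero) nodes a scaling factor of 0 are all assumptions. Each nonzero entry is divided by the square root of the user's degree times the item's degree.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric degree normalization: A_bar[i, j] = A[i, j] / sqrt(deg_u[i] * deg_v[j])."""
    deg_u = A.sum(axis=1)  # number of items each user interacted with
    deg_v = A.sum(axis=0)  # number of users that interacted with each item
    # Assumption: nodes with degree 0 get a scaling factor of 0 to avoid division by zero.
    inv_sqrt_u = np.where(deg_u > 0, 1.0 / np.sqrt(np.maximum(deg_u, 1.0)), 0.0)
    inv_sqrt_v = np.where(deg_v > 0, 1.0 / np.sqrt(np.maximum(deg_v, 1.0)), 0.0)
    # Broadcasting is equivalent to D_u^{-1/2} @ A @ D_v^{-1/2} without building diagonal matrices.
    return inv_sqrt_u[:, None] * A * inv_sqrt_v[None, :]

# Toy binary interaction matrix with 3 users and 4 items (hypothetical data).
A = np.array([[1, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 1, 1, 1]], dtype=np.float64)
A_bar = normalize_adjacency(A)
```

Scaling rows and columns by broadcasting avoids materializing the $I\times I$ and $J\times J$ degree matrices, which matters once the numbers of users and items grow.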

Multi-structural graph reasoning. We use a three-branch graph model to reason over the input embeddings $\bm{E}^{(u)}$ and $\bm{E}^{(v)}$. The normal branch is the middle branch shown in Figure 2. To aggregate the local collaborative signals for each node from its neighbours, following the simplification of LightGCN [5], we design an embedding propagation layer that uses a lightweight graph convolutional network without non-linear activation functions. The output embedding $\bm{E}_{l}^{(G)}=\{\bm{E}_{l}^{(GU)};\bm{E}_{l}^{(GI)}\}$ of the $l$-th layer contains the user part $\bm{E}_{l}^{(GU)}=\{\bm{\alpha}_{1,l}^{(u)},\bm{\alpha}_{2,l}^{(u)},\ldots,\bm{\alpha}_{I,l}^{(u)}\}$ and the item part $\bm{E}_{l}^{(GI)}=\{\bm{\alpha}_{1,l}^{(v)},\bm{\alpha}_{2,l}^{(v)},\ldots,\bm{\alpha}_{J,l}^{(v)}\}$, where $I$ and $J$ denote the numbers of users and items, respectively. This process can be described as follows:

$$\bm{\alpha}_{i,l}^{(u)}=\bar{\mathcal{A}}_{i,*}\cdot\bm{E}_{l-1}^{(G)},\qquad \bm{\alpha}_{j,l}^{(v)}=\bar{\mathcal{A}}_{*,j}\cdot\bm{E}_{l-1}^{(G)}, \tag{2}$$

where $\bm{\alpha}_{i}^{(u)},\bm{\alpha}_{j}^{(v)}\in\mathbb{R}^{d}$ represent the aggregated collaborative information of the central nodes.

To refine the relations of multi-hop neighbours, we stack multiple embedding propagation layers into a graph neural network. Using residual connections to avoid gradient vanishing [3], we apply the readout on different layers to get embeddings $\bm{e}_{i,l}^{(u)}\in\bm{\bar{E}}_{l}^{(GU)}$ and $\bm{e}_{j,l}^{(v)}\in\bm{\bar{E}}_{l}^{(GI)}$, in which the embedding of the $l$-th layer is used to predict the next user-item relation. Neighbour information is transmitted according to the following formula:

$$\bm{e}_{i,l}^{(u)}=\bm{e}_{i,l-1}^{(u)}+\bm{\alpha}_{i,l}^{(u)},\qquad \bm{e}_{j,l}^{(v)}=\bm{e}_{j,l-1}^{(v)}+\bm{\alpha}_{j,l}^{(v)}. \tag{3}$$
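The propagation and residual readout of the middle branch can be sketched as below. Several points are assumptions made for illustration and are not stated in the text: the user-side aggregation is taken to act on the item block of the previous layer's output (and vice versa), as in LightGCN; the layer-0 aggregated embeddings are taken to be the initial embeddings; and the function name `gcn_branch` and the toy values are hypothetical.

```python
import numpy as np

def gcn_branch(A_bar, E_user0, E_item0, num_layers):
    """Activation-free propagation with a residual readout after every layer."""
    alpha_u, alpha_v = E_user0, E_item0            # assumed layer-0 aggregated embeddings
    e_u, e_v = E_user0.copy(), E_item0.copy()      # assumed layer-0 readout
    readouts = [(e_u, e_v)]
    for _ in range(num_layers):
        # Propagation: users aggregate from items, items aggregate from users.
        # The tuple assignment evaluates both right-hand sides with the old values.
        alpha_u, alpha_v = A_bar @ alpha_v, A_bar.T @ alpha_u
        # Residual readout: add the new aggregation to the previous readout.
        e_u, e_v = e_u + alpha_u, e_v + alpha_v
        readouts.append((e_u, e_v))
    return readouts

rng = np.random.default_rng(0)
# Hypothetical normalized adjacency for 3 users and 4 items.
A_bar = np.array([[0.7, 0.4, 0.0, 0.0],
                  [0.0, 0.6, 0.0, 0.0],
                  [0.0, 0.3, 0.6, 0.6]])
E_user0 = rng.normal(size=(3, 8))
E_item0 = rng.normal(size=(4, 8))
readouts = gcn_branch(A_bar, E_user0, E_item0, num_layers=2)
e_u_final, e_v_final = readouts[-1]
```

Under these assumptions the final readout is the sum of the initial embedding and the aggregated embeddings of every layer, which is the unnormalized form of the layer combination used by LightGCN.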

The hyperedges of a hypergraph [16] can connect any number of vertices, forming an effect similar to complete subgraphs with weighted attributes on the interaction graph, which is beneficial for aggregating non-adjacent but potentially similar nodes by leveraging hyperedges as intermediate hubs. In the top branch in Figure 2, based on the embedding result $\bm{E}_{l}^{(G)}$ from GCN reasoning in the middle branch, the output embedding $\bm{E}_{l}^{(H)}$, which contains two hypergraphs, is defined with $H$ hyperedges to represent users and items as $\mathcal{H}^{(u)}=\{\gamma_{1}^{(u)},\gamma_{2}^{(u)},\ldots,\gamma_{l}^{(u)}\}\in\mathbb{R}^{I\times H}$ and $\mathcal{H}^{(v)}=\{\gamma_{1}^{(v)},\gamma_{2}^{(v)},\ldots,\gamma_{l}^{(v)}\}\in\mathbb{R}^{J\times H}$. The process is as follows:

$$\bm{\gamma}_{l}^{(u)}=\mathrm{LeakyReLU}\left(\mathcal{H}^{(u)}\cdot\mathcal{H}^{(u)\top}\cdot\bm{E}_{l-1}^{(G)}\right). \tag{4}$$

The hyper embeddings of items $\gamma_{l}^{(v)}$ can be derived in a similar way. Here $\mathcal{H}^{(u)}$ and $\mathcal{H}^{(v)}$ are replaced by a low-rank approximation to reduce computational cost, which is computed from the $l$-th readout embedding as follows:

$$\hat{\mathcal{H}}_{l}^{(u)}=\bm{\bar{E}}_{l}^{(GU)}\cdot\bm{W}^{(u)},\qquad \hat{\mathcal{H}}_{l}^{(v)}=\bm{\bar{E}}_{l}^{(GI)}\cdot\bm{W}^{(v)}, \tag{5}$$

where $\bm{W}^{(u)},\bm{W}^{(v)}\in\mathbb{R}^{d\times H}$ are learnable parameter matrices representing the hyperedges for users and items.
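A small NumPy sketch of one user-side hypergraph layer follows, combining the low-rank hyperedge matrix with the hypergraph aggregation. It is a sketch under stated assumptions: the previous-layer embedding is restricted to the user block so that the shapes agree, the LeakyReLU negative slope of 0.2 is a placeholder not given in the text, and the names `hypergraph_layer` and `leaky_relu` are hypothetical.

```python
import numpy as np

def leaky_relu(x, negative_slope):
    return np.where(x > 0, x, negative_slope * x)

def hypergraph_layer(E_readout, E_prev, W, negative_slope=0.2):
    """One hypergraph aggregation step for one side (users or items)."""
    # Low-rank hyperedge matrix: (N, d) @ (d, H) -> (N, H).
    H_hat = E_readout @ W
    # Node -> hyperedge aggregation first, so no (N, N) matrix is ever formed.
    edge_messages = H_hat.T @ E_prev               # (H, d)
    # Hyperedge -> node aggregation followed by the non-linearity.
    return leaky_relu(H_hat @ edge_messages, negative_slope)

rng = np.random.default_rng(0)
num_users, dim, num_hyperedges = 5, 8, 3           # toy sizes (hypothetical)
E_bar_users = rng.normal(size=(num_users, dim))    # stand-in for the user readout embedding
E_prev_users = rng.normal(size=(num_users, dim))   # stand-in for the previous-layer user embedding
W_u = rng.normal(size=(dim, num_hyperedges))       # stand-in for the learnable hyperedge parameters
gamma_u = hypergraph_layer(E_bar_users, E_prev_users, W_u)
```

Multiplying right to left costs on the order of $I\cdot H\cdot d$ operations instead of the $I^{2}$ entries needed to materialize the node-to-node matrix, which is one way to read the computational saving attributed to the low-rank form.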

In the bottom branch in Figure 2, following SimGCL [22], we adapt an another GCN which adds imperceptibly small perturbation. At each layer, the current embeddings are perturbed by a stochastic noise ΔΔ\Deltaroman_Δ, this corresponds to a numerical equivalence with points located on a hypersphere of a given radius r𝑟ritalic_r . Δ2=rsubscriptnormΔ2𝑟{||\Delta||}_{2}=r| | roman_Δ | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_r, and Δ=Δ¯sign(𝒆),Δ¯dU(0,1)formulae-sequenceΔdirect-product¯Δ𝑠𝑖𝑔𝑛𝒆¯Δsuperscript𝑑similar-to𝑈01\Delta=\bar{\Delta}{\odot}sign{({\bm{e})}},\bar{\Delta}\in\mathbbm{R}^{d}\sim{% U(0,1)}roman_Δ = over¯ start_ARG roman_Δ end_ARG ⊙ italic_s italic_i italic_g italic_n ( bold_italic_e ) , over¯ start_ARG roman_Δ end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ∼ italic_U ( 0 , 1 ). Similar to the form of the above equation 2, the embeddings El(P)superscriptsubscript𝐸𝑙𝑃{E}_{l}^{(P)}italic_E start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_P ) end_POSTSUPERSCRIPT which is consist of El(PU)={β1,l(u),β2,l(u),βi,l(u)|iI}superscriptsubscript𝐸𝑙𝑃𝑈conditional-setsuperscriptsubscript𝛽1𝑙𝑢superscriptsubscript𝛽2𝑙𝑢superscriptsubscript𝛽𝑖𝑙𝑢𝑖𝐼{E}_{l}^{(PU)}=\{\beta_{1,l}^{(u)},\beta_{2,l}^{(u)},...\beta_{i,l}^{(u)}|i\in I\}italic_E start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_P italic_U ) end_POSTSUPERSCRIPT = { italic_β start_POSTSUBSCRIPT 1 , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT , italic_β start_POSTSUBSCRIPT 2 , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT , … italic_β start_POSTSUBSCRIPT italic_i , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT | italic_i ∈ italic_I } and 
𝑬l(PI)={β1,l(u),β2,l(u),βj,l(u)|jJ}superscriptsubscript𝑬𝑙𝑃𝐼conditional-setsuperscriptsubscript𝛽1𝑙𝑢superscriptsubscript𝛽2𝑙𝑢superscriptsubscript𝛽𝑗𝑙𝑢𝑗𝐽\bm{E}_{l}^{(PI)}=\{{\beta}_{1,l}^{(u)},{\beta}_{2,l}^{(u)},...{\beta}_{j,l}^{% (u)}|j\in J\}bold_italic_E start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_P italic_I ) end_POSTSUPERSCRIPT = { italic_β start_POSTSUBSCRIPT 1 , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT , italic_β start_POSTSUBSCRIPT 2 , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT , … italic_β start_POSTSUBSCRIPT italic_j , italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT | italic_j ∈ italic_J } obtained from l𝑙litalic_l-th layer in the perturbing-GCN follows:

$$\bm{\beta}_{i,l}^{(u)} = \bar{\mathcal{A}}_{i,*} \cdot \bm{E}_{l-1}^{(P)} + \Delta^{\prime}, \quad \bm{\beta}_{j,l}^{(v)} = \bar{\mathcal{A}}_{*,j} \cdot \bm{E}_{l-1}^{(P)} + \Delta^{\prime\prime}. \tag{6}$$
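
The noise construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `sign` and `perturb`, the toy embedding and the radius `r=0.1` are all hypothetical, and the propagation step $\bar{\mathcal{A}} \cdot \bm{E}_{l-1}^{(P)}$ is left out so that only the perturbation itself is shown.

```python
import math
import random

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def perturb(embedding, r, rng):
    """Add noise of L2 norm r whose coordinates share the embedding's signs."""
    # Draw bar(Delta) ~ U(0, 1)^d and align its signs with the embedding.
    raw = [rng.random() * sign(e) for e in embedding]
    norm = math.sqrt(sum(x * x for x in raw))
    if norm == 0.0:  # degenerate case (e.g. all-zero embedding): add no noise
        return list(embedding), [0.0] * len(embedding)
    delta = [r * x / norm for x in raw]  # rescale so that ||delta||_2 = r
    return [e + d for e, d in zip(embedding, delta)], delta

rng = random.Random(0)          # fixed seed so the sketch is reproducible
e = [0.3, -1.2, 0.8, -0.1]      # toy embedding, d = 4
e_perturbed, delta = perturb(e, r=0.1, rng=rng)
```

Because the noise shares the sign of each coordinate, the perturbed vector stays in the same orthant as the original embedding, which keeps the deviation small.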

III-C Bilateral Slicing Contrastive Learning

After $L$ layers of propagation through the three kinds of GCN introduced above (GCN, GCN with perturbing and HyperGCN), we stack the outputs of each layer and obtain three matrices of embeddings with isomorphic structures and complementary semantics, denoted as $\bm{E}^{(G)}, \bm{E}^{(P)}, \bm{E}^{(H)} \in \mathbb{R}^{(I+J) \times d \times L}$, respectively. These matrices can be regarded as views for contrastive learning because the subtle differences between them in latent space provide implicit supervision signals. Considering the difference in inter-node distribution between the two sides discussed in Section I, we slice each view into two subviews by side, as illustrated in Figure 2.
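
As a rough illustration of the slicing step, the sketch below splits a stacked view into a user-side and an item-side subview. It assumes, purely for illustration, that the first $I$ rows of a view hold user embeddings and the remaining $J$ rows hold item embeddings; the function name `slice_view` and the toy values are hypothetical, and the layer axis is omitted for brevity.

```python
def slice_view(view, num_users):
    """Split a stacked (I + J)-row embedding view into user and item subviews."""
    return view[:num_users], view[num_users:]

# Toy view with I = 2 users and J = 3 items; each row holds d = 2 features.
view_g = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
user_sub, item_sub = slice_view(view_g, num_users=2)
```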

TABLE I: Statistics of the Experimental Datasets

| Datasets | # Users | # Items | # Interactions | Density |
|---|---|---|---|---|
| Yelp | 42712 | 26822 | 182357 | $1.6\times10^{-4}$ |
| Last.FM | 1892 | 17632 | 92834 | $2.8\times10^{-3}$ |
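
The density column can be checked with a short calculation. The sketch assumes density is defined as the number of interactions divided by the number of possible user-item pairs, a definition the text does not state explicitly but which reproduces the reported values to two significant figures; the function name `density` is hypothetical.

```python
def density(num_users, num_items, num_interactions):
    """Fraction of the user-item matrix that is observed."""
    return num_interactions / (num_users * num_items)

yelp = density(42712, 26822, 182357)
lastfm = density(1892, 17632, 92834)
print(f"Yelp: {yelp:.1e}, Last.FM: {lastfm:.1e}")
```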

Considering that nodes on the user side often have a higher relation density, which means that user nodes are more similar to one another, we adopt a hypergraph structure whose hyperedges can aggregate nodes to model relations on this side. In contrast, item-side nodes generally have fewer neighbors, indicating that item features are relatively more scattered. On this side, we choose GCN with perturbing, which injects noise and is better at learning differences between nodes. Concretely, we select the user-side subview from the Hypergraph-GCN, $\bm{E}^{(HU)}$, and the item-side subview from GCN with perturbing, $\bm{E}^{(PI)}$, and contrast them with the two corresponding subviews from GCN, $\bm{E}^{(GU)}$ and $\bm{E}^{(GI)}$. The contrastive loss is computed layer by layer with the InfoNCE [9] function:

$$\mathcal{L}_{cl}^{(U)} = \sum_{i=0}^{I} \sum_{l=0}^{L} -\log \frac{\exp(\mathrm{sim}(\bm{\alpha}_{i,l}^{(u)}, \bm{\gamma}_{i,l}^{(u)})/\tau_c)}{\sum_{i^{\prime}=0}^{I} \exp(\mathrm{sim}(\bm{\alpha}_{i,l}^{(u)}, \bm{\gamma}_{i^{\prime},l}^{(u)})/\tau_c)}, \tag{7}$$

$$\mathcal{L}_{cl}^{(I)} = \sum_{j=0}^{J} \sum_{l=0}^{L} -\log \frac{\exp(\mathrm{sim}(\bm{\alpha}_{j,l}^{(v)}, \bm{\beta}_{j,l}^{(v)})/\tau_c)}{\sum_{j^{\prime}=0}^{J} \exp(\mathrm{sim}(\bm{\alpha}_{j,l}^{(v)}, \bm{\beta}_{j^{\prime},l}^{(v)})/\tau_c)}, \tag{8}$$

where $\mathrm{sim}(\cdot)$ denotes the cosine similarity function, $\tau_c$ is a temperature coefficient that controls the sensitivity of the contrastive loss, and $L$ denotes the maximum number of convolutional layers.
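
The layer-wise InfoNCE loss of Eqs. (7) and (8) can be sketched as below. This is an illustrative pure-Python version rather than the authors' code: the function names, the toy embeddings and `tau=0.5` are hypothetical, all embeddings are assumed to be non-zero, and views are stored as `view[l][i]` (layer first, then node). The example pairs GCN user embeddings with hypergraph user embeddings, as in Eq. (7).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(anchors, others, tau):
    """Sum over nodes of -log(softmax probability of the positive pair).

    anchors[i] and others[i] are the two views of node i (the positive pair);
    every others[i'] contributes to the denominator.
    """
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, o) / tau for o in others]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denominator - logits[i]
    return loss

def layerwise_info_nce(view_a, view_b, tau):
    """Accumulate the loss over layers; view[l][i] is node i at layer l."""
    return sum(info_nce(a_l, b_l, tau) for a_l, b_l in zip(view_a, view_b))

# Toy example: 2 layers, 3 user nodes, 2-dimensional embeddings.
gcn_users = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
             [[0.9, 0.1], [0.2, 1.0], [1.0, 0.8]]]
hyper_users = [[[0.9, 0.1], [0.1, 0.9], [0.8, 1.0]],
               [[1.0, 0.2], [0.1, 1.0], [0.9, 1.0]]]
loss_u = layerwise_info_nce(gcn_users, hyper_users, tau=0.5)
```

The item-side loss of Eq. (8) would reuse the same function with the GCN item subview and the perturbing-GCN item subview as inputs.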

III-D Dispersing Loss

Introducing noise in GCN with perturbing without any constraint drives the distribution of embedded features towards over-equilibrium: distances in the latent space gradually shrink, which further blurs the already subtle differences between nodes and exacerbates over-smoothing. To counter this, and inspired by the ability of InfoNCE to push negative samples apart in vector space, we introduce a variant of InfoNCE as a metric loss function to constrain the distances between embeddings. Each embedding is contrasted within its own view, treating all the other vectors in the same embedding matrix as negative samples, so that all embeddings are gradually dispersed and keep sufficient distance from one another for learning. We apply this constraint to the readout of the matrices obtained by GCN:

$$\mathcal{L}_{disp} = \sum_{k=0}^{I+J} -\log \frac{\exp(\mathrm{sim}(\bm{R}_k, \bm{R}_k)/\tau_d)}{\sum_{k^{\prime}=0}^{I+J} \exp(\mathrm{sim}(\bm{R}_k, \bm{R}_{k^{\prime}})/\tau_d)}, \tag{9}$$

where $\bm{R}$ represents the result ${\bar{E}}_{l}^{(G)}$ from the readout layer.
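
A minimal sketch of the dispersing loss in Eq. (9) follows. It is illustrative only: the function names and toy vectors are hypothetical, and embeddings are assumed to be non-zero. Since the numerator uses $\mathrm{sim}(\bm{R}_k, \bm{R}_k) = 1$, the loss can only decrease by lowering the similarity between each embedding and the others.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def dispersing_loss(readout, tau):
    """InfoNCE variant in which every other row of the same matrix is a negative."""
    loss = 0.0
    for r_k in readout:
        logits = [cosine(r_k, r_other) / tau for r_other in readout]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denominator - cosine(r_k, r_k) / tau  # self-similarity is 1
    return loss

clustered = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.2]]     # nearly collinear rows
spread = [[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]]   # rows about 120 degrees apart
loss_clustered = dispersing_loss(clustered, tau=1.0)
loss_spread = dispersing_loss(spread, tau=1.0)
```

In this toy case the well-separated rows receive the smaller loss, which matches the intended dispersion effect.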

TABLE II: Overall performance comparison in terms of Recall and NDCG on two datasets, where the best-performing results under each metric are shown in bold, while the second best results are highlighted with underlines.

| Methods | Yelp Recall@20 ↑ | Yelp Recall@40 ↑ | Yelp NDCG@20 ↑ | Yelp NDCG@40 ↑ | Last.FM Recall@20 ↑ | Last.FM Recall@40 ↑ | Last.FM NDCG@20 ↑ | Last.FM NDCG@40 ↑ |
|---|---|---|---|---|---|---|---|---|
| NGCF [13] | 0.0681 | 0.1019 | 0.0336 | 0.0419 | 0.2081 | 0.2944 | 0.1474 | 0.1829 |
| LightGCN [5] | 0.0761 | 0.1175 | 0.0373 | 0.0474 | 0.2349 | 0.3220 | 0.1704 | 0.2022 |
| HCCF [16] | 0.0789 | 0.1185 | 0.0399 | 0.0496 | 0.2410 | 0.3232 | 0.1773 | 0.2051 |
| SHT [15] | 0.0794 | 0.1217 | 0.0395 | 0.0497 | 0.2420 | 0.3235 | 0.1770 | 0.2055 |
| SLRec [19] | 0.0665 | 0.1032 | 0.0327 | 0.0418 | 0.1957 | 0.2792 | 0.1442 | 0.1737 |
| SGL [14] | 0.0803 | 0.1226 | 0.0398 | 0.0502 | 0.2427 | 0.3405 | 0.1761 | 0.2104 |
| SimGCL [22] | 0.0813 | 0.1230 | 0.0408 | 0.0510 | 0.2398 | 0.3337 | 0.1780 | 0.2099 |
| BusGCL | 0.0840 | 0.1263 | 0.0424 | 0.0528 | 0.2437 | 0.3318 | 0.1796 | 0.2095 |

III-E Model Training

Bayesian Personalized Ranking (BPR) loss is commonly used as the primary objective for recommendation prediction:

$$\mathcal{L}_{rec} = \sum_{(u, v^{+}, v^{-}) \in \Omega} -\log \sigma(\hat{y}_{uv^{+}} - \hat{y}_{uv^{-}}), \tag{10}$$

where $\Omega = \{(u, v^{+}, v^{-}) \mid (u, v^{+}) \in \Omega^{+}, (u, v^{-}) \in \Omega^{-}\}$ represents the training set of triplets, $\Omega^{+}$ denotes observed interactions and $\Omega^{-}$ denotes the unobserved ones. $\hat{y}$ indicates a user's preference score for an item.

For overall model training, $\lambda_c$, $\lambda_d$ and $\lambda_r$ are hyperparameters that respectively control the strengths of contrastive learning, embedding dispersion and L2 regularization. The model is trained by minimizing the following joint loss:

$$\mathcal{L} = \mathcal{L}_{rec} + \lambda_c (\mathcal{L}_{cl}^{(U)} + \mathcal{L}_{cl}^{(I)}) + \lambda_d \mathcal{L}_{disp} + \lambda_r {\|\Theta\|}_F^2, \tag{11}$$

where ${\|\Theta\|}_F^2$ denotes an L2 regularization term with a low weight $\lambda_r$.
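
The BPR term of Eq. (10) and the joint objective of Eq. (11) can be combined as in the following sketch. It is a hedged illustration: the function names, the toy scores and the weight values are hypothetical, and the model parameters $\Theta$ are represented as a flat list of numbers.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def bpr_loss(pos_scores, neg_scores):
    """Eq. (10): sum of -log sigmoid(y_pos - y_neg) over sampled triplets."""
    return sum(-log_sigmoid(p - n) for p, n in zip(pos_scores, neg_scores))

def total_loss(l_rec, l_cl_user, l_cl_item, l_disp, params, lam_c, lam_d, lam_r):
    """Eq. (11): recommendation, contrastive and dispersing losses plus L2."""
    l2 = sum(w * w for w in params)  # squared Frobenius norm of flattened params
    return l_rec + lam_c * (l_cl_user + l_cl_item) + lam_d * l_disp + lam_r * l2

# Toy triplets: scores of observed items vs. sampled unobserved items.
l_rec = bpr_loss(pos_scores=[2.0, 0.5], neg_scores=[1.0, 1.5])
loss = total_loss(l_rec, l_cl_user=2.0, l_cl_item=3.0, l_disp=4.0,
                  params=[1.0, 2.0], lam_c=0.1, lam_d=0.5, lam_r=0.01)
```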

IV Experiments

IV-A Experimental Settings

Evaluation Datasets

To obtain convincing results, we conduct experiments on two widely recognized real-world datasets: Yelp (https://www.yelp.com/dataset) and Last.FM (https://www.last.fm/api). The statistics of these datasets are presented in Table I.

Yelp: A commonly used dataset of users' rating interactions collected on the Yelp platform, which allows users to share their check-ins at local venues.

Last.FM: A dataset collected from an online music radio platform, containing information such as tagging, social networking and music preferences.

Evaluation Metrics

We employ two widely used metrics to assess the prediction accuracy of all implemented methods: Recall@N and Normalized Discounted Cumulative Gain (NDCG@N), both computed under the all-ranking protocol [5]. Recall@N measures the fraction of a user's ground-truth items that appear in the top-N list, and NDCG@N additionally rewards placing those items at higher ranking positions.
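
For concreteness, the two metrics can be computed for a single user as in the sketch below. It assumes the common binary-relevance definitions (a logarithmic position discount and an ideal ranking that places all relevant items first) and a non-empty ground-truth set; the text does not spell out the exact formulas, and the function names and toy data are hypothetical.

```python
import math

def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear in the top-n of the ranking."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant)

def ndcg_at_n(ranked, relevant, n):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:n]) if item in relevant)
    ideal_hits = min(n, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg

ranked = ["a", "b", "c", "d", "e"]   # items sorted by predicted score
relevant = {"b", "e", "z"}           # held-out ground-truth items
recall = recall_at_n(ranked, relevant, n=3)
ndcg = ndcg_at_n(ranked, relevant, n=3)
```

Here only item "b" is retrieved within the top 3, so Recall@3 is one third, while NDCG@3 is further discounted because "b" sits at the second position rather than the first.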

Hyperparameter Settings

For model training, we use the Adam optimizer with a learning rate of $1\times10^{-3}$ and a decay ratio of $0.96$. The dimensionality of the user and item embeddings is set to 32. For models that use a hypergraph structure, the number of hyperedges is set following the original papers. The loss weights $\lambda_c$, $\lambda_d$ and $\lambda_r$ take values from $\{10^{-2}, 10^{-1}, 1, 10, 100\}$ for loss balance. The temperature parameters $\tau_c$ and $\tau_d$ are searched from the set $\{10^{-2}, 10^{-1}, 1, 10\}$ to regulate the intensity of the gradients in our contrastive learning process. The number of convolutional layers is set to three, which performs best.

IV-B Recommendation Performance

We evaluate BusGCL within SSLRec [10], a unified self-supervised learning (SSL) recommendation framework that enables a fairer performance comparison through unified data processing and sampling. The results are summarized in Table II. Because self-supervised training helps fill the data gap in recommendation, methods such as SGL [14] and SimGCL [22] outperform earlier approaches like NGCF [13] and LightGCN [5], especially in Recall@40 and NDCG@40 on the Last.FM dataset. Compared with all baselines, BusGCL achieves the best results on all four metrics on Yelp and the best Recall@20 and NDCG@20 on Last.FM, while SGL [14] remains ahead on Recall@40 and NDCG@40 on Last.FM. By pairing subviews according to the characteristics of each side, BusGCL can properly gather structurally similar user nodes and fit the relatively dispersed item nodes.

TABLE III: Ablation study on different BusGCL variants with/without dispersing loss. The metrics are Recall@20 and NDCG@20. “Disper.” is short for dispersing loss.

| Variants | Disper. | Yelp Recall | Yelp NDCG | Last.FM Recall | Last.FM NDCG |
|---|---|---|---|---|---|
| BusGCL$_{per}$ | × | 0.0713 | 0.0355 | 0.2279 | 0.1711 |
| BusGCL$_{per}$ | ✓ | 0.0781 | 0.0388 | 0.2295 | 0.1718 |
| BusGCL$_{hyp}$ | × | 0.0822 | 0.0415 | 0.2342 | 0.1742 |
| BusGCL$_{hyp}$ | ✓ | 0.0824 | 0.0417 | 0.2329 | 0.1728 |
| BusGCL$_{rev}$ | × | 0.0817 | 0.0408 | 0.2236 | 0.1663 |
| BusGCL$_{rev}$ | ✓ | 0.0827 | 0.0416 | 0.2234 | 0.1662 |
| BusGCL | × | 0.0824 | 0.0417 | 0.2436 | 0.1796 |
| BusGCL | ✓ | 0.0840 | 0.0424 | 0.2439 | 0.1797 |

IV-C Ablation Experiment

To investigate the impact of subview selection and of the dispersing loss, we evaluate BusGCL on two datasets under three alternative subview selections, each with and without the dispersing loss. The performance of each variant is shown in Table III. The variants BusGCL$_{hyp}$ and BusGCL$_{per}$ take both contrastive subviews from the hypergraph-GCN and the perturbing-GCN, respectively, and BusGCL$_{rev}$ reverses the subview selection of BusGCL. Rows marked × in the “Disper.” column have the dispersing loss disabled. The analysis is as follows:

IV-C1 Ablation of subview combination

Comparing all combinations of contrastive views generated from the hypergraph- and perturbing-GCN, BusGCL$_{hyp}$ and BusGCL$_{rev}$ achieve results close to BusGCL on Yelp, increasing accuracy by about $0.5\%$ over BusGCL$_{per}$. BusGCL performs best because it seeks aggregation on the user side through the hypergraph-GCN and retains differences on the item side through the perturbing-GCN, which better reflects the real-world situation.

IV-C2 Ablation of dispersing loss

Examining the effect of the dispersing constraint on each subview combination (reading Table III vertically), it is clear that the dispersing loss brings a larger improvement in recommendation performance where the contrastive views are generated by the perturbing-GCN, as in BusGCL$_{per}$. This supports the effectiveness of the dispersing loss in mitigating the over-smoothing among nodes caused by the introduction of random noise.

Figure 3: The effect of the number of GCN layers. The two plots show results measured on the Yelp and Last.FM datasets, respectively.

IV-D Further Visualization and Analysis

IV-D1 The influence of the number of GCN layers

To analyze the impact of the number of GCN layers, we vary it over the set $\{1, 2, 3, 4, 5\}$. The outcomes are shown in Figure 3. With $3$ GCN layers, the model attains its best performance on the Yelp dataset, and on Last.FM, models with $2$ or $3$ layers both perform relatively well. As the number of layers increases further, the performance on both datasets decreases, a phenomenon we attribute to over-smoothing, which becomes more severe with model depth.

IV-D2 The impact of hyperparameters of dispersing loss

Figure 4 shows the joint effect of the temperature coefficient $\tau_d$ and the weight hyperparameter $\lambda_d$ of the dispersing loss. The temperature coefficient $\tau_d$ of $\mathcal{L}_{disp}$ determines how strongly the relations between embeddings are constrained by controlling the smoothness of the logits. The best performance is attained when $\tau_d = 1.0$. A lower $\tau_d$ may make the dispersing loss over-sensitive, which degrades performance. Likewise, the model's recommendation performance peaks when $\lambda_d$ is set to 1.

Figure 4: Influence of the weight and temperature coefficient of the dispersing loss on recommendation performance on the Yelp dataset. Recall@20 is shown in the left image and NDCG@20 in the right.

Figure 5: Visualization of distribution from different stages of BusGCL training by t-SNE on Yelp dataset. (a) shows embeddings after GCN process and readout directly. (b) is after bilateral subview CL. (c) adds dispersing loss. Additionally, (d) and (e) illustrate the situation when subviews on both sides are selected from hyperGCN/GCN with perturbing.

IV-D3 Visualization results

We visualize the distribution of embeddings during training with t-SNE to analyze in depth how each module improves performance and what role it plays in embedding training. As shown in Figure 5, from panel (a) to (b), the blank areas between groups shrink, which means CL makes the embedding distribution more uniform. This is because CL can find more implicit similarities between nodes, thereby reducing the gap between groups. Comparing (b) and (c), we observe that several larger group structures form on top of the relatively balanced distribution and separate the dense embeddings at the center to a certain extent. This means the dispersing loss helps preserve differences between similar embeddings by pushing all the other embeddings away, which makes subtle collaborative signals more manifest.

In panels (d) and (e), the distributions of the two sides' embeddings produced by CL with symmetrical GCN structures look more alike, which makes it difficult to retain the inherent structural differences between users and items. The final embeddings of our model fit real-world data distributions better than those of previous models.

IV-D4 The impact of user-item model selection with different graph augmentations

To analyze the rationale for using hyperGCN and GCN with perturbing as the two graph augmentation methods in BusGCL, we plug several popular graph augmentations from [14], namely edge dropout, node dropout and random walk, into the subview-CL mechanism of BusGCL. From the experimental results in Table IV, we draw the following conclusions. Among the alternative augmentations, the node-drop-based model performs well, whereas edge drop and random walk perform poorly. We believe that masking the adjacency matrix cannot guarantee that the augmented graph follows the same distribution as the original graph, and this deviation is amplified when multi-hop relationships are computed. The combination of hypergraph-GCN and perturbing-GCN that we select achieves the best experimental performance, which verifies their aforementioned suitability for the user and item sides, respectively. Our model takes the bilateral inter-node characteristics into account when selecting graph augmentation methods and strives to preserve the differences between users and items.

TABLE IV: The impact of different GCN model selections with different graph augmentations on Yelp dataset. “Hyper.” means HyperGCN.

| User model | Item model | Recall@20 | NDCG@20 |
|---|---|---|---|
| Hyper. | Perturb | 0.0840 | 0.0424 |
| Hyper. | Node drop | 0.0819 | 0.0419 |
| Hyper. | Edge drop | 0.0509 | 0.0266 |
| Hyper. | Random walk | 0.0655 | 0.0336 |
| Node drop | Perturb | 0.0820 | 0.0410 |
| Edge drop | Perturb | 0.0429 | 0.0220 |
| Random walk | Perturb | 0.0594 | 0.0302 |

IV-D5 The effect of dispersing loss compared to others

To further verify the suitability of the dispersing loss in the BusGCL framework, we replace it with a Kullback-Leibler (KL) divergence based loss function that can also constrain the embedding probability distribution, and evaluate its effectiveness. Specifically, we compute the KL divergence between the bilateral embedding matrix and a uniformly distributed matrix of the same shape as the loss value. The results with optimized loss weights are shown in Table V.

Compared with the variant without an additional loss, adding either the KL divergence or the dispersing loss improves performance. This indicates the significance of constraining the embedding distribution: making the embeddings more evenly distributed in the vector space does help to improve the ability of contrastive learning. Our dispersing loss outperforms the KL divergence, indicating that embedding constraints based on positive and negative sample metrics are more suitable for the BusGCL framework. Essentially, the dispersing loss is a trick used in the model training process, and its involvement enhances the ability to learn implicit collaborative relationships between embeddings.

TABLE V: The impact of alternative loss selections for mutual training in user-item reasoning.

| Loss | Yelp Recall@20 | Yelp NDCG@20 | Last.FM Recall@20 | Last.FM NDCG@20 |
|---|---|---|---|---|
| Dispersing loss | 0.0840 | 0.0424 | 0.2437 | 0.1796 |
| KL-divergence | 0.0830 | 0.0421 | 0.2425 | 0.1794 |
| No loss | 0.0824 | 0.0417 | 0.2402 | 0.1771 |

V Conclusion

In this paper, we improve recommendation by exploiting the bilateral unsymmetry of node relation density between the user and item sides, and propose bilateral slicing contrastive learning, which generates user and item subviews for better reasoning. We then propose a multi-structure graph framework, BusGCL, which considers the characteristics of different GCNs and selects bilateral subviews from them to match the relation density difference between the two sides. For training, a dispersing loss is designed to alleviate the over-smoothing issue aggravated by GCN. Overall experiments and extensive studies validate the superiority of our proposed framework BusGCL over classic and competitive baselines, owing to its adaptability to real-world data. In future work, we may explore adaptive methods for contrastive view generation to extend graph contrastive learning to more specific recommendation tasks.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant 62372335.

References

  • [1] Xin Dong et al. “A hybrid collaborative filtering model with deep structure for recommender systems” In Proceedings of the AAAI Conference on artificial intelligence 31.1, 2017
  • [2] Qingyu Guo et al. “A survey on knowledge graph-based recommender systems” In IEEE Transactions on Knowledge and Data Engineering 34.8 IEEE, 2020, pp. 3549–3568
  • [3] Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun “Deep residual learning for image recognition” In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778
  • [4] Xiangnan He, Hanwang Zhang, Min-Yen Kan and Tat-Seng Chua “Fast matrix factorization for online recommendation with implicit feedback” In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016, pp. 549–558
  • [5] Xiangnan He et al. “Lightgcn: Simplifying and powering graph convolution network for recommendation” In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020, pp. 639–648
  • [6] Chao Huang et al. “Online purchase prediction via multi-scale modeling of behavior dynamics” In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 2613–2622
  • [7] Joseph A Konstan et al. “Grouplens: Applying collaborative filtering to usenet news” In Communications of the ACM 40.3 ACM New York, NY, USA, 1997, pp. 77–87
  • [8] Andriy Mnih and Russ R Salakhutdinov “Probabilistic matrix factorization” In Advances in neural information processing systems 20, 2007
  • [9] Aaron van den Oord, Yazhe Li and Oriol Vinyals “Representation learning with contrastive predictive coding” In arXiv preprint arXiv:1807.03748, 2018
  • [10] Xubin Ren et al. “SSLRec: A Self-Supervised Learning Library for Recommendation” In arXiv preprint arXiv:2308.05697, 2023
  • [11] Paul Resnick et al. “Grouplens: An open architecture for collaborative filtering of netnews” In Proceedings of the 1994 ACM conference on Computer supported cooperative work, 1994, pp. 175–186
  • [12] Badrul Sarwar, George Karypis, Joseph Konstan and John Riedl “Item-based collaborative filtering recommendation algorithms” In Proceedings of the 10th international conference on World Wide Web, 2001, pp. 285–295
  • [13] Xiang Wang et al. “Neural graph collaborative filtering” In Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval, 2019, pp. 165–174
  • [14] Jiancan Wu et al. “Self-supervised graph learning for recommendation” In Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, 2021, pp. 726–735
  • [15] Lianghao Xia, Chao Huang and Chuxu Zhang “Self-supervised hypergraph transformer for recommender systems” In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 2100–2109
  • [16] Lianghao Xia et al. “Hypergraph contrastive collaborative filtering” In Proceedings of the 45th International ACM SIGIR conference on research and development in information retrieval, 2022, pp. 70–79
  • [17] Xu Xie et al. “Contrastive learning for sequential recommendation” In 2022 IEEE 38th international conference on data engineering (ICDE), 2022, pp. 1259–1273 IEEE
  • [18] Tiansheng Yao et al. “Self-supervised learning for deep models in recommendations” In arXiv preprint arXiv:2007.12865, 2020
  • [19] Tiansheng Yao et al. “Self-supervised learning for large-scale item recommendations” In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 4321–4330
  • [20] Hongzhi Yin et al. “Social influence-based group representation learning for group recommendation” In 2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019, pp. 566–577 IEEE
  • [21] Junliang Yu et al. “Adaptive implicit friends identification over heterogeneous network for social recommendation” In Proceedings of the 27th ACM international conference on information and knowledge management, 2018, pp. 357–366
  • [22] Junliang Yu et al. “Are graph augmentations necessary? simple graph contrastive learning for recommendation” In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, 2022, pp. 1294–1303
  • [23] Junliang Yu et al. “Generating reliable friends via adversarial training to improve social recommendation” In 2019 IEEE international conference on data mining (ICDM), 2019, pp. 768–777 IEEE
  • [24] Junliang Yu et al. “Self-supervised learning for recommender systems: A survey” In IEEE Transactions on Knowledge and Data Engineering IEEE, 2023
  • [25] Junliang Yu et al. “Self-supervised multi-channel hypergraph convolutional network for social recommendation” In Proceedings of the web conference 2021, 2021, pp. 413–424
  • [26] Chang Zhou et al. “Contrastive learning for debiased candidate generation in large-scale recommender systems” In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 3985–3995
  • [27] Kun Zhou et al. “S3-Rec: Self-supervised learning for sequential recommendation with mutual information maximization” In Proceedings of the 29th ACM international conference on information & knowledge management, 2020, pp. 1893–1902