Big Data Mining and Analytics, January 2018, 1(1): 000-000

Anxin Yang et al.: A Survey of Reasoning for Substitution Relationships: Definitions, Methods, and Directions

BIG DATA MINING AND ANALYTICS

ISSN 2096-0654 0?/?? pp???–???

Volume 1, Number 1, January 2018

A Survey of Reasoning for Substitution Relationships: Definitions, Methods, and Directions

Anxin Yang, Zhijuan Du, and Tao Sun

Abstract: Substitute relationships play a vital role in people’s daily lives and span many fields. This study focuses on understanding and predicting substitute relationships between products in different domains, and comprehensively analyzes the application of machine learning algorithms, natural language processing, and other technologies. By comparing methods along several dimensions, including the definition of substitutes in different domains, the representation and learning of substitute relationships, and substitute reasoning, it provides a methodological basis for in-depth exploration of substitute relationships. Continued research and innovation can further enhance the personalization and accuracy of substitute recommendation systems, thereby promoting the development and application of this field.
Key words:
substitution relationships, recommender systems, information retrieval, reasoning algorithms, personalization

• Anxin Yang is with the School of Computer Science, Inner Mongolia University, Hohhot City, Inner Mongolia, 010021, China. E-mail: [email protected]
• Zhijuan Du is with the School of Computer Science, Inner Mongolia University, Hohhot City, Inner Mongolia, 010021, China. E-mail: [email protected]
• Tao Sun is with the School of Computer Science, Inner Mongolia University, Hohhot City, Inner Mongolia, 010021, China. E-mail: [email protected]
* To whom correspondence should be addressed.
Manuscript received: year-month-day; accepted: year-month-day

1 INTRODUCTION

In today’s diverse market environment, consumers have the option to choose different products or services as alternatives to their original choices. The concept of substitution has become increasingly important, as it offers selectivity and flexibility in meeting one’s needs.

  • Case 1: A user wants to purchase a specific brand and model of smartphone, but discovers that it is sold out on the local e-commerce website. In such a situation, the user needs to find a substitute with similar features and performance, and begins comparing other brands and models of smartphones. They compare specifications, functions, design, and user reviews to select the most suitable phone for their needs. They may also consider other factors such as after-sales service, brand reputation, and reliability. Ultimately, they choose to purchase a smartphone from another brand as a substitute. This case highlights the behavior of consumers in the retail industry when a particular product fails to meet their needs: they seek alternatives with similar features or characteristics. It demonstrates the flexibility and decision-making abilities of consumers in ensuring they obtain the products or services they require.

  • Case 2: Due to food allergies, religious beliefs, health requirements, or personal preferences, people may need to choose alternative ingredients. If a recipe calls for a particular type of fish, but this fish is not available in the local market, the cook may choose another fish with a similar texture as a substitute so that the dish’s taste and flavor are not greatly affected. Alternatively, if someone plans to make a dessert but runs out of the required chocolate, they may substitute cocoa powder and butter to achieve a similar taste and texture.

Consumers may have varied preferences and needs across different shopping scenarios, allowing them to choose among different products offering similar functionality. For instance, when purchasing a television, consumers can make their selection based on personal preferences for brand, price, display technology, and other factors. Substitution relationships also provide alternative options when a particular product is temporarily out of stock or discontinued: consumers can opt for other products with similar functionality or features. In terms of food, people may need to choose alternative ingredients due to food allergies, religious beliefs, health needs, or personal preferences. For example, vegetarians may opt for legumes, mushrooms, or plant-based meat products as substitutes for animal meat.

1.1 Characterizations and Applications of Substitution

“Substitutability” typically refers to the property of items that offer similar functionality or characteristics and can therefore stand in when a particular item fails to meet the user’s needs. Items with this property are referred to as substitutes. Explaining the process of substitution can aid in understanding why these substitutes are recommended and assist people in making better decisions.

In the retail industry: As demonstrated by the aforementioned case 1, substitutes refer to alternative goods that possess similar functionality or features in meeting the users’ needs. When a specific product fails to fulfill the users’ requirements, individuals tend to seek alternative products that offer comparable functionality or characteristics as replacements.

In the field of food: In case 2, specific application scenarios are mentioned. Substitutes are primarily defined based on the nutritional composition, taste, and texture of the food. They may exhibit similar flavors (such as sweet or sour), comparable textures (like crisp or fluffy), belong to the same food category (for instance, substituting different varieties of potatoes), or be used in similar recipe contexts (for example, utilizing bacon or chicken as the primary protein in a sandwich).

In other fields: Similarity and applicability are factors considered in determining substitutes. For instance, in fields like pharmaceuticals and business investment, the search for substitutes often involves seeking alternatives that exhibit a high degree of similarity.

\begin{tabular}{lllll}
\hline
Domain & Data Source for Substitutes & Criteria for Extracting Substitutes & Pros & Cons \\
\hline
Retail industry & Price elasticity & Increase in demand for item B due to price rise of item A & Precise identification of substitutes & High data requirement \\
Retail industry & Co-view, view-but-purchase-another & Users' browsing data with ``co-view'' and ``view-but-purchase-another'' patterns & Simple judgment, easy data acquisition & Label noise \\
The field of food & User reviews & Statements in reviews containing ``substitute'' information & Simple judgment & Inconsistent standards \\
The field of food & Recipes & & Strong relevance to results & Low universality \\
Other fields & Similarity & & & \\
\hline
\end{tabular}
Table 1: Pros and cons of the approaches to extracting substitutes.

The characterization methods for substitutes are shown in Table 1. Overall, similarity is an important criterion for determining substitutes, as objects with substitutability have a high degree of similarity to each other.

In the retail industry, price elasticity is an economic measure of how price changes affect consumer demand, and it can be used to identify products that could substitute for the original product. Price elasticity analysis can accurately determine substitutes for the original product in the market. However, an accurate analysis requires a large amount of market, consumption, and consumer behavior data. Co-browsing behavior, by contrast, offers a simple means of assessment: by analyzing users’ co-browsing behavior during online shopping, it is possible to infer their level of interest in a particular product. The advantage of this method is that the data are relatively easy to acquire; its limitation is that label noise can lead to inaccurate predictions [77]. In the food industry, using user reviews as a data source is a simple operation: statements such as “substitute a for b” or “replace b with a” are extracted from the reviews [13]. However, user reviews follow no consistent standard, so there is no benchmark for comparing the results of different methods, which makes accurate judgments based on such data difficult. Choosing substitute ingredients based on the recipe yields high-quality substitutes with similar cooking effects and textures. However, ingredients are strongly tied to particular dishes, so substitutes found for one dish do not necessarily carry over to others. In other fields, when data is lacking, it is common to search for highly similar items as substitutes by analyzing their characteristic features. This process often involves manual review to ensure the accuracy of the results.

Each method has its unique advantages and disadvantages, requiring a balance and consideration of the specific problem and the characteristics of the data to choose the most appropriate solution.
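As a rough illustration of the price-elasticity criterion in Table 1, the sketch below computes an arc (midpoint) cross-price elasticity: the percentage change in demand for item B divided by the percentage change in the price of item A. A positive value is the standard economic signal that B behaves as a substitute for A. The function names, the zero threshold, and the toy numbers are assumptions made for this example and do not come from any of the surveyed methods.

```python
def cross_price_elasticity(q_b_before, q_b_after, p_a_before, p_a_after):
    """Arc (midpoint) cross-price elasticity of demand for B w.r.t. the price of A."""
    # Percentage changes are taken relative to the midpoint of the two observations.
    pct_change_demand_b = (q_b_after - q_b_before) / ((q_b_after + q_b_before) / 2)
    pct_change_price_a = (p_a_after - p_a_before) / ((p_a_after + p_a_before) / 2)
    return pct_change_demand_b / pct_change_price_a


def looks_like_substitute(elasticity, threshold=0.0):
    """Hypothetical decision rule: positive elasticity suggests a substitute."""
    return elasticity > threshold


# Toy numbers (hypothetical): A's price rises from 10 to 12,
# and weekly demand for B rises from 100 to 120 units.
e = cross_price_elasticity(q_b_before=100, q_b_after=120,
                           p_a_before=10.0, p_a_after=12.0)
print(e, looks_like_substitute(e))
```

In practice the estimate would be computed over many price changes while controlling for confounders, which is where the high data requirement noted above comes from.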

1.2 Research Value and Related Surveys

Substitute reasoning makes it possible to search for substitutes that meet personalized needs and adapt better to individual circumstances. Due to differences in taste and regional culture, people have diverse perceptions of food substitutes, making it difficult to find a solution that satisfies everyone. Achieving recipe personalization through ingredient substitution may help people meet their dietary needs and preferences, avoid potential allergens, and simplify the process of culinary exploration. For food substitute recommendation, learning from small-scale data can be used to search for substitute foods with similar characteristics. For product substitution, machine learning and automation technology can identify which products can substitute for the target product, or recommend related substitutes based on customer feedback. These methods and technologies not only make purchasing and cooking more flexible and convenient for consumers, but also provide businesses with more accurate recommendation services and brand marketing.

The substitution relationship provides consumers and decision-makers with the flexibility and options to make product choices in different scenarios, enabling them to make the best decisions based on their individual needs and goals. Study [35] examined how online product reviews of substitute products from different brands, across a wide range of product categories, influence consumer purchasing decisions. The study revealed that, for substitute products, predominantly positive reviews can enhance the evaluation of the reviewed product while potentially diminishing the appeal of other substitute products, thus shifting preferences towards the focal substitute product. Research on consumer behavior [81] indicates that when the recommended product’s price exceeds that of the focal product, it raises consumers’ psychological expectation of the price. During the product browsing stage, consumers tend to prefer comparing multiple substitutes, making it crucial to identify the complementary and substitutive relationships between products. Unlike previous research that primarily focused on the relationship between online reviews and sales figures, study [78] demonstrated the impact of the richness of online reviews (i.e., whether they include videos or follow-up comments) on sales. The findings reveal that utilitarian products are more strongly influenced by online reviews than hedonic products, and that negative reviews have a more pronounced effect on product perception than positive ones.

Starting from [46], the accuracy of product recommendations has been improved by analyzing substitute relationships and complementary relationships. Substitute relationships refer to the extent to which two products can replace each other. Complementary relationships refer to the ability of two products to complement or be used together. For example, cookies and milk have a complementary relationship because milk can be consumed with cookies to enhance the user’s eating experience. Recommendation systems can utilize complementary relationships to provide corresponding recommendations and increase user purchase intention [71, 53, 32].

Product relationship reasoning infers the correlations and connections between products from users’ historical behavior and preferences. By analyzing users’ purchase histories, click behavior, ratings, and other data, it is possible to build a network of relationships between products. This network can describe relationships such as similarity, substitution, and complementarity between products. Based on these relationships, recommendation systems can help users browse a large collection of products and find items relevant to their interests. Researchers continue to propose new models to improve the performance of product recommendation systems [86, 16, 75]. In recommendation systems, personalization and data sparsity have always been important concerns, and some recently proposed models attempt to provide better solutions to these problems. Substitute and complementary relationships play a crucial role here: mining the similarity and connections between products improves recommendation accuracy and personalization, which in turn enhances user satisfaction.

Visual similarity and substitute relationships both describe relationships between things and compare the degree of similarity between two objects or concepts. Both have a certain degree of subjectivity, as people’s judgments may vary with individual experiences, cultural backgrounds, and cognitive preferences. However, visual similarity primarily compares things based on visual features such as appearance, structure, shape, and color, i.e., similarity based on external features. Substitute relationships, on the other hand, primarily concern whether one object or concept can replace another in a specific context, i.e., a relationship based on substitutability. In the field of product recommendation, research on substitute relationships is often based on the visual features of the products, such as the style of product images [47]. Subsequent research often focuses on analyzing images in a specific domain, such as the fashion and clothing industry [11, 22], because there the visual features of products play a crucial role in consumers’ purchasing decisions. Researchers therefore typically apply image analysis to this specific domain to uncover substitute and complementary relationships between products [40, 51, 29]. This survey primarily focuses on the inference of general product relationships in recommendation systems, with less emphasis on image analysis in the fashion domain.

1.3 Framework for Organizing Substitute Work

A general framework for substitute reasoning includes data preprocessing, feature representation, substitute relation learning and inference models, model training and optimization, as well as model evaluation.

During the data preprocessing stage, the data is cleansed, standardized, and transformed to ensure its quality and consistency. Feature representation, a crucial component of the substitute reasoning framework, converts the elements and attributes of the data into an effective representation. This may involve traditional feature engineering methods such as statistical features or manually defined attributes, or it can leverage deep learning techniques for automatic feature extraction. The quality and effectiveness of feature representation are vital for subsequent learning and inference processes. Substitute relation learning and inference models form the core of the framework, enabling the discovery and inference of new substitute relationships and thereby a deeper understanding of the hidden information and structure within the data. Based on the data processed through feature representation, learning and inference models suitable for the substitute reasoning task can be designed and constructed. These models can be implemented through different machine learning algorithms, deep neural networks, or graph neural networks, among other techniques. The model training and optimization stage involves selecting appropriate objective functions, tuning model parameters, and choosing suitable optimization methods so that the model fits the training data well and generalizes. Finally, model evaluation uses appropriate datasets and evaluation metrics, such as accuracy, recall, and F1 score, to measure the performance of the model.
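To make the evaluation stage concrete, the following minimal sketch scores a set of predicted substitute pairs against a ground-truth set using precision, recall, and F1. It assumes that pairs are stored as ordered tuples and that the ground truth is complete; the item names are toy values invented for the example.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 of predicted substitute pairs against ground truth."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): three predicted pairs, four true pairs, two in common.
predicted_pairs = {("butter", "margarine"), ("cod", "haddock"), ("milk", "cookies")}
true_pairs = {("butter", "margarine"), ("cod", "haddock"),
              ("sugar", "honey"), ("tv_a", "tv_b")}

precision, recall, f1 = precision_recall_f1(predicted_pairs, true_pairs)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

If substitution is treated as bidirectional, each pair would need to be normalized (for example, sorted) before comparison so that (a, b) and (b, a) count as the same prediction.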

This survey aims to study and explore the application of methods for inferring substitute relationships. Its structure is shown in Fig. 1.

Fig. 1: Structural Diagram
  • Chapter 2 introduces the classification methods, data processing methods, and main problems addressed in the inference of substitute relationships. Substitutes are classified by task objective, basis of substitution, and direction of substitution. Data processing, which is crucial to substitute relationship inference, covers data formats and structures as well as data initialization. The chapter also discusses in detail the main problems addressed by substitute relationship inference, such as interpretability, data sparsity, cold start, and relationship decoupling.

  • Chapter 3 delves deeper into the methods for feature representation, learning, reasoning, and model training associated with substitute relationships. Feature representation methods play a crucial role in effectively capturing the essential elements and attributes of substitute relationships. The subsequent learning and reasoning methods aim to harness these features to uncover new substitute relationships, while model training methods, whether supervised or unsupervised, are essential in refining the inference process.

  • Chapter 4 focuses on the commonly used datasets and evaluation standards for substitute relationship inference. Appropriate datasets and accurate evaluation standards are crucial for the study of inference methods. This chapter introduces some commonly used datasets and discusses evaluation metrics for assessing the effectiveness of inference, such as accuracy and recall.

  • Finally, Chapter 5 discusses future challenges, and Chapter 6 presents our conclusion.

Through comprehensive analysis and discussion of substitute relationship inference methods, this survey aims to provide a comprehensive framework for researchers in substitute relationship inference. The goal is to better apply and promote its development.

2 SUBSTITUTE DEFINITION

“Substitute” refers to items, services, or concepts that can replace one another in a specific context. Substitutes are usually identified by observing and studying the direction and basis of substitution. For example, in the food industry, we can analyze the attributes and nutritional components of ingredients to determine which ones can be substituted for each other. In the realm of commodities, we can compare product features and functions to find products that can be substituted for each other. Establishing feature initialization forms and embedding spaces helps in analyzing and predicting substitution patterns and directions, and gives a better understanding of the relationships and characteristics of substitutions.

Column groups: Definition of Substitutes (Substitute Task Objectives: Food Substitutes, Product Substitutes, Other Substitutes, Modeling Complementarity; Substitute Directions: One-way substitutes, Two-way substitutes; Data Form and Structure: Text data, Image data, Structured) and Problem Solving (Common Problems: Interpretability, Data sparsity, Multilingual, Cold start, Personalization, Decoupling multiple relations; Approaches: Path-based).

\begin{tabular}{lll}
\hline
Model & Other Substitutes & Structured \\
\hline
GISMo[21] & & \\
{[77]} & & \\
{[38]} & & KG \\
{[88]} & & KG \\
SCG-SPRe[85] & & Graph \\
DHGAN[87] & & Graph \\
M-HetSage[30] & & Graph \\
SPGCN[13] & Corporate Investment & Graph \\
KAPR[76] & & KG \\
Food2Vec/BERT[55] & & \\
DIISH[63] & & \\
A2CF[10] & & \\
DecGCN[42] & & \\
Product2Vec[6] & & \\
RRN[82] & & \\
KGDDS[62] & Medicine & KG \\
SPEM[84] & & Graph \\
LVA[59] & & \\
PMSC[66] & & \\
SHOPPER[61] & & \\
Sceptre[46] & & Graph \\
{[47]} & & \\
{[4]} & & \\
EMRIGCN[7] & & Graph \\
IRGNN[41] & & Graph \\
\hline
\end{tabular}
Table 2: Definition and problems of substitute reasoning

2.1 Substitution Classification

2.1.1 Substitution Task Objectives

When the required items are unavailable or missing, people may need to look for alternative products to meet their needs, and the requirements for substitutions can vary.

For example, in ingredient substitution, some people may be allergic to certain ingredients and need to find substitutes [55, 63, 4]. Because different ingredients can have varying flavors and textures, substituting specified ingredients in a given recipe also requires some modifications to the recipe and cooking method [21, 38], and it is important to consider additional factors such as nutritional value and cooking characteristics.

Research [77, 30, 10] focuses on reasoning about substitution relationships between products. However, sometimes the task of relationship inference requires exploring multiple relationships. For example, the substitution relationship between products is often modeled together with complementary relationships. In [7, 76, 59, 46], substitution relationships and complementary relationships are modeled separately, without considering the interaction between different relationships. However, some studies suggest that there is an interaction between substitution and complementary relationships [41, 85, 87, 76, 42, 6, 66], and that capturing the dependencies between relationships can provide a better understanding of the implicit relationships between products. Fig. 2 shows the coupling patterns of product relationships:

Fig. 2: Coupling patterns of relationships

2.1.2 Basis of Substitution

The basis for substitution refers to the selection and replacement of items, services, or concepts based on different conditions in a particular context. The basis for substitution can vary in different situations.

Rule-based deduction for ingredient substitution was first introduced in [4], where ingredients were transformed into topic distributions and substitutions were selected based on the density of the distribution. Studies [63, 55, 38, 88, 21] hold that the substitutability and similarity of ingredients are related, and suggest using the semantics of the ingredients as the basis for substitution. [38, 21] utilized ingredient semantics while also considering the correlation of ingredients in different recipe contexts, determining substitutions based on specific scenario conditions and requirements. [63] takes into account the similarity between different recipes and the co-occurrence of ingredients in recipes, and integrates the obtained information as a basis for substitution reasoning. In addition, [21, 55] incorporated ingredient images as auxiliary data, combining the semantic information from textual data with the visual features from images. By analyzing and processing the ingredient images, they were able to improve the accuracy of substitution choices.
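The idea that substitutability tracks semantic similarity can be sketched as a nearest-neighbor search over ingredient embeddings. The example below ranks candidates by cosine similarity to a target ingredient. The three-dimensional vectors are hand-written toy values, not embeddings from any of the cited models, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_substitutes(target, embeddings, k=2):
    """Return the k ingredients most similar to `target`, best first."""
    scores = [(name, cosine(embeddings[target], vec))
              for name, vec in embeddings.items() if name != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


# Toy embeddings (hypothetical values chosen only for illustration).
toy_embeddings = {
    "butter": [0.9, 0.1, 0.2],
    "margarine": [0.85, 0.15, 0.25],
    "olive_oil": [0.6, 0.1, 0.7],
    "sugar": [0.1, 0.9, 0.1],
}

top = rank_substitutes("butter", toy_embeddings)
print(top)
```

A context-aware method would additionally condition the ranking on the recipe, since an ingredient that is a close neighbor in general may not suit a particular dish.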

In order to explore the substitution relationships between products, the study [47] attempted to simulate the human perception of the visual characteristics of objects by analyzing the style of images for substitution reasoning. In addition to product similarity, the study [46] suggests that co-browsed products are substitutable. Studies [66, 59, 87, 42, 7, 41] use existing substitution and complementary connections between products as the basis for relational inference, inferring substitution relationships by learning the associations between products. Apart from co-browsing, other user behaviors also provide implicit feedback on the relationships between products. [46, 82] use product reviews to predict relationships between products, and [82] additionally considers non-textual features of reviews in its analysis. [85] distinguishes relationships between products based on the temporal patterns of user behavior sequences, and [30] explores product substitution relationships through user search logs and product stock-out logs. Products also influence one another: [61, 6] analyze user shopping baskets and orders to infer the substitutability of products based on their co-occurrence relationships.

[13] discusses the selection of enterprises for investment, stating that the interdependencies between companies determine their substitutability, whereas [62] defines substitution from the perspective of the effects and side effects of drugs.
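The co-browsing signal discussed above for products can be sketched as simple pair counting over browsing sessions: pairs viewed together in at least a minimum number of sessions become candidate substitutes. The session data, the `min_support` threshold, and the function names are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations


def co_view_counts(sessions):
    """Count, for every unordered item pair, the sessions in which both were viewed."""
    counts = Counter()
    for viewed in sessions:
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[(a, b)] += 1
    return counts


def candidate_substitutes(sessions, min_support=2):
    """Pairs co-viewed in at least `min_support` sessions (hypothetical rule)."""
    return {pair for pair, n in co_view_counts(sessions).items() if n >= min_support}


# Toy browsing sessions (hypothetical).
sessions = [
    ["phone_a", "phone_b", "case_a"],
    ["phone_a", "phone_b"],
    ["phone_b", "phone_c"],
    ["phone_a", "case_a"],
]

candidates = candidate_substitutes(sessions)
print(sorted(candidates))
```

On this toy data the rule returns both the phone pair and the phone-and-case pair, although the latter is complementary rather than substitutable. This is a small instance of the label noise that co-view data introduces.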

2.1.3 Direction of Substitution

In practical applications, substitution often occurs in a bidirectional manner. However, not all substitution behaviors are bidirectional; in some cases, substitution can only occur in one direction.

For example, in some recipes, a specific ingredient is necessary to achieve the expected taste and texture. [21, 38] discuss ingredient substitution and recipes, aiming to replace rare, allergenic, expensive, or unhealthy ingredients with their common, non-allergenic, cheaper, or healthier counterparts.

In e-commerce, the substitution relationship of goods is assumed to be bidirectional by default, but the actual relationship still depends on the usage context of the goods. When a consumer selects a product, an e-commerce platform may display other similar products as alternative options, yet whether these truly substitute for it depends on personalized factors such as consumer needs and individual preferences. [46, 41] argue that there is a directionality in the relationships between products. [46] models the product relationship graph as a directed graph, while [41] makes directional predictions using non-commutative outer product operations. In short, the specific substitution relationship depends on the product itself, user needs, and the specific application context.
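To show why a non-commutative operation yields direction-aware predictions, the sketch below scores an ordered pair of item embeddings through their outer product weighted by a matrix. Because the outer product of (i, j) is the transpose of that of (j, i), the two directions generally receive different scores unless the weight matrix is symmetric. This is only an illustration of the principle, not the architecture of [41]; the embeddings and weights are toy values.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list; outer(u, v) != outer(v, u) in general."""
    return [[a * b for b in v] for a in u]


def directed_score(e_src, e_dst, weights):
    """Sum of the element-wise product of `weights` and outer(e_src, e_dst)."""
    product = outer(e_src, e_dst)
    return sum(w * x
               for w_row, p_row in zip(weights, product)
               for w, x in zip(w_row, p_row))


# Toy embeddings and an asymmetric weight matrix (hypothetical values).
e_i = [1.0, 2.0]
e_j = [3.0, 0.5]
W = [[0.0, 1.0],
     [0.0, 0.0]]

score_i_to_j = directed_score(e_i, e_j, W)  # uses e_i[0] * e_j[1]
score_j_to_i = directed_score(e_j, e_i, W)  # uses e_j[0] * e_i[1]
print(score_i_to_j, score_j_to_i)
```

A dot product of the same embeddings would give identical scores in both directions, which is why symmetric similarity alone cannot express one-way substitution.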

2.2 Data Processing

2.2.1 Data Format and Structure

Choosing appropriate data format and structure can enhance the representation efficiency and access speed of data, thereby optimizing program performance.

Text Data: Text data comes from various sources, such as ingredient names in the food domain [88], recipes [4], and cooking steps [21]. Recipes serve as the contextual environment for ingredients and can help in understanding the correlation between ingredient names, providing more accurate recommendations and suggestions suitable for the current scenario. However, recipes are subject to cultural differences, so ingredient substitutions derived from them lack universality. Cooking steps can ensure the quality and repeatability of recipes, but constructing a cooking model is not easy.

Common data sources used in product substitution include product reviews [82], product titles [10], product descriptions [76], user behavior [30], purchase history [61], and tags [21]. Product titles provide basic product information, typically in a more general manner, while product descriptions provide more detailed information. On the other hand, if the features provided by product descriptions are too detailed, the model may overfit and generalize poorly to new data. Analyzing reviews and users’ browsing and purchasing behavior helps in understanding user preferences, providing personalized product choices, finding alternative products, and discovering cross-selling opportunities. However, user behavior data often follows a long-tail distribution, where popular or common products have a large number of reviews, while reviews for unique or rare products are scarce. Therefore, to comprehensively model user behavior and product features, it is necessary to integrate data from multiple perspectives, such as user behavior data and product information.

Image data: [47] uses image data to model fashion-style substitution between products. Some studies [21, 55, 30] combine text embeddings with image embeddings to understand and infer from data more accurately through multimodal learning and information extraction.

Choosing the appropriate data structure is crucial for efficient data representation and processing. Compared to unstructured textual data, graphs composed of nodes and edges can visually reflect the relationships between products [46, 84]. Graphs provide a clear view of the complex relationships and network structures among products, making the dependencies between products easier to understand. Graph algorithms can be used for analysis and processing [13, 30, 87], enabling the discovery and utilization of patterns of association among products. Knowledge graphs, as a type of heterogeneous graph, offer a richer and more flexible modeling approach based on entities and relationships. By integrating data from different sources and types into a unified knowledge graph [62, 76, 38], a more comprehensive understanding can be achieved.
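As a minimal sketch of the graph-structured view described above, the class below stores typed edges in the (head, relation, tail) form used by knowledge graphs and lets a relation be declared symmetric or one-way. The class name, relation labels, and example items are hypothetical and chosen only to illustrate the data structure.

```python
from collections import defaultdict


class RelationGraph:
    """Typed-edge graph storing (head, relation, tail) triples."""

    def __init__(self):
        # Maps (head, relation) to the set of tails reachable by that relation.
        self._out = defaultdict(set)

    def add(self, head, relation, tail, symmetric=False):
        """Add a triple; with symmetric=True the reverse triple is stored as well."""
        self._out[(head, relation)].add(tail)
        if symmetric:
            self._out[(tail, relation)].add(head)

    def neighbors(self, head, relation):
        """Sorted tails connected to `head` by `relation` (empty list if none)."""
        return sorted(self._out.get((head, relation), set()))


# Toy triples (hypothetical).
graph = RelationGraph()
graph.add("cookies", "complement", "milk", symmetric=True)
graph.add("butter", "substitute", "margarine", symmetric=True)
graph.add("saffron", "substitute", "turmeric")  # stored as one-way only

print(graph.neighbors("margarine", "substitute"))
print(graph.neighbors("turmeric", "substitute"))
```

Keeping the relation type in the key makes it straightforward to query substitutes and complements separately, or to add further relation types when data from other sources is merged in.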

2.2.2 Data Initialization

For the process of feature initialization, researchers can use different methods and models to extract and represent the features of the data, in support of subsequent tasks and analyses.

In the realm of textual data, the LDA model can be used for topic modeling: it extracts latent thematic features and produces low-dimensional embeddings of text, such as embedding representations of product reviews [46, 82]. However, LDA is poorly suited to modeling short texts. The Word2Vec model instead learns word vector representations from the distribution of words in context, so that the resulting vectors capture the semantic information of words. Some studies [66, 76] have chosen to initialize data using variants of Word2Vec [49], which better reflect the semantic relationships between words and can efficiently process large-scale corpora. Additionally, [77] leverages the FastText method, which introduces n-gram features for feature learning.
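The n-gram features mentioned for FastText can be illustrated with a short sketch that extracts character n-grams from a word wrapped in boundary markers, in the style FastText uses for subwords. The default range of 3 to 6 characters follows the commonly cited FastText setting; the overlap ratio at the end is a hypothetical helper added only to show why related product or ingredient names end up sharing features.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of `word` wrapped in '<' and '>' boundary markers."""
    token = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def shared_ngram_ratio(a, b, n_min=3, n_max=6):
    """Jaccard overlap of the n-gram sets of two words (illustrative helper)."""
    grams_a = set(char_ngrams(a, n_min, n_max))
    grams_b = set(char_ngrams(b, n_min, n_max))
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


print(char_ngrams("milk", 3, 3))
print(shared_ngram_ratio("yogurt", "yoghurt"))
```

In FastText itself a word vector is built from the vectors of its n-grams, so spelling variants and rare words with shared n-grams still receive related representations; the overlap ratio here only hints at that effect.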

Deep learning methods can learn complex relationships and patterns in data, extracting more informative features. [59] employs an encoder-decoder structure to learn data features. For image data, [47] uses convolutional methods for feature learning. The language model BERT (Bidirectional Encoder Representations from Transformers) can utilize contextual information in a sentence to learn semantic representations of words, better capturing the relationships between words. Compared to traditional context-independent models such as Word2Vec, BERT is more suitable for natural language processing tasks that require considering contextual relevance. Additionally, BERT can be fine-tuned for specific tasks to optimize its performance [30, 38, 21].

Graph embedding methods can better capture the relationships between data and provide more comprehensive information. [7] selects PinSage [79] as the foundational model for embedding data. Additionally, [76] utilizes relationship representation methods from knowledge graphs to process structured data, employing the TransE [3] model for data embedding. Furthermore, [13, 77] employ LightGCN to learn from graph data.
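As a rough illustration of the TransE idea mentioned above, a relation is modeled as a translation in the embedding space, so that head + relation lands close to tail for a plausible triple. The sketch below scores a triple with the L2 distance (TransE can also use L1); the two-dimensional embeddings and the entity names are hypothetical and serve only to show the computation.

```python
import math


def transe_score(head, relation, tail):
    """L2 distance ||h + r - t||; a lower value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-d embeddings, for illustration only.
cola = [1.0, 0.0]
substitute_of = [0.0, 1.0]
lemonade = [1.0, 1.0]
battery = [5.0, -3.0]

print(transe_score(cola, substitute_of, lemonade))
print(transe_score(cola, substitute_of, battery))
```

In training, embeddings are adjusted so that observed triples score lower than corrupted ones; that step is omitted here.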

2.3 Challenges and Issues

Substitute reasoning requires the ability to model and understand the real world, where multiple factors often interact. It therefore needs to model these complex relationships effectively and draw inferences from them.

2.3.1 Main Existing Problems

Substitute reasoning involves more than proposing a new solution; it also faces a series of challenges that must be overcome.

Interpretability: Interpretability is an important issue in substitute reasoning. Because the models and algorithms involved are complex, their decision-making process is often difficult to interpret, which limits both the understanding of the reasoning results and trust in them. In [76], action paths from reinforcement learning are used as the basis for finding substitutes. [87, 66] represent the relationships and influences between different elements through graph structures, generating highly interpretable substitutes. [10] uses key attributes as the foundation for substitute recommendation, selecting the recommended item $v_j$ based on its performance on those attributes. Compared with other deep learning models, graph-structured models offer better interpretability in relation reasoning: they make the connections between data explicit, which helps in interpreting model predictions and in optimizing models.

Data Sparsity: Interaction data between users and items is often sparse. The resulting lack of information makes it hard to infer user interests and item relevance accurately, which degrades model performance. In substitute reasoning, two common ways to alleviate data sparsity are enriching the data representation and augmenting the dataset.

[76] enriches product features by using product description information as the reward function in reinforcement learning. [38] enriches representations by pre-training the BERT model. In graph structures, a path describes the connection between two nodes through intermediate nodes, exposing relationships and mutual influences beyond direct edges. Studies such as [87, 66, 76, 41] expand the dataset through two-hop or multi-hop paths between nodes. [61] additionally introduces an indicator of interchangeability, which measures the probability that products are substitutable by calculating the similarity of product distributions.
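The two-hop expansion can be sketched as follows. This is only an illustration of how candidate pairs are enumerated from a sparse graph; the adjacency dictionary, the product names, and the function `two_hop_pairs` are assumptions, and the cited studies differ in how they weight or filter such paths.

```python
def two_hop_pairs(adjacency: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return ordered pairs (a, c) joined by a path a - b - c but by no direct edge."""
    pairs = set()
    for a, neighbours in adjacency.items():
        for b in neighbours:
            for c in adjacency.get(b, set()):
                if c != a and c not in neighbours:
                    pairs.add((a, c))
    return pairs


# Hypothetical co-view graph: tea and cocoa are never linked directly.
graph = {
    "tea": {"coffee"},
    "coffee": {"tea", "cocoa"},
    "cocoa": {"coffee"},
}
print(sorted(two_hop_pairs(graph)))
```

Pairs found this way can be added as extra (weaker) training signals when direct interactions are too few.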

Personalization: Personalization means providing customized recommendations and services based on the specific needs, interests, and behaviors of users. In substitute reasoning and recommendation for products, [10, 59] combine collaborative filtering methods to model user preferences. [47] extracts user preferences from user comments. [61] incorporates the sequential choices of customers, as a function of latent attributes and interaction coefficients, into the calculation of substitutability probabilities. In the food domain, where no user information is available but recipe substitutions still need to be customized, [21, 38] provide personalized substitute recommendations based on the recipe context and on edits for new recipes.

Cold Start: The cold start problem arises when a new system, or a new item introduced into a system, cannot be served with effective personalized recommendations because specific user data or item information is lacking. In addition to inferring ingredients based on recipes, [38] also provides generalized ingredient inference for general substitution purposes, which addresses the cold start issue. Because no user preference information exists for newly added items, studies such as [85, 59, 66, 46] perform substitute reasoning from data such as titles and product descriptions. [47] uses image data and a predictor based on visual features to perform substitute reasoning even in the absence of user preference information.

Relationship Decoupling: In substitute reasoning for products, a common approach is to model products and their relationships with a heterogeneous graph, which also makes it possible to handle complementary relationships between products. However, different types of relationships may influence each other, leading to inaccurate reasoning results. Studies such as [85, 87, 42, 46] construct a separate subgraph for each relationship and learn on each subgraph separately. [13] constructs a dependency graph for enterprises, in which complementary and substitute relationships are seen as two directions in the enterprise dependency network, and learns separately for each direction. Decoupling also provides better flexibility and scalability: [46] learns a separate set of parameters for each relationship subgraph, making it easier to add new types of relationships or properties without interfering with or modifying other parts of the system. At the same time, certain relationships may depend on or interact with each other, such as the coupling of substitute and complementary relationships mentioned in Section 2.2.1. Constructing subgraphs for different relationships, learning on them independently, and then allowing interactive learning between the subgraphs addresses both the complexity and the interdependence of different relationship types.

3 SUBSTITUTE REASONING

In substitute reasoning, the first step is to represent the data and features in a computable form. Next, substitute relationships are learned and then inferred. Finally, these independent components are integrated into a complete model, which is trained and optimized. Table 3 summarizes this reasoning process for the surveyed work.

Feature Representation Relation Reasoning Model Training Optimization
Ref NLP Graph Embedding loss Relation Learning Substitute Reasoning loss global loss Training Optimizer Negative Sampling
[21] GNN cross-entropy loss self-supervised contrastive loss Adam
[77] AdamW adding negative signs to negative samples
[38] cross-entropy loss
[88]
[85] BPR loss Adam
[87] GAN cross-entropy loss RSGD
[30] GNN LaAP loss Adam
[13] GCN cross-entropy loss Adam
[76] KGE cross-entropy loss
[55] AdamW
[63]
[10] BPR loss Adam
[42] graph-based loss Adam
[6] probability
[82] categorical cross-entropy loss Adam
[62]
[84]
[59]
[66] logistic loss SGD
[61]
[46] L-BFGS
[47] L-BFGS
[4]
[7] max-margin ranking loss Adam
[41] sum of cross-entropy loss Adam
Table 3: Inference Process in Substitute Reasoning

3.1 Feature Representation Method

Feature representation refers to extracting features from the raw data so that the data can be better analyzed and predicted.

The studies [4, 46, 82] use the LDA method for dimensionality reduction and for embedding topic distributions. For similarity computation, [82] uses the outer product of the two topic vectors $\theta_i$ and $\theta_j$ to explore all associations between two items, rather than the inner product used in most similarity calculations. It also develops a feature set of seven important non-textual factors, such as the number of comments, average ratings, and variance. The textual and non-textual features are finally concatenated into a $(4K^2+7)\times 1$ vector representation of the product. The study [46] suggests that similar products typically have similar review texts and uses logistic regression (the sigmoid function) to predict the connections between products. It also takes the directional nature of relationships into account, describing asymmetry in direction by modeling the differences between product topic vectors.

\begin{equation}
\begin{aligned}
L(y, \mathcal{T} \mid \beta, \eta, \theta, \phi, z) ={}& \overbrace{\prod_{(i,j)\in\mathcal{E}} F_{\beta}^{\leftrightarrow}\left(\psi_{\theta}(i,j)\right) F_{\eta}^{\rightarrow}\left(\varphi_{\theta}(i,j)\right)\left(1-F_{\eta}^{\rightarrow}\left(\varphi_{\theta}(j,i)\right)\right)}^{\text{positive relations } \left(F^{\leftrightarrow}\right) \text{ and their direction of flow } \left(F^{\rightarrow}\right)} \\
& \underbrace{\prod_{(i,j)\in\overline{\mathcal{E}}}\left(1-F_{\beta}^{\leftrightarrow}\left(\psi_{\theta}(i,j)\right)\right)}_{\text{non-relations}} \; \underbrace{\prod_{d\in\mathcal{T}}\prod_{j=1}^{N_d}\theta_{z_{d,j}}\phi_{z_{d,j},w_{d,j}}}_{\text{corpus likelihood}}
\end{aligned}
\tag{1}
\end{equation}
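The two logistic factors in Eq. (1) can be sketched as below. The direction feature follows the description above (differences between topic vectors); the pairwise feature used for the undirected link (a bias plus element-wise topic agreement) is an assumption made for illustration, as are the function names, the toy topic vectors, and the weights.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def link_probability(theta_i, theta_j, beta):
    """Undirected factor: probability that i and j are related at all.

    Assumed feature: a bias term plus element-wise topic agreement.
    """
    psi = [1.0] + [a * b for a, b in zip(theta_i, theta_j)]
    return sigmoid(dot(beta, psi))


def direction_probability(theta_i, theta_j, eta):
    """Directed factor: probability that the relation points from i to j.

    Feature: a bias term plus the topic difference theta_j - theta_i.
    """
    phi = [1.0] + [b - a for a, b in zip(theta_i, theta_j)]
    return sigmoid(dot(eta, phi))


# Hypothetical topic vectors and weights, for illustration only.
theta_i, theta_j = [0.8, 0.2], [0.3, 0.7]
beta = [-1.0, 2.0, 2.0]
eta = [0.0, 1.0, -1.0]  # zero bias, so the two directions sum to one

forward = direction_probability(theta_i, theta_j, eta)
backward = direction_probability(theta_j, theta_i, eta)
print(link_probability(theta_i, theta_j, beta), forward, backward)
```

Swapping the two items negates the difference features, which is what lets a single weight vector express an asymmetric preference for one direction.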

In [66], link prediction also uses the sigmoid function and involves two types of vectors, the target vector $V$ and the context vector $V'$, which capture the different semantics of products in directed relationships.

Studies such as [47, 21, 55, 30] use image data. [47] measures the similarity between products with a Mahalanobis distance based on a projection of image feature vectors. In [55], ImageNet [14] was used to embed a portion of the images in the data. For the textual part, two embedding methods were proposed: (1) a variant of CBOW from Word2Vec, and (2) BERT pre-trained with a vocabulary extended from ingredient data, followed by PCA for dimensionality reduction. In [21], ViT [19] was used as the image encoder. For graph-structured data, graph neural networks are the most commonly used method. The same study used multiple Graph Isomorphism Network (GIN) [74] layers as the component encoders, with each GIN layer defined as follows:

\begin{equation}
\mathbf{h}_{v}^{(l)} = f^{(l)}\left(\left(1+\epsilon^{(l)}\right)\mathbf{h}_{v}^{(l-1)} + \sum_{u\in\mathcal{N}(v)}\left(e_{vu}\,\mathbf{h}_{u}^{(l-1)}\right);\ \theta^{(l)}\right)
\tag{2}
\end{equation}
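A single GIN update of the form in Eq. (2) can be sketched in plain Python as follows. The learnable function $f^{(l)}$ is replaced by a ReLU stand-in, and the tiny ingredient graph, its edge weights, and all names are hypothetical; a real layer would use a trained multilayer perceptron.

```python
def gin_layer(h, neighbours, edge_weight, eps, f):
    """One GIN update: h_v <- f((1 + eps) * h_v + sum_u e_vu * h_u)."""
    dim = len(next(iter(h.values())))
    updated = {}
    for v, h_v in h.items():
        agg = [(1.0 + eps) * x for x in h_v]
        for u in neighbours.get(v, []):
            e_vu = edge_weight[(v, u)]
            for d in range(dim):
                agg[d] += e_vu * h[u][d]
        updated[v] = f(agg)
    return updated


def relu(vec):
    # Stand-in for the learnable network f^(l); assumed for illustration.
    return [max(0.0, x) for x in vec]


h0 = {"flour": [1.0, 0.0], "butter": [0.0, 1.0], "sugar": [1.0, 1.0]}
nbrs = {"flour": ["butter", "sugar"], "butter": ["flour"], "sugar": ["flour"]}
w = {("flour", "butter"): 0.5, ("flour", "sugar"): 1.0,
     ("butter", "flour"): 0.5, ("sugar", "flour"): 1.0}

h1 = gin_layer(h0, nbrs, w, eps=0.0, f=relu)
print(h1)
```

Stacking several such layers lets each node representation absorb information from progressively larger neighborhoods.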

In [13], LightGCN [26], a lightweight graph neural network model, was used to embed the dependency network of enterprises. In [87], the data was embedded into hyperbolic space, and an attention mechanism was used for information aggregation. Compared with Euclidean space, hyperbolic space can better handle data with complex, hierarchical structures and non-linear relationships.

In [85], the aim was to obtain disentangled, relation-specific representations for each product. Two graphs, $G_s$ and $G_c$, were constructed from known substitute and complementary relationships. In the representation updating phase, complex dependencies were captured by combining information from the different relationships.

\begin{equation}
\begin{aligned}
\bar{h}_{s,p}^{k} ={}& \sum_{j\in\mathcal{N}_{s}(p)} \hat{A}_{s}[p,j]\cdot h_{s,j}^{k-1} \\
& + \tanh\left(\hat{A}_{s}[p,j]\cdot h_{s,j}^{k-1}\right)\odot\left(h_{s,j}^{k-1}+r_{s}\right) \\
& + \left(\hat{A}_{s}[p,j]\cdot h_{s,j}^{k-1}+r_{s}\right)\odot\tanh\left(h_{s,j}^{k-1}\right)
\end{aligned}
\tag{3}
\end{equation}

\begin{equation}
\boldsymbol{h}_{s,p}^{k} = \left(1-\alpha_{s,p}^{k}\right)\cdot\overline{\boldsymbol{h}}_{s,p}^{k} + \alpha_{s,p}^{k}\cdot T_{c\to s}\left(\overline{\boldsymbol{h}}_{c,p}^{k};\ \theta_{s}^{T}\right)
\tag{4}
\end{equation}
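The fusion step in Eq. (4) is a gated mix of the substitute-view representation and a transferred complement-view representation. The sketch below treats the gate as a scalar and uses an identity function in place of the learnable mapping $T_{c\to s}$; both choices, and all names, are assumptions made to keep the example small.

```python
def fuse(h_bar_s, h_bar_c, alpha, transfer):
    """h_s = (1 - alpha) * h_bar_s + alpha * T(h_bar_c), element-wise."""
    transferred = transfer(h_bar_c)
    return [(1.0 - alpha) * s + alpha * t
            for s, t in zip(h_bar_s, transferred)]


def identity_transfer(vec):
    # Placeholder for the learnable mapping T_{c->s}; not taken from [85].
    return list(vec)


fused = fuse([1.0, 0.0], [0.0, 1.0], alpha=0.25, transfer=identity_transfer)
print(fused)
```

With a gate of 0 the substitute view is kept unchanged, and with a gate of 1 it is fully replaced by the transferred complement view.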

In [38], knowledge graphs, as a special type of graph structure, were embedded using the pre-trained language model BERT. The research further trained BERT on two tasks: predicting substitute ingredients and assessing the plausibility of triples within the knowledge graph.

In addition to graph neural networks, [76] employed reinforcement learning with TransE [3] embeddings. It formulated the inference of product relationships over structured data as a Markov decision process.

In that study, unstructured data was embedded using TF-IDF and Doc2vec [36]. In [6], a Product2Vec model was created, which uses basket composition information to transform products into low-dimensional, continuous vectors.

\begin{equation}
\mathcal{V}, \mathcal{V}^{\prime} = \underset{\mathcal{V},\mathcal{V}^{\prime}}{\arg\max} \sum_{b\in B}\sum_{s_{i}\in b}\sum_{-c\leq j\leq c} \log \mathbb{P}\left(s_{i+j}\mid s_{i};\ \mathcal{V},\mathcal{V}^{\prime}\right)
\tag{5}
\end{equation}
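The triple sum in Eq. (5) runs over baskets, products in a basket, and context positions within a window of size $c$. A minimal sketch of how the corresponding (center, context) training pairs could be enumerated is given below; it assumes that the position $j=0$ is skipped and that positions outside the basket are ignored, and the basket contents are hypothetical.

```python
def basket_pairs(baskets, c):
    """Return (centre, context) product pairs within a window of size c."""
    pairs = []
    for basket in baskets:
        for i, centre in enumerate(basket):
            for j in range(-c, c + 1):
                if j != 0 and 0 <= i + j < len(basket):
                    pairs.append((centre, basket[i + j]))
    return pairs


baskets = [["milk", "bread", "butter"]]
print(basket_pairs(baskets, c=1))
```

Each pair contributes one log-probability term to the objective, exactly as word pairs do in skip-gram training.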

In [61], a log-bilinear form is employed, in which the probability of a customer selecting item $c$ at step $i$ depends on the latent features of item $c$ and the user preference $\theta_u$.

\begin{equation}
\operatorname{Prob}(\text{item } c \mid \text{items in basket}) \propto \exp\left\{\theta_{u}^{\top}\alpha_{c} + \rho_{c}^{\top}\left(\frac{1}{i-1}\sum_{j=1}^{i-1}\alpha_{y_{j}}\right)\right\}
\tag{6}
\end{equation}
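Normalizing Eq. (6) over all candidate items gives a softmax. The sketch below computes it for hypothetical two-dimensional parameters; the item names and values are invented, and an empty basket is handled by using a zero context vector, which is an assumption because the average in Eq. (6) is undefined at the first step.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def choice_probabilities(theta_u, alpha, rho, basket):
    """Softmax of theta_u . alpha_c + rho_c . mean(alpha_y for y in basket)."""
    dim = len(theta_u)
    if basket:
        context = [sum(alpha[y][d] for y in basket) / len(basket)
                   for d in range(dim)]
    else:
        context = [0.0] * dim  # assumed convention for the first step
    scores = {c: dot(theta_u, alpha[c]) + dot(rho[c], context) for c in alpha}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical parameters: cola and pepsi share attributes, chips differ.
theta_u = [1.0, 0.0]
alpha = {"cola": [1.0, 0.0], "pepsi": [1.0, 0.0], "chips": [0.0, 1.0]}
rho = {"cola": [-1.0, 0.0], "pepsi": [-1.0, 0.0], "chips": [1.0, 0.0]}

probs = choice_probabilities(theta_u, alpha, rho, basket=["cola"])
print(probs)
```

In this toy setting, having cola in the basket lowers the score of pepsi through the negative interaction term, which is the kind of pattern a substitute pair would produce.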

In [10], collaborative filtering is applied separately to users X and items Y, and the results are then combined. Because traditional matrix factorization-based collaborative filtering has limited expressive power, the study uses residual feedforward networks to model pairwise attributes.

LDA and TF-IDF are commonly used in text analysis and topic modeling, but they disregard the specific meanings and semantic information of words. Word2Vec is a shallow neural model for learning word vector representations; it captures the semantic relationships between words but only local context dependencies. In contrast, BERT [15] can capture longer-range contextual information but requires a large amount of text data for pre-training. The appropriate representation method depends on the characteristics of the data: large amounts of lengthy text are better suited to BERT, whereas smaller collections of short text can be handled with Word2Vec.

For structured data such as graph data, relationship data, and knowledge graphs, more attention is focused on the structure of the data rather than textual semantics. By employing comprehensive representation learning methods, it becomes possible to consider information related to structure, relationships, and semantics simultaneously, thereby improving the performance and effectiveness of structured data analysis tasks.

3.2 Substitute Learning and Inference

Substitute learning and inference involve using the learned features to infer other possible facts or relationships.

3.2.1 Relation Learning

Relation learning refers to learning and understanding the relationships between entities from given data. These relationships can include interactions, dependencies, associations, and more. In the surveyed work, relation learning mostly refers to learning substitute relationships, while some models also cover complementary relationships. This section focuses only on the learning and inference of substitute relationships.

In [4], a set of substitution rules was manually derived as the basis for substitute reasoning. [61] argues that the probability of selecting item $c$ depends on its features, so additional influencing factors (preferences, price, seasonality, and popularity) were included. In [59], the authors linked two Variational Autoencoders (VAEs) [34], with one serving as the encoder to sample features and the other serving as the decoder to learn relational features. The classifier is implemented with the softmax function. The objective is as follows, where $\mathcal{D}^{a}$ and $\mathcal{D}^{b}$ represent the KL divergence terms for items $a$ and $b$, $V$ represents the total number of data points (or items), and $M$ represents the size of a random sample drawn from them.

\begin{equation}
\begin{aligned}
\frac{V}{M}\sum_{i=1}^{M} \Bigg[ & -\frac{1}{2\sigma^{2}}\left(x_{i}^{a}-\mu_{\phi}\left(x_{i}^{a}\right)\right)^{2} - \mathcal{D}^{a} \\
& -\frac{1}{2\sigma^{2}}\left(x_{i}^{b}-\mu_{\phi}\left(x_{i}^{b}\right)\right)^{2} - \mathcal{D}^{b} \\
& +\log\left(\frac{e^{\theta^{(y)T}\mathbf{z}}}{\sum_{j=1}^{Y} e^{\theta^{(j)T}\mathbf{z}}}\right) \Bigg]
\end{aligned}
\tag{7}
\end{equation}

[84] argues that substitutable goods are not commonly purchased together, so substitute products have no co-purchasing relationship. Based on this, the authors propose a method that strengthens second-order proximity and weakens first-order proximity. They use an autoencoder (AE) to embed the entire co-purchasing graph of goods, learn information from the neighborhood of nodes, and add constraints on the connections between nodes.

\begin{equation}
\begin{aligned}
\mathbf{y}_{i}^{(1)} &= \sigma\left(\mathbf{W}^{(1)}\mathbf{x}_{i}+\mathbf{b}^{(1)}\right) \\
\mathbf{y}_{i}^{(k)} &= \sigma\left(\mathbf{W}^{(k)}\mathbf{y}_{i}^{(k-1)}+\mathbf{b}^{(k)}\right), \quad k=2,\ldots,K
\end{aligned}
\tag{8}
\end{equation}

\begin{equation}
\begin{aligned}
& d\left(\mathbf{y}_{i}^{(K)},\mathbf{y}_{j}^{(K)}\right) < d\left(\mathbf{y}_{i}^{(K)},\mathbf{y}_{t}^{(K)}\right), \\
& \forall v_{i}\in V,\ \forall v_{j}\in V_{i}^{\text{sub}},\ \forall v_{t}\in V_{i}^{\text{com}}
\end{aligned}
\tag{9}
\end{equation}
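The constraint in Eq. (9) says that, in the embedding space, every substitute of a product must lie closer to it than every complement. The following sketch checks this condition for given embeddings and reports the violating triples. The distance $d$ is assumed to be Euclidean, and the embeddings and product names are hypothetical; how [84] enforces the constraint during training is not reproduced here.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def violations(emb, substitutes, complements):
    """List (i, j, t) where substitute j is not closer to i than complement t."""
    bad = []
    for i in emb:
        for j in substitutes.get(i, []):
            for t in complements.get(i, []):
                if not euclidean(emb[i], emb[j]) < euclidean(emb[i], emb[t]):
                    bad.append((i, j, t))
    return bad


# Hypothetical final-layer embeddings y^(K).
emb = {"phone_a": [0.0, 0.0], "phone_b": [0.1, 0.0], "case": [1.0, 1.0]}
subs = {"phone_a": ["phone_b"]}
comps = {"phone_a": ["case"]}
print(violations(emb, subs, comps))
```

A training procedure would typically turn each violated triple into a penalty term, for example a margin-based loss.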

For cases with multiple relationships, studies such as [46, 66, 42, 13, 87] learn each relationship independently. In [46], a predictor is trained for each subgraph formed by an individual relationship. In [7], the initial single-relation features are integrated first, and the method from [44] is then used for structural integration; it explores the mutual influence between different types of relation neighbors and improves model accuracy through knowledge transfer between relationships. Additionally, in [66], vectors are projected into different relational spaces.

\begin{equation}
v_{r,i} = v_{i} + \beta_{r}\odot v_{i}, \qquad v_{r,i}^{\prime} = v_{i}^{\prime} + \beta_{r}\odot v_{i}^{\prime}
\tag{10}
\end{equation}
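The projection in Eq. (10) rescales each coordinate of a product vector by a relation-specific factor, since $v_i + \beta_r \odot v_i = (1 + \beta_r) \odot v_i$. A minimal sketch with hypothetical values:

```python
def project(v, beta_r):
    """Relation-specific projection: v_r = v + beta_r * v (element-wise)."""
    return [x + b * x for x, b in zip(v, beta_r)]


v_i = [1.0, 2.0, -1.0]
beta_sub = [0.5, 0.0, -1.0]  # hypothetical parameters for one relation
print(project(v_i, beta_sub))
```

A coordinate with $\beta_r = 0$ is left unchanged for that relation, while $\beta_r = -1$ removes it entirely, so each relation can emphasize different dimensions of the same base vector.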

The study [42] suggests that relationships may inherently influence each other and proposes a co-attention neighborhood aggregation strategy, building on [73], that learns the semantics of different relationships by replacing parts of the subgraph's structure.

\begin{equation}
\begin{aligned}
\widetilde{\mathbf{H}}_{c} &= \left[\mathbf{H}_{c};\ \mathbf{A}_{c}\mathbf{H}_{s}\right] \\
\widetilde{\mathbf{H}}_{s} &= \left[\mathbf{H}_{s};\ \mathbf{A}_{s}\widetilde{\mathbf{H}}_{c}\right]
\end{aligned}
\tag{11}
\end{equation}
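The two-step concatenation in Eq. (11) can be traced on a tiny example. The sketch assumes that $[\cdot\,;\cdot]$ concatenates along the feature dimension and that the attention matrices $\mathbf{A}_c$ and $\mathbf{A}_s$ are given as fixed inputs, whereas in [42] they would be computed by the model; all values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def concat_features(a, b):
    """Concatenate two matrices row by row along the feature dimension."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def co_attention(h_c, h_s, a_c, a_s):
    h_c_tilde = concat_features(h_c, matmul(a_c, h_s))
    h_s_tilde = concat_features(h_s, matmul(a_s, h_c_tilde))
    return h_c_tilde, h_s_tilde


h_c = [[1.0], [2.0]]            # complement-view features, two nodes
h_s = [[10.0], [20.0]]          # substitute-view features, two nodes
a_c = [[1.0, 0.0], [0.5, 0.5]]  # assumed attention weights
a_s = [[0.0, 1.0], [1.0, 0.0]]

h_c_tilde, h_s_tilde = co_attention(h_c, h_s, a_c, a_s)
print(h_c_tilde)
print(h_s_tilde)
```

Note that the second step attends over the already enriched complement features, so the substitute view receives information from both views.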

The study by [13] suggests that firms in interfirm networks rely on substitute firms to share similar contexts in the parallel direction, and they use direction-based graph convolutional methods to capture this substitutive relationship.

\begin{equation}
\begin{aligned}
P_{i,\uparrow}^{(l+1)}&=\sigma\left(\sum_{r}\sum_{j\in\mathcal{N}_{i,\uparrow}^{r}}\frac{1}{c_{i,r,\uparrow}^{(l)}}W_{r,\uparrow}^{(l)}P_{j}^{(l)}\right),\\
P_{i,\downarrow}^{(l+1)}&=\sigma\left(\sum_{r}\sum_{j\in\mathcal{N}_{i,\downarrow}^{r}}\frac{1}{c_{i,r,\downarrow}^{(l)}}W_{r,\downarrow}^{(l)}P_{j}^{(l)}\right)
\end{aligned}
\tag{12}
\end{equation}

The study [87] proposes the DHGAN model, which decouples relationships and embeds them in hyperbolic space. To propagate the mutual influence between the two relationships, two rounds of aggregation are performed, as illustrated in the model diagram in Fig. 3.

\begin{equation}
\text{target node}\stackrel{s}{\longleftarrow}\text{c neighbors}\stackrel{s}{\dashleftarrow}\text{s neighbors},
\tag{13}
\end{equation}
\zihao-5Fig. 3: \zihao-5DHGAN

In [21], the authors utilize the pre-trained CLIP model [57] to learn the context of recipes. On the other hand, [77] suggests that the elasticity effect of purchase rate (PR) reflects the substitutability of products. They treat the purchase rate as a label and apply a logarithmic transformation to alleviate the issue of a long-tail

3.2.2 Substitutive Reasoning

Once the model has learned the characteristics of substitutive relationships, it can predict missing data and fill in unknown relationships through reasoning or modeling.

For example, studies such as [4, 47, 84, 6] have measured the strength of relationships between items by calculating their similarity. According to the study [6], if both A and B have similar interactions with other products, they are considered interchangeable.

\begin{equation}
\begin{aligned}
E_{AB}={}&-\frac{1}{2}\left[KL(p(\cdot\mid A)\,\|\,p(\cdot\mid B))+KL(p(\cdot\mid B)\,\|\,p(\cdot\mid A))\right]\\
={}&-\frac{1}{2}\sum_{k\neq A,B}\left[p(k\mid A)\cdot\log\left(\frac{p(k\mid A)}{p(k\mid B)}\right)+p(k\mid B)\cdot\log\left(\frac{p(k\mid B)}{p(k\mid A)}\right)\right]
\end{aligned}
\tag{14}
\end{equation}
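As an illustration of Eq. (14), the following minimal sketch computes the negative symmetrised KL divergence between two conditional co-purchase distributions. The function name, the dictionary representation, and the smoothing constant `eps` are assumptions made for this example and are not taken from [6].

```python
import math

def exchangeability(p_a, p_b, eps=1e-12):
    """Negative symmetrised KL divergence, following the form of Eq. (14).

    p_a[k] and p_b[k] stand for p(k | A) and p(k | B) over context
    products k (A and B themselves are assumed to be excluded).
    eps is a hypothetical smoothing constant that avoids log(0).
    """
    keys = set(p_a) | set(p_b)
    total = 0.0
    for k in keys:
        pa = p_a.get(k, 0.0) + eps
        pb = p_b.get(k, 0.0) + eps
        total += pa * math.log(pa / pb) + pb * math.log(pb / pa)
    return -0.5 * total

# Two products with identical contexts score 0 (the maximum);
# the more the contexts differ, the more negative the score.
same = exchangeability({"milk": 0.5, "eggs": 0.5}, {"milk": 0.5, "eggs": 0.5})
diff = exchangeability({"milk": 0.9, "eggs": 0.1}, {"milk": 0.1, "eggs": 0.9})
```

Under this reading, a score closer to zero indicates that A and B interact with other products in a more similar way and are therefore more interchangeable.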

In the study [84], in order to further capture the semantic similarity between substitute items, the comparison of their semantics is based on the precise positions of two distinct products in the product category tree (see Figure 4).

\zihao-5Fig. 4: \zihao-5Product category tree

According to [46], substitution between products only occurs when they belong to similar categories. They use the product category tree and the path where the products are located as the basis for convergence in substitution inference. Similar studies include [88, 76, 66], which also seek substitute items within similar categories.

In the substitution inference process, [66] introduces two constraints: one is the product category constraint, and the other is the extension of the relationship between products through two-hop paths. This method helps alleviate the sparsity issue in product relationships.

The studies [61, 59, 10, 85, 13] combine substitution inference with personalized recommendation. In [59, 10, 13], user information is integrated in the form of collaborative filtering. The work [61] introduces the element of “forward thinking”, where each customer’s choice can change the purchase probability of substitute products. Additionally, [10] proposes a variant of BPR [60] to unify substitution and personalization. [85] leverages different temporal patterns of sequential behaviors to understand user preferences in browsing and purchasing products, and proposes a kernel transformer network for this analysis.

In the study by [82], a feed-forward neural network is used to determine substitution relationships. However, it may suffer from low interpretability. On the other hand, [77] combines Gradient Boosting Decision Trees (GBDT) with a transformer-based model called XLM-R [12] for text sequence processing. In the research conducted by [38], text adversarial attacks are employed. The study proposes three strategies for these attacks, namely recipe editing, personalized ingredient substitution, and universal ingredient replacement.

In the study by [41], after obtaining embeddings, they propose an outer product layer to reconstruct a given graph structure. The outer product operation, represented as $h_{v}^{(L)}\otimes h_{w}^{(L)}$, is non-commutative, meaning that $\left[h_{v}^{(L)},h_{w}^{(L)}\right]$ and $\left[h_{w}^{(L)},h_{v}^{(L)}\right]$ yield different results. Therefore, the direction can be utilized for prediction purposes.
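The non-commutativity of the outer product can be checked with a small numerical example. The sketch below uses plain Python lists and two made-up two-dimensional embeddings; it illustrates the operation only and does not reproduce the outer product layer of [41].

```python
def outer(u, v):
    """Outer product of u and v as a nested list: entry (i, j) is u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]

h_v = [1.0, 2.0]  # hypothetical final-layer embedding of node v
h_w = [3.0, 5.0]  # hypothetical final-layer embedding of node w

forward = outer(h_v, h_w)   # [[3.0, 5.0], [6.0, 10.0]]
backward = outer(h_w, h_v)  # [[3.0, 6.0], [5.0, 10.0]]
```

The two matrices are transposes of each other, so swapping the order of the nodes changes the input seen by any downstream predictor, which is what allows edge direction to be encoded.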

Loss Calculation: In the study by [21], a self-supervised contrastive loss function is employed [9, 31]. On the other hand, [82, 13, 76] utilize cross-entropy loss functions, with [82] employing categorical cross-entropy.

In [84], a penalty is applied when two substitutable products that fall under similar categories in the category tree are mapped far apart in the embedding space. The model in [87] applies a graph-based loss to each sub-GCN, each inferring a different relationship, and the final loss function can be represented as a multi-task loss.

In [76], the loss is determined based on a reward function:

\begin{equation}
\begin{aligned}
\mathcal{L}={}&\sum_{i,j\in\varepsilon_{P}}y_{i,j}\cdot\log\operatorname{MFI}(i,j)\\
&+\sum_{i,j\in\bar{\varepsilon}_{P}}\left(1-y_{i,j}\right)\cdot\log(1-\operatorname{MFI}(i,j))
\end{aligned}
\tag{15}
\end{equation}

3.2.3 Simplified Inference Methods

In this survey, we divide substitution reasoning into four parts: feature representation, relationship learning, substitution reasoning, and integrated training. However, some studies focus only on one or two of these parts, making them more streamlined and lightweight compared to the aforementioned substitution reasoning models. Therefore, in the following content, we will discuss these methods.

In [62], a large-scale antibiotic-related knowledge graph is constructed, utilizing TF-IDF embeddings and feed-forward layers for dimensionality reduction. All entity descriptions in the knowledge graph are extracted from Drugbank, and an attention mechanism is used to allocate feature weights. This approach aims to understand drug properties from the perspective of drug similarity for substitution.

In [55], text embedding methods such as Word2Vec and BERT are utilized to train Food2Vec and FoodBERT models on a recipe dataset (as shown in Figure 5). The study also introduces multimodality through the integrated use of textual and visual information.

In [63], multiple scoring criteria are combined to develop a heuristic method called DIISH for identifying ingredient substitutability. The method incorporates the following four components: (1) obtaining latent semantic information using word embedding models; (2) calculating Positive Pointwise Mutual Information (PPMI) by considering contextual information; (3) computing co-occurrence substitutability scores between ingredients A and B across different recipes; and (4) determining the similarity between the two ingredients and the recipes. By incorporating scores from these four components, the DIISH method provides insights into ingredient substitutability.
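To make the PPMI component concrete, the sketch below computes PPMI from recipe-level co-occurrence counts using the standard definition, PPMI(a, b) = max(0, log(p(a, b) / (p(a) p(b)))). The input format and function name are assumptions for illustration; the exact context definition used by DIISH in [63] may differ.

```python
import math
from collections import Counter
from itertools import combinations

def ppmi_table(recipes):
    """PPMI for every ingredient pair that co-occurs in at least one recipe.

    recipes: list of ingredient lists (hypothetical input format).
    Probabilities are estimated as fractions of recipes.
    """
    n = len(recipes)
    single = Counter()
    pair = Counter()
    for recipe in recipes:
        items = sorted(set(recipe))  # de-duplicate and fix the pair order
        single.update(items)
        pair.update(combinations(items, 2))
    table = {}
    for (a, b), n_ab in pair.items():
        pmi = math.log((n_ab / n) / ((single[a] / n) * (single[b] / n)))
        table[(a, b)] = max(0.0, pmi)  # clip negative associations to zero
    return table

recipes = [
    ["butter", "flour"],
    ["butter", "flour"],
    ["margarine", "flour"],
    ["butter", "sugar"],
]
table = ppmi_table(recipes)
```

In this toy corpus, butter and flour co-occur slightly less often than independence would predict, so their PPMI is clipped to zero, while the flour and margarine pair receives a positive score of log(4/3).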

In [30], three different customer behavior data sources, namely CSS, search logs, and OOS, were selected. Image and title text features were extracted from each product as input features. The M-HetSage model (as shown in Figure 5) was proposed, which combines the loss functions from different datasets and weights them to unify several different tasks within one architecture.

\zihao-5Fig. 5: \zihao-5M-HetSage

Furthermore, a LaAP loss was proposed, which directly optimizes the target ranking metric mAP and is equipped with a list-wise attention mechanism.

\begin{equation}
\begin{aligned}
AP\left(\mathbf{Z}_{i},\mathbf{Y}_{i\cdot}\right)&=\frac{1}{N_{i}^{+}}\sum_{k=1}^{N_{i}}C_{k}\left(\mathbf{Z}_{i},\mathbf{Y}_{i\cdot}\right)r_{k}\left(\mathbf{Z}_{i},\mathbf{Y}_{i\cdot}\right)\\
qAP\left(\mathbf{Z}_{i},\mathbf{Y}_{i\cdot}\right)&=\frac{1}{N_{i}^{+}}\sum_{m=1}^{M}\hat{C}_{m}\left(\mathbf{Z}_{i},\mathbf{Y}_{i\cdot}\right)\hat{r}_{m}\left(\mathbf{Z}_{i},\mathbf{Y}_{i\cdot}\right)\\
\operatorname{LaAP}\left(\mathbf{Z}_{i},\mathbf{Y}_{i\cdot}\right)&=\frac{1}{B}\sum_{i=1}^{B}w_{i}\cdot qAP\left(\mathbf{Z}_{i},\mathbf{Y}_{i\cdot}\right)
\end{aligned}
\tag{16}
\end{equation}

3.3 Model Training

The independently described components are organically integrated into a complete model, with each part playing a crucial role in the integration process, providing different optimization contributions to the performance of the model.

3.3.1 Integration of Components:

The method proposed by [21] decouples ingredient substitution reasoning and recipe editing, by first obtaining the optimal ingredient replacement and then combining it with image information to decode the recipe for recipe editing. Different weights are assigned to each module, as described in [66, 84, 87, 13], and the scores are summed to obtain the final result. Attention mechanisms are employed in [87, 13] to allocate the weights. The research conducted by [6, 61] takes into consideration factors such as price, preferences, marketing sensitivity, as well as complementary and substitution relationships, to integrate the final results.

3.3.2 Training Optimization

Training optimization involves multiple aspects, including whether the dataset used has annotated information and which loss function is used to calculate the model’s loss.

Supervised Learning: Currently, most machine learning applications still adopt supervised learning methods because annotated data can often provide more accurate supervision signals, resulting in better-trained models.

The research in [77] uses weakly supervised learning to improve model performance in situations where supervision signals are limited or inaccurate, supplementing them with enhanced supervision signals.

Studies such as [6, 59, 84] employ unsupervised learning to learn models by automatically discovering patterns or features from the data.

The research conducted by [21] utilizes self-supervised contrastive losses on textual data. It learns representations by generating fake samples and contrasting them with real samples.

Negative Sampling: A common method is random sampling, where a subset of samples is randomly selected from the dataset as negative samples. In the research conducted by [77], increasing the number of negative samples leads them to show an opposite trend or distribution to positive samples in the feature space, helping the model better distinguish between positive and negative samples. In the study by [6], products are randomly sampled from the entire training set based on their purchase frequency distribution, minimizing the likelihood of the current product appearing with randomly chosen irrelevant products. The selection and handling of negative samples depend on the specific task and dataset, and need to be adjusted experimentally to obtain better model performance.
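Frequency-based negative sampling, as described for [6], can be sketched as follows. This is a minimal illustration using Python's standard library; the function name, the flat purchase-log input, and the exclusion of known positives are assumptions for the example rather than details confirmed by the cited work.

```python
import random
from collections import Counter

def sample_negatives(purchases, target, positives, k, rng=None):
    """Draw k negatives with probability proportional to purchase frequency.

    purchases: flat list of product ids, one entry per purchase
               (hypothetical input format).
    target, positives: the current product and its known related products,
               both excluded from the candidate pool.
    Sampling is with replacement; an empty candidate pool raises an error.
    """
    rng = rng or random.Random()
    freq = Counter(purchases)
    excluded = set(positives) | {target}
    candidates = [p for p in freq if p not in excluded]
    weights = [freq[p] for p in candidates]
    return rng.choices(candidates, weights=weights, k=k)

purchases = ["a", "a", "a", "b", "c", "c", "d"]
negatives = sample_negatives(purchases, target="a", positives=["b"], k=5,
                             rng=random.Random(0))
```

Here product "c" is twice as likely to be drawn as "d", reflecting its higher purchase count, while "a" and "b" can never appear as negatives.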

4 EVALUATION AND DATASETS

In this chapter, different datasets and evaluation metrics commonly used for various inference tasks are introduced, and alternative practical applications are summarized. This section can help researchers find suitable datasets and evaluation metrics to test their methods.

4.1 Available Datasets

Commodity Dataset:

  • \bullet

    Amazon Dataset: It includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, prices, brands, and image features), and links (also bought/also viewed graphs). The complete dataset is divided into subdatasets by category, such as Amazon-Books, Amazon-Instant Videos, and Amazon-Electronics. Subdatasets within Amazon are commonly used to test the performance of user-item relationship inference and substitute recommendation methods.

  • \bullet

    E-Commerce Data (Online Retail): It contains all the transactions of a UK-based and registered non-store online retailer from December 1, 2010 to December 9, 2011. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.

Food Dataset:

  • \bullet

    FoodOn: (http://foodon.org) is an alliance-driven project aimed at establishing a comprehensive and user-friendly global food ontology, accurately and consistently describing universally known foods from various cultures around the world, from farm to table. It addresses gaps in food terminology and supports the traceability of food.

  • \bullet

    Recipes 1M+: A large-scale, structured corpus consisting of over one million cooking recipes and 13 million food images. It is a publicly available recipe dataset that offers alignable multimodal data.

  • \bullet

    Open Food Facts: is a free, open, collaborative global food database that contains information about ingredients, allergens, nutritional values, and all the information we can find on product labels. It includes data from over 600,000 products from more than 150 countries/regions.

  • \bullet

    FoodKG: FoodKG is a large-scale, unified food knowledge graph that integrates food ontology, recipes, ingredients, and nutritional data. It incorporates FoodOn into its WhatToMake ontology and includes recipe and nutrient instances extracted from Recipe1M+ as well as nutrient records from the United States Department of Agriculture (USDA). A comprehensive food knowledge graph with extensive recipe and nutritional information can support various applications such as recipe recommendation, ingredient substitution, and quality control.

\begin{tabular}{llp{4.5cm}p{4cm}p{3cm}}
\hline
Dataset & Field & Link & Size & Refs \\
\hline
Amazon Dataset & E-Commerce & \url{http://jmcauley.ucsd.edu/data/amazon/} & Ratings: 82.83 million; Users: 20.98 million; Items: 9.35 million; Timespan: May 1996 - July 2014 & [76, 85, 87, 10, 82, 84, 46, 47, 66, 42, 59, 7, 41] \\
Online Retail & E-Commerce & \url{https://archive.ics.uci.edu/dataset/352/online+retail} & Instances: 541909; Features: 6; Timespan: 01/12/2010 to 09/12/2011 & [61] \\
FoodOn & Food & \url{http://foodon.org} & Over 9,600 generic food products & [88] \\
Recipes 1M+ & Food & \url{http://im2recipe.csail.mit.edu/} & Structured cooking recipes: one million; Associated images: 13M & [55, 21] \\
Open Food Facts & Food & \url{https://world.openfoodfacts.org/} & Products: 3068832 & \\
FoodKG & Food & \url{https://foodkg.github.io} & Approx. 63 million triples & [38] \\
\hline
\end{tabular}
\zihao-5Table 4: \zihao-5Dataset

4.2 Evaluation Criteria

Selecting appropriate metrics to assess the performance of comparative methods is of paramount importance. Below is a summary of the evaluation criteria utilized for various tasks.

\begin{equation}
\text{HitRate}=\frac{TP}{TP+FN}
\tag{17}
\end{equation}
Literature Recall MRR HR AuPRC NDCG mAP Hit Accuracy Precision
[21]
[77]
[38]
[85]
[87]
[30]
[13]
[76]
[55]
[63]
[10]
[42]
[6]
[82]
[62]
[84]
[59]
[66]
[61]
[46]
[47]
[7]
[41]
\zihao-5Table 5: \zihao-5Evaluation Criteria

Precision, Recall, and F1 are widely used to evaluate the accuracy of top-K recommendations. Precision@K measures the proportion of items clicked by the user among the recommended top-K items. Recall@K calculates the proportion of user clicks on the recommended top-K items compared to the entire set of clicks. F1@K is a combination of Precision@K and Recall@K.

\begin{equation}
\begin{aligned}
\text{Precision@K}(u)&=\frac{\left|R^{K}(u)\cap T(u)\right|}{K},\\
\text{Recall@K}(u)&=\frac{\left|R^{K}(u)\cap T(u)\right|}{|T(u)|},\\
\text{F1@K}(u)&=\frac{2\times\text{Precision@K}(u)\times\text{Recall@K}(u)}{\text{Precision@K}(u)+\text{Recall@K}(u)}.
\end{aligned}
\tag{18}
\end{equation}
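The three per-user quantities in Eq. (18) can be computed directly from a ranked list and a ground-truth set. The sketch below is a minimal illustration; the function name and the convention of returning zero when a denominator vanishes are assumptions for this example.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Precision@K, Recall@K and F1@K for one user, following Eq. (18).

    recommended: ranked list of item ids (best first).
    relevant:    ground-truth set T(u).
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# One of the top-3 items ("b") is relevant, out of three relevant items.
p, r, f1 = precision_recall_f1_at_k(["a", "b", "c", "d"], {"b", "d", "e"}, k=3)
```

In this example all three values equal 1/3, since one hit is found among three recommendations and three relevant items.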

HR (Hit Rate) measures the proportion of users who have clicked on at least one recommended item, where $T(u)$ represents the ground truth item set, $R^{K}(u)$ represents the top-K recommended item set, and $I(\cdot)$ is the indicator function.

\begin{equation}
\text{HR@K}=\frac{1}{|\mathcal{U}|}\sum_{u\in\mathcal{U}}I\left(\left|R^{K}(u)\cap T(u)\right|>0\right)
\tag{19}
\end{equation}

NDCG (Normalized Discounted Cumulative Gain) distinguishes the contribution of accurately recommended items based on their ranking position, where $R_{k}^{K}(u)$ represents the k-th item in the recommendation list $R^{K}(u)$.

\begin{equation}
\text{NDCG@K}=\frac{1}{|\mathcal{U}|}\sum_{u\in\mathcal{U}}\frac{\sum_{k=1}^{K}\frac{I\left(R_{k}^{K}(u)\in T(u)\right)}{\log(k+1)}}{\sum_{k=1}^{K}\frac{1}{\log(k+1)}}
\tag{20}
\end{equation}
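A minimal sketch of Eqs. (19) and (20) is given below. It assumes recommendations and ground truth are stored in dictionaries keyed by user, which is a representation chosen for this example. Note that Eq. (20) normalizes by the discounted gain of a list in which all K positions are hits, and that the base of the logarithm cancels in the ratio.

```python
import math

def hit_rate_at_k(recs, truth, k):
    """HR@K as in Eq. (19): share of users with at least one hit in the top K."""
    hits = sum(1 for u in recs if set(recs[u][:k]) & set(truth[u]))
    return hits / len(recs)

def ndcg_at_k(recs, truth, k):
    """NDCG@K as in Eq. (20), with binary relevance."""
    ideal = sum(1.0 / math.log(pos + 1) for pos in range(1, k + 1))
    total = 0.0
    for u in recs:
        dcg = sum(1.0 / math.log(pos + 1)
                  for pos, item in enumerate(recs[u][:k], start=1)
                  if item in truth[u])
        total += dcg / ideal
    return total / len(recs)

recs = {"u1": ["a", "b"], "u2": ["c", "d"]}   # hypothetical ranked lists
truth = {"u1": {"a"}, "u2": {"x"}}            # hypothetical ground truth
hr = hit_rate_at_k(recs, truth, k=2)
ndcg = ndcg_at_k(recs, truth, k=2)
```

Only user u1 receives a hit, so HR@2 is 0.5, while NDCG@2 additionally rewards the fact that the hit appears at the first position.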

MAP (Mean Average Precision) is a widely adopted ranking metric that measures the average precision for users:

\begin{equation}
\text{MAP@K}=\frac{1}{|\mathcal{U}|}\sum_{u\in\mathcal{U}}\sum_{k=1}^{K}\frac{I\left(R_{k}^{K}(u)\in T(u)\right)\text{Precision@K}(u)}{K}
\tag{21}
\end{equation}

MRR (Mean Reciprocal Rank) is an indicator used to evaluate the performance of ranking tasks, where a prediction ranked closer to the top corresponds to a larger reciprocal value, and a better score is reflected by a larger mean of these values:

\begin{equation}
MRR=\frac{1}{|S|}\sum_{i=1}^{|S|}\frac{1}{\text{rank}_{i}}
\tag{22}
\end{equation}
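Eq. (22) can be sketched as below, where each query has one target item and a ranked candidate list. Treating a target that is missing from the list as contributing zero is an assumption made for this example; evaluation protocols differ on this point.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """MRR as in Eq. (22): mean of 1 / rank of the target in each ranked list.

    A target absent from its list contributes 0 (assumed convention).
    """
    total = 0.0
    for ranking, target in zip(ranked_lists, targets):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)  # ranks start at 1
    return total / len(ranked_lists)

rankings = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "c"]]
targets = ["a", "c", "z"]  # ranks 1, 3, and missing
mrr = mean_reciprocal_rank(rankings, targets)
```

For these three queries the reciprocal ranks are 1, 1/3, and 0, giving an MRR of 4/9.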

AUPRC (Area Under the Precision-Recall Curve) represents the area under the precision-recall curve, and its value ranges from 0 to 1. A value closer to 1 indicates better model performance.

TP represents true positive (the number of samples correctly predicted as positive by the model), FN represents false negative (the number of samples incorrectly predicted as negative by the model), FP represents false positive (the number of samples incorrectly predicted as positive by the model), and TN represents true negative (the number of samples correctly predicted as negative by the model).

Accuracy measures the overall prediction accuracy of the model on the entire sample set:

\begin{equation}
\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}
\tag{23}
\end{equation}

Hit Rate is another common performance evaluation metric for classification models, referring to the proportion of actual positive samples that the model correctly predicts as positive, as defined in Eq. (17).

4.3 Comparison of the Results

\begin{tabular}{lllp{8cm}}
\hline
Citation & Model Name & Year & Code Link \\
\hline
[21] & GISMo & 2023 & \url{https://github.com/facebookresearch/gismo} \\
[77] & - & 2023 & - \\
[7] & EMRIGCN & 2023 & - \\
[38] & - & 2022 & \url{https://github.com/DiyaLI916/FoodKGE} \\
[88] & - & 2022 & \url{http://ontologydesignpatterns.org/wiki/Submissions:Food_Recipe} \\
[85] & SCG-SPRe & 2022 & - \\
[87] & DHGAN & 2022 & \url{https://github.com/wt-tju/DHGAN} \\
[30] & M-HetSage & 2022 & - \\
[13] & SPGCN & 2022 & \url{https://github.com/lem0nle/SPGCN} \\
[41] & IRGNN & 2022 & \url{https://github.com/wwliu555/IRGNN_TNNLS_2021} \\
[76] & KAPR & 2021 & \url{https://gitee.com/yangzi**g_flower/kapr/tree/master} \\
[55] & Food2Vec, FoodBERT & 2021 & \url{https://github.com/ChantalMP/} \\
[63] & DIISH & 2020 & \url{https://foodkg.github.io/subs.html} \\
[10] & A2CF & 2020 & \url{https://bit.ly/bitbucket-A2CF} \\
[42] & DecGCN & 2020 & \url{https://github.com/liuyiding1993/CIKM2020_DecGCN} \\
[6] & Product2Vec & 2020 & - \\
[82] & RRN & 2019 & - \\
[62] & KGDDS & 2019 & \url{http://www.iasokg.com/} \\
[84] & SPEM & 2019 & - \\
[59] & LVA, CLVA & 2019 & \url{https://github.com/VRM1/WSDM19} \\
[66] & PMSC & 2018 & - \\
[61] & SHOPPER & 2017 & \url{https://github.com/franrruiz/shopper-src} \\
[46] & Sceptre & 2015 & \url{http://cseweb.ucsd.edu/~jmcauley/} \\
[47] & - & 2015 & \url{http://cseweb.ucsd.edu/~jmcauley/} \\
[4] & - & 2014 & - \\
\hline
\end{tabular}
\zihao-5Table 6: \zihao-5Paper Code Link

Table 6 collects the code links for the models compared in this review; a comparison of their results follows.

In most fields, datasets are widely distributed, and there is no universal dataset, making cross-comparison difficult. In the e-commerce field, many models are evaluated on the Amazon dataset, which itself covers various types of goods. Table 7 presents a comparison of the results of some models in terms of evaluation metrics such as Hit@10, NDCG@10, and Accuracy.

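\begin{tabular}{llccccc}
\hline
Segments & Refs & NDCG@10 & HR@10 & Hit@10 & Accuracy & Precision \\
\hline
Women’s Clothing & [66] & - & - & - & - & 0.9777 \\
 & [46] & - & - & - & 0.9587 & - \\
Video Games & [84] & - & - & 0.91 & - & - \\
 & [41] & - & - & - & - & 0.8177 \\
Toys\_and\_Games & [87] & 0.794 & 0.914 & - & - & - \\
Sports\_and\_Outdoors & [87] & 0.745 & 0.866 & - & - & - \\
Office & [10] & 0.254 & 0.3143 & - & - & - \\
 & [66] & - & - & - & - & 0.9778 \\
Musical Instruments & [41] & - & - & - & - & 0.7977 \\
Men’s Clothing & [59] & - & - & - & 0.9282 & - \\
 & [46] & - & - & - & 0.9669 & - \\
Home and Kitchen & [66] & - & - & - & - & 0.8631 \\
Grocery & [85] & 0.4097 & 0.6141 & - & - & - \\
Electronics & [87] & 0.713 & 0.855 & - & - & - \\
 & [76] & - & - & 0.87 & - & - \\
 & [42] & 0.546 & 0.713 & - & - & - \\
 & [82] & - & - & - & 0.9351 & - \\
 & [84] & - & - & 0.77 & - & - \\
 & [59] & - & - & - & 0.9547 & - \\
 & [66] & - & - & - & - & 0.979 \\
 & [46] & - & - & - & 0.957 & - \\
 & [41] & - & - & - & - & 0.757 \\
Clothing & [42] & 0.472 & 0.631 & - & - & - \\
 & [41] & - & - & - & - & 0.7428 \\
Cell Phone & [85] & 0.4542 & 0.6588 & - & - & - \\
 & [76] & - & - & 0.78 & - & - \\
 & [10] & 0.198 & 0.3449 & - & - & - \\
 & [82] & - & - & - & 0.9422 & - \\
 & [84] & - & - & 0.56 & - & - \\
 & [66] & - & - & - & - & 0.9016 \\
Books & [59] & - & - & - & 0.9571 & - \\
 & [46] & - & - & - & 0.9376 & - \\
Beauty & [76] & - & - & 0.94 & - & - \\
 & [42] & 0.593 & 0.747 & - & - & - \\
 & [82] & - & - & - & 0.8546 & - \\
 & [84] & - & - & 0.96 & - & - \\
Baby & [46] & - & - & - & 0.9218 & - \\
 & [85] & 0.292 & 0.4857 & - & - & - \\
 & [76] & - & - & 0.89 & - & - \\
 & [82] & - & - & - & 0.8975 & - \\
 & [84] & - & - & 0.89 & - & - \\
Automotive & [10] & 0.1788 & 0.2978 & - & - & - \\
\hline
\end{tabular}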

\zihao-5Table 7: \zihao-5Amazon dataset results comparison

5 FUTURE DIRECTIONS AND PROSPECTS

Looking ahead, the prospects for enhancing substitute recommendation systems appear promising. Through ongoing research and innovation, there is an opportunity to significantly improve the personalization and accuracy of substitute recommendations across various domains. This paves the way for the continued development and widespread application of substitute relationship analysis, opening up new avenues for exploration and advancement in this field.

5.1 Complex Substitute Relationship Modeling

5.1.1 Substitute relationship modeling in field of commodities

Fashion recommendation takes both similarity and compatibility into account [23, 39, 50]. This is a challenging task because it often requires the use of information from different sources, such as shaping fashion influences from photographs [1]. The future challenge lies in how to introduce more realistic try-on effects, such as using 3D virtual fitting technology to allow users to better preview the effect of products.

In the case of electrical appliances, substitute relationships mainly involve matching at the functional level. A future challenge is how to reason over the functional characteristics of electrical products to help users find substitutes. Another challenge in substitute relationship inference is effectively balancing product functionality, performance, and price. This means that it is necessary to develop models that consider multiple factors jointly.

Artistic works such as books, movies, and music can be collectively referred to as cultural and entertainment works. In the context of substitute recommendations for cultural and entertainment works, the characteristics of the content need to be considered.

  • Book recommendation: In book recommendation, it is necessary to consider semantic-level data features. How to accurately capture users’ subjective preferences and personalized needs is a challenge in building personalized book recommendation systems.

  • Movie recommendation: In movie recommendation, in addition to semantic features such as plot and theme, factors such as visual and audio style can also be taken into account by using movie tags, keywords, or metadata.

  • Music recommendation: The style, melody, rhythm, and instrumental performance of music are all important factors that determine the similarity between pieces of music. Music feature extraction and audio analysis techniques can be used to calculate the similarity of music.
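As a minimal illustration of similarity computed from extracted content features, the sketch below ranks candidate substitutes by cosine similarity. The three-dimensional feature vectors (standing in for normalized tempo, energy, and acousticness) and the track names are hypothetical; a real system would obtain such features from an audio analysis pipeline.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_substitutes(query_id, features):
    """Sort all other items by decreasing similarity to the query item."""
    query = features[query_id]
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in features.items()
        if item_id != query_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical feature vectors: (tempo, energy, acousticness), each in [0, 1].
tracks = {
    "track_a": [0.8, 0.9, 0.1],
    "track_b": [0.7, 0.8, 0.2],
    "track_c": [0.1, 0.2, 0.9],
}
ranked = rank_substitutes("track_a", tracks)
```

Here `track_b` ranks above `track_c` because its feature profile is much closer to that of `track_a`.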

5.1.2 Other fields

Modeling cross-domain substitution relationships faces unique challenges and opportunities [56, 70]. Substitute recommendation in non-e-commerce fields such as academia, tourism, and healthcare encounters domain-specific problems.

  • Academic literature covers multiple disciplinary fields and is often characterized by high specialization and complexity. Therefore, substitute recommendations must accurately understand and match the professional requirements of different disciplines.

  • In the tourism field, the data may include text-based destination introductions and travel guides. Substitute recommendations need to take seasonal factors into account, allowing users to choose the right destination at the right time. This can involve introducing cross-regional samples during the training process or using data synthesis techniques to generate samples with regional characteristics [24].

  • Image-based substitute recommendations may involve medical image diagnostics, which requires ensuring that the recommended image data meets the requirements for diagnostic accuracy and timeliness in clinical applications.

Recommendations in the field of diet need to be based on reliable and accurate data sources, such as nutritional components and calorie information of ingredients. It is necessary to comprehensively consider factors such as nutritional balance, taste, health, and satiety. This involves the problem of multi-objective optimization and trade-offs. Balancing different needs is a difficult task.
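The trade-off described above can be sketched with two standard tools from multi-objective optimization: a Pareto filter that discards dominated candidates, followed by a weighted sum that encodes one user's priorities. All candidate names, objective scores, and weights below are hypothetical values chosen for illustration, not measured data.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep candidates whose objective vectors are not dominated by any other."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }

def weighted_score(scores, weights):
    """Scalarize an objective vector with user-specific weights."""
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical substitutes for butter, scored in [0, 1] on
# (nutrition, taste, satiety); higher is better on every objective.
candidates = {
    "olive_oil": (0.8, 0.6, 0.5),
    "margarine": (0.4, 0.7, 0.5),
    "applesauce": (0.7, 0.4, 0.4),
    "shortening": (0.3, 0.6, 0.5),
}
front = pareto_front(candidates)
weights = (0.5, 0.3, 0.2)  # assumed preference of a health-conscious user
best = max(front, key=lambda name: weighted_score(front[name], weights))
```

The Pareto step is independent of user preferences, so it can be computed once per ingredient, while the weights can be personalized for each user.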

5.1.3 Multimodal and Multilevel Relationship Modeling

By considering both multimodal and multilevel relationship modeling, we can more comprehensively and accurately represent substitute relationships between entities.

In multimodal relationship modeling, various data types can be combined to model substitute relationships. For example, semantic information from textual data can be fused with visual features from image data [68, 28]. Cross-modal fusion enables the interaction of information between different data types, providing richer feature and semantic expressions in relationship modeling [80].
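A very simple form of such fusion can be sketched as follows. Each modality is L2-normalized, scaled, and concatenated, so that the dot product of two fused vectors equals a weighted average of the per-modality cosine similarities. This late-fusion scheme is a stand-in chosen for brevity and is not the method of the cited works; the embeddings and the `text_weight` parameter are assumptions.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(text_vec, image_vec, text_weight=0.5):
    """Late fusion: scale each normalized modality vector and concatenate."""
    t = [math.sqrt(text_weight) * x for x in l2_normalize(text_vec)]
    v = [math.sqrt(1.0 - text_weight) * x for x in l2_normalize(image_vec)]
    return t + v

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical embeddings: identical descriptions, unrelated product photos.
item_a = fuse(text_vec=[1.0, 0.0], image_vec=[0.0, 1.0])
item_b = fuse(text_vec=[1.0, 0.0], image_vec=[1.0, 0.0])
similarity = dot(item_a, item_b)  # 0.5 * text cosine + 0.5 * image cosine
```

Raising `text_weight` makes the fused similarity rely more on the textual description, which may suit functional products, whereas visually driven categories such as fashion may call for a lower value.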

Multilevel relationship modeling considers the various levels and complexities of relationships between entities [45]. This can be achieved by establishing a multilevel relationship network, where each level corresponds to different relationship types or levels of relationship abstraction. For example, in a knowledge graph, relationships can be categorized into different hierarchical levels such as parent-child relationships, attribute relationships, and so on, and graph neural networks or multilevel attention mechanisms can be used to model these relationships [54]. In this way, we can better capture relationships at different levels, leading to more accurate substitute relationship reasoning.

Combining substitutions can lead to more accurate and practical results. One-to-one substitutions may not always meet users’ needs. For example, when a user wants to substitute chocolate, they may require cocoa powder, sugar, and butter. Combination substitutions also face challenges such as compatibility, coordination, manufacturing costs, and user acceptance: items need adjustments to work well together, which might raise manufacturing complexity.

5.2 Considering Temporality and Dynamics

5.2.1 User Intent Extraction

Context-based substitute reasoning infers users’ current needs and intentions from context and background information, such as geographic location, time, and personal preferences, and then makes substitute recommendations accordingly. User intent can be extracted with deep learning-based large language models and intent recognition, as well as with mining algorithms based on user behavior and interests that analyze users’ browsing history, purchase records, or rating data [18, 17, 25]. At the same time, establishing real-time interaction with users is also key.

Intent recognition becomes harder when a user’s dialogue contains shifts or multiple complex intents, because in such cases the user’s expressions may be ambiguous or unclear, with multiple intents intertwined or overlapping. By incorporating user feedback, continuously optimizing intent recognition models, and utilizing contextual information, the system can achieve more accurate intent classification.

5.2.2 The problem of sequence recommendation

In the real world, many events and objects have sequential dependencies. Sequence recommendation can adapt to temporal changes in user behavior and is widely used in fields such as e-commerce, news, and social media [64, 83, 69, 17, 52]. User interests can include long-term and short-term preferences. Session-based recommendation systems [27, 72, 48, 67] capture users’ short-term preferences from their recent sessions and reflect preference dynamics from one session to another, thereby providing more accurate and timely recommendations.
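The interplay of long-term and short-term preferences can be illustrated with a deliberately simple sketch: long-term interest is the normalized frequency of categories in the full history, short-term interest is a recency-weighted frequency within the current session, and the two are blended linearly. This is not one of the cited neural models; the `alpha` and `decay` parameters and the interaction data are assumptions made for illustration.

```python
from collections import Counter

def normalized_counts(items):
    """Long-term preference: relative frequency of each category."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()} if total else {}

def recency_weighted(session, decay=0.5):
    """Short-term preference: latest click weighs 1.0, the previous one `decay`, etc."""
    weights = {}
    for age, item in enumerate(reversed(session)):
        weights[item] = weights.get(item, 0.0) + decay ** age
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()} if total else {}

def blended_scores(history, session, alpha=0.7, decay=0.5):
    """Linear blend; alpha controls how much the current session dominates."""
    long_term = normalized_counts(history)
    short_term = recency_weighted(session, decay)
    keys = set(long_term) | set(short_term)
    return {
        k: alpha * short_term.get(k, 0.0) + (1 - alpha) * long_term.get(k, 0.0)
        for k in keys
    }

# Hypothetical user: historically buys shoes, currently browsing tents.
history = ["shoes", "shoes", "shoes", "jacket"]
session = ["jacket", "tent", "tent"]
scores = blended_scores(history, session)
top = max(scores, key=scores.get)
```

With these settings the current session outweighs the purchase history, so substitutes would be drawn from the `tent` category first.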

However, current relationship modeling approaches face interpretability issues. To address this, researchers have been developing model interpretation techniques and designing models with strong interpretability [2, 37, 58]. Model interpretation techniques include rule-based explanations, local sensitivity analysis, and attention mechanisms, which can help reveal the basis and reasoning process behind model decisions. In addition, designing models with strong interpretability is also important. For example, using graph neural networks to model relationships in graph data can provide more intuitive explanations through node and edge representations.

5.3 Integration with Large Language Models

Large language models, such as GPT (Generative Pre-trained Transformer), have become a popular research direction. These models have a large number of parameters and are pre-trained on massive amounts of data, making them applicable to various tasks such as question answering, sentiment analysis, text classification, and more.

Large language models offer several advantages in relationship modeling. They can learn language patterns, capture rich semantic information, and have broad applications in substitute relationship reasoning. However, these models also have limitations, the most important being the need for substantial amounts of training data. Additionally, the performance of large language models on specific tasks can be influenced by issues such as data skew and domain differences, requiring further optimization and adaptation to achieve better results. Therefore, when using large language models for relationship modeling, it is important to consider both their strengths and limitations and to thoroughly evaluate their suitability for the specific task at hand.

5.3.1 Data-driven Relationship Reasoning

Large language models inherit biases from their training data, which can lead them to generate discriminatory, biased, or unfair content. In many fields, obtaining large-scale annotated data is a time-consuming and costly task. The advantage of unsupervised learning is that it can learn from unlabeled data and discover patterns without relying on a large amount of labeled data [65]. Unsupervised learning faces challenges such as data distribution shift and label scarcity, for which some potential solutions have been proposed [8, 5]. The “few-shot” problem refers to a situation in machine learning and pattern recognition where a class has a limited number of samples, making it challenging for the model to grasp the characteristics and patterns of that class sufficiently during the learning process [43, 33, 20]. Substitute reasoning can address the few-shot problem by generating new samples through analogy reasoning or inference between samples, thereby expanding the quantity and diversity of existing data.
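One simple way to generate new samples from existing ones is linear interpolation between pairs of examples from the same scarce class. The sketch below uses this as a stand-in for the analogy-based generation described above, not as an implementation of any cited method; the two-dimensional embeddings, the interpolation range, and the function names are assumptions.

```python
import itertools
import random

def interpolate(u, v, lam):
    """Convex combination of two samples: lam * u + (1 - lam) * v."""
    return [lam * a + (1 - lam) * b for a, b in zip(u, v)]

def augment_few_shot(samples, per_pair=2, seed=0):
    """Create synthetic samples between every pair of real samples in a class."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    synthetic = []
    for u, v in itertools.combinations(samples, 2):
        for _ in range(per_pair):
            lam = rng.uniform(0.2, 0.8)  # stay away from exact copies of u or v
            synthetic.append(interpolate(u, v, lam))
    return synthetic

# Hypothetical embeddings of three items from a rarely observed class.
few_shot_class = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.0]]
new_samples = augment_few_shot(few_shot_class)
```

Three real samples yield six synthetic ones here, all lying inside the region spanned by the originals, which increases quantity but only modestly increases diversity.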

5.3.2 Generative Reasoning

Large language models are typically based on statistical modeling and lack true reasoning abilities. Generative reasoning is an inference approach based on logic and reasoning rules, aimed at deriving logically correct conclusions that are more flexible, rigorous, and expressive. Integrating generative reasoning with large language models allows existing knowledge graphs or knowledge bases to be combined with the models. By incorporating external knowledge, reasoning can be supported, thereby enhancing the accuracy and logic of generative reasoning. In substitute reasoning, for example, when recommending a product to a user, the model can generate text that describes the product’s characteristics and advantages and explains why it is a good choice for that user.
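The idea of grounding generated explanations in external knowledge can be sketched as follows: facts shared by the original item and its substitute are retrieved from a knowledge graph and placed in the prompt, so that the language model is asked to explain only what the graph supports. The toy triples, relation names, and prompt wording are hypothetical, and the call to an actual language model is intentionally left out.

```python
def shared_facts(kg, item_a, item_b):
    """Return (relation, object) pairs that hold for both items."""
    facts_a = {(r, o) for s, r, o in kg if s == item_a}
    facts_b = {(r, o) for s, r, o in kg if s == item_b}
    return sorted(facts_a & facts_b)

def build_prompt(kg, original, substitute):
    """Assemble a prompt that restricts the explanation to retrieved facts."""
    facts = shared_facts(kg, original, substitute)
    lines = [f"- both {relation} {obj}" for relation, obj in facts]
    evidence = "\n".join(lines) if lines else "- (no shared facts found)"
    return (
        f"Explain why {substitute} can replace {original}.\n"
        f"Use only these facts:\n{evidence}"
    )

# Toy knowledge graph of (subject, relation, object) triples.
toy_kg = [
    ("butter", "is_a", "fat"),
    ("margarine", "is_a", "fat"),
    ("butter", "used_for", "baking"),
    ("margarine", "used_for", "baking"),
    ("butter", "made_from", "milk"),
]
prompt = build_prompt(toy_kg, "butter", "margarine")
```

Because `made_from milk` holds only for butter, it is excluded from the evidence, which keeps the generated explanation from claiming a property the substitute does not have.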

6 CONCLUSIONS

Substitution relationships play a crucial role in various aspects of daily life, from cooking to consumer demands. The ability to identify appropriate substitutions is a valuable skill that individuals must possess, as it enables them to adapt and overcome challenges while maintaining the desired outcomes. This reasoning for substitution relationships is essential in both personal and professional settings, as it aids in decision-making and problem-solving.

To delve deeper into substitution relationships, research methods have been developed to understand and predict these associations. Machine learning algorithms, natural language processing techniques, and semantic analysis tools have all contributed to enhancing our understanding of substitution reasoning. These methods analyze large datasets, including recipes, customer feedback, and historical buying patterns, to uncover patterns and establish substitution relationships. By combining qualitative and quantitative approaches, researchers can gain insights into the factors that influence substitution choices and develop accurate predictive models.

In conclusion, reasoning for substitution relationships is a fundamental aspect of daily life, influencing various domains such as cooking, consumer demands, and recommendation systems. It empowers individuals to adapt and make informed choices, while also enabling businesses to meet customer needs. Through research and innovation, we can continue to enhance personalization and accuracy in substitution recommendations, further improving efficiency and customer satisfaction in industries like e-commerce and food. Substitution reasoning is a dynamic field with vast potential, and further exploration and development in this area will undoubtedly lead to exciting advancements in the future.

Acknowledgment

This review study was carried out with the support of the National Science Foundation of China (Nos. 62162048, 62262047), the Self-topic Project of Engineering Research Center of Ecological Big Data, the Natural Science Foundation of Inner Mongolia in China (No. 2021MS01023), the National Natural Science Foundation of China (No. 62062052), Ministry of Education, and Inner Mongolia Science and Technology Plan Project (No. 2021GG0164).


References

  • [1] Ziad Al-Halah and Kristen Grauman. Modeling fashion influence from photos. IEEE Transactions on Multimedia, 23:4143–4157, 2020.
  • [2] Keping Bi, Qingyao Ai, and W. Bruce Croft. Learning a fine-grained review-based transformer model for personalized product search. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’21, pages 123–132. Association for Computing Machinery, 2021.
  • [3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013.
  • [4] Corrado Boscarino, Vladimir Nedović, Nicole J. J. P. Koenderink, and Jan L. Top. Automatic extraction of ingredient’s substitutes. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pages 559–564. ACM, 2014.
  • [5] Bindita Chaudhuri, Nikolaos Sarafianos, Linda G. Shapiro, and Tony Tung. Semi-supervised synthesis of high-resolution editable textures for 3d humans. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7987–7996, 2021.
  • [6] Fanglin Chen, Xiao Liu, Davide Proserpio, Isamar Troncoso, and Feiyu Xiong. Studying product competition using representation learning. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1261–1268. ACM, 2020.
  • [7] Huajie Chen, Jiyuan He, Weisheng Xu, Tao Feng, Ming Liu, Tianyu Song, Runfeng Yao, and Yuanyuan Qiao. Enhanced multi-relationships integration graph convolutional network for inferring substitutable and complementary items. 37(4):4157–4165, 2023.
  • [8] Jiawei Chen, Chengquan Jiang, Can Wang, Sheng Zhou, Yan Feng, Chun Chen, Martin Ester, and Xiangnan He. Cosam: An efficient collaborative adaptive sampler for recommendation. ACM Transactions on Information Systems (TOIS), 39:1 – 24, 2020.
  • [9] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. A simple framework for contrastive learning of visual representations. ArXiv, abs/2002.05709, 2020.
  • [10] Tong Chen, Hongzhi Yin, Guanhua Ye, Zi Huang, Yang Wang, and Meng Wang. Try this instead: Personalized and interpretable substitute recommendation, 2020.
  • [11] Wen-Huang Cheng, Sijie Song, Chieh-Yun Chen, Shintami Chusnul Hidayati, and Jiaying Liu. Fashion meets computer vision. ACM Computing Surveys (CSUR), 54:1 – 41, 2020.
  • [12] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. Unsupervised cross-lingual representation learning at scale. In Annual Meeting of the Association for Computational Linguistics, 2019.
  • [13] Le Dai, Yu Yin, Chuan Qin, Enhong Chen, and Hui Xiong. Decomposing complementary and substitutable relations for intercorporate investment recommendation. In 2022 IEEE International Conference on Data Mining (ICDM), pages 909–914. IEEE, 2022.
  • [14] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, K. Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
  • [15] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In North American Chapter of the Association for Computational Linguistics, 2019.
  • [16] Jingtao Ding, Guanghui Yu, Yong Li, Xiangnan He, and Depeng Jin. Improving implicit recommender systems with auxiliary data. ACM Transactions on Information Systems (TOIS), 38:1 – 27, 2020.
  • [17] Yujuan Ding, Yunshan Ma, Wai Keung Wong, and Tat-Seng Chua. Modeling instant user intent and content-level transition for sequential fashion recommendation. IEEE Transactions on Multimedia, 24:2687–2700, 2022.
  • [18] Arnaud Doniec, Stéphane Lecoeuche, René Mandiau, and A Sylvain. Purchase intention-based agent for customer behaviours. Inf. Sci., 521:380–397, 2020.
  • [19] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ArXiv, abs/2010.11929, 2020.
  • [20] Ziwei Fan, Zhiwei Liu, Shelby Heinecke, Jianguo Zhang, Huan Wang, Caiming Xiong, and Philip S. Yu. Zero-shot item-based recommendation via multi-task product knowledge graph pre-training. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM ’23, pages 483–493. Association for Computing Machinery, 2023.
  • [21] Bahare Fatemi, Quentin Duval, Rohit Girdhar, Michal Drozdzal, and Adriana Romero-Soriano. Learning to substitute ingredients in recipes, 2023.
  • [22] Xiaoling Gu, Fei Gao, Min Tan, and Pai Peng. Fashion analysis and understanding with artificial intelligence. Inf. Process. Manag., 57:102276, 2020.
  • [23] Xiaoling Gu, Jie Huang, Yongkang Wong, Jun Yu, Jianping Fan, Pai Peng, and Mohan S. Kankanhalli. Paint: Photo-realistic fashion design synthesis. ACM Transactions on Multimedia Computing, Communications and Applications, 20:1 – 23, 2022.
  • [24] Gilles Hacheme and Nouréini Sayouti. Neural fashion image captioning : Accounting for data diversity. ArXiv, abs/2106.12154, 2021.
  • [25] Mingkai He, Weike Pan, and Zhong Ming. 2bar: Behavior-aware recommendation for sequential heterogeneous one-class collaborative filtering. Inf. Sci., 608:881–899, 2022.
  • [26] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. Lightgcn: Simplifying and powering graph convolution network for recommendation. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020.
  • [27] Balázs Hidasi and Alexandros Karatzoglou. Recurrent neural networks with top-k gains for session-based recommendations. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 843–852, 2018.
  • [28] Wei-Lin Hsiao and Kristen Grauman. From culture to clothing: Discovering the world events behind a century of fashion images. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 1046–1055, 2021.
  • [29] Hyunwoo Hwangbo, Yang Sok Kim, and Kyung Jin Cha. Recommendation system development for fashion retail e-commerce. Electron. Commer. Res. Appl., 28:94–101, 2018.
  • [30] Tong Jian, Fan Yang, Zhen Zuo, Wenbo Wang, Michinari Momma, Tong Zhao, Chaosheng Dong, Yan Gao, and Yi Sun. Multi-task gnn for substitute identification. In Companion Proceedings of the Web Conference 2022, pages 228–231. ACM, 2022.
  • [31] Rudolf Kadlec, Ondrej Bajgar, and Jan Kleindienst. Knowledge base completion: Baselines strike back. In Rep4NLP@ACL, 2017.
  • [32] Wang-Cheng Kang, Eric Kim, Jure Leskovec, Charles R. Rosenberg, and Julian McAuley. Complete the look: Scene-based complementary product recommendation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10524–10533, 2018.
  • [33] Sungwon Kim, Junseok Lee, Namkyeong Lee, Wonjoong Kim, Seungyoon Choi, and Chanyoung Park. Task-equivariant graph few-shot learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, pages 1120–1131. Association for Computing Machinery, 2023.
  • [34] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. CoRR, abs/1312.6114, 2013.
  • [35] Young Kwark, Gene Moo Lee, Paul A. Pavlou, and Liangfei Qiu. On the spillover effects of online product reviews on purchases: Evidence from clickstream data. Fox: Management Information Systems (Topic), 2016.
  • [36] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. In International Conference on Machine Learning, 2014.
  • [37] Trung-Hoang Le and Hady W. Lauw. Explainable recommendation with comparative constraints on product aspects. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, WSDM ’21, pages 967–975. Association for Computing Machinery, 2021.
  • [38] Diya Li and Mohammed J. Zaki. Food knowledge representation learning with adversarial substitution. In AACL, 2022.
  • [39] Shuiying Liao, Yujuan Ding, and P.Y. Mok. Recommendation of mix-and-match clothing by modeling indirect personal compatibility. ICMR ’23, page 560–564, New York, NY, USA, 2023. Association for Computing Machinery.
  • [40] Yen-Liang Lin, S. Tran, and Larry S. Davis. Fashion outfit complementary item retrieval. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3308–3316, 2019.
  • [41] Weiwen Liu, Yin Zhang, Jianling Wang, Yun He, James Caverlee, Patrick P. K. Chan, Daniel S. Yeung, and Pheng-Ann Heng. Item relationship graph neural networks for e-commerce. 33(9):4785–4799, 2022.
  • [42] Yiding Liu, Yulong Gu, Zhuoye Ding, Junchao Gao, Ziyi Guo, Yongjun Bao, and Weipeng Yan. Decoupled graph convolution network for inferring substitutable and complementary items. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 2621–2628. ACM, 2020.
  • [43] Yonghao Liu, Mengyu Li, Ximing Li, Fausto Giunchiglia, Xiaoyue Feng, and Renchu Guan. Few-shot node classification on attributed networks with graph meta-learning. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’22, pages 471–481. Association for Computing Machinery, 2022.
  • [44] Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H. Chi. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
  • [45] Yao Ma, Zhaochun Ren, Ziheng Jiang, Jiliang Tang, and Dawei Yin. Multi-dimensional network embedding with hierarchical structure. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM ’18, page 387–395, New York, NY, USA, 2018. Association for Computing Machinery.
  • [46] Julian McAuley, Rahul Pandey, and Jure Leskovec. Inferring networks of substitutable and complementary products. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2015.
  • [47] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 43–52. ACM, 2015.
  • [48] Wenjing Meng, Deqing Yang, and Yanghua Xiao. Incorporating user micro-behaviors and item knowledge into multi-task learning for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1091–1100. ACM, 2020.
  • [49] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013.
  • [50] Seyed Omid Mohammadi and Ahmad Kalhor. Smart fashion: A review of ai applications in the fashion & apparel industry. ArXiv, abs/2111.00905, 2021.
  • [51] Ioannis Pachoulakis and Kostas Kapetanakis. Augmented reality platforms for virtual fitting rooms. The International Journal of Multimedia & Its Applications, 4:35–46, 2012.
  • [52] Ka-Ming Pang, Xingxing Zou, and Wai Keung Wong. Modeling fashion compatibility with explanation by using bidirectional lstm. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 3889–3893, 2021.
  • [53] Rastislav Papso. Complementary product recommendation for long-tail products. In Proceedings of the 17th ACM Conference on Recommender Systems, RecSys ’23, pages 1305–1311. Association for Computing Machinery, 2023.
  • [54] Shubham Patil, Debopriyo Banerjee, and Shamik Sural. A graph theoretic approach for multi-objective budget constrained capsule wardrobe recommendation. ACM Transactions on Information Systems (TOIS), 40:1 – 33, 2021.
  • [55] Chantal Pellegrini, Ege Özsoy, Monika Wintergerst, and Georg Groh. Exploiting food embeddings for ingredient substitution:. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, pages 67–77. SCITEPRESS - Science and Technology Publications, 2021.
  • [56] Mehmet Pilancı and Elif Vural. Domain adaptation on graphs by learning aligned graph bases. 34(2):587–600, 2022.
  • [57] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 2021.
  • [58] Antonio Rago, Oana Cocarascu, Christos Bechlivanidis, David A. Lagnado, and Francesca Toni. Argumentative explanations for interactive recommendations. Artif. Intell., 296:103506, 2021.
  • [59] Vineeth Rakesh, Suhang Wang, Kai Shu, and Huan Liu. Linked variational autoencoders for inferring substitutable and supplementary items. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 438–446. ACM, 2019.
  • [60] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. Bpr: Bayesian personalized ranking from implicit feedback. ArXiv, abs/1205.2618, 2009.
  • [61] Francisco J. R. Ruiz, Susan Athey, and David M. Blei. Shopper: A probabilistic model of consumer choice with substitutes and complements, 2019.
  • [62] Ying Shen, Kaiqi Yuan, Jingchao Dai, Buzhou Tang, Min Yang, and Kai Lei. Kgdds: A system for drug-drug similarity measure in therapeutic substitution based on knowledge graph curation. 43(4):92, 2019.
  • [63] Sola S. Shirai, Oshani Seneviratne, Minor E. Gordon, Ching-Hua Chen, and Deborah L. McGuinness. Identifying ingredient substitutions using a knowledge graph of food. 3:621766, 2021.
  • [64] Zhongchuan Sun, Bin Wu, Youwei Wang, and Yangdong Ye. Sequential graph collaborative filtering. Inf. Sci., 592:244–260, 2022.
  • [65] Takashi Wada, Timothy Baldwin, Yuji Matsumoto, and Jey Han Lau. Unsupervised lexical substitution with decontextualised embeddings. In International Conference on Computational Linguistics, 2022.
  • [66] Zihan Wang, Ziheng Jiang, Zhaochun Ren, Jiliang Tang, and Dawei Yin. A path-constrained framework for discriminating substitutable and complementary products in e-commerce. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 619–627. ACM, 2018.
  • [67] Zihan Wang, Gang Wu, and Yan Wang. Effectively using long and short sessions for multi-session-based recommendations, 2022.
  • [68] Bin Wu, Xiangnan He, Yun Chen, Liqiang Nie, Kai Zheng, and Yangdong Ye. Modeling product’s visual and functional characteristics for recommender systems. IEEE Transactions on Knowledge and Data Engineering, 34:1330–1343, 2020.
  • [69] Bin Wu, Xiangnan He, Qi Zhang, Meng Wang, and Yangdong Ye. Gcrec: Graph-augmented capsule network for next-item recommendation. IEEE Transactions on Neural Networks and Learning Systems, 34:10164–10177, 2022.
  • [70] Bin Wu, Lihong Zhong, Lina Yao, and Yangdong Ye. Eagcn: An efficient adaptive graph convolutional network for item recommendation in social internet of things. IEEE Internet of Things Journal, 9:16386–16401, 2022.
  • [71] Longfeng Wu, Yao Zhou, and Dawei Zhou. Towards high-order complementary recommendation via logical reasoning network. In 2022 IEEE International Conference on Data Mining (ICDM), pages 1227–1232, 2022.
  • [72] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. Session-based recommendation with graph neural networks. 33(01):346–353, 2019.
  • [73] Caiming Xiong, Victor Zhong, and Richard Socher. Dynamic coattention networks for question answering. ArXiv, abs/1611.01604, 2016.
  • [74] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? ArXiv, abs/1810.00826, 2018.
  • [75] Surong Yan, Kwei-Jay Lin, Xiaolin Zheng, and Haosen Wang. Lkerec: Toward lightweight end-to-end joint representation learning for building accurate and effective recommendation. ACM Transactions on Information Systems (TOIS), 40:1 – 28, 2021.
  • [76] Zijing Yang, Jiabo Ye, Linlin Wang, Xin Lin, and Liang He. Inferring substitutable and complementary products with knowledge-aware path reasoning based on dynamic policy network. 235:107579, 2022.
  • [77] Wenting Ye, Hongfei Yang, Shuai Zhao, Haoyang Fang, Xingjian Shi, and Naveen Neppalli. A transformer-based substitute recommendation model incorporating weakly supervised customer behavior data, 2023.
  • [78] Hang Yin, Shuang Zheng, William Yeoh, and Jie Ren. How online review richness impacts sales: An attribute substitution perspective. 72(7):901–917, 2021.
  • [79] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
  • [80] Feifei Zhang, Mingliang Xu, and Changsheng Xu. Geometry sensitive cross-modal reasoning for composed query based image retrieval. IEEE Transactions on Image Processing, 31:1000–1011, 2021.
  • [81] Mingyue Zhang and Jesse C. Bockstedt. Complements and substitutes in product recommendations: The differential effects on consumers’ willingness-to-pay. 2020.
  • [82] Mingyue Zhang, Xuan Wei, Xunhua Guo, Guoqing Chen, and Qiang Wei. Identifying complements and substitutes of products: A neural network framework based on product embedding. 13(3):1–29, 2019.
  • [83] Qi Zhang, Bin Wu, Zhongchuan Sun, and Yangdong Ye. Gating augmented capsule network for sequential recommendation. Knowl. Based Syst., 247:108817, 2022.
  • [84] Shijie Zhang, Hongzhi Yin, Qinyong Wang, Tong Chen, Hongxu Chen, and Quoc Viet Hung Nguyen. Inferring substitutable products with deep network embedding. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pages 4306–4312. International Joint Conferences on Artificial Intelligence Organization, 2019.
  • [85] Wei Zhang, Zeyuan Chen, Hongyuan Zha, and Jianyong Wang. Learning from substitutable and complementary relations for graph-based sequential product recommendation. 40(2):1–28, 2022.
  • [86] Yuan Zhang, Fei Sun, Xiaoyong Yang, Chen Xu, Wenwu Ou, and Yan Zhang. Graph-based regularization on embedding layers for recommendation. ACM Transactions on Information Systems (TOIS), 39:1 – 27, 2020.
  • [87] Zhiheng Zhou, Tao Wang, Linfang Hou, Xinyuan Zhou, Mian Ma, and Zhuoye Ding. Decoupled hyperbolic graph attention network for modeling substitutable and complementary item relationships. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 2763–2772. ACM, 2022.
  • [88] Agnieszka Ławrynowicz, Anna Wróblewska, Weronika T. Adrian, Bartosz Kulczyński, and Anna Gramza-Michałowska. Food recipe ingredient substitution ontology design pattern. 22(3):1095, 2022.
{strip}
{biography}

[yax.jpg] Anxin Yang obtained a Bachelor’s degree in Software Engineering from Inner Mongolia University. She is currently a graduate student at the School of Computer Science, Inner Mongolia University. Her main research areas are data mining and recommendation systems.

{biography}

[dzj.png] Zhijuan Du obtained her Ph.D. degree from Renmin University of China in 2018. She currently holds the position of associate professor and master’s supervisor. She focuses on researching knowledge graphs, intelligent assignment, recommendation systems, deep learning, and social big data mining.

{biography}

[st.jpg] Tao Sun obtained a Ph.D. in Computer Science from the School of Computer Science, Inner Mongolia University in 2013, and currently serves as a professor at the College of Computer Science. He focuses on researching formal methods and software testing.