Search | arXiv e-print repository

XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

Authors: Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia, Eugenio di Sciascio

Abstract: The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs:… ▽ More The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which employs long narrative prompts. Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when employing few-shot examples alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and an increased number of examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs demonstrate comparable or superior cost-sensitive accuracy relative to ML models. This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and communication styles to improve accuracy and reduce biases in LLM applications. △ Less

Submitted 3 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

arXiv:2403.20095 [pdf, other]

KGUF: Simple Knowledge-aware Graph-based Recommender with User-based Semantic Features Filtering

Authors: Salvatore Bufi, Alberto Carlo Maria Mancino, Antonio Ferrara, Daniele Malitesta, Tommaso Di Noia, Eugenio Di Sciascio

Abstract: The recent integration of Graph Neural Networks (GNNs) into recommendation has led to a novel family of Collaborative Filtering (CF) approaches, namely Graph Collaborative Filtering (GCF). Following the same GNNs wave, recommender systems exploiting Knowledge Graphs (KGs) have also been successfully empowered by the GCF rationale to combine the representational power of GNNs with the semantics con… ▽ More The recent integration of Graph Neural Networks (GNNs) into recommendation has led to a novel family of Collaborative Filtering (CF) approaches, namely Graph Collaborative Filtering (GCF). Following the same GNNs wave, recommender systems exploiting Knowledge Graphs (KGs) have also been successfully empowered by the GCF rationale to combine the representational power of GNNs with the semantics conveyed by KGs, giving rise to Knowledge-aware Graph Collaborative Filtering (KGCF), which use KGs to mine hidden user intent. Nevertheless, empirical evidence suggests that computing and combining user-level intent might not always be necessary, as simpler approaches can yield comparable or superior results while kee** explicit semantic features. Under this perspective, user historical preferences become essential to refine the KG and retain the most discriminating features, thus leading to concise item representation. Driven by the assumptions above, we propose KGUF, a KGCF model that learns latent representations of semantic features in the KG to better define the item profile. By leveraging user profiles through decision trees, KGUF effectively retains only those features relevant to users. Results on three datasets justify KGUF's rationale, as our approach is able to reach performance comparable or superior to SOTA methods while maintaining a simpler formalization. Link to the repository: https://github.com/sisinflab/KGUF. △ Less

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.19841 [pdf, other]

Dealing with Missing Modalities in Multimodal Recommendation: a Feature Propagation-based Approach

Authors: Daniele Malitesta, Emanuele Rossi, Claudio Pomo, Fragkiskos D. Malliaros, Tommaso Di Noia

Abstract: Multimodal recommender systems work by augmenting the representation of the products in the catalogue through multimodal features extracted from images, textual descriptions, or audio tracks characterising such products. Nevertheless, in real-world applications, only a limited percentage of products come with multimodal content to extract meaningful features from, making it hard to provide accurat… ▽ More Multimodal recommender systems work by augmenting the representation of the products in the catalogue through multimodal features extracted from images, textual descriptions, or audio tracks characterising such products. Nevertheless, in real-world applications, only a limited percentage of products come with multimodal content to extract meaningful features from, making it hard to provide accurate recommendations. To the best of our knowledge, very few attention has been put into the problem of missing modalities in multimodal recommendation so far. To this end, our paper comes as a preliminary attempt to formalise and address such an issue. Inspired by the recent advances in graph representation learning, we propose to re-sketch the missing modalities problem as a problem of missing graph node features to apply the state-of-the-art feature propagation algorithm eventually. Technically, we first project the user-item graph into an item-item one based on co-interactions. Then, leveraging the multimodal similarities among co-interacted items, we apply a modified version of the feature propagation technique to impute the missing multimodal features. Adopted as a pre-processing stage for two recent multimodal recommender systems, our simple approach performs better than other shallower solutions on three popular datasets. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.05668 [pdf, other]

CFaiRLLM: Consumer Fairness Evaluation in Large-Language Model Recommender System

Authors: Yashar Deldjoo, Tommaso di Noia

Abstract: In the evolving landscape of recommender systems, the integration of Large Language Models (LLMs) such as ChatGPT marks a new era, introducing the concept of Recommendation via LLM (RecLLM). While these advancements promise unprecedented personalization and efficiency, they also bring to the fore critical concerns regarding fairness, particularly in how recommendations might inadvertently perpetua… ▽ More In the evolving landscape of recommender systems, the integration of Large Language Models (LLMs) such as ChatGPT marks a new era, introducing the concept of Recommendation via LLM (RecLLM). While these advancements promise unprecedented personalization and efficiency, they also bring to the fore critical concerns regarding fairness, particularly in how recommendations might inadvertently perpetuate or amplify biases associated with sensitive user attributes. In order to address these concerns, our study introduces a comprehensive evaluation framework, CFaiRLLM, aimed at evaluating (and thereby mitigating) biases on the consumer side within RecLLMs. Our research methodically assesses the fairness of RecLLMs by examining how recommendations might vary with the inclusion of sensitive attributes such as gender, age, and their intersections, through both similarity alignment and true preference alignment. By analyzing recommendations generated under different conditions-including the use of sensitive attributes in user prompts-our framework identifies potential biases in the recommendations provided. A key part of our study involves exploring how different detailed strategies for constructing user profiles (random, top-rated, recent) impact the alignment between recommendations made without consideration of sensitive attributes and those that are sensitive-attribute-aware, highlighting the bias mechanisms within RecLLMs. The findings in our study highlight notable disparities in the fairness of recommendations, particularly when sensitive attributes are integrated into the recommendation process, either individually or in combination. The analysis demonstrates that the choice of user profile sampling strategy plays a significant role in affecting fairness outcomes, highlighting the complexity of achieving fair recommendations in the era of LLMs. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04503 [pdf, other]

Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation

Authors: Matteo Attimonelli, Danilo Danese, Daniele Malitesta, Claudio Pomo, Giuseppe Gassi, Tommaso Di Noia

Abstract: In this work, we introduce Ducho 2.0, the latest stable version of our framework. Differently from Ducho, Ducho 2.0 offers a more personalized user experience with the definition and import of custom extraction models fine-tuned on specific tasks and datasets. Moreover, the new version is capable of extracting and processing features through multimodal-by-design large models. Notably, all these ne… ▽ More In this work, we introduce Ducho 2.0, the latest stable version of our framework. Differently from Ducho, Ducho 2.0 offers a more personalized user experience with the definition and import of custom extraction models fine-tuned on specific tasks and datasets. Moreover, the new version is capable of extracting and processing features through multimodal-by-design large models. Notably, all these new features are supported by optimized data loading and storing to the local memory. To showcase the capabilities of Ducho 2.0, we demonstrate a complete multimodal recommendation pipeline, from the extraction/processing to the final recommendation. The idea is to provide practitioners and experienced scholars with a ready-to-use tool that, put on top of any multimodal recommendation framework, may permit them to run extensive benchmarking analyses. All materials are accessible at: \url{https://github.com/sisinflab/Ducho}. △ Less

Submitted 18 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2310.11270 [pdf, other]

Graph Neural Networks for Recommendation: Reproducibility, Graph Topology, and Node Representation

Authors: Daniele Malitesta, Claudio Pomo, Tommaso Di Noia

Abstract: Graph neural networks (GNNs) have gained prominence in recommendation systems in recent years. By representing the user-item matrix as a bipartite and undirected graph, GNNs have demonstrated their potential to capture short- and long-distance user-item interactions, thereby learning more accurate preference patterns than traditional recommendation approaches. In contrast to previous tutorials on… ▽ More Graph neural networks (GNNs) have gained prominence in recommendation systems in recent years. By representing the user-item matrix as a bipartite and undirected graph, GNNs have demonstrated their potential to capture short- and long-distance user-item interactions, thereby learning more accurate preference patterns than traditional recommendation approaches. In contrast to previous tutorials on the same topic, this tutorial aims to present and examine three key aspects that characterize GNNs for recommendation: (i) the reproducibility of state-of-the-art approaches, (ii) the potential impact of graph topological characteristics on the performance of these models, and (iii) strategies for learning node representations when training features from scratch or utilizing pre-trained embeddings as additional item information (e.g., multimodal features). The goal is to provide three novel theoretical and practical perspectives on the field, currently subject to debate in graph learning but long been overlooked in the context of recommendation systems. △ Less

Submitted 28 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

arXiv:2309.05273 [pdf, other]

doi 10.1145/3662738

Formalizing Multimedia Recommendation through Multimodal Deep Learning

Authors: Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, Felice Antonio Merra, Tommaso Di Noia, Eugenio Di Sciascio

Abstract: Recommender systems (RSs) offer personalized navigation experiences on online platforms, but recommendation remains a challenging task, particularly in specific scenarios and domains. Multimodality can help tap into richer information sources and construct more refined user/item profiles for recommendations. However, existing literature lacks a shared and universal schema for modeling and solving… ▽ More Recommender systems (RSs) offer personalized navigation experiences on online platforms, but recommendation remains a challenging task, particularly in specific scenarios and domains. Multimodality can help tap into richer information sources and construct more refined user/item profiles for recommendations. However, existing literature lacks a shared and universal schema for modeling and solving the recommendation problem through the lens of multimodality. This work aims to formalize a general multimodal schema for multimedia recommendation. It provides a comprehensive literature review of multimodal approaches for multimedia recommendation from the last eight years, outlines the theoretical foundations of a multimodal pipeline, and demonstrates its rationale by applying it to selected state-of-the-art approaches. The work also conducts a benchmarking analysis of recent algorithms for multimedia recommendation within Elliot, a rigorous framework for evaluating recommender systems. The main aim is to provide guidelines for designing and implementing the next generation of multimodal approaches in multimedia recommendation. △ Less

Submitted 29 April, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted in the Special Issue on Knowledge Transferring for Recommender Systems (KT4Rec) in ACM Transactions on Recommender Systems

arXiv:2309.03613 [pdf, other]

Evaluating ChatGPT as a Recommender System: A Rigorous Approach

Authors: Dario Di Palma, Giovanni Maria Biancofiore, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia, Eugenio Di Sciascio

Abstract: Large Language Models (LLMs) have recently shown impressive abilities in handling various natural language-related tasks. Among different LLMs, current studies have assessed ChatGPT's superior performance across manifold tasks, especially under the zero/few-shot prompting conditions. Given such successes, the Recommender Systems (RSs) research community have started investigating its potential app… ▽ More Large Language Models (LLMs) have recently shown impressive abilities in handling various natural language-related tasks. Among different LLMs, current studies have assessed ChatGPT's superior performance across manifold tasks, especially under the zero/few-shot prompting conditions. Given such successes, the Recommender Systems (RSs) research community have started investigating its potential applications within the recommendation scenario. However, although various methods have been proposed to integrate ChatGPT's capabilities into RSs, current research struggles to comprehensively evaluate such models while considering the peculiarities of generative models. Often, evaluations do not consider hallucinations, duplications, and out-of-the-closed domain recommendations and solely focus on accuracy metrics, neglecting the impact on beyond-accuracy facets. To bridge this gap, we propose a robust evaluation pipeline to assess ChatGPT's ability as an RS and post-process ChatGPT recommendations to account for these aspects. Through this pipeline, we investigate ChatGPT-3.5 and ChatGPT-4 performance in the recommendation task under the zero-shot condition employing the role-playing prompt. We analyze the model's functionality in three settings: the Top-N Recommendation, the cold-start recommendation, and the re-ranking of a list of recommendations, and in three domains: movies, music, and books. The experiments reveal that ChatGPT exhibits higher accuracy than the baselines on books domain. It also excels in re-ranking and cold-start scenarios while maintaining reasonable beyond-accuracy metrics. Furthermore, we measure the similarity between the ChatGPT recommendations and the other recommenders, providing insights about how ChatGPT could be categorized in the realm of recommender systems. The evaluation pipeline is publicly released for future research. △ Less

Submitted 4 June, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

arXiv:2308.12911 [pdf, other]

On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis

Authors: Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, Tommaso Di Noia

Abstract: Multimodal-aware recommender systems (MRSs) exploit multimodal content (e.g., product images or descriptions) as items' side information to improve recommendation accuracy. While most of such methods rely on factorization models (e.g., MFBPR) as base architecture, it has been shown that MFBPR may be affected by popularity bias, meaning that it inherently tends to boost the recommendation of popula… ▽ More Multimodal-aware recommender systems (MRSs) exploit multimodal content (e.g., product images or descriptions) as items' side information to improve recommendation accuracy. While most of such methods rely on factorization models (e.g., MFBPR) as base architecture, it has been shown that MFBPR may be affected by popularity bias, meaning that it inherently tends to boost the recommendation of popular (i.e., short-head) items at the detriment of niche (i.e., long-tail) items from the catalog. Motivated by this assumption, in this work, we provide one of the first analyses on how multimodality in recommendation could further amplify popularity bias. Concretely, we evaluate the performance of four state-of-the-art MRSs algorithms (i.e., VBPR, MMGCN, GRCN, LATTICE) on three datasets from Amazon by assessing, along with recommendation accuracy metrics, performance measures accounting for the diversity of recommended items and the portion of retrieved niche items. To better investigate this aspect, we decide to study the separate influence of each modality (i.e., visual and textual) on popularity bias in different evaluation dimensions. Results, which demonstrate how the single modality may augment the negative effect of popularity bias, shed light on the importance to provide a more rigorous analysis of the performance of such models. △ Less

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.10778 [pdf, other]

A Topology-aware Analysis of Graph Collaborative Filtering

Authors: Daniele Malitesta, Claudio Pomo, Vito Walter Anelli, Alberto Carlo Maria Mancino, Eugenio Di Sciascio, Tommaso Di Noia

Abstract: The successful integration of graph neural networks into recommender systems (RSs) has led to a novel paradigm in collaborative filtering (CF), graph collaborative filtering (graph CF). By representing user-item data as an undirected, bipartite graph, graph CF utilizes short- and long-range connections to extract collaborative signals that yield more accurate user preferences than traditional CF m… ▽ More The successful integration of graph neural networks into recommender systems (RSs) has led to a novel paradigm in collaborative filtering (CF), graph collaborative filtering (graph CF). By representing user-item data as an undirected, bipartite graph, graph CF utilizes short- and long-range connections to extract collaborative signals that yield more accurate user preferences than traditional CF methods. Although the recent literature highlights the efficacy of various algorithmic strategies in graph CF, the impact of datasets and their topological features on recommendation performance is yet to be studied. To fill this gap, we propose a topology-aware analysis of graph CF. In this study, we (i) take some widely-adopted recommendation datasets and use them to generate a large set of synthetic sub-datasets through two state-of-the-art graph sampling methods, (ii) measure eleven of their classical and topological characteristics, and (iii) estimate the accuracy calculated on the generated sub-datasets considering four popular and recent graph-based RSs (i.e., LightGCN, DGCF, UltraGCN, and SVD-GCN). Finally, the investigation presents an explanatory framework that reveals the linear relationships between characteristics and accuracy measures. The results, statistically validated under different graph sampling settings, confirm the existence of solid dependencies between topological characteristics and accuracy in the graph-based recommendation, offering a new perspective on how to interpret graph CF. △ Less

Submitted 26 November, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.09731 [pdf, other]

ChatGPT-HealthPrompt. Harnessing the Power of XAI in Prompt-Based Healthcare Decision Support using ChatGPT

Authors: Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia

Abstract: This study presents an innovative approach to the application of large language models (LLMs) in clinical decision-making, focusing on OpenAI's ChatGPT. Our approach introduces the use of contextual prompts-strategically designed to include task description, feature description, and crucially, integration of domain knowledge-for high-quality binary classification tasks even in data-scarce scenario… ▽ More This study presents an innovative approach to the application of large language models (LLMs) in clinical decision-making, focusing on OpenAI's ChatGPT. Our approach introduces the use of contextual prompts-strategically designed to include task description, feature description, and crucially, integration of domain knowledge-for high-quality binary classification tasks even in data-scarce scenarios. The novelty of our work lies in the utilization of domain knowledge, obtained from high-performing interpretable ML models, and its seamless incorporation into prompt design. By viewing these ML models as medical experts, we extract key insights on feature importance to aid in decision-making processes. This interplay of domain knowledge and AI holds significant promise in creating a more insightful diagnostic tool. Additionally, our research explores the dynamics of zero-shot and few-shot prompt learning based on LLMs. By comparing the performance of OpenAI's ChatGPT with traditional supervised ML models in different data conditions, we aim to provide insights into the effectiveness of prompt engineering strategies under varied data availability. In essence, this paper bridges the gap between AI and healthcare, proposing a novel methodology for LLMs application in clinical decision support systems. It highlights the transformative potential of effective prompt design, domain knowledge integration, and flexible learning approaches in enhancing automated decision-making. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.00404 [pdf, other]

doi 10.1145/3604915.3609489

Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis

Authors: Vito Walter Anelli, Daniele Malitesta, Claudio Pomo, Alejandro Bellogín, Tommaso Di Noia, Eugenio Di Sciascio

Abstract: The success of graph neural network-based models (GNNs) has significantly advanced recommender systems by effectively modeling users and items as a bipartite, undirected graph. However, many original graph-based works often adopt results from baseline papers without verifying their validity for the specific configuration under analysis. Our work addresses this issue by focusing on the replicabilit… ▽ More The success of graph neural network-based models (GNNs) has significantly advanced recommender systems by effectively modeling users and items as a bipartite, undirected graph. However, many original graph-based works often adopt results from baseline papers without verifying their validity for the specific configuration under analysis. Our work addresses this issue by focusing on the replicability of results. We present a code that successfully replicates results from six popular and recent graph recommendation models (NGCF, DGCF, LightGCN, SGL, UltraGCN, and GFCF) on three common benchmark datasets (Gowalla, Yelp 2018, and Amazon Book). Additionally, we compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. Furthermore, we extend our study to two new datasets (Allrecipes and BookCrossing) that lack established setups in existing literature. As the performance on these datasets differs from the previous benchmarks, we analyze the impact of specific dataset characteristics on recommendation accuracy. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure. The code to reproduce our experiments is available at: https://github.com/sisinflab/Graph-RSs-Reproducibility. △ Less

Submitted 27 May, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: Accepted to RecSys '23 - Reproducility Track

arXiv:2306.17125 [pdf, other]

Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation

Authors: Daniele Malitesta, Giuseppe Gassi, Claudio Pomo, Tommaso Di Noia

Abstract: In multimodal-aware recommendation, the extraction of meaningful multimodal features is at the basis of high-quality recommendations. Generally, each recommendation framework implements its multimodal extraction procedures with specific strategies and tools. This is limiting for two reasons: (i) different extraction strategies do not ease the interdependence among multimodal recommendation framewo… ▽ More In multimodal-aware recommendation, the extraction of meaningful multimodal features is at the basis of high-quality recommendations. Generally, each recommendation framework implements its multimodal extraction procedures with specific strategies and tools. This is limiting for two reasons: (i) different extraction strategies do not ease the interdependence among multimodal recommendation frameworks; thus, they cannot be efficiently and fairly compared; (ii) given the large plethora of pre-trained deep learning models made available by different open source tools, model designers do not have access to shared interfaces to extract features. Motivated by the outlined aspects, we propose \framework, a unified framework for the extraction of multimodal features in recommendation. By integrating three widely-adopted deep learning libraries as backends, namely, TensorFlow, PyTorch, and Transformers, we provide a shared interface to extract and process features where each backend's specific methods are abstracted to the end user. Noteworthy, the extraction pipeline is easily configurable with a YAML-based file where the user can specify, for each modality, the list of models (and their specific backends/parameters) to perform the extraction. Finally, to make \framework accessible to the community, we build a public Docker image equipped with a ready-to-use CUDA environment and propose three demos to test its functionalities for different scenarios and tasks. The GitHub repository and the documentation are accessible at this link: https://github.com/sisinflab/Ducho. △ Less

Submitted 6 September, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.12165 [pdf, other]

Post-hoc Selection of Pareto-Optimal Solutions in Search and Recommendation

Authors: Vincenzo Paparella, Vito Walter Anelli, Franco Maria Nardini, Raffaele Perego, Tommaso Di Noia

Abstract: Information Retrieval (IR) and Recommender Systems (RS) tasks are moving from computing a ranking of final results based on a single metric to multi-objective problems. Solving these problems leads to a set of Pareto-optimal solutions, known as Pareto frontier, in which no objective can be further improved without hurting the others. In principle, all the points on the Pareto frontier are potentia… ▽ More Information Retrieval (IR) and Recommender Systems (RS) tasks are moving from computing a ranking of final results based on a single metric to multi-objective problems. Solving these problems leads to a set of Pareto-optimal solutions, known as Pareto frontier, in which no objective can be further improved without hurting the others. In principle, all the points on the Pareto frontier are potential candidates to represent the best model selected with respect to the combination of two, or more, metrics. To our knowledge, there are no well-recognized strategies to decide which point should be selected on the frontier. In this paper, we propose a novel, post-hoc, theoretically-justified technique, named "Population Distance from Utopia" (PDU), to identify and select the one-best Pareto-optimal solution from the frontier. In detail, PDU analyzes the distribution of the points by investigating how far each point is from its utopia point (the ideal performance for the objectives). The possibility of considering fine-grained utopia points allows PDU to select solutions tailored to individual user preferences, a novel feature we call "calibration". We compare PDU against existing state-of-the-art strategies through extensive experiments on tasks from both IR and RS. Experimental results show that PDU and combined with calibration notably impact the solution selection. Furthermore, the results show that the proposed framework selects a solution in a principled way, irrespective of its position on the frontier, thus overcoming the limits of other strategies. △ Less

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2303.18136 [pdf, other]

Machine-learned Adversarial Attacks against Fault Prediction Systems in Smart Electrical Grids

Authors: Carmelo Ardito, Yashar Deldjoo, Tommaso Di Noia, Eugenio Di Sciascio, Fatemeh Nazary, Giovanni Servedio

Abstract: In smart electrical grids, fault detection tasks may have a high impact on society due to their economic and critical implications. In the recent years, numerous smart grid applications, such as defect detection and load forecasting, have embraced data-driven methodologies. The purpose of this study is to investigate the challenges associated with the security of machine learning (ML) applications… ▽ More In smart electrical grids, fault detection tasks may have a high impact on society due to their economic and critical implications. In the recent years, numerous smart grid applications, such as defect detection and load forecasting, have embraced data-driven methodologies. The purpose of this study is to investigate the challenges associated with the security of machine learning (ML) applications in the smart grid scenario. Indeed, the robustness and security of these data-driven algorithms have not been extensively studied in relation to all power grid applications. We demonstrate first that the deep neural network method used in the smart grid is susceptible to adversarial perturbation. Then, we highlight how studies on fault localization and type classification illustrate the weaknesses of present ML algorithms in smart grids to various adversarial attacks △ Less

Submitted 30 January, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: Accepted in AdvML@KDD'22

arXiv:2301.06908 [pdf, other]

MAFUS: a Framework to predict mortality risk in MAFLD subjects

Authors: Domenico Lofù, Paolo Sorino, Tommaso Colafiglio, Caterina Bonfiglio, Fedelucio Narducci, Tommaso Di Noia, Eugenio Di Sciascio

Abstract: Metabolic (dysfunction) associated fatty liver disease (MAFLD) establishes new criteria for diagnosing fatty liver disease independent of alcohol consumption and concurrent viral hepatitis infection. However, the long-term outcome of MAFLD subjects is sparse. Few articles are focused on mortality in MAFLD subjects, and none investigate how to predict a fatal outcome. In this paper, we propose an a… ▽ More Metabolic (dysfunction) associated fatty liver disease (MAFLD) establishes new criteria for diagnosing fatty liver disease independent of alcohol consumption and concurrent viral hepatitis infection. However, the long-term outcome of MAFLD subjects is sparse. Few articles are focused on mortality in MAFLD subjects, and none investigate how to predict a fatal outcome. In this paper, we propose an artificial intelligence-based framework named MAFUS that physicians can use for predicting mortality in MAFLD subjects. The framework uses data from various anthropometric and biochemical sources based on Machine Learning (ML) algorithms. The framework has been tested on a state-of-the-art dataset on which five ML algorithms are trained. Support Vector Machines resulted in being the best model. Furthermore, an Explainable Artificial Intelligence (XAI) analysis has been performed to understand the SVM diagnostic reasoning and the contribution of each feature to the prediction. The MAFUS framework is easy to apply, and the required parameters are readily available in the dataset. △ Less

Submitted 17 January, 2023; originally announced January 2023.

Comments: 18 pages, 20 figures

arXiv:2209.01621 [pdf, other]

doi 10.1145/3657631

Interactive Question Answering Systems: Literature Review

Authors: Giovanni Maria Biancofiore, Yashar Deldjoo, Tommaso Di Noia, Eugenio Di Sciascio, Fedelucio Narducci

Abstract: Question answering systems are recognized as popular and frequently effective means of information seeking on the web. In such systems, information seekers can receive a concise response to their query by presenting their questions in natural language. Interactive question answering is a recently proposed and increasingly popular solution that resides at the intersection of question answering and… ▽ More Question answering systems are recognized as popular and frequently effective means of information seeking on the web. In such systems, information seekers can receive a concise response to their query by presenting their questions in natural language. Interactive question answering is a recently proposed and increasingly popular solution that resides at the intersection of question answering and dialogue systems. On the one hand, the user can ask questions in normal language and locate the actual response to her inquiry; on the other hand, the system can prolong the question-answering session into a dialogue if there are multiple probable replies, very few, or ambiguities in the initial request. By permitting the user to ask more questions, interactive question answering enables users to dynamically interact with the system and receive more precise results. This survey offers a detailed overview of the interactive question-answering methods that are prevalent in current literature. It begins by explaining the foundational principles of question-answering systems, hence defining new notations and taxonomies to combine all identified works inside a unified framework. The reviewed published work on interactive question-answering systems is then presented and examined in terms of its proposed methodology, evaluation approaches, and dataset/application domain. We also describe trends surrounding specific tasks and issues raised by the community, so shedding light on the future interests of scholars. Our work is further supported by a GitHub page with a synthesis of all the major topics covered in this literature study. https://sisinflab.github.io/interactive-question-answering-systems-survey/ △ Less

Submitted 19 April, 2024; v1 submitted 4 September, 2022; originally announced September 2022.

Comments: 37 pages, 2 Figures, 6 Tables, just accepted

ACM Class: A.1; H.3.3; I.2.7

arXiv:2207.06025 [pdf, other]

doi 10.1109/OJVT.2023.3333676

URANUS: Radio Frequency Tracking, Classification and Identification of Unmanned Aircraft Vehicles

Authors: Domenico Lofù, Pietro Di Gennaro, Pietro Tedeschi, Tommaso Di Noia, Eugenio Di Sciascio

Abstract: Safety and security issues for Critical Infrastructures are growing as attackers adopt drones as an attack vector flying in sensitive airspaces, such as airports, military bases, city centers, and crowded places. Despite the use of UAVs for logistics, ship** recreation activities, and commercial applications, their usage poses severe concerns to operators due to the violations and the invasions… ▽ More Safety and security issues for Critical Infrastructures are growing as attackers adopt drones as an attack vector flying in sensitive airspaces, such as airports, military bases, city centers, and crowded places. Despite the use of UAVs for logistics, ship** recreation activities, and commercial applications, their usage poses severe concerns to operators due to the violations and the invasions of the restricted airspaces. A cost-effective and real-time framework is needed to detect the presence of drones in such cases. In this contribution, we propose an efficient radio frequency-based detection framework called URANUS. We leverage real-time data provided by the Radio Frequency/Direction Finding system, and radars in order to detect, classify and identify drones (multi-copter and fixed-wings) invading no-drone zones. We adopt a Multilayer Perceptron neural network to identify and classify UAVs in real-time, with $90$% accuracy. For the tracking task, we use a Random Forest model to predict the position of a drone with an MSE $\approx0.29$, MAE $\approx0.04$, and $R^2\approx 0.93$. Furthermore, coordinate regression is performed using Universal Transverse Mercator coordinates to ensure high accuracy. Our analysis shows that URANUS is an ideal framework for identifying, classifying, and tracking UAVs that most Critical Infrastructure operators can adopt. △ Less

Submitted 15 November, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

arXiv:2203.01155 [pdf, ps, other]

doi 10.1145/3503252.3531292

Top-N Recommendation Algorithms: A Quest for the State-of-the-Art

Authors: Vito Walter Anelli, Alejandro Bellogín, Tommaso Di Noia, Dietmar Jannach, Claudio Pomo

Abstract: Research on recommender systems algorithms, like other areas of applied machine learning, is largely dominated by efforts to improve the state-of-the-art, typically in terms of accuracy measures. Several recent research works however indicate that the reported improvements over the years sometimes "don't add up", and that methods that were published several years ago often outperform the latest mo… ▽ More Research on recommender systems algorithms, like other areas of applied machine learning, is largely dominated by efforts to improve the state-of-the-art, typically in terms of accuracy measures. Several recent research works however indicate that the reported improvements over the years sometimes "don't add up", and that methods that were published several years ago often outperform the latest models when evaluated independently. Different factors contribute to this phenomenon, including that some researchers probably often only fine-tune their own models but not the baselines. In this paper, we report the outcomes of an in-depth, systematic, and reproducible comparison of ten collaborative filtering algorithms - covering both traditional and neural models - on several common performance measures on three datasets which are frequently used for evaluation in the recent literature. Our results show that there is no consistent winner across datasets and metrics for the examined top-n recommendation task. Moreover, we find that for none of the accuracy measurements any of the considered neural models led to the best performance. Regarding the performance ranking of algorithms across the measurements, we found that linear models, nearest-neighbor methods, and traditional matrix factorization consistently perform well for the evaluated modest-sized, but commonly-used datasets. Our work shall therefore serve as a guideline for researchers regarding existing baselines to consider in future performance comparisons. Moreover, by providing a set of fine-tuned baseline models for different datasets, we hope that our work helps to establish a common understanding of the state-of-the-art for top-n recommendation tasks. △ Less

Submitted 13 May, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.02757 [pdf, other]

A Review of Modern Fashion Recommender Systems

Authors: Yashar Deldjoo, Fatemeh Nazary, Arnau Ramisa, Julian Mcauley, Giovanni Pellegrini, Alejandro Bellogin, Tommaso Di Noia

Abstract: The textile and apparel industries have grown tremendously over the last few years. Customers no longer have to visit many stores, stand in long queues, or try on garments in dressing rooms as millions of products are now available in online catalogs. However, given the plethora of options available, an effective recommendation system is necessary to properly sort, order, and communicate relevant… ▽ More The textile and apparel industries have grown tremendously over the last few years. Customers no longer have to visit many stores, stand in long queues, or try on garments in dressing rooms as millions of products are now available in online catalogs. However, given the plethora of options available, an effective recommendation system is necessary to properly sort, order, and communicate relevant product material or information to users. Effective fashion RS can have a noticeable impact on billions of customers' shop** experiences and increase sales and revenues on the provider side. The goal of this survey is to provide a review of recommender systems that operate in the specific vertical domain of garment and fashion products. We have identified the most pressing challenges in fashion RS research and created a taxonomy that categorizes the literature according to the objective they are trying to accomplish (e.g., item or outfit recommendation, size recommendation, explainability, among others) and type of side-information (users, items, context). We have also identified the most important evaluation goals and perspectives (outfit generation, outfit recommendation, pairing recommendation, and fill-in-the-blank outfit compatibility prediction) and the most commonly used datasets and evaluation metrics. △ Less

Submitted 12 September, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

Comments: 38 pages, 2 figures

arXiv:2111.05578 [pdf, ps, other]

Conversational Recommendation: Theoretical Model and Complexity Analysis

Authors: Tommaso Di Noia, Francesco Donini, Dietmar Jannach, Fedelucio Narducci, Claudio Pomo

Abstract: Recommender systems are software applications that help users find items of interest in situations of information overload in a personalized way, using knowledge about the needs and preferences of individual users. In conversational recommendation approaches, these needs and preferences are acquired by the system in an interactive, multi-turn dialog. A common approach in the literature to drive su… ▽ More Recommender systems are software applications that help users find items of interest in situations of information overload in a personalized way, using knowledge about the needs and preferences of individual users. In conversational recommendation approaches, these needs and preferences are acquired by the system in an interactive, multi-turn dialog. A common approach in the literature to drive such dialogs is to incrementally ask users about their preferences regarding desired and undesired item features or regarding individual items. A central research goal in this context is efficiency, evaluated with respect to the number of required interactions until a satisfying item is found. This is usually accomplished by making inferences about the best next question to ask to the user. Today, research on dialog efficiency is almost entirely empirical, aiming to demonstrate, for example, that one strategy for selecting questions is better than another one in a given application. With this work, we complement empirical research with a theoretical, domain-independent model of conversational recommendation. This model, which is designed to cover a range of application scenarios, allows us to investigate the efficiency of conversational approaches in a formal way, in particular with respect to the computational complexity of devising optimal interaction strategies. Through such a theoretical analysis we show that finding an efficient conversational strategy is NP-hard, and in PSPACE in general, but for particular kinds of catalogs the upper bound lowers to POLYLOGSPACE. From a practical point of view, this result implies that catalog characteristics can strongly influence the efficiency of individual conversational strategies and should therefore be considered when designing new strategies. A preliminary empirical analysis on datasets derived from a real-world one aligns with our findings. △ Less

Submitted 9 February, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

arXiv:2109.00818 [pdf, other]

Adherence and Constancy in LIME-RS Explanations for Recommendation

Authors: Vito Walter Anelli, Alejandro Bellogín, Tommaso Di Noia, Francesco Maria Donini, Vincenzo Paparella, Claudio Pomo

Abstract: Explainable Recommendation has attracted a lot of attention due to a renewed interest in explainable artificial intelligence. In particular, post-hoc approaches have proved to be the most easily applicable ones to increasingly complex recommendation models, which are then treated as black-boxes. The most recent literature has shown that for post-hoc explanations based on local surrogate models, th… ▽ More Explainable Recommendation has attracted a lot of attention due to a renewed interest in explainable artificial intelligence. In particular, post-hoc approaches have proved to be the most easily applicable ones to increasingly complex recommendation models, which are then treated as black-boxes. The most recent literature has shown that for post-hoc explanations based on local surrogate models, there are problems related to the robustness of the approach itself. This consideration becomes even more relevant in human-related tasks like recommendation. The explanation also has the arduous task of enhancing increasingly relevant aspects of user experience such as transparency or trustworthiness. This paper aims to show how the characteristics of a classical post-hoc model based on surrogates is strongly model-dependent and does not prove to be accountable for the explanations generated. △ Less

Submitted 8 October, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: accepted at KaRS Workshop @RecSys 2021

arXiv:2107.14290 [pdf, other]

doi 10.1145/3460231.3474243

Sparse Feature Factorization for Recommender Systems with Knowledge Graphs

Authors: Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio, Antonio Ferrara, Alberto Carlo Maria Mancino

Abstract: Deep Learning and factorization-based collaborative filtering recommendation models have undoubtedly dominated the scene of recommender systems in recent years. However, despite their outstanding performance, these methods require a training time proportional to the size of the embeddings and it further increases when also side information is considered for the computation of the recommendation li… ▽ More Deep Learning and factorization-based collaborative filtering recommendation models have undoubtedly dominated the scene of recommender systems in recent years. However, despite their outstanding performance, these methods require a training time proportional to the size of the embeddings and it further increases when also side information is considered for the computation of the recommendation list. In fact, in these cases we have that with a large number of high-quality features, the resulting models are more complex and difficult to train. This paper addresses this problem by presenting KGFlex: a sparse factorization approach that grants an even greater degree of expressiveness. To achieve this result, KGFlex analyzes the historical data to understand the dimensions the user decisions depend on (e.g., movie direction, musical genre, nationality of book writer). KGFlex represents each item feature as an embedding and it models user-item interactions as a factorized entropy-driven combination of the item attributes relevant to the user. KGFlex facilitates the training process by letting users update only those relevant features on which they base their decisions. In other words, the user-item prediction is mediated by the user's personal view that considers only relevant features. An extensive experimental evaluation shows the approach's effectiveness, considering the recommendation results' accuracy, diversity, and induced bias. The public implementation of KGFlex is available at https://split.to/kgflex. △ Less

Submitted 29 July, 2021; originally announced July 2021.

arXiv:2107.13876 [pdf, other]

Understanding the Effects of Adversarial Personalized Ranking Optimization Method on Recommendation Quality

Authors: Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra

Abstract: Recommender systems (RSs) employ user-item feedback, e.g., ratings, to match customers to personalized lists of products. Approaches to top-k recommendation mainly rely on Learning-To-Rank algorithms and, among them, the most widely adopted is Bayesian Personalized Ranking (BPR), which bases on a pair-wise optimization approach. Recently, BPR has been found vulnerable against adversarial perturbat… ▽ More Recommender systems (RSs) employ user-item feedback, e.g., ratings, to match customers to personalized lists of products. Approaches to top-k recommendation mainly rely on Learning-To-Rank algorithms and, among them, the most widely adopted is Bayesian Personalized Ranking (BPR), which bases on a pair-wise optimization approach. Recently, BPR has been found vulnerable against adversarial perturbations of its model parameters. Adversarial Personalized Ranking (APR) mitigates this issue by robustifying BPR via an adversarial training procedure. The empirical improvements of APR's accuracy performance on BPR have led to its wide use in several recommender models. However, a key overlooked aspect has been the beyond-accuracy performance of APR, i.e., novelty, coverage, and amplification of popularity bias, considering that recent results suggest that BPR, the building block of APR, is sensitive to the intensification of biases and reduction of recommendation novelty. In this work, we model the learning characteristics of the BPR and APR optimization frameworks to give mathematical evidence that, when the feedback data have a tailed distribution, APR amplifies the popularity bias more than BPR due to an unbalanced number of received positive updates from short-head items. Using matrix factorization (MF), we empirically validate the theoretical results by performing preliminary experiments on two public datasets to compare BPR-MF and APR-MF performance on accuracy and beyond-accuracy metrics. The experimental results consistently show the degradation of novelty and coverage measures and a worrying amplification of bias. △ Less

Submitted 29 July, 2021; originally announced July 2021.

Comments: 5 pages

arXiv:2107.13472 [pdf, other]

doi 10.1145/3460231.3475944

Reenvisioning Collaborative Filtering vs Matrix Factorization

Authors: Vito Walter Anelli, Alejandro Bellogín, Tommaso Di Noia, Claudio Pomo

Abstract: Collaborative filtering models based on matrix factorization and learned similarities using Artificial Neural Networks (ANNs) have gained significant attention in recent years. This is, in part, because ANNs have demonstrated good results in a wide variety of recommendation tasks. The introduction of ANNs within the recommendation ecosystem has been recently questioned, raising several comparisons… ▽ More Collaborative filtering models based on matrix factorization and learned similarities using Artificial Neural Networks (ANNs) have gained significant attention in recent years. This is, in part, because ANNs have demonstrated good results in a wide variety of recommendation tasks. The introduction of ANNs within the recommendation ecosystem has been recently questioned, raising several comparisons in terms of efficiency and effectiveness. One aspect most of these comparisons have in common is their focus on accuracy, neglecting other evaluation dimensions important for the recommendation, such as novelty, diversity, or accounting for biases. We replicate experiments from three papers that compare Neural Collaborative Filtering (NCF) and Matrix Factorization (MF), to extend the analysis to other evaluation dimensions. Our contribution shows that the experiments are entirely reproducible, and we extend the study including other accuracy metrics and two statistical hypothesis tests. We investigated the Diversity and Novelty of the recommendations, showing that MF provides a better accuracy also on the long tail, although NCF provides a better item coverage and more diversified recommendations. We discuss the bias effect generated by the tested methods. They show a relatively small bias, but other recommendation baselines, with competitive accuracy performance, consistently show to be less affected by this issue. This is the first work, to the best of our knowledge, where several evaluation dimensions have been explored for an array of SOTA algorithms covering recent adaptations of ANNs and MF. Hence, we show the potential these techniques may have on beyond-accuracy evaluation while analyzing the effect on reproducibility these complementary dimensions may spark. Available at github.com/sisinflab/Reenvisioning-the-comparison-between-Neural-Collaborative-Filtering-and-Matrix-Factorization △ Less

Submitted 28 July, 2021; originally announced July 2021.

Comments: Preprint, Accepted for publication at ACM RecSys 2021,9 pages

ACM Class: H.3.3

arXiv:2103.02590 [pdf, other]

doi 10.1145/3404835.3463245

Elliot: a Comprehensive and Rigorous Framework for Reproducible Recommender Systems Evaluation

Authors: Vito Walter Anelli, Alejandro Bellogín, Antonio Ferrara, Daniele Malitesta, Felice Antonio Merra, Claudio Pomo, Francesco Maria Donini, Tommaso Di Noia

Abstract: Recommender Systems have shown to be an effective way to alleviate the over-choice problem and provide accurate and tailored recommendations. However, the impressive number of proposed recommendation algorithms, splitting strategies, evaluation protocols, metrics, and tasks, has made rigorous experimental evaluation particularly challenging. Puzzled and frustrated by the continuous recreation of a… ▽ More Recommender Systems have shown to be an effective way to alleviate the over-choice problem and provide accurate and tailored recommendations. However, the impressive number of proposed recommendation algorithms, splitting strategies, evaluation protocols, metrics, and tasks, has made rigorous experimental evaluation particularly challenging. Puzzled and frustrated by the continuous recreation of appropriate evaluation benchmarks, experimental pipelines, hyperparameter optimization, and evaluation procedures, we have developed an exhaustive framework to address such needs. Elliot is a comprehensive recommendation framework that aims to run and reproduce an entire experimental pipeline by processing a simple configuration file. The framework loads, filters, and splits the data considering a vast set of strategies (13 splitting methods and 8 filtering approaches, from temporal training-test splitting to nested K-folds Cross-Validation). Elliot optimizes hyperparameters (51 strategies) for several recommendation algorithms (50), selects the best models, compares them with the baselines providing intra-model statistics, computes metrics (36) spanning from accuracy to beyond-accuracy, bias, and fairness, and conducts statistical analysis (Wilcoxon and Paired t-test). The aim is to provide the researchers with a tool to ease (and make them reproducible) all the experimental evaluation phases, from data reading to results collection. Elliot is available on GitHub (https://github.com/sisinflab/elliot). △ Less

Submitted 28 July, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

Comments: Published at SIGIR 2021, 10 pages, 1 figure

Journal ref: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021

arXiv:2012.11328 [pdf, other]

FedeRank: User Controlled Feedback with Federated Recommender Systems

Authors: Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Antonio Ferrara, Fedelucio Narducci

Abstract: Recommender systems have shown to be a successful representative of how data availability can ease our everyday digital life. However, data privacy is one of the most prominent concerns in the digital era. After several data breaches and privacy scandals, the users are now worried about sharing their data. In the last decade, Federated Learning has emerged as a new privacy-preserving distributed m… ▽ More Recommender systems have shown to be a successful representative of how data availability can ease our everyday digital life. However, data privacy is one of the most prominent concerns in the digital era. After several data breaches and privacy scandals, the users are now worried about sharing their data. In the last decade, Federated Learning has emerged as a new privacy-preserving distributed machine learning paradigm. It works by processing data on the user device without collecting data in a central repository. We present FedeRank (https://split.to/federank), a federated recommendation algorithm. The system learns a personal factorization model onto every device. The training of the model is a synchronous process between the central server and the federated clients. FedeRank takes care of computing recommendations in a distributed fashion and allows users to control the portion of data they want to share. By comparing with state-of-the-art algorithms, extensive experiments show the effectiveness of FedeRank in terms of recommendation accuracy, even with a small portion of shared user data. Further analysis of the recommendation lists' diversity and novelty guarantees the suitability of the algorithm in real production environments. △ Less

Submitted 20 January, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: Accepted for publishing at 43rd European Conference on Information Retrieval (ECIR 2021)

arXiv:2010.01329 [pdf, other]

Multi-Step Adversarial Perturbations on Recommender Systems Embeddings

Authors: Vito Walter Anelli, Alejandro Bellogín, Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra

Abstract: Recommender systems (RSs) have attained exceptional performance in learning users' preferences and hel** them in finding the most suitable products. Recent advances in adversarial machine learning (AML) in the computer vision domain have raised interests in the security of state-of-the-art model-based recommenders. Recently, worrying deterioration of recommendation accuracy has been acknowledged… ▽ More Recommender systems (RSs) have attained exceptional performance in learning users' preferences and hel** them in finding the most suitable products. Recent advances in adversarial machine learning (AML) in the computer vision domain have raised interests in the security of state-of-the-art model-based recommenders. Recently, worrying deterioration of recommendation accuracy has been acknowledged on several state-of-the-art model-based recommenders (e.g., BPR-MF) when machine-learned adversarial perturbations contaminate model parameters. However, while the single-step fast gradient sign method (FGSM) is the most explored perturbation strategy, multi-step (iterative) perturbation strategies, that demonstrated higher efficacy in the computer vision domain, have been highly under-researched in recommendation tasks. In this work, inspired by the basic iterative method (BIM) and the projected gradient descent (PGD) strategies proposed in the CV domain, we adapt the multi-step strategies for the item recommendation task to study the possible weaknesses of embedding-based recommender models under minimal adversarial perturbations. Letting the magnitude of the perturbation be fixed, we illustrate the highest efficacy of the multi-step perturbation compared to the single-step one with extensive empirical evaluation on two widely adopted recommender datasets. Furthermore, we study the impact of structural dataset characteristics, i.e., sparsity, density, and size, on the performance degradation issued by presented perturbations to support RS designer in interpreting recommendation performance variation due to minimal variations of model parameters. Our implementation and datasets are available at https://anonymous.4open.science/r/9f27f909-93d5-4016-b01c-8976b8c14bc5/. △ Less

Submitted 3 October, 2020; originally announced October 2020.

Comments: 10 pages, 3 figures, 1 table

arXiv:2010.00984 [pdf, other]

An Empirical Study of DNNs Robustification Inefficacy in Protecting Visual Recommenders

Authors: Vito Walter Anelli, Tommaso Di Noia, Daniele Malitesta, Felice Antonio Merra

Abstract: Visual-based recommender systems (VRSs) enhance recommendation performance by integrating users' feedback with the visual features of product images extracted from a deep neural network (DNN). Recently, human-imperceptible images perturbations, defined \textit{adversarial attacks}, have been demonstrated to alter the VRSs recommendation performance, e.g., pushing/nuking category of products. Howev… ▽ More Visual-based recommender systems (VRSs) enhance recommendation performance by integrating users' feedback with the visual features of product images extracted from a deep neural network (DNN). Recently, human-imperceptible images perturbations, defined \textit{adversarial attacks}, have been demonstrated to alter the VRSs recommendation performance, e.g., pushing/nuking category of products. However, since adversarial training techniques have proven to successfully robustify DNNs in preserving classification accuracy, to the best of our knowledge, two important questions have not been investigated yet: 1) How well can these defensive mechanisms protect the VRSs performance? 2) What are the reasons behind ineffective/effective defenses? To answer these questions, we define a set of defense and attack settings, as well as recommender models, to empirically investigate the efficacy of defensive mechanisms. The results indicate alarming risks in protecting a VRS through the DNN robustification. Our experiments shed light on the importance of visual features in very effective attack scenarios. Given the financial impact of VRSs on many companies, we believe this work might rise the need to investigate how to successfully protect visual-based recommenders. Source code and data are available at https://anonymous.4open.science/r/868f87ca-c8a4-41ba-9af9-20c41de33029/. △ Less

Submitted 2 October, 2020; originally announced October 2020.

Comments: 9 pages, 1 figure

arXiv:2008.07192 [pdf, other]

doi 10.1145/3412841.3442010

How to Put Users in Control of their Data in Federated Top-N Recommendation with Learning to Rank

Authors: Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Antonio Ferrara, Fedelucio Narducci

Abstract: Recommendation services are extensively adopted in several user-centered applications as a tool to alleviate the information overload problem and help users in orienteering in a vast space of possible choices. In such scenarios, data ownership is a crucial concern since users may not be willing to share their sensitive preferences (e.g., visited locations) with a central server. Unfortunately, dat… ▽ More Recommendation services are extensively adopted in several user-centered applications as a tool to alleviate the information overload problem and help users in orienteering in a vast space of possible choices. In such scenarios, data ownership is a crucial concern since users may not be willing to share their sensitive preferences (e.g., visited locations) with a central server. Unfortunately, data harvesting and collection is at the basis of modern, state-of-the-art approaches to recommendation. To address this issue, we present FPL, an architecture in which users collaborate in training a central factorization model while controlling the amount of sensitive data leaving their devices. The proposed approach implements pair-wise learning-to-rank optimization by following the Federated Learning principles, originally conceived to mitigate the privacy risks of traditional machine learning. The public implementation is available at https://split.to/sisinflab-fpl. △ Less

Submitted 22 December, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: Accepted at the 36th ACM/SIGAPP Symposium on Applied Computing (SAC '21)

arXiv:2007.08893 [pdf, other]

Prioritized Multi-Criteria Federated Learning

Authors: Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Antonio Ferrara

Abstract: In Machine Learning scenarios, privacy is a crucial concern when models have to be trained with private data coming from users of a service, such as a recommender system, a location-based mobile service, a mobile phone text messaging service providing next word prediction, or a face image classification system. The main issue is that, often, data are collected, transferred, and processed by third… ▽ More In Machine Learning scenarios, privacy is a crucial concern when models have to be trained with private data coming from users of a service, such as a recommender system, a location-based mobile service, a mobile phone text messaging service providing next word prediction, or a face image classification system. The main issue is that, often, data are collected, transferred, and processed by third parties. These transactions violate new regulations, such as GDPR. Furthermore, users usually are not willing to share private data such as their visited locations, the text messages they wrote, or the photo they took with a third party. On the other hand, users appreciate services that work based on their behaviors and preferences. In order to address these issues, Federated Learning (FL) has been recently proposed as a means to build ML models based on private datasets distributed over a large number of clients, while preventing data leakage. A federation of users is asked to train a same global model on their private data, while a central coordinating server receives locally computed updates by clients and aggregate them to obtain a better global model, without the need to use clients' actual data. In this work, we extend the FL approach by pushing forward the state-of-the-art approaches in the aggregation step of FL, which we deem crucial for building a high-quality global model. Specifically, we propose an approach that takes into account a suite of client-specific criteria that constitute the basis for assigning a score to each client based on a priority of criteria defined by the service provider. Extensive experiments on two publicly available datasets indicate the merits of the proposed approach compared to standard FL baseline. △ Less

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2005.10322 [pdf, other]

A survey on Adversarial Recommender Systems: from Attack/Defense strategies to Generative Adversarial Networks

Authors: Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra

Abstract: Latent-factor models (LFM) based on collaborative filtering (CF), such as matrix factorization (MF) and deep CF methods, are widely used in modern recommender systems (RS) due to their excellent performance and recommendation accuracy. However, success has been accompanied with a major new arising challenge: many applications of machine learning (ML) are adversarial in nature. In recent years, it… ▽ More Latent-factor models (LFM) based on collaborative filtering (CF), such as matrix factorization (MF) and deep CF methods, are widely used in modern recommender systems (RS) due to their excellent performance and recommendation accuracy. However, success has been accompanied with a major new arising challenge: many applications of machine learning (ML) are adversarial in nature. In recent years, it has been shown that these methods are vulnerable to adversarial examples, i.e., subtle but non-random perturbations designed to force recommendation models to produce erroneous outputs. The goal of this survey is two-fold: (i) to present recent advances on adversarial machine learning (AML) for the security of RS (i.e., attacking and defense recommendation models), (ii) to show another successful application of AML in generative adversarial networks (GANs) for generative applications, thanks to their ability for learning (high-dimensional) data distributions. In this survey, we provide an exhaustive literature review of 74 articles published in major RS and ML journals and conferences. This review serves as a reference for the RS community, working on the security of RS or on generative models using GANs to improve their quality. △ Less

Submitted 10 November, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

Comments: 37 pages, submitted to journal

ACM Class: H.3.3

arXiv:1909.05038 [pdf, other]

How to make latent factors interpretable by feeding Factorization machines with knowledge graphs

Authors: Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio, Azzurra Ragone, Joseph Trotta

Abstract: Model-based approaches to recommendation can recommend items with a very high level of accuracy. Unfortunately, even when the model embeds content-based information, if we move to a latent space we miss references to the actual semantics of recommended items. Consequently, this makes non-trivial the interpretation of a recommendation process. In this paper, we show how to initialize latent factors… ▽ More Model-based approaches to recommendation can recommend items with a very high level of accuracy. Unfortunately, even when the model embeds content-based information, if we move to a latent space we miss references to the actual semantics of recommended items. Consequently, this makes non-trivial the interpretation of a recommendation process. In this paper, we show how to initialize latent factors in Factorization Machines by using semantic features coming from a knowledge graph in order to train an interpretable model. With our model, semantic features are injected into the learning process to retain the original informativeness of the items available in the dataset. The accuracy and effectiveness of the trained model have been tested using two well-known recommender systems datasets. By relying on the information encoded in the original knowledge graph, we have also evaluated the semantic accuracy and robustness for the knowledge-aware interpretability of the final model. △ Less

Submitted 11 September, 2019; originally announced September 2019.

Comments: Accepted as full paper at ISWC 2019, 17 pages

arXiv:1909.02523 [pdf, other]

doi 10.1145/3298689.3347010

On the discriminative power of Hyper-parameters in Cross-Validation and how to choose them

Authors: Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio, Claudio Pomo, Azzurra Ragone

Abstract: Hyper-parameters tuning is a crucial task to make a model perform at its best. However, despite the well-established methodologies, some aspects of the tuning remain unexplored. As an example, it may affect not just accuracy but also novelty as well as it may depend on the adopted dataset. Moreover, sometimes it could be sufficient to concentrate on a single parameter only (or a few of them) inste… ▽ More Hyper-parameters tuning is a crucial task to make a model perform at its best. However, despite the well-established methodologies, some aspects of the tuning remain unexplored. As an example, it may affect not just accuracy but also novelty as well as it may depend on the adopted dataset. Moreover, sometimes it could be sufficient to concentrate on a single parameter only (or a few of them) instead of their overall set. In this paper we report on our investigation on hyper-parameters tuning by performing an extensive 10-Folds Cross-Validation on MovieLens and Amazon Movies for three well-known baselines: User-kNN, Item-kNN, BPR-MF. We adopted a grid search strategy considering approximately 15 values for each parameter, and we then evaluated each combination of parameters in terms of accuracy and novelty. We investigated the discriminative power of nDCG, Precision, Recall, MRR, EFD, EPC, and, finally, we analyzed the role of parameters on model evaluation for Cross-Validation. △ Less

Submitted 5 September, 2019; originally announced September 2019.

Comments: 5 pages RecSys 2019

arXiv:1908.07968 [pdf, other]

Assessing the Impact of a User-Item Collaborative Attack on Class of Users

Authors: Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra

Abstract: Collaborative Filtering (CF) models lie at the core of most recommendation systems due to their state-of-the-art accuracy. They are commonly adopted in e-commerce and online services for their impact on sales volume and/or diversity, and their impact on companies' outcome. However, CF models are only as good as the interaction data they work with. As these models rely on outside sources of informa… ▽ More Collaborative Filtering (CF) models lie at the core of most recommendation systems due to their state-of-the-art accuracy. They are commonly adopted in e-commerce and online services for their impact on sales volume and/or diversity, and their impact on companies' outcome. However, CF models are only as good as the interaction data they work with. As these models rely on outside sources of information, counterfeit data such as user ratings or reviews can be injected by attackers to manipulate the underlying data and alter the impact of resulting recommendations, thus implementing a so-called shilling attack. While previous works have focused on evaluating shilling attack strategies from a global perspective paying particular attention to the effect of the size of attacks and attacker's knowledge, in this work we explore the effectiveness of shilling attacks under novel aspects. First, we investigate the effect of attack strategies crafted on a target user in order to push the recommendation of a low-ranking item to a higher position, referred to as user-item attack. Second, we evaluate the effectiveness of attacks in altering the impact of different CF models by contemplating the class of the target user, from the perspective of the richness of her profile (i.e., cold v.s. warm user). Finally, similar to previous work we contemplate the size of attack (i.e., the amount of fake profiles injected) in examining their success. The results of experiments on two widely used datasets in business and movie domains, namely Yelp and MovieLens, suggest that warm and cold users exhibit contrasting behaviors in datasets with different characteristics. △ Less

Submitted 21 August, 2019; originally announced August 2019.

Comments: 5 pages, RecSys2019, The 1st Workshop on the Impact of Recommender Systems with ACM RecSys 2019

arXiv:1908.07420 [pdf, ps, other]

Towards Effective Device-Aware Federated Learning

Authors: Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Antonio Ferrara

Abstract: With the wealth of information produced by social networks, smartphones, medical or financial applications, speculations have been raised about the sensitivity of such data in terms of users' personal privacy and data security. To address the above issues, Federated Learning (FL) has been recently proposed as a means to leave data and computational resources distributed over a large number of node… ▽ More With the wealth of information produced by social networks, smartphones, medical or financial applications, speculations have been raised about the sensitivity of such data in terms of users' personal privacy and data security. To address the above issues, Federated Learning (FL) has been recently proposed as a means to leave data and computational resources distributed over a large number of nodes (clients) where a central coordinating server aggregates only locally computed updates without knowing the original data. In this work, we extend the FL framework by pushing forward the state the art in the field on several dimensions: (i) unlike the original FedAvg approach relying solely on single criteria (i.e., local dataset size), a suite of domain- and client-specific criteria constitute the basis to compute each local client's contribution, (ii) the multi-criteria contribution of each device is computed in a prioritized fashion by leveraging a priority-aware aggregation operator used in the field of information retrieval, and (iii) a mechanism is proposed for online-adjustment of the aggregation operator parameters via a local search strategy with backtracking. Extensive experiments on a publicly available dataset indicate the merits of the proposed approach compared to standard FedAvg baseline. △ Less

Submitted 20 August, 2019; originally announced August 2019.

Comments: 12 pages, AIIA 2019, 18th International Conference of the Italian Association for Artificial Intelligence

arXiv:1908.06708 [pdf, ps, other]

Recommender Systems Fairness Evaluation via Generalized Cross Entropy

Authors: Yashar Deldjoo, Vito Walter Anelli, Hamed Zamani, Alejandro Bellogin, Tommaso Di Noia

Abstract: Fairness in recommender systems has been considered with respect to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue in a multistakeholder setting). Regardless, the concept has been commonly interpreted as some form of equality -- i.e., the degree to which the system is meeting the information needs of all its users in an equal sense. In this paper, we argue that fairness… ▽ More Fairness in recommender systems has been considered with respect to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue in a multistakeholder setting). Regardless, the concept has been commonly interpreted as some form of equality -- i.e., the degree to which the system is meeting the information needs of all its users in an equal sense. In this paper, we argue that fairness in recommender systems does not necessarily imply equality, but instead it should consider a distribution of resources based on merits and needs. We present a probabilistic framework based on generalized cross entropy to evaluate fairness of recommender systems under this perspective, where we show that the proposed framework is flexible and explanatory by allowing to incorporate domain knowledge (through an ideal fair distribution) that can help to understand which item or user aspects a recommendation algorithm is over- or under-representing. Results on two real-world datasets show the merits of the proposed evaluation framework both in terms of user and item fairness. △ Less

Submitted 19 August, 2019; originally announced August 2019.

arXiv:1807.06300 [pdf, other]

Knowledge-aware Autoencoders for Explainable Recommender Sytems

Authors: Vito Bellini, Angelo Schiavone, Tommaso Di Noia, Azzurra Ragone, Eugenio Di Sciascio

Abstract: Recommender Systems have been widely used to help users in finding what they are looking for thus tackling the information overload problem. After several years of research and industrial findings looking after better algorithms to improve accuracy and diversity metrics, explanation services for recommendation are gaining momentum as a tool to provide a human-understandable feedback to results com… ▽ More Recommender Systems have been widely used to help users in finding what they are looking for thus tackling the information overload problem. After several years of research and industrial findings looking after better algorithms to improve accuracy and diversity metrics, explanation services for recommendation are gaining momentum as a tool to provide a human-understandable feedback to results computed, in most of the cases, by black-box machine learning techniques. As a matter of fact, explanations may guarantee users satisfaction, trust, and loyalty in a system. In this paper, we evaluate how different information encoded in a Knowledge Graph are perceived by users when they are adopted to show them an explanation. More precisely, we compare how the use of categorical information, factual one or a mixture of them both in building explanations, affect explanatory criteria for a recommender system. Experimental results are validated through an A/B testing platform which uses a recommendation engine based on a Semantics-Aware Autoencoder to build users profiles which are in turn exploited to compute recommendation lists and to provide an explanation. △ Less

Submitted 17 July, 2018; originally announced July 2018.

arXiv:1807.05006 [pdf, other]

Computing recommendations via a Knowledge Graph-aware Autoencoder

Authors: Vito Bellini, Angelo Schiavone, Tommaso Di Noia, Azzurra Ragone, Eugenio Di Sciascio

Abstract: In the last years, deep learning has shown to be a game-changing technology in artificial intelligence thanks to the numerous successes it reached in diverse application fields. Among others, the use of deep learning for the recommendation problem, although new, looks quite promising due to its positive performances in terms of accuracy of recommendation results. In a recommendation setting, in or… ▽ More In the last years, deep learning has shown to be a game-changing technology in artificial intelligence thanks to the numerous successes it reached in diverse application fields. Among others, the use of deep learning for the recommendation problem, although new, looks quite promising due to its positive performances in terms of accuracy of recommendation results. In a recommendation setting, in order to predict user ratings on unknown items a possible configuration of a deep neural network is that of autoencoders tipically used to produce a lower dimensionality representation of the original data. In this paper we present KG-AUTOENCODER, an autoencoder that bases the structure of its neural network on the semanticsaware topology of a knowledge graph thus providing a label for neurons in the hidden layer that are eventually used to build a user profile and then compute recommendations. We show the effectiveness of KG-AUTOENCODER in terms of accuracy, diversity and novelty by comparing with state of the art recommendation algorithms. △ Less

Submitted 13 July, 2018; originally announced July 2018.

arXiv:1807.04207 [pdf, other]

doi 10.1145/3297280.3297360

The importance of being dissimilar in Recommendation

Authors: Vito Walter Anelli, Joseph Trotta, Tommaso Di Noia, Eugenio Di Sciascio, Azzurra Ragone

Abstract: Similarity measures play a fundamental role in memory-based nearest neighbors approaches. They recommend items to a user based on the similarity of either items or users in a neighborhood. In this paper we argue that, although it keeps a leading importance in computing recommendations, similarity between users or items should be paired with a value of dissimilarity (computed not just as the comple… ▽ More Similarity measures play a fundamental role in memory-based nearest neighbors approaches. They recommend items to a user based on the similarity of either items or users in a neighborhood. In this paper we argue that, although it keeps a leading importance in computing recommendations, similarity between users or items should be paired with a value of dissimilarity (computed not just as the complement of the similarity one). We formally modeled and injected this notion in some of the most used similarity measures and evaluated our approach showing its effectiveness in terms of accuracy results. △ Less

Submitted 4 July, 2019; v1 submitted 11 July, 2018; originally announced July 2018.

Comments: Short Version, 5 pages

Journal ref: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, SAC 2019, Limassol, Cyprus, April 8-12, 2019

arXiv:1807.04204 [pdf, other]

doi 10.1007/978-3-030-15712-8_63

Local Popularity and Time in top-N Recommendation

Authors: Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio, Azzurra Ragone, Joseph Trotta

Abstract: Items popularity is a strong signal in recommendation algorithms. It strongly affects collaborative filtering approaches and it has been proven to be a very good baseline in terms of results accuracy. Even though we miss an actual personalization, global popularity can be effectively used to recommend items to users. In this paper we introduce the idea of a time-aware personalized popularity in re… ▽ More Items popularity is a strong signal in recommendation algorithms. It strongly affects collaborative filtering approaches and it has been proven to be a very good baseline in terms of results accuracy. Even though we miss an actual personalization, global popularity can be effectively used to recommend items to users. In this paper we introduce the idea of a time-aware personalized popularity in recommender systems by considering both items popularity among neighbors and how it changes over time. An experimental evaluation shows a highly competitive behavior of the proposed approach, compared to state of the art model-based collaborative approaches, in terms of results accuracy. △ Less

Submitted 4 July, 2019; v1 submitted 11 July, 2018; originally announced July 2018.

Comments: ECIR short paper, 7 pages

Journal ref: Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14-18, 2019, Proceedings, Part I

arXiv:1706.07956 [pdf, other]

doi 10.1145/3125486.3125496

Auto-Encoding User Ratings via Knowledge Graphs in Recommendation Scenarios

Authors: Vito Bellini, Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio

Abstract: In the last decade, driven also by the availability of an unprecedented computational power and storage capabilities in cloud environments we assisted to the proliferation of new algorithms, methods, and approaches in two areas of artificial intelligence: knowledge representation and machine learning. On the one side, the generation of a high rate of structured data on the Web led to the creation… ▽ More In the last decade, driven also by the availability of an unprecedented computational power and storage capabilities in cloud environments we assisted to the proliferation of new algorithms, methods, and approaches in two areas of artificial intelligence: knowledge representation and machine learning. On the one side, the generation of a high rate of structured data on the Web led to the creation and publication of the so-called knowledge graphs. On the other side, deep learning emerged as one of the most promising approaches in the generation and training of models that can be applied to a wide variety of application fields. More recently, autoencoders have proven their strength in various scenarios, playing a fundamental role in unsupervised learning. In this paper, we instigate how to exploit the semantic information encoded in a knowledge graph to build connections between units in a Neural Network, thus leading to a new method, SEM-AUTO, to extract and weigh semantic features that can eventually be used to build a recommender system. As adding content-based side information may mitigate the cold user problems, we tested how our approach behave in the presence of a few rating from a user on the Movielens 1M dataset and compare results with BPRSLIM. △ Less

Submitted 24 July, 2017; v1 submitted 24 June, 2017; originally announced June 2017.

arXiv:1110.2742 [pdf, ps]

doi 10.1613/jair.2153

Semantic Matchmaking as Non-Monotonic Reasoning: A Description Logic Approach

Authors: T. Di Noia, E. Di Sciascio, F. M. Donini

Abstract: Matchmaking arises when supply and demand meet in an electronic marketplace, or when agents search for a web service to perform some task, or even when recruiting agencies match curricula and job profiles. In such open environments, the objective of a matchmaking process is to discover best available offers to a given request. We address the problem of matchmaking from a knowledge representation… ▽ More Matchmaking arises when supply and demand meet in an electronic marketplace, or when agents search for a web service to perform some task, or even when recruiting agencies match curricula and job profiles. In such open environments, the objective of a matchmaking process is to discover best available offers to a given request. We address the problem of matchmaking from a knowledge representation perspective, with a formalization based on Description Logics. We devise Concept Abduction and Concept Contraction as non-monotonic inferences in Description Logics suitable for modeling matchmaking in a logical framework, and prove some related complexity results. We also present reasonable algorithms for semantic matchmaking based on the devised inferences, and prove that they obey to some commonsense properties. Finally, we report on the implementation of the proposed matchmaking framework, which has been used both as a mediator in e-marketplaces and for semantic web services discovery. △ Less

Submitted 12 October, 2011; originally announced October 2011.

Journal ref: Journal Of Artificial Intelligence Research, Volume 29, pages 269-307, 2007

Showing 1–43 of 43 results for author: Di Noia, T