Search | arXiv e-print repository

How to Diversify any Personalized Recommender? A User-centric Pre-processing approach

Abstract: In this paper, we introduce a novel approach to improve the diversity of Top-N recommendations while maintaining recommendation performance. Our approach employs a user-centric pre-processing strategy aimed at exposing users to a wide array of content categories and topics. We personalize this strategy by selectively adding and removing a percentage of interactions from user profiles. This persona… ▽ More In this paper, we introduce a novel approach to improve the diversity of Top-N recommendations while maintaining recommendation performance. Our approach employs a user-centric pre-processing strategy aimed at exposing users to a wide array of content categories and topics. We personalize this strategy by selectively adding and removing a percentage of interactions from user profiles. This personalization ensures we remain closely aligned with user preferences while gradually introducing distribution shifts. Our pre-processing technique offers flexibility and can seamlessly integrate into any recommender architecture. To evaluate our approach, we run extensive experiments on two publicly available data sets for news and book recommendations. We test various standard and neural network-based recommender system algorithms. Our results show that our approach generates diverse recommendations, ensuring users are exposed to a wider range of items. Furthermore, leveraging pre-processed data for training leads to recommender systems achieving performance levels comparable to, and in some cases, better than those trained on original, unmodified data. Additionally, our approach promotes provider fairness by facilitating exposure to minority or niche categories. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2310.08775 [pdf, ps, other]

When Machine Learning Models Leak: An Exploration of Synthetic Training Data

Authors: Manel Slokom, Peter-Paul de Wolf, Martha Larson

Abstract: We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years, i.e., a propensity-to-move classifier. The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available. The attack also assumes that the attacker has o… ▽ More We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years, i.e., a propensity-to-move classifier. The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available. The attack also assumes that the attacker has obtained the values of non-sensitive attributes for a certain number of target individuals. The objective of the attack is to infer the values of sensitive attributes for these target individuals. We explore how replacing the original data with synthetic data when training the model impacts how successfully the attacker can infer sensitive attributes. △ Less

Submitted 19 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: Original paper published at PSD 2022. The paper was subsequently updated

arXiv:2208.03242 [pdf, ps, other]

Minimizing Mindless Mentions: Recommendation with Minimal Necessary User Reviews

Authors: Danny Stax, Manel Slokom, Martha Larson

Abstract: Recently, researchers have turned their attention to recommender systems that use only minimal necessary data. This trend is informed by the idea that recommender systems should use no more user interactions than are needed in order to provide users with useful recommendations. In this position paper, we make the case for applying the idea of minimal necessary data to recommender systems that use… ▽ More Recently, researchers have turned their attention to recommender systems that use only minimal necessary data. This trend is informed by the idea that recommender systems should use no more user interactions than are needed in order to provide users with useful recommendations. In this position paper, we make the case for applying the idea of minimal necessary data to recommender systems that use user reviews. We argue that the content of individual user reviews should be subject to minimization. Specifically, reviews used as training data to generate recommendations or reviews used to help users decide on purchases or consumption should be automatically edited to contain only the information that is needed. △ Less

Submitted 9 September, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

arXiv:2207.14218 [pdf, ps, other]

Gender In Gender Out: A Closer Look at User Attributes in Context-Aware Recommendation

Authors: Manel Slokom, Özlem Özgöbek, Martha Larson

Abstract: This paper studies user attributes in light of current concerns in the recommender system community: diversity, coverage, calibration, and data minimization. In experiments with a conventional context-aware recommender system that leverages side information, we show that user attributes do not always improve recommendation. Then, we demonstrate that user attributes can negatively impact diversity… ▽ More This paper studies user attributes in light of current concerns in the recommender system community: diversity, coverage, calibration, and data minimization. In experiments with a conventional context-aware recommender system that leverages side information, we show that user attributes do not always improve recommendation. Then, we demonstrate that user attributes can negatively impact diversity and coverage. Finally, we investigate the amount of information about users that ``survives'' from the training data into the recommendation lists produced by the recommender. This information is a weak signal that could in the future be exploited for calibration or studied further as a privacy leak. △ Less

Submitted 28 July, 2022; originally announced July 2022.

arXiv:2110.03275 [pdf, ps, other]

Doing Data Right: How Lessons Learned Working with Conventional Data should Inform the Future of Synthetic Data for Recommender Systems

Authors: Manel Slokom, Martha Larson

Abstract: We present a case that the newly emerging field of synthetic data in the area of recommender systems should prioritize `doing data right'. We consider this catchphrase to have two aspects: First, we should not repeat the mistakes of the past, and, second, we should explore the full scope of opportunities presented by synthetic data as we move into the future. We argue that explicit attention to da… ▽ More We present a case that the newly emerging field of synthetic data in the area of recommender systems should prioritize `doing data right'. We consider this catchphrase to have two aspects: First, we should not repeat the mistakes of the past, and, second, we should explore the full scope of opportunities presented by synthetic data as we move into the future. We argue that explicit attention to dataset design and description will help to avoid past mistakes with dataset bias and evaluation. In order to fully exploit the opportunities of synthetic data, we point out that researchers can investigate new areas such as using data synthesize to support reproducibility by making data open, as well as FAIR, and to push forward our understanding of data minimization. △ Less

Submitted 7 October, 2021; originally announced October 2021.

Comments: Contribution to the SimuRec Workshop at RecSys 2021

arXiv:2008.03797 [pdf, other]

Partially Synthetic Data for Recommender Systems: Prediction Performance and Preference Hiding

Authors: Manel Slokom, Martha Larson, Alan Hanjalic

Abstract: This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item matrix. We apply a transformation to the original data that changes some values, but leaves others the same. The result is a partially synthetic data set that can be… ▽ More This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item matrix. We apply a transformation to the original data that changes some values, but leaves others the same. The result is a partially synthetic data set that can be used for recommendation but contains less specific information about individual user preferences. Synthetic data has the potential to be useful for companies, who are interested in releasing data to allow outside parties to develop new recommender algorithms, i.e., in the case of a recommender system challenge, and also reducing the risks associated with data misappropriation. Our experiments run a set of recommender system algorithms on our partially synthetic data sets as well as on the original data. The results show that the relative performance of the algorithms on the partially synthetic data reflects the relative performance on the original data. Further analysis demonstrates that properties of the original data are preserved under synthesis, but that for certain examples of attributes accessible in the original data are hidden in the synthesized data. △ Less

Submitted 9 August, 2020; originally announced August 2020.

Comments: 11 pages, 4 figures

Showing 1–6 of 6 results for author: Slokom, M