-
Recommending Dream Jobs in a Biased Real World
Authors:
Nadia Fawaz
Abstract:
Machine learning models learn what we teach them to learn. Machine learning is at the heart of recommender systems. If a machine learning model is trained on biased data, the resulting recommender system may reflect the biases in its recommendations. Biases arise at different stages in a recommender system, from existing societal biases in the data such as the professional gender gap, to biases introduced by the data collection or modeling processes. These biases impact the performance of various components of recommender systems, from offline training, to evaluation and online serving of recommendations in production systems. Specific techniques can help reduce bias at each stage of a recommender system. Reducing bias in our recommender systems is crucial to successfully recommending dream jobs to hundreds of millions of members worldwide, while being true to LinkedIn's vision: "To create economic opportunity for every member of the global workforce".
Submitted 10 May, 2019;
originally announced May 2019.
-
A relevance-scalability-interpretability tradeoff with temporally evolving user personas
Authors:
Snigdha Panigrahi,
Nadia Fawaz
Abstract:
The current work characterizes the users of a VoD streaming space through user personas based on a tenure timeline and temporal behavioral features, in the absence of explicit user profiles. A combination of tenure timeline and temporal characteristics caters to the business need of understanding the evolution and phases of user behavior as accounts age. The personas constructed in this work successfully represent both dominant and niche characterizations while providing insight into the maturation of user behavior in the system. The two major highlights of our personas are stability along tenure timelines at the population level, with interesting migrations between labels at the individual level, and clear interpretability of user labels. Finally, we show a trade-off among an indispensable trio of guarantees, relevance, scalability, and interpretability, by using summary information from personas in a click-through rate (CTR) prediction model. The proposed method of uncovering latent personas, the resulting insights, and the application of persona information to predictive models are broadly applicable to other streaming-based products.
Submitted 13 September, 2017; v1 submitted 25 April, 2017;
originally announced April 2017.
-
Guess Who Rated This Movie: Identifying Users Through Subspace Clustering
Authors:
Amy Zhang,
Nadia Fawaz,
Stratis Ioannidis,
Andrea Montanari
Abstract:
It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering to carry out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.
Submitted 9 August, 2014;
originally announced August 2014.
-
Identifying Users From Their Rating Patterns
Authors:
José Bento,
Nadia Fawaz,
Andrea Montanari,
Stratis Ioannidis
Abstract:
This paper reports on our analysis of the 2011 CAMRa Challenge dataset (Track 2) for context-aware movie recommendation systems. The training dataset comprises 4,536,891 ratings provided by 171,670 users on 23,974 movies, as well as the household groupings of a subset of the users. The test dataset comprises 5,450 ratings for which the user label is missing, but the household label is provided. The challenge required identifying the user labels for the ratings in the test set. Our main finding is that temporal information (time labels of the ratings) is significantly more useful for achieving this objective than the user preferences (the actual ratings). Using a model that leverages this fact, we are able to identify users within a known household with an accuracy of approximately 96% (i.e., a misclassification rate of around 4%).
Submitted 26 July, 2012;
originally announced July 2012.