Where Do We Go From Here? Guidelines For Offline Recommender Evaluation

Schnabel, Tobias

Abstract:Various studies in recent years have pointed out large issues in the offline evaluation of recommender systems, making it difficult to assess whether true progress has been made. However, there has been little research into what set of practices should serve as a starting point during experimentation. In this paper, we examine four larger issues in recommender system research regarding uncertainty estimation, generalization, hyperparameter optimization and dataset pre-processing in more detail to arrive at a set of guidelines. We present a TrainRec, a lightweight and flexible toolkit for offline training and evaluation of recommender systems that implements these guidelines. Different from other frameworks, TrainRec is a toolkit that focuses on experimentation alone, offering flexible modules that can be can be used together or in isolation.
Finally, we demonstrate TrainRec's usefulness by evaluating a diverse set of twelve baselines across ten datasets. Our results show that (i) many results on smaller datasets are likely not statistically significant, (ii) there are at least three baselines that perform well on most datasets and should be considered in future experiments, and (iii) improved uncertainty quantification (via nested CV and statistical testing) rules out some reported differences between linear and neural methods. Given these results, we advocate that future research should standardize evaluation using our suggested guidelines.

Comments:	8 pages
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2211.01261 [cs.IR]
	(or arXiv:2211.01261v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2211.01261

Computer Science > Information Retrieval

Title:Where Do We Go From Here? Guidelines For Offline Recommender Evaluation

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators