Search | arXiv e-print repository

Repeated Bidding with Dynamic Value

Authors: Benjamin Heymann, Alexandre Gilotte, Rémi Chan-Renous

Abstract: We consider a repeated auction where the buyer's utility for an item depends on the time that has elapsed since his last purchase. We present an algorithm to build the optimal bidding policy and then, because the optimal policy might be impractical, we discuss the cost for the buyer of limiting himself to shading policies.

Submitted 3 August, 2023; originally announced August 2023.

MSC Class: 91B26; 49N90

arXiv:2210.02450

Learning from aggregated data with a maximum entropy model

Authors: Alexandre Gilotte, Ahmed Ben Yahmed, David Rohde

Abstract: Aggregating a dataset, then injecting some noise, is a simple and common way to release differentially private data. However, aggregated data, even without noise, is not an appropriate input for machine learning classifiers. In this work, we show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis. The resulting model is a Markov Random Field (MRF), and we detail how to apply, modify and scale an MRF training algorithm to our setting. Finally, we present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.

Submitted 5 October, 2022; originally announced October 2022.
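The release step the abstract starts from, aggregating a dataset and then injecting noise, can be sketched with the Laplace mechanism applied to per-value counts. This is a minimal illustration, not the paper's pipeline: the function name `laplace_release`, the choice of Laplace noise, the add/remove-one-row notion of neighbouring datasets and the example `country` feature are all assumptions made here.

```python
from collections import Counter

import numpy as np


def laplace_release(rows, feature, domain, epsilon, rng):
    """Release noisy per-value counts of `feature` (hypothetical helper).

    Assumes neighbouring datasets differ by adding or removing one row, so a
    single count changes by at most 1 (L1 sensitivity 1) and Laplace(0, 1/epsilon)
    noise gives epsilon-differential privacy. Every value of the public `domain`
    is released, including zero counts, so the released keys do not depend on
    the data.
    """
    counts = Counter(row[feature] for row in rows)
    scale = 1.0 / epsilon
    return {v: counts.get(v, 0) + rng.laplace(0.0, scale) for v in domain}


# Hypothetical toy data: one categorical feature per row.
rows = [{"country": "FR"}, {"country": "FR"}, {"country": "DE"}]
noisy = laplace_release(rows, "country", ["FR", "DE", "IT"], epsilon=1.0,
                        rng=np.random.default_rng(0))
print(noisy)
```

Smaller `epsilon` means stronger privacy and larger noise; a learner that only sees `noisy` never sees individual rows, which is the setting the abstract targets.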

arXiv:2208.05327

Fast Offline Policy Optimization for Large Scale Recommendation

Authors: Otmane Sakhi, David Rohde, Alexandre Gilotte

Abstract: Personalised interactive systems such as recommender systems require selecting relevant items from massive catalogues depending on context. Reward-driven offline optimisation of these systems can be achieved by a relaxation of the discrete problem, resulting in policy learning or REINFORCE-style learning algorithms. Unfortunately, this relaxation step requires computing a sum over the entire catalogue, making the complexity of evaluating the gradient (and hence of each stochastic gradient descent iteration) linear in the catalogue size. This calculation is untenable in many real-world examples such as large-catalogue recommender systems, severely limiting the usefulness of this method in practice. In this paper, we derive an approximation of these policy learning algorithms that scales logarithmically with the catalogue size. Our contribution is based upon combining three novel ideas: a new Monte Carlo estimate of the gradient of a policy, the self-normalised importance sampling estimator and the use of fast maximum inner product search at training time. Extensive experiments show that our algorithm is an order of magnitude faster than naive approaches yet produces equally good policies.

Submitted 27 May, 2023; v1 submitted 8 August, 2022; originally announced August 2022.

Comments: Accepted at AAAI 2023
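As background for the abstract above: for a softmax policy with scores s_a, the gradient of the log-normaliser log Σ_a exp(s_a) equals E_{a~π}[∇ s_a], and this expectation is the catalogue-wide sum that makes each gradient step linear in catalogue size. A self-normalised importance sampling (SNIS) estimate of such an expectation only reads the scores of sampled items. The sketch below is a generic SNIS estimator with a uniform proposal, an assumption chosen for simplicity; it is not the paper's estimator, which combines SNIS with fast maximum inner product search. The function names are hypothetical.

```python
import numpy as np


def exact_softmax_expectation(scores, values):
    """E_{a ~ softmax(scores)}[values[a]] via a full pass over the catalogue."""
    p = np.exp(scores - scores.max())  # subtract the max for numerical stability
    p /= p.sum()
    return float(p @ values)


def snis_softmax_expectation(scores, values, num_samples, rng):
    """SNIS estimate of the same expectation (hypothetical helper).

    Proposal q is uniform over the catalogue (an assumption for illustration).
    Only the scores of sampled items are read, and the softmax normaliser
    sum_a exp(scores[a]) is never computed: it cancels when weights are normalised.
    """
    n = len(scores)
    idx = rng.integers(0, n, size=num_samples)  # a_i ~ q = Uniform(catalogue)
    log_w = scores[idx] - np.log(1.0 / n)       # log of exp(s_a) / q(a)
    w = np.exp(log_w - log_w.max())             # stabilise before normalising
    w /= w.sum()                                # self-normalisation
    return float(w @ values[idx])


rng = np.random.default_rng(0)
scores = rng.normal(size=1_000)
values = rng.normal(size=1_000)
print(exact_softmax_expectation(scores, values))
print(snis_softmax_expectation(scores, values, num_samples=200_000, rng=rng))
```

Self-normalisation introduces a small bias but removes the need for the normalising constant; a uniform proposal becomes inefficient when a few items dominate the softmax, which is one motivation for a better proposal.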

arXiv:2201.13123

Lessons from the AdKDD'21 Privacy-Preserving ML Challenge

Authors: Eustache Diemert, Romain Fabre, Alexandre Gilotte, Fei Jia, Basile Leparmentier, Jérémie Mary, Zhonghua Qu, Ugo Tanielian, Hui Yang

Abstract: Designing data-sharing mechanisms that provide both performance and strong privacy guarantees is a hot topic for the online advertising industry. In particular, a prominent proposal discussed under the Improving Web Advertising Business Group at W3C only allows sharing advertising signals through aggregated, differentially private reports of past displays. To study this proposal extensively, an open Privacy-Preserving Machine Learning Challenge took place at AdKDD'21, a premier workshop on Advertising Science, with data provided by the advertising company Criteo. In this paper, we describe the challenge tasks and the structure of the available datasets, report the challenge results, and enable their full reproducibility. A key finding is that learning models on large, aggregated data in the presence of a small set of unaggregated data points can be surprisingly efficient and cheap. We also run additional experiments to observe the sensitivity of winning methods to different parameters such as privacy budget or quantity of available privileged side information. We conclude that the industry needs either alternative designs for private data sharing or a breakthrough in learning with aggregated data only to keep ad relevance at a reasonable level.

Submitted 31 January, 2022; originally announced January 2022.

arXiv:1909.08471

Learning from Bandit Feedback: An Overview of the State-of-the-art

Authors: Olivier Jeunen, Dmytro Mykhaylov, David Rohde, Flavian Vasile, Alexandre Gilotte, Martin Bompaire

Abstract: In machine learning we often try to optimise a decision rule that would have worked well over a historical dataset; this is the so-called empirical risk minimisation principle. In the context of learning from recommender system logs, applying this principle becomes a problem because the rewards of decisions that were not taken are unavailable. In order to handle this "bandit-feedback" setting, several Counterfactual Risk Minimisation (CRM) methods that attempt to estimate the performance of different policies on historical data have been proposed in recent years. Through importance sampling and various variance reduction techniques, these methods allow more robust learning and inference than classical approaches. It is difficult to accurately estimate the performance of policies that frequently take actions that were rarely taken in the past, and a number of different types of estimators have been proposed. In this paper, we review several methods, based on different off-policy estimators, for learning from bandit feedback. We discuss key differences and commonalities among existing approaches, and compare their empirical performance on the RecoGym simulation environment. To the best of our knowledge, this work is the first comparison study for bandit algorithms in a recommender system setting.

Submitted 18 September, 2019; originally announced September 2019.
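The importance-sampling idea underlying many of the reviewed CRM estimators can be illustrated with plain inverse propensity scoring (IPS): each logged reward is reweighted by the ratio of the new policy's probability to the logging policy's probability for the action that was actually taken. This is a minimal sketch assuming the logs record that logging probability for every interaction; the function name `ips_estimate` and the toy data are hypothetical.

```python
import numpy as np


def ips_estimate(target_probs, logging_probs, rewards):
    """Inverse propensity scoring estimate of a target policy's average reward.

    target_probs[i]  = pi(a_i | x_i), target policy's probability of the logged action
    logging_probs[i] = pi_0(a_i | x_i), logging policy's probability (must be > 0)
    rewards[i]       = reward observed for the logged action
    """
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.mean(weights * np.asarray(rewards, dtype=float)))


# Toy log: two actions, the logging policy picks each with probability 0.5;
# action 0 always yields reward 1 and action 1 yields reward 0.
logged_actions = [0, 1, 0, 1]
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5] * 4
# Hypothetical target policy: always choose action 0.
target_probs = [1.0 if a == 0 else 0.0 for a in logged_actions]
print(ips_estimate(target_probs, logging_probs, rewards))  # 1.0
```

The estimate matches the target policy's true value here; its variance grows when the target policy favours actions with small logging probabilities, which is the difficulty the abstract points to.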

arXiv:1909.07926

Ranking metrics on non-shuffled traffic

Authors: Alexandre Gilotte

Abstract: Ranking metrics are a family of metrics widely used to evaluate recommender systems. However, they typically suffer from the fact that the reward is affected by the order in which recommended items are displayed to the user. A classical way to overcome this position bias is to uniformly shuffle a proportion of the recommendations, but this method may result in a bad user experience. It is nevertheless common to use a stochastic policy to generate the recommendations, and we suggest a new method to overcome the position bias by leveraging the stochasticity of the policy used to collect the dataset.

Submitted 17 September, 2019; originally announced September 2019.

arXiv:1801.07030

doi 10.1145/3159652.3159687

Offline A/B testing for Recommender Systems

Authors: Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, Simon Dollé

Abstract: Before A/B testing a new version of a recommender system online, it is usual to perform some offline evaluations on historical data. We focus on evaluation methods that compute an estimator of the potential uplift in revenue that this new technology could generate. This helps to iterate faster and to avoid losing money by detecting poor policies. These estimators are known as counterfactual or off-policy estimators. We show that traditional counterfactual estimators such as capped importance sampling and normalised importance sampling experimentally fail to achieve satisfying bias-variance trade-offs in the context of personalised product recommendation for online advertising. We propose two variants of counterfactual estimators with different modelling of the bias that prove to be accurate in real-world conditions. We provide a benchmark of these estimators by showing their correlation with business metrics observed by running online A/B tests on a commercial recommender system.

Submitted 22 January, 2018; originally announced January 2018.
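The two baseline estimators named in the abstract, capped importance sampling and normalised importance sampling, can be sketched in their textbook forms. These are not the two variants the paper proposes; the function names, the `cap` parameter and the toy numbers are assumptions for illustration.

```python
import numpy as np


def capped_ips(target_probs, logging_probs, rewards, cap):
    """Capped importance sampling: clip each weight at `cap`.

    Clipping bounds the variance caused by large weights at the cost of a
    (usually downward) bias.
    """
    w = np.minimum(np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float), cap)
    return float(np.mean(w * np.asarray(rewards, dtype=float)))


def normalised_ips(target_probs, logging_probs, rewards):
    """Normalised importance sampling: divide by the sum of weights instead of n."""
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.sum(w * np.asarray(rewards, dtype=float)) / np.sum(w))


# Toy log with one large weight (1.0 / 0.1 = 10) and one unit weight.
target_probs = [1.0, 0.5]
logging_probs = [0.1, 0.5]
rewards = [1.0, 0.0]
print(capped_ips(target_probs, logging_probs, rewards, cap=5.0))  # 2.5
print(normalised_ips(target_probs, logging_probs, rewards))       # about 0.909 (= 10/11)
```

On this toy log the uncapped IPS average would be 5.0; capping and normalising both shrink the influence of the single large weight, but in different ways, which is the bias-variance trade-off the abstract evaluates.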

Showing 1–7 of 7 results for author: Gilotte, A