-
Exploring the Online Micro-targeting Practices of Small, Medium, and Large Businesses
Authors:
Salim Chouaki,
Islem Bouzenia,
Oana Goga,
Beatrice Roussillon
Abstract:
Facebook and other advertising platforms exploit users' data for marketing purposes by allowing advertisers to select specific users and target them (a practice known as micro-targeting). However, advertisers such as Cambridge Analytica have maliciously used these targeting features to manipulate users in the context of elections. The European Commission plans to restrict or ban some targeting functionalities in the new European Democracy Action Plan act to protect users from such harms. The difficulty is that we do not know the economic impact of these restrictions on regular advertisers. In this paper, to inform the debate, we take a first step by understanding who is advertising on Facebook and how they use the targeting functionalities. For this, we asked 890 U.S. users to install a monitoring tool on their browsers to collect the ads they receive on Facebook and information about how these ads were targeted. By matching advertisers on Facebook with their LinkedIn profiles, we could see that 71% of advertisers are small and medium-sized businesses with 200 employees or fewer, and they are responsible for 61% of ads and 57% of ad impressions. Regarding micro-targeting, we found that only 32% of small and medium-sized businesses and 30% of large-sized businesses micro-target at least one of their ads. These results should not be interpreted as evidence that micro-targeting is not useful as a marketing strategy, but rather as a sign that advertisers prefer to outsource the micro-targeting task to ad platforms. Indeed, Facebook employs optimization algorithms that exploit user data to decide which users should see which ads, which means that ad platforms are performing algorithm-driven micro-targeting. Hence, when setting restrictions, legislators should take into account both the traditional advertiser-driven micro-targeting and the algorithm-driven micro-targeting performed by ad platforms.
Submitted 2 March, 2024; v1 submitted 19 July, 2022;
originally announced July 2022.
-
Scalable Optimal Classifiers for Adversarial Settings under Uncertainty
Authors:
Patrick Loiseau,
Benjamin Roussillon
Abstract:
We consider the problem of finding optimal classifiers in an adversarial setting where the class-1 data is generated by an attacker whose objective is not known to the defender -- an aspect that is key to realistic applications but has so far been overlooked in the literature. To model this situation, we propose a Bayesian game framework where the defender chooses a classifier with no a priori restriction on the set of possible classifiers. The key difficulty in the proposed framework is that the set of possible classifiers is exponential in the set of possible data, which is itself exponential in the number of features used for classification. To counter this, we first show that Bayesian Nash equilibria can be characterized completely via functional threshold classifiers with a small number of parameters. We then show that this low-dimensional characterization enables us to develop a training method to compute provably approximately optimal classifiers in a scalable manner, and to develop a learning algorithm for the online setting with low regret (both independent of the dimension of the set of possible data). We illustrate our results through simulations.
Submitted 25 October, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Asymptotic Degradation of Linear Regression Estimates With Strategic Data Sources
Authors:
Benjamin Roussillon,
Nicolas Gast,
Patrick Loiseau,
Panayotis Mertikopoulos
Abstract:
We consider the problem of linear regression from strategic data sources with a public good component, i.e., when data is provided by strategic agents who seek to minimize an individual provision cost for increasing their data's precision while benefiting from the model's overall precision. In contrast to previous works, our model tackles the case where there is uncertainty on the attributes characterizing the agents' data -- a critical aspect of the problem when the number of agents is large. We provide a characterization of the game's equilibrium, which reveals an interesting connection with optimal design. Subsequently, we focus on the asymptotic behavior of the covariance of the linear regression parameters estimated via generalized least squares as the number of data sources becomes large. We provide upper and lower bounds for this covariance matrix and we show that, when the agents' provision costs are superlinear, the model's covariance converges to zero but at a slower rate relative to virtually all learning problems with exogenous data. On the other hand, if the agents' provision costs are linear, this covariance fails to converge. This shows that even the basic property of consistency of generalized least squares estimators is compromised when the data sources are strategic.
Submitted 11 March, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Linear Regression from Strategic Data Sources
Authors:
Nicolas Gast,
Stratis Ioannidis,
Patrick Loiseau,
Benjamin Roussillon
Abstract:
Linear regression is a fundamental building block of statistical data analysis. It amounts to estimating the parameters of a linear model that maps input features to corresponding outputs. In the classical setting where the precision of each data point is fixed, the famous Aitken/Gauss-Markov theorem in statistics states that generalized least squares (GLS) is a so-called "Best Linear Unbiased Estimator" (BLUE). In modern data science, however, one often faces strategic data sources, namely, individuals who incur a cost for providing high-precision data.
In this paper, we study a setting in which features are public but individuals choose the precision of the outputs they reveal to an analyst. We assume that the analyst performs linear regression on this dataset, and individuals benefit from the outcome of this estimation. We model this scenario as a game where individuals minimize a cost comprising two components: (a) an (agent-specific) disclosure cost for providing high-precision data; and (b) a (global) estimation cost representing the inaccuracy in the linear model estimate. In this game, the linear model estimate is a public good that benefits all individuals. We establish that this game has a unique non-trivial Nash equilibrium. We study the efficiency of this equilibrium and we prove tight bounds on the price of stability for a large class of disclosure and estimation costs. Finally, we study the estimator accuracy achieved at equilibrium. We show that, in general, Aitken's theorem does not hold under strategic data sources, though it does hold if individuals have identical disclosure costs (up to a multiplicative factor). When individuals have non-identical costs, we derive a bound on the improvement of the equilibrium estimation cost that can be achieved by deviating from GLS, under mild assumptions on the disclosure cost functions.
Submitted 12 December, 2019; v1 submitted 30 September, 2013;
originally announced September 2013.
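As a rough illustration of the classical setting mentioned in the last abstract (fixed, known precision for each data point), the sketch below computes a generalized least squares fit for a one-feature linear model. It assumes independent noise, so GLS reduces to least squares weighted by each point's precision (the inverse of its noise variance). The function name `gls_line_fit` and the sample data are hypothetical and do not come from the paper; the strategic game studied there is not modeled here.

```python
def gls_line_fit(xs, ys, precisions):
    """Fit y = intercept + slope * x by GLS with independent noise.

    Assumption (not from the paper): point i has known precision
    precisions[i] = 1 / variance_i, so GLS is precision-weighted
    least squares.
    Returns (intercept, slope, covariance), where covariance is the
    2x2 matrix (X^T W X)^{-1} of the estimated parameters.
    """
    # Entries of the information matrix X^T W X and the vector X^T W y.
    s_w = sum(precisions)
    s_x = sum(w * x for w, x in zip(precisions, xs))
    s_xx = sum(w * x * x for w, x in zip(precisions, xs))
    s_y = sum(w * y for w, y in zip(precisions, ys))
    s_xy = sum(w * x * y for w, x, y in zip(precisions, xs, ys))

    det = s_w * s_xx - s_x * s_x
    if det == 0:
        raise ValueError("design is degenerate: need at least two distinct x values")

    # Solve the 2x2 normal equations in closed form.
    intercept = (s_xx * s_y - s_x * s_xy) / det
    slope = (s_w * s_xy - s_x * s_y) / det
    covariance = [[s_xx / det, -s_x / det],
                  [-s_x / det, s_w / det]]
    return intercept, slope, covariance


# Hypothetical noiseless data on the line y = 3 + 0.5 x,
# with a different precision for each point.
intercept, slope, covariance = gls_line_fit(
    xs=[0.0, 1.0, 2.0],
    ys=[3.0, 3.5, 4.0],
    precisions=[1.0, 2.0, 4.0],
)
print(intercept, slope)
```

Raising any point's precision shrinks the parameter covariance. This is the sense in which the abstracts describe the estimate as a public good: one contributor's higher-precision data improves the estimate for everyone.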