
Strategic Behavior and AI Training Data

Authors: Christian Peukert, Florian Abeillon, Jérémie Haese, Franziska Kaiser, Alexander Staub

Abstract: Human-created works represent critical data inputs to artificial intelligence (AI). Strategic behavior can play a major role for AI training datasets, be it in limiting access to existing works or in deciding which types of new works to create or whether to create new works at all. We examine creators' behavioral change when their works become training data for AI. Specifically, we focus on contributors on Unsplash, a popular stock image platform with about 6 million high-quality photos and illustrations. In the summer of 2020, Unsplash launched an AI research program by releasing a dataset of 25,000 images for commercial use. We study contributors' reactions, comparing contributors whose works were included in this dataset to contributors whose works were not included. Our results suggest that treated contributors left the platform at a higher-than-usual rate and substantially slowed down the rate of new uploads. Professional and more successful photographers react stronger than amateurs and less successful photographers. We also show that affected users changed the variety and novelty of contributions to the platform, with long-run implications for the stock of works potentially available for AI training. Taken together, our findings highlight the trade-off between interests of rightsholders and promoting innovation at the technological frontier. We discuss implications for copyright and AI policy.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2311.14684 [pdf, other]

The risks of risk-based AI regulation: taking liability seriously

Authors: Martin Kretschmer, Tobias Kretschmer, Alexander Peukert, Christian Peukert

Abstract: The development and regulation of multi-purpose, large "foundation models" of AI seems to have reached a critical stage, with major investments and new applications announced every other day. Some experts are calling for a moratorium on the training of AI systems more powerful than GPT-4. Legislators globally compete to set the blueprint for a new regulatory regime. This paper analyses the most advanced legal proposal, the European Union's AI Act currently in the stage of final "trilogue" negotiations between the EU institutions. This legislation will likely have extra-territorial implications, sometimes called "the Brussels effect". It also constitutes a radical departure from conventional information and communications technology policy by regulating AI ex-ante through a risk-based approach that seeks to prevent certain harmful outcomes based on product safety principles. We offer a review and critique, specifically discussing the AI Act's problematic obligations regarding data quality and human oversight. Our proposal is to take liability seriously as the key regulatory mechanism. This signals to industry that if a breach of law occurs, firms are required to know in particular what their inputs were and how to retrain the system to remedy the breach. Moreover, we suggest differentiating between endogenous and exogenous sources of potential harm, which can be mitigated by carefully allocating liability between developers and deployers of AI technology.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2202.04131 [pdf, other]

Facebook Shadow Profiles

Authors: Luis Aguiar, Christian Peukert, Maximilian Schäfer, Hannes Ullrich

Abstract: We quantify Facebook's ability to build shadow profiles by tracking individuals across the web, irrespective of whether they are users of the social network. For a representative sample of US Internet users, we find that Facebook is able to track about 40 percent of the browsing time of both users and non-users of Facebook, including on privacy-sensitive domains and across user demographics. We show that the collected browsing data can produce accurate predictions of personal information that is valuable for advertisers, such as age or gender. Because Facebook users reveal their demographic information to the platform, and because the browsing behavior of users and non-users of Facebook overlaps, users impose a data externality on non-users by allowing Facebook to infer their personal information.

Submitted 19 July, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: 13 pages, 5 figures, 4 tables

Showing 1–3 of 3 results for author: Peukert, C