Search | arXiv e-print repository

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration… ▽ More The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, where an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models are fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs. △ Less

Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2311.09144 [pdf, other]

Grounding Gaps in Language Model Generations

Authors: Omar Shaikh, Kristina Gligorić, Ashna Khetan, Matthias Gerstgrasser, Diyi Yang, Dan Jurafsky

Abstract: Effective conversation requires common ground: a shared understanding between the participants. Common ground, however, does not emerge spontaneously in conversation. Speakers and listeners work together to both identify and construct a shared basis while avoiding misunderstanding. To accomplish grounding, humans rely on a range of dialogue acts, like clarification (What do you mean?) and acknowle… ▽ More Effective conversation requires common ground: a shared understanding between the participants. Common ground, however, does not emerge spontaneously in conversation. Speakers and listeners work together to both identify and construct a shared basis while avoiding misunderstanding. To accomplish grounding, humans rely on a range of dialogue acts, like clarification (What do you mean?) and acknowledgment (I understand.). However, it is unclear whether large language models (LLMs) generate text that reflects human grounding. To this end, we curate a set of grounding acts and propose corresponding metrics that quantify attempted grounding. We study whether LLM generations contain grounding acts, simulating turn-taking from several dialogue datasets and comparing results to humans. We find that -- compared to humans -- LLMs generate language with less conversational grounding, instead generating text that appears to simply presume common ground. To understand the roots of the identified grounding gap, we examine the role of instruction tuning and preference optimization, finding that training on contemporary preference data leads to a reduction in generated grounding acts. Altogether, we highlight the need for more research investigating conversational grounding in human-AI interaction. △ Less

Submitted 2 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: NAACL 2024; 18 pages, 2 figures

arXiv:2311.00865 [pdf, other]

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

Authors: Matthias Gerstgrasser, Tom Danino, Sarah Keren

Abstract: We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely… ▽ More We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation of our algorithm is available at https://github.com/mgerstgrasser/super. △ Less

Submitted 23 April, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: published at NeurIPS 2023

Journal ref: Advances in Neural Information Processing Systems, 36 (2023)

arXiv:2210.11942 [pdf, other]

Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

Authors: Matthias Gerstgrasser, David C. Parkes

Abstract: Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous… ▽ More Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework. △ Less

Submitted 1 June, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

arXiv:2210.03852 [pdf, other]

Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design

Authors: Gianluca Brero, Alon Eden, Darshan Chakrabarti, Matthias Gerstgrasser, Amy Greenwald, Vincent Li, David C. Parkes

Abstract: We introduce a reinforcement learning framework for economic design where the interaction between the environment designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets up the rules of the economic system, while the participants (followers) respond strategically. We integrate algorithms for determining followers' response strategies into the leader… ▽ More We introduce a reinforcement learning framework for economic design where the interaction between the environment designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets up the rules of the economic system, while the participants (followers) respond strategically. We integrate algorithms for determining followers' response strategies into the leader's learning environment, providing a formulation of the leader's learning problem as a POMDP that we call the Stackelberg POMDP. We prove that the optimal leader's strategy in the Stackelberg game is the optimal policy in our Stackelberg POMDP under a limited set of possible policies, establishing a connection between solving POMDPs and Stackelberg games. We solve our POMDP under a limited set of policy options via the centralized training with decentralized execution framework. For the specific case of followers that are modeled as no-regret learners, we solve an array of increasingly complex settings, including problems of indirect mechanism design where there is turn-taking and limited communication by agents. We demonstrate the effectiveness of our training framework through ablation studies. We also give convergence results for no-regret learners to a Bayesian version of a coarse-correlated equilibrium, extending known results to the case of correlated types. △ Less

Submitted 9 November, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

arXiv:2111.06614 [pdf, other]

Collaboration Promotes Group Resilience in Multi-Agent AI

Authors: Sarah Keren, Matthias Gerstgrasser, Ofir Abu, Jeffrey Rosenschein

Abstract: AI agents need to be robust to unexpected changes in their environment in order to safely operate in real-world scenarios. While some work has been done on this type of robustness in the single-agent case, in this work we introduce the idea that collaboration with other agents can help agents adapt to environment perturbations in multi-agent reinforcement learning settings. We first formalize this… ▽ More AI agents need to be robust to unexpected changes in their environment in order to safely operate in real-world scenarios. While some work has been done on this type of robustness in the single-agent case, in this work we introduce the idea that collaboration with other agents can help agents adapt to environment perturbations in multi-agent reinforcement learning settings. We first formalize this notion of resilience of a group of agents. We then empirically evaluate different collaboration protocols and examine their effect on resilience. We see that all of the collaboration approaches considered lead to greater resilience compared to baseline, in line with our hypothesis. We discuss future direction and the general relevance of the concept of resilience introduced in this work. △ Less

Submitted 9 December, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

ACM Class: I.2.11, I.2.6

arXiv:2010.01180 [pdf, other]

Reinforcement Learning of Sequential Price Mechanisms

Authors: Gianluca Brero, Alon Eden, Matthias Gerstgrasser, David C. Parkes, Duncan Rheingans-Yoo

Abstract: We introduce the use of reinforcement learning for indirect mechanisms, working with the existing class of sequential price mechanisms, which generalizes both serial dictatorship and posted price mechanisms and essentially characterizes all strongly obviously strategyproof mechanisms. Learning an optimal mechanism within this class forms a partially-observable Markov decision process. We provide r… ▽ More We introduce the use of reinforcement learning for indirect mechanisms, working with the existing class of sequential price mechanisms, which generalizes both serial dictatorship and posted price mechanisms and essentially characterizes all strongly obviously strategyproof mechanisms. Learning an optimal mechanism within this class forms a partially-observable Markov decision process. We provide rigorous conditions for when this class of mechanisms is more powerful than simpler static mechanisms, for sufficiency or insufficiency of observation statistics for learning, and for the necessity of complex (deep) policies. We show that our approach can learn optimal or near-optimal mechanisms in several experimental settings. △ Less

Submitted 5 May, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

arXiv:1811.05130 [pdf, ps, other]

Multi-unit Bilateral Trade

Authors: Matthias Gerstgrasser, Paul W. Goldberg, Bart de Keijzer, Philip Lazos, Alexander Skopalik

Abstract: We characterise the set of dominant strategy incentive compatible (DSIC), strongly budget balanced (SBB), and ex-post individually rational (IR) mechanisms for the multi-unit bilateral trade setting. In such a setting there is a single buyer and a single seller who holds a finite number k of identical items. The mechanism has to decide how many units of the item are transferred from the seller to… ▽ More We characterise the set of dominant strategy incentive compatible (DSIC), strongly budget balanced (SBB), and ex-post individually rational (IR) mechanisms for the multi-unit bilateral trade setting. In such a setting there is a single buyer and a single seller who holds a finite number k of identical items. The mechanism has to decide how many units of the item are transferred from the seller to the buyer and how much money is transferred from the buyer to the seller. We consider two classes of valuation functions for the buyer and seller: Valuations that are increasing in the number of units in possession, and the more specific class of valuations that are increasing and submodular. Furthermore, we present some approximation results about the performance of certain such mechanisms, in terms of social welfare: For increasing submodular valuation functions, we show the existence of a deterministic 2-approximation mechanism and a randomised e/(1-e) approximation mechanism, matching the best known bounds for the single-item setting. △ Less

Submitted 13 November, 2018; originally announced November 2018.

Showing 1–8 of 8 results for author: Gerstgrasser, M