-
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo
Abstract:
The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, whereas an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models is fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs.
Submitted 29 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
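The replace-versus-accumulate contrast in the linear-model framework described in the abstract above can be illustrated with a small simulation. This is a minimal sketch and not the authors' code: the function name `simulate`, the Gaussian design, the fresh inputs drawn at every iteration, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def simulate(mode, n_iters=30, n_samples=200, dim=20, noise_std=1.0, seed=0):
    """Fit a chain of linear models, each trained on labels from its predecessor.

    mode="replace":    each model sees only the newest synthetic batch.
    mode="accumulate": each model sees the real data plus all synthetic batches.
    Returns the squared parameter error ||w_hat - w_true||^2 after every fit.
    (Hypothetical setup: isotropic Gaussian inputs, fresh inputs per iteration.)
    """
    if mode not in ("replace", "accumulate"):
        raise ValueError(f"unknown mode: {mode!r}")

    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)

    # Generation 0: real data labelled by the true model plus noise.
    X_all = rng.normal(size=(n_samples, dim))
    y_all = X_all @ w_true + noise_std * rng.normal(size=n_samples)
    w_hat = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
    errors = [float(np.sum((w_hat - w_true) ** 2))]

    for _ in range(n_iters):
        # Synthetic batch: labels come from the most recently fitted model.
        X_new = rng.normal(size=(n_samples, dim))
        y_new = X_new @ w_hat + noise_std * rng.normal(size=n_samples)

        if mode == "replace":
            X_all, y_all = X_new, y_new
        else:
            X_all = np.vstack([X_all, X_new])
            y_all = np.concatenate([y_all, y_new])

        # Ordinary least squares on the current training set.
        w_hat = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
        errors.append(float(np.sum((w_hat - w_true) ** 2)))

    return errors


if __name__ == "__main__":
    for mode in ("replace", "accumulate"):
        final = np.mean([simulate(mode, seed=s)[-1] for s in range(5)])
        print(f"{mode:>10}: mean final squared error = {final:.3f}")
```

Under these assumptions, the error in the replace setting should drift upward with the number of iterations, while the accumulate setting should stay close to the error of the first fit, in line with the bounded test error claimed in the abstract.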
-
Grounding Gaps in Language Model Generations
Authors: Omar Shaikh, Kristina Gligorić, Ashna Khetan, Matthias Gerstgrasser, Diyi Yang, Dan Jurafsky
Abstract:
Effective conversation requires common ground: a shared understanding between the participants. Common ground, however, does not emerge spontaneously in conversation. Speakers and listeners work together to both identify and construct a shared basis while avoiding misunderstanding. To accomplish grounding, humans rely on a range of dialogue acts, like clarification (What do you mean?) and acknowledgment (I understand.). However, it is unclear whether large language models (LLMs) generate text that reflects human grounding. To this end, we curate a set of grounding acts and propose corresponding metrics that quantify attempted grounding. We study whether LLM generations contain grounding acts, simulating turn-taking from several dialogue datasets and comparing results to humans. We find that, compared to humans, LLMs generate language with less conversational grounding, instead generating text that appears to simply presume common ground. To understand the roots of the identified grounding gap, we examine the role of instruction tuning and preference optimization, finding that training on contemporary preference data leads to a reduction in generated grounding acts. Altogether, we highlight the need for more research investigating conversational grounding in human-AI interaction.
Submitted 2 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning
Authors: Matthias Gerstgrasser, Tom Danino, Sarah Keren
Abstract:
We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation of our algorithm is available at https://github.com/mgerstgrasser/super.
Submitted 23 April, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
Authors: Matthias Gerstgrasser, David C. Parkes
Abstract:
Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.
Submitted 1 June, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design
Authors: Gianluca Brero, Alon Eden, Darshan Chakrabarti, Matthias Gerstgrasser, Amy Greenwald, Vincent Li, David C. Parkes
Abstract:
We introduce a reinforcement learning framework for economic design where the interaction between the environment designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets up the rules of the economic system, while the participants (followers) respond strategically. We integrate algorithms for determining followers' response strategies into the leader's learning environment, providing a formulation of the leader's learning problem as a POMDP that we call the Stackelberg POMDP. We prove that the optimal leader's strategy in the Stackelberg game is the optimal policy in our Stackelberg POMDP under a limited set of possible policies, establishing a connection between solving POMDPs and Stackelberg games. We solve our POMDP under a limited set of policy options via the centralized training with decentralized execution framework. For the specific case of followers that are modeled as no-regret learners, we solve an array of increasingly complex settings, including problems of indirect mechanism design where there is turn-taking and limited communication by agents. We demonstrate the effectiveness of our training framework through ablation studies. We also give convergence results for no-regret learners to a Bayesian version of a coarse-correlated equilibrium, extending known results to the case of correlated types.
Submitted 9 November, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
Collaboration Promotes Group Resilience in Multi-Agent AI
Authors: Sarah Keren, Matthias Gerstgrasser, Ofir Abu, Jeffrey Rosenschein
Abstract:
AI agents need to be robust to unexpected changes in their environment in order to safely operate in real-world scenarios. While some work has been done on this type of robustness in the single-agent case, in this work we introduce the idea that collaboration with other agents can help agents adapt to environment perturbations in multi-agent reinforcement learning settings. We first formalize this notion of resilience of a group of agents. We then empirically evaluate different collaboration protocols and examine their effect on resilience. We see that all of the collaboration approaches considered lead to greater resilience compared to baseline, in line with our hypothesis. We discuss future directions and the general relevance of the concept of resilience introduced in this work.
Submitted 9 December, 2022; v1 submitted 12 November, 2021;
originally announced November 2021.
-
Reinforcement Learning of Sequential Price Mechanisms
Authors: Gianluca Brero, Alon Eden, Matthias Gerstgrasser, David C. Parkes, Duncan Rheingans-Yoo
Abstract:
We introduce the use of reinforcement learning for indirect mechanisms, working with the existing class of sequential price mechanisms, which generalizes both serial dictatorship and posted price mechanisms and essentially characterizes all strongly obviously strategyproof mechanisms. Learning an optimal mechanism within this class forms a partially-observable Markov decision process. We provide rigorous conditions for when this class of mechanisms is more powerful than simpler static mechanisms, for sufficiency or insufficiency of observation statistics for learning, and for the necessity of complex (deep) policies. We show that our approach can learn optimal or near-optimal mechanisms in several experimental settings.
Submitted 5 May, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Multi-unit Bilateral Trade
Authors: Matthias Gerstgrasser, Paul W. Goldberg, Bart de Keijzer, Philip Lazos, Alexander Skopalik
Abstract:
We characterise the set of dominant strategy incentive compatible (DSIC), strongly budget balanced (SBB), and ex-post individually rational (IR) mechanisms for the multi-unit bilateral trade setting. In such a setting there is a single buyer and a single seller who holds a finite number k of identical items. The mechanism has to decide how many units of the item are transferred from the seller to the buyer and how much money is transferred from the buyer to the seller. We consider two classes of valuation functions for the buyer and seller: Valuations that are increasing in the number of units in possession, and the more specific class of valuations that are increasing and submodular.
Furthermore, we present some approximation results about the performance of certain such mechanisms, in terms of social welfare: For increasing submodular valuation functions, we show the existence of a deterministic 2-approximation mechanism and a randomised e/(e-1) approximation mechanism, matching the best known bounds for the single-item setting.
Submitted 13 November, 2018;
originally announced November 2018.