Search | arXiv e-print repository

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

Authors: Jacob Portes, Alex Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, Daya Khudia, Jonathan Frankle

Abstract: Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the past half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT. Here, we introduce Mo… ▽ More Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the past half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT. Here, we introduce MosaicBERT, a BERT-style encoder architecture and training recipe that is empirically optimized for fast pretraining. This efficient architecture incorporates FlashAttention, Attention with Linear Biases (ALiBi), Gated Linear Units (GLU), a module to dynamically remove padded tokens, and low precision LayerNorm into the classic transformer encoder block. The training recipe includes a 30% masking ratio for the Masked Language Modeling (MLM) objective, bfloat16 precision, and vocabulary size optimized for GPU throughput, in addition to best-practices from RoBERTa and other encoder models. When pretrained from scratch on the C4 dataset, this base model achieves a downstream average GLUE (dev) score of 79.6 in 1.13 hours on 8 A100 80 GB GPUs at a cost of roughly $20. We plot extensive accuracy vs. pretraining speed Pareto curves and show that MosaicBERT base and large are consistently Pareto optimal when compared to a competitive BERT base and large. This empirical speed up in pretraining enables researchers and engineers to pretrain custom BERT-style models at low cost instead of finetune on existing generic models. We open source our model weights and code. △ Less

Submitted 16 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: 10 pages, 4 figures in main text. 25 pages total

Journal ref: NeurIPS 2023

arXiv:2311.13133 [pdf, other]

LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms

Authors: Aditi Jha, Sam Havens, Jeremy Dohmann, Alex Trott, Jacob Portes

Abstract: Large Language Models are traditionally finetuned on large instruction datasets. However recent studies suggest that small, high-quality datasets can suffice for general purpose instruction following. This lack of consensus surrounding finetuning best practices is in part due to rapidly diverging approaches to LLM evaluation. In this study, we ask whether a small amount of diverse finetuning sampl… ▽ More Large Language Models are traditionally finetuned on large instruction datasets. However recent studies suggest that small, high-quality datasets can suffice for general purpose instruction following. This lack of consensus surrounding finetuning best practices is in part due to rapidly diverging approaches to LLM evaluation. In this study, we ask whether a small amount of diverse finetuning samples can improve performance on both traditional perplexity-based NLP benchmarks, and on open-ended, model-based evaluation. We finetune open-source MPT-7B and MPT-30B models on instruction finetuning datasets of various sizes ranging from 1k to 60k samples. We find that subsets of 1k-6k instruction finetuning samples are sufficient to achieve good performance on both (1) traditional NLP benchmarks and (2) model-based evaluation. Finally, we show that mixing textbook-style and open-ended QA finetuning datasets optimizes performance on both evaluation paradigms. △ Less

Submitted 21 November, 2023; originally announced November 2023.

Comments: 36 pages, 12 figures, NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following

arXiv:2203.13395 [pdf, other]

Platform Behavior under Market Shocks: A Simulation Framework and Reinforcement-Learning Based Study

Authors: Xintong Wang, Gary Qiurui Ma, Alon Eden, Clara Li, Alexander Trott, Stephan Zheng, David C. Parkes

Abstract: We study the behavior of an economic platform (e.g., Amazon, Uber Eats, Instacart) under shocks, such as COVID-19 lockdowns, and the effect of different regulation considerations imposed on a platform. To this end, we develop a multi-agent Gym environment of a platform economy in a dynamic, multi-period setting, with the possible occurrence of economic shocks. Buyers and sellers are modeled as eco… ▽ More We study the behavior of an economic platform (e.g., Amazon, Uber Eats, Instacart) under shocks, such as COVID-19 lockdowns, and the effect of different regulation considerations imposed on a platform. To this end, we develop a multi-agent Gym environment of a platform economy in a dynamic, multi-period setting, with the possible occurrence of economic shocks. Buyers and sellers are modeled as economically-motivated agents, choosing whether or not to pay corresponding fees to use the platform. We formulate the platform's problem as a partially observable Markov decision process, and use deep reinforcement learning to model its fee setting and matching behavior. We consider two major types of regulation frameworks: (1) taxation policies and (2) platform fee restrictions, and offer extensive simulated experiments to characterize regulatory tradeoffs under optimal platform responses. Our results show that while many interventions are ineffective with a sophisticated platform actor, we identify a particular kind of regulation -- fixing fees to optimal, pre-shock fees while still allowing a platform to choose how to match buyer demands to sellers -- as promoting the efficiency, seller diversity, and resilience of the overall economic system. △ Less

Submitted 4 January, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

arXiv:2202.01691 [pdf, other]

Solving Dynamic Principal-Agent Problems with a Rationally Inattentive Principal

Authors: Tong Mu, Stephan Zheng, Alexander Trott

Abstract: Principal-Agent (PA) problems describe a broad class of economic relationships characterized by misaligned incentives and asymmetric information. The Principal's problem is to find optimal incentives given the available information, e.g., a manager setting optimal wages for its employees. Whereas the Principal is often assumed rational, comparatively little is known about solutions when the Princi… ▽ More Principal-Agent (PA) problems describe a broad class of economic relationships characterized by misaligned incentives and asymmetric information. The Principal's problem is to find optimal incentives given the available information, e.g., a manager setting optimal wages for its employees. Whereas the Principal is often assumed rational, comparatively little is known about solutions when the Principal is boundedly rational, especially in the sequential setting, with multiple Agents, and with multiple information channels. Here, we develop RIRL, a deep reinforcement learning framework that solves such complex PA problems with a rationally inattentive Principal. Such a Principal incurs a cost for paying attention to information, which can model forms of bounded rationality. We use RIRL to analyze rich economic phenomena in manager-employee relationships. In the single-step setting, 1) RIRL yields wages that are consistent with theoretical predictions; and 2) non-zero attention costs lead to simpler but less profitable wage structures, and increased Agent welfare. In a sequential setting with multiple Agents, RIRL shows opposing consequences of the Principal's inattention to different information channels: 1) inattention to Agents' outputs closes wage gaps based on ability differences; and 2) inattention to Agents' efforts induces a social dilemma dynamic in which Agents work harder, but essentially for free. Moreover, RIRL reveals non-trivial relationships between the Principal's inattention and Agent types, e.g., if Agents are prone to sub-optimal effort choices, payment schedules are more sensitive to the Principal's attention cost. As such, RIRL can reveal novel economic relationships and enables progress towards understanding the effects of bounded rationality in dynamic settings. △ Less

Submitted 17 February, 2022; v1 submitted 18 January, 2022; originally announced February 2022.

Comments: 22 pages, 8 figures, including appendix

arXiv:2201.01163 [pdf, other]

Analyzing Micro-Founded General Equilibrium Models with Many Agents using Deep Reinforcement Learning

Authors: Michael Curry, Alexander Trott, Soham Phade, Yu Bai, Stephan Zheng

Abstract: Real economies can be modeled as a sequential imperfect-information game with many heterogeneous agents, such as consumers, firms, and governments. Dynamic general equilibrium (DGE) models are often used for macroeconomic analysis in this setting. However, finding general equilibria is challenging using existing theoretical or computational methods, especially when using microfoundations to model… ▽ More Real economies can be modeled as a sequential imperfect-information game with many heterogeneous agents, such as consumers, firms, and governments. Dynamic general equilibrium (DGE) models are often used for macroeconomic analysis in this setting. However, finding general equilibria is challenging using existing theoretical or computational methods, especially when using microfoundations to model individual agents. Here, we show how to use deep multi-agent reinforcement learning (MARL) to find $ε$-meta-equilibria over agent types in microfounded DGE models. Whereas standard MARL fails to learn non-trivial solutions, our structured learning curricula enable stable convergence to meaningful solutions. Conceptually, our approach is more flexible and does not need unrealistic assumptions, e.g., continuous market clearing, that are commonly used for analytical tractability. Furthermore, our end-to-end GPU implementation enables fast real-time convergence with a large number of RL economic agents. We showcase our approach in open and closed real-business-cycle (RBC) models with 100 worker-consumers, 10 firms, and a social planner who taxes and redistributes. We validate the learned solutions are $ε$-meta-equilibria through best-response analyses, show that they align with economic intuitions, and show our approach can learn a spectrum of qualitatively distinct $ε$-meta-equilibria in open RBC models. As such, we show that hardware-accelerated MARL is a promising framework for modeling the complexity of economies based on microfoundations. △ Less

Submitted 23 February, 2022; v1 submitted 3 January, 2022; originally announced January 2022.

arXiv:2108.02904 [pdf, other]

Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist

Authors: Alexander Trott, Sunil Srinivasa, Douwe van der Wal, Sebastien Haneuse, Stephan Zheng

Abstract: Optimizing economic and public policy is critical to address socioeconomic issues and trade-offs, e.g., improving equality, productivity, or wellness, and poses a complex mechanism design problem. A policy designer needs to consider multiple objectives, policy levers, and behavioral responses from strategic actors who optimize for their individual objectives. Moreover, real-world policies should b… ▽ More Optimizing economic and public policy is critical to address socioeconomic issues and trade-offs, e.g., improving equality, productivity, or wellness, and poses a complex mechanism design problem. A policy designer needs to consider multiple objectives, policy levers, and behavioral responses from strategic actors who optimize for their individual objectives. Moreover, real-world policies should be explainable and robust to simulation-to-reality gaps, e.g., due to calibration issues. Existing approaches are often limited to a narrow set of policy levers or objectives that are hard to measure, do not yield explicit optimal policies, or do not consider strategic behavior, for example. Hence, it remains challenging to optimize policy in real-world scenarios. Here we show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning (RL) and data-driven simulations. We validate our framework on optimizing the stringency of US state policies and Federal subsidies during a pandemic, e.g., COVID-19, using a simulation fitted to real data. We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes. Their behavior can be explained, e.g., well-performing policies respond strongly to changes in recovery and vaccination rates. They are also robust to calibration errors, e.g., infection rates that are over or underestimated. As of yet, real-world policymaking has not seen adoption of machine learning methods at large, including RL and AI-driven simulations. Our results show the potential of AI to guide policy design and improve social welfare amidst the complexity of the real world. △ Less

Submitted 5 August, 2021; originally announced August 2021.

Comments: 41 pages, 14 figures. AT, SS, and SZ contributed equally

arXiv:2108.02755 [pdf, other]

The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning

Authors: Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, Richard Socher

Abstract: AI and reinforcement learning (RL) have improved many areas, but are not yet widely adopted in economic policy design, mechanism design, or economics at large. At the same time, current economic methodology is limited by a lack of counterfactual data, simplistic behavioral models, and limited opportunities to experiment with policies and evaluate behavioral responses. Here we show that machine-lea… ▽ More AI and reinforcement learning (RL) have improved many areas, but are not yet widely adopted in economic policy design, mechanism design, or economics at large. At the same time, current economic methodology is limited by a lack of counterfactual data, simplistic behavioral models, and limited opportunities to experiment with policies and evaluate behavioral responses. Here we show that machine-learning-based economic simulation is a powerful policy and mechanism design framework to overcome these limitations. The AI Economist is a two-level, deep RL framework that trains both agents and a social planner who co-adapt, providing a tractable solution to the highly unstable and novel two-level RL challenge. From a simple specification of an economy, we learn rational agent behaviors that adapt to learned planner policies and vice versa. We demonstrate the efficacy of the AI Economist on the problem of optimal taxation. In simple one-step economies, the AI Economist recovers the optimal tax policy of economic theory. In complex, dynamic economies, the AI Economist substantially improves both utilitarian social welfare and the trade-off between equality and productivity over baselines. It does so despite emergent tax-gaming strategies, while accounting for agent interactions and behavioral change more accurately than economic theory. These results demonstrate for the first time that two-level, deep RL can be used for understanding and as a complement to theory for economic design, unlocking a new computational learning-based approach to understanding economic policy. △ Less

Submitted 5 August, 2021; originally announced August 2021.

Comments: Substantial Extension of arXiv:2004.13332. SZ and AT contributed equally

arXiv:2106.05492 [pdf, other]

Learning to Play General-Sum Games Against Multiple Boundedly Rational Agents

Authors: Eric Zhao, Alexander R. Trott, Caiming Xiong, Stephan Zheng

Abstract: We study the problem of training a principal in a multi-agent general-sum game using reinforcement learning (RL). Learning a robust principal policy requires anticipating the worst possible strategic responses of other agents, which is generally NP-hard. However, we show that no-regret dynamics can identify these worst-case responses in poly-time in smooth games. We propose a framework that uses t… ▽ More We study the problem of training a principal in a multi-agent general-sum game using reinforcement learning (RL). Learning a robust principal policy requires anticipating the worst possible strategic responses of other agents, which is generally NP-hard. However, we show that no-regret dynamics can identify these worst-case responses in poly-time in smooth games. We propose a framework that uses this policy evaluation method for efficiently learning a robust principal policy using RL. This framework can be extended to provide robustness to boundedly rational agents too. Our motivating application is automated mechanism design: we empirically demonstrate our framework learns robust mechanisms in both matrix games and complex spatiotemporal games. In particular, we learn a dynamic tax policy that improves the welfare of a simulated trade-and-barter economy by 15%, even when facing previously unseen boundedly rational RL taxpayers. △ Less

Submitted 19 December, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: 15 pages, 6 figures. Appearing at the Thirty-seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

arXiv:2004.13332 [pdf, other]

The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies

Authors: Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C. Parkes, Richard Socher

Abstract: Tackling real-world socio-economic challenges requires designing and testing economic policies. However, this is hard in practice, due to a lack of appropriate (micro-level) economic data and limited opportunity to experiment. In this work, we train social planners that discover tax policies in dynamic economies that can effectively trade-off economic equality and productivity. We propose a two-le… ▽ More Tackling real-world socio-economic challenges requires designing and testing economic policies. However, this is hard in practice, due to a lack of appropriate (micro-level) economic data and limited opportunity to experiment. In this work, we train social planners that discover tax policies in dynamic economies that can effectively trade-off economic equality and productivity. We propose a two-level deep reinforcement learning approach to learn dynamic tax policies, based on economic simulations in which both agents and a government learn and adapt. Our data-driven approach does not make use of economic modeling assumptions, and learns from observational data alone. We make four main contributions. First, we present an economic simulation environment that features competitive pressures and market dynamics. We validate the simulation by showing that baseline tax systems perform in a way that is consistent with economic theory, including in regard to learned agent behaviors and specializations. Second, we show that AI-driven tax policies improve the trade-off between equality and productivity by 16% over baseline policies, including the prominent Saez tax framework. Third, we showcase several emergent features: AI-driven tax policies are qualitatively different from baselines, setting a higher top tax rate and higher net subsidies for low incomes. Moreover, AI-driven tax policies perform strongly in the face of emergent tax-gaming strategies learned by AI agents. Lastly, AI-driven tax policies are also effective when used in experiments with human participants. In experiments conducted on MTurk, an AI tax policy provides an equality-productivity trade-off that is similar to that provided by the Saez framework along with higher inverse-income weighted social welfare. △ Less

Submitted 28 April, 2020; originally announced April 2020.

Comments: 46 pages, 21 figures

arXiv:2002.03647 [pdf, other]

Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

Authors: Víctor Campos, Alexander Trott, Caiming Xiong, Richard Socher, Xavier Giro-i-Nieto, Jordi Torres

Abstract: Acquiring abilities in the absence of a task-oriented reward function is at the frontier of reinforcement learning research. This problem has been studied through the lens of empowerment, which draws a connection between option discovery and information theory. Information-theoretic skill discovery methods have garnered much interest from the community, but little research has been conducted in un… ▽ More Acquiring abilities in the absence of a task-oriented reward function is at the frontier of reinforcement learning research. This problem has been studied through the lens of empowerment, which draws a connection between option discovery and information theory. Information-theoretic skill discovery methods have garnered much interest from the community, but little research has been conducted in understanding their limitations. Through theoretical analysis and empirical evidence, we show that existing algorithms suffer from a common limitation -- they discover options that provide a poor coverage of the state space. In light of this, we propose 'Explore, Discover and Learn' (EDL), an alternative approach to information-theoretic skill discovery. Crucially, EDL optimizes the same information-theoretic objective derived from the empowerment literature, but addresses the optimization problem using different machinery. We perform an extensive evaluation of skill discovery methods on controlled environments and show that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned. Code is publicly available at https://github.com/victorcampos7/edl. △ Less

Submitted 3 August, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

Comments: 17 pages, 11 figures. Code is publicly available at https://github.com/victorcampos7/edl

arXiv:1911.01417 [pdf, other]

Kee** Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

Authors: Alexander Trott, Stephan Zheng, Caiming Xiong, Richard Socher

Abstract: While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward sha** often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to lea… ▽ More While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward sha** often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state. Our method introduces an auxiliary distance-based reward based on pairs of rollouts to encourage diverse exploration. This approach effectively prevents learning dynamics from stabilizing around local optima induced by the naive distance-to-goal reward sha** and enables policies to efficiently solve sparse reward tasks. Our augmented objective does not require any additional reward engineering or domain expertise to implement and converges to the original sparse objective as the agent learns to solve the task. We demonstrate that our method successfully solves a variety of hard-exploration tasks (including maze navigation and 3D construction in a Minecraft environment), where naive distance-based reward sha** otherwise fails, and intrinsic curiosity and reward relabeling strategies exhibit poor performance. △ Less

Submitted 4 November, 2019; originally announced November 2019.

Comments: NeurIPS 2019

arXiv:1907.00664 [pdf, other]

Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

Authors: Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher

Abstract: In many real-world scenarios, an autonomous agent often encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train… ▽ More In many real-world scenarios, an autonomous agent often encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train a latent pivotal state model and a curiosity-driven goal-conditioned policy in a task-agnostic manner. Second, provided with the information from the world graph, a high-level Manager quickly finds solution to new tasks and expresses subgoals in reference to pivotal states to a low-level Worker. The Worker can then also leverage the graph to easily traverse to the pivotal states of interest, even across long distance, and explore non-locally. We perform a thorough ablation study to evaluate our approach on a suite of challenging maze tasks, demonstrating significant advantages from the proposed framework over baselines that lack world graph knowledge in terms of performance and efficiency. △ Less

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1902.00528 [pdf, other]

Competitive Experience Replay

Authors: Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong

Abstract: Deep learning has achieved remarkable successes in solving challenging reinforcement learning (RL) problems when dense reward function is provided. However, in sparse reward environment it still often suffers from the need to carefully shape reward function to guide policy optimization. This limits the applicability of RL in the real world since both reinforcement learning and domain-specific know… ▽ More Deep learning has achieved remarkable successes in solving challenging reinforcement learning (RL) problems when dense reward function is provided. However, in sparse reward environment it still often suffers from the need to carefully shape reward function to guide policy optimization. This limits the applicability of RL in the real world since both reinforcement learning and domain-specific knowledge are required. It is therefore of great practical importance to develop algorithms which can learn from a binary signal indicating successful task completion or other unshaped, sparse reward signals. We propose a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents. Our method complements the recently proposed hindsight experience replay (HER) by inducing an automatic exploratory curriculum. We evaluate our approach on the tasks of reaching various goal locations in an ant maze and manipulating objects with a robotic arm. Each task provides only binary rewards indicating whether or not the goal is achieved. Our method asymmetrically augments these sparse rewards for a pair of agents each learning the same task, creating a competitive game designed to drive exploration. Extensive experiments demonstrate that this method leads to faster converge and improved task performance. △ Less

Submitted 16 February, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

Comments: Published as a conference paper at Seventh International Conference on Learning Representations(ICLR 2019)

arXiv:1712.08697 [pdf, other]

Interpretable Counting for Visual Question Answering

Authors: Alexander Trott, Caiming Xiong, Richard Socher

Abstract: Questions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA). The most common approaches to VQA involve either classifying answers based on fixed length representations of both the image and question or summing fractional counts estimated from each section of the image. In contrast, we treat counting as a sequential decision process and… ▽ More Questions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA). The most common approaches to VQA involve either classifying answers based on fixed length representations of both the image and question or summing fractional counts estimated from each section of the image. In contrast, we treat counting as a sequential decision process and force our model to make discrete choices of what to count. Specifically, the model sequentially selects from detected objects and learns interactions between objects that influence subsequent selections. A distinction of our approach is its intuitive and interpretable output, as discrete counts are automatically grounded in the image. Furthermore, our method outperforms the state of the art architecture for VQA on multiple metrics that evaluate counting. △ Less

Submitted 1 March, 2018; v1 submitted 22 December, 2017; originally announced December 2017.

Comments: ICLR 2018

Showing 1–14 of 14 results for author: Trott, A