-
Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$
Authors:
Adam Roberts,
Hyung Won Chung,
Anselm Levskaya,
Gaurav Mishra,
James Bradbury,
Daniel Andor,
Sharan Narang,
Brian Lester,
Colin Gaffney,
Afroz Mohiuddin,
Curtis Hawthorne,
Aitor Lewkowycz,
Alex Salcianu,
Marc van Zee,
Jacob Austin,
Sebastian Goodman,
Livio Baldini Soares,
Haitang Hu,
Sasha Tsvyashchenko,
Aakanksha Chowdhery,
Jasmijn Bastings,
Jannis Bulian,
Xavier Garcia,
Jianmo Ni,
Andrew Chen,
et al. (18 additional authors not shown)
Abstract:
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data.
Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures.
$\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.
Submitted 31 March, 2022;
originally announced March 2022.
-
Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures
Authors:
Daniel Furrer,
Marc van Zee,
Nathan Scales,
Nathanael Schärli
Abstract:
While mainstream machine learning methods are known to have limited ability to compositionally generalize, new architectures and techniques continue to be proposed to address this limitation. We investigate state-of-the-art techniques and architectures in order to assess their effectiveness in improving compositional generalization in semantic parsing tasks based on the SCAN and CFQ datasets. We show that masked language model (MLM) pre-training rivals SCAN-inspired architectures on primitive holdout splits. On a more complex compositional task, we show that pre-training leads to significant improvements in performance vs. comparable non-pre-trained models, whereas architectures proposed to encourage compositional generalization on SCAN or in the area of algorithm learning fail to lead to significant improvements. We establish a new state of the art on the CFQ compositional generalization benchmark using MLM pre-training together with an intermediate representation.
Submitted 22 September, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Intention as Commitment toward Time
Authors:
Marc van Zee,
Dragan Doder,
Leendert van der Torre,
Mehdi Dastani,
Thomas Icard,
Eric Pacuit
Abstract:
In this paper we address the interplay among intention, time, and belief in dynamic environments. The first contribution is a logic for reasoning about intention, time, and belief, in which assumptions of intentions are represented by preconditions of intended actions. Intentions and beliefs are coherent as long as these assumptions are not violated, i.e., as long as intended actions can be performed such that their preconditions hold as well. The second contribution is the formalization of what-if scenarios: what happens with intentions and beliefs if a new (possibly conflicting) intention is adopted, or a new fact is learned? An agent is committed to its intended actions as long as its belief-intention database is coherent. We conceptualize intention as commitment toward time, develop AGM-based postulates for the iterated revision of belief-intention databases, and prove a Katsuno-Mendelzon-style representation theorem.
Submitted 17 April, 2020;
originally announced April 2020.
-
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data
Authors:
Daniel Keysers,
Nathanael Schärli,
Nathan Scales,
Hylke Buisman,
Daniel Furrer,
Sergii Kashubin,
Nikola Momchev,
Danila Sinopalnikov,
Lukasz Stafiniak,
Tibor Tihon,
Dmitry Tsarkov,
Xiao Wang,
Marc van Zee,
Olivier Bousquet
Abstract:
State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets, and we quantitatively compare this method to other approaches for creating compositional generalization benchmarks. We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures. We find that they fail to generalize compositionally and that there is a surprisingly strong negative correlation between compound divergence and accuracy. We also demonstrate how our method can be used to create new compositionality benchmarks on top of the existing SCAN dataset, which confirms these findings.
Submitted 25 June, 2020; v1 submitted 20 December, 2019;
originally announced December 2019.
-
Mechanics of epithelial tissue formation
Authors:
Ruben van Drongelen,
Tania Vazquez-Faci,
Teun A. P. M. Huijben,
Maurijn van der Zee,
Timon Idema
Abstract:
A key process in the life of any multicellular organism is its development from a single egg into a full-grown adult. The first step in this process often consists of forming a tissue layer out of randomly placed cells on the surface of the egg. We present a model for generating such a tissue, and find that the resulting cellular pattern corresponds to the Voronoi tessellation of the nuclei of the cells. Experimentally, we obtain the same result in both fruit flies and flour beetles, with a distribution of cell shapes that matches that of the model, without any adjustable parameters. Finally, we show that this pattern is broken when the cells do not all grow at the same rate.
Submitted 11 June, 2018; v1 submitted 17 May, 2017;
originally announced May 2017.
-
AGM-Style Revision of Beliefs and Intentions from a Database Perspective (Preliminary Version)
Authors:
Marc van Zee,
Dragan Doder
Abstract:
We introduce a logic for temporal beliefs and intentions based on Shoham's database perspective. We separate strong beliefs from weak beliefs. Strong beliefs are independent from intentions, while weak beliefs are obtained by adding intentions to strong beliefs and everything that follows from that. We formalize coherence conditions on strong beliefs and intentions. We provide AGM-style postulates for the revision of strong beliefs and intentions. We show in a representation theorem that a revision operator satisfying our postulates can be represented by a pre-order on interpretations of the beliefs, together with a selection function for the intentions.
Submitted 26 April, 2016; v1 submitted 25 April, 2016;
originally announced April 2016.
-
Stochastic Modeling of Soil Salinity
Authors:
S. Suweis,
A. Rinaldo,
S. E. A. T. M. Van der Zee,
E. Daly,
A. Maritan,
A. Porporato
Abstract:
A minimalist stochastic model of primary soil salinity is proposed, in which the rate of soil salinization is determined by the balance between dry and wet salt deposition and the intermittent leaching events caused by rainfall. The long-term probability density functions of salt mass and concentration are found by reducing the coupled soil moisture and salt mass balance equation to a single stochastic differential equation driven by multiplicative Poisson noise. The novel analytical solutions provide insight into the interplay of the main soil, plant, and climate parameters responsible for long-term soil salinization. In particular, they show the existence of two distinct regimes, one where the mean salt mass remains nearly constant (or decreases) with increasing rainfall frequency, and another where mean salt content increases markedly with increasing rainfall frequency. As a result, relatively small reductions of rainfall in drier climates may entail dramatic shifts in long-term soil salinization trends, with significant consequences, e.g., for climate change impacts on rain-fed agriculture.
Submitted 10 July, 2012;
originally announced July 2012.