-
Transformers as Transducers
Authors:
Lena Strobl,
Dana Angluin,
David Chiang,
Jonathan Rawski,
Ashish Sabharwal
Abstract:
We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of transductions. We do so using variants of RASP, a programming language designed to help people "think like transformers," as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence functions and show that it computes exactly the first-order rational functions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular functions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular functions. Finally, we show that masked average-hard attention transformers can simulate S-RASP. A corollary of our results is a new proof that transformer decoders are Turing-complete.
Submitted 2 April, 2024;
originally announced April 2024.
-
What Formal Languages Can Transformers Express? A Survey
Authors:
Lena Strobl,
William Merrill,
Gail Weiss,
David Chiang,
Dana Angluin
Abstract:
As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help clarify the power of transformers relative to other models of computation, their fundamental capabilities and limits, and the impact of architectural choices. Work in this subarea has made considerable progress in recent years. Here, we undertake a comprehensive survey of this work, documenting the diverse assumptions that underlie different results and providing a unified framework for harmonizing seemingly contradictory findings.
Submitted 6 May, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
-
Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
Authors:
Andy Yang,
David Chiang,
Dana Angluin
Abstract:
The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages. We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that they are equivalent to linear temporal logic (LTL), which defines exactly the star-free languages. A key technique is the use of Boolean RASP as a convenient intermediate language between transformers and LTL. We then take numerous results known for LTL and apply them to transformers, characterizing how position embeddings, strict masking, and depth increase expressive power.
Submitted 22 May, 2024; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Constructing Concise Characteristic Samples for Acceptors of Omega Regular Languages
Authors:
Dana Angluin,
Dana Fisman
Abstract:
A characteristic sample for a language $L$ and a learning algorithm $\textbf{L}$ is a finite sample of words $T_L$ labeled by their membership in $L$ such that for any sample $T \supseteq T_L$ consistent with $L$, on input $T$ the learning algorithm $\textbf{L}$ returns a hypothesis equivalent to $L$. Which omega automata have characteristic sets of polynomial size, and can these sets be constructed in polynomial time? We address these questions here.
In brief, non-deterministic omega automata of any of the common types, in particular Büchi, do not have characteristic samples of polynomial size. For deterministic omega automata that are isomorphic to their right congruence automata, the fully informative languages, polynomial time algorithms for constructing characteristic samples and learning from them are given.
The algorithms for constructing characteristic sets in polynomial time for the different omega automata (of types Büchi, coBüchi, parity, Rabin, Streett, or Muller) require deterministic polynomial time algorithms for (1) equivalence of the respective omega automata, and (2) testing membership of the language of the automaton in the informative classes, which we provide.
Submitted 22 April, 2024; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
Authors:
Yiding Hao,
Dana Angluin,
Robert Frank
Abstract:
This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$, the class of languages recognizable by families of Boolean circuits of constant depth and polynomial size. This upper bound subsumes Hahn's (2020) results that GUHAT cannot recognize the DYCK languages or the PARITY language, since those languages are outside AC$^0$ (Furst et al., 1984). In contrast, the non-AC$^0$ languages MAJORITY and DYCK-1 are recognizable by AHAT networks, implying that AHAT can recognize languages that UHAT and GUHAT cannot.
Submitted 13 April, 2022;
originally announced April 2022.
-
Polynomial time algorithms for inclusion and equivalence of deterministic omega acceptors
Authors:
Dana Angluin,
Dana Fisman
Abstract:
The class of omega languages recognized by deterministic parity acceptors (DPAs) or deterministic Muller acceptors (DMAs) is exactly the regular omega languages. The inclusion problem is the following: given two acceptors A1 and A2, determine whether the language recognized by A1 is a subset of the language recognized by A2, and if not, return an ultimately periodic omega word accepted by A1 but not A2. We describe polynomial time algorithms to solve this problem for two DPAs and for two DMAs. Corollaries include polynomial time algorithms to solve the equivalence problem for DPAs and DMAs, and also the inclusion and equivalence problems for deterministic Buechi and coBuechi acceptors.
Submitted 9 May, 2020; v1 submitted 8 February, 2020;
originally announced February 2020.
-
Regular omega-Languages with an Informative Right Congruence
Authors:
Dana Angluin,
Dana Fisman
Abstract:
A regular language is almost fully characterized by its right congruence relation. Indeed, a regular language can always be recognized by a DFA isomorphic to the automaton corresponding to its right congruence, henceforth the Rightcon automaton. The same does not hold for regular omega-languages. The right congruence of a regular omega-language is not informative enough; many regular omega-languages have a trivial right congruence, and in general it is not always possible to define an omega-automaton recognizing a given language that is isomorphic to the rightcon automaton.
The class of weak regular omega-languages does have an informative right congruence. That is, any weak regular omega-language can always be recognized by a deterministic Büchi automaton that is isomorphic to the rightcon automaton. Weak regular omega-languages reside in the lower levels of the expressiveness hierarchy of regular omega-languages. Are there more expressive sub-classes of regular omega languages that have an informative right congruence? Can we fully characterize the class of languages with a trivial right congruence? In this paper we try to place some additional pieces of this big puzzle.
Submitted 9 September, 2018;
originally announced September 2018.
-
Context-Free Transductions with Neural Stacks
Authors:
Yiding Hao,
William Merrill,
Dana Angluin,
Robert Frank,
Noah Amsel,
Andrew Benz,
Simon Mendelsohn
Abstract:
This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models. Due to the architectural similarity between stack RNNs and pushdown transducers, we train stack RNN models on a number of tasks, including string reversal, context-free language modelling, and cumulative XOR evaluation. Examining the behavior of our networks, we show that stack-augmented RNNs can discover intuitive stack-based strategies for solving our tasks. However, stack RNNs are more difficult to train than classical architectures such as LSTMs. Rather than employ stack-based strategies, more complex networks often find approximate solutions by using the stack as unstructured memory.
Submitted 8 September, 2018;
originally announced September 2018.
-
Query learning of derived $ω$-tree languages in polynomial time
Authors:
Dana Angluin,
Timos Antonopoulos,
Dana Fisman
Abstract:
We present the first polynomial time algorithm to learn nontrivial classes of languages of infinite trees. Specifically, our algorithm uses membership and equivalence queries to learn classes of $ω$-tree languages derived from weak regular $ω$-word languages in polynomial time. The method is a general polynomial time reduction of learning a class of derived $ω$-tree languages to learning the underlying class of $ω$-word languages, for any class of $ω$-word languages recognized by a deterministic Büchi acceptor. Our reduction, combined with the polynomial time learning algorithm of Maler and Pnueli [1995] for the class of weak regular $ω$-word languages yields the main result. We also show that subset queries that return counterexamples can be implemented in polynomial time using subset queries that return no counterexamples for deterministic or non-deterministic finite word acceptors, and deterministic or non-deterministic Büchi $ω$-word acceptors.
A previous claim of an algorithm to learn regular $ω$-trees due to Jayasrirani, Begam and Thomas [2008] is unfortunately incorrect, as shown in Angluin [2016].
Submitted 26 August, 2019; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Families of DFAs as Acceptors of $ω$-Regular Languages
Authors:
Dana Angluin,
Udi Boker,
Dana Fisman
Abstract:
Families of DFAs (FDFAs) provide an alternative formalism for recognizing $ω$-regular languages. The motivation for introducing them was a desired correlation between the automaton states and right congruence relations, in a manner similar to the Myhill-Nerode theorem for regular languages. This correlation is beneficial for learning algorithms, and indeed it was recently shown that $ω$-regular languages can be learned from membership and equivalence queries, using FDFAs as the acceptors.
In this paper, we look into the question of how suitable FDFAs are for defining omega-regular languages. Specifically, we look into the complexity of performing Boolean operations, such as complementation and intersection, on FDFAs, the complexity of solving decision problems, such as emptiness and language containment, and the succinctness of FDFAs compared to standard deterministic and nondeterministic $ω$-automata.
We show that FDFAs enjoy the benefits of deterministic automata with respect to Boolean operations and decision problems. Namely, they can all be performed in nondeterministic logarithmic space. We provide polynomial translations of deterministic Büchi and co-Büchi automata to FDFAs and of FDFAs to nondeterministic Büchi automata (NBAs). We show that translation of an NBA to an FDFA may involve an exponential blowup. Last, we show that FDFAs are more succinct than deterministic parity automata (DPAs) in the sense that translating a DPA to an FDFA can always be done with only a polynomial increase, yet the other direction involves an inevitable exponential blowup in the worst case.
Submitted 13 February, 2018; v1 submitted 24 December, 2016;
originally announced December 2016.
-
Learning and Verifying Quantified Boolean Queries by Example
Authors:
Azza Abouzied,
Dana Angluin,
Christos Papadimitriou,
Joseph M. Hellerstein,
Avi Silberschatz
Abstract:
To help a user specify and verify quantified queries --- a class of database queries known to be very challenging for all but the most expert users --- one can question the user on whether certain data objects are answers or non-answers to her intended query. In this paper, we analyze the number of questions needed to learn or verify qhorn queries, a special class of Boolean quantified queries whose underlying form is conjunctions of quantified Horn expressions. We provide optimal polynomial-question and polynomial-time learning and verification algorithms for two subclasses of the class qhorn with upper constant limits on a query's causal density.
Submitted 15 April, 2013;
originally announced April 2013.
-
The computational power of population protocols
Authors:
Dana Angluin,
James Aspnes,
David Eisenstat,
Eric Ruppert
Abstract:
We consider the model of population protocols introduced by Angluin et al., in which anonymous finite-state agents stably compute a predicate of the multiset of their inputs via two-way interactions in the all-pairs family of communication networks. We prove that all predicates stably computable in this model (and certain generalizations of it) are semilinear, answering a central open question about the power of the model. Removing the assumption of two-way interaction, we also consider several variants of the model in which agents communicate by anonymous message-passing where the recipient of each message is chosen by an adversary and the sender is not identified to the recipient. These one-way models are distinguished by whether messages are delivered immediately or after a delay, whether a sender can record that it has sent a message, and whether a recipient can queue incoming messages, refusing to accept new messages until it has had a chance to send out messages of its own. We characterize the classes of predicates stably computable in each of these one-way models using natural subclasses of the semilinear predicates.
Submitted 21 August, 2006;
originally announced August 2006.