-
Transformers meet Neural Algorithmic Reasoners
Authors:
Wilfried Bounsi,
Borja Ibarz,
Andrew Dudzik,
Jessica B. Hamrick,
Larisa Markeeva,
Alex Vitvitskyi,
Razvan Pascanu,
Petar Veličković
Abstract:
Transformers have revolutionized machine learning with their simple yet effective architecture. Pre-training Transformers on massive text datasets from the Internet has led to unmatched generalization for natural language understanding (NLU) tasks. However, such language models remain fragile when tasked with algorithmic forms of reasoning, where computations must be precise and robust. To address this limitation, we propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs). Such NARs proved effective as generic solvers for algorithmic tasks, when specified in graph form. To make their embeddings accessible to a Transformer, we propose a hybrid architecture with a two-phase training procedure, allowing the tokens in the language model to cross-attend to the node embeddings from the NAR. We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning, both in and out of distribution.
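The cross-attention idea described in this abstract can be illustrated with a generic single-head, scaled dot-product cross-attention. Here language-model token states act as queries, and GNN node embeddings supply keys and values. This is a minimal sketch under our own assumptions: a single head, no masking, no residual connection or gating, and hypothetical projection matrices `w_q`, `w_k`, `w_v`. It is not the TransNAR implementation or its two-phase training procedure.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(tokens: Matrix, nodes: Matrix,
                 w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Each token (query) attends over all node embeddings (keys/values).

    w_q, w_k, w_v are hypothetical learned projections (illustrative only).
    """
    q = matmul(tokens, w_q)  # (num_tokens, d)
    k = matmul(nodes, w_k)   # (num_nodes, d)
    v = matmul(nodes, w_v)   # (num_nodes, d_v)
    d = len(q[0])
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d) for kr in k]
              for qr in q]
    weights = [softmax(r) for r in scores]  # one distribution over nodes per token
    return matmul(weights, v)               # (num_tokens, d_v)
```

Each output row is a convex combination of the projected node embeddings. In a hybrid model, such a result would typically be combined with the token's own hidden state. How TransNAR combines them is not specified in this abstract.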
Submitted 13 June, 2024;
originally announced June 2024.
-
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
Authors:
Bruno Gavranović,
Paul Lessard,
Andrew Dudzik,
Tamara von Glehn,
João G. M. Araújo,
Petar Veličković
Abstract:
We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building such a bridge, we propose to apply category theory -- precisely, the universal algebra of monads valued in a 2-category of parametric maps -- as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.
Submitted 5 June, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
Asynchronous Algorithmic Alignment with Cocycles
Authors:
Andrew Dudzik,
Tamara von Glehn,
Razvan Pascanu,
Petar Veličković
Abstract:
State-of-the-art neural algorithmic reasoners make use of message passing in graph neural networks (GNNs). But typical GNNs blur the distinction between the definition and invocation of the message function, forcing a node to send messages to its neighbours at every layer, synchronously. When applying GNNs to learn to execute dynamic programming algorithms, however, on most steps only a handful of the nodes would have meaningful updates to send. One hence runs the risk of inefficiency by sending too much irrelevant data across the graph. More importantly, many intermediate GNN steps have to learn the identity function, which is a non-trivial learning problem. In this work, we explicitly separate the concepts of node state update and message function invocation. With this separation, we obtain a mathematical formulation that allows us to reason about asynchronous computation in both algorithms and neural networks. Our analysis yields several practical implementations of synchronous scalable GNN layers that are provably invariant under various forms of asynchrony.
Submitted 12 January, 2024; v1 submitted 27 June, 2023;
originally announced June 2023.
-
A Generalist Neural Algorithmic Learner
Authors:
Borja Ibarz,
Vitaly Kurin,
George Papamakarios,
Kyriacos Nikiforou,
Mehdi Bennani,
Róbert Csordás,
Andrew Dudzik,
Matko Bošnjak,
Alex Vitvitskyi,
Yulia Rubanova,
Andreea Deac,
Beatrice Bevilacqua,
Yaroslav Ganin,
Charles Blundell,
Petar Veličković
Abstract:
The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% relative to prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.
Submitted 3 December, 2022; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Graph Neural Networks are Dynamic Programmers
Authors:
Andrew Dudzik,
Petar Veličković
Abstract:
Recent advances in neural algorithmic reasoning with graph neural networks (GNNs) are propped up by the notion of algorithmic alignment. Broadly, a neural network will be better at learning to execute a reasoning task (in terms of sample complexity) if its individual components align well with the target algorithm. Specifically, GNNs are claimed to align with dynamic programming (DP), a general problem-solving strategy which expresses many polynomial-time algorithms. However, has this alignment truly been demonstrated and theoretically quantified? Here we show, using methods from category theory and abstract algebra, that there exists an intricate connection between GNNs and DP, going well beyond the initial observations over individual algorithms such as Bellman-Ford. Exposing this connection, we easily verify several prior findings in the literature, produce better-grounded GNN architectures for edge-centric tasks, and demonstrate empirical results on the CLRS algorithmic reasoning benchmark. We hope our exposition will serve as a foundation for building stronger algorithmically aligned GNNs.
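Bellman-Ford, named in this abstract, is the textbook case of the GNN/DP correspondence. One relaxation round reads like a message-passing step: each edge (u, v, w) carries the message dist[u] + w, incoming messages are aggregated with min, and a node keeps its own estimate as a self-message. The plain-Python sketch below illustrates only that correspondence. It is not the paper's category-theoretic construction, nor any GNN architecture proposed in it. The function names are our own.

```python
import math
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source u, target v, weight w)

def bellman_ford_step(dist: List[float], edges: List[Edge]) -> List[float]:
    """One synchronous relaxation round, phrased as message passing."""
    new = list(dist)  # self-message: each node's current estimate
    for u, v, w in edges:
        # message u -> v is dist[u] + w (read from the OLD state, i.e. synchronous);
        # aggregation at v is min over all incoming messages
        new[v] = min(new[v], dist[u] + w)
    return new

def bellman_ford(n: int, edges: List[Edge], source: int) -> List[float]:
    """Single-source shortest paths via n - 1 rounds (assumes no negative cycles)."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):
        dist = bellman_ford_step(dist, edges)
    return dist
```

For example, `bellman_ford(4, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (2, 3, 1.0)], 0)` returns `[0.0, 1.0, 3.0, 4.0]`. Replacing the hard-coded `dist[u] + w` and `min` with learned message and aggregation functions gives the familiar GNN template.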
Submitted 10 October, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Imitating Interactive Intelligence
Authors:
Josh Abramson,
Arun Ahuja,
Iain Barr,
Arthur Brussee,
Federico Carnevale,
Mary Cassin,
Rachita Chhaparia,
Stephen Clark,
Bogdan Damoc,
Andrew Dudzik,
Petko Georgiev,
Aurelia Guy,
Tim Harley,
Felix Hill,
Alden Hung,
Zachary Kenton,
Jessica Landon,
Timothy Lillicrap,
Kory Mathewson,
Soňa Mokrá,
Alistair Muldal,
Adam Santoro,
Nikolay Savinov,
Vikrant Varma,
Greg Wayne
, et al. (4 additional authors not shown)
Abstract:
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount.
Submitted 20 January, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
-
Quantales and Hyperstructures: Monads, Mo' Problems
Authors:
Andrew Dudzik
Abstract:
We present a theory of lattice-enriched semirings, called quantic semirings, which generalize both quantales and powersets of hyperrings. Using these structures, we show how to recover the spectrum of a Krasner hyperring (and in particular, a commutative ring with unity) via universal constructions, and generalize the spectrum to a new class of hyperstructures, hypersemirings. (These include hyperstructures currently studied under the name "semihyperrings", but we have weakened the distributivity axioms.)
Much of the work consists of background material on closure systems, suplattices, quantales, and hyperoperations, some of which is new. In particular, we define the category of covered semigroups, show their close relationship with quantales, and construct their spectra by exploiting the construction of a universal quotient frame by Rosenthal.
We extend these results to hypersemigroups, demonstrating various folkloric correspondences between hyperstructures and lattice-enriched structures on the powerset. Building on this, we proceed to define quantic semirings, and show that they are the lattice-enriched counterparts of hypersemirings. To a quantic semiring, we show how to define a universal quotient quantale, which we call the quantic spectrum, and using this, we show how to obtain the spectrum of a hypersemiring as a topological space in a canonical fashion.
Finally, we conclude with some applications of the theory to the ordered blueprints of Lorscheid.
Submitted 28 July, 2017;
originally announced July 2017.
-
Embeddings and immersions of tropical curves
Authors:
Dustin Cartwright,
Andrew Dudzik,
Madhusudan Manjunath,
Yuan Yao
Abstract:
We construct immersions of trivalent abstract tropical curves in the Euclidean plane and embeddings of all abstract tropical curves in higher dimensional Euclidean space. Since not all curves have an embedding in the plane, we define the tropical crossing number of an abstract tropical curve to be the minimum number of self-intersections, counted with multiplicity, over all its immersions in the plane. We show that the tropical crossing number is at most quadratic in the number of edges and this bound is sharp. For curves of genus up to two, we systematically compute the crossing number. Finally, we use our immersed tropical curves to construct totally faithful nodal algebraic curves via lifting results of Mikhalkin and Shustin.
Submitted 16 July, 2015; v1 submitted 25 September, 2014;
originally announced September 2014.