Search | arXiv e-print repository

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Authors: David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro

Abstract: Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth. Our method enforces a total compute budget by capping the number of tokens ($k$) that can participate in the self-attention and MLP computations at a given layer. The tokens to be processed are determined by the network using a top-$k$ routing mechanism. Since $k$ is defined a priori, this simple procedure uses a static computation graph with known tensor sizes, unlike other conditional computation techniques. Nevertheless, since the identities of the $k$ tokens are fluid, this method can expend FLOPs non-uniformly across the time and model depth dimensions. Thus, compute expenditure is entirely predictable in sum total, but dynamic and context-sensitive at the token-level. Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPS and wall-clock times to train, but require a fraction of the FLOPs per forward pass, and can be upwards of 50\% faster to step during post-training sampling.

Submitted 2 April, 2024; originally announced April 2024.
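The top-$k$ routing described in the Mixture-of-Depths abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names `top_k_indices` and `route_layer`, the scalar "tokens", the hand-written router scores, and the choice to pass unselected tokens through unchanged are all assumptions made for illustration. The point it shows is that the block always receives exactly $k$ tokens, so its input size is known in advance, while which tokens it receives depends on the scores.

```python
from typing import Callable, List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the positions of the k largest scores, in sequence order."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the sequence length")
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


def route_layer(
    tokens: Sequence[float],
    scores: Sequence[float],
    k: int,
    block: Callable[[List[float]], List[float]],
) -> List[float]:
    """Apply `block` to the k highest-scoring tokens; pass the rest through.

    Passing unselected tokens through unchanged is an assumption of this
    sketch, not something stated in the abstract.
    """
    chosen = top_k_indices(scores, k)
    # The block always sees exactly k tokens, so its input size is static.
    processed = block([tokens[i] for i in chosen])
    out = list(tokens)
    for i, value in zip(chosen, processed):
        out[i] = value
    return out


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]
    scores = [0.1, 0.9, 0.3, 0.8]  # hypothetical router outputs
    # Stand-in for the self-attention + MLP computation: double each token.
    print(route_layer(tokens, scores, k=2, block=lambda xs: [2 * x for x in xs]))
```

With the hypothetical scores above, only positions 1 and 3 are processed; changing the scores changes which positions are chosen, but never how many.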

arXiv:2312.03635 [pdf, other]

Towards Time Sensitive Networking on Smart Cities: Techniques, Challenges, and Solutions

Authors: Rui Lopes, Duarte Raposo, Susana Sargento

Abstract: The rapid proliferation of smart cities has transformed urban landscapes into dynamic ecosystems teeming with interconnected computational nodes and sensors. During this evolution, the search for seamless communication in time-critical scenarios has become evident. With the escalating complexity of urban environments, envisioning a future with a blend of autonomous and conventional systems, each demanding distinct quality-of-service considerations, services in smart cities vary criticality levels and necessitate differentiated traffic handling, prioritizing critical flows without compromising the network's reliability or failing on hard real-time requirements. To tackle these challenges, in this article we propose a Time-Sensitive Networking (TSN) approach which, at the scale of a smart city network, presents multifaceted challenges, notably interoperability among diverse technologies and standards. Nonetheless, TSN emerges as a promising toolkit, encompassing synchronization, latency management, redundancy, and configuration functionalities crucial for addressing smart city challenges. Moreover, the article scrutinizes how TSN, predominantly utilized in domains like automotive and industry, can be tailored to suit the intricate needs of smart cities, emphasizing the necessity for adaptability and scalability in network design. This survey consolidates current research on TSN, outlining its potential in fortifying critical machine-to-machine communications within smart cities while highlighting future challenges, potential solutions, and a roadmap for integrating TSN effectively into the fabric of urban connectivity.

Submitted 6 December, 2023; originally announced December 2023.

ACM Class: C.2.1

arXiv:2207.12200 [pdf, other]

Aveiro Tech City Living Lab: A Communication, Sensing and Computing Platform for City Environments

Authors: Pedro Rito, Ana Almeida, Andreia Figueiredo, Christian Gomes, Pedro Teixeira, Rodrigo Rosmaninho, Rui Lopes, Duarte Dias, Gonçalo Vítor, Gonçalo Perna, Miguel Silva, Carlos Senna, Duarte Raposo, Miguel Luís, Susana Sargento, Arnaldo Oliveira, Nuno Borges de Carvalho

Abstract: This article presents the deployment and experimentation architecture of the Aveiro Tech City Living Lab (ATCLL) in Aveiro, Portugal. This platform comprises a large number of Internet-of-Things devices with communication, sensing and computing capabilities. The communication infrastructure, built on fiber and Millimeter-wave (mmWave) links, integrates a communication network with radio terminals (WiFi, ITS-G5, C-V2X, 5G and LoRa(WAN)), multiprotocol, spread throughout 44 connected points of access in the city. Additionally, public transportation has also been equipped with communication and sensing units. All these points combine and interconnect a set of sensors, such as mobility (Radars, Lidars, video cameras) and environmental sensors. Combining edge computing and cloud management to deploy the services and manage the platform, and a data platform to gather and process the data, the living lab supports a wide range of services and applications: IoT, intelligent transportation systems and assisted driving, environmental monitoring, emergency and safety, among others. This article describes the architecture, implementation and deployment to make the overall platform to work and integrate researchers and citizens. Moreover, it showcases some examples of the performance metrics achieved in the city infrastructure, the data that can be collected, visualized and used to build services and applications to the cities, and, finally, different use cases in the mobility and safety scenarios.

Submitted 25 July, 2022; originally announced July 2022.

ACM Class: C.2.1

arXiv:2206.05396 [pdf, other]

A systematic approach on some relevant theorems that follows from Kolmogorov's axioms

Authors: Diego J. Raposo

Abstract: A selection of the relevant theorems of Probability Theory that comes directly from Kolmogorov's axioms, Set Theory basic results, definitions and rules of inference are listed and proven in a systematic approach, aiming the student who seeks a self-contained account on the matter before moving to more advanced material.

Submitted 10 June, 2022; originally announced June 2022.

MSC Class: 60-01; 60A05

arXiv:2205.00793 [pdf, other]

Ultra-Reliable Low-Latency Millimeter-Wave Communications with Sliding Window Network Coding

Authors: Eurico Dias, Duarte Raposo, Homa Esfahanizadeh, Alejandro Cohen, Tânia Ferreira, Miguel Luís, Susana Sargento, Muriel Médard

Abstract: Ultra-reliability and low-latency are pivotal requirements of the new 6th generation of communication systems (xURLLC). Over the past years, to increase throughput, adaptive active antennas were introduced in advanced wireless communications, specifically in the domain of millimeter-wave (mmWave). Consequently, new lower-layer techniques were proposed to cope with practical challenges of high dimensional and electronically-steerable beams. The transition from omni-directional to highly directional antennas presents a new type of wireless systems that deliver high bandwidth, but that are susceptible to high losses and high latency variation. Classical approaches cannot close the rising gap between high throughput and low delay in those advanced systems. In this work, we incorporate effective sliding window network coding solutions in mmWave communications. While legacy systems such as rateless codes improve delay, cross-layer results show that they do not provide low latency communications (LLC - below 10 ms), due to the lossy behaviour of mmWave channel and the lower-layers' retransmission mechanisms. On the other hand, fixed sliding window random linear network coding (RLNC) is able to achieve LLC, and even better, adaptive sliding window RLNC obtains ultra-reliable LLC (Ultra-Reliable and Low-Latency Communications (URLLC) - LLC with maximum delay below 10 ms with more than 99% success rate).

Submitted 15 September, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

arXiv:2202.08137 [pdf, other]

A data-driven approach for learning to control computers

Authors: Peter C Humphreys, David Raposo, Toby Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Alex Goldin, Adam Santoro, Timothy Lillicrap

Abstract: It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. Instead of focusing on hand-designed curricula and specialized action spaces, we focus on developing a scalable method centered on reinforcement learning combined with behavioural priors informed by actual human-computer interactions. We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. These results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.

Submitted 11 November, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

arXiv:2102.12425 [pdf, other]

Synthetic Returns for Long-Term Credit Assignment

Authors: David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song

Abstract: Since the earliest days of reinforcement learning, the workhorse method for assigning credit to actions over time has been temporal-difference (TD) learning, which propagates credit backward timestep-by-timestep. This approach suffers when delays between actions and rewards are long and when intervening unrelated events contribute variance to long-term returns. We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two. In this work, we use SA-learning to model the contribution of past states to the current reward. With this model we can predict each state's contribution to the far future, a quantity we call "synthetic returns". TD-learning can then be applied to select actions that maximize these synthetic returns (SRs). We demonstrate the effectiveness of augmenting agents with SRs across a range of tasks on which TD-learning alone fails. We show that the learned SRs are interpretable: they spike for states that occur after critical actions are taken. Finally, we show that our IMPALA-based SR agent solves Atari Skiing -- a game with a lengthy reward delay that posed a major hurdle to deep-RL agents -- 25 times faster than the published state-of-the-art.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2102.03406 [pdf, other]

Symbolic Behaviour in Artificial Intelligence

Authors: Adam Santoro, Andrew Lampinen, Kory Mathewson, Timothy Lillicrap, David Raposo

Abstract: The ability to use symbols is the pinnacle of human intelligence, but has yet to be fully replicated in machines. Here we argue that the path towards symbolically fluent artificial intelligence (AI) begins with a reinterpretation of what symbols are, how they come to exist, and how a system behaves when it uses them. We begin by offering an interpretation of symbols as entities whose meaning is established by convention. But crucially, something is a symbol only for those who demonstrably and actively participate in this convention. We then outline how this interpretation thematically unifies the behavioural traits humans exhibit when they use symbols. This motivates our proposal that the field place a greater emphasis on symbolic behaviour rather than particular computational mechanisms inspired by more restrictive interpretations of symbols. Finally, we suggest that AI research explore social and cultural engagement as a tool to develop the cognitive machinery necessary for symbolic behaviour to emerge. This approach will allow for AI to interpret something as symbolic on its own rather than simply manipulate things that are only symbols to human onlookers, and thus will ultimately lead to AI with more human-like symbolic fluency.

Submitted 21 January, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

arXiv:2010.00343 [pdf, other]

Bringing Network Coding into SDN: A Case-study for Highly Meshed Heterogeneous Communications

Authors: Alejandro Cohen, Homa Esfahanizadeh, Bruno Sousa, João P. Vilela, Miguel Luís, Duarte Raposo, Francois Michel, Susana Sargento, Muriel Médard

Abstract: Modern communications have moved away from point-to-point models to increasingly heterogeneous network models. In this article, we propose a novel controller-based protocol to deploy adaptive causal network coding in heterogeneous and highly-meshed communication networks. Specifically, we consider using Software-Defined-Network (SDN) as the main controller. We first present an architecture for the highly-meshed heterogeneous multi-source multi-destination networks that represents the practical communication networks encountered in the fifth generation of wireless networks (5G) and beyond. Next, we present a promising solution to deploy network coding over the new architecture. In fact, we investigate how to generalize adaptive and causal random linear network coding (AC-RLNC), proposed for multipath multi-hop (MP-MH) communication channels, to a protocol for the new multi-source multi-destination network architecture using controller. To this end, we present a modularized implementation of AC-RLNC solution where the modules work together in a distributed fashion and perform the AC-RLNC technology. We also present a new controller-based setting through which the network coding modules can communicate and can attain their required information. Finally, we briefly discuss how the proposed architecture and network coding solution provide a good opportunity for future technologies, e.g., distributed coded computation and storage, mmWave communication environments, and innovative and efficient security features.

Submitted 1 October, 2020; originally announced October 2020.

arXiv:2006.03662 [pdf, other]

Rapid Task-Solving in Novel Environments

Authors: Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo

Abstract: We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience.

Submitted 19 April, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

arXiv:1904.10396 [pdf, other]

Is coding a relevant metaphor for building AI? A commentary on "Is coding a relevant metaphor for the brain?", by Romain Brette

Authors: Adam Santoro, Felix Hill, David Barrett, David Raposo, Matthew Botvinick, Timothy Lillicrap

Abstract: Brette contends that the neural coding metaphor is an invalid basis for theories of what the brain does. Here, we argue that it is an insufficient guide for building an artificial intelligence that learns to accomplish short- and long-term goals in a complex, changing environment.

Submitted 18 April, 2019; originally announced April 2019.

arXiv:1901.08162 [pdf, other]

Causal Reasoning from Meta-reinforcement Learning

Authors: Ishita Dasgupta, Jane Wang, Silvia Chiappa, Jovana Mitrovic, Pedro Ortega, David Raposo, Edward Hughes, Peter Battaglia, Matthew Botvinick, Zeb Kurth-Nelson

Abstract: Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel situations in order to obtain rewards. The agent can select informative interventions, draw causal inferences from observational data, and make counterfactual predictions. Although established formal causal reasoning algorithms also exist, in this paper we show that such reasoning can arise from model-free reinforcement learning, and suggest that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here. This work also offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform -- and interpret -- experiments.

Submitted 23 January, 2019; originally announced January 2019.

arXiv:1901.03559 [pdf, other]

An investigation of model-free planning

Authors: Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

Submitted 20 May, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

arXiv:1806.01830 [pdf, other]

Relational Deep Reinforcement Learning

Authors: Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia

Abstract: We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.

Submitted 28 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

arXiv:1806.01822 [pdf, other]

Relational recurrent neural networks

Authors: Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, Timothy Lillicrap

Abstract: Memory-based neural networks model temporal data by leveraging an ability to remember information for long periods. It is unclear, however, whether they also have an ability to perform complex relational reasoning with the information they remember. Here, we first confirm our intuitions that standard memory architectures may struggle at tasks that heavily involve an understanding of the ways in which entities are connected -- i.e., tasks involving relational reasoning. We then improve upon these deficits by using a new memory module -- a \textit{Relational Memory Core} (RMC) -- which employs multi-head dot product attention to allow memories to interact. Finally, we test the RMC on a suite of tasks that may profit from more capable relational reasoning across sequential information, and show large gains in RL domains (e.g. Mini PacMan), program evaluation, and language modeling, achieving state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets.

Submitted 28 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

arXiv:1806.01261 [pdf, other]

Relational inductive biases, deep learning, and graph networks

Authors: Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, et al. (2 additional authors not shown)

Abstract: Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences--a hallmark of human intelligence from infancy--remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias--the graph network--which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.

Submitted 17 October, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

arXiv:1805.09786 [pdf, other]

Hyperbolic Attention Networks

Authors: Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

Abstract: We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.

Submitted 24 May, 2018; originally announced May 2018.

arXiv:1706.01427

A simple neural network module for relational reasoning

Authors: Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap

Abstract: Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.

Submitted 5 June, 2017; originally announced June 2017.

arXiv:1702.05068

Discovering objects and their relations from entangled scene representations

Authors: David Raposo, Adam Santoro, David Barrett, Razvan Pascanu, Timothy Lillicrap, Peter Battaglia

Abstract: Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by their underlying causes and semantics. This gives rise to correlated features, such as position, function and shape. Humans exploit knowledge of objects and their relations for learning a wide spectrum of tasks, and more generally when learning the structure underlying observed data. In this work, we introduce relation networks (RNs) - a general purpose neural network architecture for object-relation reasoning. We show that RNs are capable of learning object relations from scene description data. Furthermore, we show that RNs can act as a bottleneck that induces the factorization of objects from entangled scene description inputs, and from distributed deep representations of scene images provided by a variational autoencoder. The model can also be used in conjunction with differentiable memory mechanisms for implicit relation discovery in one-shot learning tasks. Our results suggest that relation networks are a potentially powerful architecture for solving a variety of problems that require object relation reasoning.

Submitted 16 February, 2017; originally announced February 2017.

Comments: ICLR Workshop 2017

Showing 1–19 of 19 results for author: Raposo, D