A 72h exploration of the co-evolution of food insecurity and international migration

Authors: Duncan Cassells, Lorenzo Costantini, Ariel Flint Ashery, Shreyas Gadge, Diogo L. Pires, Miguel Á. Sánchez-Cortés, Arnaldo Santoro, Elisa Omodei

Abstract: Food insecurity, defined as the lack of physical or economic access to safe, nutritious and sufficient food, remains one of the main challenges of the 2030 Agenda for Sustainable Development. Food insecurity is a complex phenomenon, resulting from the interplay of environmental, socio-demographic, and political events. Previous work has investigated the nexus between climate change, conflict, migration and food security at the household level; however, these relations are still largely unexplored at national scales. In this context, during the Complexity72h workshop, held at the Universidad Carlos III de Madrid in June 2024, we explored the co-evolution of international migration flows and food insecurity at the national scale, accounting for remittances, as well as for changes in the economic, conflict, and climate situation. To this aim, we gathered data from several publicly available sources (Food and Agriculture Organization, World Bank, and UN Department of Economic and Social Affairs) and analyzed the association between food insecurity and migration, migration and remittances, and remittances and food insecurity. We then propose a framework linking together these associations to model the co-evolution of food insecurity and international migrations.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2404.02258 [pdf, other]

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Authors: David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro

Abstract: Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth. Our method enforces a total compute budget by capping the number of tokens ($k$) that can participate in the self-attention and MLP computations at a given layer. The tokens to be processed are determined by the network using a top-$k$ routing mechanism. Since $k$ is defined a priori, this simple procedure uses a static computation graph with known tensor sizes, unlike other conditional computation techniques. Nevertheless, since the identities of the $k$ tokens are fluid, this method can expend FLOPs non-uniformly across the time and model depth dimensions. Thus, compute expenditure is entirely predictable in sum total, but dynamic and context-sensitive at the token-level. Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPS and wall-clock times to train, but require a fraction of the FLOPs per forward pass, and can be upwards of 50\% faster to step during post-training sampling.

Submitted 2 April, 2024; originally announced April 2024.
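
A minimal sketch of the top-$k$ routing idea described in the Mixture-of-Depths abstract above, in plain Python. It is not the authors' implementation: the names `top_k_route` and `routed_layer`, the scalar token representation, and the `block` callable are illustrative assumptions. Router scores are passed in as given here, whereas in the paper they are determined by the network.

```python
from typing import Callable, List, Sequence


def top_k_route(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring tokens, in sequence order."""
    # Rank positions by score (highest first), keep the first k,
    # then restore the original sequence order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def routed_layer(
    tokens: Sequence[float],
    scores: Sequence[float],
    k: int,
    block: Callable[[float], float],
) -> List[float]:
    """Apply `block` only to the k routed tokens; the rest pass through unchanged.

    Assumption: tokens are plain floats and `block` stands in for the
    self-attention and MLP computation, added back as a residual update.
    """
    selected = set(top_k_route(scores, k))
    return [x + block(x) if i in selected else x for i, x in enumerate(tokens)]


# Hypothetical toy input: four tokens, budget of k = 2 per layer.
routed = routed_layer([1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 0.3, 0.7], 2, lambda x: 10 * x)
```

Because `k` is fixed in advance, `block` is called on the same number of tokens for every input; only which positions are selected changes with the scores.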

arXiv:2302.02764 [pdf, other]

doi 10.1016/j.nima.2023.168449

Machine Learning based tool for CMS RPC currents quality monitoring

Authors: E. Shumka, A. Samalan, M. Tytgat, M. El Sawy, G. A. Alves, F. Marujo, E. A. Coelho, E. M. Da Costa, H. Nogima, A. Santoro, S. Fonseca De Souza, D. De Jesus Damiao, M. Thiel, K. Mota Amarilo, M. Barroso Ferreira Filho, A. Aleksandrov, R. Hadjiiska, P. Iaydjiev, M. Rodozov, M. Shopova, G. Soultanov, A. Dimitrov, L. Litov, B. Pavlov, P. Petkov, et al. (83 additional authors not shown)

Abstract: The muon system of the CERN Compact Muon Solenoid (CMS) experiment includes more than a thousand Resistive Plate Chambers (RPC). They are gaseous detectors operated in the hostile environment of the CMS underground cavern on the Large Hadron Collider where pp luminosities of up to $2\times 10^{34}$ $\text{cm}^{-2}\text{s}^{-1}$ are routinely achieved. The CMS RPC system performance is constantly monitored and the detector is regularly maintained to ensure stable operation. The main monitorable characteristics are dark current, efficiency for muon detection, noise rate etc. Herein we describe an automated tool for CMS RPC current monitoring which uses Machine Learning techniques. We further elaborate on the dedicated generalized linear model proposed already and add autoencoder models for self-consistent predictions as well as hybrid models to allow for RPC current predictions in a distant future.

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2211.16591 [pdf, other]

doi 10.1016/j.nima.2023.168271

RPC based tracking system at CERN GIF++ facility

Authors: K. Mota Amarilo, A. Samalan, M. Tytgat, M. El Sawy, G. A. Alves, F. Marujo, E. A. Coelho, E. M. Da Costa, H. Nogima, A. Santoro, S. Fonseca De Souza, D. De Jesus Damiao, M. Thiel, M. Barroso Ferreira Filho, A. Aleksandrov, R. Hadjiiska, P. Iaydjiev, M. Rodozov, M. Shopova, G. Soultanov, A. Dimitrov, L. Litov, B. Pavlov, P. Petkov, A. Petrov, et al. (83 additional authors not shown)

Abstract: With the HL-LHC upgrade of the LHC machine, an increase of the instantaneous luminosity by a factor of five is expected and the current detection systems need to be validated for such working conditions to ensure stable data taking. At the CERN Gamma Irradiation Facility (GIF++) many muon detectors undergo such studies, but the high gamma background can pose a challenge to the muon trigger system, which is exposed to many fake hits from the gamma background. A tracking system using RPCs is implemented to clean the fake hits, taking advantage of the high muon efficiency of these chambers. This work presents the tracking system configuration, the detector analysis algorithm used, and the results.

Submitted 29 November, 2022; originally announced November 2022.

Comments: 12 pages, 9 figures. Contribution to XVI Workshop on Resistive Plate Chambers and Related Detectors (RPC2022), September 26-30 2022. Submitted to Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment

arXiv:2211.11602 [pdf, other]

Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback

Authors: Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Jirka Lhotka, Timothy Lillicrap, Alistair Muldal, George Powell, Adam Santoro, Guy Scully, Sanjana Srivastava, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu

Abstract: An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulated 3D world. We then asked annotators to record moments where they believed that agents either progressed toward or regressed from their human-instructed goal. Using this annotation data we leveraged a novel method - which we call "Inter-temporal Bradley-Terry" (IBT) modelling - to build a reward model that captures human judgments. Agents trained to optimise rewards delivered from IBT reward models improved with respect to all of our metrics, including subsequent human judgment during live interactions with agents. Altogether our results demonstrate how one can successfully leverage human judgments to improve agent behaviour, allowing us to use reinforcement learning in complex, embodied domains without programmatic reward functions. Videos of agent behaviour may be found at https://youtu.be/v_Z9F2_eKk4.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.03139 [pdf, other]

Intra-agent speech permits zero-shot task acquisition

Authors: Chen Yan, Federico Carnevale, Petko Georgiev, Adam Santoro, Aurelia Guy, Alistair Muldal, Chia-Chun Hung, Josh Abramson, Timothy Lillicrap, Gregory Wayne

Abstract: Human language learners are exposed to a trickle of informative, context-sensitive language, but a flood of raw sensory data. Through both social language use and internal processes of rehearsal and practice, language learners are able to build high-level, semantic representations that explain their perceptions. Here, we take inspiration from such processes of "inner speech" in humans (Vygotsky, 1934) to better understand the role of intra-agent speech in embodied behavior. First, we formally pose intra-agent speech as a semi-supervised problem and develop two algorithms that enable visually grounded captioning with little labeled language data. We then experimentally compute scaling curves over different amounts of labeled data and compare the data efficiency against a supervised learning baseline. Finally, we incorporate intra-agent speech into an embodied, mobile manipulator agent operating in a 3D virtual world, and show that with as few as 150 additional image captions, intra-agent speech endows the agent with the ability to manipulate and answer questions about a new object without any related task-directed experience (zero-shot). Taken together, our experiments suggest that modelling intra-agent speech is effective in enabling embodied agents to learn new tasks efficiently and without direct interaction experience.

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2205.13274 [pdf, other]

Evaluating Multimodal Interactive Agents

Authors: Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Timothy Lillicrap, Alistair Muldal, Blake Richards, Adam Santoro, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan

Abstract: Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research. However, evaluating these interactions is challenging: collecting online human-agent interactions is slow and expensive, yet faster proxy metrics often do not correlate well with interactive evaluation. In this paper, we assess the merits of these existing evaluation metrics and present a novel approach to evaluation called the Standardised Test Suite (STS). The STS uses behavioural scenarios mined from real human interaction data. Agents see replayed scenario context, receive an instruction, and are then given control to complete the interaction offline. These agent continuations are recorded and sent to human annotators to mark as success or failure, and agents are ranked according to the proportion of continuations in which they succeed. The resulting STS is fast, controlled, interpretable, and representative of naturalistic interactions. Altogether, the STS consolidates much of what is desirable across many of our standard evaluation metrics, allowing us to accelerate research progress towards producing agents that can interact naturally with humans. A video may be found at https://youtu.be/YR1TngGORGQ.

Submitted 14 July, 2022; v1 submitted 26 May, 2022; originally announced May 2022.
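
The ranking step described in the Standardised Test Suite abstract above (agents ranked by the proportion of continuations marked as successes) can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the function name `rank_agents`, the agent names, and the input layout (agent name mapped to a list of success/failure marks) are assumptions.

```python
from typing import Dict, List, Tuple


def rank_agents(marks: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Rank agents by the fraction of continuations annotated as successful.

    `marks` maps a (hypothetical) agent name to one boolean per annotated
    continuation: True for success, False for failure.
    """
    # Agents with no annotated continuations are skipped to avoid dividing by zero.
    rates = {agent: sum(m) / len(m) for agent, m in marks.items() if m}
    # Highest success proportion first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical annotations for two agents.
ranking = rank_agents({
    "agent_a": [True, False],
    "agent_b": [True, True, True, False],
})
```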

arXiv:2205.05055 [pdf, other]

Data Distributional Properties Drive Emergent In-Context Learning in Transformers

Authors: Stephanie C. Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya Singh, Pierre H. Richemond, Jay McClelland, Felix Hill

Abstract: Large transformer-based models are able to perform in-context few-shot learning, without being explicitly trained for it. This observation raises the question: what aspects of the training regime lead to this emergent behavior? Here, we show that this behavior is driven by the distributions of the training data itself. In-context learning emerges when the training data exhibits particular distributional properties such as burstiness (items appear in clusters rather than being uniformly distributed over time) and having large numbers of rarely occurring classes. In-context learning also emerges more strongly when item meanings or interpretations are dynamic rather than fixed. These properties are exemplified by natural language, but are also inherent to naturalistic data in a wide range of other domains. They also depart significantly from the uniform, i.i.d. training distributions typically used for standard supervised learning. In our initial experiments, we found that in-context learning traded off against more conventional weight-based learning, and models were unable to achieve both simultaneously. However, our later experiments uncovered that the two modes of learning could co-exist in a single model when it was trained on data following a skewed Zipfian distribution -- another common property of naturalistic data, including language. In further experiments, we found that naturalistic data distributions were only able to elicit in-context learning in transformers, and not in recurrent models.
In sum, our findings indicate how the transformer architecture works together with particular properties of the training data to drive the intriguing emergent in-context learning behaviour of large language models, and how future work might encourage both in-context and in-weights learning in domains beyond language.

Submitted 17 November, 2022; v1 submitted 22 April, 2022; originally announced May 2022.

Comments: Accepted at NeurIPS 2022 (Oral). Code is available at: https://github.com/deepmind/emergent_in_context_learning

arXiv:2203.10702 [pdf, other]

doi 10.1038/s41567-022-01852-0

Unveiling the higher-order organization of multivariate time series

Authors: Andrea Santoro, Federico Battiston, Giovanni Petri, Enrico Amico

Abstract: Time series analysis has proven to be a powerful method to characterize several phenomena in biology, neuroscience and economics, and to understand some of their underlying dynamical features. Although a plethora of methods have been proposed for the analysis of multivariate time series, most of them neglect the effect of non-pairwise interactions on the emerging dynamics. Here, we propose a novel framework to characterize the temporal evolution of higher-order dependencies within multivariate time series. Using network analysis and topology, we show that, unlike traditional tools based on pairwise statistics, our framework robustly differentiates various spatiotemporal regimes of coupled chaotic maps, including chaotic dynamical phases and various types of synchronization. Hence, using the higher-order co-fluctuation patterns in simulated dynamical processes as a guide, we highlight and quantify signatures of higher-order patterns in data from brain functional activity, financial markets, and epidemics. Overall, our approach sheds new light on the higher-order organization of multivariate time series, allowing a better characterization of dynamical group dependencies inherent to real-world data.

Submitted 12 September, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: 16 pages, 5 figures. Supplementary Information (16 figures, 2 tables)

arXiv:2202.08137 [pdf, other]

A data-driven approach for learning to control computers

Authors: Peter C Humphreys, David Raposo, Toby Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Alex Goldin, Adam Santoro, Timothy Lillicrap

Abstract: It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. Instead of focusing on hand-designed curricula and specialized action spaces, we focus on developing a scalable method centered on reinforcement learning combined with behavioural priors informed by actual human-computer interactions. We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. These results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.

Submitted 11 November, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

arXiv:2112.03763 [pdf, other]

Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning

Authors: DeepMind Interactive Agents Team, Josh Abramson, Arun Ahuja, Arthur Brussee, Federico Carnevale, Mary Cassin, Felix Fischer, Petko Georgiev, Alex Goldin, Mansi Gupta, Tim Harley, Felix Hill, Peter C Humphreys, Alden Hung, Jessica Landon, Timothy Lillicrap, Hamza Merzic, Alistair Muldal, Adam Santoro, Guy Scully, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu

Abstract: A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. We show that imitation learning of human-human interactions in a simulated world, in conjunction with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time. We further identify architectural and algorithmic techniques that improve performance, such as hierarchical action selection. Altogether, our results demonstrate that imitation of multi-modal, real-time human behaviour may provide a straightforward and surprisingly effective means of imbuing agents with a rich behavioural prior from which agents might then be fine-tuned for specific purposes, thus laying a foundation for training capable agents for interactive robots or digital assistants. A video of MIA's behaviour may be found at https://youtu.be/ZFgRhviF7mY

Submitted 2 February, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

arXiv:2112.03753 [pdf, other]

Tell me why! Explanations support learning relational and causal structure

Authors: Andrew K. Lampinen, Nicholas A. Roy, Ishita Dasgupta, Stephanie C. Y. Chan, Allison C. Tam, James L. McClelland, Chen Yan, Adam Santoro, Neil C. Rabinowitz, Jane X. Wang, Felix Hill

Abstract: Inferring the abstract relational and causal structure of the world is a major challenge for reinforcement-learning (RL) agents. For humans, language--particularly in the form of explanations--plays a considerable role in overcoming this challenge. Here, we show that language can play a similar role for deep RL agents in complex environments. While agents typically struggle to acquire relational and causal knowledge, augmenting their experience by training them to predict language descriptions and explanations can overcome these limitations. We show that language can help agents learn challenging relational tasks, and examine which aspects of language contribute to its benefits. We then show that explanations can help agents to infer not only relational but also causal structure. Language can shape the way that agents generalize out-of-distribution from ambiguous, causally-confounded training, and explanations even allow agents to learn to perform experimental interventions to identify causal relationships. Our results suggest that language description and explanation may be powerful tools for improving agent learning and generalization.

Submitted 25 May, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: ICML 2022; 23 pages

ACM Class: I.2.6

arXiv:2109.14331 [pdf, other]

doi 10.1088/1748-0221/17/01/C01011

Upgrade of the CMS Resistive Plate Chambers for the High Luminosity LHC

Authors: A. Samalan, M. Tytgat, G. A. Alves, F. Marujo, F. Torres Da Silva De Araujo, E. M. DaCosta, D. De Jesus Damiao, H. Nogima, A. Santoro, S. Fonseca De Souza, A. Aleksandrov, R. Hadjiiska, P. Iaydjiev, M. Rodozov, M. Shopova, G. Soultanov, M. Bonchev, A. Dimitrov, L. Litov, B. Pavlov, P. Petkov, A. Petrov, S. J. Qian, C. Bernal, A. Cabrera, et al. (86 additional authors not shown)

Abstract: During the upcoming High Luminosity phase of the Large Hadron Collider (HL-LHC), the integrated luminosity of the accelerator will increase to 3000 fb$^{-1}$. The expected experimental conditions in that period in terms of background rates, event pileup, and the probable aging of the current detectors present a challenge for all the existing experiments at the LHC, including the Compact Muon Solenoid (CMS) experiment. To ensure a highly performing muon system for this period, several upgrades of the Resistive Plate Chamber (RPC) system of the CMS are currently being implemented. These include the replacement of the readout system for the present system, and the installation of two new RPC stations with improved chamber and front-end electronics designs. The current overall status of this CMS RPC upgrade project is presented.

Submitted 2 November, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2102.13013 [pdf, other]

Optimising the mitigation of epidemic spreading through targeted adoption of contact tracing apps

Authors: Aleix Bassolas, Andrea Santoro, Sandro Sousa, Silvia Rognone, Vincenzo Nicosia

Abstract: The ongoing COVID-19 pandemic is the first epidemic in human history in which digital contact-tracing has been deployed at a global scale. Tracking and quarantining all the contacts of individuals who test positive for a virus can help slow down an epidemic, but the impact of contact-tracing is severely limited by the generally low adoption of contact-tracing apps in the population. We derive here an analytical expression for the effectiveness of contact-tracing app installation strategies in a SIR model on a given contact graph. We propose a decentralised heuristic to improve the effectiveness of contact tracing under fixed adoption rates, which targets a set of individuals to install contact-tracing apps, and can be easily implemented. Simulations on a large number of real-world contact networks confirm that this heuristic represents a feasible alternative to the current state of the art.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 10 pages, 5 figures, + 12 SI pages, 9 suppl. figs, 1 suppl. table

arXiv:2102.12425 [pdf, other]

Synthetic Returns for Long-Term Credit Assignment

Authors: David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song

Abstract: Since the earliest days of reinforcement learning, the workhorse method for assigning credit to actions over time has been temporal-difference (TD) learning, which propagates credit backward timestep-by-timestep. This approach suffers when delays between actions and rewards are long and when intervening unrelated events contribute variance to long-term returns. We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two. In this work, we use SA-learning to model the contribution of past states to the current reward. With this model we can predict each state's contribution to the far future, a quantity we call "synthetic returns". TD-learning can then be applied to select actions that maximize these synthetic returns (SRs). We demonstrate the effectiveness of augmenting agents with SRs across a range of tasks on which TD-learning alone fails. We show that the learned SRs are interpretable: they spike for states that occur after critical actions are taken. Finally, we show that our IMPALA-based SR agent solves Atari Skiing -- a game with a lengthy reward delay that posed a major hurdle to deep-RL agents -- 25 times faster than the published state-of-the-art.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2102.03406 [pdf, other]

Symbolic Behaviour in Artificial Intelligence

Authors: Adam Santoro, Andrew Lampinen, Kory Mathewson, Timothy Lillicrap, David Raposo

Abstract: The ability to use symbols is the pinnacle of human intelligence, but has yet to be fully replicated in machines. Here we argue that the path towards symbolically fluent artificial intelligence (AI) begins with a reinterpretation of what symbols are, how they come to exist, and how a system behaves when it uses them. We begin by offering an interpretation of symbols as entities whose meaning is established by convention. But crucially, something is a symbol only for those who demonstrably and actively participate in this convention. We then outline how this interpretation thematically unifies the behavioural traits humans exhibit when they use symbols. This motivates our proposal that the field place a greater emphasis on symbolic behaviour rather than particular computational mechanisms inspired by more restrictive interpretations of symbols. Finally, we suggest that AI research explore social and cultural engagement as a tool to develop the cognitive machinery necessary for symbolic behaviour to emerge. This approach will allow for AI to interpret something as symbolic on its own rather than simply manipulate things that are only symbols to human onlookers, and thus will ultimately lead to AI with more human-like symbolic fluency.

Submitted 21 January, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

arXiv:2012.08508 [pdf, other]

Attention over learned object embeddings enables complex visual reasoning

Authors: David Ding, Felix Hill, Adam Santoro, Malcolm Reynolds, Matt Botvinick

Abstract: Neural networks have achieved success in a wide array of perceptual tasks but often fail at tasks involving both perception and higher-level reasoning. On these more challenging tasks, bespoke approaches (such as modular symbolic components, independent dynamics models or semantic parsers) targeted towards that specific type of task have typically performed better. The downside to these targeted approaches, however, is that they can be more brittle than general-purpose neural networks, requiring significant modification or even redesign according to the particular task at hand. Here, we propose a more general neural-network-based approach to dynamic visual reasoning problems that obtains state-of-the-art performance on three different domains, in each case outperforming bespoke modular approaches tailored specifically to the task. Our method relies on learned object-centric representations, self-attention and self-supervised dynamics learning, and all three elements together are required for strong performance to emerge. The success of this combination suggests that there may be no need to trade off flexibility for performance on problems involving spatio-temporal or causal-style reasoning. With the right soft biases and learning objectives in a neural network we may be able to attain the best of both worlds.

Submitted 26 October, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: 22 pages, 5 figures

arXiv:2012.05672 [pdf, other]

Imitating Interactive Intelligence

Authors: Josh Abramson, Arun Ahuja, Iain Barr, Arthur Brussee, Federico Carnevale, Mary Cassin, Rachita Chhaparia, Stephen Clark, Bogdan Damoc, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Soňa Mokrá, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, et al. (4 additional authors not shown)

Abstract: A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount.

Submitted 20 January, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

arXiv:2012.03981 [pdf, other]

doi 10.1103/PhysRevLett.127.062003

Comparison of $pp$ and $p \bar{p}$ differential elastic cross sections and observation of the exchange of a colorless $C$-odd gluonic compound

Authors: V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, G. Alkhazov, A. Alton, G. A. Alves, G. Antchev, A. Askew, P. Aspell, A. C. S. Assis Jesus, I. Atanassov, S. Atkins, K. Augsten, V. Aushev, Y. Aushev, V. Avati, C. Avila, F. Badaud, J. Baechler, L. Bagby, C. Baldenegro Barrera, et al. (451 additional authors not shown)

Abstract: We describe an analysis comparing the $p\bar{p}$ elastic cross section as measured by the D0 Collaboration at a center-of-mass energy of 1.96 TeV to that in $pp$ collisions as measured by the TOTEM Collaboration at 2.76, 7, 8, and 13 TeV using a model-independent approach. The TOTEM cross sections extrapolated to a center-of-mass energy of $\sqrt{s} =$ 1.96 TeV are compared with the D0 measurement in the region of the diffractive minimum and the second maximum of the $pp$ cross section. The two data sets disagree at the 3.4$σ$ level and thus provide evidence for the $t$-channel exchange of a colorless, $C$-odd gluonic compound, also known as the odderon. We combine these results with a TOTEM analysis of the same $C$-odd exchange based on the total cross section and the ratio of the real to imaginary parts of the forward elastic scattering amplitude in $pp$ scattering. The combined significance of these results is larger than 5$σ$ and is interpreted as the first observation of the exchange of a colorless, $C$-odd gluonic compound.

Submitted 25 June, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: D0 and TOTEM Collaborations

Journal ref: Phys. Rev. Lett. 127, 062003 (2021)

arXiv:2006.03662 [pdf, other]

Rapid Task-Solving in Novel Environments

Authors: Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo

Abstract: We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience.

Submitted 19 April, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

arXiv:2005.12769 [pdf, other]

CMS RPC Background -- Studies and Measurements

Authors: R. Hadjiiska, A. Samalan, M. Tytgat, N. Zaganidis, G. A. Alves, F. Marujo, F. Torres Da Silva De Araujo, E. M. Da Costa, D. De Jesus Damiao, H. Nogima, A. Santoro, S. Fonseca De Souza, A. Aleksandrov, P. Iaydjiev, M. Rodozov, M. Shopova, G. Sultanov, M. Bonchev, A. Dimitrov, L. Litov, B. Pavlov, P. Petkov, A. Petrov, S. J. Qian, C. Bernal, et al. (84 additional authors not shown)

Abstract: The expected radiation background in the CMS RPC system has been studied using the MC prediction with the CMS FLUKA simulation of the detector and the cavern. The MC geometry used in the analysis describes very accurately the present RPC system but still does not include the complete description of the RPC upgrade region with pseudorapidity $1.9 < \lvert η\rvert < 2.4$. Present results will be updated with the final geometry description, once it is available. The radiation background has been studied in terms of expected particle rates, absorbed dose and fluence. Two High Luminosity LHC (HL-LHC) scenarios have been investigated - after collecting $3000$ and $4000$ fb$^{-1}$. Estimations with safety factor of 3 have been considered, as well.

Submitted 13 December, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

Comments: 6 pages, Conference proceeding for the 2020 Resistive Plate Chambers and Related Detectors. Minor revision of the report, the results remain unchanged. Three new plots are added and some details were explained better

arXiv:1910.04783 [pdf, other]

doi 10.1103/PhysRevResearch.2.033122

Optimal percolation in correlated multilayer networks with overlap

Authors: Andrea Santoro, Vincenzo Nicosia

Abstract: Multilayer networks have been found to be prone to abrupt cascading failures under random and targeted attacks, but most of the targeting algorithms proposed so far have been mainly tested on uncorrelated systems. Here we show that the size of the critical percolation set of a multilayer network is substantially affected by the presence of inter-layer degree correlations and edge overlap. We provide extensive numerical evidence which confirms that the state-of-the-art optimal percolation strategies consistently fail to identify minimal percolation sets in synthetic and real-world correlated multilayer networks, thus overestimating their robustness. We propose two new targeting algorithms, based on the local estimation of path disruptions away from a given node, and a family of Pareto-efficient strategies that take into account both intra-layer and inter-layer heuristics, and can be easily extended to multiplex networks with an arbitrary number of layers. We show that these strategies consistently outperform existing attacking algorithms, on both synthetic and real-world multiplex networks, and provide some interesting insights about the interplay of correlations and overlap in determining the hyperfragility of real-world multilayer networks. Overall, the results presented in the paper suggest that we are still far from having fully identified the salient ingredients determining the robustness of multiplex networks to targeted attacks.

Submitted 22 July, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

Comments: 14 pages, 9 figures, 1 table

Journal ref: Phys. Rev. Research 2, 033122 (2020)

arXiv:1910.00571 [pdf, other]

Environmental drivers of systematicity and generalization in a situated agent

Authors: Felix Hill, Andrew Lampinen, Rosalia Schneider, Stephen Clark, Matthew Botvinick, James L. McClelland, Adam Santoro

Abstract: The question of whether deep neural networks are good at generalising beyond their immediate training experience is of critical importance for learning-based approaches to AI. Here, we consider tests of out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room. We first describe a comparatively generic agent architecture that exhibits strong performance on these tests. We then identify three aspects of the training regime and environment that make a significant difference to its performance: (a) the number of object/word experiences in the training set; (b) the visual invariances afforded by the agent's perspective, or frame of reference; and (c) the variety of visual input inherent in the perceptual aspect of the agent's perception. Our findings indicate that the degree of generalisation that networks exhibit can depend critically on particulars of the environment in which a given task is instantiated. They further suggest that the propensity for neural networks to generalise in systematic ways may increase if, like human children, those networks have access to many frames of richly varying, multi-modal observations as they learn.

Submitted 19 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

arXiv:1909.12892 [pdf, other]

Automated curricula through setter-solver interactions

Authors: Sebastien Racaniere, Andrew K. Lampinen, Adam Santoro, David P. Reichert, Vlad Firoiu, Timothy P. Lillicrap

Abstract: Reinforcement learning algorithms use correlations between policies and rewards to improve agent performance. But in dynamic or sparsely rewarding environments these correlations are often too small, or rewarding events are too infrequent to make learning feasible. Human education instead relies on curricula--the breakdown of tasks into simpler, static challenges with dense rewards--to build up to complex behaviors. While curricula are also useful for artificial agents, hand-crafting them is time consuming. This has led researchers to explore automatic curriculum generation. Here we explore automatic curriculum generation in rich, dynamic environments. Using a setter-solver paradigm we show the importance of considering goal validity, goal feasibility, and goal coverage to construct useful curricula. We demonstrate the success of our approach in rich but sparsely rewarding 2D and 3D environments, where an agent is tasked to achieve a single goal selected from a set of possible goals that varies between episodes, and identify challenges for future work. Finally, we demonstrate the value of a novel technique that guides agents towards a desired goal distribution. Altogether, these results represent a substantial step towards applying automatic task curricula to learn complex, otherwise unlearnable goals, and to our knowledge are the first to demonstrate automated curriculum generation for goal-conditioned agents in environments where the possible goals vary between episodes.

Submitted 21 January, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

Journal ref: International Conference on Learning Representations, 2020

arXiv:1906.09068 [pdf, other]

Simplex2Vec embeddings for community detection in simplicial complexes

Authors: Jacob Charles Wright Billings, Mirko Hu, Giulia Lerda, Alexey N. Medvedev, Francesco Mottes, Adrian Onicas, Andrea Santoro, Giovanni Petri

Abstract: Topological representations are rapidly becoming a popular way to capture and encode higher-order interactions in complex systems. They have found applications in disciplines as different as cancer genomics, brain function, and computational social science, in representing both descriptive features of data and inference models. While intense research has focused on the connectivity and homological features of topological representations, surprisingly scarce attention has been given to the investigation of the community structures of simplicial complexes. To this end, we adopt recent advances in symbolic embeddings to compute and visualize the community structures of simplicial complexes. We first investigate the stability properties of embedding obtained for synthetic simplicial complexes to the presence of higher order interactions. We then focus on complexes arising from social and brain functional data and show how higher order interactions can be leveraged to improve clustering detection and assess the effect of higher order interaction on individual nodes. We conclude delineating limitations and directions for extension of this work.

Submitted 21 June, 2019; originally announced June 2019.

arXiv:1904.10396 [pdf, other]

Is coding a relevant metaphor for building AI? A commentary on "Is coding a relevant metaphor for the brain?", by Romain Brette

Authors: Adam Santoro, Felix Hill, David Barrett, David Raposo, Matthew Botvinick, Timothy Lillicrap

Abstract: Brette contends that the neural coding metaphor is an invalid basis for theories of what the brain does. Here, we argue that it is an insufficient guide for building an artificial intelligence that learns to accomplish short- and long-term goals in a complex, changing environment.

Submitted 18 April, 2019; originally announced April 2019.

arXiv:1903.08049 [pdf, other]

doi 10.1103/PhysRevX.10.021069

Algorithmic complexity of multiplex networks

Authors: Andrea Santoro, Vincenzo Nicosia

Abstract: Multilayer networks preserve full information about the different interactions among the constituents of a complex system, and have recently proven quite useful in modelling transportation networks, social circles, and the human brain. A fundamental and still open problem is to assess if and when the multilayer representation of a system provides a qualitatively better model than the classical single-layer aggregated network. Here we tackle this problem from an algorithmic information theory perspective. We propose an intuitive way to encode a multilayer network into a bit string, and we define the complexity of a multilayer network as the ratio of the Kolmogorov complexity of the bit strings associated to the multilayer and to the corresponding aggregated graph. We find that there exists a maximum amount of additional information that a multilayer model can encode with respect to the equivalent single-layer graph. We show how our complexity measure can be used to obtain low-dimensional representations of multidimensional systems, to cluster multilayer networks into a small set of meaningful super-families, and to detect tipping points in the evolution of different time-varying multilayer graphs. Interestingly, the low-dimensional multiplex networks obtained with the proposed method also retain most of the dynamical properties of the original systems, as demonstrated for instance by the preservation of the epidemic threshold in the multiplex SIS model. These results suggest that information-theoretic approaches can be effectively employed for a more systematic analysis of static and time-varying multidimensional complex systems.

Submitted 26 June, 2020; v1 submitted 19 March, 2019; originally announced March 2019.

Comments: 28 pages, 17 figures, 3 tables

Journal ref: Phys. Rev. X 10, 021069 (2020)

arXiv:1902.00120 [pdf, other]

Learning to Make Analogies by Contrasting Abstract Relational Structure

Authors: Felix Hill, Adam Santoro, David G. T. Barrett, Ari S. Morcos, Timothy Lillicrap

Abstract: Analogical reasoning has been a principal focus of various waves of AI research. Analogy is particularly challenging for machines because it requires relational structures to be represented such that they can be flexibly applied across diverse domains of experience. Here, we study how analogical reasoning can be induced in neural networks that learn to perceive and reason about raw visual data. We find that the critical factor for inducing such a capacity is not an elaborate architecture, but rather, careful attention to the choice of data and the manner in which it is presented to the model. The most robust capacity for analogical reasoning is induced when networks learn analogies by contrasting abstract relational structures in their input domains, a training method that uses only the input data to force models to learn about important abstract features. Using this technique we demonstrate capacities for complex, visual and symbolic analogy making and generalisation in even the simplest neural network architectures.

Submitted 31 January, 2019; originally announced February 2019.

arXiv:1901.03559 [pdf, other]

An investigation of model-free planning

Authors: Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

Submitted 20 May, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

arXiv:1808.00300 [pdf, other]

Learning Visual Question Answering by Bootstrapping Hard Attention

Authors: Mateusz Malinowski, Carl Doersch, Adam Santoro, Peter Battaglia

Abstract: Attention mechanisms in biological perception are thought to select subsets of perceptual information for more sophisticated processing which would be prohibitive to perform on all sensory inputs. In computer vision, however, there has been relatively little exploration of hard attention, where some information is selectively ignored, in spite of the success of soft attention, where information is re-weighted and aggregated, but never filtered out. Here, we introduce a new approach for hard attention and find it achieves very competitive performance on a recently-released visual question answering datasets, equalling and in some cases surpassing similar soft attention architectures while entirely ignoring some features. Even though the hard attention mechanism is thought to be non-differentiable, we found that the feature magnitudes correlate with semantic relevance, and provide a useful signal for our mechanism's attentional selection criterion. Because hard attention selects important features of the input information, it can also be more efficient than analogous soft attention mechanisms. This is especially important for recent approaches that use non-local pairwise operations, whereby computational and memory costs are quadratic in the size of the set of features.

Submitted 1 August, 2018; originally announced August 2018.

Comments: ECCV 2018

arXiv:1807.05680 [pdf, other]

doi 10.1088/1748-0221/14/10/C10037

High Rate RPC detector for LHC

Authors: F. Lagarde, A. Fagot, M. Gul, C. Roskas, M. Tytgat, N. Zaganidis, S. Fonseca De Souza, A. Santoro, F. Torres Da Silva De Araujo, A. Aleksandrov, R. Hadjiiska, P. Iaydjiev, M. Rodozov, M. Shopova, G. Sultanov, A. Dimitrov, L. Litov, B. Pavlov, P. Petkov, A. Petrov, S. J. Qian, D. Han, W. Yi, C. Avila, A. Cabrera, et al. (77 additional authors not shown)

Abstract: The High Luminosity LHC (HL-LHC) phase is designed to increase by an order of magnitude the amount of data to be collected by the LHC experiments. The foreseen gradual increase of the instantaneous luminosity of up to more than twice its nominal value of $10\times10^{34}\ {\rm cm}^{-1}{\rm s}^{-2}$ during Phase I and Phase II of the LHC running, presents special challenges for the experiments. The region with high pseudo rapidity ($η$) region of the forward muon spectrometer ($2.4 > |η| > 1.9$) is not equipped with RPC stations. The increase of the expected particles rate up to 2 kHz cm$^{-1}$ (including a safety factor 3) motivates the installation of RPC chambers to guarantee redundancy with the CSC chambers already present. The current CMS RPC technology cannot sustain the expected background level. A new generation of Glass-RPC (GRPC) using low-resistivity glass was proposed to equip the two most far away of the four high $η$ muon stations of CMS. In their single-gap version they can stand rates of few kHz cm$^{-1}$. Their time precision of about 1 ns can allow to reduce the noise contribution leading to an improvement of the trigger rate. The proposed design for large size chambers is examined and some preliminary results obtained during beam tests at Gamma Irradiation Facility (GIF++) and Super Proton Synchrotron (SPS) at CERN are shown. They were performed to validate the capability of such detectors to support high irradiation environment with limited consequence on their efficiency.

Submitted 16 July, 2018; originally announced July 2018.

arXiv:1807.04587 [pdf, other]

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures

Authors: Sergey Bartunov, Adam Santoro, Blake A. Richards, Luke Marris, Geoffrey E. Hinton, Timothy Lillicrap

Abstract: The backpropagation of error algorithm (BP) is impossible to implement in a real brain. The recent success of deep networks in machine learning and AI, however, has inspired proposals for understanding how the brain might learn across multiple layers, and hence how it might approximate BP. As of yet, none of these proposals have been rigorously evaluated on tasks where BP-guided deep learning has proved critical, or in architectures more structured than simple fully-connected networks. Here we present results on scaling up biologically motivated models of deep learning on datasets which need deep networks with appropriate architectures to achieve good performance. We present results on the MNIST, CIFAR-10, and ImageNet datasets and explore variants of target-propagation (TP) and feedback alignment (FA) algorithms, and explore performance in both fully- and locally-connected architectures. We also introduce weight-transport-free variants of difference target propagation (DTP) modified to remove backpropagation from the penultimate layer. Many of these algorithms perform well for MNIST, but for CIFAR and ImageNet we find that TP and FA variants perform significantly worse than BP, especially for networks composed of locally connected units, opening questions about whether new architectures and algorithms are required to scale these approaches. Our results and implementation details help establish baselines for biologically motivated deep learning schemes going forward.

Submitted 20 November, 2018; v1 submitted 12 July, 2018; originally announced July 2018.

Comments: NIPS 2018. Version 2 contains more experimental data including best hyperparameters found

arXiv:1807.04225 [pdf, other]

Measuring abstract reasoning in neural networks

Authors: David G. T. Barrett, Felix Hill, Adam Santoro, Ari S. Morcos, Timothy Lillicrap

Abstract: Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation `regimes' in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model's ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction.

Submitted 11 July, 2018; originally announced July 2018.

Comments: ICML 2018

arXiv:1806.01830 [pdf, other]

Relational Deep Reinforcement Learning

Authors: Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia

Abstract: We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.

Submitted 28 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

arXiv:1806.01822 [pdf, other]

Relational recurrent neural networks

Authors: Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, Timothy Lillicrap

Abstract: Memory-based neural networks model temporal data by leveraging an ability to remember information for long periods. It is unclear, however, whether they also have an ability to perform complex relational reasoning with the information they remember. Here, we first confirm our intuitions that standard memory architectures may struggle at tasks that heavily involve an understanding of the ways in which entities are connected -- i.e., tasks involving relational reasoning. We then improve upon these deficits by using a new memory module -- a \textit{Relational Memory Core} (RMC) -- which employs multi-head dot product attention to allow memories to interact. Finally, we test the RMC on a suite of tasks that may profit from more capable relational reasoning across sequential information, and show large gains in RL domains (e.g. Mini PacMan), program evaluation, and language modeling, achieving state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets.

Submitted 28 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

arXiv:1806.01261 [pdf, other]

Relational inductive biases, deep learning, and graph networks

Authors: Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, et al. (2 additional authors not shown)

Abstract: Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences--a hallmark of human intelligence from infancy--remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias--the graph network--which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.

Submitted 17 October, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

arXiv:1805.09786 [pdf, other]

Hyperbolic Attention Networks

Authors: Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

Abstract: We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.

Submitted 24 May, 2018; originally announced May 2018.

arXiv:1803.10760 [pdf, other]

Unsupervised Predictive Memory in a Goal-Directed Agent

Authors: Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap

Abstract: Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences.

Submitted 28 March, 2018; originally announced March 2018.

arXiv:1711.08378 [pdf]

Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017

Authors: M. Botvinick, D. G. T. Barrett, P. Battaglia, N. de Freitas, D. Kumaran, J. Z Leibo, T. Lillicrap, J. Modayil, S. Mohamed, N. C. Rabinowitz, D. J. Rezende, A. Santoro, T. Schaul, C. Summerfield, G. Wayne, T. Weber, D. Wierstra, S. Legg, D. Hassabis

Abstract: We agree with Lake and colleagues on their list of key ingredients for building humanlike intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here we survey several important examples of the progress that has been made toward building autonomous agents with humanlike abilities, and highlight some outstanding challenges.

Submitted 22 November, 2017; originally announced November 2017.

arXiv:1710.01068 [pdf, ps, other]

doi 10.1103/PhysRevLett.121.128302

Pareto optimality in multilayer network growth

Authors: Andrea Santoro, Vito Latora, Giuseppe Nicosia, Vincenzo Nicosia

Abstract: We model the formation of multi-layer transportation networks as a multi-objective optimization process, where service providers compete for passengers, and the creation of routes is determined by a multi-objective cost function encoding a trade-off between efficiency and competition. The resulting model reproduces well real-world systems as diverse as airplane, train and bus networks, thus suggesting that such systems are indeed compatible with the proposed local optimization mechanisms. In the specific case of airline transportation systems, we show that the networks of routes operated by each company are placed very close to the theoretical Pareto front in the efficiency-competition plane, and that most of the largest carriers of a continent belong to the corresponding Pareto front. Our results shed light on the fundamental role played by multi-objective optimization principles in shaping the structure of large-scale multilayer transportation systems, and provide novel insights to service providers on the strategies for the smart selection of novel routes.

Submitted 19 July, 2018; v1 submitted 3 October, 2017; originally announced October 2017.

Comments: 6 pages, 4 figures, Supplemental Material

Journal ref: Phys. Rev. Lett. 121, 128302 (2018)

arXiv:1706.08606 [pdf, other]

Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study

Authors: Samuel Ritter, David G. T. Barrett, Adam Santoro, Matt M. Botvinick

Abstract: Deep neural networks (DNNs) have achieved unprecedented performance on a wide range of complex tasks, rapidly outpacing our understanding of the nature of their solutions. This has caused a recent surge of interest in methods for rendering modern neural systems more interpretable. In this work, we propose to address the interpretability problem in modern DNNs using the rich history of problem descriptions, theories and experimental methods developed by cognitive psychologists to study the human mind. To explore the potential value of these tools, we chose a well-established analysis from developmental psychology that explains how children learn word labels for objects, and applied that analysis to DNNs. Using datasets of stimuli inspired by the original cognitive psychology experiments, we find that state-of-the-art one shot learning models trained on ImageNet exhibit a similar bias to that observed in humans: they prefer to categorize objects according to shape rather than color. The magnitude of this shape bias varies greatly among architecturally identical, but differently seeded models, and even fluctuates within seeds throughout training, despite nearly equivalent classification performance. These results demonstrate the capability of tools from cognitive psychology for exposing hidden computational properties of DNNs, while concurrently providing us with a computational model for human word learning.

Submitted 29 June, 2017; v1 submitted 26 June, 2017; originally announced June 2017.

Comments: ICML 2017

arXiv:1706.01427 [pdf, other]

A simple neural network module for relational reasoning

Authors: Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap

Abstract: Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.

Submitted 5 June, 2017; originally announced June 2017.

arXiv:1702.05068 [pdf, other]

Discovering objects and their relations from entangled scene representations

Authors: David Raposo, Adam Santoro, David Barrett, Razvan Pascanu, Timothy Lillicrap, Peter Battaglia

Abstract: Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by their underlying causes and semantics. This gives rise to correlated features, such as position, function and shape. Humans exploit knowledge of objects and their relations for learning a wide spectrum of tasks, and more generally when learning the structure underlying observed data. In this work, we introduce relation networks (RNs) - a general purpose neural network architecture for object-relation reasoning. We show that RNs are capable of learning object relations from scene description data. Furthermore, we show that RNs can act as a bottleneck that induces the factorization of objects from entangled scene description inputs, and from distributed deep representations of scene images provided by a variational autoencoder. The model can also be used in conjunction with differentiable memory mechanisms for implicit relation discovery in one-shot learning tasks. Our results suggest that relation networks are a potentially powerful architecture for solving a variety of problems that require object relation reasoning.

Submitted 16 February, 2017; originally announced February 2017.

Comments: ICLR Workshop 2017

arXiv:1702.04649 [pdf, other]

Generative Temporal Models with Memory

Authors: Mevlana Gemici, Chia-Chun Hung, Adam Santoro, Greg Wayne, Shakir Mohamed, Danilo J. Rezende, David Amos, Timothy Lillicrap

Abstract: We consider the general problem of modeling temporal data with long-range dependencies, wherein new observations are fully or partially predictable based on temporally-distant, past observations. A sufficiently powerful temporal model should separate predictable elements of the sequence from unpredictable elements, express uncertainty about those unpredictable elements, and rapidly identify novel elements that may help to predict the future. To create such models, we introduce Generative Temporal Models augmented with external memory systems. They are developed within the variational inference framework, which provides both a practical training methodology and methods to gain insight into the models' operation. We show, on a range of problems with sparse, long-term temporal dependencies, that these models store information from early in a sequence, and reuse this stored information efficiently. This allows them to perform substantially better than existing models based on well-known recurrent neural networks, like LSTMs.

Submitted 21 February, 2017; v1 submitted 15 February, 2017; originally announced February 2017.

arXiv:1611.05079 [pdf, other]

doi 10.1088/0954-3899/43/11/110201

LHC Forward Physics

Authors: K. Akiba, M. Akbiyik, M. Albrow, M. Arneodo, V. Avati, J. Baechler, O. Villalobos Baillie, P. Bartalini, J. Bartels, S. Baur, C. Baus, W. Beaumont, U. Behrens, D. Berge, M. Berretti, E. Bossini, R. Boussarie, S. Brodsky, M. Broz, M. Bruschi, P. Bussey, W. Byczynski, J. C. Cabanillas Noris, E. Calvo Villar, A. Campbell, et al. (162 additional authors not shown)

Abstract: The goal of this report is to give a comprehensive overview of the rich field of forward physics, with a special attention to the topics that can be studied at the LHC. The report starts presenting a selection of the Monte Carlo simulation tools currently available, chapter 2, then enters the rich phenomenology of QCD at low, chapter 3, and high, chapter 4, momentum transfer, while the unique scattering conditions of central exclusive production are analyzed in chapter 5. The last two experimental topics, Cosmic Ray and Heavy Ion physics are presented in the chapter 6 and 7 respectively. Chapter 8 is dedicated to the BFKL dynamics, multiparton interactions, and saturation. The report ends with an overview of the forward detectors at LHC. Each chapter is correlated with a comprehensive bibliography, attempting to provide to the interested reader with a wide opportunity for further studies.

Submitted 9 December, 2017; v1 submitted 15 November, 2016; originally announced November 2016.

Comments: 358 pages; authors added that were missing; minor fixes in affiliations

Report number: CERN-PH-LPCC-2015-001, SLAC-PUB-16364, DESY 15-167

Journal ref: J. Phys. G: Nucl. Part. Phys. 43 (2016) 110201

arXiv:1605.06065 [pdf, other]

One-shot Learning with Memory-Augmented Neural Networks

Authors: Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, Timothy Lillicrap

Abstract: Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of "one-shot learning." Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.

Submitted 19 May, 2016; originally announced May 2016.

Comments: 13 pages, 8 figures

arXiv:1506.04981 [pdf, other]

doi 10.1103/PhysRevLett.115.025503

Analogies between the cracking noise of ethanol-dampened charcoal and earthquakes

Authors: H. V. Ribeiro, L. S. Costa, L. G. A. Alves, P. A. Santoro, S. Picoli, E. K. Lenzi, R. S. Mendes

Abstract: We report on an extensive characterization of the cracking noise produced by charcoal samples when dampened with ethanol. We argue that the evaporation of ethanol causes transient and irregularly distributed internal stresses that promote the fragmentation of the samples and mimic some situations found in mining processes. The results show that, in general, the most fundamental seismic laws ruling earthquakes (Gutenberg-Richter law, unified scaling law for the recurrence times, Omori's law, productivity law and Bath's law) hold under the conditions of the experiment. Some discrepancies were also identified (a smaller exponent in Gutenberg-Richter law, a stationary behavior in the aftershock rates for long times and a double power-law relationship in productivity law) and related to the different loading condition. Our results thus corroborate to elucidate the parallel between seismic laws and fracture experiments caused by a more complex loading condition that also occurs in natural and induced seismicity (such as long-term fluid injection and gas-rock outbursts in mining processes).

Submitted 1 July, 2015; v1 submitted 16 June, 2015; originally announced June 2015.

Comments: Accepted for publication in PRL

Journal ref: Phys. Rev. Lett. 115, 025503 (2015)

arXiv:1411.4413 [pdf, other]

doi 10.1038/nature14474

Observation of the rare $B^0_s\to\mu^+\mu^-$ decay from the combined analysis of CMS and LHCb data

Authors: The CMS, LHCb Collaborations, V. Khachatryan, A. M. Sirunyan, A. Tumasyan, W. Adam, T. Bergauer, M. Dragicevic, J. Erö, M. Friedl, R. Frühwirth, V. M. Ghete, C. Hartl, N. Hörmann, J. Hrubec, M. Jeitler, W. Kiesenhofer, V. Knünz, M. Krammer, I. Krätschmer, D. Liko, I. Mikulec, D. Rabady, B. Rahbaran, et al. (2807 additional authors not shown)

Abstract: A joint measurement is presented of the branching fractions $B^0_s\to\mu^+\mu^-$ and $B^0\to\mu^+\mu^-$ in proton-proton collisions at the LHC by the CMS and LHCb experiments. The data samples were collected in 2011 at a centre-of-mass energy of 7 TeV, and in 2012 at 8 TeV. The combined analysis produces the first observation of the $B^0_s\to\mu^+\mu^-$ decay, with a statistical significance exceeding six standard deviations, and the best measurement of its branching fraction so far. Furthermore, evidence for the $B^0\to\mu^+\mu^-$ decay is obtained with a statistical significance of three standard deviations. The branching fraction measurements are statistically compatible with SM predictions and impose stringent constraints on several theories beyond the SM.

Submitted 17 August, 2015; v1 submitted 17 November, 2014; originally announced November 2014.

Comments: Correspondence should be addressed to [email protected]

Report number: CERN-PH-EP-2014-220, CMS-BPH-13-007, LHCb-PAPER-2014-049

Journal ref: Nature 522, 68-72 (04 June 2015)

arXiv:1206.2404 [pdf, ps, other]

doi 10.1371/journal.pone.0040689

Complexity-Entropy Causality Plane as a Complexity Measure for Two-dimensional Patterns

Authors: H. V. Ribeiro, L. Zunino, E. K. Lenzi, P. A. Santoro, R. S. Mendes

Abstract: Complexity measures are essential to understand complex systems and there are numerous definitions to analyze one-dimensional data. However, extensions of these approaches to two or higher-dimensional data, such as images, are much less common. Here, we reduce this gap by applying the ideas of the permutation entropy combined with a relative entropic index. We build up a numerical procedure that can be easily implemented to evaluate the complexity of two or higher-dimensional patterns. We work out this method in different scenarios where numerical experiments and empirical data were taken into account. Specifically, we have applied the method to i) fractal landscapes generated numerically where we compare our measures with the Hurst exponent; ii) liquid crystal textures where nematic-isotropic-nematic phase transitions were properly identified; iii) 12 characteristic textures of liquid crystals where the different values show that the method can distinguish different phases; iv) and Ising surfaces where our method identified the critical temperature and also proved to be stable.

Submitted 11 June, 2012; originally announced June 2012.

Comments: Accepted for publication in PLoS One

Journal ref: PLoS ONE 7, e40689 (2012)

Showing 1–50 of 61 results for author: Santoro, A