Search | arXiv e-print repository

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

Authors: Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni

Abstract: A language model (LM) is a map** from a linguistic context to an output token. However, much remains to be known about this map**, including how its geometric properties relate to its function. We take a high-level geometric approach to its analysis, observing, across five pre-trained transformer-based LMs and three input datasets, a distinct phase characterized by high intrinsic dimensionalit… ▽ More A language model (LM) is a map** from a linguistic context to an output token. However, much remains to be known about this map**, including how its geometric properties relate to its function. We take a high-level geometric approach to its analysis, observing, across five pre-trained transformer-based LMs and three input datasets, a distinct phase characterized by high intrinsic dimensionality. During this phase, representations (1) correspond to the first full linguistic abstraction of the input; (2) are the first to viably transfer to downstream tasks; (3) predict each other across different LMs. Moreover, we find that an earlier onset of the phase strongly predicts better language modelling performance. In short, our results suggest that a central high-dimensionality phase underlies core linguistic processing in many common LM architectures. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15454 [pdf, other]

Linearly Controlled Language Generation with Performative Guarantees

Authors: Emily Cheng, Marco Baroni, Carmen Amo Alonso

Abstract: The increasing prevalence of Large Language Models (LMs) in critical applications highlights the need for controlled language generation strategies that are not only computationally efficient but that also enjoy performance guarantees. To achieve this, we use a common model of concept semantics as linearly represented in an LM's latent space. In particular, we take the view that natural language g… ▽ More The increasing prevalence of Large Language Models (LMs) in critical applications highlights the need for controlled language generation strategies that are not only computationally efficient but that also enjoy performance guarantees. To achieve this, we use a common model of concept semantics as linearly represented in an LM's latent space. In particular, we take the view that natural language generation traces a trajectory in this continuous semantic space, realized by the language model's hidden activations. This view permits a control-theoretic treatment of text generation in latent space, in which we propose a lightweight, gradient-free intervention that dynamically steers trajectories away from regions corresponding to undesired meanings. Crucially, we show that this intervention, which we compute in closed form, is guaranteed (in probability) to steer the output into the allowed region. Finally, we demonstrate on a toxicity avoidance objective that the intervention steers language away from undesired content while maintaining text quality. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2402.15268 [pdf, other]

MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models

Authors: Nathanaël Carraz Rakotonirina, Marco Baroni

Abstract: Transformer-based language models (LMs) track contextual information through large, hard-coded input windows. We introduce MemoryPrompt, a leaner approach in which the LM is complemented by a small auxiliary recurrent network that passes information to the LM by prefixing its regular input with a sequence of vectors, akin to soft prompts, without requiring LM finetuning. Tested on a task designed… ▽ More Transformer-based language models (LMs) track contextual information through large, hard-coded input windows. We introduce MemoryPrompt, a leaner approach in which the LM is complemented by a small auxiliary recurrent network that passes information to the LM by prefixing its regular input with a sequence of vectors, akin to soft prompts, without requiring LM finetuning. Tested on a task designed to probe a LM's ability to keep track of multiple fact updates, a MemoryPrompt-augmented LM outperforms much larger LMs that have access to the full input history. We also test MemoryPrompt on a long-distance dialogue dataset, where its performance is comparable to that of a model conditioned on the entire conversation history. In both experiments we also observe that, unlike full-finetuning approaches, MemoryPrompt does not suffer from catastrophic forgetting when adapted to new tasks, thus not disrupting the generalist capabilities of the underlying LM. △ Less

Submitted 23 February, 2024; originally announced February 2024.

Comments: Published as conference paper at LREC-COLING 2024

arXiv:2401.11462 [pdf]

doi 10.1109/CSICC58665.2023.10105391

Frost Prediction Using Machine Learning Methods in Fars Province

Authors: Milad Barooni, Koorush Ziarati, Ali Barooni

Abstract: One of the common hazards and issues in meteorology and agriculture is the problem of frost, chilling or freezing. This event occurs when the minimum ambient temperature falls below a certain value. This phenomenon causes a lot of damage to the country, especially Fars province. Solving this problem requires that, in addition to predicting the minimum temperature, we can provide enough time to imp… ▽ More One of the common hazards and issues in meteorology and agriculture is the problem of frost, chilling or freezing. This event occurs when the minimum ambient temperature falls below a certain value. This phenomenon causes a lot of damage to the country, especially Fars province. Solving this problem requires that, in addition to predicting the minimum temperature, we can provide enough time to implement the necessary measures. Empirical methods have been provided by the Food and Agriculture Organization (FAO), which can predict the minimum temperature, but not in time. In addition to this, we can use machine learning methods to model the minimum temperature. In this study, we have used three methods Gated Recurrent Unit (GRU), Temporal Convolutional Network (TCN) as deep learning methods, and Gradient Boosting (XGBoost). A customized loss function designed for methods based on deep learning, which can be effective in reducing prediction errors. With methods based on deep learning models, not only do we observe a reduction in RMSE error compared to empirical methods but also have more time to predict minimum temperature. Thus, we can model the minimum temperature for the next 24 hours by having the current 24 hours. With the gradient boosting model (XGBoost) we can keep the prediction time as deep learning and RMSE error reduced. Finally, we experimentally concluded that machine learning methods work better than empirical methods and XGBoost model can have better performance in this problem among other implemented. △ Less

Submitted 21 January, 2024; originally announced January 2024.

Comments: Accpeted by 28th International Computer Conference, Computer Society of Iran (CSICC)

arXiv:2310.15829 [pdf, other]

Unnatural language processing: How do language models handle machine-generated prompts?

Authors: Corentin Kervadec, Francesca Franzon, Marco Baroni

Abstract: Language model prompt optimization research has shown that semantically and grammatically well-formed manually crafted prompts are routinely outperformed by automatically generated token sequences with no apparent meaning or syntactic structure, including sequences of vectors from a model's embedding space. We use machine-generated prompts to probe how models respond to input that is not composed… ▽ More Language model prompt optimization research has shown that semantically and grammatically well-formed manually crafted prompts are routinely outperformed by automatically generated token sequences with no apparent meaning or syntactic structure, including sequences of vectors from a model's embedding space. We use machine-generated prompts to probe how models respond to input that is not composed of natural language expressions. We study the behavior of models of different sizes in multiple semantic tasks in response to both continuous and discrete machine-generated prompts, and compare it to the behavior in response to human-generated natural-language prompts. Even when producing a similar output, machine-generated and human prompts trigger different response patterns through the network processing pathways, including different perplexities, different attention and output entropy distributions, and different unit activation profiles. We provide preliminary insight into the nature of the units activated by different prompt types, suggesting that only natural language prompts recruit a genuinely linguistic circuit. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP 2023 Camera-Ready

arXiv:2310.13620 [pdf, other]

Bridging Information-Theoretic and Geometric Compression in Language Models

Authors: Emily Cheng, Corentin Kervadec, Marco Baroni

Abstract: For a language model (LM) to faithfully model human language, it must compress vast, potentially infinite information into relatively few dimensions. We propose analyzing compression in (pre-trained) LMs from two points of view: geometric and information-theoretic. We demonstrate that the two views are highly correlated, such that the intrinsic geometric dimension of linguistic data predicts their… ▽ More For a language model (LM) to faithfully model human language, it must compress vast, potentially infinite information into relatively few dimensions. We propose analyzing compression in (pre-trained) LMs from two points of view: geometric and information-theoretic. We demonstrate that the two views are highly correlated, such that the intrinsic geometric dimension of linguistic data predicts their coding length under the LM. We then show that, in turn, high compression of a linguistic dataset predicts rapid adaptation to that dataset, confirming that being able to compress linguistic information is an important part of successful LM performance. As a practical byproduct of our analysis, we evaluate a battery of intrinsic dimension estimators for the first time on linguistic data, showing that only some encapsulate the relationship between information-theoretic compression, geometric compression, and ease-of-adaptation. △ Less

Submitted 9 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Camera-Ready

arXiv:2304.01662 [pdf, other]

Cross-Domain Image Captioning with Discriminative Finetuning

Authors: Roberto Dessì, Michele Bevilacqua, Eleonora Gualdoni, Nathanael Carraz Rakotonirina, Francesca Franzon, Marco Baroni

Abstract: Neural captioners are typically trained to mimic human-generated references without optimizing for any specific communication goal, leading to problems such as the generation of vague captions. In this paper, we show that fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more… ▽ More Neural captioners are typically trained to mimic human-generated references without optimizing for any specific communication goal, leading to problems such as the generation of vague captions. In this paper, we show that fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. Given a target image, the system must learn to produce a description that enables an out-of-the-box text-conditioned image retriever to identify such image among a set of candidates. We experiment with the popular ClipCap captioner, also replicating the main results with BLIP. In terms of similarity to ground-truth human descriptions, the captions emerging from discriminative finetuning lag slightly behind those generated by the non-finetuned model, when the latter is trained and tested on the same caption dataset. However, when the model is used without further tuning to generate captions for out-of-domain datasets, our discriminatively-finetuned captioner generates descriptions that resemble human references more than those produced by the same captioner without finetuning. We further show that, on the Conceptual Captions dataset, discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task. △ Less

Submitted 4 April, 2023; originally announced April 2023.

Comments: CVPR 2023

arXiv:2302.09865 [pdf, other]

Can discrete information extraction prompts generalize across language models?

Authors: Nathanaël Carraz Rakotonirina, Roberto Dessì, Fabio Petroni, Sebastian Riedel, Marco Baroni

Abstract: We study whether automatically-induced prompts that effectively extract information from a language model can also be used, out-of-the-box, to probe other language models for the same information. After confirming that discrete prompts induced with the AutoPrompt algorithm outperform manual and semi-manual prompts on the slot-filling task, we demonstrate a drop in performance for AutoPrompt prompt… ▽ More We study whether automatically-induced prompts that effectively extract information from a language model can also be used, out-of-the-box, to probe other language models for the same information. After confirming that discrete prompts induced with the AutoPrompt algorithm outperform manual and semi-manual prompts on the slot-filling task, we demonstrate a drop in performance for AutoPrompt prompts learned on a model and tested on another. We introduce a way to induce prompts by mixing language models at training time that results in prompts that generalize well across models. We conduct an extensive analysis of the induced prompts, finding that the more general prompts include a larger proportion of existing English words and have a less order-dependent and more uniform distribution of information across their component tokens. Our work provides preliminary evidence that it's possible to generate discrete prompts that can be induced once and used with a number of different models, and gives insights on the properties characterizing such prompts. △ Less

Submitted 7 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Published as conference paper at ICLR 2023

arXiv:2302.08913 [pdf, other]

Referential communication in heterogeneous communities of pre-trained visual deep networks

Authors: Matéo Mahaut, Francesca Franzon, Roberto Dessì, Marco Baroni

Abstract: As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of \textit{referential communication}… ▽ More As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of \textit{referential communication} in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects. △ Less

Submitted 13 March, 2024; v1 submitted 4 February, 2023; originally announced February 2023.

arXiv:2210.11512 [pdf, other]

Communication breakdown: On the low mutual intelligibility between human and neural captioning

Authors: Roberto Dessì, Eleonora Gualdoni, Francesca Franzon, Gemma Boleda, Marco Baroni

Abstract: We compare the 0-shot performance of a neural caption-based image retriever when given as input either human-produced captions or captions generated by a neural captioner. We conduct this comparison on the recently introduced ImageCoDe data-set (Krojer et al., 2022) which contains hard distractors nearly identical to the images to be retrieved. We find that the neural retriever has much higher per… ▽ More We compare the 0-shot performance of a neural caption-based image retriever when given as input either human-produced captions or captions generated by a neural captioner. We conduct this comparison on the recently introduced ImageCoDe data-set (Krojer et al., 2022) which contains hard distractors nearly identical to the images to be retrieved. We find that the neural retriever has much higher performance when fed neural rather than human captions, despite the fact that the former, unlike the latter, were generated without awareness of the distractors that make the task hard. Even more remarkably, when the same neural captions are given to human subjects, their retrieval performance is almost at chance level. Our results thus add to the growing body of evidence that, even when the ``language'' of neural models resembles English, this superficial resemblance might be deeply misleading. △ Less

Submitted 27 April, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: Accepted as a short paper at EMNLP 2022

arXiv:2110.02782 [pdf, other]

How BPE Affects Memorization in Transformers

Authors: Eugene Kharitonov, Marco Baroni, Dieuwke Hupkes

Abstract: Training data memorization in NLP can both be beneficial (e.g., closed-book QA) and undesirable (personal data extraction). In any case, successful model training requires a non-trivial amount of memorization to store word spellings, various linguistic idiosyncrasies and common knowledge. However, little is known about what affects the memorization behavior of NLP models, as the field tends to foc… ▽ More Training data memorization in NLP can both be beneficial (e.g., closed-book QA) and undesirable (personal data extraction). In any case, successful model training requires a non-trivial amount of memorization to store word spellings, various linguistic idiosyncrasies and common knowledge. However, little is known about what affects the memorization behavior of NLP models, as the field tends to focus on the equally important question of generalization. In this work, we demonstrate that the size of the subword vocabulary learned by Byte-Pair Encoding (BPE) greatly affects both ability and tendency of standard Transformer models to memorize training data, even when we control for the number of learned parameters. We find that with a large subword vocabulary size, Transformer models fit random map**s more easily and are more vulnerable to membership inference attacks. Similarly, given a prompt, Transformer-based language models with large subword vocabularies reproduce the training data more often. We conjecture this effect is caused by reduction in the sequences' length that happens as the BPE vocabulary grows. Our findings can allow a more informed choice of hyper-parameters, that is better tailored for a particular use-case. △ Less

Submitted 2 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

arXiv:2106.08694 [pdf, ps, other]

On the proper role of linguistically-oriented deep net analysis in linguistic theorizing

Authors: Marco Baroni

Abstract: A lively research field has recently emerged that uses experimental methods to probe the linguistic behavior of modern deep networks. While work in this tradition often reports intriguing results about the grammatical skills of deep nets, it is not clear what their implications for linguistic theorizing should be. As a consequence, linguistically-oriented deep net analysis has had very little impa… ▽ More A lively research field has recently emerged that uses experimental methods to probe the linguistic behavior of modern deep networks. While work in this tradition often reports intriguing results about the grammatical skills of deep nets, it is not clear what their implications for linguistic theorizing should be. As a consequence, linguistically-oriented deep net analysis has had very little impact on linguistics at large. In this chapter, I suggest that deep networks should be treated as theories making explicit predictions about the acceptability of linguistic utterances. I argue that, if we overcome some obstacles standing in the way of seriously pursuing this idea, we will gain a powerful new theoretical tool, complementary to mainstream algebraic approaches. △ Less

Submitted 24 March, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: To appear in collective volume on Algebraic Systems and the Representation of Linguistic Knowledge, editor: Shalom Lappin, Taylor & Francis

arXiv:2106.04258 [pdf, other]

Interpretable agent communication from scratch (with a generic visual processor emerging on the side)

Authors: Roberto Dessì, Eugene Kharitonov, Marco Baroni

Abstract: As deep networks begin to be deployed as autonomous agents, the issue of how they can communicate with each other becomes important. Here, we train two deep nets from scratch to perform realistic referent identification through unsupervised emergent communication. We show that the largely interpretable emergent protocol allows the nets to successfully communicate even about object types they did n… ▽ More As deep networks begin to be deployed as autonomous agents, the issue of how they can communicate with each other becomes important. Here, we train two deep nets from scratch to perform realistic referent identification through unsupervised emergent communication. We show that the largely interpretable emergent protocol allows the nets to successfully communicate even about object types they did not see at training time. The visual representations induced as a by-product of our training regime, moreover, show comparable quality, when re-used as generic visual features, to a recent self-supervised learning model. Our results provide concrete evidence of the viability of (interpretable) emergent deep net communication in a more realistic scenario than previously considered, as well as establishing an intriguing link between this field and self-supervised visual learning. △ Less

Submitted 15 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2006.11098 [pdf]

doi 10.1016/j.cognition.2021.104699

Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans

Authors: Yair Lakretz, Dieuwke Hupkes, Alessandra Vergallito, Marco Marelli, Marco Baroni, Stanislas Dehaene

Abstract: Recursive processing in sentence comprehension is considered a hallmark of human linguistic abilities. However, its underlying neural mechanisms remain largely unknown. We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing, namely the storing of grammatical number and gender information in working memory and… ▽ More Recursive processing in sentence comprehension is considered a hallmark of human linguistic abilities. However, its underlying neural mechanisms remain largely unknown. We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing, namely the storing of grammatical number and gender information in working memory and its use in long-distance agreement (e.g., capturing the correct number agreement between subject and verb when they are separated by other phrases). Although the network, a recurrent architecture with Long Short-Term Memory units, was solely trained to predict the next word in a large corpus, analysis showed the emergence of a very sparse set of specialized units that successfully handled local and long-distance syntactic agreement for grammatical number. However, the simulations also showed that this mechanism does not support full recursion and fails with some long-range embedded dependencies. We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns, with or without embedding. Human and model error patterns were remarkably similar, showing that the model echoes various effects observed in human data. However, a key difference was that, with embedded long-range dependencies, humans remained above chance level, while the model's systematic errors brought it below chance. Overall, our study shows that exploring the ways in which modern artificial neural networks process sentences leads to precise and testable hypotheses about human linguistic performance. △ Less

Submitted 3 May, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

Journal ref: Lakretz et al. (2021), Cognition

arXiv:2006.02419 [pdf, other]

Emergent Multi-Agent Communication in the Deep Learning Era

Authors: Angeliki Lazaridou, Marco Baroni

Abstract: The ability to cooperate through language is a defining feature of humans. As the perceptual, motory and planning capabilities of deep artificial networks increase, researchers are studying whether they also can develop a shared language to interact. From a scientific perspective, understanding the conditions under which language evolves in communities of deep agents and its emergent features can… ▽ More The ability to cooperate through language is a defining feature of humans. As the perceptual, motory and planning capabilities of deep artificial networks increase, researchers are studying whether they also can develop a shared language to interact. From a scientific perspective, understanding the conditions under which language evolves in communities of deep agents and its emergent features can shed light on human language evolution. From an applied perspective, endowing deep networks with the ability to solve problems interactively by communicating with each other and with us should make them more flexible and useful in everyday life. This article surveys representative recent language emergence studies from both of these two angles. △ Less

Submitted 14 July, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: Added some more references and discussion

arXiv:2004.10827 [pdf, ps, other]

doi 10.1146/annurev-linguistics-032020-051035

Syntactic Structure from Deep Learning

Authors: Tal Linzen, Marco Baroni

Abstract: Modern deep neural networks achieve impressive performance in engineering applications that require extensive linguistic skills, such as machine translation. This success has sparked interest in probing whether these models are inducing human-like grammatical knowledge from the raw data they are exposed to, and, consequently, whether they can shed new light on long-standing debates concerning the… ▽ More Modern deep neural networks achieve impressive performance in engineering applications that require extensive linguistic skills, such as machine translation. This success has sparked interest in probing whether these models are inducing human-like grammatical knowledge from the raw data they are exposed to, and, consequently, whether they can shed new light on long-standing debates concerning the innate structure necessary for language acquisition. In this article, we survey representative studies of the syntactic abilities of deep networks, and discuss the broader implications that this work has for theoretical linguistics. △ Less

Submitted 22 April, 2020; originally announced April 2020.

Comments: In press at Annual Reviews of Linguistics

arXiv:2004.09124 [pdf, other]

Compositionality and Generalization in Emergent Languages

Authors: Rahma Chaabouni, Eugene Kharitonov, Diane Bouchacourt, Emmanuel Dupoux, Marco Baroni

Abstract: Natural language allows us to refer to novel composite concepts by combining expressions denoting their parts according to systematic rules, a property known as \emph{compositionality}. In this paper, we study whether the language emerging in deep multi-agent simulations possesses a similar ability to refer to novel primitive combinations, and whether it accomplishes this feat by strategies akin t… ▽ More Natural language allows us to refer to novel composite concepts by combining expressions denoting their parts according to systematic rules, a property known as \emph{compositionality}. In this paper, we study whether the language emerging in deep multi-agent simulations possesses a similar ability to refer to novel primitive combinations, and whether it accomplishes this feat by strategies akin to human-language compositionality. Equipped with new ways to measure compositionality in emergent languages inspired by disentanglement in representation learning, we establish three main results. First, given sufficiently large input spaces, the emergent language will naturally develop the ability to refer to novel composite concepts. Second, there is no correlation between the degree of compositionality of an emergent language and its ability to generalize. Third, while compositionality is not necessary for generalization, it provides an advantage in terms of language transmission: The more compositional a language is, the more easily it will be picked up by new learners, even when the latter differ in architecture from the original agents. We conclude that compositionality does not arise from simple generalization pressure, but if an emergent language does chance upon it, it will be more likely to survive and thrive. △ Less

Submitted 20 April, 2020; originally announced April 2020.

arXiv:2004.03420 [pdf, other]

Emergent Language Generalization and Acquisition Speed are not tied to Compositionality

Authors: Eugene Kharitonov, Marco Baroni

Abstract: Studies of discrete languages emerging when neural agents communicate to solve a joint task often look for evidence of compositional structure. This stems for the expectation that such a structure would allow languages to be acquired faster by the agents and enable them to generalize better. We argue that these beneficial properties are only loosely connected to compositionality. In two experiment… ▽ More Studies of discrete languages emerging when neural agents communicate to solve a joint task often look for evidence of compositional structure. This stems for the expectation that such a structure would allow languages to be acquired faster by the agents and enable them to generalize better. We argue that these beneficial properties are only loosely connected to compositionality. In two experiments, we demonstrate that, depending on the task, non-compositional languages might show equal, or better, generalization performance and acquisition speed than compositional ones. Further research in the area should be clearer about what benefits are expected from compositionality, and how the latter would lead to them. △ Less

Submitted 25 April, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

arXiv:2003.11922 [pdf, ps, other]

Rat big, cat eaten! Ideas for a useful deep-agent protolanguage

Authors: Marco Baroni

Abstract: Deep-agent communities develo** their own language-like communication protocol are a hot (or at least warm) topic in AI. Such agents could be very useful in machine-machine and human-machine interaction scenarios long before they have evolved a protocol as complex as human language. Here, I propose a small set of priorities we should focus on, if we want to get as fast as possible to a stage whe… ▽ More Deep-agent communities develo** their own language-like communication protocol are a hot (or at least warm) topic in AI. Such agents could be very useful in machine-machine and human-machine interaction scenarios long before they have evolved a protocol as complex as human language. Here, I propose a small set of priorities we should focus on, if we want to get as fast as possible to a stage where deep agents speak a useful protolanguage. △ Less

Submitted 17 March, 2020; originally announced March 2020.

arXiv:2003.05161 [pdf, other]

A Benchmark for Systematic Generalization in Grounded Language Understanding

Authors: Laura Ruis, Jacob Andreas, Marco Baroni, Diane Bouchacourt, Brenden M. Lake

Abstract: Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts ("greet the pink brontosaurus by the ferris wheel"). Modern neural networks, by contrast, struggle to interpret novel compositions. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding. Going beyond a related benchmark th… ▽ More Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts ("greet the pink brontosaurus by the ferris wheel"). Modern neural networks, by contrast, struggle to interpret novel compositions. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding. Going beyond a related benchmark that focused on syntactic aspects of generalization, gSCAN defines a language grounded in the states of a grid world, facilitating novel evaluations of acquiring linguistically motivated rules. For example, agents must understand how adjectives such as 'small' are interpreted relative to the current world state or how adverbs such as 'cautiously' combine with new verbs. We test a strong multi-modal baseline model and a state-of-the-art compositional method finding that, in most cases, they fail dramatically when generalization requires systematic compositional rules. △ Less

Submitted 17 October, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

Comments: accepted at NeurIPS 2020

arXiv:1911.01892 [pdf, ps, other]

Focus on What's Informative and Ignore What's not: Communication Strategies in a Referential Game

Authors: Roberto Dessì, Diane Bouchacourt, Davide Crepaldi, Marco Baroni

Abstract: Research in multi-agent cooperation has shown that artificial agents are able to learn to play a simple referential game while develo** a shared lexicon. This lexicon is not easy to analyze, as it does not show many properties of a natural language. In a simple referential game with two neural network-based agents, we analyze the object-symbol map** trying to understand what kind of strategy w… ▽ More Research in multi-agent cooperation has shown that artificial agents are able to learn to play a simple referential game while develo** a shared lexicon. This lexicon is not easy to analyze, as it does not show many properties of a natural language. In a simple referential game with two neural network-based agents, we analyze the object-symbol map** trying to understand what kind of strategy was used to develop the emergent language. We see that, when the environment is uniformly distributed, the agents rely on a random subset of features to describe the objects. When we modify the objects making one feature non-uniformly distributed,the agents realize it is less informative and start to ignore it, and, surprisingly, they make a better use of the remaining features. This interesting result suggests that more natural, less uniformly distributed environments might aid in spurring the emergence of better-behaved languages. △ Less

Submitted 5 November, 2019; originally announced November 2019.

Comments: 3rd NeurIPS Workshop on Emergent Communication

arXiv:1907.00852 [pdf, other]

EGG: a toolkit for research on Emergence of lanGuage in Games

Authors: Eugene Kharitonov, Rahma Chaabouni, Diane Bouchacourt, Marco Baroni

Abstract: There is renewed interest in simulating language emergence among deep neural agents that communicate to jointly solve a task, spurred by the practical aim to develop language-enabled interactive AIs, as well as by theoretical questions about the evolution of human language. However, optimizing deep architectures connected by a discrete communication channel (such as that in which language emerges)… ▽ More There is renewed interest in simulating language emergence among deep neural agents that communicate to jointly solve a task, spurred by the practical aim to develop language-enabled interactive AIs, as well as by theoretical questions about the evolution of human language. However, optimizing deep architectures connected by a discrete communication channel (such as that in which language emerges) is technically challenging. We introduce EGG, a toolkit that greatly simplifies the implementation of emergent-language communication games. EGG's modular design provides a set of building blocks that the user can combine to create new games, easily navigating the optimization and architecture space. We hope that the tool will lower the technical barrier, and encourage researchers from various backgrounds to do original work in this exciting area. △ Less

Submitted 13 October, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

Comments: EMNLP 2019 Demo paper

arXiv:1906.07285 [pdf, other]

Tabula nearly rasa: Probing the Linguistic Knowledge of Character-Level Neural Language Models Trained on Unsegmented Text

Authors: Michael Hahn, Marco Baroni

Abstract: Recurrent neural networks (RNNs) have reached striking performance in many natural language processing tasks. This has renewed interest in whether these generic sequence processing devices are inducing genuine linguistic knowledge. Nearly all current analytical studies, however, initialize the RNNs with a vocabulary of known words, and feed them tokenized input during training. We present a multi-… ▽ More Recurrent neural networks (RNNs) have reached striking performance in many natural language processing tasks. This has renewed interest in whether these generic sequence processing devices are inducing genuine linguistic knowledge. Nearly all current analytical studies, however, initialize the RNNs with a vocabulary of known words, and feed them tokenized input during training. We present a multi-lingual study of the linguistic knowledge encoded in RNNs trained as character-level language models, on input data with word boundaries removed. These networks face a tougher and more cognitively realistic task, having to discover any useful linguistic unit from scratch based on input statistics. The results show that our "near tabula rasa" RNNs are mostly able to solve morphological, syntactic and semantic tasks that intuitively presuppose word-level knowledge, and indeed they learned, to some extent, to track word boundaries. Our study opens the door to speculations about the necessity of an explicit, rigid word lexicon in language learning and usage. △ Less

Submitted 17 June, 2019; originally announced June 2019.

Comments: Accepted by Transactions of the Association for Computational Linguistics

arXiv:1905.13687 [pdf, other]

Entropy Minimization In Emergent Languages

Authors: Eugene Kharitonov, Rahma Chaabouni, Diane Bouchacourt, Marco Baroni

Abstract: There is growing interest in studying the languages that emerge when neural agents are jointly trained to solve tasks requiring communication through a discrete channel. We investigate here the information-theoretic complexity of such languages, focusing on the basic two-agent, one-exchange setup. We find that, under common training procedures, the emergent languages are subject to an entropy mini… ▽ More There is growing interest in studying the languages that emerge when neural agents are jointly trained to solve tasks requiring communication through a discrete channel. We investigate here the information-theoretic complexity of such languages, focusing on the basic two-agent, one-exchange setup. We find that, under common training procedures, the emergent languages are subject to an entropy minimization pressure that has also been detected in human language, whereby the mutual information between the communicating agent's inputs and the messages is minimized, within the range afforded by the need for successful communication. That is, emergent languages are (nearly) as simple as the task they are developed for allow them to be. This pressure is amplified as we increase communication channel discreteness. Further, we observe that stronger discrete-channel-driven entropy minimization leads to representations with increased robustness to overfitting and adversarial attacks. We conclude by discussing the implications of our findings for the study of natural and artificial communication systems. △ Less

Submitted 26 June, 2020; v1 submitted 31 May, 2019; originally announced May 2019.

Comments: Accepted at ICML 2020

arXiv:1905.12561 [pdf, other]

Anti-efficient encoding in emergent communication

Authors: Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, Marco Baroni

Abstract: Despite renewed interest in emergent language simulations with neural networks, little is known about the basic properties of the induced code, and how they compare to human language. One fundamental characteristic of the latter, known as Zipf's Law of Abbreviation (ZLA), is that more frequent words are efficiently associated to shorter strings. We study whether the same pattern emerges when two n… ▽ More Despite renewed interest in emergent language simulations with neural networks, little is known about the basic properties of the induced code, and how they compare to human language. One fundamental characteristic of the latter, known as Zipf's Law of Abbreviation (ZLA), is that more frequent words are efficiently associated to shorter strings. We study whether the same pattern emerges when two neural networks, a "speaker" and a "listener", are trained to play a signaling game. Surprisingly, we find that networks develop an \emph{anti-efficient} encoding scheme, in which the most frequent inputs are associated to the longest messages, and messages in general are skewed towards the maximum length threshold. This anti-efficient code appears easier to discriminate for the listener, and, unlike in human communication, the speaker does not impose a contrasting least-effort pressure towards brevity. Indeed, when the cost function includes a penalty for longer messages, the resulting message distribution starts respecting ZLA. Our analysis stresses the importance of studying the basic features of emergent communication in a highly controlled setup, to ensure the latter will not strand too far from human language. Moreover, we present a concrete illustration of how different functional pressures can lead to successful communication codes that lack basic properties of human language, thus highlighting the role such pressures play in the latter. △ Less

Submitted 15 October, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

arXiv:1905.12330 [pdf, other]

Word-order biases in deep-agent emergent communication

Authors: Rahma Chaabouni, Eugene Kharitonov, Alessandro Lazaric, Emmanuel Dupoux, Marco Baroni

Abstract: Sequence-processing neural networks led to remarkable progress on many NLP tasks. As a consequence, there has been increasing interest in understanding to what extent they process language as humans do. We aim here to uncover which biases such models display with respect to "natural" word-order constraints. We train models to communicate about paths in a simple gridworld, using miniature languages… ▽ More Sequence-processing neural networks led to remarkable progress on many NLP tasks. As a consequence, there has been increasing interest in understanding to what extent they process language as humans do. We aim here to uncover which biases such models display with respect to "natural" word-order constraints. We train models to communicate about paths in a simple gridworld, using miniature languages that reflect or violate various natural language trends, such as the tendency to avoid redundancy or to minimize long-distance dependencies. We study how the controlled characteristics of our miniature languages affect individual learning and their stability across multiple network generations. The results draw a mixed picture. On the one hand, neural networks show a strong tendency to avoid long-distance dependencies. On the other hand, there is no clear preference for the efficient, non-redundant encoding of information that is widely attested in natural language. We thus suggest inoculating a notion of "effort" into neural networks, as a possible way to make their linguistic behavior more human-like. △ Less

Submitted 14 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: Conference: Association for Computational Linguistics (ACL)

arXiv:1905.11871 [pdf, other]

Miss Tools and Mr Fruit: Emergent communication in agents learning about object affordances

Authors: Diane Bouchacourt, Marco Baroni

Abstract: Recent research studies communication emergence in communities of deep network agents assigned a joint task, ho** to gain insights on human language evolution. We propose here a new task capturing crucial aspects of the human environment, such as natural object affordances, and of human conversation, such as full symmetry among the participants. By conducting a thorough pragmatic and semantic an… ▽ More Recent research studies communication emergence in communities of deep network agents assigned a joint task, ho** to gain insights on human language evolution. We propose here a new task capturing crucial aspects of the human environment, such as natural object affordances, and of human conversation, such as full symmetry among the participants. By conducting a thorough pragmatic and semantic analysis of the emergent protocol, we show that the agents solve the shared task through genuine bilateral, referential communication. However, the agents develop multiple idiolects, which makes us conclude that full symmetry is not a sufficient condition for a common language to emerge. △ Less

Submitted 28 May, 2019; originally announced May 2019.

Comments: Association for Computational Linguistics

arXiv:1905.08527 [pdf, other]

CNNs found to jump around more skillfully than RNNs: Compositional generalization in seq2seq convolutional networks

Authors: Roberto Dessì, Marco Baroni

Abstract: Lake and Baroni (2018) introduced the SCAN dataset probing the ability of seq2seq models to capture compositional generalizations, such as inferring the meaning of "jump around" 0-shot from the component words. Recurrent networks (RNNs) were found to completely fail the most challenging generalization cases. We test here a convolutional network (CNN) on these tasks, reporting hugely improved perfo… ▽ More Lake and Baroni (2018) introduced the SCAN dataset probing the ability of seq2seq models to capture compositional generalizations, such as inferring the meaning of "jump around" 0-shot from the component words. Recurrent networks (RNNs) were found to completely fail the most challenging generalization cases. We test here a convolutional network (CNN) on these tasks, reporting hugely improved performance with respect to RNNs. Despite the big improvement, the CNN has however not induced systematic rules, suggesting that the difference between compositional and non-compositional behaviour is not clear-cut. △ Less

Submitted 21 May, 2019; originally announced May 2019.

Comments: accepted as a short paper at ACL 2019

arXiv:1904.00157 [pdf, other]

doi 10.1098/rstb.2019.0307

Linguistic generalization and compositionality in modern artificial neural networks

Authors: Marco Baroni

Abstract: In the last decade, deep artificial neural networks have achieved astounding performance in many natural language processing tasks. Given the high productivity of language, these models must possess effective generalization abilities. It is widely assumed that humans handle linguistic productivity by means of algebraic compositional rules: Are deep networks similarly compositional? After reviewing… ▽ More In the last decade, deep artificial neural networks have achieved astounding performance in many natural language processing tasks. Given the high productivity of language, these models must possess effective generalization abilities. It is widely assumed that humans handle linguistic productivity by means of algebraic compositional rules: Are deep networks similarly compositional? After reviewing the main innovations characterizing current deep language processing networks, I discuss a set of studies suggesting that deep networks are capable of subtle grammar-dependent generalizations, but also that they do not rely on systematic compositional rules. I argue that the intriguing behaviour of these devices (still awaiting a full understanding) should be of interest to linguists and cognitive scientists, as it offers a new perspective on possible computational strategies to deal with linguistic productivity beyond rule-based compositionality, and it might lead to new insights into the less systematic generalization patterns that also appear in natural language. △ Less

Submitted 26 June, 2019; v1 submitted 30 March, 2019; originally announced April 2019.

Comments: Please cite as "to appear in the Philosophical Transactions of the Royal Society B"

arXiv:1903.07435 [pdf, other]

The emergence of number and syntax units in LSTM language models

Authors: Yair Lakretz, German Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni

Abstract: Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. We have however no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the… ▽ More Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. We have however no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. We discover that long-distance number information is largely managed by two `number units'. Importantly, the behaviour of these units is partially controlled by other units independently shown to track syntactic structure. We conclude that LSTMs are, to some extent, implementing genuinely syntactic processing mechanisms, paving the way to a more general understanding of grammatical encoding in LSTMs. △ Less

Submitted 2 April, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

Comments: To appear in Proceedings of NAACL, Minneapolis, MN, 2019

arXiv:1901.04587 [pdf, other]

Human few-shot learning of compositional instructions

Authors: Brenden M. Lake, Tal Linzen, Marco Baroni

Abstract: People learn in fast and flexible ways that have not been emulated by machines. Once a person learns a new verb "dax," he or she can effortlessly understand how to "dax twice," "walk and dax," or "dax vigorously." There have been striking recent improvements in machine learning for natural language processing, yet the best algorithms require vast amounts of experience and struggle to generalize ne… ▽ More People learn in fast and flexible ways that have not been emulated by machines. Once a person learns a new verb "dax," he or she can effortlessly understand how to "dax twice," "walk and dax," or "dax vigorously." There have been striking recent improvements in machine learning for natural language processing, yet the best algorithms require vast amounts of experience and struggle to generalize new concepts in compositional ways. To better understand these distinctively human abilities, we study the compositional skills of people through language-like instruction learning tasks. Our results show that people can learn and use novel functional concepts from very few examples (few-shot learning), successfully applying familiar functions to novel inputs. People can also compose concepts in complex ways that go beyond the provided demonstrations. Two additional experiments examined the assumptions and inductive biases that people make when solving these tasks, revealing three biases: mutual exclusivity, one-to-one map**s, and iconic concatenation. We discuss the implications for cognitive modeling and the potential for building machines with more human-like language learning capabilities. △ Less

Submitted 10 May, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

Comments: Please cite as: Lake, B. M., Linzen, T., and Baroni, M. (2019). Human few-shot learning of compositional instructions. In Proceedings of the 41st Annual Conference of the Cognitive Science Society

arXiv:1809.04640 [pdf, other]

Jump to better conclusions: SCAN both left and right

Authors: Jasmijn Bastings, Marco Baroni, Jason Weston, Kyunghyun Cho, Douwe Kiela

Abstract: Lake and Baroni (2018) recently introduced the SCAN data set, which consists of simple commands paired with action sequences and is intended to test the strong generalization abilities of recurrent sequence-to-sequence models. Their initial experiments suggested that such models may fail because they lack the ability to extract systematic rules. Here, we take a closer look at SCAN and show that it… ▽ More Lake and Baroni (2018) recently introduced the SCAN data set, which consists of simple commands paired with action sequences and is intended to test the strong generalization abilities of recurrent sequence-to-sequence models. Their initial experiments suggested that such models may fail because they lack the ability to extract systematic rules. Here, we take a closer look at SCAN and show that it does not always capture the kind of generalization that it was designed for. To mitigate this we propose a complementary dataset, which requires map** actions back to the original commands, called NACS. We show that models that do well on SCAN do not necessarily do well on NACS, and that NACS exhibits properties more closely aligned with realistic use-cases for sequence-to-sequence models. △ Less

Submitted 18 June, 2020; v1 submitted 12 September, 2018; originally announced September 2018.

arXiv:1808.10696 [pdf, other]

How agents see things: On visual representations in an emergent language game

Authors: Diane Bouchacourt, Marco Baroni

Abstract: There is growing interest in the language developed by agents interacting in emergent-communication settings. Earlier studies have focused on the agents' symbol usage, rather than on their representation of visual input. In this paper, we consider the referential games of Lazaridou et al. (2017) and investigate the representations the agents develop during their evolving interaction. We find that… ▽ More There is growing interest in the language developed by agents interacting in emergent-communication settings. Earlier studies have focused on the agents' symbol usage, rather than on their representation of visual input. In this paper, we consider the referential games of Lazaridou et al. (2017) and investigate the representations the agents develop during their evolving interaction. We find that the agents establish successful communication by inducing visual representations that almost perfectly align with each other, but, surprisingly, do not capture the conceptual properties of the objects depicted in the input images. We conclude that, if we are interested in develo** language-like communication systems, we must pay more attention to the visual semantics agents associate to the symbols they use. △ Less

Submitted 13 September, 2018; v1 submitted 31 August, 2018; originally announced August 2018.

Comments: 2018 Conference on Empirical Methods in Natural Language Processing

arXiv:1807.07545 [pdf, other]

Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks

Authors: João Loula, Marco Baroni, Brenden M. Lake

Abstract: Systematic compositionality is the ability to recombine meaningful units with regular and predictable outcomes, and it's seen as key to humans' capacity for generalization in language. Recent work has studied systematic compositionality in modern seq2seq models using generalization to novel navigation instructions in a grounded environment as a probing tool, requiring models to quickly bootstrap t… ▽ More Systematic compositionality is the ability to recombine meaningful units with regular and predictable outcomes, and it's seen as key to humans' capacity for generalization in language. Recent work has studied systematic compositionality in modern seq2seq models using generalization to novel navigation instructions in a grounded environment as a probing tool, requiring models to quickly bootstrap the meaning of new words. We extend this framework here to settings where the model needs only to recombine well-trained functional words (such as "around" and "right") in novel contexts. Our findings confirm and strengthen the earlier ones: seq2seq models can be impressively good at generalizing to novel combinations of previously-seen input, but only when they receive extensive training on the specific pattern to be generalized (e.g., generalizing from many examples of "X around right" to "jump around right"), while failing when generalization requires novel application of compositional rules (e.g., inferring the meaning of "around right" from those of "right" and "around"). △ Less

Submitted 19 July, 2018; originally announced July 2018.

arXiv:1805.01070 [pdf, other]

What you can cram into a single vector: Probing sentence embeddings for linguistic properties

Authors: Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni

Abstract: Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks makes it however difficult to infer what kind of information is present in the repres… ▽ More Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks makes it however difficult to infer what kind of information is present in the representations. We introduce here 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods. △ Less

Submitted 8 July, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

Comments: ACL 2018

arXiv:1803.11138 [pdf, other]

Colorless green recurrent networks dream hierarchically

Authors: Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, Marco Baroni

Abstract: Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russia… ▽ More Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences where RNNs cannot rely on semantic or lexical cues ("The colorless green ideas I ate with the chair sleep furiously"), and, for Italian, we compare model performance to human intuitions. Our language-model-trained RNNs make reliable predictions about long-distance agreement, and do not lag much behind human performance. We thus bring support to the hypothesis that RNNs are not just shallow-pattern extractors, but they also acquire deeper grammatical competence. △ Less

Submitted 29 March, 2018; originally announced March 2018.

Comments: Accepted to NAACL 2018

arXiv:1802.06467 [pdf, other]

Memorize or generalize? Searching for a compositional RNN in a haystack

Authors: Adam Liška, Germán Kruszewski, Marco Baroni

Abstract: Neural networks are very powerful learning systems, but they do not readily generalize from one task to the other. This is partly due to the fact that they do not learn in a compositional way, that is, by discovering skills that are shared by different tasks, and recombining them to solve new problems. In this paper, we explore the compositional generalization capabilities of recurrent neural netw… ▽ More Neural networks are very powerful learning systems, but they do not readily generalize from one task to the other. This is partly due to the fact that they do not learn in a compositional way, that is, by discovering skills that are shared by different tasks, and recombining them to solve new problems. In this paper, we explore the compositional generalization capabilities of recurrent neural networks (RNNs). We first propose the lookup table composition domain as a simple setup to test compositional behaviour and show that it is theoretically possible for a standard RNN to learn to behave compositionally in this domain when trained with standard gradient descent and provided with additional supervision. We then remove this additional supervision and perform a search over a large number of model initializations to investigate the proportion of RNNs that can still converge to a compositional solution. We discover that a small but non-negligible proportion of RNNs do reach partial compositional solutions even without special architectural constraints. This suggests that a combination of gradient descent and evolutionary strategies directly favouring the minority models that developed more compositional approaches might suffice to lead standard RNNs towards compositional solutions. △ Less

Submitted 25 July, 2018; v1 submitted 18 February, 2018; originally announced February 2018.

Comments: AEGAP Workshop (ICML 2018)

arXiv:1711.00350 [pdf, other]

Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks

Authors: Brenden M. Lake, Marco Baroni

Abstract: Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax." In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We th… ▽ More Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax." In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can make successful zero-shot generalizations when the differences between training and test commands are small, so that they can apply "mix-and-match" strategies to solve the task. However, when generalization requires systematic compositional skills (as in the "dax" example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, suggesting that lack of systematicity might be partially responsible for neural networks' notorious training data thirst. △ Less

Submitted 6 June, 2018; v1 submitted 30 October, 2017; originally announced November 2017.

Comments: Published at the 35th International Conference on Machine Learning (ICML 2018)

Journal ref: Lake, B. M. and Baroni, M. (2018). Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. International Conference on Machine Learning (ICML)

arXiv:1707.06556 [pdf, other]

High-risk learning: acquiring new word vectors from tiny data

Authors: Aurelie Herbelot, Marco Baroni

Abstract: Distributional semantics models are known to struggle with small data. It is generally accepted that in order to learn 'a good vector' for a word, a model must have sufficient examples of its usage. This contradicts the fact that humans can guess the meaning of a word from a few occurrences only. In this paper, we show that a neural language model such as Word2Vec only necessitates minor modificat… ▽ More Distributional semantics models are known to struggle with small data. It is generally accepted that in order to learn 'a good vector' for a word, a model must have sufficient examples of its usage. This contradicts the fact that humans can guess the meaning of a word from a few occurrences only. In this paper, we show that a neural language model such as Word2Vec only necessitates minor modifications to its standard architecture to learn new terms from tiny data, using background knowledge from a previously learnt semantic space. We test our model on word definitions and on a nonce task involving 2-6 sentences' worth of context, showing a large increase in performance over state-of-the-art models on the definitional task. △ Less

Submitted 20 July, 2017; originally announced July 2017.

Comments: Accepted as short paper at EMNLP 2017

arXiv:1702.07306 [pdf, other]

Causal Discovery Using Proxy Variables

Authors: Mateo Rojas-Carulla, Marco Baroni, David Lopez-Paz

Abstract: Discovering causal relations is fundamental to reasoning and intelligence. In particular, observational causal discovery algorithms estimate the cause-effect relation between two random entities $X$ and $Y$, given $n$ samples from $P(X,Y)$. In this paper, we develop a framework to estimate the cause-effect relation between two static entities $x$ and $y$: for instance, an art masterpiece $x$ and… ▽ More Discovering causal relations is fundamental to reasoning and intelligence. In particular, observational causal discovery algorithms estimate the cause-effect relation between two random entities $X$ and $Y$, given $n$ samples from $P(X,Y)$. In this paper, we develop a framework to estimate the cause-effect relation between two static entities $x$ and $y$: for instance, an art masterpiece $x$ and its fraudulent copy $y$. To this end, we introduce the notion of proxy variables, which allow the construction of a pair of random entities $(A,B)$ from the pair of static entities $(x,y)$. Then, estimating the cause-effect relation between $A$ and $B$ using an observational causal discovery algorithm leads to an estimation of the cause-effect relation between $x$ and $y$. For example, our framework detects the causal relation between unprocessed photographs and their modifications, and orders in time a set of shuffled frames from a video. As our main case study, we introduce a human-elicited dataset of 10,000 pairs of casually-linked pairs of words from natural language. Our methods discover 75% of these causal relations. Finally, we discuss the role of proxy variables in machine learning, as a general tool to incorporate static knowledge into prediction tasks. △ Less

Submitted 23 February, 2017; originally announced February 2017.

arXiv:1702.01815 [pdf, other]

Living a discrete life in a continuous world: Reference with distributed representations

Authors: Gemma Boleda, Sebastian Padó, Nghia The Pham, Marco Baroni

Abstract: Reference is a crucial property of language that allows us to connect linguistic expressions to the world. Modeling it requires handling both continuous and discrete aspects of meaning. Data-driven models excel at the former, but struggle with the latter, and the reverse is true for symbolic models. This paper (a) introduces a concrete referential task to test both aspects, called cross-modal en… ▽ More Reference is a crucial property of language that allows us to connect linguistic expressions to the world. Modeling it requires handling both continuous and discrete aspects of meaning. Data-driven models excel at the former, but struggle with the latter, and the reverse is true for symbolic models. This paper (a) introduces a concrete referential task to test both aspects, called cross-modal entity tracking; (b) proposes a neural network architecture that uses external memory to build an entity library inspired in the DRSs of DRT, with a mechanism to dynamically introduce new referents or add information to referents that are already in the library. Our model shows promise: it beats traditional neural network architectures on the task. However, it is still outperformed by Memory Networks, another model with external memory. △ Less

Submitted 4 September, 2017; v1 submitted 6 February, 2017; originally announced February 2017.

Comments: Accepted at IWCS 2017. Final version, 9 pages

arXiv:1701.08954 [pdf, ps, other]

CommAI: Evaluating the first steps towards a useful general AI

Authors: Marco Baroni, Armand Joulin, Allan Jabri, Germàn Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov

Abstract: With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal. However, most current research focuses instead on important but narrow applications, such as image classification or machine translation. We believe this to be largely due to the lack of objective ways to measure progress towards broad machine intelligence. In or… ▽ More With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal. However, most current research focuses instead on important but narrow applications, such as image classification or machine translation. We believe this to be largely due to the lack of objective ways to measure progress towards broad machine intelligence. In order to fill this gap, we propose here a set of concrete desiderata for general AI, together with a platform to test machines on how well they satisfy such desiderata, while kee** all further complexities to a minimum. △ Less

Submitted 27 March, 2017; v1 submitted 31 January, 2017; originally announced January 2017.

Comments: Published in ICLR 2017 Workshop Track

arXiv:1612.07182 [pdf, other]

Multi-Agent Cooperation and the Emergence of (Natural) Language

Authors: Angeliki Lazaridou, Alexander Peysakhovich, Marco Baroni

Abstract: The current mainstream approach to train natural language systems is to expose them to large amounts of text. This passive learning is problematic if we are interested in develo** interactive machines, such as conversational agents. We propose a framework for language learning that relies on multi-agent communication. We study this learning in the context of referential games. In these games, a… ▽ More The current mainstream approach to train natural language systems is to expose them to large amounts of text. This passive learning is problematic if we are interested in develo** interactive machines, such as conversational agents. We propose a framework for language learning that relies on multi-agent communication. We study this learning in the context of referential games. In these games, a sender and a receiver see a pair of images. The sender is told one of them is the target and is allowed to send a message from a fixed, arbitrary vocabulary to the receiver. The receiver must rely on this message to identify the target. Thus, the agents develop their own language interactively out of the need to communicate. We show that two networks with simple configurations are able to learn to coordinate in the referential game. We further explore how to make changes to the game environment to cause the "word meanings" induced in the game to better reflect intuitive semantic properties of the images. In addition, we present a simple strategy for grounding the agents' code into natural language. Both of these are necessary steps towards develo** machines that are able to communicate with humans productively. △ Less

Submitted 5 March, 2017; v1 submitted 21 December, 2016; originally announced December 2016.

Comments: Accepted at ICLR 2017

arXiv:1606.08777 [pdf, other]

doi 10.1007/978-3-319-77113-7_17

"Show me the cup": Reference with Continuous Representations

Authors: Gemma Boleda, Sebastian Padó, Marco Baroni

Abstract: One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended… ▽ More One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended object if the expression has a unique referent, or indicates a failure, if it does not. The model, directly trained on reference acts, is competitive with a pipeline manually engineered to perform the same task, both when referents are purely visual, and when they are characterized by a combination of visual and linguistic properties. △ Less

Submitted 28 June, 2016; originally announced June 2016.

Journal ref: In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham

arXiv:1606.06031 [pdf, other]

The LAMBADA dataset: Word prediction requiring a broad discourse context

Authors: Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, Raquel Fernández

Abstract: We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAM… ▽ More We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAMBADA, computational models cannot simply rely on local context, but must be able to keep track of information in the broader discourse. We show that LAMBADA exemplifies a wide range of linguistic phenomena, and that none of several state-of-the-art language models reaches accuracy above 1% on this novel benchmark. We thus propose LAMBADA as a challenging test set, meant to encourage the development of new models capable of genuine understanding of broad context in natural language text. △ Less

Submitted 20 June, 2016; originally announced June 2016.

Comments: 10 pages, Accepted as a long paper for ACL 2016

arXiv:1605.07133 [pdf, other]

Towards Multi-Agent Communication-Based Language Learning

Authors: Angeliki Lazaridou, Nghia The Pham, Marco Baroni

Abstract: We propose an interactive multimodal framework for language learning. Instead of being passively exposed to large amounts of natural text, our learners (implemented as feed-forward neural networks) engage in cooperative referential games starting from a tabula rasa setup, and thus develop their own language from the need to communicate in order to succeed at the game. Preliminary experiments provi… ▽ More We propose an interactive multimodal framework for language learning. Instead of being passively exposed to large amounts of natural text, our learners (implemented as feed-forward neural networks) engage in cooperative referential games starting from a tabula rasa setup, and thus develop their own language from the need to communicate in order to succeed at the game. Preliminary experiments provide promising results, but also suggest that it is important to ensure that agents trained in this way do not develop an adhoc communication code only effective for the game they are playing △ Less

Submitted 23 May, 2016; originally announced May 2016.

Comments: 9 pages, manuscript under submission

arXiv:1603.02618 [pdf, other]

The red one!: On learning to refer to things based on their discriminative properties

Authors: Angeliki Lazaridou, Nghia The Pham, Marco Baroni

Abstract: As a first step towards agents learning to communicate about their visual environment, we propose a system that, given visual representations of a referent (cat) and a context (sofa), identifies their discriminative attributes, i.e., properties that distinguish them (has_tail). Moreover, despite the lack of direct supervision at the attribute level, the model learns to assign plausible attributes… ▽ More As a first step towards agents learning to communicate about their visual environment, we propose a system that, given visual representations of a referent (cat) and a context (sofa), identifies their discriminative attributes, i.e., properties that distinguish them (has_tail). Moreover, despite the lack of direct supervision at the attribute level, the model learns to assign plausible attributes to objects (sofa-has_cushion). Finally, we present a preliminary experiment confirming the referential success of the predicted discriminative attributes. △ Less

Submitted 23 May, 2016; v1 submitted 8 March, 2016; originally announced March 2016.

Comments: Accepted as an ACL-short sumbmission

arXiv:1511.08130 [pdf, other]

A Roadmap towards Machine Intelligence

Authors: Tomas Mikolov, Armand Joulin, Marco Baroni

Abstract: The development of intelligent machines is one of the biggest unsolved challenges in computer science. In this paper, we propose some fundamental properties these machines should have, focusing in particular on communication and learning. We discuss a simple environment that could be used to incrementally teach a machine the basics of natural-language-based communication, as a prerequisite to more… ▽ More The development of intelligent machines is one of the biggest unsolved challenges in computer science. In this paper, we propose some fundamental properties these machines should have, focusing in particular on communication and learning. We discuss a simple environment that could be used to incrementally teach a machine the basics of natural-language-based communication, as a prerequisite to more complex interaction with human users. We also present some conjectures on the sort of algorithms the machine should support in order to profitably learn from the environment. △ Less

Submitted 26 February, 2016; v1 submitted 25 November, 2015; originally announced November 2015.

arXiv:1506.03500 [pdf, other]

Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation

Authors: Angeliki Lazaridou, Dat Tien Nguyen, Raffaella Bernardi, Marco Baroni

Abstract: We introduce language-driven image generation, the task of generating an image visualizing the semantic contents of a word embedding, e.g., given the word embedding of grasshopper, we generate a natural image of a grasshopper. We implement a simple method based on two map** functions. The first takes as input a word embedding (as produced, e.g., by the word2vec toolkit) and maps it onto a high-l… ▽ More We introduce language-driven image generation, the task of generating an image visualizing the semantic contents of a word embedding, e.g., given the word embedding of grasshopper, we generate a natural image of a grasshopper. We implement a simple method based on two map** functions. The first takes as input a word embedding (as produced, e.g., by the word2vec toolkit) and maps it onto a high-level visual space (e.g., the space defined by one of the top layers of a Convolutional Neural Network). The second function maps this abstract visual representation to pixel space, in order to generate the target image. Several user studies suggest that the current system produces images that capture general visual properties of the concepts encoded in the word embedding, such as color or typical environment, and are sufficient to discriminate between general categories of objects. △ Less

Submitted 23 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

Comments: A 6-page version to appear at the Multimodal Machine Learning NIPS 2015 Workshop

arXiv:1501.02714 [pdf, other]

From Visual Attributes to Adjectives through Decompositional Distributional Semantics

Authors: Angeliki Lazaridou, Georgiana Dinu, Adam Liska, Marco Baroni

Abstract: As automated image analysis progresses, there is increasing interest in richer linguistic annotation of pictures, with attributes of objects (e.g., furry, brown...) attracting most attention. By building on the recent "zero-shot learning" approach, and paying attention to the linguistic nature of attributes as noun modifiers, and specifically adjectives, we show that it is possible to tag images w… ▽ More As automated image analysis progresses, there is increasing interest in richer linguistic annotation of pictures, with attributes of objects (e.g., furry, brown...) attracting most attention. By building on the recent "zero-shot learning" approach, and paying attention to the linguistic nature of attributes as noun modifiers, and specifically adjectives, we show that it is possible to tag images with attribute-denoting adjectives even when no training data containing the relevant annotation are available. Our approach relies on two key observations. First, objects can be seen as bundles of attributes, typically expressed as adjectival modifiers (a dog is something furry, brown, etc.), and thus a function trained to map visual representations of objects to nominal labels can implicitly learn to map attributes to adjectives. Second, objects and attributes come together in pictures (the same thing is a dog and it is brown). We can thus achieve better attribute (and object) label retrieval by treating images as "visual phrases", and decomposing their linguistic representation into an attribute-denoting adjective and an object-denoting noun. Our approach performs comparably to a method exploiting manual attribute annotation, it outperforms various competitive alternatives in both attribute and object annotation, and it automatically constructs attribute-centric representations that significantly improve performance in supervised object recognition. △ Less

Submitted 24 March, 2015; v1 submitted 12 January, 2015; originally announced January 2015.

Comments: accepted at Transactions of the Association for Computational Linguistics (TACL), 3/2015

Showing 1–50 of 55 results for author: Baroni, M