-
Bidirectional Transformer Representations of (Spanish) Ambiguous Words in Context: A New Lexical Resource and Empirical Analysis
Authors:
Pamela D. Rivière,
Anne L. Beatty-Martínez,
Sean Trott
Abstract:
Lexical ambiguity -- where a single wordform takes on distinct, context-dependent meanings -- serves as a useful tool to compare across different large language models' (LLMs') ability to form distinct, contextualized representations of the same stimulus. Few studies have systematically compared LLMs' contextualized word embeddings for languages beyond English. Here, we evaluate multiple bidirecti…
▽ More
Lexical ambiguity -- where a single wordform takes on distinct, context-dependent meanings -- serves as a useful tool to compare across different large language models' (LLMs') ability to form distinct, contextualized representations of the same stimulus. Few studies have systematically compared LLMs' contextualized word embeddings for languages beyond English. Here, we evaluate multiple bidirectional transformers' (BERTs') semantic representations of Spanish ambiguous nouns in context. We develop a novel dataset of minimal-pair sentences evoking the same or different sense for a target ambiguous noun. In a pre-registered study, we collect contextualized human relatedness judgments for each sentence pair. We find that various BERT-based LLMs' contextualized semantic representations capture some variance in human judgments but fall short of the human benchmark, and for Spanish -- unlike English -- model scale is uncorrelated with performance. We also identify stereotyped trajectories of target noun disambiguation as a proportion of traversal through a given LLM family's architecture, which we partially replicate in English. We contribute (1) a dataset of controlled, Spanish sentence stimuli with human relatedness norms, and (2) to our evolving understanding of the impact that LLM specification (architectures, training protocols) exerts on contextualized embeddings.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Do language models capture implied discourse meanings? An investigation with exhaustivity implicatures of Korean morphology
Authors:
Hagyeong Shin,
Sean Trott
Abstract:
Markedness in natural language is often associated with non-literal meanings in discourse. Differential Object Marking (DOM) in Korean is one instance of this phenomenon, where post-positional markers are selected based on both the semantic features of the noun phrases and the discourse features that are orthogonal to the semantic features. Previous work has shown that distributional models of lan…
▽ More
Markedness in natural language is often associated with non-literal meanings in discourse. Differential Object Marking (DOM) in Korean is one instance of this phenomenon, where post-positional markers are selected based on both the semantic features of the noun phrases and the discourse features that are orthogonal to the semantic features. Previous work has shown that distributional models of language recover certain semantic features of words -- do these models capture implied discourse-level meanings as well? We evaluate whether a set of large language models are capable of associating discourse meanings with different object markings in Korean. Results suggest that discourse meanings of a grammatical marker can be more challenging to encode than that of a discourse marker.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement
Authors:
Catherine Arnett,
Pamela D. Rivière,
Tyler A. Chang,
Sean Trott
Abstract:
The relationship between language model tokenization and performance is an open area of research. Here, we investigate how different tokenization schemes impact number agreement in Spanish plurals. We find that morphologically-aligned tokenization performs similarly to other tokenization schemes, even when induced artificially for words that would not be tokenized that way during training. We then…
▽ More
The relationship between language model tokenization and performance is an open area of research. Here, we investigate how different tokenization schemes impact number agreement in Spanish plurals. We find that morphologically-aligned tokenization performs similarly to other tokenization schemes, even when induced artificially for words that would not be tokenized that way during training. We then present exploratory analyses demonstrating that language model embeddings for different plural tokenizations have similar distributions along the embedding space axis that maximally distinguishes singular and plural nouns. Our results suggest that morphologically-aligned tokenization is a viable tokenization approach, and existing models already generalize some morphological patterns to new items. However, our results indicate that morphological tokenization is not strictly required for performance.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Do Large Language Models know what humans know?
Authors:
Sean Trott,
Cameron Jones,
Tyler Chang,
James Michaelov,
Benjamin Bergen
Abstract:
Humans can attribute beliefs to others. However, it is unknown to what extent this ability results from an innate biological endowment or from experience accrued through child development, particularly exposure to language describing others' mental states. We test the viability of the language exposure hypothesis by assessing whether models exposed to large quantities of human language display sen…
▽ More
Humans can attribute beliefs to others. However, it is unknown to what extent this ability results from an innate biological endowment or from experience accrued through child development, particularly exposure to language describing others' mental states. We test the viability of the language exposure hypothesis by assessing whether models exposed to large quantities of human language display sensitivity to the implied knowledge states of characters in written passages. In pre-registered analyses, we present a linguistic version of the False Belief Task to both human participants and a Large Language Model, GPT-3. Both are sensitive to others' beliefs, but while the language model significantly exceeds chance behavior, it does not perform as well as the humans, nor does it explain the full extent of their behavior -- despite being exposed to more language than a human would in a lifetime. This suggests that while statistical learning from language exposure may in part explain how humans develop the ability to reason about the mental states of others, other mechanisms are also responsible.
△ Less
Submitted 31 May, 2023; v1 submitted 3 September, 2022;
originally announced September 2022.
-
Contextualized Sensorimotor Norms: multi-dimensional measures of sensorimotor strength for ambiguous English words, in context
Authors:
Sean Trott,
Benjamin Bergen
Abstract:
Most large language models are trained on linguistic input alone, yet humans appear to ground their understanding of words in sensorimotor experience. A natural solution is to augment LM representations with human judgments of a word's sensorimotor associations (e.g., the Lancaster Sensorimotor Norms), but this raises another challenge: most words are ambiguous, and judgments of words in isolation…
▽ More
Most large language models are trained on linguistic input alone, yet humans appear to ground their understanding of words in sensorimotor experience. A natural solution is to augment LM representations with human judgments of a word's sensorimotor associations (e.g., the Lancaster Sensorimotor Norms), but this raises another challenge: most words are ambiguous, and judgments of words in isolation fail to account for this multiplicity of meaning (e.g., "wooden table" vs. "data table"). We attempted to address this problem by building a new lexical resource of contextualized sensorimotor judgments for 112 English words, each rated in four different contexts (448 sentences total). We show that these ratings encode overlap** but distinct information from the Lancaster Sensorimotor Norms, and that they also predict other measures of interest (e.g., relatedness), above and beyond measures derived from BERT. Beyond shedding light on theoretical questions, we suggest that these ratings could be of use as a "challenge set" for researchers building grounded language models.
△ Less
Submitted 10 March, 2022;
originally announced March 2022.
-
RAW-C: Relatedness of Ambiguous Words--in Context (A New Lexical Resource for English)
Authors:
Sean Trott,
Benjamin Bergen
Abstract:
Most words are ambiguous--i.e., they convey distinct meanings in different contexts--and even the meanings of unambiguous words are context-dependent. Both phenomena present a challenge for NLP. Recently, the advent of contextualized word embeddings has led to success on tasks involving lexical ambiguity, such as Word Sense Disambiguation. However, there are few tasks that directly evaluate how we…
▽ More
Most words are ambiguous--i.e., they convey distinct meanings in different contexts--and even the meanings of unambiguous words are context-dependent. Both phenomena present a challenge for NLP. Recently, the advent of contextualized word embeddings has led to success on tasks involving lexical ambiguity, such as Word Sense Disambiguation. However, there are few tasks that directly evaluate how well these contextualized embeddings accommodate the more continuous, dynamic nature of word meaning--particularly in a way that matches human intuitions. We introduce RAW-C, a dataset of graded, human relatedness judgments for 112 ambiguous words in context (with 672 sentence pairs total), as well as human estimates of sense dominance. The average inter-annotator agreement (assessed using a leave-one-annotator-out method) was 0.79. We then show that a measure of cosine distance, computed using contextualized embeddings from BERT and ELMo, correlates with human judgments, but that cosine distance also systematically underestimates how similar humans find uses of the same sense of a word to be, and systematically overestimates how similar humans find uses of different-sense homonyms. Finally, we propose a synthesis between psycholinguistic theories of the mental lexicon and computational models of lexical semantics.
△ Less
Submitted 27 May, 2021;
originally announced May 2021.
-
(Re)construing Meaning in NLP
Authors:
Sean Trott,
Tiago Timponi Torrent,
Nancy Chang,
Nathan Schneider
Abstract:
Human speakers have an extensive toolkit of ways to express themselves. In this paper, we engage with an idea largely absent from discussions of meaning in natural language understanding--namely, that the way something is expressed reflects different ways of conceptualizing or construing the information being conveyed. We first define this phenomenon more precisely, drawing on considerable prior w…
▽ More
Human speakers have an extensive toolkit of ways to express themselves. In this paper, we engage with an idea largely absent from discussions of meaning in natural language understanding--namely, that the way something is expressed reflects different ways of conceptualizing or construing the information being conveyed. We first define this phenomenon more precisely, drawing on considerable prior work in theoretical cognitive semantics and psycholinguistics. We then survey some dimensions of construed meaning and show how insights from construal could inform theoretical and practical work in NLP.
△ Less
Submitted 18 May, 2020;
originally announced May 2020.
-
Processing Natural Language About Ongoing Actions
Authors:
Steve Doubleday,
Sean Trott,
Jerome Feldman
Abstract:
Actions may not proceed as planned; they may be interrupted, resumed or overridden. This is a challenge to handle in a natural language understanding system. We describe extensions to an existing implementation for the control of autonomous systems by natural language, to enable such systems to handle incoming language requests regarding actions. Language Communication with Autonomous Systems (LCA…
▽ More
Actions may not proceed as planned; they may be interrupted, resumed or overridden. This is a challenge to handle in a natural language understanding system. We describe extensions to an existing implementation for the control of autonomous systems by natural language, to enable such systems to handle incoming language requests regarding actions. Language Communication with Autonomous Systems (LCAS) has been extended with support for X-nets, parameterized executable schemas representing actions. X-nets enable the system to control actions at a desired level of granularity, while providing a mechanism for language requests to be processed asynchronously. Standard semantics supported include requests to stop, continue, or override the existing action. The specific domain demonstrated is the control of motion of a simulated robot, but the approach is general, and could be applied to other domains.
△ Less
Submitted 30 July, 2016; v1 submitted 22 July, 2016;
originally announced July 2016.
-
Exploiting Deep Semantics and Compositionality of Natural Language for Human-Robot-Interaction
Authors:
Manfred Eppe,
Sean Trott,
Jerome Feldman
Abstract:
We develop a natural language interface for human robot interaction that implements reasoning about deep semantics in natural language. To realize the required deep analysis, we employ methods from cognitive linguistics, namely the modular and compositional framework of Embodied Construction Grammar (ECG) [Feldman, 2009]. Using ECG, robots are able to solve fine-grained reference resolution proble…
▽ More
We develop a natural language interface for human robot interaction that implements reasoning about deep semantics in natural language. To realize the required deep analysis, we employ methods from cognitive linguistics, namely the modular and compositional framework of Embodied Construction Grammar (ECG) [Feldman, 2009]. Using ECG, robots are able to solve fine-grained reference resolution problems and other issues related to deep semantics and compositionality of natural language. This also includes verbal interaction with humans to clarify commands and queries that are too ambiguous to be executed safely. We implement our NLU framework as a ROS package and present proof-of-concept scenarios with different robots, as well as a survey on the state of the art.
△ Less
Submitted 22 April, 2016;
originally announced April 2016.