-
Large Language Models estimate fine-grained human color-concept associations
Authors:
Kushin Mukherjee,
Timothy T. Rogers,
Karen B. Schloss
Abstract:
Concepts, both abstract and concrete, elicit a distribution of association strengths across perceptual color space that influences aspects of visual cognition ranging from object recognition to interpretation of information visualizations. While prior work has hypothesized that color-concept associations may be learned from the cross-modal statistical structure of experience, it has been unclear whether natural environments possess such structure or, if so, whether learning systems are capable of discovering and exploiting it without strong prior constraints. We addressed these questions by investigating the ability of GPT-4, a multimodal large language model, to estimate human-like color-concept associations without any additional training. Starting with human color-concept association ratings for a set of 71 colors spanning perceptual color space (UW-71) and concepts that varied in abstractness, we assessed how well association ratings generated by GPT-4 could predict human ratings. GPT-4 ratings were correlated with human ratings, with performance comparable to state-of-the-art methods for automatically estimating color-concept associations from images. Variability in GPT-4's performance across concepts could be explained by the specificity of the concept's color-concept association distribution. This study suggests that high-order covariances between language and perception, as expressed in the natural environment of the internet, contain sufficient information to support learning of human-like color-concept associations, and provides an existence proof that a learning system can encode such associations without initial constraints. The work further shows that GPT-4 can be used to efficiently estimate distributions of color associations for a broad range of concepts, potentially serving as a critical tool for designing effective and intuitive information visualizations.
Submitted 4 May, 2024;
originally announced June 2024.
-
Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks
Authors:
Yun-Shiuan Chuang,
Zach Studdiford,
Krirk Nirunwiroj,
Agam Goyal,
Vincent V. Frigo,
Sijia Yang,
Dhavan Shah,
Junjie Hu,
Timothy T. Rogers
Abstract:
Creating human-like large language model (LLM) agents is crucial for faithful social simulation. Having LLMs role-play based on demographic information sometimes improves human likeness but often does not. This study assessed whether LLM alignment with human behavior can be improved by integrating information from empirically-derived human belief networks. Using data from a human survey, we estimated a belief network encompassing 18 topics loading on two non-overlapping latent factors. We then seeded LLM-based agents with an opinion on one topic, and assessed the alignment of their expressed opinions on the remaining test topics with corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions, but seeding the agent with a single belief greatly improved alignment for topics related in the belief network, and not for topics outside the network. These results suggest a novel path for human-LLM belief alignment in work seeking to simulate and understand patterns of belief distributions in society.
Submitted 24 June, 2024;
originally announced June 2024.
-
Semantic distance organizes social knowledge: Insights from semantic dementia and cross-modal conceptual space
Authors:
Y. Ivette Colón,
Matthew Rouse,
Matthew A. Lambon Ralph,
Timothy T. Rogers
Abstract:
Our interaction with others largely hinges on how we semantically organize the social world. The organization of such conceptual information is not static -- as we age, our experiences and ever-changing anatomy alter how we represent and arrange semantic information. How does semantic distance between concepts affect this organization, particularly for those with pathological deficits in semantic knowledge? Using triplet judgment responses collected from healthy participants, we compute an ordinal similarity embedding for a set of social words and images that vary in the dimensions of age and gender. We compare semantic distances between items in the space to patterns of error in a word-picture matching task performed by patients with semantic dementia (SD). Error patterns reveal that SD patients retain gender information more robustly than age information, and that age-related errors are a function of linear distance in age from a concept word. The distances between probed and exemplar items in the resulting conceptual map reflect error patterns in SD patient responses such that items semantically closer to a probed concept -- in gender category or in linear age -- are more likely to be erroneously chosen by patients in a word-picture matching task. To our knowledge, this is the first triplet embedding work to embed representations of words and images in a unified space, and to use this space to explain patterns of behavior in patients with impaired social semantic cognition.
Submitted 23 April, 2024;
originally announced April 2024.
-
The Delusional Hedge Algorithm as a Model of Human Learning from Diverse Opinions
Authors:
Yun-Shiuan Chuang,
Jerry Zhu,
Timothy T. Rogers
Abstract:
Whereas cognitive models of learning often assume direct experience with both the features of an event and with a true label or outcome, much of everyday learning arises from hearing the opinions of others, without direct access to either the experience or the ground truth outcome. We consider how people can learn which opinions to trust in such scenarios by extending the hedge algorithm: a classic solution for learning from diverse information sources. We first introduce a semi-supervised variant, which we call the delusional hedge, that can learn from both supervised and unsupervised experiences. In two experiments, we examine the alignment between human judgments and predictions from the standard hedge, the delusional hedge, and a heuristic baseline model. Results indicate that humans effectively incorporate both labeled and unlabeled information in a manner consistent with the delusional hedge algorithm -- suggesting that human learners gauge not only the accuracy of information sources but also their consistency with other reliable sources. The findings advance our understanding of human learning from diverse opinions, with implications for the development of algorithms that better capture how people learn to weigh conflicting information sources.
Submitted 21 February, 2024;
originally announced February 2024.
-
Learning interactions to boost human creativity with bandits and GPT-4
Authors:
Ara Vartanian,
Xiaoxi Sun,
Yun-Shiuan Chuang,
Siddharth Suresh,
Xiaojin Zhu,
Timothy T. Rogers
Abstract:
This paper considers how interactions with AI algorithms can boost human creative thought. We employ a psychological task that demonstrates limits on human creativity, namely semantic feature generation: given a concept name, respondents must list as many of its features as possible. Human participants typically produce only a fraction of the features they know before getting "stuck." In experiments with humans and with a language AI (GPT-4), we contrast behavior in the standard task versus a variant in which participants can ask for algorithmically-generated hints. Algorithm choice is administered by a multi-armed bandit whose reward indicates whether the hint helped generate more features. Humans and the AI show similar benefits from hints, and remarkably, bandits learning from AI responses prefer the same prompting strategy as those learning from human behavior. The results suggest that strategies for boosting human creativity via computer interactions can be learned by bandits run on groups of simulated participants.
Submitted 16 November, 2023;
originally announced November 2023.
-
The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents
Authors:
Yun-Shiuan Chuang,
Siddharth Suresh,
Nikunj Harlalka,
Agam Goyal,
Robert Hawkins,
Sijia Yang,
Dhavan Shah,
Junjie Hu,
Timothy T. Rogers
Abstract:
Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompting and a lack of detail in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence.
Submitted 16 February, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Evolving Domain Adaptation of Pretrained Language Models for Text Classification
Authors:
Yun-Shiuan Chuang,
Yi Wu,
Dhruv Gupta,
Rheeya Uppaal,
Ananya Kumar,
Luhang Sun,
Makesh Narsimhan Sreedhar,
Sijia Yang,
Timothy T. Rogers,
Junjie Hu
Abstract:
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method. Our analysis across various datasets reveals that this incremental method excels at adapting PLMs to EDS, outperforming traditional domain adaptation techniques. These findings highlight the importance of continually updating PLMs to ensure their effectiveness in real-world applications, paving the way for future research into PLM robustness against the natural temporal evolution of language.
Submitted 16 November, 2023;
originally announced November 2023.
-
Simulating Opinion Dynamics with Networks of LLM-based Agents
Authors:
Yun-Shiuan Chuang,
Agam Goyal,
Nikunj Harlalka,
Siddharth Suresh,
Robert Hawkins,
Sijia Yang,
Dhavan Shah,
Junjie Hu,
Timothy T. Rogers
Abstract:
Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality. This bias limits their utility for understanding resistance to consensus views on issues like climate change. After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research. These insights highlight the promise and limitations of LLM agents in this domain and suggest a path forward: refining LLMs with real-world discourse to better simulate the evolution of human beliefs.
Submitted 31 March, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Computational Agent-based Models in Opinion Dynamics: A Survey on Social Simulations and Empirical Studies
Authors:
Yun-Shiuan Chuang,
Timothy T. Rogers
Abstract:
Understanding how individuals change their attitudes, beliefs, and opinions under the social influence of other people is vital because of its wide implications. A core methodology used to study attitude change under social influence is the agent-based model (ABM). The goal of this review paper is to compare and contrast existing ABMs, which I classify into two families, the deductive ABMs and the inductive ABMs. The former subsumes social simulation studies, and the latter involves human experiments. To facilitate the comparison between ABMs of different formulations, I propose a general unified formulation, in which all ABMs can be viewed as special cases. In addition, I show the connections between deductive ABMs and inductive ABMs, and point out their strengths and limitations. At the end of the paper, I identify underexplored areas and suggest future research directions.
Submitted 6 June, 2023;
originally announced June 2023.
-
Semantic Feature Verification in FLAN-T5
Authors:
Siddharth Suresh,
Kushin Mukherjee,
Timothy T. Rogers
Abstract:
This study evaluates the potential of a large language model for aiding in the generation of semantic feature norms, a critical tool for evaluating conceptual structure in cognitive science. Building from an existing human-generated dataset, we show that machine-verified norms capture aspects of conceptual structure beyond what is expressed in human norms alone, and better explain human judgments of semantic similarity amongst items that are distally related. The results suggest that LLMs can greatly enhance traditional methods of semantic feature norm verification, with implications for our understanding of conceptual representation in humans and machines.
Submitted 11 April, 2023;
originally announced April 2023.
-
Human-machine cooperation for semantic feature listing
Authors:
Kushin Mukherjee,
Siddharth Suresh,
Timothy T. Rogers
Abstract:
Semantic feature norms, lists of features that concepts do and do not possess, have played a central role in characterizing human conceptual knowledge, but require extensive human labor. Large language models (LLMs) offer a novel avenue for the automatic generation of such feature lists, but are prone to significant error. Here, we present a new method for combining a learned model of human lexical-semantics from limited data with LLM-generated data to efficiently generate high-quality feature norms.
Submitted 11 April, 2023;
originally announced April 2023.
-
Conceptual structure coheres in human cognition but not in large language models
Authors:
Siddharth Suresh,
Kushin Mukherjee,
Xizheng Yu,
Wei-Chun Huang,
Lisa Padua,
Timothy T. Rogers
Abstract:
Neural network models of language have long been used as a tool for developing hypotheses about conceptual representation in the mind and brain. For many years, such use involved extracting vector-space representations of words and using distances among these to predict or understand human behavior in various semantic tasks. Contemporary large language models (LLMs), however, make it possible to interrogate the latent structure of conceptual representations using experimental methods nearly identical to those commonly used with human participants. The current work utilizes three common techniques borrowed from cognitive psychology to estimate and compare the structure of concepts in humans and a suite of LLMs. In humans, we show that conceptual structure is robust to differences in culture, language, and method of estimation. Structures estimated from LLM behavior, while individually fairly consistent with those estimated from human behavior, vary much more depending upon the particular task used to generate responses -- across tasks, estimates of conceptual structure from the very same model cohere less with one another than do human structure estimates. These results highlight an important difference between contemporary LLMs and human cognition, with implications for understanding some fundamental limitations of contemporary machine language.
Submitted 10 November, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.