Search | arXiv e-print repository

From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks

Authors: Jacob Russin, Sam Whitman McGrath, Danielle J. Williams, Lotem Elber-Dorozko

Abstract: Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to di… ▽ More Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to dismiss them as viable models of human cognition. Over the last decade, however, modern deep neural networks (DNNs), which share the same fundamental design principles as their predecessors, have come to dominate artificial intelligence, exhibiting the most advanced cognitive behaviors ever demonstrated in machines. In particular, large language models (LLMs), DNNs trained to predict the next word on a large corpus of text, have proven capable of sophisticated behaviors such as writing syntactically complex sentences without grammatical errors, producing cogent chains of reasoning, and even writing original computer programs -- all behaviors thought to require compositional processing. In this chapter, we survey recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience, situating recent breakthroughs within the broader context of philosophical arguments about compositionality. In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities: (1) architectural inductive biases, and (2) metalearning, or learning to learn. We also present findings suggesting that LLM pretraining can be understood as a kind of metalearning, and can thereby equip DNNs with compositional generalization abilities in a similar way. We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition and by suggesting avenues for future research. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: 32 pages (50 pages including references), 8 figures

arXiv:2405.13231 [pdf]

Multiple Realizability and the Rise of Deep Learning

Authors: Sam Whitman McGrath, Jacob Russin

Abstract: The multiple realizability thesis holds that psychological states may be implemented in a diversity of physical systems. The deep learning revolution seems to be bringing this possibility to life, offering the most plausible examples of man-made realizations of sophisticated cognitive functions to date. This paper explores the implications of deep learning models for the multiple realizability the… ▽ More The multiple realizability thesis holds that psychological states may be implemented in a diversity of physical systems. The deep learning revolution seems to be bringing this possibility to life, offering the most plausible examples of man-made realizations of sophisticated cognitive functions to date. This paper explores the implications of deep learning models for the multiple realizability thesis. Among other things, it challenges the widely held view that multiple realizability entails that the study of the mind can and must be pursued independently of the study of its implementation in the brain or in artificial analogues. Although its central contribution is philosophical, the paper has substantial methodological upshots for contemporary cognitive science, suggesting that deep neural networks may play a crucial role in formulating and evaluating hypotheses about cognition, even if they are interpreted as implementation-level models. In the age of deep learning, multiple realizability possesses a renewed significance. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2402.08674 [pdf, other]

Human Curriculum Effects Emerge with In-Context Learning in Neural Networks

Authors: Jacob Russin, Ellie Pavlick, Michael J. Frank

Abstract: Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tra… ▽ More Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tradeoff spontaneously emerges with ``in-context learning'' (ICL) both in neural networks trained with metalearning and in large language models (LLMs). ICL is the ability to learn new tasks ``in context'' -- without weight changes -- via an inner-loop algorithm implemented in activation dynamics. Experiments with pretrained LLMs and metalearning transformers show that ICL exhibits the blocking advantage demonstrated in humans on a task involving rule-like structure, and conversely, that concurrent in-weight learning reproduces the interleaving advantage observed in humans on tasks lacking such structure. △ Less

Submitted 12 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Comments: 7 pages, 4 figures, accepted as a talk + full paper at CogSci 2024

arXiv:2401.16592 [pdf]

A compact and cost-effective laser-powered speckle visibility spectroscopy (SVS) device for measuring cerebral blood flow

Authors: Yu Xi Huang, Simon Mahler, Maya Dickson, Aidin Abedi, Julian M. Tyszka, Jack Lo Yu Tung, Jonathan Russin, Charles Liu, Changhuei Yang

Abstract: In the realm of cerebrovascular monitoring, primary metrics typically include blood pressure, which influences cerebral blood flow (CBF) and is contingent upon vessel radius. Measuring CBF non-invasively poses a persistent challenge, primarily attributed to the difficulty of accessing and obtaining signal from the brain. This study aims to introduce a compact speckle visibility spectroscopy (SVS)… ▽ More In the realm of cerebrovascular monitoring, primary metrics typically include blood pressure, which influences cerebral blood flow (CBF) and is contingent upon vessel radius. Measuring CBF non-invasively poses a persistent challenge, primarily attributed to the difficulty of accessing and obtaining signal from the brain. This study aims to introduce a compact speckle visibility spectroscopy (SVS) device designed for non-invasive CBF measurements, offering cost-effectiveness and scalability while tracking CBF with remarkable sensitivity and temporal resolution. The wearable hardware has a modular design approach consisting solely of a laser diode as the source and a meticulously selected board camera as the detector. They both can be easily placed on the head of a subject to measure CBF with no additional optical elements. The SVS device can achieve a sampling rate of 80 Hz with minimal susceptibility to external disturbances. The device also achieves better SNR compared with traditional fiber-based SVS devices, capturing about 70 times more signal and showing superior stability and reproducibility. It is designed to be paired and distributed in multiple configurations around the head, and measure signals that exceed the quality of prior optical CBF measurement techniques. Given its cost-effectiveness, scalability, and simplicity, this laser-centric tool offers significant potential in advancing non-invasive cerebral monitoring technologies. △ Less

Submitted 8 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2309.06629 [pdf, other]

The Relational Bottleneck as an Inductive Bias for Efficient Abstraction

Authors: Taylor W. Webb, Steven M. Frankland, Awni Altabaa, Simon Segert, Kamesh Krishnamurthy, Declan Campbell, Jacob Russin, Tyler Giallanza, Zack Dulberg, Randall O'Reilly, John Lafferty, Jonathan D. Cohen

Abstract: A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck… ▽ More A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. In that approach, neural networks are constrained via their architecture to focus on relations between perceptual inputs, rather than the attributes of individual inputs. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain. △ Less

Submitted 1 May, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2206.00248 [pdf]

Transcranial photoacoustic computed tomography of human brain function

Authors: Yang Zhang, Shuai Na, Karteekeya Sastry, Jonathan J. Russin, Peng Hu, Li Lin, Xin Tong, Kay B. Jann, Danny J. Wang, Charles Y. Liu, Lihong V. Wang

Abstract: Herein we report the first in-human transcranial imaging of brain function using photoacoustic computed tomography. Functional responses to benchmark motor tasks were imaged on both the skull-less and the skull-intact hemispheres of a hemicraniectomy patient. The observed brain responses in these preliminary results demonstrate the potential of photoacoustic computed tomography for achieving trans… ▽ More Herein we report the first in-human transcranial imaging of brain function using photoacoustic computed tomography. Functional responses to benchmark motor tasks were imaged on both the skull-less and the skull-intact hemispheres of a hemicraniectomy patient. The observed brain responses in these preliminary results demonstrate the potential of photoacoustic computed tomography for achieving transcranial functional imaging. △ Less

Submitted 1 June, 2022; originally announced June 2022.

arXiv:2202.04773 [pdf, other]

A Neural Network Model of Continual Learning with Cognitive Control

Authors: Jacob Russin, Maryam Zolfaghar, Seongmin A. Park, Erie Boorman, Randall C. O'Reilly

Abstract: Neural networks struggle in continual learning settings from catastrophic forgetting: when trials are blocked, new learning can overwrite the learning from previous blocks. Humans learn effectively in these settings, in some cases even showing an advantage of blocking, suggesting the brain contains mechanisms to overcome this problem. Here, we build on previous work and show that neural networks e… ▽ More Neural networks struggle in continual learning settings from catastrophic forgetting: when trials are blocked, new learning can overwrite the learning from previous blocks. Humans learn effectively in these settings, in some cases even showing an advantage of blocking, suggesting the brain contains mechanisms to overcome this problem. Here, we build on previous work and show that neural networks equipped with a mechanism for cognitive control do not exhibit catastrophic forgetting when trials are blocked. We further show an advantage of blocking over interleaving when there is a bias for active maintenance in the control signal, implying a tradeoff between maintenance and the strength of control. Analyses of map-like representations learned by the networks provided additional insights into these mechanisms. Our work highlights the potential of cognitive control to aid continual learning in neural networks, and offers an explanation for the advantage of blocking that has been observed in humans. △ Less

Submitted 3 November, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: 7 pages, 5 figures, paper accepted as a talk to CogSci 2022 (https://escholarship.org/uc/item/3gn3w58z)

Journal ref: CogSci 2022, 44

arXiv:2108.03387 [pdf, other]

The Structure of Systematicity in the Brain

Authors: Randall C. O'Reilly, Charan Ranganath, Jacob L. Russin

Abstract: A hallmark of human intelligence is the ability to adapt to new situations, by applying learned rules to new content (systematicity) and thereby enabling an open-ended number of inferences and actions (generativity). Here, we propose that the human brain accomplishes these feats through pathways in the parietal cortex that encode the abstract structure of space, events, and tasks, and pathways in… ▽ More A hallmark of human intelligence is the ability to adapt to new situations, by applying learned rules to new content (systematicity) and thereby enabling an open-ended number of inferences and actions (generativity). Here, we propose that the human brain accomplishes these feats through pathways in the parietal cortex that encode the abstract structure of space, events, and tasks, and pathways in the temporal cortex that encode information about specific people, places, and things (content). Recent neural network models show how the separation of structure and content might emerge through a combination of architectural biases and learning, and these networks show dramatic improvements in the ability to capture systematic, generative behavior. We close by considering how the hippocampal formation may form integrative memories that enable rapid learning of new structure and content representations. △ Less

Submitted 7 August, 2021; originally announced August 2021.

Comments: 10 pages, 2 figures, Submitted to Current Directions in Psychological Science

arXiv:2105.08961 [pdf, other]

Compositional Processing Emerges in Neural Networks Solving Math Problems

Authors: Jacob Russin, Roland Fernandez, Hamid Palangi, Eric Rosen, Nebojsa Jojic, Paul Smolensky, Jianfeng Gao

Abstract: A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer the structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has… ▽ More A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer the structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings (e.g., the quantities corresponding to numerals) should be composed according to structured rules (e.g., order of operations). Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes. △ Less

Submitted 19 May, 2021; originally announced May 2021.

Comments: 7 pages, 2 figures, Accepted to CogSci 2021 for poster presentation

arXiv:2105.08944 [pdf, other]

Complementary Structure-Learning Neural Networks for Relational Reasoning

Authors: Jacob Russin, Maryam Zolfaghar, Seongmin A. Park, Erie Boorman, Randall C. O'Reilly

Abstract: The neural mechanisms supporting flexible relational inferences, especially in novel situations, are a major focus of current research. In the complementary learning systems framework, pattern separation in the hippocampus allows rapid learning in novel environments, while slower learning in neocortex accumulates small weight changes to extract systematic structure from well-learned environments.… ▽ More The neural mechanisms supporting flexible relational inferences, especially in novel situations, are a major focus of current research. In the complementary learning systems framework, pattern separation in the hippocampus allows rapid learning in novel environments, while slower learning in neocortex accumulates small weight changes to extract systematic structure from well-learned environments. In this work, we adapt this framework to a task from a recent fMRI experiment where novel transitive inferences must be made according to implicit relational structure. We show that computational models capturing the basic cognitive properties of these two systems can explain relational transitive inferences in both familiar and novel environments, and reproduce key phenomena observed in the fMRI experiment. △ Less

Submitted 19 May, 2021; originally announced May 2021.

Comments: 7 pages, 4 figures, Accepted to CogSci 2021 for poster presentation

arXiv:2006.14800 [pdf, other]

Deep Predictive Learning in Neocortex and Pulvinar

Authors: Randall C. O'Reilly, Jacob L. Russin, Maryam Zolfaghar, John Rohrlich

Abstract: How do humans learn from raw sensory experience? Throughout life, but most obviously in infancy, we learn without explicit instruction. We propose a detailed biological mechanism for the widely-embraced idea that learning is based on the differences between predictions and actual outcomes (i.e., predictive error-driven learning). Specifically, numerous weak projections into the pulvinar nucleus of… ▽ More How do humans learn from raw sensory experience? Throughout life, but most obviously in infancy, we learn without explicit instruction. We propose a detailed biological mechanism for the widely-embraced idea that learning is based on the differences between predictions and actual outcomes (i.e., predictive error-driven learning). Specifically, numerous weak projections into the pulvinar nucleus of the thalamus generate top-down predictions, and sparse, focal driver inputs from lower areas supply the actual outcome, originating in layer 5 intrinsic bursting (5IB) neurons. Thus, the outcome is only briefly activated, roughly every 100 msec (i.e., 10 Hz, alpha), resulting in a temporal difference error signal, which drives local synaptic changes throughout the neocortex, resulting in a biologically-plausible form of error backpropagation learning. We implemented these mechanisms in a large-scale model of the visual system, and found that the simulated inferotemporal (IT) pathway learns to systematically categorize 3D objects according to invariant shape properties, based solely on predictive learning from raw visual inputs. These categories match human judgments on the same stimuli, and are consistent with neural representations in IT cortex in primates. △ Less

Submitted 28 January, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: 56 pages, 22 figures

arXiv:1904.09708 [pdf, other]

Compositional generalization in a deep seq2seq model by separating syntax and semantics

Authors: Jake Russin, Jason Jo, Randall C. O'Reilly, Yoshua Bengio

Abstract: Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntacti… ▽ More Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntactic and semantic processing, we implement a modification to standard approaches in neural machine translation, imposing an analogous separation. The novel model, which we call Syntactic Attention, substantially outperforms standard methods in deep learning on the SCAN dataset, a compositional generalization task, without any hand-engineered features or additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure. △ Less

Submitted 23 May, 2019; v1 submitted 21 April, 2019; originally announced April 2019.

Comments: 18 pages, 15 figures, preprint version of submission to NeurIPS 2019, under review

Showing 1–12 of 12 results for author: Russin, J