Large language models surpass human experts in predicting neuroscience results

Authors: Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata, et al. (14 additional authors not shown)

Abstract: Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.

Submitted 21 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2312.00396 [pdf, other]

GFN-SR: Symbolic Regression with Generative Flow Networks

Authors: Sida Li, Ioana Marinescu, Sebastian Musslick

Abstract: Symbolic regression (SR) is an area of interpretable machine learning that aims to identify mathematical expressions, often composed of simple functions, that best fit a given set of covariates $X$ and response $y$. In recent years, deep symbolic regression (DSR) has emerged as a popular method in the field by leveraging deep reinforcement learning to solve the complicated combinatorial search problem. In this work, we propose an alternative framework (GFN-SR) to approach SR with deep learning. We model the construction of an expression tree as traversing through a directed acyclic graph (DAG) so that GFlowNet can learn a stochastic policy to generate such trees sequentially. Enhanced with an adaptive reward baseline, our method is capable of generating a diverse set of best-fitting expressions. Notably, we observe that GFN-SR outperforms other SR algorithms in noisy data regimes, owing to its ability to learn a distribution of rewards over a space of candidate solutions.

Submitted 1 December, 2023; originally announced December 2023.

Comments: Accepted by the NeurIPS 2023 AI4Science Workshop

arXiv:2307.07575 [pdf, other]

A Quantitative Approach to Predicting Representational Learning and Performance in Neural Networks

Authors: Ryan Pyle, Sebastian Musslick, Jonathan D. Cohen, Ankit B. Patel

Abstract: A key property of neural networks (both biological and artificial) is how they learn to represent and manipulate input information in order to solve a task. Different types of representations may be suited to different types of tasks, making identifying and understanding learned representations a critical part of understanding and designing useful networks. In this paper, we introduce a new pseudo-kernel based tool for analyzing and predicting learned representations, based only on the initial conditions of the network and the training curriculum. We validate the method on a simple test case, before demonstrating its use on a question about the effects of representational learning on sequential single versus concurrent multitask performance. We show that our method can be used to predict the effects of the scale of weight initialization and training curriculum on representational learning and downstream concurrent multitasking performance.

Submitted 14 July, 2023; originally announced July 2023.

Comments: 30 pages, 16 figures

arXiv:2206.05379 [pdf, other]

A Benchmark for Compositional Visual Reasoning

Authors: Aimen Zerroug, Mohit Vaishnav, Julien Colin, Sebastian Musslick, Thomas Serre

Abstract: A fundamental component of human vision is our ability to parse complex visual scenes and judge the relations between their constituent objects. AI benchmarks for visual reasoning have driven rapid progress in recent years with state-of-the-art systems now reaching human accuracy on some of these benchmarks. Yet, a major gap remains in terms of the sample efficiency with which humans and AI systems learn new visual reasoning tasks. Humans' remarkable efficiency at learning has been at least partially attributed to their ability to harness compositionality -- such that they can efficiently take advantage of previously gained knowledge when learning new tasks. Here, we introduce a novel visual reasoning benchmark, Compositional Visual Relations (CVR), to drive progress towards the development of more data-efficient learning algorithms. We take inspiration from fluid intelligence and non-verbal reasoning tests and describe a novel method for creating compositions of abstract rules and associated image datasets at scale. Our proposed benchmark includes measures of sample efficiency, generalization and transfer across task rules, as well as the ability to leverage compositionality. We systematically evaluate modern neural architectures and find that, surprisingly, convolutional architectures surpass transformer-based architectures across all performance measures in most data regimes. However, all computational models are a lot less data efficient compared to humans even after learning informative visual representations using self-supervision. Overall, we hope that our challenge will spur interest in the development of neural architectures that can learn to harness compositionality toward more efficient learning.

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2103.13939 [pdf, other]

Recovering Quantitative Models of Human Information Processing with Differentiable Architecture Search

Authors: Sebastian Musslick

Abstract: The integration of behavioral phenomena into mechanistic models of cognitive function is a fundamental staple of cognitive science. Yet, researchers are beginning to accumulate increasing amounts of data without having the temporal or monetary resources to integrate these data into scientific theories. We seek to overcome these limitations by incorporating existing machine learning techniques into an open-source pipeline for the automated construction of quantitative models. This pipeline leverages the use of neural architecture search to automate the discovery of interpretable model architectures, and automatic differentiation to automate the fitting of model parameters to data. We evaluate the utility of these methods based on their ability to recover quantitative models of human information processing from synthetic data. We find that these methods are capable of recovering basic quantitative motifs from models of psychophysics, learning and decision making. We also highlight weaknesses of this framework and discuss future directions for their mitigation.

Submitted 17 May, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2007.10527 [pdf, ps, other]

Navigating the Trade-Off between Multi-Task Learning and Learning to Multitask in Deep Neural Networks

Authors: Sachin Ravi, Sebastian Musslick, Maia Hamin, Theodore L. Willke, Jonathan D. Cohen

Abstract: The terms multi-task learning and multitasking are easily confused. Multi-task learning refers to a paradigm in machine learning in which a network is trained on various related tasks to facilitate the acquisition of tasks. In contrast, multitasking is used to indicate, especially in the cognitive science literature, the ability to execute multiple tasks simultaneously. While multi-task learning exploits the discovery of common structure between tasks in the form of shared representations, multitasking is promoted by separating representations between tasks to avoid processing interference. Here, we build on previous work involving shallow networks and simple task settings suggesting that there is a trade-off between multi-task learning and multitasking, mediated by the use of shared versus separated representations. We show that the same tension arises in deep networks and discuss a meta-learning algorithm for an agent to manage this trade-off in an unfamiliar environment. We display through different experiments that the agent is able to successfully optimize its training strategy as a function of the environment.

Submitted 5 January, 2021; v1 submitted 20 July, 2020; originally announced July 2020.

arXiv:2007.03124 [pdf, other]

Efficiency of learning vs. processing: Towards a normative theory of multitasking

Authors: Yotam Sagiv, Sebastian Musslick, Yael Niv, Jonathan D. Cohen

Abstract: A striking limitation of human cognition is our inability to execute some tasks simultaneously. Recent work suggests that such limitations can arise from a fundamental tradeoff in network architectures that is driven by the sharing of representations between tasks: sharing promotes quicker learning, at the expense of interference while multitasking. From this perspective, multitasking failures might reflect a preference for learning efficiency over multitasking capability. We explore this hypothesis by formulating an ideal Bayesian agent that maximizes expected reward by learning either shared or separate representations for a task set. We investigate the agent's behavior and show that over a large space of parameters the agent sacrifices long-run optimality (higher multitasking capacity) for short-term reward (faster learning). Furthermore, we construct a general mathematical framework in which rational choices between learning speed and processing efficiency can be examined for a variety of different task environments.

Submitted 6 July, 2020; originally announced July 2020.

arXiv:1708.03263 [pdf, other]

Topological limits to parallel processing capability of network architectures

Authors: Giovanni Petri, Sebastian Musslick, Biswadip Dey, Kayhan Ozcimder, David Turner, Nesreen K. Ahmed, Theodore Willke, Jonathan D. Cohen

Abstract: The ability to learn new tasks and generalize performance to others is one of the most remarkable characteristics of the human brain and of recent AI systems. The ability to perform multiple tasks simultaneously is also a signature characteristic of large-scale parallel architectures that is evident in the human brain and has been exploited effectively by more traditional, massively parallel computational architectures. Here, we show that these two characteristics are in tension, reflecting a fundamental tradeoff between interactive parallelism that supports learning and generalization, and independent parallelism that supports processing efficiency through concurrent multitasking. We formally show that, while the maximum number of tasks that can be performed simultaneously grows linearly with network size, under realistic scenarios (e.g. in an unpredictable environment), the expected number that can be performed concurrently grows radically sub-linearly with network size. Hence, even modest reliance on shared representation strictly constrains the number of tasks that can be performed simultaneously, implying profound consequences for the development of artificial intelligence that optimally manages the tradeoff between learning and processing, and for understanding the human brain's remarkably puzzling mix of sequential and parallel capabilities.

Submitted 10 November, 2020; v1 submitted 10 August, 2017; originally announced August 2017.

Comments: version 4. Added SIs, 33 pages total, 4 figures + 14 figures in SI, major edits to text

arXiv:1706.00085 [pdf, other]

A Formal Approach to Modeling the Cost of Cognitive Control

Authors: Kayhan Ozcimder, Biswadip Dey, Sebastian Musslick, Giovanni Petri, Nesreen K. Ahmed, Theodore L. Willke, Jonathan D. Cohen

Abstract: This paper introduces a formal method to model the level of demand on control when executing cognitive processes. The cost of cognitive control is parsed into an intensity cost which encapsulates how much additional input information is required so as to get the specified response, and an interaction cost which encapsulates the level of interference between individual processes in a network. We develop a formal relationship between the probability of successful execution of desired processes and the control signals (additive control biases). This relationship is also used to specify optimal control policies to achieve a desired probability of activation for processes. We observe that there are boundary cases when finding such control policies which leads us to introduce the interaction cost. We show that the interaction cost is influenced by the relative strengths of individual processes, as well as the directionality of the underlying competition between processes.

Submitted 31 May, 2017; originally announced June 2017.

Comments: 6 pages, 3 figures, Conference paper

arXiv:1611.02400 [pdf, other]

A Graph-Theoretic Approach to Multitasking

Authors: Noga Alon, Jonathan D. Cohen, Biswadip Dey, Tom Griffiths, Sebastian Musslick, Kayhan Ozcimder, Daniel Reichman, Igor Shinkar, Tal Wagner

Abstract: A key feature of neural network architectures is their ability to support the simultaneous interaction among large numbers of units in the learning and processing of representations. However, how the richness of such interactions trades off against the ability of a network to simultaneously carry out multiple independent processes -- a salient limitation in many domains of human cognition -- remains largely unexplored. In this paper we use a graph-theoretic analysis of network architecture to address this question, where tasks are represented as edges in a bipartite graph $G=(A \cup B, E)$. We define a new measure of multitasking capacity of such networks, based on the assumptions that tasks that \emph{need} to be multitasked rely on independent resources, i.e., form a matching, and that tasks \emph{can} be multitasked without interference if they form an induced matching. Our main result is an inherent tradeoff between the multitasking capacity and the average degree of the network that holds \emph{regardless of the network architecture}. These results are also extended to networks of depth greater than $2$. On the positive side, we demonstrate that networks that are random-like (e.g., locally sparse) can have desirable multitasking properties. Our results shed light on the parallel-processing limitations of neural systems and provide insights that may be useful for the analysis and design of parallel architectures.

Submitted 9 June, 2017; v1 submitted 8 November, 2016; originally announced November 2016.

Showing 1–10 of 10 results for author: Musslick, S