-
CogBench: a large language model walks into a psychology lab
Authors:
Julian Coda-Forno,
Marcel Binz,
Jane X. Wang,
Eric Schulz
Abstract:
Large language models (LLMs) have significantly advanced the field of artificial intelligence. Yet, evaluating them comprehensively remains challenging. We argue that this is partly due to the predominant focus on performance metrics in most benchmarks. This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. This novel approach offers a toolkit for phenotyping LLMs' behavior. We apply CogBench to 35 LLMs, yielding a rich and diverse dataset. We analyze this data using statistical multilevel modeling techniques, accounting for the nested dependencies among fine-tuned versions of specific LLMs. Our study highlights the crucial role of model size and reinforcement learning from human feedback (RLHF) in improving performance and aligning with human behavior. Interestingly, we find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior. Finally, we explore the effects of prompt-engineering techniques. We discover that chain-of-thought prompting improves probabilistic reasoning, while take-a-step-back prompting fosters model-based behaviors.
Submitted 28 February, 2024;
originally announced February 2024.
-
In-context learning agents are asymmetric belief updaters
Authors:
Johannes A. Schubert,
Akshay K. Jagadish,
Marcel Binz,
Eric Schulz
Abstract:
We study the in-context learning dynamics of large language models (LLMs) using three instrumental learning tasks adapted from cognitive psychology. We find that LLMs update their beliefs in an asymmetric manner and learn more from better-than-expected outcomes than from worse-than-expected ones. Furthermore, we show that this effect reverses when learning about counterfactual feedback and disappears when no agency is implied. We corroborate these findings by investigating idealized in-context learning agents derived through meta-reinforcement learning, where we observe similar patterns. Taken together, our results contribute to our understanding of how in-context learning works by highlighting that the framing of a problem significantly influences how learning occurs, a phenomenon also observed in human cognition.
Submitted 6 February, 2024;
originally announced February 2024.
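The asymmetry described in this abstract is often formalized with a delta rule that uses separate learning rates for positive and negative prediction errors. The following is a minimal sketch under that assumption; it is not taken from the paper, and the function names and the learning-rate values (`lr_pos`, `lr_neg`) are hypothetical choices for illustration only.

```python
def update_value(value, reward, lr_pos=0.3, lr_neg=0.1):
    """Apply one asymmetric delta-rule update to a value estimate.

    lr_pos and lr_neg are illustrative values (assumptions), chosen so that
    better-than-expected outcomes move the estimate more than worse ones.
    """
    prediction_error = reward - value
    # Pick the learning rate according to the sign of the prediction error.
    lr = lr_pos if prediction_error > 0 else lr_neg
    return value + lr * prediction_error


def run_updates(rewards, value=0.5, lr_pos=0.3, lr_neg=0.1):
    """Fold a sequence of rewards into a single value estimate."""
    for reward in rewards:
        value = update_value(value, reward, lr_pos, lr_neg)
    return value


if __name__ == "__main__":
    # A positive surprise shifts the estimate further than an equally
    # large negative surprise.
    print(update_value(0.5, 1.0))
    print(update_value(0.5, 0.0))
```

With `lr_pos > lr_neg`, an agent of this kind ends up with optimistic value estimates; setting the two rates equal recovers the symmetric delta rule.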
-
Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks
Authors:
Akshay K. Jagadish,
Julian Coda-Forno,
Mirko Thalmann,
Eric Schulz,
Marcel Binz
Abstract:
Ecological rationality refers to the notion that humans are rational agents adapted to their environment. However, testing this theory remains challenging due to two reasons: the difficulty in defining what tasks are ecologically valid and building rational models for these tasks. In this work, we demonstrate that large language models can generate cognitive tasks, specifically category learning tasks, that match the statistics of real-world tasks, thereby addressing the first challenge. We tackle the second challenge by deriving rational agents adapted to these tasks using the framework of meta-learning, leading to a class of models called ecologically rational meta-learned inference (ERMI). ERMI quantitatively explains human data better than seven other cognitive models in two different experiments. It additionally matches human behavior on a qualitative level: (1) it finds the same tasks difficult that humans find difficult, (2) it becomes more reliant on an exemplar-based strategy for assigning categories with learning, and (3) it generalizes to unseen stimuli in a human-like way. Furthermore, we show that ERMI's ecologically valid priors allow it to achieve state-of-the-art performance on the OpenML-CC18 classification benchmark.
Submitted 28 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Predicting the Future with Simple World Models
Authors:
Tankred Saanum,
Peter Dayan,
Eric Schulz
Abstract:
World models can represent potentially high-dimensional pixel observations in compact latent spaces, making it tractable to model the dynamics of the environment. However, the latent dynamics inferred by these models may still be highly complex. Abstracting the dynamics of the environment with simple models can have several benefits. If the latent dynamics are simple, the model may generalize better to novel transitions, and discover useful latent representations of environment states. We propose a regularization scheme that simplifies the world model's latent dynamics. Our model, the Parsimonious Latent Space Model (PLSM), minimizes the mutual information between latent states and the dynamics that arise between them. This makes the dynamics softly state-invariant, and the effects of the agent's actions more predictable. We combine the PLSM with three different model classes used for i) future latent state prediction, ii) video prediction, and iii) planning. We find that our regularization improves accuracy, generalization, and performance in downstream tasks.
Submitted 31 January, 2024;
originally announced January 2024.
-
How should the advent of large language models affect the practice of science?
Authors:
Marcel Binz,
Stephan Alaniz,
Adina Roskies,
Balazs Aczel,
Carl T. Bergstrom,
Colin Allen,
Daniel Schad,
Dirk Wulff,
Jevin D. West,
Qiong Zhang,
Richard M. Shiffrin,
Samuel J. Gershman,
Ven Popov,
Emily M. Bender,
Marco Marelli,
Matthew M. Botvinick,
Zeynep Akata,
Eric Schulz
Abstract:
Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schulz et al. make the argument that working with LLMs is not fundamentally different from working with human collaborators, while Bender et al. argue that LLMs are often misused and over-hyped, and that their limitations warrant a focus on more specialized, easily interpretable tools. Marelli et al. emphasize the importance of transparent attribution and responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans should retain responsibility for determining the scientific roadmap. To facilitate the discussion, the four perspectives are complemented with a response from each group. By putting these different perspectives in conversation, we aim to bring attention to important considerations within the academic community regarding the adoption of LLMs and their impact on both current and future scientific practices.
Submitted 5 December, 2023;
originally announced December 2023.
-
Visual cognition in multimodal large language models
Authors:
Luca M. Schulze Buschoff,
Elif Akata,
Matthias Bethge,
Eric Schulz
Abstract:
A chief goal of artificial intelligence is to build machines that think like people. Yet it has been argued that deep neural network architectures fail to accomplish this. Researchers have asserted these models' limitations in the domains of causal reasoning, intuitive physics, and intuitive psychology. Yet recent advancements, namely the rise of large language models, particularly those designed for visual processing, have rekindled interest in the potential to emulate human-like cognitive abilities. This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning, and intuitive psychology. Through a series of controlled experiments, we investigate the extent to which these modern models grasp complex physical interactions, causal relationships, and intuitive understanding of others' preferences. Our findings reveal that, while these models demonstrate a notable proficiency in processing and interpreting visual data, they still fall short of human capabilities in these areas. The models exhibit a rudimentary understanding of physical laws and causal relationships, but their performance is hindered by a lack of deeper insights - a key aspect of human cognition. Furthermore, in tasks requiring an intuitive theory of mind, the models fail altogether. Our results emphasize the need for integrating more robust mechanisms for understanding causality, physical dynamics, and social cognition into modern-day, vision-based language models, and point out the importance of cognitively-inspired benchmarks.
Submitted 24 January, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
The Acquisition of Physical Knowledge in Generative Neural Networks
Authors:
Luca M. Schulze Buschoff,
Eric Schulz,
Marcel Binz
Abstract:
As children grow older, they develop an intuitive understanding of the physical processes around them. Their physical understanding develops in stages, moving along developmental trajectories which have been mapped out extensively in previous empirical research. Here, we investigate how the learning trajectories of deep generative neural networks compare to children's developmental trajectories using physical understanding as a testbed. We outline an approach that allows us to examine two distinct hypotheses of human development - stochastic optimization and complexity increase. We find that while our models are able to accurately predict a number of physical processes, their learning trajectories under both hypotheses do not follow the developmental trajectories of children.
Submitted 30 October, 2023;
originally announced October 2023.
-
Radial Outflow Explains the Rotation Curves of Disk Galaxies
Authors:
Earl Schulz
Abstract:
The circular velocities of the inner region of disk galaxies are predicted by standard physics but velocities beyond the stellar disks are not consistent with Newtonian physics if the material there is in stable circular orbits. However, this material is not gravitationally bound and so does not trace the gravitational field in the way that is usually assumed. The gravitational attraction near the edge of a flattened mass distribution is significantly greater than that of an equal mass in a spherical distribution. The size of the effect depends on the specifics of the mass distribution but is greater than a factor of two for reasonable models. In fact, the circular velocity can exceed the escape velocity so that these galaxies are gravitationally unstable in a way not previously considered and disk material is lost due to thermal escape, bars or other disturbances. The nearly constant velocity observed in the outer disk region has been interpreted to mean that the dynamical mass of galaxies is much larger than the observed mass. In fact, there is no great discrepancy and no need to invoke dark matter at these scales. The gravitational field of a disk galaxy is determined at all radii by the observed mass. In the region of the stellar disk, stars and gas move in nearly circular orbits at velocities consistent with the gravitational field. In the outer regions the gravitational force drops rapidly so that stars and gas move outward almost unaffected by the attraction of the host galaxy.
Submitted 16 September, 2023;
originally announced September 2023.
-
Guideline for Trustworthy Artificial Intelligence -- AI Assessment Catalog
Authors:
Maximilian Poretschkin,
Anna Schmitz,
Maram Akila,
Linara Adilova,
Daniel Becker,
Armin B. Cremers,
Dirk Hecker,
Sebastian Houben,
Michael Mock,
Julia Rosenzweig,
Joachim Sicking,
Elena Schulz,
Angelika Voss,
Stefan Wrobel
Abstract:
Artificial Intelligence (AI) has made impressive progress in recent years and represents a key technology that has a crucial impact on the economy and society. However, it is clear that AI and business models based on it can only reach their full potential if AI applications are developed according to high quality standards and are effectively protected against new AI risks. For instance, AI bears the risk of unfair treatment of individuals when processing personal data, e.g., to support credit lending or staff recruitment decisions. The emergence of these new risks is closely linked to the fact that the behavior of AI applications, particularly those based on Machine Learning (ML), is essentially learned from large volumes of data and is not predetermined by fixed programmed rules.
Thus, the issue of the trustworthiness of AI applications is crucial and is the subject of numerous major publications by stakeholders in politics, business and society. In addition, there is mutual agreement that the requirements for trustworthy AI, which are often described in an abstract way, must now be made clear and tangible. One challenge to overcome here relates to the fact that the specific quality criteria for an AI application depend heavily on the application context and possible measures to fulfill them in turn depend heavily on the AI technology used. Lastly, practical assessment procedures are needed to evaluate whether specific AI applications have been developed according to adequate quality standards. This AI assessment catalog addresses exactly this point and is intended for two target groups: Firstly, it provides developers with a guideline for systematically making their AI applications trustworthy. Secondly, it guides assessors and auditors on how to examine AI applications for trustworthiness in a structured way.
Submitted 20 June, 2023;
originally announced July 2023.
-
Language Aligned Visual Representations Predict Human Behavior in Naturalistic Learning Tasks
Authors:
Can Demircan,
Tankred Saanum,
Leonardo Pettini,
Marcel Binz,
Blazej M Baczkowski,
Paula Kaanders,
Christian F Doeller,
Mona M Garvert,
Eric Schulz
Abstract:
Humans possess the ability to identify and generalize relevant features of natural objects, which aids them in various situations. To investigate this phenomenon and determine the most effective representations for predicting human behavior, we conducted two experiments involving category learning and reward learning. Our experiments used realistic images as stimuli, and participants were tasked with making accurate decisions based on novel stimuli for all trials, thereby necessitating generalization. In both tasks, the underlying rules were generated as simple linear functions using stimulus dimensions extracted from human similarity judgments. Notably, participants successfully identified the relevant stimulus features within a few trials, demonstrating effective generalization. We performed an extensive model comparison, evaluating the trial-by-trial predictive accuracy of diverse deep learning models' representations of human choices. Intriguingly, representations from models trained on both text and image data consistently outperformed models trained solely on images, even surpassing models using the features that generated the task itself. These findings suggest that language-aligned visual representations possess sufficient richness to describe human generalization in naturalistic settings and emphasize the role of language in shaping human cognition.
Submitted 15 June, 2023;
originally announced June 2023.
-
Turning large language models into cognitive models
Authors:
Marcel Binz,
Eric Schulz
Abstract:
Large language models are powerful systems that excel at many tasks, ranging from translation to mathematical reasoning. Yet, at the same time, these models often show unhuman-like characteristics. In the present paper, we address this gap and ask whether large language models can be turned into cognitive models. We find that -- after finetuning them on data from psychological experiments -- these models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains. In addition, we show that their representations contain the information necessary to model behavior on the level of individual subjects. Finally, we demonstrate that finetuning on multiple tasks enables large language models to predict human behavior in a previously unseen task. Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models, thereby opening up new research directions that could transform cognitive psychology and the behavioral sciences as a whole.
Submitted 6 June, 2023;
originally announced June 2023.
-
Reinforcement Learning with Simple Sequence Priors
Authors:
Tankred Saanum,
Noémi Éltető,
Peter Dayan,
Marcel Binz,
Eric Schulz
Abstract:
Everything else being equal, simpler models should be preferred over more complex ones. In reinforcement learning (RL), simplicity is typically quantified on an action-by-action basis -- but this timescale ignores temporal regularities, like repetitions, often present in sequential strategies. We therefore propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible. We explore two possible sources of simple action sequences: Sequences that can be learned by autoregressive models, and sequences that are compressible with off-the-shelf data compression algorithms. Distilling these preferences into sequence priors, we derive a novel information-theoretic objective that incentivizes agents to learn policies that maximize rewards while conforming to these priors. We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches in a series of continuous control tasks from the DeepMind Control Suite. These priors also produce a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control.
Submitted 26 May, 2023;
originally announced May 2023.
-
Playing repeated games with Large Language Models
Authors:
Elif Akata,
Lion Schulz,
Julian Coda-Forno,
Seong Joon Oh,
Matthias Bethge,
Eric Schulz
Abstract:
Large Language Models (LLMs) are transforming society and permeating into diverse applications. As a result, LLMs will frequently interact with us and other agents. It is, therefore, of great societal value to understand how LLMs behave in interactive social settings. Here, we propose to use behavioral game theory to study LLMs' cooperation and coordination behavior. To do so, we let different LLMs (GPT-3, GPT-3.5, and GPT-4) play finitely repeated games with each other and with other, human-like strategies. Our results show that LLMs generally perform well in such tasks and also uncover persistent behavioral signatures. In a large set of two-player, two-strategy games, we find that LLMs are particularly good at games where valuing their own self-interest pays off, like the iterated Prisoner's Dilemma family. However, they behave sub-optimally in games that require coordination. We, therefore, further focus on two games from these distinct families. In the canonical iterated Prisoner's Dilemma, we find that GPT-4 acts particularly unforgivingly, always defecting after another agent has defected only once. In the Battle of the Sexes, we find that GPT-4 cannot match the behavior of the simple convention to alternate between options. We verify that these behavioral signatures are stable across robustness checks. Finally, we show how GPT-4's behavior can be modified by providing further information about the other player as well as by asking it to predict the other player's actions before making a choice. These results enrich our understanding of LLMs' social behavior and pave the way for a behavioral game theory for machines.
Submitted 26 May, 2023;
originally announced May 2023.
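The unforgiving pattern described in this abstract, always defecting once the other player has defected a single time, matches what game theory calls a grim-trigger strategy. The sketch below shows that strategy in isolation; it is an illustration only and not the authors' experimental code, and the names `grim_trigger` and `play_against` are hypothetical.

```python
def grim_trigger(opponent_history):
    """Cooperate ("C") until the opponent has defected ("D") at least once."""
    return "D" if "D" in opponent_history else "C"


def play_against(opponent_moves):
    """Return the grim-trigger player's moves against a fixed move sequence.

    In each round the player sees only the opponent's earlier moves.
    """
    my_moves = []
    for round_index in range(len(opponent_moves)):
        seen_so_far = opponent_moves[:round_index]
        my_moves.append(grim_trigger(seen_so_far))
    return my_moves


if __name__ == "__main__":
    # The opponent defects once in round 2 and then returns to cooperation;
    # the grim-trigger player never forgives.
    print(play_against(["C", "D", "C", "C"]))
```

A more forgiving baseline such as tit-for-tat would return to cooperation as soon as the opponent does, which is the contrast the abstract draws.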
-
In-Context Impersonation Reveals Large Language Models' Strengths and Biases
Authors:
Leonard Salewski,
Stephan Alaniz,
Isabel Rio-Torto,
Eric Schulz,
Zeynep Akata
Abstract:
In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is impersonate, different roles when they generate text in-context. We ask LLMs to assume different personas before solving vision and language tasks. We do this by prefixing the prompt with a persona that is associated either with a social identity or domain expertise. In a multi-armed bandit task, we find that LLMs pretending to be children of different ages recover human-like developmental stages of exploration. In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts. Finally, we test whether LLMs' impersonations are complementary to visual information when describing different categories. We find that impersonation can improve performance: an LLM prompted to be a bird expert describes birds better than one prompted to be a car expert. However, impersonation can also uncover LLMs' biases: an LLM prompted to be a man describes cars better than one prompted to be a woman. These findings demonstrate that LLMs are capable of taking on diverse roles and that this in-context impersonation can be used to uncover their hidden strengths and biases.
Submitted 26 November, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
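Prefixing a prompt with a persona, as this abstract describes, can be sketched as plain string construction. The template wording and the function name `with_persona` below are assumptions made for illustration; the abstract does not give the exact prompt used in the paper.

```python
def with_persona(persona, task_prompt):
    """Prefix a task prompt with a persona line.

    The template wording is a hypothetical example, not the paper's prompt.
    """
    return f"If you were {persona}, how would you answer the following?\n\n{task_prompt}"


if __name__ == "__main__":
    task = "Describe a hummingbird in one sentence."
    # The same task is paired with different personas so that the
    # resulting outputs can be compared.
    for persona in ["a bird expert", "a car expert"]:
        print(with_persona(persona, task))
        print("---")
```

Holding the task fixed and varying only the persona keeps any difference in the model's output attributable to the prefix.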
-
Meta-in-context learning in large language models
Authors:
Julian Coda-Forno,
Marcel Binz,
Zeynep Akata,
Matthew Botvinick,
Jane X. Wang,
Eric Schulz
Abstract:
Large language models have shown tremendous performance in a variety of tasks. In-context learning -- the ability to improve at a task after being provided with a number of demonstrations -- is seen as one of the main contributors to their success. In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learning itself. We coin this phenomenon meta-in-context learning. Looking at two idealized domains, a one-dimensional regression task and a two-armed bandit task, we show that meta-in-context learning adaptively reshapes a large language model's priors over expected tasks. Furthermore, we find that meta-in-context learning modifies the in-context learning strategies of such models. Finally, we extend our approach to a benchmark of real-world regression problems where we observe competitive performance to traditional learning algorithms. Taken together, our work improves our understanding of in-context learning and paves the way toward adapting large language models to the environment they are applied in purely through meta-in-context learning rather than traditional finetuning.
Submitted 22 May, 2023;
originally announced May 2023.
-
Inducing anxiety in large language models increases exploration and bias
Authors:
Julian Coda-Forno,
Kristin Witte,
Akshay K. Jagadish,
Marcel Binz,
Zeynep Akata,
Eric Schulz
Abstract:
Large language models are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework used to computationally describe and modify aberrant behavior, to the outputs produced by these models. We focus on the Generative Pre-Trained Transformer 3.5 and subject it to tasks commonly studied in psychiatry. Our results show that GPT-3.5 responds robustly to a common anxiety questionnaire, producing higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be predictably changed by using emotion-inducing prompts. Emotion-induction not only influences GPT-3.5's behavior in a cognitive task measuring exploratory decision-making but also influences its behavior in a previously-established task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a strong increase in biases when prompted with anxiety-inducing text. Thus, it is likely that how prompts are communicated to large language models has a strong influence on their behavior in applied settings. These results progress our understanding of prompt engineering and demonstrate the usefulness of methods taken from computational psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.
Submitted 21 April, 2023;
originally announced April 2023.
-
Meta-Learned Models of Cognition
Authors:
Marcel Binz,
Ishita Dasgupta,
Akshay Jagadish,
Matthew Botvinick,
Jane X. Wang,
Eric Schulz
Abstract:
Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. The purpose of this article is to synthesize previous work in this field and establish such a research program. We rely on three key pillars to accomplish this goal. We first point out that meta-learning can be used to construct Bayes-optimal learning algorithms. This result not only implies that any behavioral phenomenon that can be explained by a Bayesian model can also be explained by a meta-learned model but also allows us to draw strong connections to the rational analysis of cognition. We then discuss several advantages of the meta-learning framework over traditional Bayesian methods. In particular, we argue that meta-learning can be applied to situations where Bayesian inference is impossible and that it enables us to make rational models of cognition more realistic, either by incorporating limited computational resources or neuroscientific knowledge. Finally, we reexamine prior studies from psychology and neuroscience that have applied meta-learning and put them into the context of these new insights. In summary, our work highlights that meta-learning considerably extends the scope of rational analysis and thereby of cognitive theories more generally.
Submitted 12 April, 2023;
originally announced April 2023.
-
Learning Parsimonious Dynamics for Generalization in Reinforcement Learning
Authors:
Tankred Saanum,
Eric Schulz
Abstract:
Humans are skillful navigators: We aptly maneuver through new places, realize when we are back at a location we have seen before, and can even conceive of shortcuts that go through parts of our environments we have never visited. Current methods in model-based reinforcement learning on the other hand struggle with generalizing about environment dynamics out of the training distribution. We argue that two principles can help bridge this gap: latent learning and parsimonious dynamics. Humans tend to think about environment dynamics in simple terms -- we reason about trajectories not in reference to what we expect to see along a path, but rather in an abstract latent space, containing information about the places' spatial coordinates. Moreover, we assume that moving around in novel parts of our environment works the same way as in parts we are familiar with. These two principles work together in tandem: it is in the latent space that the dynamics show parsimonious characteristics. We develop a model that learns such parsimonious dynamics. Using a variational objective, our model is trained to reconstruct experienced transitions in a latent space using locally linear transformations, while encouraged to invoke as few distinct transformations as possible. Using our framework, we demonstrate the utility of learning parsimonious latent dynamics models in a range of policy learning and planning tasks.
Submitted 29 September, 2022;
originally announced September 2022.
-
Stochastic Gradient Descent Captures How Children Learn About Physics
Authors:
Luca M. Schulze Buschoff,
Eric Schulz,
Marcel Binz
Abstract:
As children grow older, they develop an intuitive understanding of the physical processes around them. They move along developmental trajectories, which have been mapped out extensively in previous empirical research. We investigate how children's developmental trajectories compare to the learning trajectories of artificial systems. Specifically, we examine the idea that cognitive development results from some form of stochastic optimization procedure. For this purpose, we train a modern generative neural network model using stochastic gradient descent. We then use methods from the developmental psychology literature to probe the physical understanding of this model at different degrees of optimization. We find that the model's learning trajectory captures the developmental trajectories of children, thereby providing support to the idea of development as stochastic optimization.
Submitted 25 September, 2022;
originally announced September 2022.
-
Minimal $\ell^2$ Norm Discrete Multiplier Method
Authors:
Erick Schulz,
Andy T. S. Wan
Abstract:
We introduce an extension to the Discrete Multiplier Method (DMM), called Minimal $\ell_2$ Norm Discrete Multiplier Method (MN-DMM), where conservative finite difference schemes for dynamical systems with multiple conserved quantities are constructed procedurally, instead of analytically as in the original DMM. For large dynamical systems with multiple conserved quantities, MN-DMM alleviates difficulties that can arise with the original DMM in constructing conservative schemes that satisfy the discrete multiplier conditions. In particular, MN-DMM utilizes the right Moore-Penrose pseudoinverse of the discrete multiplier matrix to solve an underdetermined least-squares problem associated with the discrete multiplier conditions. We prove consistency and conservative properties of the MN-DMM schemes. We also introduce two variants - Mixed MN-DMM and MN-DMM using Singular Value Decomposition - and discuss their usage in practice. Moreover, numerical examples on various problems arising from the mathematical sciences are shown to demonstrate the wide applicability of MN-DMM and its relative ease of implementation compared to the original DMM.
Submitted 25 August, 2022;
originally announced August 2022.
-
Using cognitive psychology to understand GPT-3
Authors:
Marcel Binz,
Eric Schulz
Abstract:
We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3's decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3's behavior is impressive: it solves vignette-based tasks similarly or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multi-armed bandit task, and shows signatures of model-based reinforcement learning. Yet we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. These results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.
Submitted 21 June, 2022;
originally announced June 2022.
-
Traces for Hilbert Complexes
Authors:
Ralf Hiptmair,
Dirk Pauly,
Erick Schulz
Abstract:
We study a new notion of trace operators and trace spaces for abstract Hilbert complexes. We introduce trace spaces as quotient spaces/annihilators. We characterize the kernels and images of the related trace operators and discuss duality relationships between trace spaces. We elaborate that many properties of the classical boundary traces associated with the Euclidean de Rham complex on bounded Lipschitz domains are rooted in the general structure of Hilbert complexes. We arrive at abstract trace Hilbert complexes that can be formulated using quotient spaces/annihilators. We show that, if a Hilbert complex admits stable "regular decompositions" with compact lifting operators, then the associated trace Hilbert complex is Fredholm. Incarnations of abstract concepts and results in the concrete case of the de Rham complex in three-dimensional Euclidean space will be discussed throughout.
Submitted 1 March, 2022;
originally announced March 2022.
-
Modeling Human Exploration Through Resource-Rational Reinforcement Learning
Authors:
Marcel Binz,
Eric Schulz
Abstract:
Equipping artificial agents with useful exploration mechanisms remains a challenge to this day. Humans, on the other hand, seem to manage the trade-off between exploration and exploitation effortlessly. In the present article, we put forward the hypothesis that they accomplish this by making optimal use of limited computational resources. We study this hypothesis by meta-learning reinforcement learning algorithms that sacrifice performance for a shorter description length (defined as the number of bits required to implement the given algorithm). The emerging class of models captures human exploration behavior better than previously considered approaches, such as Boltzmann exploration, upper confidence bound algorithms, and Thompson sampling. We additionally demonstrate that changing the description length in our class of models produces the intended effects: reducing description length captures the behavior of brain-lesioned patients while increasing it mirrors cognitive development during adolescence.
Submitted 14 November, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety
Authors:
Sebastian Houben,
Stephanie Abrecht,
Maram Akila,
Andreas Bär,
Felix Brockherde,
Patrick Feifel,
Tim Fingscheidt,
Sujan Sai Gannamaneni,
Seyed Eghbal Ghobadi,
Ahmed Hammam,
Anselm Haselhoff,
Felix Hauser,
Christian Heinzemann,
Marco Hoffmann,
Nikhil Kapoor,
Falk Kappel,
Marvin Klingner,
Jan Kronenberger,
Fabian Küppers,
Jonas Löhdefink,
Michael Mlynarski,
Michael Mock,
Firas Mualla,
Svetlana Pavlitskaya,
Maximilian Poretschkin
, et al. (16 additional authors not shown)
Abstract:
The use of deep neural networks (DNNs) in safety-critical applications like mobile health and autonomous driving is challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization over insufficient interpretability to problems with malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from safety concerns. In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged. This work provides a structured and broad overview of them. We first identify categories of insufficiencies to then describe research activities aiming at their detection, quantification, or mitigation. Our paper addresses both machine learning experts and safety engineers: The former ones might profit from the broad range of machine learning topics covered and discussions on limitations of recent methods. The latter ones might gain insights into the specifics of modern ML methods. We moreover hope that our contribution fuels discussions on desiderata for ML systems and strategies on how to propel existing approaches accordingly.
Submitted 29 April, 2021;
originally announced April 2021.
-
Plants Don't Walk on the Street: Common-Sense Reasoning for Reliable Semantic Segmentation
Authors:
Linara Adilova,
Elena Schulz,
Maram Akila,
Sebastian Houben,
Jan David Schneider,
Fabian Hueger,
Tim Wirtz
Abstract:
Data-driven sensor interpretation in autonomous driving can lead to highly implausible predictions as can most of the time be verified with common-sense knowledge. However, learning common knowledge only from data is hard and approaches for knowledge integration are an active research area. We propose to use a partly human-designed, partly learned set of rules to describe relations between objects of a traffic scene on a high level of abstraction. In doing so, we improve and robustify existing deep neural networks consuming low-level sensor information. We present an initial study adapting the well-established Probabilistic Soft Logic (PSL) framework to validate and improve on the problem of semantic segmentation. We describe in detail how we integrate common knowledge into the segmentation pipeline using PSL and verify our approach in a set of experiments demonstrating the increase in robustness against several severe image distortions applied to the A2D2 autonomous driving data set.
Submitted 19 April, 2021;
originally announced April 2021.
-
First-Kind Boundary Integral Equations for the Dirac Operator in 3D Lipschitz Domains
Authors:
Erick Schulz,
Ralf Hiptmair
Abstract:
We develop novel first-kind boundary integral equations for Euclidean Dirac operators in 3D Lipschitz domains comprising square-integrable potentials and involving only weakly singular kernels. Generalized Garding inequalities are derived and we establish that the obtained boundary integral operators are Fredholm of index zero. Their finite dimensional kernels are characterized and we show that their dimension is equal to the number of topological invariants of the domain's boundary, in other words to the sum of its Betti numbers. This is explained by the fundamental discovery that the associated bilinear forms agree with those induced by the 2D surface Dirac operators for $H^{-1/2}$ surface de Rham Hilbert complexes whose underlying inner products are the non-local inner products defined through the classical single-layer boundary integral operators for the Laplacian. Decay conditions for well-posedness in natural energy spaces of the Dirac system in unbounded exterior domains are also presented.
Submitted 13 October, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
-
Div-Curl Problems and $\mathbf{H}^1$-regular Stream Functions in 3D Lipschitz Domains
Authors:
Matthias Kirchhart,
Erick Schulz
Abstract:
We consider the problem of recovering the divergence-free velocity field ${\mathbf U}\in\mathbf{L}^2(Ω)$ of a given vorticity ${\mathbf F}=\mathrm{curl}\,{\mathbf U}$ on a bounded Lipschitz domain $Ω\subset\mathbb{R}^3$. To that end, we solve the "div-curl problem" for a given ${\mathbf F}\in{\mathbf H}^{-1}(Ω)$. The solution is expressed in terms of a vector potential (or stream function) ${\mathbf A}\in{\mathbf H}^1(Ω)$ such that ${\mathbf U}=\mathrm{curl}\,{\mathbf A}$. After discussing existence and uniqueness of solutions and associated vector potentials, we propose a well-posed construction for the stream function. A numerical method based on this construction is presented, and experiments confirm that the resulting approximations display higher regularity than those of another common approach.
Submitted 3 February, 2021; v1 submitted 24 May, 2020;
originally announced May 2020.
-
Spurious Resonances in Coupled Domain-Boundary Variational Formulations of Transmission Problems in Electromagnetism and Acoustics
Authors:
Erick Schulz,
Ralf Hiptmair
Abstract:
We develop a framework shedding light on common features of coupled variational formulations arising in electromagnetic scattering and acoustics. We show that spurious resonances haunting coupled domain-boundary formulations based on direct boundary integral equations of the first kind originate from the formal structure of their Calderon identities. Using this observation, the kernel of the coupled problem is characterized explicitly and we show that it completely vanishes under the exterior representation formula.
Submitted 27 February, 2022; v1 submitted 31 March, 2020;
originally announced March 2020.
-
Coupled Domain-Boundary Variational Formulations For Hodge-Helmholtz Operators
Authors:
Erick Schulz,
Ralf Hiptmair
Abstract:
We couple the mixed variational problem for the generalized Hodge-Helmholtz or Hodge-Laplace equation posed on a bounded three-dimensional Lipschitz domain with the first-kind boundary integral equation arising from the latter when constant coefficients are assumed in the unbounded complement. Recently developed Calderon projectors for the relevant boundary integral operators are used to perform a symmetric coupling. We prove stability of the coupled problem away from resonant frequencies by establishing a generalized Garding inequality (T-coercivity). The resulting system of equations describes the scattering of monochromatic electromagnetic waves at a bounded inhomogeneous isotropic body possibly having a "rough" surface. The low-frequency robustness of the potential formulation of Maxwell's equations makes this model a promising starting point for Galerkin discretization.
Submitted 13 October, 2021; v1 submitted 27 March, 2020;
originally announced March 2020.
-
The aqueous Triton X-100 - Dodecyltrimethylammonium bromide micellar mixed system. Experimental results and thermodynamic analysis
Authors:
Patricio Serafini,
Marcos Fernández-Leyes,
Jhon M. Sánchez,
Romina B. Pereyra,
Erica P. Schulz,
Gillermo A. Durand,
Pablo C. Schulz,
Hernán A. Ritacco
Abstract:
The micellization process of the aqueous mixed system Triton X-100 (TX100)-Dodecyltrimethylammonium Bromide (DTAB) was studied with a battery of procedures: surface tension, static and dynamic light scattering and ion-selective electrodes. Results were also analysed with two thermodynamic procedures. The system shows some changes in its behaviour with changing the mole fraction of DTAB, $α_{DTAB}$, in the whole surfactant mixture. For $α_{DTAB} < 0.40$ micelles are predominantly TX100 with scarce solubilized DTA+ ions, and TX100 acts as a nearly ideal solvent. In the range $0.50 < α_{DTAB} < 0.75$ it seems that none of the components acts as a solvent, and for $α_{DTAB} > 0.75$ there are abrupt changes in the size and electrophoretic mobility of micelles. These phenomena have been interpreted in the light of the thermodynamic results and some TX100-ionic surfactant mixtures from the literature.
Submitted 25 June, 2018;
originally announced June 2018.
-
Extensions of the Heisenberg group by two-parameter groups of dilations
Authors:
Eckart Schulz,
Adisak Seesanea
Abstract:
We introduce extensions of the multidimensional Heisenberg group $\mathbb{H}^n$ by two-parameter groups of dilations, and then classify the extended groups up to isomorphism, by employing Lie algebra techniques. We show that the groups are isomorphic to subgroups of the symplectic group $\textit{Sp}(n+1,\mathbb{R})$ as well as subgroups of the affine group $\textit{Aff}(n+1,\mathbb{R})$. Thus, they possess both a metaplectic and a wavelet representation. Moreover, the metaplectic representation splits into a sum of two subrepresentations, both of which are equivalent to the same subrepresentation of the wavelet representation.
Submitted 26 April, 2018;
originally announced April 2018.
-
Nanomechanics of CNTs for Sensor Application
Authors:
C. Wagner,
S. Hartmann,
B. Wunderle,
J. Schuster,
S. E. Schulz,
T. Gessner
Abstract:
We aim at a nanoscopic simulation of an acceleration sensor based on the piezoresistive effect of carbon nanotubes (CNTs). To this end, a compact model is built from density functional theory (DFT) and compared with results of molecular dynamics (MD) that describes the mechanics of carbon nanotubes in a parameterized way. The results for the CNTs of interest [(6,3) and (7,4)] from the two approaches agree satisfactorily when DFT calculations are performed with atomic configurations obtained by MD geometry optimization. Geometry optimization yields the Poisson ratio for CNTs, so values from MD and DFT are compared. The simulation ultimately aims at modeling the conductive behavior of CNTs under applied strain, but this needs further verification. Here, we present the prediction of the tight-binding model for suitable CNTs.
Submitted 3 July, 2017;
originally announced July 2017.
-
Interaction between carbon nanotubes and metals: Electronic properties, stability, and sensing
Authors:
F. Fuchs,
A. Zienert,
C. Wagner,
J. Schuster,
S. E. Schulz
Abstract:
The interactions between carbon nanotubes (CNTs) and metal adatoms as well as metal contacts are studied by means of ab initio electronic structure calculations. We show that the electronic properties of a semiconducting (8,4) CNT can be modified by small amounts of Pd adatoms. Such a decoration conserves the piezoelectric properties of the CNT. Besides the electronic influence, the stability of a single adatom, which is of great importance for future technology applications, is investigated as well. We find only small energy barriers for the diffusion of a Pd adatom on the CNT surface. Thus, single Pd adatoms will be mobile at room temperature. Finally, we present results for the interaction between a metallic (6,0) CNT and metal surfaces. Binding energies and distances for Al, Cu, Pd, Ag, Pt, and Au are discussed and compared, showing remarkable agreement between the interaction of single metal atoms and metal surfaces with CNTs.
Submitted 3 July, 2017;
originally announced July 2017.
-
Scaling Relations of Mass, Velocity and Radius for Disk Galaxies
Authors:
Earl J Schulz
Abstract:
I demonstrate four tight correlations of total baryonic mass, velocity and radius for a set of nearby disk galaxies: the Mass-Velocity relation $Mt \propto V^4$; the Mass-Radius relation $Mt \propto R^2$; the Radius-Velocity relation $R \propto V^2$; and the Mass-Radius-Velocity relation $Mt \propto R V^2$. The Mass-Velocity relation is the familiar Baryonic Tully-Fisher relation (BTFR) and versions of the other three relations, using magnitude rather than baryonic mass, are also well known. These four observed correlations follow from a pair of more fundamental relations. First, the centripetal acceleration at the edge of the stellar disk is proportional to the acceleration predicted by Newtonian physics; second, this acceleration is a constant which is related to Milgrom's constant. The two primary relations can be manipulated algebraically to generate the four observed correlations and allow little room for dark matter inside the radius of the stellar disk. The primary relations do not explain the velocity of the outer gaseous disks of spiral galaxies, which do not trace the Newtonian gravitational field of the observed matter.
Submitted 18 January, 2017;
originally announced January 2017.
-
Convergence of Discrete Exterior Calculus Approximations for Poisson Problems
Authors:
Erick Schulz,
Gantumur Tsogtgerel
Abstract:
Discrete exterior calculus (DEC) is a framework for constructing discrete versions of exterior differential calculus objects, and is widely used in computer graphics, computational topology, and discretizations of the Hodge-Laplace operator and other related partial differential equations. However, a rigorous convergence analysis of DEC has always been lacking; as far as we are aware, the only convergence proof of DEC that has appeared so far is for the scalar Poisson problem in two dimensions, and it is based on reinterpreting the discretization as a finite element method. Moreover, even in two dimensions, there have been some puzzling numerical experiments reported in the literature, apparently suggesting that there is convergence without consistency. In this paper, we develop a general independent framework for analyzing issues such as convergence of DEC without relying on theories of other discretization methods, and demonstrate its usefulness by establishing convergence results for DEC beyond the Poisson problem in two dimensions. Namely, we prove that DEC solutions to the scalar Poisson problem in arbitrary dimensions converge pointwise to the exact solution at least linearly with respect to the mesh size. We illustrate the findings by various numerical experiments, which show that the convergence is in fact of second order when the solution is sufficiently regular. The problems of explaining the second-order convergence and of proving convergence for general p-forms remain open.
Submitted 4 June, 2018; v1 submitted 12 November, 2016;
originally announced November 2016.
-
Simple trees in complex forests: Growing Take The Best by Approximate Bayesian Computation
Authors:
Eric Schulz,
Maarten Speekenbrink,
Björn Meder
Abstract:
How can heuristic strategies emerge from smaller building blocks? We propose Approximate Bayesian Computation as a computational solution to this problem. As a first proof of concept, we demonstrate how a heuristic decision strategy such as Take The Best (TTB) can be learned from smaller, probabilistically updated building blocks. Based on a self-reinforcing sampling scheme, different building blocks are combined and, over time, tree-like non-compensatory heuristics emerge. This new algorithm, coined Approximately Bayesian Computed Take The Best (ABC-TTB), is able to recover a data set that was generated by TTB, leads to sensible inferences about cue importance and cue directions, can outperform traditional TTB, and allows performance and computational effort to be traded off explicitly.
Submitted 14 May, 2016; v1 submitted 5 May, 2016;
originally announced May 2016.
-
Better safe than sorry: Risky function exploitation through safe optimization
Authors:
Eric Schulz,
Quentin J. M. Huys,
Dominik R. Bach,
Maarten Speekenbrink,
Andreas Krause
Abstract:
Exploration-exploitation of functions, that is learning and optimizing a map** between inputs and expected outputs, is ubiquitous to many real world situations. These situations sometimes require us to avoid certain outcomes at all cost, for example because they are poisonous, harmful, or otherwise dangerous. We test participants' behavior in scenarios in which they have to find the optimum of a…
▽ More
Exploration-exploitation of functions, that is learning and optimizing a map** between inputs and expected outputs, is ubiquitous to many real world situations. These situations sometimes require us to avoid certain outcomes at all cost, for example because they are poisonous, harmful, or otherwise dangerous. We test participants' behavior in scenarios in which they have to find the optimum of a function while at the same time avoid outputs below a certain threshold. In two experiments, we find that Safe-Optimization, a Gaussian Process-based exploration-exploitation algorithm, describes participants' behavior well and that participants seem to care firstly whether a point is safe and then try to pick the optimal point from all such safe points. This means that their trade-off between exploration and exploitation can be seen as an intelligent, approximate, and homeostasis-driven strategy.
Submitted 14 May, 2016; v1 submitted 2 February, 2016;
originally announced February 2016.
-
Reaction-in-Flight Neutrons as a Test of Stopping Power in Degenerate Plasmas
Authors:
A. C. Hayes,
Gerard Jungman,
A. E. Schulz,
M. Boswell,
M. M. Fowler,
G. Grim,
A. Klein,
R. S. Rundberg,
J. B. Wilhelmy,
D. Wilson
Abstract:
We present the first measurements of reaction-in-flight (RIF) neutrons in an inertial confinement fusion system. The experiments were carried out at the National Ignition Facility, using both Low Foot and High Foot drives and cryogenic plastic capsules. In both cases, the high-energy RIF ($E_n > 15$ MeV) component of the neutron spectrum was found to be about $10^{-4}$ of the total. The majority of the RIF neutrons were produced in the dense cold fuel surrounding the burning hotspot of the capsule and the data are consistent with a compressed cold fuel that is moderately to strongly coupled ($Γ \sim 0.6$) and electron degenerate ($θ_\mathrm{Fermi}/θ_e \sim 4$). The production of RIF neutrons is controlled by the stopping power in the plasma. Thus, the current RIF measurements provide a unique test of stopping power models in an experimentally unexplored plasma regime. We find that the measured RIF data strongly constrain stopping models in warm dense plasma conditions and some models are ruled out by our analysis of these experiments.
Submitted 25 November, 2014;
originally announced November 2014.
-
Dwarf Galaxies in the Halo of NGC 891
Authors:
Earl J Schulz
Abstract:
We report the results of a survey of the region within 40 arcmin of NGC 891, a nearby nearly perfectly edge-on spiral galaxy. Candidate "non-stars" with diameters greater than 15 arcsec were selected from the GSC 2.3.2 catalog and cross-comparison of observations in several bands using archived GALEX, DSS2, WISE, and 2MASS images identified contaminating stars, artifacts and background galaxies, all of which were excluded. The resulting 71 galaxies, many of which were previously uncataloged, comprise a size limited survey of the region. A majority of the galaxies are in the background of NGC 891 and are for the most part members of the Abell 347 cluster at a distance of about 75 Mpc. The new finds approximately double the known membership of Abell 347, previously thought to be relatively sparse. We identify a total of 7 dwarf galaxies, most of which are new discoveries. The newly discovered dwarf galaxies are dim and gas-poor and may be associated with the previously observed arcs of RGB halo stars in the halo and the prominent HI filament and the lopsided features in the disk of NGC 891. Several of the dwarfs show signs of disruption, consistent with being remnants of an ancient collision.
Submitted 13 May, 2014;
originally announced May 2014.
-
Gravitational Collapse in One Dimension
Authors:
A. E. Schulz,
Walter Dehnen,
Gerard Jungman,
Scott Tremaine
Abstract:
We simulate the evolution of one-dimensional gravitating collisionless systems from non-equilibrium initial conditions, similar to the conditions that lead to the formation of dark-matter halos in three dimensions. As in the case of 3D halo formation we find that initially cold, nearly homogeneous particle distributions collapse to approach a final equilibrium state with a universal density profile. At small radii, this attractor exhibits a power-law behavior in density, $ρ(x) \propto |x|^{-γ_\mathrm{crit}}$, $γ_\mathrm{crit} \simeq 0.47$, slightly but significantly shallower than the value $γ = 1/2$ suggested previously. This state develops from the initial conditions through a process of phase mixing and violent relaxation. This process preserves the energy ranks of particles. By warming the initial conditions, we illustrate a cross-over from this power-law final state to a final state containing a homogeneous core. We further show that inhomogeneous but cold power-law initial conditions, with initial exponent $γ_i > γ_\mathrm{crit}$, do not evolve toward the attractor but reach a final state that retains their original power-law behavior in the interior of the profile, indicating a bifurcation in the final state as a function of the initial exponent. Our results rely on a high-fidelity event-driven simulation technique.
Submitted 1 June, 2012;
originally announced June 2012.
-
The gravitational force and potential of the finite Mestel disk
Authors:
Earl Schulz
Abstract:
Mestel determined the surface mass distribution of the finite disk for which the circular velocity is constant in the disk and found the gravitational field for points in the $z=0$ plane. Here we find the exact closed form solutions for the potential and the gravitational field of this disk in cylindrical coordinates over all the space. The Finite Mestel Disk (FMD) is characterized by a cuspy mass distribution in the inner disk region and by an exponential distribution in the outer region of the disk. The FMD is quite different from the better known exponential disk or the untruncated Mestel disk which, being infinite in extent, are not realistic models of real spiral galaxies. In particular, the FMD requires significantly less mass to explain a measured velocity curve.
Submitted 15 December, 2011;
originally announced December 2011.
-
Testing adiabatic contraction with SDSS elliptical galaxies
Authors:
A. E. Schulz,
Rachel Mandelbaum,
Nikhil Padmanabhan
Abstract:
We study the profiles of 75 086 elliptical galaxies from the Sloan Digital Sky Survey (SDSS) at both large (50-500 kpc/h) and small (~3 kpc/h) scales. Weak lensing observations in the outskirts of the halo are combined with measurements of the stellar velocity dispersion in the interior regions of the galaxy for stacked galaxy samples. The weak lensing measurements are well characterized by a Navarro, Frenk and White (NFW) profile. The dynamical mass measurements exceed the extrapolated NFW profile even after the estimated stellar masses are subtracted, providing evidence for the modification of the dark matter profile by the baryons. This excess mass is quantitatively consistent with the predictions of the adiabatic contraction (AC) hypothesis. Our finding suggests that the effects of AC during galaxy formation are stable to subsequent bombardment from major and minor mergers. We explore several theoretical and observational systematics and conclude that they cannot account for the inferred mass excess. The most significant source of systematic error is in the IMF, which would have to increase the stellar mass estimates by a factor of two relative to masses from the Kroupa IMF to fully explain the mass excess without AC. Such an increase would create tension with results from SAURON (Cappellari et al. 2006). We demonstrate a connection between the level of contraction of the dark matter halo profile and scatter in the size-luminosity relation, which is a projection of the fundamental plane. Whether or not AC is the mechanism supplying the excess mass, models of galaxy formation and evolution must reconcile the observed halo masses from weak lensing with the comparatively large dynamical masses at the half light radii of the galaxies.
Submitted 5 January, 2010; v1 submitted 12 November, 2009;
originally announced November 2009.
-
Calibrating photometric redshift distributions with cross-correlations
Authors:
A. E. Schulz
Abstract:
The next generation of proposed galaxy surveys will increase the number of galaxies with photometric redshifts by two orders of magnitude, drastically expanding both redshift range and detection threshold from the current state of the art. Obtaining spectra for a fair sub-sample of this new data could be cumbersome and expensive. However, adequate calibration of the true redshift distribution of galaxies is vital to tapping the potential of these surveys. We examine a promising alternative to direct spectroscopic follow up: calibration of the redshift distribution of photometric galaxies via cross-correlation with an overlapping spectroscopic survey whose members trace the same density field. We review the theory, develop a pipeline, apply it to mock data from N-body simulations, and examine the properties of this redshift distribution estimator. We demonstrate that the method is effective, but the estimator is weakened by two factors. 1) The correlation function of the spectroscopic sample must be measured in many bins along the line of sight, rendering it noisy and interfering with high quality reconstruction of the photometric redshift distribution. 2) The method is not able to disentangle the photometric redshift distribution from evolution in the bias of the photometric sample. We establish the impact of these factors using our mock catalogs. Although it may still be necessary to spectroscopically follow up a fair subsample of the photometric survey data, further refinement may appreciably decrease the number of spectra that will be needed to calibrate future surveys.
Submitted 19 October, 2009;
originally announced October 2009.
-
Challenges facing young astrophysicists
Authors:
N. L. Zakamska,
A. E. Schulz,
K. Heng,
M. Juric,
B. Kocsis,
M. Kuhlen,
R. Mandelbaum,
J. L. Mitchell,
M. Pan,
D. H. Rudd,
G. van de Ven,
Z. Zheng
Abstract:
In order to attract and retain excellent researchers and diverse individuals in astrophysics, we recommend action be taken in several key areas impacting young scientists: (1) Maintain balance between large collaborations and individual projects through distribution of funding; encourage public releases of observational and simulation data for use by a broader community. (2) Improve the involvement of women, particularly at leading institutions. (3) Address the critical shortage of child care options and design reasonable profession-wide parental leave policies. (4) Streamline the job application and hiring process. We summarize our reasons for bringing these areas to the attention of the committee, and we suggest several practical steps that can be taken to address them.
Submitted 13 May, 2009;
originally announced May 2009.
-
Potential-density pairs for a family of finite disks
Authors:
Earl Schulz
Abstract:
Exact analytical solutions are given for the three finite disks with surface density $Σ_n=σ_0 (1-R^2/α^2)^{n-1/2}$ with $n=0, 1, 2$. Closed-form solutions in cylindrical co-ordinates are given using only elementary functions for the potential and for the gravitational field of each of the disks.
The n=0 disk is the flattened homeoid for which $Σ_{hom} = σ_0/\sqrt{1-R^2/α^2}$. Improved results are presented for this disk. The n=1 disk is the Maclaurin disk for which $Σ_{Mac} = σ_0 \sqrt{1-R^2/α^2}$. The Maclaurin disk is a limiting case of the Maclaurin spheroid. The potential of the Maclaurin disk is found here by integrating the potential of the n=0 disk over $α$, exploiting the linearity of Poisson's equation. The n=2 disk has the surface density $Σ_{D2}=σ_0 (1-R^2/α^2)^{3/2}$. The potential is found by integrating the potential of the n=1 disk.
Submitted 10 December, 2008;
originally announced December 2008.
-
Close Pairs as Proxies for Galaxy Cluster Mergers
Authors:
Andrew R. Wetzel,
A. E. Schulz,
Daniel E. Holz,
Michael S. Warren
Abstract:
Galaxy cluster merger statistics are an important component in understanding the formation of large-scale structure. Unfortunately, it is difficult to study merger properties and evolution directly because the identification of cluster mergers in observations is problematic. We use large N-body simulations to study the statistical properties of massive halo mergers, specifically investigating the utility of close halo pairs as proxies for mergers. We examine the relationship between pairs and mergers for a wide range of merger timescales, halo masses, and redshifts (0<z<1). We also quantify the utility of pairs in measuring merger bias. While pairs at very small separations will reliably merge, these constitute a small fraction of the total merger population. Thus, pairs do not provide a reliable direct proxy to the total merger population. We do find an intriguing universality in the relation between close pairs and mergers, which in principle could allow for an estimate of the statistical merger rate from the pair fraction within a scaled separation, but including the effects of redshift space distortions strongly degrades this relation. We find similar behavior for galaxy-mass halos, making our results applicable to field galaxy mergers at high redshift. We investigate how the halo merger rate can be statistically described by the halo mass function via the merger kernel (coagulation), finding an interesting environmental dependence of merging: halos within the mass resolution of our simulations merge less efficiently in overdense environments. Specifically, halo pairs with separations less than a few Mpc/h are more likely to merge in underdense environments; at larger separations, pairs are more likely to merge in overdense environments.
Submitted 28 April, 2008; v1 submitted 4 June, 2007;
originally announced June 2007.
-
Simulations of Baryon Oscillations
Authors:
Eric Huff,
A. E. Schulz,
Martin White,
David J. Schlegel,
Michael S. Warren
Abstract:
The coupling of photons and baryons by Thomson scattering in the early universe imprints features in both the Cosmic Microwave Background (CMB) and matter power spectra. The former have been used to constrain a host of cosmological parameters, the latter have the potential to strongly constrain the expansion history of the universe and dark energy. Key to this program is the means to localize the primordial features in observations of galaxy spectra which necessarily involve galaxy bias, non-linear evolution and redshift space distortions. We present calculations, based on mock catalogs produced from high-resolution N-body simulations, which show the range of behaviors we might expect of galaxies in the real universe. We investigate physically motivated fitting forms which include the effects of non-linearity, galaxy bias and redshift space distortions and discuss methods for analysis of upcoming data. In agreement with earlier work, we find that a survey of several Gpc$^3$ would constrain the sound horizon at z~1 to about 1%.
Submitted 19 July, 2006; v1 submitted 4 July, 2006;
originally announced July 2006.
-
Explicit cross-sections of singly generated group actions
Authors:
David Larson,
Eckart Schulz,
Darrin Speegle,
Keith Taylor
Abstract:
We consider two classes of actions on $\mathbb{R}^n$ - one continuous and one discrete. For matrices of the form $A = e^B$ with $B \in M_n(\mathbb{R})$, we consider the action given by $γ\to γA^t$. We characterize the matrices $A$ for which there is a cross-section for this action. The discrete action we consider is given by $γ\to γA^k$, where $A\in GL_n(\mathbb{R})$. We characterize the matrices $A$ for which there exists a cross-section for this action as well. We also characterize those $A$ for which there exist special types of cross-sections; namely, bounded cross-sections and finite measure cross-sections. Explicit examples of cross-sections are provided for each of the cases in which cross-sections exist. Finally, these explicit cross-sections are used to characterize those matrices for which there exist MSF wavelets with infinitely many wavelet functions. Along the way, we generalize a well-known aspect of the theory of shift-invariant spaces to shift-invariant spaces with infinitely many generators.
Submitted 28 April, 2006;
originally announced April 2006.
-
Scale-dependent bias and the halo model
Authors:
A. E. Schulz,
Martin White
Abstract:
We use a simplified version of the halo model with a power law power spectrum to study scale dependence in galaxy bias at the very large scales relevant to baryon oscillations. In addition to providing a useful pedagogical explanation of the scale dependence of galaxy bias, the model provides an analytic tool for studying how changes in the Halo Occupation Distribution (HOD) impact the scale dependence of galaxy bias on scales between 10 and 1000 Mpc/h, which is useful for interpreting the results of complex N-body simulations. We find that changing the mean number of galaxies per halo of a given mass will change the scale dependence of the bias, but that changing the way the galaxies are distributed within the halo has a smaller effect on the scale dependence of bias at large scales. We use the model to explain the decay in amplitude of the baryon oscillations as k increases, and generalize the model to make predictions about scale dependent galaxy bias when redshift space distortions are introduced.
Submitted 18 November, 2005; v1 submitted 4 October, 2005;
originally announced October 2005.
-
Characterizing the Shapes of Galaxy Clusters Using Moments of the Gravitational Lensing Shear
Authors:
A. E. Schulz,
Joseph Hennawi,
Martin White
Abstract:
We explore the use of the tangential component of weak lensing shear to characterize the ellipticity of clusters of galaxies. We introduce an ellipticity estimator, and quantify its properties for isolated clusters from LCDM N-body simulations. We compare the N-body results to results from smooth analytic models. The expected distribution of the estimator for mock observations is presented, and we show how this distribution is impacted by contaminants such as noise, line of sight projections, and misalignment of the central galaxy used to determine the orientation of the triaxial halo. We examine the radial profile of the estimator and discuss tradeoffs in the observational strategy to determine cluster shape.
Submitted 7 November, 2005; v1 submitted 3 August, 2005;
originally announced August 2005.