-
State Soup: In-Context Skill Learning, Retrieval and Mixing
Authors:
Maciej Pióro,
Maciej Wołczyk,
Razvan Pascanu,
Johannes von Oswald,
João Sacramento
Abstract:
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined, exploiting the linearity of recurrence. We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation methods suffice to improve next-token perplexity as well as downstream in-context learning task performance.
Submitted 12 June, 2024;
originally announced June 2024.
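The state-mixing idea in the abstract above relies on the recurrence being linear in its state. A minimal sketch of why that helps, assuming a toy diagonal recurrence h_t = a * h_{t-1} + b * x_t (the names run, mix, a and b are hypothetical, and this is not Mamba's actual update): when the mixing weights sum to one, continuing from a mixed state gives the same result as mixing the separately continued states.

```python
# Toy diagonal linear recurrence: h_t = a * h_{t-1} + b * x_t (elementwise).
# Illustrative stand-in only; not the state update used by Mamba.

def run(h0, xs, a, b):
    """Process inputs xs starting from state h0 and return the final state."""
    h = list(h0)
    for x in xs:
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
    return h

def mix(states, weights):
    """Linearly combine stored states with the given weights."""
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]

a = [0.5, 0.25]
b = [1.0, 2.0]
zero = [0.0, 0.0]

# Two stored "skills": states obtained by processing two different contexts.
state_a = run(zero, [1.0, 2.0], a, b)
state_b = run(zero, [4.0, 0.0], a, b)

query = [1.0, 3.0]
weights = [0.5, 0.5]  # weights sum to one, which the equality below relies on

# Continue from the mixed state ...
from_mixed = run(mix([state_a, state_b], weights), query, a, b)
# ... versus mixing the results of continuing from each state separately.
mixed_after = mix([run(state_a, query, a, b), run(state_b, query, a, b)], weights)
```

In this toy setting from_mixed and mixed_after coincide; whether mixing improves perplexity in a real pretrained model is the empirical question the paper studies.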
-
Large Language Models as Planning Domain Generators
Authors:
James Oswald,
Kavitha Srinivas,
Harsha Kokel,
Junkyu Lee,
Michael Katz,
Shirin Sohrabi
Abstract:
Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.
Submitted 2 April, 2024;
originally announced May 2024.
-
Linear Transformers are Versatile In-Context Learners
Authors:
Max Vladymyrov,
Johannes von Oswald,
Mark Sandler,
Rong Ge
Abstract:
Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step. However, their capability in handling more complex problems remains unexplored. In this paper, we prove that any linear transformer maintains an implicit linear model and can be interpreted as performing a variant of preconditioned gradient descent. We also investigate the use of linear transformers in a challenging scenario where the training data is corrupted with different levels of noise. Remarkably, we demonstrate that for this problem linear transformers discover an intricate and highly effective optimization algorithm, surpassing or matching in performance many reasonable baselines. We reverse-engineer this algorithm and show that it is a novel approach incorporating momentum and adaptive rescaling based on noise levels. Our findings show that even linear transformers possess the surprising ability to discover sophisticated optimization strategies.
Submitted 21 February, 2024;
originally announced February 2024.
-
Discovering modular solutions that generalize compositionally
Authors:
Simon Schug,
Seijin Kobayashi,
Yassir Akram,
Maciej Wołczyk,
Alexandra Proca,
Johannes von Oswald,
Razvan Pascanu,
João Sacramento,
Angelika Steger
Abstract:
Many complex tasks can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. It therefore seems natural to make models more modular to help capture the compositional nature of many tasks. However, it is unclear under which circumstances modular systems can discover hidden compositional structure. To shed light on this question, we study a teacher-student setting with a modular teacher where we have full control over the composition of ground truth modules. This allows us to relate the problem of compositional generalization to that of identification of the underlying modules. In particular we study modularity in hypernetworks representing a general class of multiplicative interactions. We show theoretically that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations. We further demonstrate empirically that under the theoretically identified conditions, meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments.
Submitted 25 March, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Parallel Verification of Natural Deduction Proof Graphs
Authors:
James T. Oswald,
Brandon Rozek
Abstract:
Graph-based interactive theorem provers offer a visual representation of proofs, explicitly representing the dependencies and inferences between each of the proof steps in a graph or hypergraph format. The number and complexity of these dependency links can determine how long it takes to verify the validity of the entire proof. Towards this end, we present a set of parallel algorithms for the formal verification of graph-based natural-deduction (ND) style proofs. We introduce a definition of layering that captures dependencies between the proof steps (nodes). Nodes in each layer can then be verified in parallel as long as prior layers have been verified. To evaluate the performance of our algorithms on proof graphs, we propose a framework for finding the performance bounds and patterns using directed acyclic network topologies (DANTs). This framework allows us to create concrete instances of DANTs for empirical evaluation of our algorithms. With this, we compare our set of parallel algorithms against a serial implementation with two experiments: one scaling both the problem size and the other scaling the number of threads. Our findings show that parallelization results in improved verification performance for certain DANT instances. We also show that our algorithms scale for certain DANT instances with respect to the number of threads.
Submitted 17 November, 2023;
originally announced November 2023.
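The layering idea in the abstract above can be sketched as follows. The abstract does not give the paper's exact definition, so this assumes one common choice: a node's layer is its longest-path depth in the dependency graph. The function name layers and the example proof graph are hypothetical.

```python
def layers(deps):
    """Group the nodes of a DAG into layers.

    deps maps each node to the list of nodes it depends on. A node's layer
    is one more than the deepest layer among its dependencies, so every
    dependency of a node sits in a strictly earlier layer.
    """
    depth = {}

    def depth_of(node):
        if node not in depth:
            depth[node] = 1 + max((depth_of(d) for d in deps[node]), default=-1)
        return depth[node]

    for node in deps:
        depth_of(node)

    grouped = {}
    for node, d in depth.items():
        grouped.setdefault(d, []).append(node)
    return [sorted(grouped[d]) for d in sorted(grouped)]

# Hypothetical proof graph: premises A and B, two derived steps, one goal.
proof = {
    "A": [],
    "B": [],
    "A_and_B": ["A", "B"],
    "B_or_C": ["B"],
    "goal": ["A_and_B", "B_or_C"],
}
result = layers(proof)
```

Nodes within one inner list have no dependencies on each other, so they could be handed to separate workers once all earlier layers have been checked.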
-
Uncovering mesa-optimization algorithms in Transformers
Authors:
Johannes von Oswald,
Eyvind Niklasson,
Maximilian Schlegel,
Seijin Kobayashi,
Nicolas Zucchet,
Nino Scherrer,
Nolan Miller,
Mark Sandler,
Blaise Agüera y Arcas,
Max Vladymyrov,
Razvan Pascanu,
João Sacramento
Abstract:
Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering underlying gradient-based mesa-optimization algorithms driving the generation of predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can lead to improved performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers.
Submitted 11 September, 2023;
originally announced September 2023.
-
Gated recurrent neural networks discover attention
Authors:
Nicolas Zucchet,
Seijin Kobayashi,
Yassir Akram,
Johannes von Oswald,
Maxime Larcher,
Angelika Steger,
João Sacramento
Abstract:
Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear recurrent layers interconnected by feedforward paths with multiplicative gating. Here, we show how RNNs equipped with these two design elements can exactly implement (linear) self-attention, the main building block of Transformers. By reverse-engineering a set of trained RNNs, we find that gradient descent in practice discovers our construction. In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers. Our findings highlight the importance of multiplicative interactions in neural networks and suggest that certain RNNs might be unexpectedly implementing attention under the hood.
Submitted 7 February, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
-
KDE-Based Coarse-graining of Semicrystalline Systems with Correlated Three-body Intramolecular Interaction
Authors:
Jianlan Ye,
Vipin Agrawal,
Minghao Liu,
Jing Hu,
Jay Oswald
Abstract:
We present an extension to the iterative Boltzmann inversion method to generate coarse-grained models with three-body intramolecular potentials that can reproduce correlations in structural distribution functions. The coarse-grained structural distribution functions are computed using kernel density estimates to produce analytically differentiable distribution functions with controllable smoothening via the kernel bandwidth parameters. Bicubic interpolation is used to accurately interpolate the three-body potentials trained by the method. To demonstrate this new approach, a coarse-grained model of polyethylene is constructed in which each bead represents an ethylene monomer. The resulting model reproduces the radial density function as well as the joint probability distribution of bond-length and bond-angles sampled from target atomistic simulations with only a 10% increase in the computational cost compared to models with independent bond-length and bond-angle potentials. Analysis of the predicted crystallization kinetics of the model developed by the new approach reveals that the bandwidth parameters can be tuned to accelerate the modeling of polymer crystallization. Specifically, computing target RDF with larger bandwidth slows down the secondary crystallization, and increasing the bandwidth in $θ$-direction of bond-length and bond-angle distribution reduces the primary crystallization rate.
Submitted 23 November, 2023; v1 submitted 8 July, 2023;
originally announced July 2023.
-
Nanoscale electronic transport at graphene/pentacene van der Waals interface
Authors:
Michel Daher Mansour,
Jacopo Oswald,
Davide Beretta,
Michael Stiefe,
Roman Furrer,
Michel Calame,
Dominique Vuillaume
Abstract:
We report a study on the relationship between structure and electron transport properties of nanoscale graphene/pentacene interfaces. We fabricated graphene/pentacene interfaces from 10-30 nm thick needle-like pentacene nanostructures down to two-three layers (2L-3L) dendritic pentacene islands, and we measured their electron transport properties by conductive atomic force microscopy (C-AFM). The energy barrier at the interfaces, i.e. the energy position of the pentacene highest occupied molecular orbital (HOMO) with respect to the Fermi energy of the graphene and the C-AFM metal tip, are determined and discussed with the appropriate electron transport model (double Schottky diode model and Landauer-Buttiker model, respectively) taking into account the voltage-dependent charge doping of graphene. In both types of samples, the energy barrier at the graphene/pentacene interface is slightly larger than that at the pentacene/metal tip interface, resulting in 0.47-0.55 eV and 0.21-0.34 eV, respectively, for the 10-30 nm thick needle-like pentacene islands, and in 0.92-1.44 eV and 0.67-1.05 eV, respectively, for the 2L-3L thick dendritic pentacene nanostructures. We attribute this difference to the molecular organization details of the pentacene/graphene heterostructures, with pentacene molecules lying flat on the graphene in the needle-like pentacene nanostructures, while standing upright in 2L-3L dendritic islands, as observed from Raman spectroscopy.
Submitted 4 May, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Transformers learn in-context by gradient descent
Authors:
Johannes von Oswald,
Eyvind Niklasson,
Ettore Randazzo,
João Sacramento,
Alexander Mordvintsev,
Andrey Zhmoginov,
Max Vladymyrov
Abstract:
At present, the mechanisms of in-context learning in Transformers are not well understood and remain mostly an intuition. In this paper, we suggest that training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient-descent (GD) on a regression loss. Motivated by that construction, we show empirically that when training self-attention-only Transformers on simple regression tasks either the models learned by GD and Transformers show great similarity or, remarkably, the weights found by optimization match the construction. Thus we show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass. This allows us, at least in the domain of regression problems, to mechanistically understand the inner workings of in-context learning in optimized Transformers. Building on this insight, we furthermore identify how Transformers surpass the performance of plain gradient descent by learning an iterative curvature correction and learn linear models on deep data representations to solve non-linear regression tasks. Finally, we discuss intriguing parallels to a mechanism identified to be crucial for in-context learning termed induction-head (Olsson et al., 2022) and show how it could be understood as a specific case of in-context learning by gradient descent learning within Transformers. Code to reproduce the experiments can be found at https://github.com/google-research/self-organising-systems/tree/master/transformers_learn_icl_by_gd .
Submitted 31 May, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
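The equivalence mentioned in the abstract above can be checked numerically in a simplified setting. This sketch assumes scalar targets, an initial weight vector of zero, and a softmax-free attention readout with keys x_i, values y_i and the query as the test input. It is not the paper's weight construction, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def gd_prediction(xs, ys, x_query, lr):
    """Predict on x_query after one gradient step from w = 0 on the squared error."""
    n = len(xs)
    # Gradient of (1 / 2n) * sum((w.x_i - y_i)^2) at w = 0 is -(1/n) * sum(y_i * x_i).
    grad = [-sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(len(x_query))]
    w = [-lr * g for g in grad]
    return dot(w, x_query)

def linear_attention_prediction(xs, ys, x_query, lr):
    """Attention without softmax: sum of values y_i weighted by key-query products."""
    n = len(xs)
    return (lr / n) * sum(y * dot(x, x_query) for x, y in zip(xs, ys))

xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
ys = [2.0, -1.0, 0.5]
x_query = [3.0, 1.0]

gd = gd_prediction(xs, ys, x_query, lr=0.1)
attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
```

Both functions compute (lr / n) * sum_i y_i (x_i . x_query), so gd and attn agree up to floating-point rounding.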
-
Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
Authors:
Seijin Kobayashi,
Pau Vilimelis Aceituno,
Johannes von Oswald
Abstract:
Identifying unfamiliar inputs, also known as out-of-distribution (OOD) detection, is a crucial property of any decision making process. A simple and empirically validated technique is based on deep ensembles where the variance of predictions over different neural networks acts as a substitute for input uncertainty. Nevertheless, a theoretical understanding of the inductive biases leading to the performance of deep ensemble's uncertainty estimation is missing. To improve our description of their behavior, we study deep ensembles with large layer widths operating in simplified linear training regimes, in which the functions trained with gradient descent can be described by the neural tangent kernel. We identify two sources of noise, each inducing a distinct inductive bias in the predictive variance at initialization. We further show theoretically and empirically that both noise sources affect the predictive variance of non-linear deep ensembles in toy models and realistic settings after training. Finally, we propose practical ways to eliminate part of these noise sources leading to significant changes and improved OOD detection in trained deep ensembles.
Submitted 18 October, 2022;
originally announced October 2022.
-
Measuring Overfitting in Convolutional Neural Networks using Adversarial Perturbations and Label Noise
Authors:
Svetlana Pavlitskaya,
Joël Oswald,
J. Marius Zöllner
Abstract:
Although numerous methods to reduce the overfitting of convolutional neural networks (CNNs) exist, it is still not clear how to confidently measure the degree of overfitting. A metric reflecting the overfitting level might be, however, extremely helpful for the comparison of different architectures and for the evaluation of various techniques to tackle overfitting. Motivated by the fact that overfitted neural networks tend to rather memorize noise in the training data than generalize to unseen data, we examine how the training accuracy changes in the presence of increasing data perturbations and study the connection to overfitting. While previous work focused on label noise only, we examine a spectrum of techniques to inject noise into the training data, including adversarial perturbations and input corruptions. Based on this, we define two new metrics that can confidently distinguish between correct and overfitted models. For the evaluation, we derive a pool of models for which the overfitting behavior is known beforehand. To test the effect of various factors, we introduce several anti-overfitting measures in architectures based on VGG and ResNet and study their impact, including regularization techniques, training set size, and the number of parameters. Finally, we assess the applicability of the proposed metrics by measuring the overfitting degree of several CNN architectures outside of our model pool.
Submitted 27 September, 2022;
originally announced September 2022.
-
Random initialisations performing above chance and how to find them
Authors:
Frederik Benzing,
Simon Schug,
Robert Meier,
Johannes von Oswald,
Yassir Akram,
Nicolas Zucchet,
Laurence Aitchison,
Angelika Steger
Abstract:
Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al. recently conjectured that despite different initialisations, the solutions found by SGD lie in the same loss valley after taking into account the permutation invariance of neural networks. Concretely, they hypothesise that any two solutions found by SGD can be permuted such that the linear interpolation between their parameters forms a path without significant increases in loss. Here, we use a simple but powerful algorithm to find such permutations that allows us to obtain direct empirical evidence that the hypothesis is true in fully connected networks. Strikingly, we find that two networks already live in the same loss valley at the time of initialisation and averaging their random, but suitably permuted initialisation performs significantly above chance. In contrast, for convolutional architectures, our evidence suggests that the hypothesis does not hold. Especially in a large learning rate regime, SGD seems to discover diverse modes.
Submitted 7 November, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
The least-control principle for local learning at equilibrium
Authors:
Alexander Meulemans,
Nicolas Zucchet,
Seijin Kobayashi,
Johannes von Oswald,
João Sacramento
Abstract:
Equilibrium systems are a powerful way to express neural computations. As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, or meta-learning. Here, we present a new principle for learning such systems with a temporally- and spatially-local rule. Our principle casts learning as a least-control problem, where we first introduce an optimal controller to lead the system towards a solution state, and then define learning as reducing the amount of control needed to reach such a state. We show that incorporating learning signals within a dynamics as an optimal control enables transmitting activity-dependent credit assignment information, avoids storing intermediate states in memory, and does not rely on infinitesimal learning signals. In practice, our principle leads to strong performance matching that of leading gradient-based learning methods when applied to an array of problems involving recurrent neural networks and meta-learning. Our results shed light on how the brain might learn and offer new ways of approaching a broad class of machine learning problems.
Submitted 31 October, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Learning where to learn: Gradient sparsity in meta and continual learning
Authors:
Johannes von Oswald,
Dominic Zhao,
Seijin Kobayashi,
Simon Schug,
Massimo Caccia,
Nicolas Zucchet,
João Sacramento
Abstract:
Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis. This selective sparsity results in better generalization and less interference in a range of few-shot and continual learning problems. Moreover, we find that sparse learning also emerges in a more expressive model where learning rates are meta-learned. Our results shed light on an ongoing debate on whether meta-learning can discover adaptable features and suggest that learning by sparse gradient descent is a powerful inductive bias for meta-learning systems.
Submitted 27 October, 2021;
originally announced October 2021.
-
Revision of the edge channel picture for the integer quantum Hall effect
Authors:
Josef Oswald
Abstract:
State-of-the-art computing now opens a new window to the integer quantum Hall effect (IQHE) regime, which enforces a major revision of the common knowledge accumulated so far. In our record-breaking application of the Hartree-Fock method we use up to 3000 electrons distributed over up to 5000 states for an almost macroscopic system size of 1000x1000 nm. In particular, the formation of compressible and incompressible edge stripes turns out to develop essentially differently from the common picture used so far. Contrary to the theory of Chklovskii, Shklovskii and Glazman (CSG), the narrow channels, as assumed by the early models of the IQHE, do not widen into wide compressible stripes. Instead, the wide compressible stripes of CSG transform into a mixture of clusters of full and empty spin-split LLs, while the cluster boundaries create a network of still narrow quantum channels sitting on top of the wide compressible stripes. Against this background, the early models based on narrow edge channels do not suffer from neglecting electron-electron interaction as falsely stated in the past. Quite the opposite: in contrast to the common belief, our modelling demonstrates that the IQHE regime also carries the hallmark of many-body physics, which stabilizes narrow edge channels even in the presence of electron-electron interaction.
Submitted 30 May, 2022; v1 submitted 23 June, 2021;
originally announced June 2021.
-
A contrastive rule for meta-learning
Authors:
Nicolas Zucchet,
Simon Schug,
Johannes von Oswald,
Dominic Zhao,
João Sacramento
Abstract:
Humans and other animals are capable of improving their learning performance as they solve related tasks from a given problem domain, to the point of being able to learn from extremely limited data. While synaptic plasticity is generically thought to underlie learning in the brain, the precise neural and synaptic mechanisms by which learning processes improve through experience are not well understood. Here, we present a general-purpose, biologically-plausible meta-learning rule which estimates gradients with respect to the parameters of an underlying learning algorithm by simply running it twice. Our rule may be understood as a generalization of contrastive Hebbian learning to meta-learning and notably, it neither requires computing second derivatives nor going backwards in time, two characteristic features of previous gradient-based methods that are hard to conceive in physical neural circuits. We demonstrate the generality of our rule by applying it to two distinct models: a complex synapse with internal states which consolidate task-shared information, and a dual-system architecture in which a primary network is rapidly modulated by another one to learn the specifics of each task. For both models, our meta-learning rule matches or outperforms reference algorithms on a wide range of benchmark problems, while only using information presumed to be locally available at neurons and synapses. We corroborate these findings with a theoretical analysis of the gradient estimation error incurred by our rule.
Submitted 3 October, 2022; v1 submitted 4 April, 2021;
originally announced April 2021.
-
Posterior Meta-Replay for Continual Learning
Authors:
Christian Henning,
Maria R. Cervera,
Francesco D'Angelo,
Johannes von Oswald,
Regina Traber,
Benjamin Ehret,
Seijin Kobayashi,
Benjamin F. Grewe,
João Sacramento
Abstract:
Learning a sequence of tasks without access to i.i.d. observations is a widely studied form of continual learning (CL) that remains challenging. In principle, Bayesian learning directly applies to this setting, since recursive and one-off Bayesian updates yield the same result. In practice, however, recursive updating often leads to poor trade-off solutions across tasks because approximate inference is necessary for most models of interest. Here, we describe an alternative Bayesian approach where task-conditioned parameter distributions are continually inferred from data. We offer a practical deep learning implementation of our framework based on probabilistic task-conditioned hypernetworks, an approach we term posterior meta-replay. Experiments on standard benchmarks show that our probabilistic hypernetworks compress sequences of posterior parameter distributions with virtually no forgetting. We obtain considerable performance gains compared to existing Bayesian CL methods, and identify task inference as our major limiting factor. This limitation has several causes that are independent of the considered sequential setting, opening up new avenues for progress in CL.
Submitted 21 October, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
The microscopic picture of the integer quantum Hall regime
Authors:
Rudolf A. Römer,
Josef Oswald
Abstract:
Computer modelling of the integer quantum Hall effect based on self-consistent Hartree-Fock calculations has now reached an astonishing level of maturity. Spatially-resolved studies of the electron density at near macroscopic system sizes of up to $\sim 1\ \mu\mathrm{m}^2$ reveal self-organized clusters of locally fully filled and locally fully depleted Landau levels depending on which spin polarization is favoured. The behaviour results, for strong disorder, in an exchange-interaction induced $g$-factor enhancement and, ultimately, gives rise to narrow transport channels, including the celebrated narrow edge channels. For weak disorder, we find that bubble and stripe phases emerge with characteristics that predict experimental results very well. Hence the HF approach has become a convenient numerical basis to quantitatively study the quantum Hall effects, superseding previous more qualitative approaches.
Submitted 2 February, 2021;
originally announced February 2021.
-
Sondheimer oscillations as a probe of non-ohmic flow in type-II Weyl semimetal WP$_2$
Authors:
Maarten R. van Delft,
Yaxian Wang,
Carsten Putzke,
Jacopo Oswald,
Georgios Varnavides,
Christina A. C. Garcia,
Chunyu Guo,
Heinz Schmid,
Vicky Süss,
Horst Borrmann,
Jonas Diaz,
Yan Sun,
Claudia Felser,
Bernd Gotsmann,
Prineha Narang,
Philip J. W. Moll
Abstract:
As conductors in electronic applications shrink, microscopic conduction processes lead to strong deviations from Ohm's law. Depending on the length scales of momentum conserving ($l_{MC}$) and relaxing ($l_{MR}$) electron scattering, and the device size ($d$), current flows may shift from ohmic to ballistic to hydrodynamic regimes and more exotic mixtures thereof. So far, an in situ, in-operando methodology to obtain these parameters self-consistently within a micro/nanodevice, and thereby identify its conduction regime, is critically lacking. In this context, we exploit Sondheimer oscillations, semi-classical magnetoresistance oscillations due to helical electronic motion, as a method to obtain $l_{MR}$ in micro-devices even when $l_{MR}\gg d$. This gives information on the bulk $l_{MR}$ complementary to quantum oscillations, which are sensitive to all scattering processes. We extract $l_{MR}$ from the Sondheimer amplitude in the topological semi-metal WP$_2$, at elevated temperatures up to $T\sim 50$~K, in a range most relevant for hydrodynamic transport phenomena. Our data on micrometer-sized devices are in excellent agreement with experimental reports of the large bulk $l_{MR}$ and thus confirm that WP$_2$ can be microfabricated without degradation. Indeed, the measured scattering rates match well with those of theoretically predicted electron-phonon scattering, thus supporting the notion of strong momentum exchange between electrons and phonons in WP$_2$ at these temperatures. These results conclusively establish Sondheimer oscillations as a quantitative probe of $l_{MR}$ in micro-devices in studying non-ohmic electron flow.
Submitted 15 December, 2020;
originally announced December 2020.
-
Neural networks with late-phase weights
Authors:
Johannes von Oswald,
Seijin Kobayashi,
Alexander Meulemans,
Christian Henning,
Benjamin F. Grewe,
João Sacramento
Abstract:
The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space. To avoid incurring increased computational costs, we investigate a family of low-dimensional late-phase weight models which interact multiplicatively with the remaining parameters. Our results show that augmenting standard models with late-phase weights improves generalization in established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented with a theoretical analysis of a noisy quadratic problem which provides a simplified picture of the late phases of neural network learning.
Submitted 11 April, 2022; v1 submitted 25 July, 2020;
originally announced July 2020.
-
Continual Learning in Recurrent Neural Networks
Authors:
Benjamin Ehret,
Christian Henning,
Maria R. Cervera,
Alexander Meulemans,
Johannes von Oswald,
Benjamin F. Grewe
Abstract:
While a diverse collection of continual learning (CL) methods has been proposed to prevent catastrophic forgetting, a thorough investigation of their effectiveness for processing sequential data with recurrent neural networks (RNNs) is lacking. Here, we provide the first comprehensive evaluation of established CL methods on a variety of sequential data benchmarks. Specifically, we shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. In contrast to feedforward networks, RNNs iteratively reuse a shared set of weights and require working memory to process input samples. We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements, which lead to an increased need for stability at the cost of decreased plasticity for learning subsequent tasks. We additionally provide theoretical arguments supporting this interpretation by studying linear RNNs. Our study shows that established CL methods can be successfully ported to the recurrent case, and that a recent regularization approach based on hypernetworks outperforms weight-importance methods, thus emerging as a promising candidate for CL in RNNs. Overall, we provide insights on the differences between CL in feedforward networks and RNNs, while guiding towards effective solutions to tackle CL on sequential data.
Submitted 10 March, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Microscopic details of stripes and bubbles in the quantum Hall regime
Authors:
Josef Oswald,
Rudolf A. Römer
Abstract:
We use a fully self-consistent laterally resolved Hartree-Fock approximation for numerically addressing the electron configurations at higher Landau levels in the quantum Hall regime for near-macroscopic sample sizes. Our results give microscopic details of stripe- and bubble-like charge density modulations and show how these emerge depending on the filling factor. We find that there exists a region at the boundaries of the stripes and bubbles with a density modulation that corresponds to a filling factor around half filling. The microscopic details of these boundary regions determine the geometrical boundary conditions for aligning the charge density modulation either as stripes or bubbles. Transport is modelled using a non-equilibrium network model giving a pronounced anisotropy in direction of the injected current in the stripe regime close to half filling. We obtain a stripe period of 2.9 cyclotron radii. Our results indicate the dominance of many particle physics in the integer quantum Hall regime and provide an intuitive understanding of its consequences in strong magnetic fields.
Submitted 17 July, 2020; v1 submitted 21 January, 2020;
originally announced January 2020.
-
Continual learning with hypernetworks
Authors:
Johannes von Oswald,
Christian Henning,
Benjamin F. Grewe,
João Sacramento
Abstract:
Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key feature: instead of recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing task-specific weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving state-of-the-art performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display a very large capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, when the number of trainable hypernetwork weights is comparable or smaller than target network size. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
Submitted 11 April, 2022; v1 submitted 3 June, 2019;
originally announced June 2019.
-
Manifestation of many-body interactions in the integer quantum Hall effect regime
Authors:
Josef Oswald,
Rudolf A Römer
Abstract:
We use the self-consistent Hartree-Fock approximation for numerically addressing the integer quantum Hall (IQH) regime in terms of many-body physics at higher Landau levels (LL). The results exhibit a strong tendency to avoid the simultaneous existence of partly filled spin-up and spin-down LLs. Partly filled LLs appear as a mixture of coexisting regions of full and empty LLs. We obtain edge stripes with approximately constant filling factor $ν$ close to half-odd filling at the boundaries between the regions of full and empty LLs, which we explain in terms of the $g$-factor enhancement as a function of a locally varying $ν$ across the compressible stripes. The many-particle interactions follow a behaviour as it would result from applying Hund's rule for the occupation of the spin split LLs. The screening of the disorder and edge potential appears significantly reduced as compared to screening based on a Thomas-Fermi approximation. For addressing carrier transport, we use a non-equilibrium network model (NNM) that handles the lateral distribution of the experimentally injected non-equilibrium chemical potentials $μ$.
Submitted 5 July, 2017;
originally announced July 2017.
-
Exchange-mediated dynamic screening in the integer quantum Hall regime
Authors:
Josef Oswald,
Rudolf A. Römer
Abstract:
We study many-body interaction effects in the spatially-resolved filling factor ($ν$) distribution for higher Landau levels (LLs) via self-consistent Hartree-Fock simulations in the integer quantum Hall (IQH) regime. Our results indicate a strong, interaction-induced tendency to avoid the simultaneous existence of partially filled spin-up and spin-down LLs. Rather, we find that such partially filled LLs consist of coexisting regions of full and empty LLs. At the boundaries between the regions of full and empty LLs, we observe edge stripes of nearly constant $ν$ close to half-odd filling. This suggests that the exchange interaction induces a behavior similar to a Hund's rule for the occupation of the spin split LLs. The screening of the disorder and edge potential appears significantly reduced as compared to static Thomas-Fermi screening. Our results are consistent with a local, lateral $ν$ dependence of the exchange-enhanced spin splitting. Hence, on quantum-coherent length scales as probed here, the electron system of the IQH effect behaves similar to a non-interacting single particle system - not because of the absence, but rather due to the dominance of many-body effects.
Submitted 22 May, 2017;
originally announced May 2017.
-
Microscopic Details of the Integer Quantum Hall Effect in an Anti-Hall Bar
Authors:
Christoph Uiberacker,
Christian Stecher,
Josef Oswald
Abstract:
Due to the lack of simulation tools that take into account the actual geometry of complicated quantum Hall samples there are lots of experiments that are not yet fully understood. Already some years ago R. G. Mani recorded a shift of the Hall resistance transitions to lower magnetic fields in samples of a Hall bar with embedded anti-Hall bar by using partial gating. We use a Nonequilibrium Network Model (NNM) to simulate this geometry and find qualitative agreement. Fitting the simulated resistance curves to the experimental results we can not only determine the carrier concentration but also obtain an estimate of the screened gating potential and especially the amplitude and length scale of potential fluctuations from charge inhomogeneities which are not easily accessible by experiment.
Submitted 23 December, 2011;
originally announced December 2011.
-
A systematic study of non-ideal contacts in integer quantum Hall systems
Authors:
Christoph Uiberacker,
Christian Stecher,
Josef Oswald
Abstract:
In the present article we investigate the influence of the contact region on the distribution of the chemical potential in integer quantum Hall samples, as well as the longitudinal and Hall resistance as a function of the magnetic field. First we use a standard quantum Hall sample geometry and analyse the influence of the length of the leads where current enters/leaves the sample and the ratio of the contact width to the width of these leads. Furthermore we investigate potential barriers in the current injecting leads and the measurement arms in order to simulate non-ideal contacts. Second we simulate nonlocal quantum Hall samples with applied gating voltage at the metallic contacts. For such samples it has been found experimentally that both the longitudinal and Hall resistance as a function of the magnetic field can change significantly. Using the nonequilibrium network model we are able to reproduce most qualitative features of the experiments.
Submitted 18 December, 2010;
originally announced December 2010.
-
Comparatively High In-Field Critical Current in Type-II Superconductors from Heterogeneous Columnar Pins: A Molecular Dynamics Study
Authors:
J. P. Rodriguez,
E. J. Oswald
Abstract:
Theoretical work predicts that the strong dependence of $T_c$ on pure shear strain within the a-b plane of optimally doped YBa$_2$Cu$_3$O$_{7-\delta}$ results in heterogeneous columnar pins of vortex lines about dislocation lines and about nano-column inclusions aligned parallel to the c axis. The critical current of a rigid vortex lattice driven by the Lorentz force in the presence of such clusters of pin/antipin lines is computed using two-dimensional (2D) collective pinning theory and by numerical simulation of the corresponding 2D vortex dynamics. Both theory and computer calculation find that the antipin component of the heterogeneous columnar pins contributes substantially to the net in-field critical current.
Submitted 5 October, 2009;
originally announced October 2009.
-
Optical Characterisation of MOVPE Grown Vertically Correlated InAs/GaAs Quantum Dots
Authors:
P. Hazdra,
J. Voves,
J. Oswald,
K. Kuldova,
A. Hospodkova,
E. Hulicius,
J. Pangrac
Abstract:
Structures with self-organised InAs quantum dots in a GaAs matrix were grown by the low pressure metal-organic vapour phase epitaxy (LP-MOVPE) technique. Photoluminescence and photomodulated reflectance spectroscopy were used as the main characterisation methods for the growth optimisation. Results show that photoreflectance spectroscopy is an excellent tool for characterisation of the wetting layers of QD structures (thickness and composition) and for identification of spacers in vertically stacked QD structures.
Submitted 10 August, 2007;
originally announced August 2007.
-
Substrate temperature changes during MBE growth of GaMnAs
Authors:
V. Novak,
K. Olejnik,
M. Cukr,
L. Smrcka,
Z. Remes,
J. Oswald
Abstract:
A remarkably large increase of the substrate temperature during the low-temperature MBE growth of GaMnAs layers is observed by means of band gap spectroscopy. It is explained and simulated in terms of changes in the absorption/emission characteristics of the growing layer. Options for damping the temperature variation are discussed.
Submitted 19 April, 2007;
originally announced April 2007.
-
A new representation of the bulk current in the quantum Hall effect regime
Authors:
Josef Oswald
Abstract:
In preceding papers a Landauer-Büttiker type representation of bulk current transport has been successfully used for the numerical simulation of the magneto transport of 2-dimensional electron systems in the high magnetic field regime. In this paper it is demonstrated that this representation is in full agreement with a treatment of the bulk current transport as a tunneling process between magnetic bound states. Additionally we find a correspondence between our network representation and the bulk current picture in terms of mixed phases mapped on a checkerboard: At half filled Landau level (LL) coupled droplets of a quantum Hall (QH) liquid phase and coupled droplets of an insulator phase exist at the same time, with each of them occupying half of the sample area. Removing a single electron from such a QH liquid droplet at half filling completes the QH transition to the next higher QH plateau. Adding a single electron to such a droplet at half filling completes the QH transition to the previous lower QH plateau. As a consequence, the sharpness of the QH plateau transitions on the magnetic field axis depends on the typical size of the droplets, which can be understood as a measure of the disorder in the sample.
Submitted 19 November, 2004;
originally announced November 2004.
-
Comment on ''Lack of Destructive Interference of Landau Edge States in the Quantum Hall Regime''
Authors:
J. Oswald
Abstract:
This is a comment on a paper published by J.E. Mueller in Phys. Rev. Lett. 72, 2616 (1994).
Submitted 19 July, 1997;
originally announced July 1997.
-
Phasecoherent Transport in PbTe Wide Parabolic Quantum Wells
Authors:
J. Oswald,
G. Span,
A. Homer,
G. Heigl,
P. Ganitzer,
D. K. Maude,
J. C. Portal
Abstract:
Conductance fluctuations have been observed in macroscopic, quasi 3D PbTe wide quantum wells. A significant increase of the correlation field occurs in a temperature range from 40 mK to 1.2 K. At the same time the fluctuation amplitude stays near e^2/h although the lateral sample size is two orders of magnitude larger than any typical length scale of diffusive electron transport. We interpret this behavior in terms of phase-coherent electron transport, which takes advantage of a dramatic enhancement of the phase-coherence length of electrons in edge channels and edge channel loops in the bulk region.
Submitted 18 July, 1997;
originally announced July 1997.
-
Edge Channel Dominated Magnetotransport in PbTe Wide Parabolic Quantum Wells
Authors:
J. Oswald,
G. Span,
A. Homer,
G. Heigl,
P. Ganitzer,
D. K. Maude,
J. C. Portal
Abstract:
In PbTe wide parabolic quantum wells (WPQW) a plateau-like structure is observed in the Hall resistance, which corresponds to the Shubnikov-de Haas oscillations in the same manner as known from the quantum Hall effect. At the same time a non-local signal is observed which corresponds to the structure in Rxx and Rxy. We find a striking correspondence between a standard quantum Hall system and this quasi 3D WPQW system.
Submitted 18 July, 1997;
originally announced July 1997.
-
Novel non-local behaviour of quasi-3D Wide Quantum Wells
Authors:
J. Oswald,
G. Span
Abstract:
We investigate the high magnetic field regime of wide quantum wells (WQW) for the case of a many valley host semiconductor. The complete system is described within a modified Landauer-Buettiker formalism and we demonstrate that a parallel contribution of two electron systems in different valleys of the band structure can lead to an edge channel related non-local behaviour even in the 3D-regime. From the obtained general result we derive also a simplified model which applies for the case of much different dissipation. It represents the most dissipative system by an Ohmic resistor network and the less dissipative system by an EC-system.
Submitted 16 July, 1997;
originally announced July 1997.
-
Anomalous magnetotransport in wide quantum wells
Authors:
J. Oswald,
G. Span,
A. Homer,
G. Heigl,
P. Ganitzer,
D. K. Maude,
J. C. Portal
Abstract:
We present magneto transport experiments of quasi 3D PbTe wide quantum wells. A plateau-like structure in the Hall resistance is observed, which corresponds to the Shubnikov de Haas oscillations in the same manner as known from the quantum Hall effect. The onsets of plateaux in Rxy do not correspond to 2D filling factors but coincide with the occupation of 3D (bulk-) Landau levels. At the same time a non-local signal is observed which corresponds to the structure in Rxx and Rxy and fulfils exactly the Onsager-Casimir relation (Rij,kl(B) = Rkl,ij(-B)). We explain the behaviour in terms of edge channel transport which is controlled by a permanent backscattering across a system of "percolative EC - loops" in the bulk region. Long range potential fluctuations with an amplitude of the order of the subband splitting are explained to play an essential role in this electron system.
Submitted 16 July, 1997;
originally announced July 1997.
-
Non Local Transport in PbTe Wide Parabolic Quantum Wells
Authors:
J. Oswald,
G. Heigl,
G. Span,
A. Homer,
P. Ganitzer,
D. K. Maude,
J. C. Portal
Abstract:
The results of non-local experiments in different contact configurations are discussed in terms of a non-local behaviour of the contact arms. It is shown that the observed reproducible fluctuations can be understood to result from fluctuations of a non-local bulk current in the contact arms. The fluctuations are explained by edge channel backscattering because of potential fluctuations in the bulk region.
Submitted 16 July, 1997;
originally announced July 1997.
-
Quenching of the Quantum Hall Effect in PbTe Wide Parabolic Quantum Wells
Authors:
J. Oswald,
G. Heigl,
G. Span,
A. Homer,
P. Ganitzer,
D. K. Maude,
J. C. Portal
Abstract:
We show that for the case of a many-valley host semiconductor an edge channel (EC) related non-local behaviour can also persist in the 3D regime where the quantum Hall effect (QHE) is already quenched. We demonstrate that the QHE is replaced by conductance fluctuations due to EC backscattering in the contact arms, which leads to a fluctuating current redistribution between a dissipative bulk electron system and a less-dissipative EC system. The two electron systems are located in different valleys of the band structure. The linear increase of Rxx with the magnetic field is explained by EC backscattering in the Hall bar.
Submitted 16 July, 1997;
originally announced July 1997.
-
Conductance Fluctuations in PbTe Wide Parabolic Quantum Wells
Authors:
J. Oswald,
G. Heigl,
G. Span,
A. Homer,
P. Ganitzer,
D. K. Maude,
J. C. Portal
Abstract:
We report on conductance fluctuations which are observed in local and non-local magnetotransport experiments. Although the Hall bar samples are of macroscopic size, the amplitude of the fluctuations from the local measurements is close to e^2/h. It is shown that the fluctuations have to be attributed to edge channel effects.
Submitted 16 July, 1997;
originally announced July 1997.
-
Magnetotransport in wide parabolic PbTe quantum wells
Authors:
J. Oswald,
G. Heigl,
M. Pippan,
G. Span,
T. Stellberger,
D. K. Maude,
J. C. Portal
Abstract:
The 3D and 2D behaviour of wide parabolic PbTe single quantum wells, which consist of PbTe p-n-p structures, is studied theoretically and experimentally. A simple model combines the 2D subband levels and the 3D Landau levels in order to calculate the density of states in a magnetic field perpendicular to the 2D plane. It is shown that at a channel width of about 300 nm one can expect to observe 3D and 2D behaviour at the same time. Magnetotransport experiments in selectively contacted Hall bar samples are performed at temperatures down to T = 50 mK and at magnetic fields up to B = 17 T.
Submitted 16 July, 1997;
originally announced July 1997.
-
Magnetotransport in PbTe nipi structures
Authors:
J. Oswald,
M. Pippan,
G. Heigl,
G. Span,
T. Stellberger
Abstract:
In this paper the 3D and 2D behavior of wide quantum wells which consist of one period of a PbTe nipi structure is studied theoretically and experimentally. A simple model combines the 2D subband levels and the 3D Landau levels in order to calculate the density of states in a magnetic field perpendicular to the 2D plane. It is shown that at a channel width of about 500 nm one can expect to observe 3D and 2D behavior at the same time. Finally, the general design aspects for PbTe wide quantum wells are discussed.
Submitted 16 July, 1997;
originally announced July 1997.
-
Universality in the Crossover between Edge Channel and Bulk Transport in the Quantum Hall Regime
Authors:
J. Oswald,
G. Span,
F. Kuchar
Abstract:
We present a new theoretical approach for the integer quantum Hall effect, which is able to describe the inter-plateau transitions as well as the transition to the Hall insulator. We find two regimes (metallic and insulator like) of the top Landau level, in which the dissipative bulk current appears in different directions. The regimes are separated by a temperature invariant point.
Submitted 5 February, 1998; v1 submitted 10 July, 1997;
originally announced July 1997.