-
Resolving Variable Respiratory Motion From Unsorted 4D Computed Tomography
Authors:
Yuliang Huang,
Bjoern Eiben,
Kris Thielemans,
Jamie R. McClelland
Abstract:
4D Computed Tomography (4DCT) is widely used for many clinical applications such as radiotherapy treatment planning, PET and ventilation imaging. However, common 4DCT methods reconstruct multiple breath cycles into a single, arbitrary breath cycle which can lead to various artefacts, impacting the downstream clinical applications. Surrogate driven motion models can estimate continuous variable motion across multiple cycles based on CT segments 'unsorted' from 4DCT, but they require respiration surrogate signals with strong correlation to the internal motion, which are not always available. The method proposed in this study eliminates such dependency by adapting the hyper-gradient method to the optimization of surrogate signals as hyper-parameters, while achieving better or comparable performance, as demonstrated on digital phantom simulations and real patient data. Our method produces a high-quality motion-compensated image together with estimates of the motion, including breath-to-breath variability, throughout the image acquisition. Our method has the potential to improve downstream clinical applications, and also enables retrospective analysis of open-access 4DCT datasets where no respiration signals are stored. Code is available at https://github.com/Yuliang-Huang/4DCT-irregular-motion.
Submitted 30 June, 2024;
originally announced July 2024.
-
Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance
Authors:
Osama Yousuf,
Brian Hoskins,
Karthick Ramu,
Mitchell Fream,
William A. Borders,
Advait Madhavan,
Matthew W. Daniels,
Andrew Dienstfrey,
Jabez J. McClelland,
Martin Lueker-Boden,
Gina C. Adam
Abstract:
Artificial neural networks have advanced due to scaling dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solutions from software to defective hardware crossbars of emerging memory devices and reliably attain near-software performance on inference. The approach is investigated using a custom 20,000-device hardware prototyping platform on a continual learning problem where a network must learn new tasks without catastrophically forgetting previously learned information. Results demonstrate that by trading off the number of devices required for layer mapping, layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline. For the investigated problem, the average multi-task classification accuracy improves from 61 % to 72 % (< 1 % of software baseline) using the proposed approach.
Submitted 23 April, 2024;
originally announced April 2024.
-
Measurement-driven Langevin modeling of superparamagnetic tunnel junctions
Authors:
Liam A. Pocher,
Temitayo N. Adeyeye,
Sidra Gibeault,
Philippe Talatchian,
Ursula Ebels,
Daniel P. Lathrop,
Jabez J. McClelland,
Mark D. Stiles,
Advait Madhavan,
Matthew W. Daniels
Abstract:
Superparamagnetic tunnel junctions are important devices for a range of emerging technologies, but most existing compact models capture only their mean switching rates. Capturing qualitatively accurate analog dynamics of these devices will be important as the technology scales up. Here we present results using a one-dimensional overdamped Langevin equation that captures statistical properties of measured time traces, including voltage histograms, drift and diffusion characteristics as measured with Kramers-Moyal coefficients, and dwell-time distributions. While common macrospin models are more physically motivated magnetic models than the Langevin model, we show that for the device measured here, they capture even fewer of the measured experimental behaviors.
Submitted 2 July, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Programmable electrical coupling between stochastic magnetic tunnel junctions
Authors:
Sidra Gibeault,
Temitayo N. Adeyeye,
Liam A. Pocher,
Daniel P. Lathrop,
Matthew W. Daniels,
Mark D. Stiles,
Jabez J. McClelland,
William A. Borders,
Jason T. Ryan,
Philippe Talatchian,
Ursula Ebels,
Advait Madhavan
Abstract:
Superparamagnetic tunnel junctions (SMTJs) are promising sources of randomness for compact and energy-efficient implementations of probabilistic computing techniques. Augmenting an SMTJ with electronic circuits, to convert the random telegraph fluctuations of its resistance state to stochastic digital signals, gives a basic building block known as a probabilistic bit or $p$-bit. Though scalable probabilistic computing methods connecting $p$-bits have been proposed, practical implementations are limited by either minimal tunability or energy-inefficient microprocessors-in-the-loop. In this work, we experimentally demonstrate the functionality of a scalable analog unit cell, namely a pair of $p$-bits with programmable electrical coupling. This tunable coupling is implemented with operational amplifier circuits that have a time constant of approximately 1 μs, which is faster than the mean dwell times of the SMTJs over most of the operating range. Programmability enables flexibility, allowing both positive and negative couplings, as well as coupling devices with widely varying device properties. These tunable coupling circuits can achieve the whole range of correlations from $-1$ to $1$, both for devices with similar timescales and for devices whose timescales vary by an order of magnitude. This range of correlation allows such circuits to be used for scalable implementations of simulated annealing with probabilistic computing.
Submitted 20 December, 2023;
originally announced December 2023.
-
Measurement-driven neural-network training for integrated magnetic tunnel junction arrays
Authors:
William A. Borders,
Advait Madhavan,
Matthew W. Daniels,
Vasileia Georgiou,
Martin Lueker-Boden,
Tiffany S. Santos,
Patrick M. Braganca,
Mark D. Stiles,
Jabez J. McClelland,
Brian D. Hoskins
Abstract:
The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on complementary metal-oxide-semiconductor chips that closely resemble market-ready spin-transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects, and that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover performance comparable to that of ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software baseline by only 2 %. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.
Submitted 14 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
SODA: Bottleneck Diffusion Models for Representation Learning
Authors:
Drew A. Hudson,
Daniel Zoran,
Mateusz Malinowski,
Andrew K. Lampinen,
Andrew Jaegle,
James L. McClelland,
Loic Matthey,
Felix Hill,
Alexander Lerchner
Abstract:
We introduce SODA, a self-supervised diffusion model, designed for representation learning. The model incorporates an image encoder, which distills a source view into a compact representation, that, in turn, guides the generation of related novel views. We show that by imposing a tight bottleneck between the encoder and a denoising decoder, and leveraging novel view synthesis as a self-supervised objective, we can turn diffusion models into strong representation learners, capable of capturing visual semantics in an unsupervised manner. To the best of our knowledge, SODA is the first diffusion model to succeed at ImageNet linear-probe classification, and, at the same time, it accomplishes reconstruction, editing and synthesis tasks across a wide range of datasets. Further investigation reveals the disentangled nature of its emergent latent space, that serves as an effective interface to control and manipulate the model's produced images. All in all, we aim to shed light on the exciting and promising potential of diffusion models, not only for image generation, but also for learning rich and robust representations.
Submitted 29 November, 2023;
originally announced November 2023.
-
Causal interventions expose implicit situation models for commonsense language understanding
Authors:
Takateru Yamakoshi,
James L. McClelland,
Adele E. Goldberg,
Robert D. Hawkins
Abstract:
Accounts of human language processing have long appealed to implicit "situation models" that enrich comprehension with relevant but unstated world knowledge. Here, we apply causal intervention techniques to recent transformer models to analyze performance on the Winograd Schema Challenge (WSC), where a single context cue shifts interpretation of an ambiguous pronoun. We identify a relatively small circuit of attention heads that are responsible for propagating information from the context word that guides which of the candidate noun phrases the pronoun ultimately attends to. We then compare how this circuit behaves in a closely matched "syntactic" control where the situation model is not strictly necessary. These analyses suggest distinct pathways through which implicit situation models are constructed to guide pronoun resolution.
Submitted 7 June, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Achieving and Understanding Out-of-Distribution Generalization in Systematic Reasoning in Small-Scale Transformers
Authors:
Andrew J. Nam,
Mustafa Abdool,
Trevor Maxfield,
James L. McClelland
Abstract:
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obscure how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small-scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on a complex problem if the training set includes examples sampled from the whole distribution of simpler component tasks. Successful generalization depends on carefully managing positional alignment when absolute position encoding is used, but we find that suppressing sensitivity to absolute positions overcomes this limitation. Taken together, our results represent a small step toward understanding and promoting systematic generalization in transformers.
Submitted 13 December, 2022; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Learning to Reason With Relational Abstractions
Authors:
Andrew J. Nam,
Mengye Ren,
Chelsea Finn,
James L. McClelland
Abstract:
Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.
Submitted 5 December, 2022; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Authors:
Yuxuan Li,
James L. McClelland
Abstract:
Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional inputs. However, there is an ongoing debate about how and when transformers can acquire highly structured behavior and achieve systematic generalization. Here, we explore how well a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions of these operations. We demonstrate strong generalization to sequences longer than those used in training by replacing the standard positional encoding typically used in transformers with labels arbitrarily paired with items in the sequence. We search for the layer and head configuration sufficient to solve these tasks, then probe for signs of systematic processing in latent representations and attention patterns. We show that two-layer transformers learn reliable solutions to multi-level problems, develop signs of task decomposition, and encode input items in a way that encourages the exploitation of shared computation across related tasks. These results provide key insights into how attention layers support structured computation both within a task and across multiple tasks.
Submitted 10 December, 2022; v1 submitted 1 October, 2022;
originally announced October 2022.
-
Language models show human-like content effects on reasoning tasks
Authors:
Ishita Dasgupta,
Andrew K. Lampinen,
Stephanie C. Y. Chan,
Hannah R. Sheahan,
Antonia Creswell,
Dharshan Kumaran,
James L. McClelland,
Felix Hill
Abstract:
Abstract reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect. For example, human reasoning is affected by our real-world knowledge and beliefs, and shows notable "content effects"; humans reason more reliably when the semantic content of a problem supports the correct logical inferences. These content-entangled reasoning patterns play a central role in debates about the fundamental nature of human intelligence. Here, we investigate whether language models, whose prior expectations capture some aspects of human knowledge, similarly mix content into their answers to logical problems. We explored this question across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task. We evaluate state of the art large language models, as well as humans, and find that the language models reflect many of the same patterns observed in humans across these tasks: like humans, models answer more accurately when the semantic content of a task supports the logical inferences. These parallels are reflected both in answer patterns, and in lower-level features like the relationship between model answer distributions and human response times. Our findings have implications for understanding both these cognitive effects in humans, and the factors that contribute to language model performance.
Submitted 30 October, 2023; v1 submitted 14 July, 2022;
originally announced July 2022.
-
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Authors:
Stephanie C. Y. Chan,
Adam Santoro,
Andrew K. Lampinen,
Jane X. Wang,
Aaditya Singh,
Pierre H. Richemond,
Jay McClelland,
Felix Hill
Abstract:
Large transformer-based models are able to perform in-context few-shot learning, without being explicitly trained for it. This observation raises the question: what aspects of the training regime lead to this emergent behavior? Here, we show that this behavior is driven by the distributions of the training data itself. In-context learning emerges when the training data exhibits particular distributional properties such as burstiness (items appear in clusters rather than being uniformly distributed over time) and having large numbers of rarely occurring classes. In-context learning also emerges more strongly when item meanings or interpretations are dynamic rather than fixed. These properties are exemplified by natural language, but are also inherent to naturalistic data in a wide range of other domains. They also depart significantly from the uniform, i.i.d. training distributions typically used for standard supervised learning. In our initial experiments, we found that in-context learning traded off against more conventional weight-based learning, and models were unable to achieve both simultaneously. However, our later experiments uncovered that the two modes of learning could co-exist in a single model when it was trained on data following a skewed Zipfian distribution -- another common property of naturalistic data, including language. In further experiments, we found that naturalistic data distributions were only able to elicit in-context learning in transformers, and not in recurrent models. In sum, our findings indicate how the transformer architecture works together with particular properties of the training data to drive the intriguing emergent in-context learning behaviour of large language models, and how future work might encourage both in-context and in-weights learning in domains beyond language.
Submitted 17 November, 2022; v1 submitted 22 April, 2022;
originally announced May 2022.
-
Can language models learn from explanations in context?
Authors:
Andrew K. Lampinen,
Ishita Dasgupta,
Stephanie C. Y. Chan,
Kory Matthewson,
Michael Henry Tessler,
Antonia Creswell,
James L. McClelland,
Jane X. Wang,
Felix Hill
Abstract:
Language Models (LMs) can perform new tasks by adapting to a few in-context examples. For humans, explanations that connect examples to task principles can improve learning. We therefore investigate whether explanations of few-shot examples can help LMs. We annotate questions from 40 challenging tasks with answer explanations, and various matched control explanations. We evaluate how different types of explanations, instructions, and controls affect zero- and few-shot performance. We analyze these results using statistical multilevel modeling techniques that account for the nested dependencies among conditions, tasks, prompts, and models. We find that explanations can improve performance -- even without tuning. Furthermore, explanations hand-tuned for performance on a small validation set offer substantially larger benefits, and building a prompt by selecting examples and explanations together substantially improves performance over selecting examples alone. Finally, even untuned explanations outperform carefully matched controls, suggesting that the benefits are due to the link between an example and its explanation, rather than lower-level features. However, only large models benefit. In summary, explanations can support the in-context learning of large LMs on challenging tasks.
Submitted 10 October, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Implementation of a Binary Neural Network on a Passive Array of Magnetic Tunnel Junctions
Authors:
Jonathan M. Goodwill,
Nitin Prasad,
Brian D. Hoskins,
Matthew W. Daniels,
Advait Madhavan,
Lei Wan,
Tiffany S. Santos,
Michael Tran,
Jordan A. Katine,
Patrick M. Braganca,
Mark D. Stiles,
Jabez J. McClelland
Abstract:
The increasing scale of neural networks and their growing application space have produced demand for more energy- and memory-efficient artificial-intelligence-specific hardware. Avenues to mitigate the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures, as well as algorithmic approaches. Here we leverage the low-power and the inherently binary operation of magnetic tunnel junctions (MTJs) to demonstrate neural network hardware inference based on passive arrays of MTJs. In general, transferring a trained network model to hardware for inference is confronted by degradation in performance due to device-to-device variations, write errors, parasitic resistance, and nonidealities in the substrate. To quantify the effect of these hardware realities, we benchmark 300 unique weight matrix solutions of a 2-layer perceptron to classify the Wine dataset for both classification accuracy and write fidelity. Despite device imperfections, we achieve software-equivalent accuracy of up to 95.3 % with proper tuning of network parameters in 15 x 15 MTJ arrays having a range of device sizes. The success of this tuning process shows that new metrics are needed to characterize the performance and quality of networks reproduced in mixed signal hardware.
Submitted 6 May, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Tell me why! Explanations support learning relational and causal structure
Authors:
Andrew K. Lampinen,
Nicholas A. Roy,
Ishita Dasgupta,
Stephanie C. Y. Chan,
Allison C. Tam,
James L. McClelland,
Chen Yan,
Adam Santoro,
Neil C. Rabinowitz,
Jane X. Wang,
Felix Hill
Abstract:
Inferring the abstract relational and causal structure of the world is a major challenge for reinforcement-learning (RL) agents. For humans, language--particularly in the form of explanations--plays a considerable role in overcoming this challenge. Here, we show that language can play a similar role for deep RL agents in complex environments. While agents typically struggle to acquire relational and causal knowledge, augmenting their experience by training them to predict language descriptions and explanations can overcome these limitations. We show that language can help agents learn challenging relational tasks, and examine which aspects of language contribute to its benefits. We then show that explanations can help agents to infer not only relational but also causal structure. Language can shape the way that agents generalize out-of-distribution from ambiguous, causally-confounded training, and explanations even allow agents to learn to perform experimental interventions to identify causal relationships. Our results suggest that language description and explanation may be powerful tools for improving agent learning and generalization.
Submitted 25 May, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Systematic human learning and generalization from a brief tutorial with explanatory feedback
Authors:
Andrew J. Nam,
James L. McClelland
Abstract:
Neural networks have long been used to model human intelligence, capturing elements of behavior and cognition, and their neural basis. Recent advancements in deep learning have enabled neural network models to reach and even surpass human levels of intelligence in many respects, yet unlike humans, their ability to learn new tasks quickly remains a challenge. People can reason not only in familiar domains, but can also rapidly learn to reason through novel problems and situations, raising the question of how well modern neural network models capture human intelligence and in which ways they diverge. In this work, we explore this gap by investigating human adults' ability to learn an abstract reasoning task based on Sudoku from a brief instructional tutorial with explanatory feedback for incorrect responses using a narrow range of training examples. We find that participants who master the task do so within a small number of trials and generalize well to puzzles outside of the training range. We also find that most of those who master the task can describe a valid solution strategy, and such participants perform better on transfer puzzles than those whose strategy descriptions are vague or incomplete. Interestingly, fewer than half of our human participants were successful in acquiring a valid solution strategy, and this ability is associated with high school mathematics education. We consider the challenges these findings pose for building computational models that capture all aspects of our findings and point toward a possible role for learning to engage in explanation-based reasoning to support rapid learning and generalization.
Submitted 28 March, 2023; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Mutual control of stochastic switching for two electrically coupled superparamagnetic tunnel junctions
Authors:
Philippe Talatchian,
Matthew W. Daniels,
Advait Madhavan,
Matthew R. Pufall,
Emilie Jué,
William H. Rippard,
Jabez J. McClelland,
Mark D. Stiles
Abstract:
Superparamagnetic tunnel junctions (SMTJs) are promising sources for the randomness required by some compact and energy-efficient computing schemes. Coupling SMTJs gives rise to collective behavior that could be useful for cognitive computing. We use a simple linear electrical circuit to mutually couple two SMTJs through their stochastic electrical transitions. When one SMTJ makes a thermally induced transition, the voltage across both SMTJs changes, modifying the transition rates of both. This coupling leads to significant correlation between the states of the two devices. Using fits to a generalized Néel-Brown model for the individual thermally bistable magnetic devices, we can accurately reproduce the behavior of the coupled devices with a Markov model.
Submitted 19 August, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Transforming task representations to perform novel tasks
Authors:
Andrew K. Lampinen,
James L. McClelland
Abstract:
An important aspect of intelligence is the ability to adapt to a novel task without any direct experience (zero-shot), based on its relationship to previous tasks. Humans can exhibit this cognitive flexibility. By contrast, models that achieve superhuman performance in specific tasks often fail to adapt to even slight task alterations. To address this, we propose a general computational framework for adapting to novel tasks based on their relationship to prior tasks. We begin by learning vector representations of tasks. To adapt to new tasks, we propose meta-mappings, higher-order tasks that transform basic task representations. We demonstrate the effectiveness of this framework across a wide variety of tasks and computational paradigms, ranging from regression to image classification and reinforcement learning. We compare to both human adaptability and language-based approaches to zero-shot learning. Across these domains, meta-mapping is successful, often achieving 80-90% performance, without any data, on a novel task, even when the new task directly contradicts prior experience. We further show that meta-mapping can not only generalize to new tasks via learned relationships, but can also generalize using novel relationships unseen during training. Finally, using meta-mapping as a starting point can dramatically accelerate later learning on a new task, and reduce learning time and cumulative error substantially. Our results provide insight into a possible computational basis of intelligent adaptability and offer a possible framework for modeling cognitive flexibility and building more flexible artificial intelligence systems.
Submitted 6 October, 2020; v1 submitted 8 May, 2020;
originally announced May 2020.
-
Extending Machine Language Models toward Human-Level Language Understanding
Authors:
James L. McClelland,
Felix Hill,
Maja Rudolph,
Jason Baldridge,
Hinrich Schütze
Abstract:
Language is crucial for human intelligence, but what exactly is its role? We take language to be a part of a system for understanding and communicating about situations. The human ability to understand and communicate about situations emerges gradually from experience and depends on domain-general principles of biological neural networks: connection-based learning, distributed representation, and context-sensitive, mutual constraint satisfaction-based processing. Current artificial language processing systems rely on the same domain-general principles, embodied in artificial neural networks. Indeed, recent progress in this field depends on query-based attention, which extends the ability of these systems to exploit context and has contributed to remarkable breakthroughs. Nevertheless, most current models focus exclusively on language-internal tasks, limiting their ability to perform tasks that depend on understanding situations. These systems also lack memory for the contents of prior situations outside of a fixed contextual span. We describe the organization of the brain's distributed understanding system, which includes a fast learning system that addresses the memory problem. We sketch a framework for future models of understanding drawing equally on cognitive neuroscience and artificial intelligence and exploiting query-based attention. We highlight relevant current directions and consider further developments needed to fully capture human-level language understanding in a computational system.
Submitted 4 July, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Environmental drivers of systematicity and generalization in a situated agent
Authors:
Felix Hill,
Andrew Lampinen,
Rosalia Schneider,
Stephen Clark,
Matthew Botvinick,
James L. McClelland,
Adam Santoro
Abstract:
The question of whether deep neural networks are good at generalising beyond their immediate training experience is of critical importance for learning-based approaches to AI. Here, we consider tests of out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room. We first describe a comparatively generic agent architecture that exhibits strong performance on these tests. We then identify three aspects of the training regime and environment that make a significant difference to its performance: (a) the number of object/word experiences in the training set; (b) the visual invariances afforded by the agent's perspective, or frame of reference; and (c) the variety of visual input inherent in the perceptual aspect of the agent's perception. Our findings indicate that the degree of generalisation that networks exhibit can depend critically on particulars of the environment in which a given task is instantiated. They further suggest that the propensity for neural networks to generalise in systematic ways may increase if, like human children, those networks have access to many frames of richly varying, multi-modal observations as they learn.
Submitted 19 February, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Generative Continual Concept Learning
Authors:
Mohammad Rostami,
Soheil Kolouri,
James McClelland,
Praveen Pilly
Abstract:
After learning a concept, humans are also able to continually generalize their learned concepts to new domains by observing only a few labeled instances without any interference with the past learned knowledge. In contrast, learning concepts efficiently in a continual learning setting remains an open challenge for current Artificial Intelligence algorithms as persistent model retraining is necessary. Inspired by the Parallel Distributed Processing learning and the Complementary Learning Systems theories, we develop a computational model that is able to expand its previously learned concepts efficiently to new domains using a few labeled samples. We couple the new form of a concept to its past learned forms in an embedding space for effective continual learning. Doing so, a generative distribution is learned such that it is shared across the tasks in the embedding space and models the abstract concepts. This procedure enables the model to generate pseudo-data points to replay the past experience to tackle catastrophic forgetting.
Submitted 7 September, 2019; v1 submitted 9 June, 2019;
originally announced June 2019.
-
Zero-shot task adaptation by homoiconic meta-mapping
Authors:
Andrew K. Lampinen,
James L. McClelland
Abstract:
How can deep learning systems flexibly reuse their knowledge? Toward this goal, we propose a new class of challenges, and a class of architectures that can solve them. The challenges are meta-mappings, which involve systematically transforming task behaviors to adapt to new tasks zero-shot. The key to achieving these challenges is representing the task being performed in such a way that this task representation is itself transformable. We therefore draw inspiration from functional programming and recent work in meta-learning to propose a class of Homoiconic Meta-Mapping (HoMM) approaches that represent data points and tasks in a shared latent space, and learn to infer transformations of that space. HoMM approaches can be applied to any type of machine learning task. We demonstrate the utility of this perspective by exhibiting zero-shot remapping of behavior to adapt to new tasks.
Submitted 12 November, 2019; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Streaming Batch Eigenupdates for Hardware Neuromorphic Networks
Authors:
Brian D. Hoskins,
Matthew W. Daniels,
Siyuan Huang,
Advait Madhavan,
Gina C. Adam,
Nikolai Zhitenev,
Jabez J. McClelland,
Mark D. Stiles
Abstract:
Neuromorphic networks based on nanodevices, such as metal oxide memristors, phase change memories, and flash memory cells, have generated considerable interest for their increased energy efficiency and density in comparison to graphics processing units (GPUs) and central processing units (CPUs). Though immense acceleration of the training process can be achieved by leveraging the fact that the time complexity of training does not scale with the network size, it is limited by the space complexity of stochastic gradient descent, which grows quadratically. The main objective of this work is to reduce this space complexity by using low-rank approximations of stochastic gradient descent. This low spatial complexity combined with streaming methods allows for significant reductions in memory and compute overhead, opening the doors for improvements in area, time and energy efficiency of training. We refer to this algorithm and architecture to implement it as the streaming batch eigenupdate (SBE) approach.
Submitted 4 March, 2019;
originally announced March 2019.
-
A mathematical theory of semantic development in deep neural networks
Authors:
Andrew M. Saxe,
James L. McClelland,
Surya Ganguli
Abstract:
An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep learning dynamics to give rise to these regularities.
Submitted 23 October, 2018;
originally announced October 2018.
-
Uncertainty in multitask learning: joint representations for probabilistic MR-only radiotherapy planning
Authors:
Felix J. S. Bragman,
Ryutaro Tanno,
Zach Eaton-Rosen,
Wenqi Li,
David J. Hawkes,
Sebastien Ourselin,
Daniel C. Alexander,
Jamie R. McClelland,
M. Jorge Cardoso
Abstract:
Multi-task neural network architectures provide a mechanism that jointly integrates information from distinct sources. It is ideal in the context of MR-only radiotherapy planning as it can jointly regress a synthetic CT (synCT) scan and segment organs-at-risk (OAR) from MRI. We propose a probabilistic multi-task network that estimates: 1) intrinsic uncertainty through a heteroscedastic noise model for spatially-adaptive task loss weighting and 2) parameter uncertainty through approximate Bayesian inference. This allows sampling of multiple segmentations and synCTs that share their network representation. We test our model on prostate cancer scans and show that it produces more accurate and consistent synCTs with a better estimation in the variance of the errors, state of the art results in OAR segmentation and a methodology for quality assurance in radiotherapy treatment planning.
Submitted 18 June, 2018;
originally announced June 2018.
-
One-shot and few-shot learning of word embeddings
Authors:
Andrew K. Lampinen,
James L. McClelland
Abstract:
Standard deep learning systems require thousands or millions of examples to learn a concept, and cannot integrate new concepts easily. By contrast, humans have an incredible ability to do one-shot or few-shot learning. For instance, from just hearing a word used in a sentence, humans can infer a great deal about it, by leveraging what the syntax and semantics of the surrounding words tell us. Here, we draw inspiration from this to highlight a simple technique by which deep recurrent networks can similarly exploit their prior knowledge to learn a useful representation for a new word from little data. This could make natural language processing systems much more flexible, by allowing them to learn continually from the new words they encounter.
Submitted 2 January, 2018; v1 submitted 27 October, 2017;
originally announced October 2017.
-
Stateful characterization of resistive switching TiO2 with electron beam induced currents
Authors:
Brian D. Hoskins,
Gina C. Adam,
Evgheni Strelcov,
Nikolai Zhitenev,
Andrei Kolmakov,
Dmitri B. Strukov,
Jabez J. McClelland
Abstract:
Metal oxide resistive switches are increasingly important as possible artificial synapses in next generation neuromorphic networks. Nevertheless, there is still no codified set of tools for studying properties of the devices. To this end, we demonstrate electron beam induced current measurements as a powerful method to monitor the development of local resistive switching in TiO2 based devices. By comparing beam-energy dependent electron beam induced currents with Monte Carlo simulations of the energy absorption in different device layers, it is possible to deconstruct the origins of filament image formation and relate this to both morphological changes and the state of the switch. By clarifying the contrast mechanisms in electron beam induced current microscopy it is possible to gain new insights into the scaling of the resistive switching phenomenon and observe the formation of a current leakage region around the switching filament. Additionally, analysis of symmetric device structures reveals propagating polarization domains.
Submitted 30 October, 2017; v1 submitted 5 April, 2017;
originally announced April 2017.
-
EMDUnifrac: Exact Linear Time Computation of the Unifrac Metric and Identification of Differentially Abundant Organisms
Authors:
Jason McClelland,
David Koslicki
Abstract:
Both the weighted and unweighted Unifrac distances have been very successfully employed to assess if two communities differ, but do not give any information about how two communities differ. We take advantage of recent observations that the Unifrac metric is equivalent to the so-called earth mover's distance (also known as the Kantorovich-Rubinstein metric) to develop an algorithm that not only computes the Unifrac distance in linear time and space, but also simultaneously finds which operational taxonomic units are responsible for the observed differences between samples. This allows the algorithm, called EMDUnifrac, to determine why given samples are different, not just if they are different, and with no added computational burden. EMDUnifrac can be utilized on any distribution on a tree, and so is particularly suitable to analyzing both operational taxonomic units derived from amplicon sequencing, as well as community profiles resulting from classifying whole genome shotgun metagenomes. The EMDUnifrac source code (written in python) is freely available at: https://github.com/dkoslicki/EMDUnifrac.
Submitted 14 November, 2016;
originally announced November 2016.
-
Imaging Nanophotonic Modes of Microresonators using a Focused Ion Beam
Authors:
Kevin A. Twedt,
Jie Zou,
Marcelo Davanco,
Kartik Srinivasan,
Jabez J. McClelland,
Vladimir A. Aksyuk
Abstract:
Optical microresonators have proven powerful in a wide range of applications, including cavity quantum electrodynamics, biosensing, microfluidics, and cavity optomechanics. Their performance depends critically on the exact distribution of optical energy, confined and shaped by the nanoscale device geometry. Near-field optical probes can image this distribution, but the physical probe necessarily perturbs the near field, which is particularly problematic for sensitive high quality factor resonances. We present a new approach to mapping nanophotonic modes that uses a controllably small and local optomechanical perturbation introduced by a focused lithium ion beam. An ion beam (radius about 50 nm) induces a picometer-scale dynamic deformation of the resonator surface, which we detect through a shift in the optical resonance wavelength. We map five modes of a silicon microdisk resonator (Q > 20,000) with both high spatial and spectral resolution. Our technique also enables in-situ observation of ion implantation damage and relaxation dynamics in a silicon lattice.
Submitted 22 December, 2015;
originally announced December 2015.
-
Bright focused ion beam sources based on laser-cooled atoms
Authors:
J. J. McClelland,
A. V. Steele,
B. Knuffman,
K. A. Twedt,
A. Schwarzkopf,
T. M. Wilson
Abstract:
Nanoscale focused ion beams (FIBs) represent one of the most useful tools in nanotechnology, enabling nanofabrication via milling and gas-assisted deposition, microscopy and microanalysis, and selective, spatially resolved doping of materials. Recently, a new type of FIB source has emerged, which uses ionization of laser cooled neutral atoms to produce the ion beam. The extremely cold temperatures attainable with laser cooling (in the range of 100 µK or below) result in a beam of ions with a very small transverse velocity distribution. This corresponds to a source with extremely high brightness that rivals or may even exceed the brightness of the industry standard Ga+ liquid metal ion source. In this review we discuss the context of ion beam technology in which these new ion sources can play a role, their principles of operation, and some examples of recent demonstrations. The field is relatively new, so only a few applications have been demonstrated, most notably low energy ion microscopy with Li ions. Nevertheless, a number of promising new approaches have been proposed and/or demonstrated, suggesting that a rapid evolution of this type of source is likely in the near future.
Submitted 10 February, 2016; v1 submitted 29 October, 2015;
originally announced October 2015.
-
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Authors:
Andrew M. Saxe,
James L. McClelland,
Surya Ganguli
Abstract:
Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.
Submitted 19 February, 2014; v1 submitted 20 December, 2013;
originally announced December 2013.
-
Narrow-line magneto-optical cooling and trapping of strongly magnetic atoms
Authors:
Andrew J. Berglund,
James L. Hanssen,
Jabez J. McClelland
Abstract:
Laser cooling on weak transitions is a useful technique for reaching ultracold temperatures in atoms with multiple valence electrons. However, for strongly magnetic atoms a conventional narrow-line magneto-optical trap (MOT) is destabilized by competition between optical and magnetic forces. We overcome this difficulty in Er by developing an unusual narrow-line MOT that balances optical and magnetic forces using laser light tuned to the blue side of a narrow (8 kHz) transition. The trap population is spin-polarized with temperatures reaching below 2 microkelvin. Our results constitute an alternative method for laser cooling on weak transitions, applicable to rare-earth-metal and metastable alkaline earth elements.
Submitted 6 February, 2008;
originally announced February 2008.
-
Sub-Doppler laser cooling and magnetic trapping of erbium
Authors:
Andrew J. Berglund,
Siu Au Lee,
Jabez J. McClelland
Abstract:
We investigate cooling mechanisms in magneto-optically and magnetically trapped erbium. We find efficient sub-Doppler cooling in our trap, which can persist even in large magnetic fields due to the near degeneracy of two Landé g factors. Furthermore, a continuously loaded magnetic trap is demonstrated where we observe temperatures below 25 microkelvin. These favorable cooling and trapping properties suggest a number of scientific possibilities for rare-earth atomic physics, including narrow linewidth laser cooling and spectroscopy, unique collision studies, and degenerate bosonic and fermionic gases with long-range magnetic dipole coupling.
Submitted 6 February, 2008;
originally announced February 2008.