Resolving Variable Respiratory Motion From Unsorted 4D Computed Tomography

Authors: Yuliang Huang, Bjoern Eiben, Kris Thielemans, Jamie R. McClelland

Abstract: 4D Computed Tomography (4DCT) is widely used for many clinical applications such as radiotherapy treatment planning, PET and ventilation imaging. However, common 4DCT methods reconstruct multiple breath cycles into a single, arbitrary breath cycle which can lead to various artefacts, impacting the downstream clinical applications. Surrogate driven motion models can estimate continuous variable motion across multiple cycles based on CT segments 'unsorted' from 4DCT, but they require respiration surrogate signals with strong correlation to the internal motion, which are not always available. The method proposed in this study eliminates such dependency by adapting the hyper-gradient method to the optimization of surrogate signals as hyper-parameters, while achieving better or comparable performance, as demonstrated on digital phantom simulations and real patient data. Our method produces a high-quality motion-compensated image together with estimates of the motion, including breath-to-breath variability, throughout the image acquisition. Our method has the potential to improve downstream clinical applications, and also enables retrospective analysis of open access 4DCT datasets where no respiration signals are stored. Code is available at https://github.com/Yuliang-Huang/4DCT-irregular-motion.

Submitted 30 June, 2024; originally announced July 2024.

Comments: Accepted by MICCAI 2024

arXiv:2404.15621 [pdf]

Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance

Authors: Osama Yousuf, Brian Hoskins, Karthick Ramu, Mitchell Fream, William A. Borders, Advait Madhavan, Matthew W. Daniels, Andrew Dienstfrey, Jabez J. McClelland, Martin Lueker-Boden, Gina C. Adam

Abstract: Artificial neural networks have advanced due to scaling dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solutions from software to defective hardware crossbars of emerging memory devices and reliably attain near-software performance on inference. The approach is investigated using a custom 20,000-device hardware prototyping platform on a continual learning problem where a network must learn new tasks without catastrophically forgetting previously learned information. Results demonstrate that by trading off the number of devices required for layer mapping, layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline. For the investigated problem, the average multi-task classification accuracy improves from 61 % to 72 % (< 1 % of software baseline) using the proposed approach.

Submitted 23 April, 2024; originally announced April 2024.
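
The layer ensemble averaging idea described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the stuck-at-zero defect model, the plain mean over copies, and every function name below are assumptions made purely for illustration.

```python
import random


def matvec(weights, x):
    """Ideal (software) layer output: one dot product per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def defective_copy(weights, defect_rate, rng):
    # Assumed defect model: each device is independently stuck at zero.
    return [[0.0 if rng.random() < defect_rate else w for w in row]
            for row in weights]


def ensemble_layer_output(weights, x, n_copies, defect_rate, rng):
    """Map the same layer onto n_copies defective crossbars and average."""
    outputs = [matvec(defective_copy(weights, defect_rate, rng), x)
               for _ in range(n_copies)]
    # Element-wise mean over the copies.
    return [sum(column) / n_copies for column in zip(*outputs)]


# Example usage with made-up weights and inputs.
rng = random.Random(0)
weights = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
x = [1.0, 2.0, 3.0]
ideal = matvec(weights, x)
averaged = ensemble_layer_output(weights, x, n_copies=8,
                                 defect_rate=0.1, rng=rng)
```

Using more copies costs more devices per layer, which is the trade-off the abstract mentions. Under this assumed defect model, averaging reduces the spread between copies but does not remove the systematic shrinkage caused by zeroed weights.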

arXiv:2403.11988 [pdf, other]

Measurement-driven Langevin modeling of superparamagnetic tunnel junctions

Authors: Liam A. Pocher, Temitayo N. Adeyeye, Sidra Gibeault, Philippe Talatchian, Ursula Ebels, Daniel P. Lathrop, Jabez J. McClelland, Mark D. Stiles, Advait Madhavan, Matthew W. Daniels

Abstract: Superparamagnetic tunnel junctions are important devices for a range of emerging technologies, but most existing compact models capture only their mean switching rates. Capturing qualitatively accurate analog dynamics of these devices will be important as the technology scales up. Here we present results using a one-dimensional overdamped Langevin equation that captures statistical properties of measured time traces, including voltage histograms, drift and diffusion characteristics as measured with Kramers-Moyal coefficients, and dwell-time distributions. While common macrospin models are more physically-motivated magnetic models than the Langevin model, we show that for the device measured here, they capture even fewer of the measured experimental behaviors.

Submitted 2 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 22 pages (13 pages of main text), 21 figures
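
For readers unfamiliar with the modeling approach named in the abstract above, a one-dimensional overdamped Langevin equation can be integrated with the Euler-Maruyama scheme. The sketch below is only an illustration: the double-well potential U(x) = x^4/4 - x^2/2, the constant diffusion coefficient, and all names are assumptions, not the measured drift and diffusion functions from the paper.

```python
import math
import random


def double_well_force(x):
    # Force -dU/dx for the assumed potential U(x) = x**4 / 4 - x**2 / 2.
    return x - x ** 3


def simulate_langevin(x0, n_steps, dt, diffusion, rng):
    """Euler-Maruyama integration of dx = F(x) dt + sqrt(2 D) dW."""
    noise_scale = math.sqrt(2.0 * diffusion * dt)
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        x = x + double_well_force(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def dwell_times(xs, dt):
    """Durations spent on one side of x = 0 between sign changes.

    The first run starts at the beginning of the trace (so it is
    truncated) and the final unfinished run is dropped.
    """
    times = []
    run = 1
    for prev, cur in zip(xs, xs[1:]):
        if (prev >= 0) == (cur >= 0):
            run += 1
        else:
            times.append(run * dt)
            run = 1
    return times


# Example usage: a noisy trace that hops between the two wells.
trace = simulate_langevin(x0=1.0, n_steps=20000, dt=0.01,
                          diffusion=0.3, rng=random.Random(0))
dwells = dwell_times(trace, dt=0.01)
```

A histogram of `trace` and of `dwells` would give toy analogues of the voltage histograms and dwell-time distributions that the abstract lists as fitting targets.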

arXiv:2312.13171 [pdf, other]

Programmable electrical coupling between stochastic magnetic tunnel junctions

Authors: Sidra Gibeault, Temitayo N. Adeyeye, Liam A. Pocher, Daniel P. Lathrop, Matthew W. Daniels, Mark D. Stiles, Jabez J. McClelland, William A. Borders, Jason T. Ryan, Philippe Talatchian, Ursula Ebels, Advait Madhavan

Abstract: Superparamagnetic tunnel junctions (SMTJs) are promising sources of randomness for compact and energy efficient implementations of probabilistic computing techniques. Augmenting an SMTJ with electronic circuits, to convert the random telegraph fluctuations of its resistance state to stochastic digital signals, gives a basic building block known as a probabilistic bit or $p$-bit. Though scalable probabilistic computing methods connecting $p$-bits have been proposed, practical implementations are limited by either minimal tunability or energy inefficient microprocessors-in-the-loop. In this work, we experimentally demonstrate the functionality of a scalable analog unit cell, namely a pair of $p$-bits with programmable electrical coupling. This tunable coupling is implemented with operational amplifier circuits that have a time constant of approximately 1 µs, which is faster than the mean dwell times of the SMTJs over most of the operating range. Programmability enables flexibility, allowing both positive and negative couplings, as well as coupling devices with widely varying device properties. These tunable coupling circuits can achieve the whole range of correlations from $-1$ to $1$, for both devices with similar timescales, and devices whose time scales vary by an order of magnitude. This range of correlation allows such circuits to be used for scalable implementations of simulated annealing with probabilistic computing.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.06446 [pdf, other]

Measurement-driven neural-network training for integrated magnetic tunnel junction arrays

Authors: William A. Borders, Advait Madhavan, Matthew W. Daniels, Vasileia Georgiou, Martin Lueker-Boden, Tiffany S. Santos, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland, Brian D. Hoskins

Abstract: The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on complementary metal-oxide-semiconductor chips that closely resemble market-ready spin transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects and show that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover to comparable performance with ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software-baseline by only 2 %. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.

Submitted 14 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 17 pages, 9 figures

arXiv:2311.17901 [pdf, other]

SODA: Bottleneck Diffusion Models for Representation Learning

Authors: Drew A. Hudson, Daniel Zoran, Mateusz Malinowski, Andrew K. Lampinen, Andrew Jaegle, James L. McClelland, Loic Matthey, Felix Hill, Alexander Lerchner

Abstract: We introduce SODA, a self-supervised diffusion model, designed for representation learning. The model incorporates an image encoder, which distills a source view into a compact representation, that, in turn, guides the generation of related novel views. We show that by imposing a tight bottleneck between the encoder and a denoising decoder, and leveraging novel view synthesis as a self-supervised objective, we can turn diffusion models into strong representation learners, capable of capturing visual semantics in an unsupervised manner. To the best of our knowledge, SODA is the first diffusion model to succeed at ImageNet linear-probe classification, and, at the same time, it accomplishes reconstruction, editing and synthesis tasks across a wide range of datasets. Further investigation reveals the disentangled nature of its emergent latent space, that serves as an effective interface to control and manipulate the model's produced images. All in all, we aim to shed light on the exciting and promising potential of diffusion models, not only for image generation, but also for learning rich and robust representations.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2306.03882 [pdf, other]

Causal interventions expose implicit situation models for commonsense language understanding

Authors: Takateru Yamakoshi, James L. McClelland, Adele E. Goldberg, Robert D. Hawkins

Abstract: Accounts of human language processing have long appealed to implicit "situation models" that enrich comprehension with relevant but unstated world knowledge. Here, we apply causal intervention techniques to recent transformer models to analyze performance on the Winograd Schema Challenge (WSC), where a single context cue shifts interpretation of an ambiguous pronoun. We identify a relatively small circuit of attention heads that are responsible for propagating information from the context word that guides which of the candidate noun phrases the pronoun ultimately attends to. We then compare how this circuit behaves in a closely matched "syntactic" control where the situation model is not strictly necessary. These analyses suggest distinct pathways through which implicit situation models are constructed to guide pronoun resolution.

Submitted 7 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: Findings of ACL

arXiv:2210.03275 [pdf, other]

Achieving and Understanding Out-of-Distribution Generalization in Systematic Reasoning in Small-Scale Transformers

Authors: Andrew J. Nam, Mustafa Abdool, Trevor Maxfield, James L. McClelland

Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obscure how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on a complex problem if the training set includes examples sampled from the whole distribution of simpler component tasks. Successful generalization depends on carefully managing positional alignment when absolute position encoding is used, but we find that suppressing sensitivity to absolute positions overcomes this limitation. Taken together, our results represent a small step toward understanding and promoting systematic generalization in transformers.

Submitted 13 December, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

arXiv:2210.02615 [pdf, other]

Learning to Reason With Relational Abstractions

Authors: Andrew J. Nam, Mengye Ren, Chelsea Finn, James L. McClelland

Abstract: Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.

Submitted 5 December, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

arXiv:2210.00400 [pdf, other]

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Authors: Yuxuan Li, James L. McClelland

Abstract: Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional inputs. However, there is an ongoing debate about how and when transformers can acquire highly structured behavior and achieve systematic generalization. Here, we explore how well a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions of these operations. We demonstrate strong generalization to sequences longer than those used in training by replacing the standard positional encoding typically used in transformers with labels arbitrarily paired with items in the sequence. We search for the layer and head configuration sufficient to solve these tasks, then probe for signs of systematic processing in latent representations and attention patterns. We show that two-layer transformers learn reliable solutions to multi-level problems, develop signs of task decomposition, and encode input items in a way that encourages the exploitation of shared computation across related tasks. These results provide key insights into how attention layers support structured computation both within a task and across multiple tasks.

Submitted 10 December, 2022; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: 18 pages

ACM Class: I.2.6

arXiv:2207.07051 [pdf, other]

Language models show human-like content effects on reasoning tasks

Authors: Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Hannah R. Sheahan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill

Abstract: Abstract reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect. For example, human reasoning is affected by our real-world knowledge and beliefs, and shows notable "content effects"; humans reason more reliably when the semantic content of a problem supports the correct logical inferences. These content-entangled reasoning patterns play a central role in debates about the fundamental nature of human intelligence. Here, we investigate whether language models -- whose prior expectations capture some aspects of human knowledge -- similarly mix content into their answers to logical problems. We explored this question across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task. We evaluate state of the art large language models, as well as humans, and find that the language models reflect many of the same patterns observed in humans across these tasks -- like humans, models answer more accurately when the semantic content of a task supports the logical inferences. These parallels are reflected both in answer patterns, and in lower-level features like the relationship between model answer distributions and human response times. Our findings have implications for understanding both these cognitive effects in humans, and the factors that contribute to language model performance.

Submitted 30 October, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

arXiv:2205.05055 [pdf, other]

Data Distributional Properties Drive Emergent In-Context Learning in Transformers

Authors: Stephanie C. Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya Singh, Pierre H. Richemond, Jay McClelland, Felix Hill

Abstract: Large transformer-based models are able to perform in-context few-shot learning, without being explicitly trained for it. This observation raises the question: what aspects of the training regime lead to this emergent behavior? Here, we show that this behavior is driven by the distributions of the training data itself. In-context learning emerges when the training data exhibits particular distributional properties such as burstiness (items appear in clusters rather than being uniformly distributed over time) and having large numbers of rarely occurring classes. In-context learning also emerges more strongly when item meanings or interpretations are dynamic rather than fixed. These properties are exemplified by natural language, but are also inherent to naturalistic data in a wide range of other domains. They also depart significantly from the uniform, i.i.d. training distributions typically used for standard supervised learning. In our initial experiments, we found that in-context learning traded off against more conventional weight-based learning, and models were unable to achieve both simultaneously. However, our later experiments uncovered that the two modes of learning could co-exist in a single model when it was trained on data following a skewed Zipfian distribution -- another common property of naturalistic data, including language. In further experiments, we found that naturalistic data distributions were only able to elicit in-context learning in transformers, and not in recurrent models. In sum, our findings indicate how the transformer architecture works together with particular properties of the training data to drive the intriguing emergent in-context learning behaviour of large language models, and how future work might encourage both in-context and in-weights learning in domains beyond language.

Submitted 17 November, 2022; v1 submitted 22 April, 2022; originally announced May 2022.

Comments: Accepted at NeurIPS 2022 (Oral). Code is available at: https://github.com/deepmind/emergent_in_context_learning
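
The two distributional properties highlighted in the abstract above, a skewed Zipfian class distribution and burstiness, can be made concrete with a toy sampler. This sketch is an assumption-laden illustration and is unrelated to the code in the linked repository: the way a burst is built (repeating the query class inside a shuffled context) and all names and parameters are hypothetical.

```python
import random


def zipf_probabilities(n_classes, exponent):
    """Class probabilities proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_classes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_bursty_context(n_classes, exponent, context_len, burst_len, rng):
    """Draw one context in which the query class appears in a burst.

    Assumed construction: the query class is repeated burst_len times,
    the remaining slots are filled with Zipfian draws, and the context
    is then shuffled.
    """
    probs = zipf_probabilities(n_classes, exponent)
    classes = list(range(n_classes))
    query = rng.choices(classes, weights=probs, k=1)[0]
    fillers = rng.choices(classes, weights=probs, k=context_len - burst_len)
    context = [query] * burst_len + fillers
    rng.shuffle(context)
    return context, query


# Example usage: 1000 classes, mostly rare, with a burst of 3 in 8 slots.
context, query = sample_bursty_context(n_classes=1000, exponent=1.0,
                                       context_len=8, burst_len=3,
                                       rng=random.Random(0))
```

Setting `exponent` to 0 recovers a uniform class distribution, which corresponds to the i.i.d.-style baseline that the abstract contrasts with naturalistic data.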

arXiv:2204.02329 [pdf, other]

Can language models learn from explanations in context?

Authors: Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, Kory Matthewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill

Abstract: Language Models (LMs) can perform new tasks by adapting to a few in-context examples. For humans, explanations that connect examples to task principles can improve learning. We therefore investigate whether explanations of few-shot examples can help LMs. We annotate questions from 40 challenging tasks with answer explanations, and various matched control explanations. We evaluate how different types of explanations, instructions, and controls affect zero- and few-shot performance. We analyze these results using statistical multilevel modeling techniques that account for the nested dependencies among conditions, tasks, prompts, and models. We find that explanations can improve performance -- even without tuning. Furthermore, explanations hand-tuned for performance on a small validation set offer substantially larger benefits, and building a prompt by selecting examples and explanations together substantially improves performance over selecting examples alone. Finally, even untuned explanations outperform carefully matched controls, suggesting that the benefits are due to the link between an example and its explanation, rather than lower-level features. However, only large models benefit. In summary, explanations can support the in-context learning of large LMs on challenging tasks.

Submitted 10 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: Findings of EMNLP 2022

arXiv:2112.09159 [pdf]

doi 10.1103/PhysRevApplied.18.014039

Implementation of a Binary Neural Network on a Passive Array of Magnetic Tunnel Junctions

Authors: Jonathan M. Goodwill, Nitin Prasad, Brian D. Hoskins, Matthew W. Daniels, Advait Madhavan, Lei Wan, Tiffany S. Santos, Michael Tran, Jordan A. Katine, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland

Abstract: The increasing scale of neural networks and their growing application space have produced demand for more energy- and memory-efficient artificial-intelligence-specific hardware. Avenues to mitigate the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures, as well as algorithmic approaches. Here we leverage the low-power and the inherently binary operation of magnetic tunnel junctions (MTJs) to demonstrate neural network hardware inference based on passive arrays of MTJs. In general, transferring a trained network model to hardware for inference is confronted by degradation in performance due to device-to-device variations, write errors, parasitic resistance, and nonidealities in the substrate. To quantify the effect of these hardware realities, we benchmark 300 unique weight matrix solutions of a 2-layer perceptron to classify the Wine dataset for both classification accuracy and write fidelity. Despite device imperfections, we achieve software-equivalent accuracy of up to 95.3 % with proper tuning of network parameters in 15 x 15 MTJ arrays having a range of device sizes. The success of this tuning process shows that new metrics are needed to characterize the performance and quality of networks reproduced in mixed signal hardware.

Submitted 6 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: 22 pages plus 8 pages supplemental material; 7 figures plus 7 supplemental figures

Journal ref: Physical Review Applied, 18(1) 014039 (2022)

arXiv:2112.03753 [pdf, other]

Tell me why! Explanations support learning relational and causal structure

Authors: Andrew K. Lampinen, Nicholas A. Roy, Ishita Dasgupta, Stephanie C. Y. Chan, Allison C. Tam, James L. McClelland, Chen Yan, Adam Santoro, Neil C. Rabinowitz, Jane X. Wang, Felix Hill

Abstract: Inferring the abstract relational and causal structure of the world is a major challenge for reinforcement-learning (RL) agents. For humans, language--particularly in the form of explanations--plays a considerable role in overcoming this challenge. Here, we show that language can play a similar role for deep RL agents in complex environments. While agents typically struggle to acquire relational and causal knowledge, augmenting their experience by training them to predict language descriptions and explanations can overcome these limitations. We show that language can help agents learn challenging relational tasks, and examine which aspects of language contribute to its benefits. We then show that explanations can help agents to infer not only relational but also causal structure. Language can shape the way that agents generalize out-of-distribution from ambiguous, causally-confounded training, and explanations even allow agents to learn to perform experimental interventions to identify causal relationships. Our results suggest that language description and explanation may be powerful tools for improving agent learning and generalization.

Submitted 25 May, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: ICML 2022; 23 pages

ACM Class: I.2.6

arXiv:2107.06994 [pdf, other]

Systematic human learning and generalization from a brief tutorial with explanatory feedback

Authors: Andrew J. Nam, James L. McClelland

Abstract: Neural networks have long been used to model human intelligence, capturing elements of behavior and cognition, and their neural basis. Recent advancements in deep learning have enabled neural network models to reach and even surpass human levels of intelligence in many respects, yet unlike humans, their ability to learn new tasks quickly remains a challenge. People can reason not only in familiar domains, but can also rapidly learn to reason through novel problems and situations, raising the question of how well modern neural network models capture human intelligence and in which ways they diverge. In this work, we explore this gap by investigating human adults' ability to learn an abstract reasoning task based on Sudoku from a brief instructional tutorial with explanatory feedback for incorrect responses using a narrow range of training examples. We find that participants who master the task do so within a small number of trials and generalize well to puzzles outside of the training range. We also find that most of those who master the task can describe a valid solution strategy, and such participants perform better on transfer puzzles than those whose strategy descriptions are vague or incomplete. Interestingly, fewer than half of our human participants were successful in acquiring a valid solution strategy, and this ability is associated with high school mathematics education. We consider the challenges these findings pose for building computational models that capture all aspects of our findings and point toward a possible role for learning to engage in explanation-based reasoning to support rapid learning and generalization.

Submitted 28 March, 2023; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: 27 pages, 108 references, 8 Figures, and one Table, plus Supplementary Materials

arXiv:2106.03604 [pdf, other]

doi 10.1103/PhysRevB.104.054427

Mutual control of stochastic switching for two electrically coupled superparamagnetic tunnel junctions

Authors: Philippe Talatchian, Matthew W. Daniels, Advait Madhavan, Matthew R. Pufall, Emilie Jué, William H. Rippard, Jabez J. McClelland, Mark D. Stiles

Abstract: Superparamagnetic tunnel junctions (SMTJs) are promising sources for the randomness required by some compact and energy-efficient computing schemes. Coupling SMTJs gives rise to collective behavior that could be useful for cognitive computing. We use a simple linear electrical circuit to mutually couple two SMTJs through their stochastic electrical transitions. When one SMTJ makes a thermally induced transition, the voltage across both SMTJs changes, modifying the transition rates of both. This coupling leads to significant correlation between the states of the two devices. Using fits to a generalized Néel-Brown model for the individual thermally bistable magnetic devices, we can accurately reproduce the behavior of the coupled devices with a Markov model.

Submitted 19 August, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: 12 pages, 11 figures

Journal ref: Phys. Rev. B 104, 054427 (2021)

arXiv:2005.04318 [pdf, other]

doi 10.1073/pnas.2008852117

Transforming task representations to perform novel tasks

Authors: Andrew K. Lampinen, James L. McClelland

Abstract: An important aspect of intelligence is the ability to adapt to a novel task without any direct experience (zero-shot), based on its relationship to previous tasks. Humans can exhibit this cognitive flexibility. By contrast, models that achieve superhuman performance in specific tasks often fail to adapt to even slight task alterations. To address this, we propose a general computational framework for adapting to novel tasks based on their relationship to prior tasks. We begin by learning vector representations of tasks. To adapt to new tasks, we propose meta-mappings, higher-order tasks that transform basic task representations. We demonstrate the effectiveness of this framework across a wide variety of tasks and computational paradigms, ranging from regression to image classification and reinforcement learning. We compare to both human adaptability and language-based approaches to zero-shot learning. Across these domains, meta-mapping is successful, often achieving 80-90% performance, without any data, on a novel task, even when the new task directly contradicts prior experience. We further show that meta-mapping can not only generalize to new tasks via learned relationships, but can also generalize using novel relationships unseen during training. Finally, using meta-mapping as a starting point can dramatically accelerate later learning on a new task, and reduce learning time and cumulative error substantially.
Our results provide insight into a possible computational basis of intelligent adaptability and offer a possible framework for modeling cognitive flexibility and building more flexible artificial intelligence systems.

Submitted 6 October, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: 45 pages

ACM Class: I.2.0; I.2.6

Journal ref: PNAS December 29, 2020 117 (52) 32970-32981

arXiv:1912.05877 [pdf, other]

Extending Machine Language Models toward Human-Level Language Understanding

Authors: James L. McClelland, Felix Hill, Maja Rudolph, Jason Baldridge, Hinrich Schütze

Abstract: Language is crucial for human intelligence, but what exactly is its role? We take language to be a part of a system for understanding and communicating about situations. The human ability to understand and communicate about situations emerges gradually from experience and depends on domain-general principles of biological neural networks: connection-based learning, distributed representation, and context-sensitive, mutual constraint satisfaction-based processing. Current artificial language processing systems rely on the same domain general principles, embodied in artificial neural networks. Indeed, recent progress in this field depends on query-based attention, which extends the ability of these systems to exploit context and has contributed to remarkable breakthroughs. Nevertheless, most current models focus exclusively on language-internal tasks, limiting their ability to perform tasks that depend on understanding situations. These systems also lack memory for the contents of prior situations outside of a fixed contextual span. We describe the organization of the brain's distributed understanding system, which includes a fast learning system that addresses the memory problem. We sketch a framework for future models of understanding drawing equally on cognitive neuroscience and artificial intelligence and exploiting query-based attention. We highlight relevant current directions and consider further developments needed to fully capture human-level language understanding in a computational system.

Submitted 4 July, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

arXiv:1910.00571 [pdf, other]

Environmental drivers of systematicity and generalization in a situated agent

Authors: Felix Hill, Andrew Lampinen, Rosalia Schneider, Stephen Clark, Matthew Botvinick, James L. McClelland, Adam Santoro

Abstract: The question of whether deep neural networks are good at generalising beyond their immediate training experience is of critical importance for learning-based approaches to AI. Here, we consider tests of out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room. We first describe a comparatively generic agent architecture that exhibits strong performance on these tests. We then identify three aspects of the training regime and environment that make a significant difference to its performance: (a) the number of object/word experiences in the training set; (b) the visual invariances afforded by the agent's perspective, or frame of reference; and (c) the variety of visual input inherent in the perceptual aspect of the agent's perception. Our findings indicate that the degree of generalisation that networks exhibit can depend critically on particulars of the environment in which a given task is instantiated. They further suggest that the propensity for neural networks to generalise in systematic ways may increase if, like human children, those networks have access to many frames of richly varying, multi-modal observations as they learn.

Submitted 19 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

arXiv:1906.03744 [pdf, other]

Generative Continual Concept Learning

Authors: Mohammad Rostami, Soheil Kolouri, James McClelland, Praveen Pilly

Abstract: After learning a concept, humans are also able to continually generalize their learned concepts to new domains by observing only a few labeled instances without any interference with the past learned knowledge. In contrast, learning concepts efficiently in a continual learning setting remains an open challenge for current Artificial Intelligence algorithms as persistent model retraining is necessary. Inspired by the Parallel Distributed Processing learning and the Complementary Learning Systems theories, we develop a computational model that is able to expand its previously learned concepts efficiently to new domains using a few labeled samples. We couple the new form of a concept to its past learned forms in an embedding space for effective continual learning. Doing so, a generative distribution is learned such that it is shared across the tasks in the embedding space and models the abstract concepts. This procedure enables the model to generate pseudo-data points to replay the past experience to tackle catastrophic forgetting.

Submitted 7 September, 2019; v1 submitted 9 June, 2019; originally announced June 2019.

arXiv:1905.09950 [pdf, other]

Zero-shot task adaptation by homoiconic meta-mapping

Authors: Andrew K. Lampinen, James L. McClelland

Abstract: How can deep learning systems flexibly reuse their knowledge? Toward this goal, we propose a new class of challenges, and a class of architectures that can solve them. The challenges are meta-mappings, which involve systematically transforming task behaviors to adapt to new tasks zero-shot. The key to achieving these challenges is representing the task being performed in such a way that this task representation is itself transformable. We therefore draw inspiration from functional programming and recent work in meta-learning to propose a class of Homoiconic Meta-Mapping (HoMM) approaches that represent data points and tasks in a shared latent space, and learn to infer transformations of that space. HoMM approaches can be applied to any type of machine learning task. We demonstrate the utility of this perspective by exhibiting zero-shot remapping of behavior to adapt to new tasks.

Submitted 12 November, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

Comments: 27 pages

ACM Class: I.2.0; I.2.6

arXiv:1903.01635 [pdf]

doi 10.3389/fnins.2019.00793

Streaming Batch Eigenupdates for Hardware Neuromorphic Networks

Authors: Brian D. Hoskins, Matthew W. Daniels, Siyuan Huang, Advait Madhavan, Gina C. Adam, Nikolai Zhitenev, Jabez J. McClelland, Mark D. Stiles

Abstract: Neuromorphic networks based on nanodevices, such as metal oxide memristors, phase change memories, and flash memory cells, have generated considerable interest for their increased energy efficiency and density in comparison to graphics processing units (GPUs) and central processing units (CPUs). Though immense acceleration of the training process can be achieved by leveraging the fact that the time complexity of training does not scale with the network size, it is limited by the space complexity of stochastic gradient descent, which grows quadratically. The main objective of this work is to reduce this space complexity by using low-rank approximations of stochastic gradient descent. This low spatial complexity combined with streaming methods allows for significant reductions in memory and compute overhead, opening the doors for improvements in area, time and energy efficiency of training. We refer to this algorithm and architecture to implement it as the streaming batch eigenupdate (SBE) approach.

Submitted 4 March, 2019; originally announced March 2019.

Comments: 13 pages, 5 figures

Journal ref: Frontiers in Neuroscience 13 (2019): 793

arXiv:1810.10531 [pdf, other]

doi 10.1073/pnas.1820226116

A mathematical theory of semantic development in deep neural networks

Authors: Andrew M. Saxe, James L. McClelland, Surya Ganguli

Abstract: An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep learning dynamics to give rise to these regularities.

Submitted 23 October, 2018; originally announced October 2018.

arXiv:1806.06595 [pdf, other]

doi 10.1007/978-3-030-00937-3_1

Uncertainty in multitask learning: joint representations for probabilistic MR-only radiotherapy planning

Authors: Felix J. S. Bragman, Ryutaro Tanno, Zach Eaton-Rosen, Wenqi Li, David J. Hawkes, Sebastien Ourselin, Daniel C. Alexander, Jamie R. McClelland, M. Jorge Cardoso

Abstract: Multi-task neural network architectures provide a mechanism that jointly integrates information from distinct sources. It is ideal in the context of MR-only radiotherapy planning as it can jointly regress a synthetic CT (synCT) scan and segment organs-at-risk (OAR) from MRI. We propose a probabilistic multi-task network that estimates: 1) intrinsic uncertainty through a heteroscedastic noise model for spatially-adaptive task loss weighting and 2) parameter uncertainty through approximate Bayesian inference. This allows sampling of multiple segmentations and synCTs that share their network representation. We test our model on prostate cancer scans and show that it produces more accurate and consistent synCTs with a better estimation in the variance of the errors, state of the art results in OAR segmentation and a methodology for quality assurance in radiotherapy treatment planning.

Submitted 18 June, 2018; originally announced June 2018.

Comments: Early-accept at MICCAI 2018, 8 pages, 4 figures

arXiv:1710.10280 [pdf, other]

One-shot and few-shot learning of word embeddings

Authors: Andrew K. Lampinen, James L. McClelland

Abstract: Standard deep learning systems require thousands or millions of examples to learn a concept, and cannot integrate new concepts easily. By contrast, humans have an incredible ability to do one-shot or few-shot learning. For instance, from just hearing a word used in a sentence, humans can infer a great deal about it, by leveraging what the syntax and semantics of the surrounding words tell us. Here, we draw inspiration from this to highlight a simple technique by which deep recurrent networks can similarly exploit their prior knowledge to learn a useful representation for a new word from little data. This could make natural language processing systems much more flexible, by allowing them to learn continually from the new words they encounter.

Submitted 2 January, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

Comments: 15 pages, 7 figures, under review as a conference paper at ICLR 2018

ACM Class: I.2.7

arXiv:1704.01475 [pdf]

doi 10.1038/s41467-017-02116-9

Stateful characterization of resistive switching TiO2 with electron beam induced currents

Authors: Brian D. Hoskins, Gina C. Adam, Evgheni Strelcov, Nikolai Zhitenev, Andrei Kolmakov, Dmitri B. Strukov, Jabez J. McClelland

Abstract: Metal oxide resistive switches are increasingly important as possible artificial synapses in next generation neuromorphic networks. Nevertheless, there is still no codified set of tools for studying properties of the devices. To this end, we demonstrate electron beam induced current measurements as a powerful method to monitor the development of local resistive switching in TiO2 based devices. By comparing beam-energy dependent electron beam induced currents with Monte Carlo simulations of the energy absorption in different device layers, it is possible to deconstruct the origins of filament image formation and relate this to both morphological changes and the state of the switch. By clarifying the contrast mechanisms in electron beam induced current microscopy it is possible to gain new insights into the scaling of the resistive switching phenomenon and observe the formation of a current leakage region around the switching filament. Additionally, analysis of symmetric device structures reveals propagating polarization domains.

Submitted 30 October, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

Comments: 27 Pages 10 figures

Journal ref: Nature Communications 8, 1972 (2017)

arXiv:1611.04634 [pdf, other]

EMDUnifrac: Exact Linear Time Computation of the Unifrac Metric and Identification of Differentially Abundant Organisms

Authors: Jason McClelland, David Koslicki

Abstract: Both the weighted and unweighted Unifrac distances have been very successfully employed to assess if two communities differ, but do not give any information about how two communities differ. We take advantage of recent observations that the Unifrac metric is equivalent to the so-called earth mover's distance (also known as the Kantorovich-Rubinstein metric) to develop an algorithm that not only computes the Unifrac distance in linear time and space, but also simultaneously finds which operational taxonomic units are responsible for the observed differences between samples. This allows the algorithm, called EMDUnifrac, to determine why given samples are different, not just if they are different, and with no added computational burden. EMDUnifrac can be utilized on any distribution on a tree, and so is particularly suitable to analyzing both operational taxonomic units derived from amplicon sequencing, as well as community profiles resulting from classifying whole genome shotgun metagenomes. The EMDUnifrac source code (written in python) is freely available at: https://github.com/dkoslicki/EMDUnifrac.

Submitted 14 November, 2016; originally announced November 2016.

MSC Class: 92-08

arXiv:1512.07111 [pdf]

doi 10.1038/nphoton.2015.248

Imaging Nanophotonic Modes of Microresonators using a Focused Ion Beam

Authors: Kevin A. Twedt, Jie Zou, Marcelo Davanco, Kartik Srinivasan, Jabez J. McClelland, Vladimir A. Aksyuk

Abstract: Optical microresonators have proven powerful in a wide range of applications, including cavity quantum electrodynamics, biosensing, microfluidics, and cavity optomechanics. Their performance depends critically on the exact distribution of optical energy, confined and shaped by the nanoscale device geometry. Near-field optical probes can image this distribution, but the physical probe necessarily perturbs the near field, which is particularly problematic for sensitive high quality factor resonances. We present a new approach to mapping nanophotonic modes that uses a controllably small and local optomechanical perturbation introduced by a focused lithium ion beam. An ion beam (radius about 50 nm) induces a picometer-scale dynamic deformation of the resonator surface, which we detect through a shift in the optical resonance wavelength. We map five modes of a silicon microdisk resonator (Q > 20,000) with both high spatial and spectral resolution. Our technique also enables in-situ observation of ion implantation damage and relaxation dynamics in a silicon lattice.

Submitted 22 December, 2015; originally announced December 2015.

Comments: published online in Nature Photonics

Journal ref: Nature Photon. 10, 35-39 (2016)

arXiv:1510.08673 [pdf]

doi 10.1063/1.4944491

Bright focused ion beam sources based on laser-cooled atoms

Authors: J. J. McClelland, A. V. Steele, B. Knuffman, K. A. Twedt, A. Schwarzkopf, T. M. Wilson

Abstract: Nanoscale focused ion beams (FIBs) represent one of the most useful tools in nanotechnology, enabling nanofabrication via milling and gas-assisted deposition, microscopy and microanalysis, and selective, spatially resolved doping of materials. Recently, a new type of FIB source has emerged, which uses ionization of laser cooled neutral atoms to produce the ion beam. The extremely cold temperatures attainable with laser cooling (in the range of 100 uK or below) result in a beam of ions with a very small transverse velocity distribution. This corresponds to a source with extremely high brightness that rivals or may even exceed the brightness of the industry standard Ga+ liquid metal ion source. In this review we discuss the context of ion beam technology in which these new ion sources can play a role, their principles of operation, and some examples of recent demonstrations. The field is relatively new, so only a few applications have been demonstrated, most notably low energy ion microscopy with Li ions. Nevertheless, a number of promising new approaches have been proposed and/or demonstrated, suggesting that a rapid evolution of this type of source is likely in the near future.

Submitted 10 February, 2016; v1 submitted 29 October, 2015; originally announced October 2015.

Journal ref: Applied Physics Reviews 3, 011302 (2016)

arXiv:1312.6120 [pdf, other]

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Authors: Andrew M. Saxe, James L. McClelland, Surya Ganguli

Abstract: Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot.
We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.

Submitted 19 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

Comments: Submission to ICLR2014. Revised based on reviewer feedback

arXiv:0802.0857 [pdf, ps, other]

doi 10.1103/PhysRevLett.100.113002

Narrow-line magneto-optical cooling and trapping of strongly magnetic atoms

Authors: Andrew J. Berglund, James L. Hanssen, Jabez J. McClelland

Abstract: Laser cooling on weak transitions is a useful technique for reaching ultracold temperatures in atoms with multiple valence electrons. However, for strongly magnetic atoms a conventional narrow-line magneto-optical trap (MOT) is destabilized by competition between optical and magnetic forces. We overcome this difficulty in Er by developing an unusual narrow-line MOT that balances optical and magnetic forces using laser light tuned to the blue side of a narrow (8 kHz) transition. The trap population is spin-polarized with temperatures reaching below 2 microkelvin. Our results constitute an alternative method for laser cooling on weak transitions, applicable to rare-earth-metal and metastable alkaline earth elements.

Submitted 6 February, 2008; originally announced February 2008.

Comments: To appear in Phys. Rev. Lett. 4 pages, 5 figures

arXiv:0802.0836 [pdf, ps, other]

doi 10.1103/PhysRevA.76.053418

Sub-Doppler laser cooling and magnetic trapping of erbium

Authors: Andrew J. Berglund, Siu Au Lee, Jabez J. McClelland

Abstract: We investigate cooling mechanisms in magneto-optically and magnetically trapped erbium. We find efficient sub-Doppler cooling in our trap, which can persist even in large magnetic fields due to the near degeneracy of two Lande g factors. Furthermore, a continuously loaded magnetic trap is demonstrated where we observe temperatures below 25 microkelvin. These favorable cooling and trapping properties suggest a number of scientific possibilities for rare-earth atomic physics, including narrow linewidth laser cooling and spectroscopy, unique collision studies, and degenerate bosonic and fermionic gases with long-range magnetic dipole coupling.

Submitted 6 February, 2008; originally announced February 2008.

Journal ref: Phys. Rev. A 76, 053418 (2007)

Showing 1–33 of 33 results for author: McClelland, J