Search | arXiv e-print repository

Nanoscale imaging of He-ion irradiation effects on amorphous TaO$_x$ toward electroforming-free neuromorphic functions

Authors: Olha Popova, Steven J. Randolph, Sabine M. Neumayer, Liangbo Liang, Benjamin Lawrie, Olga S. Ovchinnikova, Robert J. Bondi, Matthew J. Marinella, Bobby G. Sumpter, Petro Maksymovych

Abstract: Resistive switching in thin films has been widely studied in a broad range of materials. Yet the mechanisms behind electroresistive switching have been persistently difficult to decipher and control, in part due to their non-equilibrium nature. Here, we demonstrate new experimental approaches that can probe resistive switching phenomena, utilizing amorphous TaO$_x$ as a model material system. Specifically, we apply Scanning Microwave Impedance Microscopy (sMIM) and cathodoluminescence (CL) microscopy as direct probes of conductance and electronic structure, respectively. These methods provide direct evidence of the electronic state of TaO$_x$ despite its amorphous nature. For example, CL identifies characteristic impurity levels in TaO$_x$, in agreement with first principles calculations. We applied these methods to investigate He-ion-beam irradiation as a path to activate conductivity of materials and enable electroforming-free control over resistive switching. However, we find that even though He-ions begin to modify the nature of bonds even at the lowest doses, the films' conductive properties exhibit remarkable stability with large displacement damage and they are driven to metallic states only at the limit of structural decomposition. Finally, we show that electroforming in a nanoscale junction can be carried out with a dissipated power of < 20 nW, a much smaller value compared to earlier studies and one that minimizes irreversible structural modifications of the films. The multimodal approach described here provides a new framework toward the theory/experiment guided design and optimization of electroresistive materials.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2301.11382 [pdf]

Parallel Matrix Multiplication Using Voltage Controlled Magnetic Anisotropy Domain Wall Logic

Authors: Nicholas Zogbi, Samuel Liu, Christopher H. Bennett, Sapan Agarwal, Matthew J. Marinella, Jean Anne C. Incorvia, T. Patrick Xiao

Abstract: The domain wall-magnetic tunnel junction (DW-MTJ) is a versatile device that can simultaneously store data and perform computations. These three-terminal devices are promising for digital logic due to their nonvolatility, low-energy operation, and radiation hardness. Here, we augment the DW-MTJ logic gate with voltage controlled magnetic anisotropy (VCMA) to improve the reliability of logical concatenation in the presence of realistic process variations. VCMA creates potential wells that allow for reliable and repeatable localization of domain walls. The DW-MTJ logic gate supports different fanouts, allowing for multiple inputs and outputs for a single device without affecting area. We simulate a systolic array of DW-MTJ Multiply-Accumulate (MAC) units with 4-bit and 8-bit precision, which uses the nonvolatility of DW-MTJ logic gates to enable fine-grained pipelining and high parallelism. The DW-MTJ systolic array provides comparable throughput and efficiency to state-of-the-art CMOS systolic arrays while being radiation-hard. These results improve the feasibility of using domain wall-based processors, especially for extreme-environment applications such as space.

Submitted 14 April, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: 8 pages, 8 figures (main paper) + 4 pages, 5 figures (supplementary information)

arXiv:2207.07756 [pdf]

Tunable intervalence charge transfer in ruthenium Prussian blue analogue enables stable and efficient biocompatible artificial synapses

Authors: Donald A. Robinson, Michael E. Foster, Christopher H. Bennett, Austin Bhandarkar, Elizabeth R. Webster, Aleyna Celebi, Nisa Celebi, Elliot J. Fuller, Vitalie Stavila, Catalin D. Spataru, David S. Ashby, Matthew J. Marinella, Raga Krishnakumar, Mark D. Allendorf, A. Alec Talin

Abstract: Emerging concepts for neuromorphic computing, bioelectronics, and brain-computer interfacing inspire new research avenues aimed at understanding the relationship between oxidation state and conductivity in unexplored materials. Here, we present ruthenium Prussian blue analogue (RuPBA), a mixed valence coordination compound with an open framework structure and ability to conduct both ionic and electronic charge, for flexible artificial synapses that reversibly switch conductance by more than four orders of magnitude based on electrochemically tunable oxidation state. Retention of programmed states is improved by nearly two orders of magnitude compared to the extensively studied organic polymers, thus reducing the frequency, complexity and energy costs associated with error correction schemes. We demonstrate dopamine detection using RuPBA synapses and biocompatibility with neuronal cells, evoking prospective application for brain-computer interfacing. By application of electron transfer theory to in-situ spectroscopic probing of intervalence charge transfer, we elucidate a switching mechanism whereby the degree of mixed valency between N-coordinated Ru sites controls the carrier concentration and mobility, as supported by DFT.

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2111.11516 [pdf]

Shape-Dependent Multi-Weight Magnetic Artificial Synapses for Neuromorphic Computing

Authors: Thomas Leonard, Samuel Liu, Mahshid Alamdar, Can Cui, Otitoaleke G. Akinola, Lin Xue, T. Patrick Xiao, Joseph S. Friedman, Matthew J. Marinella, Christopher H. Bennett, Jean Anne C. Incorvia

Abstract: In neuromorphic computing, artificial synapses provide a multi-weight conductance state that is set based on inputs from neurons, analogous to the brain. Additional properties of the synapse beyond multiple weights can be needed, and can depend on the application, requiring the need for generating different synapse behaviors from the same materials. Here, we measure artificial synapses based on magnetic materials that use a magnetic tunnel junction and a magnetic domain wall. By fabricating lithographic notches in a domain wall track underneath a single magnetic tunnel junction, we achieve 4-5 stable resistance states that can be repeatably controlled electrically using spin orbit torque. We analyze the effect of geometry on the synapse behavior, showing that a trapezoidal device has asymmetric weight updates with high controllability, while a straight device has higher stochasticity, but with stable resistance levels. The device data is input into neuromorphic computing simulators to show the usefulness of application-specific synaptic functions. Implementing an artificial neural network applied on streamed Fashion-MNIST data, we show that the trapezoidal magnetic synapse can be used as a metaplastic function for efficient online learning. Implementing a convolutional neural network for CIFAR-100 image recognition, we show that the straight magnetic synapse achieves near-ideal inference accuracy, due to the stability of its resistance levels. This work shows multi-weight magnetic synapses are a feasible technology for neuromorphic computing and provides design guidelines for emerging artificial synapse technologies.

Submitted 17 February, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: 27 pages 6 figures 1 table

arXiv:2109.01262 [pdf, other]

doi 10.1109/MCAS.2022.3214409

On the Accuracy of Analog Neural Network Inference Accelerators

Authors: T. Patrick Xiao, Ben Feinberg, Christopher H. Bennett, Venkatraman Prabhakar, Prashant Saxena, Vineet Agrawal, Sapan Agarwal, Matthew J. Marinella

Abstract: Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference. A promising category of accelerators utilizes nonvolatile memory arrays to both store weights and perform $\textit{in situ}$ analog computation inside the array. While prior work has explored the design space of analog accelerators to optimize performance and energy efficiency, there is seldom a rigorous evaluation of the accuracy of these accelerators. This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy. When evaluated using ResNet50 on ImageNet, the resilience of the system to analog non-idealities - cell programming errors, analog-to-digital converter resolution, and array parasitic resistances - all improve when analog quantities in the hardware are made proportional to the weights in the network. Moreover, contrary to the assumptions of prior work, nearly equivalent resilience to cell imprecision can be achieved by fully storing weights as analog quantities, rather than spreading weight bits across multiple devices, often referred to as bit slicing. By exploiting proportionality, analog system designers have the freedom to match the precision of the hardware to the needs of the algorithm, rather than attempting to guarantee the same level of precision in the intermediate results as an equivalent digital accelerator. This ultimately results in an analog accelerator that is more accurate, more robust to analog errors, and more energy-efficient.

Submitted 3 February, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: Changes in v3: modified definition of state-independent error (factor of 2) for fairer comparison to state-proportional. Added more results on INT4 network

Journal ref: IEEE Circuits and Systems Magazine, vol. 22, no. 4, pp. 26-48, 2022

arXiv:2107.02238 [pdf, other]

High-Speed CMOS-Free Purely Spintronic Asynchronous Recurrent Neural Network

Authors: Pranav O. Mathews, Christian B. Duffee, Abel Thayil, Ty E. Stovall, Christopher H. Bennett, Felipe Garcia-Sanchez, Matthew J. Marinella, Jean Anne C. Incorvia, Naimul Hassan, Xuan Hu, Joseph S. Friedman

Abstract: Neuromorphic computing systems overcome the limitations of traditional von Neumann computing architectures. These computing systems can be further improved upon by using emerging technologies that are more efficient than CMOS for neural computation. Recent research has demonstrated memristors and spintronic devices in various neural network designs boost efficiency and speed. This paper presents a biologically inspired fully spintronic neuron used in a fully spintronic Hopfield RNN. The network is used to solve tasks, and the results are compared against those of current Hopfield neuromorphic architectures which use emerging technologies.

Submitted 30 September, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

arXiv:2101.03095 [pdf]

doi 10.1109/LMAG.2021.3069666

Controllable reset behavior in domain wall-magnetic tunnel junction artificial neurons for task-adaptable computation

Authors: Samuel Liu, Christopher H. Bennett, Joseph S. Friedman, Matthew J. Marinella, David Paydarfar, Jean Anne C. Incorvia

Abstract: Neuromorphic computing with spintronic devices has been of interest due to the limitations of CMOS-driven von Neumann computing. Domain wall-magnetic tunnel junction (DW-MTJ) devices have been shown to be able to intrinsically capture biological neuron behavior. Edgy-relaxed behavior, where a frequently firing neuron experiences a lower action potential threshold, may provide additional artificial neuronal functionality when executing repeated tasks. In this study, we demonstrate that this behavior can be implemented in DW-MTJ artificial neurons via three alternative mechanisms: shape anisotropy, magnetic field, and current-driven soft reset. Using micromagnetics and analytical device modeling to classify the Optdigits handwritten digit dataset, we show that edgy-relaxed behavior improves both classification accuracy and classification rate for ordered datasets while sacrificing little to no accuracy for a randomized dataset. This work establishes methods by which artificial spintronic neurons can be flexibly adapted to datasets.

Submitted 8 January, 2021; originally announced January 2021.

Comments: 5 pages, 5 figures

arXiv:2011.06075 [pdf]

doi 10.1109/TED.2022.3159508

Domain Wall Leaky Integrate-and-Fire Neurons with Shape-Based Configurable Activation Functions

Authors: Wesley H. Brigner, Naimul Hassan, Xuan Hu, Christopher H. Bennett, Felipe Garcia-Sanchez, Can Cui, Alvaro Velasquez, Matthew J. Marinella, Jean Anne C. Incorvia, Joseph S. Friedman

Abstract: Complementary metal oxide semiconductor (CMOS) devices display volatile characteristics, and are not well suited for analog applications such as neuromorphic computing. Spintronic devices, on the other hand, exhibit both non-volatile and analog features, which are well-suited to neuromorphic computing. Consequently, these novel devices are at the forefront of beyond-CMOS artificial intelligence applications. However, a large quantity of these artificial neuromorphic devices still require the use of CMOS, which decreases the efficiency of the system. To resolve this, we have previously proposed a number of artificial neurons and synapses that do not require CMOS for operation. Although these devices are a significant improvement over previous renditions, their ability to enable neural network learning and recognition is limited by their intrinsic activation functions. This work proposes modifications to these spintronic neurons that enable configuration of the activation functions through control of the shape of a magnetic domain wall track. Linear and sigmoidal activation functions are demonstrated in this work, which can be extended through a similar approach to enable a wide variety of activation functions.

Submitted 11 November, 2020; originally announced November 2020.

arXiv:2010.13879 [pdf]

doi 10.1063/5.0038521

Domain Wall-Magnetic Tunnel Junction Spin Orbit Torque Devices and Circuits for In-Memory Computing

Authors: Mahshid Alamdar, Thomas Leonard, Can Cui, Bishweshwor P. Rimal, Lin Xue, Otitoaleke G. Akinola, T. Patrick Xiao, Joseph S. Friedman, Christopher H. Bennett, Matthew J. Marinella, Jean Anne C. Incorvia

Abstract: There are pressing problems with traditional computing, especially for accomplishing data-intensive and real-time tasks, that motivate the development of in-memory computing devices to both store information and perform computation. Magnetic tunnel junction (MTJ) memory elements can be used for computation by manipulating a domain wall (DW), a transition region between magnetic domains. But these devices have suffered from challenges: spin transfer torque (STT) switching of a DW requires high current, and the multiple etch steps needed to create an MTJ pillar on top of a DW track have led to reduced tunnel magnetoresistance (TMR). These issues have limited experimental study of devices and circuits. Here, we study prototypes of three-terminal domain wall-magnetic tunnel junction (DW-MTJ) in-memory computing devices that can address data processing bottlenecks and resolve these challenges by using perpendicular magnetic anisotropy (PMA), spin-orbit torque (SOT) switching, and an optimized lithography process to produce average device tunnel magnetoresistance TMR = 164%, resistance-area product RA = 31 Ω-μm^2, close to the RA of the unpatterned film, and lower switching current density compared to using spin transfer torque. A two-device circuit shows bit propagation between devices. Device initialization variation in switching voltage is shown to be curtailed to 7% by controlling the DW initial position, which we show corresponds to 96% accuracy in a DW-MTJ full adder simulation. These results make strides in using MTJs and DWs for in-memory and neuromorphic computing applications.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 15 pages, 4 figures, 1 table. Mahshid Alamdar and Thomas Leonard are co-first authors

arXiv:2004.00802 [pdf]

Device-aware inference operations in SONOS nonvolatile memory arrays

Authors: Christopher H. Bennett, T. Patrick Xiao, Ryan Dellana, Vineet Agrawal, Ben Feinberg, Venkatraman Prabhakar, Krishnaswamy Ramkumar, Long Hinh, Swatilekha Saha, Vijay Raghavan, Ramesh Chettuvetty, Sapan Agarwal, Matthew J. Marinella

Abstract: Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in a fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show strong performance can be realized with ADCs of 5-8 bits precision.

Submitted 2 April, 2020; originally announced April 2020.

Comments: To be presented at IEEE International Reliability Physics Symposium (IRPS) 2020

arXiv:2003.11120 [pdf, other]

Unsupervised Competitive Hardware Learning Rule for Spintronic Clustering Architecture

Authors: Alvaro Velasquez, Christopher H. Bennett, Naimul Hassan, Wesley H. Brigner, Otitoaleke G. Akinola, Jean Anne C. Incorvia, Matthew J. Marinella, Joseph S. Friedman

Abstract: We propose a hardware learning rule for unsupervised clustering within a novel spintronic computing architecture. The proposed approach leverages the three-terminal structure of domain-wall magnetic tunnel junction devices to establish a feedback loop that serves to train such devices when they are used as synapses in a neuromorphic computing architecture.

Submitted 24 March, 2020; originally announced March 2020.

arXiv:2003.10396 [pdf, other]

Evaluating complexity and resilience trade-offs in emerging memory inference machines

Authors: Christopher H. Bennett, Ryan Dellana, T. Patrick Xiao, Ben Feinberg, Sapan Agarwal, Suma Cardwell, Matthew J. Marinella, William Severa, Brad Aimone

Abstract: Neuromorphic-style inference only works well if limited hardware resources are maximized properly, e.g. accuracy continues to scale with parameters and complexity in the face of potential disturbance. In this work, we use realistic crossbar simulations to highlight that compact implementations of deep neural networks are unexpectedly susceptible to collapse from multiple system disturbances. Our work proposes a middle path towards high performance and strong resilience utilizing the Mosaics framework, and specifically by re-using synaptic connections in a recurrent neural network implementation that possesses a natural form of noise-immunity.

Submitted 25 February, 2020; originally announced March 2020.

arXiv:2003.02357 [pdf, other]

Plasticity-Enhanced Domain-Wall MTJ Neural Networks for Energy-Efficient Online Learning

Authors: Christopher H. Bennett, T. Patrick Xiao, Can Cui, Naimul Hassan, Otitoaleke G. Akinola, Jean Anne C. Incorvia, Alvaro Velasquez, Joseph S. Friedman, Matthew J. Marinella

Abstract: Machine learning implements backpropagation via abundant training samples. We demonstrate a multi-stage learning system realized by a promising non-volatile memory device, the domain-wall magnetic tunnel junction (DW-MTJ). The system consists of unsupervised (clustering) as well as supervised sub-systems, and generalizes quickly (with few samples). We demonstrate interactions between physical properties of this device and optimal implementation of neuroscience-inspired plasticity learning rules, and highlight performance on a suite of tasks. Our energy analysis confirms the value of the approach, as the learning budget stays below 20 $\mu$J even for large tasks used typically in machine learning.

Submitted 4 March, 2020; originally announced March 2020.

arXiv:2002.00862 [pdf]

CMOS-Free Multilayer Perceptron Enabled by Four-Terminal MTJ Device

Authors: Wesley H. Brigner, Naimul Hassan, Xuan Hu, Christopher H. Bennett, Felipe Garcia-Sanchez, Matthew J. Marinella, Jean Anne C. Incorvia, Joseph S. Friedman

Abstract: Neuromorphic computing promises revolutionary improvements over conventional systems for applications that process unstructured information. To fully realize this potential, neuromorphic systems should exploit the biomimetic behavior of emerging nanodevices. In particular, exceptional opportunities are provided by the non-volatility and analog capabilities of spintronic devices. While spintronic devices have previously been proposed that emulate neurons and synapses, complementary metal-oxide-semiconductor (CMOS) devices are required to implement multilayer spintronic perceptron crossbars. This work therefore proposes a new spintronic neuron that enables purely spintronic multilayer perceptrons, eliminating the need for CMOS circuitry and simplifying fabrication.

Submitted 3 February, 2020; originally announced February 2020.

arXiv:1912.11516 [pdf, other]

PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM

Authors: Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew Marinella, Martin Foltin, John Paul Strachan, Dejan Milojicic, Wen-mei Hwu, Kaushik Roy

Abstract: The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without the use of serial reads and writes. However, these works have been limited to low precision operations which are not sufficient for typical training workloads. Moreover, they have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which is substantially different from bit-slicing for matrix-vector multiplication only. We incorporate this technique into a crossbar architecture with three variants catered to different training algorithms. To evaluate our design on different types of layers in neural networks (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support. Our evaluation shows that PANTHER achieves up to $8.02\times$, $54.21\times$, and $103\times$ energy reductions as well as $7.16\times$, $4.02\times$, and $16\times$ execution time reductions compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.

Submitted 24 December, 2019; originally announced December 2019.

Comments: 13 pages, 15 figures

arXiv:1912.04505 [pdf]

Maximized Lateral Inhibition in Paired Magnetic Domain Wall Racetracks for Neuromorphic Computing

Authors: C. Cui, O. G. Akinola, N. Hassan, C. H. Bennett, M. J. Marinella, J. S. Friedman, J. A. C. Incorvia

Abstract: Lateral inhibition is an important functionality in neuromorphic computing, modeled after the biological neuron behavior that a firing neuron deactivates its neighbors belonging to the same layer and prevents them from firing. In most neuromorphic hardware platforms lateral inhibition is implemented by external circuitry, thereby decreasing the energy efficiency and increasing the area overhead of such systems. Recently, the domain wall-magnetic tunnel junction (DW-MTJ) artificial neuron has been demonstrated in modeling to be inherently inhibitory. Without peripheral circuitry, lateral inhibition in DW-MTJ neurons results from magnetostatic interaction between neighboring neuron cells. However, the lateral inhibition mechanism in DW-MTJ neurons has not been studied thoroughly, leading to weak inhibition only in very closely-spaced devices. This work approaches these problems by modeling current- and field-driven DW motion in a pair of adjacent DW-MTJ neurons. We maximize the magnitude of lateral inhibition by tuning the magnetic interaction between the neurons. The results are explained by current-driven DW velocity characteristics in response to external magnetic field and quantified by an analytical model. Finally, the dependence of lateral inhibition strength on device parameters is investigated. This provides a guideline for the optimization of lateral inhibition implementation in DW-MTJ neurons. With strong lateral inhibition achieved, a path towards competitive learning algorithms such as the winner-take-all is made possible on such neuromorphic devices.

Submitted 10 December, 2019; originally announced December 2019.

Comments: 8 pages, 5 figures

arXiv:1905.05485 [pdf]

doi 10.1109/TED.2019.2938952

Shape-based Magnetic Domain Wall Drift for an Artificial Spintronic Leaky Integrate-and-Fire Neuron

Authors: Wesley H. Brigner, Naimul Hassan, Lucian Jiang-Wei, Xuan Hu, Diptish Saha, Christopher H. Bennett, Matthew J. Marinella, Jean Anne C. Incorvia, Felipe Garcia-Sanchez, Joseph S. Friedman

Abstract: Spintronic devices based on domain wall (DW) motion through ferromagnetic nanowire tracks have received great interest as components of neuromorphic information processing systems. Previous proposals for spintronic artificial neurons required external stimuli to perform the leaking functionality, one of the three fundamental functions of a leaky integrate-and-fire (LIF) neuron. The use of this external magnetic field or electrical current stimulus results in either a decrease in energy efficiency or an increase in fabrication complexity. In this work, we modify the shape of previously demonstrated three-terminal magnetic tunnel junction neurons to perform the leaking operation without any external stimuli. The trapezoidal structure causes shape-based DW drift, thus intrinsically providing the leaking functionality with no hardware cost. This LIF neuron therefore promises to advance the development of spintronic neural network crossbar arrays.

Submitted 14 May, 2019; originally announced May 2019.

arXiv:1901.10570 [pdf]

Using Floating Gate Memory to Train Ideal Accuracy Neural Networks

Authors: Sapan Agarwal, Diana Garland, John Niroula, Robin B. Jacobs-Gedrim, Alex Hsia, Michael S. Van Heukelom, Elliot Fuller, Bruce Draper, Matthew J. Marinella

Abstract: Floating gate SONOS (Silicon-Oxygen-Nitrogen-Oxygen-Silicon) transistors can be used to train neural networks to ideal accuracies that match those of floating point digital weights on the MNIST dataset when using multiple devices to represent a weight or within 1% of ideal accuracy when using a single device. This is enabled by operating devices in the subthreshold regime, where they exhibit symmetric write nonlinearities. A neural training accelerator core based on SONOS with a single device per weight would increase energy efficiency by 120X, operate 2.1X faster and require 5X lower area than an optimized SRAM based ASIC.

Submitted 27 February, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

arXiv:1710.09491 [pdf]

doi 10.1007/s00339-018-2041-3

Electroforming-Free TaOx Memristors using Focused Ion Beam Irradiations

Authors: J. L. Pacheco, D. L. Perry, D. R. Hughart, M. Marinella, E. Bielejec

Abstract: We demonstrate creation of electroforming-free TaOx memristive devices using focused ion beam irradiations to locally define conductive filaments in TaOx films. Electrical characterization shows that these irradiations directly create fully functional memristors without the need for electroforming. Ion beam forming of conductive filaments combined with state-of-the-art nano-patterning presents a CMOS compatible approach to wafer level fabrication of fully formed and operational memristors.

Submitted 25 October, 2017; originally announced October 2017.

arXiv:1707.09952 [pdf]

doi 10.1109/JETCAS.2018.2796379

Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

Authors: Matthew J. Marinella, Sapan Agarwal, Alexander Hsia, Isaac Richter, Robin Jacobs-Gedrim, John Niroula, Steven J. Plimpton, Engin Ipek, Conrad D. James

Abstract: Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50M parameters are made possible by modern GPU clusters operating at <50 pJ per op and more recently, production accelerators capable of <5 pJ per operation at the board level. However, with the slowing of CMOS scaling, new paradigms will be required to achieve the next several orders of magnitude in performance per watt gains. Using an analog resistive memory (ReRAM) crossbar to perform key matrix operations in an accelerator is an attractive option. This work presents a detailed design using a state of the art 14/16 nm PDK of an analog crossbar circuit block designed to process three key kernels required in training and inference of neural networks. A detailed circuit and device-level analysis of energy, latency, area, and accuracy is given and compared to relevant designs using standard digital ReRAM and SRAM operations. It is shown that the analog accelerator has a 270x energy and 540x latency advantage over a similar block utilizing only digital ReRAM and takes only 11 fJ per multiply and accumulate (MAC). Compared to an SRAM based accelerator, the energy is 430X better and latency is 34X better. Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies.

Submitted 16 February, 2018; v1 submitted 31 July, 2017; originally announced July 2017.

arXiv:1406.4033 [pdf]

doi 10.1063/1.4895526

Degenerate Resistive Switching and Ultrahigh Density Storage in Resistive Memory

Authors: Andrew J. Lohn, Patrick R. Mickel, Conrad D. James, Matthew J. Marinella

Abstract: We show that, in tantalum oxide resistive memories, activation power provides a multi-level variable for information storage that can be set and read separately from the resistance. These two state variables (resistance and activation power) can be precisely controlled in two steps: (1) the possible activation power states are selected by partially reducing resistance, then (2) a subsequent partial increase in resistance specifies the resistance state and the final activation power state. We show that these states can be precisely written and read electrically, making this approach potentially amenable for ultra-high density memories. We provide a theoretical explanation for information storage and retrieval from activation power and experimentally demonstrate information storage in a third dimension related to the change in activation power with resistance.

Submitted 16 June, 2014; originally announced June 2014.

Showing 1–21 of 21 results for author: Marinella, M