Measurement-driven neural-network training for integrated magnetic tunnel junction arrays

Authors: William A. Borders, Advait Madhavan, Matthew W. Daniels, Vasileia Georgiou, Martin Lueker-Boden, Tiffany S. Santos, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland, Brian D. Hoskins

Abstract: The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on a complementary metal-oxide-semiconductor chips that closely resembles market-ready spin transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects and show that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover to comparable performance with ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software-baseline by only 2 %. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.

Submitted 14 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 17 pages, 9 figures

arXiv:2301.12002 [pdf, other]

doi 10.1103/PhysRevC.109.L061301

New isomeric transition in $^{36}$Mg: Bridging the N=20 and N=28 islands of inversion

Authors: M. Madurga, J. M. Christie, Z. Xu, R. Grzywacz, A. Poves, T. King, J. M. Allmond, A. Chester, I. Cox, J. Farr, I. Fletcher, J. Heideman, D. Hoskins, A. Laminack, S. Liddick, S. Neupane, A. L. Richard, N. Shimizu, P. Shuai, K. Siegl, Y. Utsuno, P. Wagenknecht, R. Yokoyama

Abstract: We observed a new isomeric gamma transition at 168 keV in $^{36}$Mg, with a half-life of T$_{1/2}$=[130-500]$(\pm40)(^{+800}_{-20})_{sys}$ ns. We propose that the observed transition de-excites a new 0$^+$ isomeric state and populates the previously known first 2$^+$ state. The existence of this isomer is consistent with the predictions of the large-scale shell model calculations of $^{36}$Mg using the sdpf-u-mix interaction. The observed excitation energy of the second 0$^+$ state is caused by the small energy separation between two prolate-deformed configurations where the intruder configuration corresponds to two neutron excitations from the {\it sd} to the {\it pf} shell. Within this interpretation, $^{36}$Mg becomes the crossing point between nuclei in which ground state deformed/superdeformed configurations are caused by the dominance of N=20 intruders ($^{32,34}$Mg) and nuclei where deformed configurations are associated with N=28 intruders ($^{38}$Mg and beyond). We found the lack of three-body monopole corrections in other effective interactions results in a predominance of N=20 intruder configurations past $^{38}$Mg incompatible with our observation. We conclude that $^{36}$Mg bridges the N=20 and N=28 islands of inversion, forming the so-called Big Island of Deformation.

Submitted 18 June, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. C 109, L061301 (2024)

arXiv:2112.09159 [pdf]

doi 10.1103/PhysRevApplied.18.014039

Implementation of a Binary Neural Network on a Passive Array of Magnetic Tunnel Junctions

Authors: Jonathan M. Goodwill, Nitin Prasad, Brian D. Hoskins, Matthew W. Daniels, Advait Madhavan, Lei Wan, Tiffany S. Santos, Michael Tran, Jordan A. Katine, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland

Abstract: The increasing scale of neural networks and their growing application space have produced demand for more energy- and memory-efficient artificial-intelligence-specific hardware. Avenues to mitigate the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures, as well as algorithmic approaches. Here we leverage the low-power and the inherently binary operation of magnetic tunnel junctions (MTJs) to demonstrate neural network hardware inference based on passive arrays of MTJs. In general, transferring a trained network model to hardware for inference is confronted by degradation in performance due to device-to-device variations, write errors, parasitic resistance, and nonidealities in the substrate. To quantify the effect of these hardware realities, we benchmark 300 unique weight matrix solutions of a 2-layer perceptron to classify the Wine dataset for both classification accuracy and write fidelity. Despite device imperfections, we achieve software-equivalent accuracy of up to 95.3 % with proper tuning of network parameters in 15 x 15 MTJ arrays having a range of device sizes. The success of this tuning process shows that new metrics are needed to characterize the performance and quality of networks reproduced in mixed signal hardware.

Submitted 6 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: 22 pages plus 8 pages supplemental material; 7 figures plus 7 supplemental figures

Journal ref: Physical Review Applied, 18(1) 014039 (2022)

arXiv:2004.12041 [pdf, other]

Memory-efficient training with streaming dimensionality reduction

Authors: Siyuan Huang, Brian D. Hoskins, Matthew W. Daniels, Mark D. Stiles, Gina C. Adam

Abstract: The movement of large quantities of data during the training of a Deep Neural Network presents immense challenges for machine learning workloads. To minimize this overhead, especially on the movement and calculation of gradient information, we introduce streaming batch principal component analysis as an update algorithm. Streaming batch principal component analysis uses stochastic power iterations to generate a stochastic k-rank approximation of the network gradient. We demonstrate that the low rank updates produced by streaming batch principal component analysis can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to standard mini batch gradient descent. These results can lead to both improvements in the design of application specific integrated circuits for deep learning and in the speed of synchronization of machine learning models trained with data parallelism.

Submitted 24 April, 2020; originally announced April 2020.

arXiv:1903.01635 [pdf]

doi 10.3389/fnins.2019.00793

Streaming Batch Eigenupdates for Hardware Neuromorphic Networks

Authors: Brian D. Hoskins, Matthew W. Daniels, Siyuan Huang, Advait Madhavan, Gina C. Adam, Nikolai Zhitenev, Jabez J. McClelland, Mark D. Stiles

Abstract: Neuromorphic networks based on nanodevices, such as metal oxide memristors, phase change memories, and flash memory cells, have generated considerable interest for their increased energy efficiency and density in comparison to graphics processing units (GPUs) and central processing units (CPUs). Though immense acceleration of the training process can be achieved by leveraging the fact that the time complexity of training does not scale with the network size, it is limited by the space complexity of stochastic gradient descent, which grows quadratically. The main objective of this work is to reduce this space complexity by using low-rank approximations of stochastic gradient descent. This low spatial complexity combined with streaming methods allows for significant reductions in memory and compute overhead, opening the doors for improvements in area, time and energy efficiency of training. We refer to this algorithm and architecture to implement it as the streaming batch eigenupdate (SBE) approach.

Submitted 4 March, 2019; originally announced March 2019.

Comments: 13 pages, 5 figures

Journal ref: Frontiers in Neuroscience 13 (2019): 793

arXiv:1802.02545 [pdf]

In aqua electrochemistry probed by XPEEM: experimental setup, examples, and challenges

Authors: Slavomír Nemšák, Evgheni Strelcov, Hongxuan Guo, Brian D. Hoskins, Tomáš Duchoň, David N. Mueller, Alexander Yulaev, Ivan Vlassiouk, Alexander Tselev, Claus M. Schneider, Andrei Kolmakov

Abstract: Recent developments in environmental and liquid cells equipped with electron transparent graphene windows have enabled traditional surface science spectromicroscopy tools, such as X-ray photoelectron spectroscopy (XPS), photoemission electron microscopy (PEEM), and scanning electron microscopy (SEM) to be applied to study solid-liquid and liquid-gas interfaces. Here, we focus on the experimental implementation of PEEM to probe electrified graphene-liquid interfaces using electrolyte-filled microchannel arrays as a new sample platform. We demonstrate the important methodological advantage of these multi-sample arrays: they enable the combination of the wide field of view hyperspectral imaging capabilities from PEEM with the use of powerful data mining algorithms to reveal spectroscopic and temporal behaviors at the level of the individual microsample or the entire array ensemble.

Submitted 7 February, 2018; originally announced February 2018.

Comments: 14 pages 6 figures 47 references

arXiv:1704.01475 [pdf]

doi 10.1038/s41467-017-02116-9

Stateful characterization of resistive switching TiO2 with electron beam induced currents

Authors: Brian D. Hoskins, Gina C. Adam, Evgheni Strelcov, Nikolai Zhitenev, Andrei Kolmakov, Dmitri B. Strukov, Jabez J. McClelland

Abstract: Metal oxide resistive switches are increasingly important as possible artificial synapses in next generation neuromorphic networks. Nevertheless, there is still no codified set of tools for studying properties of the devices. To this end, we demonstrate electron beam induced current measurements as a powerful method to monitor the development of local resistive switching in TiO2 based devices. By comparing beam-energy dependent electron beam induced currents with Monte Carlo simulations of the energy absorption in different device layers, it is possible to deconstruct the origins of filament image formation and relate this to both morphological changes and the state of the switch. By clarifying the contrast mechanisms in electron beam induced current microscopy it is possible to gain new insights into the scaling of the resistive switching phenomenon and observe the formation of a current leakage region around the switching filament. Additionally, analysis of symmetric device structures reveals propagating polarization domains.

Submitted 30 October, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

Comments: 27 Pages 10 figures

Journal ref: Nature Communications 8, 1972 (2017)

arXiv:1509.02986 [pdf]

Three-Dimensional Stateful Material Implication Logic

Authors: Gina C. Adam, Brian D. Hoskins, Mirko Prezioso, Dmitri B. Strukov

Abstract: Monolithic three-dimensional integration of memory and logic circuits could dramatically improve performance and energy efficiency of computing systems. Some conventional and emerging memories are suitable for vertical integration, including highly scalable metal-oxide resistive switching devices (memristors), yet integration of logic circuits proves to be much more challenging. Here we demonstrate memory and logic functionality in a monolithic three-dimensional circuit by adapting recently proposed memristor-based stateful material implication logic. Though such logic has been already implemented with a variety of memory devices, prohibitively large device variability in the most prospective memristor-based circuits has limited experimental demonstrations to simple gates and just a few cycles of operations. By developing a low-temperature, low-variability fabrication process, and modifying the original circuit to increase its robustness to device imperfections, we experimentally show, for the first time, reliable multi-cycle multi-gate material implication logic operation within a three-dimensional stack of monolithically integrated memristors. The direct data manipulation in three dimensions enables extremely compact and high-throughput logic-in-memory computing and, remarkably, presents a viable solution for the Feynman grand challenge of implementing an 8-bit adder at the nanoscale.

Submitted 9 September, 2015; originally announced September 2015.

Comments: 24 pages, 13 figures

Showing 1–8 of 8 results for author: Hoskins, D