-
Time-Series Forecasting and Sequence Learning Using Memristor-based Reservoir System
Authors:
Abdullah M. Zyarah,
Dhireesha Kudithipudi
Abstract:
Pushing the frontiers of time-series information processing in ever-growing edge devices with stringent resources has been impeded by the system's ability to process information and learn locally on the device. Local processing and learning typically demand intensive computations and massive storage as the process involves retrieving information and tuning hundreds of parameters back in time. In t…
▽ More
Pushing the frontiers of time-series information processing in ever-growing edge devices with stringent resources has been impeded by the system's ability to process information and learn locally on the device. Local processing and learning typically demand intensive computations and massive storage as the process involves retrieving information and tuning hundreds of parameters back in time. In this work, we developed a memristor-based echo state network accelerator that features efficient temporal data processing and in-situ online learning. The proposed design is benchmarked using various datasets involving real-world tasks, such as forecasting the load energy consumption and weather conditions. The experimental results illustrate that the hardware model experiences a marginal degradation (~4.8%) in performance as compared to the software model. This is mainly attributed to the limited precision and dynamic range of network parameters when emulated using memristor devices. The proposed system is evaluated for lifespan, robustness, and energy-delay product. It is observed that the system demonstrates a reasonable robustness for device failure below 10%, which may occur due to stuck-at faults. Furthermore, 246X reduction in energy consumption is achieved when compared to a custom CMOS digital design implemented at the same technology node.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Watch Your Step: Optimal Retrieval for Continual Learning at Scale
Authors:
Truman Hickok,
Dhireesha Kudithipudi
Abstract:
In continual learning, a model learns incrementally over time while minimizing interference between old and new tasks. One of the most widely used approaches in continual learning is referred to as replay. Replay methods support interleaved learning by storing past experiences in a replay buffer. Although there are methods for selectively constructing the buffer and reprocessing its contents, ther…
▽ More
In continual learning, a model learns incrementally over time while minimizing interference between old and new tasks. One of the most widely used approaches in continual learning is referred to as replay. Replay methods support interleaved learning by storing past experiences in a replay buffer. Although there are methods for selectively constructing the buffer and reprocessing its contents, there is limited exploration of the problem of selectively retrieving samples from the buffer. Current solutions have been tested in limited settings and, more importantly, in isolation. Existing work has also not explored the impact of duplicate replays on performance. In this work, we propose a framework for evaluating selective retrieval strategies, categorized by simple, independent class- and sample-selective primitives. We evaluated several combinations of existing strategies for selective retrieval and present their performances. Furthermore, we propose a set of strategies to prevent duplicate replays and explore whether new samples with low loss values can be learned without replay. In an effort to match our problem setting to a realistic continual learning pipeline, we restrict our experiments to a setting involving a large, pre-trained, open vocabulary object detection model, which is fully fine-tuned on a sequence of 15 datasets.
△ Less
Submitted 9 May, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Continual Learning and Catastrophic Forgetting
Authors:
Gido M. van de Ven,
Nicholas Soures,
Dhireesha Kudithipudi
Abstract:
This book chapter delves into the dynamics of continual learning, which is the process of incrementally learning from a non-stationary stream of data. Although continual learning is a natural skill for the human brain, it is very challenging for artificial neural networks. An important reason is that, when learning something new, these networks tend to quickly and drastically forget what they had…
▽ More
This book chapter delves into the dynamics of continual learning, which is the process of incrementally learning from a non-stationary stream of data. Although continual learning is a natural skill for the human brain, it is very challenging for artificial neural networks. An important reason is that, when learning something new, these networks tend to quickly and drastically forget what they had learned before, a phenomenon known as catastrophic forgetting. Especially in the last decade, continual learning has become an extensively studied topic in deep learning. This book chapter reviews the insights that this field has generated.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
Continual Learning: Applications and the Road Forward
Authors:
Eli Verwimp,
Rahaf Aljundi,
Shai Ben-David,
Matthias Bethge,
Andrea Cossu,
Alexander Gepperth,
Tyler L. Hayes,
Eyke Hüllermeier,
Christopher Kanan,
Dhireesha Kudithipudi,
Christoph H. Lampert,
Martin Mundt,
Razvan Pascanu,
Adrian Popescu,
Andreas S. Tolias,
Joost van de Weijer,
Bing Liu,
Vincenzo Lomonaco,
Tinne Tuytelaars,
Gido M. van de Ven
Abstract:
Continual learning is a subfield of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by examining recent continual learning papers published at four…
▽ More
Continual learning is a subfield of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by examining recent continual learning papers published at four major machine learning conferences, and show that memory-constrained settings dominate the field. Then, we discuss five open problems in machine learning, and even though they might seem unrelated to continual learning at first sight, we show that continual learning will inevitably be part of their solution. These problems are model editing, personalization and specialization, on-device learning, faster (re-)training and reinforcement learning. Finally, by comparing the desiderata from these unsolved problems and the current assumptions in continual learning, we highlight and discuss four future directions for continual learning research. We hope that this work offers an interesting perspective on the future of continual learning, while displaying its potential value and the paths we have to pursue in order to make it successful. This work is the result of the many discussions the authors had at the Dagstuhl seminar on Deep Continual Learning, in March 2023.
△ Less
Submitted 28 March, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Design Principles for Lifelong Learning AI Accelerators
Authors:
Dhireesha Kudithipudi,
Anurag Daram,
Abdullah M. Zyarah,
Fatima Tuz Zohora,
James B. Aimone,
Angel Yanguas-Gil,
Nicholas Soures,
Emre Neftci,
Matthew Mattina,
Vincenzo Lomonaco,
Clare D. Thiem,
Benjamin Epstein
Abstract:
Lifelong learning - an agent's ability to learn throughout its lifetime - is a hallmark of biological learning systems and a central challenge for artificial intelligence (AI). The development of lifelong learning algorithms could lead to a range of novel AI applications, but this will also require the development of appropriate hardware accelerators, particularly if the models are to be deployed…
▽ More
Lifelong learning - an agent's ability to learn throughout its lifetime - is a hallmark of biological learning systems and a central challenge for artificial intelligence (AI). The development of lifelong learning algorithms could lead to a range of novel AI applications, but this will also require the development of appropriate hardware accelerators, particularly if the models are to be deployed on edge platforms, which have strict size, weight, and power constraints. Here, we explore the design of lifelong learning AI accelerators that are intended for deployment in untethered environments. We identify key desirable capabilities for lifelong learning accelerators and highlight metrics to evaluate such accelerators. We then discuss current edge AI accelerators and explore the future design of lifelong learning accelerators, considering the role that different emerging technologies could play.
△ Less
Submitted 5 October, 2023;
originally announced October 2023.
-
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems
Authors:
Jason Yik,
Korneel Van den Berghe,
Douwe den Blanken,
Younes Bouhadjar,
Maxime Fabre,
Paul Hueber,
Denis Kleyko,
Noah Pacik-Nelson,
Pao-Sheng Vincent Sun,
Guangzhi Tang,
Shenqi Wang,
Biyan Zhou,
Soikat Hasan Ahmed,
George Vathakkattil Joseph,
Benedetto Leto,
Aurora Micheli,
Anurag Kumar Mishra,
Gregor Lenz,
Tao Sun,
Zergham Ahmed,
Mahmoud Akl,
Brian Anderson,
Andreas G. Andreou,
Chiara Bartolozzi,
Arindam Basu
, et al. (73 additional authors not shown)
Abstract:
Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neu…
▽ More
Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of nearly 100 co-authors across over 50 institutions in industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we present initial performance baselines across various model architectures on the algorithm track and outline the system track benchmark tasks and guidelines. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.
△ Less
Submitted 17 January, 2024; v1 submitted 10 April, 2023;
originally announced April 2023.
-
A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems
Authors:
Megan M. Baker,
Alexander New,
Mario Aguilar-Simon,
Ziad Al-Halah,
Sébastien M. R. Arnold,
Ese Ben-Iwhiwhu,
Andrew P. Brna,
Ethan Brooks,
Ryan C. Brown,
Zachary Daniels,
Anurag Daram,
Fabien Delattre,
Ryan Dellana,
Eric Eaton,
Haotian Fu,
Kristen Grauman,
Jesse Hostetler,
Shariq Iqbal,
Cassandra Kent,
Nicholas Ketz,
Soheil Kolouri,
George Konidaris,
Dhireesha Kudithipudi,
Erik Learned-Miller,
Seungwon Lee
, et al. (22 additional authors not shown)
Abstract:
Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through th…
▽ More
Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.
△ Less
Submitted 18 January, 2023;
originally announced January 2023.
-
TENT: Efficient Quantization of Neural Networks on the tiny Edge with Tapered FixEd PoiNT
Authors:
Hamed F. Langroudi,
Vedant Karia,
Tej Pandit,
Dhireesha Kudithipudi
Abstract:
In this research, we propose a new low-precision framework, TENT, to leverage the benefits of a tapered fixed-point numerical format in TinyML models. We introduce a tapered fixed-point quantization algorithm that matches the numerical format's dynamic range and distribution to that of the deep neural network model's parameter distribution at each layer. An accelerator architecture for the tapered…
▽ More
In this research, we propose a new low-precision framework, TENT, to leverage the benefits of a tapered fixed-point numerical format in TinyML models. We introduce a tapered fixed-point quantization algorithm that matches the numerical format's dynamic range and distribution to that of the deep neural network model's parameter distribution at each layer. An accelerator architecture for the tapered fixed-point with TENT framework is proposed. Results show that the accuracy on classification tasks improves up to ~31 % with an energy overhead of ~17-30 % as compared to fixed-point, for ConvNet and ResNet-18 models.
△ Less
Submitted 5 April, 2021;
originally announced April 2021.
-
End-to-End Memristive HTM System for Pattern Recognition and Sequence Prediction
Authors:
Abdullah M. Zyarah,
Kevin Gomez,
Dhireesha Kudithipudi
Abstract:
Neuromorphic systems that learn and predict from streaming inputs hold significant promise in pervasive edge computing and its applications. In this paper, a neuromorphic system that processes spatio-temporal information on the edge is proposed. Algorithmically, the system is based on hierarchical temporal memory that inherently offers online learning, resiliency, and fault tolerance. Architectura…
▽ More
Neuromorphic systems that learn and predict from streaming inputs hold significant promise in pervasive edge computing and its applications. In this paper, a neuromorphic system that processes spatio-temporal information on the edge is proposed. Algorithmically, the system is based on hierarchical temporal memory that inherently offers online learning, resiliency, and fault tolerance. Architecturally, it is a full custom mixed-signal design with an underlying digital communication scheme and analog computational modules. Therefore, the proposed system features reconfigurability, real-time processing, low power consumption, and low-latency processing. The proposed architecture is benchmarked to predict on real-world streaming data. The network's mean absolute percentage error on the mixed-signal system is 1.129X lower compared to its baseline algorithm model. This reduction can be attributed to device non-idealities and probabilistic formation of synaptic connections. We demonstrate that the combined effect of Hebbian learning and network sparsity also plays a major role in extending the overall network lifespan. We also illustrate that the system offers 3.46X reduction in latency and 77.02X reduction in power consumption when compared to a custom CMOS digital design implemented at the same technology node. By employing specific low power techniques, such as clock gating, we observe 161.37X reduction in power consumption.
△ Less
Submitted 21 June, 2020;
originally announced June 2020.
-
Metaplasticity in Multistate Memristor Synaptic Networks
Authors:
Fatima Tuz Zohora,
Abdullah M. Zyarah,
Nicholas Soures,
Dhireesha Kudithipudi
Abstract:
Recent studies have shown that metaplastic synapses can retain information longer than simple binary synapses and are beneficial for continual learning. In this paper, we explore the multistate metaplastic synapse characteristics in the context of high retention and reception of information. Inherent behavior of a memristor emulating the multistate synapse is employed to capture the metaplastic be…
▽ More
Recent studies have shown that metaplastic synapses can retain information longer than simple binary synapses and are beneficial for continual learning. In this paper, we explore the multistate metaplastic synapse characteristics in the context of high retention and reception of information. Inherent behavior of a memristor emulating the multistate synapse is employed to capture the metaplastic behavior. An integrated neural network study for learning and memory retention is performed by integrating the synapse in a $5\times3$ crossbar at the circuit level and $128\times128$ network at the architectural level. An on-device training circuitry ensures the dynamic learning in the network. In the $128\times128$ network, it is observed that the number of input patterns the multistate synapse can classify is $\simeq$ 2.1x that of a simple binary synapse model, at a mean accuracy of $\geq$ 75% .
△ Less
Submitted 25 February, 2020;
originally announced March 2020.
-
Analysis of Wide and Deep Echo State Networks for Multiscale Spatiotemporal Time Series Forecasting
Authors:
Zachariah Carmichael,
Humza Syed,
Dhireesha Kudithipudi
Abstract:
Echo state networks are computationally lightweight reservoir models inspired by the random projections observed in cortical circuitry. As interest in reservoir computing has grown, networks have become deeper and more intricate. While these networks are increasingly applied to nontrivial forecasting tasks, there is a need for comprehensive performance analysis of deep reservoirs. In this work, we…
▽ More
Echo state networks are computationally lightweight reservoir models inspired by the random projections observed in cortical circuitry. As interest in reservoir computing has grown, networks have become deeper and more intricate. While these networks are increasingly applied to nontrivial forecasting tasks, there is a need for comprehensive performance analysis of deep reservoirs. In this work, we study the influence of partitioning neurons given a budget and the effect of parallel reservoir pathways across different datasets exhibiting multi-scale and nonlinear dynamics.
△ Less
Submitted 1 July, 2019;
originally announced August 2019.
-
Cheetah: Mixed Low-Precision Hardware & Software Co-Design Framework for DNNs on the Edge
Authors:
Hamed F. Langroudi,
Zachariah Carmichael,
David Pastuch,
Dhireesha Kudithipudi
Abstract:
Low-precision DNNs have been extensively explored in order to reduce the size of DNN models for edge devices. Recently, the posit numerical format has shown promise for DNN data representation and compute with ultra-low precision in [5..8]-bits. However, previous studies were limited to studying posit for DNN inference only. In this paper, we propose the Cheetah framework, which supports both DNN…
▽ More
Low-precision DNNs have been extensively explored in order to reduce the size of DNN models for edge devices. Recently, the posit numerical format has shown promise for DNN data representation and compute with ultra-low precision in [5..8]-bits. However, previous studies were limited to studying posit for DNN inference only. In this paper, we propose the Cheetah framework, which supports both DNN training and inference using posits, as well as other commonly used formats. Additionally, the framework is amenable for different quantization approaches and supports mixed-precision floating point and fixed-point numerical formats. Cheetah is evaluated on three datasets: MNIST, Fashion MNIST, and CIFAR-10. Results indicate that 16-bit posits outperform 16-bit floating point in DNN training. Furthermore, performing inference with [5..8]-bit posits improves the trade-off between performance and energy-delay-product over both [5..8]-bit float and fixed-point.
△ Less
Submitted 6 August, 2019;
originally announced August 2019.
-
Deep Learning Training on the Edge with Low-Precision Posits
Authors:
Hamed F. Langroudi,
Zachariah Carmichael,
Dhireesha Kudithipudi
Abstract:
Recently, the posit numerical format has shown promise for DNN data representation and compute with ultra-low precision ([5..8]-bit). However, majority of studies focus only on DNN inference. In this work, we propose DNN training using posits and compare with the floating point training. We evaluate on both MNIST and Fashion MNIST corpuses, where 16-bit posits outperform 16-bit floating point for…
▽ More
Recently, the posit numerical format has shown promise for DNN data representation and compute with ultra-low precision ([5..8]-bit). However, majority of studies focus only on DNN inference. In this work, we propose DNN training using posits and compare with the floating point training. We evaluate on both MNIST and Fashion MNIST corpuses, where 16-bit posits outperform 16-bit floating point for end-to-end DNN training.
△ Less
Submitted 30 July, 2019;
originally announced July 2019.
-
Performance-Efficiency Trade-off of Low-Precision Numerical Formats in Deep Neural Networks
Authors:
Zachariah Carmichael,
Hamed F. Langroudi,
Char Khazanov,
Jeffrey Lillie,
John L. Gustafson,
Dhireesha Kudithipudi
Abstract:
Deep neural networks (DNNs) have been demonstrated as effective prognostic models across various domains, e.g. natural language processing, computer vision, and genomics. However, modern-day DNNs demand high compute and memory storage for executing any reasonably complex task. To optimize the inference time and alleviate the power consumption of these networks, DNN accelerators with low-precision…
▽ More
Deep neural networks (DNNs) have been demonstrated as effective prognostic models across various domains, e.g. natural language processing, computer vision, and genomics. However, modern-day DNNs demand high compute and memory storage for executing any reasonably complex task. To optimize the inference time and alleviate the power consumption of these networks, DNN accelerators with low-precision representations of data and DNN parameters are being actively studied. An interesting research question is in how low-precision networks can be ported to edge-devices with similar performance as high-precision networks. In this work, we employ the fixed-point, floating point, and posit numerical formats at $\leq$8-bit precision within a DNN accelerator, Deep Positron, with exact multiply-and-accumulate (EMAC) units for inference. A unified analysis quantifies the trade-offs between overall network efficiency and performance across five classification tasks. Our results indicate that posits are a natural fit for DNN inference, outperforming at $\leq$8-bit precision, and can be realized with competitive resource requirements relative to those of floating point.
△ Less
Submitted 25 March, 2019;
originally announced March 2019.
-
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis
Authors:
Abdullah M. Zyarah,
Dhireesha Kudithipudi
Abstract:
Hierarchical temporal memory (HTM) is a biomimetic sequence memory algorithm that holds promise for invariant representations of spatial and spatiotemporal inputs. This paper presents a comprehensive neuromemristive crossbar architecture for the spatial pooler (SP) and the sparse distributed representation classifier, which are fundamental to the algorithm. There are several unique features in the…
▽ More
Hierarchical temporal memory (HTM) is a biomimetic sequence memory algorithm that holds promise for invariant representations of spatial and spatiotemporal inputs. This paper presents a comprehensive neuromemristive crossbar architecture for the spatial pooler (SP) and the sparse distributed representation classifier, which are fundamental to the algorithm. There are several unique features in the proposed architecture that tightly link with the HTM algorithm. A memristor that is suitable for emulating the HTM synapses is identified and a new Z-window function is proposed. The architecture exploits the concept of synthetic synapses to enable potential synapses in the HTM. The crossbar for the SP avoids dark spots caused by unutilized crossbar regions and supports rapid on-chip training within 2 clock cycles. This research also leverages plasticity mechanisms such as neurogenesis and homeostatic intrinsic plasticity to strengthen the robustness and performance of the SP. The proposed design is benchmarked for image recognition tasks using MNIST and Yale faces datasets, and is evaluated using different metrics including entropy, sparseness, and noise robustness. Detailed power analysis at different stages of the SP operations is performed to demonstrate the suitability for mobile platforms.
△ Less
Submitted 27 December, 2018;
originally announced December 2018.
-
Deep Positron: A Deep Neural Network Using the Posit Number System
Authors:
Zachariah Carmichael,
Hamed F. Langroudi,
Char Khazanov,
Jeffrey Lillie,
John L. Gustafson,
Dhireesha Kudithipudi
Abstract:
The recent surge of interest in Deep Neural Networks (DNNs) has led to increasingly complex networks that tax computational and memory resources. Many DNNs presently use 16-bit or 32-bit floating point operations. Significant performance and power gains can be obtained when DNN accelerators support low-precision numerical formats. Despite considerable research, there is still a knowledge gap on ho…
▽ More
The recent surge of interest in Deep Neural Networks (DNNs) has led to increasingly complex networks that tax computational and memory resources. Many DNNs presently use 16-bit or 32-bit floating point operations. Significant performance and power gains can be obtained when DNN accelerators support low-precision numerical formats. Despite considerable research, there is still a knowledge gap on how low-precision operations can be realized for both DNN training and inference. In this work, we propose a DNN architecture, Deep Positron, with posit numerical format operating successfully at $\leq$8 bits for inference. We propose a precision-adaptable FPGA soft core for exact multiply-and-accumulate for uniform comparison across three numerical formats, fixed, floating-point and posit. Preliminary results demonstrate that 8-bit posit has better accuracy than 8-bit fixed or floating-point for three different low-dimensional datasets. Moreover, the accuracy is comparable to 32-bit floating-point on a Xilinx Virtex-7 FPGA device. The trade-offs between DNN performance and hardware resources, i.e. latency, power, and resource utilization, show that posit outperforms in accuracy and latency at 8-bit and below.
△ Less
Submitted 18 January, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.
-
Semi-Trained Memristive Crossbar Computing Engine with In-Situ Learning Accelerator
Authors:
Abdullah M. Zyarah,
Dhireesha Kudithipudi
Abstract:
On-device intelligence is gaining significant attention recently as it offers local data processing and low power consumption. In this research, an on-device training circuitry for threshold-current memristors integrated in a crossbar structure is proposed. Furthermore, alternate approaches of map** the synaptic weights into fully-trained and semi-trained crossbars are investigated. In a semi-tr…
▽ More
On-device intelligence is gaining significant attention recently as it offers local data processing and low power consumption. In this research, an on-device training circuitry for threshold-current memristors integrated in a crossbar structure is proposed. Furthermore, alternate approaches of map** the synaptic weights into fully-trained and semi-trained crossbars are investigated. In a semi-trained crossbar a confined subset of memristors are tuned and the remaining subset of memristors are not programmed. This translates to optimal resource utilization and power consumption, compared to a fully programmed crossbar. The semi-trained crossbar architecture is applicable to a broad class of neural networks. System level verification is performed with an extreme learning machine for binomial and multinomial classification. The total power for a single 4x4 layer network, when implemented in IBM 65nm node, is estimated to be ~ 42.16uW and the area is estimated to be 26.48um x 22.35um.
△ Less
Submitted 22 August, 2018;
originally announced August 2018.
-
Neuromorphic Architecture for the Hierarchical Temporal Memory
Authors:
Abdullah M. Zyarah,
Dhireesha Kudithipudi
Abstract:
A biomimetic machine intelligence algorithm, that holds promise in creating invariant representations of spatiotemporal input streams is the hierarchical temporal memory (HTM). This unsupervised online algorithm has been demonstrated on several machine-learning tasks, including anomaly detection. Significant effort has been made in formalizing and applying the HTM algorithm to different classes of…
▽ More
A biomimetic machine intelligence algorithm, that holds promise in creating invariant representations of spatiotemporal input streams is the hierarchical temporal memory (HTM). This unsupervised online algorithm has been demonstrated on several machine-learning tasks, including anomaly detection. Significant effort has been made in formalizing and applying the HTM algorithm to different classes of problems. There are few early explorations of the HTM hardware architecture, especially for the earlier version of the spatial pooler of HTM algorithm. In this article, we present a full-scale HTM architecture for both spatial pooler and temporal memory. Synthetic synapse design is proposed to address the potential and dynamic interconnections occurring during learning. The architecture is interweaved with parallel cells and columns that enable high processing speed for the HTM. The proposed architecture is verified for two different datasets: MNIST and the European number plate font (EUNF), with and without the presence of noise. The spatial pooler architecture is synthesized on Xilinx ZYNQ-7, with 91.16% classification accuracy for MNIST and 90\% accuracy for EUNF, with noise. For the temporal memory sequence prediction, first and second order predictions are observed for a 5-number long sequence generated from EUNF dataset and 95% accuracy is obtained. Moreover, the proposed hardware architecture offers 1364X speedup over the software realization. These results indicate that the proposed architecture can serve as a digital core to build the HTM in hardware and eventually as a standalone self-learning system.
△ Less
Submitted 17 August, 2018;
originally announced August 2018.
-
Mod-DeepESN: Modular Deep Echo State Network
Authors:
Zachariah Carmichael,
Humza Syed,
Stuart Burtner,
Dhireesha Kudithipudi
Abstract:
Neuro-inspired recurrent neural network algorithms, such as echo state networks, are computationally lightweight and thereby map well onto untethered devices. The baseline echo state network algorithms are shown to be efficient in solving small-scale spatio-temporal problems. However, they underperform for complex tasks that are characterized by multi-scale structures. In this research, an intrins…
▽ More
Neuro-inspired recurrent neural network algorithms, such as echo state networks, are computationally lightweight and thereby map well onto untethered devices. The baseline echo state network algorithms are shown to be efficient in solving small-scale spatio-temporal problems. However, they underperform for complex tasks that are characterized by multi-scale structures. In this research, an intrinsic plasticity-infused modular deep echo state network architecture is proposed to solve complex and multiple timescale temporal tasks. It outperforms state-of-the-art for time series prediction tasks.
△ Less
Submitted 25 March, 2019; v1 submitted 1 August, 2018;
originally announced August 2018.
-
Deep Learning Inference on Embedded Devices: Fixed-Point vs Posit
Authors:
Seyed H. F. Langroudi,
Tej Pandit,
Dhireesha Kudithipudi
Abstract:
Performing the inference step of deep learning in resource constrained environments, such as embedded devices, is challenging. Success requires optimization at both software and hardware levels. Low precision arithmetic and specifically low precision fixed-point number systems have become the standard for performing deep learning inference. However, representing non-uniform data and distributed pa…
▽ More
Performing the inference step of deep learning in resource constrained environments, such as embedded devices, is challenging. Success requires optimization at both software and hardware levels. Low precision arithmetic and specifically low precision fixed-point number systems have become the standard for performing deep learning inference. However, representing non-uniform data and distributed parameters (e.g. weights) by using uniformly distributed fixed-point values is still a major drawback when using this number system. Recently, the posit number system was proposed, which represents numbers in a non-uniform manner. Therefore, in this paper we are motivated to explore using the posit number system to represent the weights of Deep Convolutional Neural Networks. However, we do not apply any quantization techniques and hence the network weights do not require re-training. The results of this exploration show that using the posit number system outperformed the fixed point number system in terms of accuracy and memory utilization.
△ Less
Submitted 22 May, 2018;
originally announced May 2018.
-
On the Statistical Challenges of Echo State Networks and Some Potential Remedies
Authors:
Qiuyi Wu,
Ernest Fokoue,
Dhireesha Kudithipudi
Abstract:
Echo state networks are powerful recurrent neural networks. However, they are often unstable and shaky, making the process of finding an good ESN for a specific dataset quite hard. Obtaining a superb accuracy by using the Echo State Network is a challenging task. We create, develop and implement a family of predictably optimal robust and stable ensemble of Echo State Networks via regularizing the…
▽ More
Echo state networks are powerful recurrent neural networks. However, they are often unstable and shaky, making the process of finding an good ESN for a specific dataset quite hard. Obtaining a superb accuracy by using the Echo State Network is a challenging task. We create, develop and implement a family of predictably optimal robust and stable ensemble of Echo State Networks via regularizing the training and perturbing the input. Furthermore, several distributions of weights have been tried based on the shape to see if the shape of the distribution has the impact for reducing the error. We found ESN can track in short term for most dataset, but it collapses in the long run. Short-term tracking with large size reservoir enables ESN to perform strikingly with superior prediction. Based on this scenario, we go a further step to aggregate many of ESNs into an ensemble to lower the variance and stabilize the system by stochastic replications and bootstrap** of input data.
△ Less
Submitted 20 February, 2018;
originally announced February 2018.
-
Convolutional Drift Networks for Video Classification
Authors:
Dillon Graham,
Seyed Hamed Fatemi Langroudi,
Christopher Kanan,
Dhireesha Kudithipudi
Abstract:
Analyzing spatio-temporal data like video is a challenging task that requires processing visual and temporal information effectively. Convolutional Neural Networks have shown promise as baseline fixed feature extractors through transfer learning, a technique that helps minimize the training cost on visual information. Temporal information is often handled using hand-crafted features or Recurrent N…
▽ More
Analyzing spatio-temporal data like video is a challenging task that requires processing visual and temporal information effectively. Convolutional Neural Networks have shown promise as baseline fixed feature extractors through transfer learning, a technique that helps minimize the training cost on visual information. Temporal information is often handled using hand-crafted features or Recurrent Neural Networks, but this can be overly specific or prohibitively complex. Building a fully trainable system that can efficiently analyze spatio-temporal data without hand-crafted features or complex training is an open challenge. We present a new neural network architecture to address this challenge, the Convolutional Drift Network (CDN). Our CDN architecture combines the visual feature extraction power of deep Convolutional Neural Networks with the intrinsically efficient temporal processing provided by Reservoir Computing. In this introductory paper on the CDN, we provide a very simple baseline implementation tested on two egocentric (first-person) video activity datasets.We achieve video-level activity classification results on-par with state-of-the art methods. Notably, performance on this complex spatio-temporal task was produced by only training a single feed-forward layer in the CDN.
△ Less
Submitted 3 November, 2017;
originally announced November 2017.
-
Non-volatile Hierarchical Temporal Memory: Hardware for Spatial Pooling
Authors:
Lennard Streat,
Dhireesha Kudithipudi,
Kevin Gomez
Abstract:
Hierarchical Temporal Memory (HTM) is a biomimetic machine learning algorithm imbibing the structural and algorithmic properties of the neocortex. Two main functional components of HTM that enable spatio-temporal processing are the spatial pooler and temporal memory. In this research, we explore a scalable hardware realization of the spatial pooler closely coupled with the mathematical formulation…
▽ More
Hierarchical Temporal Memory (HTM) is a biomimetic machine learning algorithm imbibing the structural and algorithmic properties of the neocortex. Two main functional components of HTM that enable spatio-temporal processing are the spatial pooler and temporal memory. In this research, we explore a scalable hardware realization of the spatial pooler closely coupled with the mathematical formulation of spatial pooler. This class of neuromorphic algorithms are advantageous in solving a subset of the future engineering problems by extracting nonintuitive patterns in complex data. The proposed architecture, Non-volatile HTM (NVHTM), leverages large-scale solid state flash memory to realize a optimal memory organization, area and power envelope. A behavioral model of NVHTM is evaluated against the MNIST dataset, yielding 91.98% classification accuracy. A full custom layout is developed to validate the design in a TSMC 180nm process. The area and power profile of the spatial pooler are 30.538mm2 and 64.394mW, respectively. This design is a proof-of-concept that storage processing is a viable platform for large scale HTM network models.
△ Less
Submitted 8 November, 2016;
originally announced November 2016.
-
Unsupervised Learning in Neuromemristive Systems
Authors:
Cory Merkel,
Dhireesha Kudithipudi
Abstract:
Neuromemristive systems (NMSs) currently represent the most promising platform to achieve energy efficient neuro-inspired computation. However, since the research field is less than a decade old, there are still countless algorithms and design paradigms to be explored within these systems. One particular domain that remains to be fully investigated within NMSs is unsupervised learning. In this wor…
▽ More
Neuromemristive systems (NMSs) currently represent the most promising platform to achieve energy efficient neuro-inspired computation. However, since the research field is less than a decade old, there are still countless algorithms and design paradigms to be explored within these systems. One particular domain that remains to be fully investigated within NMSs is unsupervised learning. In this work, we explore the design of an NMS for unsupervised clustering, which is a critical element of several machine learning algorithms. Using a simple memristor crossbar architecture and learning rule, we are able to achieve performance which is on par with MATLAB's k-means clustering.
△ Less
Submitted 27 January, 2016;
originally announced January 2016.
-
A Mathematical Formalization of Hierarchical Temporal Memory's Spatial Pooler
Authors:
James Mnatzaganian,
Ernest Fokoué,
Dhireesha Kudithipudi
Abstract:
Hierarchical temporal memory (HTM) is an emerging machine learning algorithm, with the potential to provide a means to perform predictions on spatiotemporal data. The algorithm, inspired by the neocortex, currently does not have a comprehensive mathematical framework. This work brings together all aspects of the spatial pooler (SP), a critical learning component in HTM, under a single unifying fra…
▽ More
Hierarchical temporal memory (HTM) is an emerging machine learning algorithm, with the potential to provide a means to perform predictions on spatiotemporal data. The algorithm, inspired by the neocortex, currently does not have a comprehensive mathematical framework. This work brings together all aspects of the spatial pooler (SP), a critical learning component in HTM, under a single unifying framework. The primary learning mechanism is explored, where a maximum likelihood estimator for determining the degree of permanence update is proposed. The boosting mechanisms are studied and found to be only relevant during the initial few iterations of the network. Observations are made relating HTM to well-known algorithms such as competitive learning and attribute bagging. Methods are provided for using the SP for classification as well as dimensionality reduction. Empirical evidence verifies that given the proper parameterizations, the SP may be used for feature learning.
△ Less
Submitted 8 September, 2016; v1 submitted 22 January, 2016;
originally announced January 2016.