Search | arXiv e-print repository

arXiv:2006.14652 [pdf, other]

doi 10.1145/3210377.3210410

Constant-Depth and Subcubic-Size Threshold Circuits for Matrix Multiplication

Authors: Ojas Parekh, Cynthia A. Phillips, Conrad D. James, James B. Aimone

Abstract: Boolean circuits of McCulloch-Pitts threshold gates are a classic model of neural computation studied heavily in the late 20th century as a model of general computation. Recent advances in large-scale neural computing hardware have made their practical implementation a near-term possibility. We describe a theoretical approach for multiplying two $N$ by $N$ matrices that integrates threshold gate logic with conventional fast matrix multiplication algorithms that perform $O(N^ω)$ arithmetic operations for a positive constant $ω < 3$. Our approach converts such a fast matrix multiplication algorithm into a constant-depth threshold circuit with approximately $O(N^ω)$ gates. Prior to our work, it was not known whether the $Θ(N^3)$-gate barrier for matrix multiplication was surmountable by constant-depth threshold circuits. Dense matrix multiplication is a core operation in convolutional neural network training. Performing this work on a neural architecture instead of off-loading it to a GPU may be an appealing option.

Submitted 25 June, 2020; originally announced June 2020.

Comments: Appears in the proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2018
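
For readers unfamiliar with the gate model named in this abstract, the following is a minimal sketch of a McCulloch-Pitts threshold gate and a depth-2 circuit built from it. It illustrates only the gate model, not the paper's matrix multiplication construction; the function names (`threshold_gate`, `xor_depth2`) and the choice of XOR as the example are assumptions made for illustration.

```python
from typing import Sequence


def threshold_gate(inputs: Sequence[int], weights: Sequence[int], threshold: int) -> int:
    """Return 1 when the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and_gate(a: int, b: int) -> int:
    # Both inputs must be 1 for the sum to reach 2.
    return threshold_gate([a, b], [1, 1], 2)


def or_gate(a: int, b: int) -> int:
    # A single active input is enough to reach 1.
    return threshold_gate([a, b], [1, 1], 1)


def xor_depth2(a: int, b: int) -> int:
    # Layer 1 computes OR and AND in parallel.
    h_or = or_gate(a, b)
    h_and = and_gate(a, b)
    # Layer 2 fires when OR is on and AND is off (negative weight on AND).
    return threshold_gate([h_or, h_and], [1, -1], 1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_depth2(a, b))
```

The circuit depth here is 2 regardless of the inputs, which is the sense of "constant depth": the number of gate layers does not grow with the problem size, only the number of gates per layer does.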

arXiv:1809.09261 [pdf, other]

Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting

Authors: Aleksandra Faust, James B. Aimone, Conrad D. James, Lydia Tapia

Abstract: Robots and autonomous agents often complete goal-based tasks with limited resources, relying on imperfect models and sensor measurements. In particular, reinforcement learning (RL) and feedback control can be used to help a robot achieve a goal. Taking advantage of this body of work, this paper formulates general computation as a feedback-control problem, which allows the agent to autonomously overcome some limitations of standard procedural language programming: resilience to errors and early program termination. Our formulation considers computation to be trajectory generation in the program's variable space. The computing then becomes a sequential decision-making problem, solved with reinforcement learning (RL), and analyzed with Lyapunov stability theory to assess the agent's resilience and progression to the goal. We do this through a case study on a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress to an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble sort.

Submitted 24 September, 2018; originally announced September 2018.

Comments: 11 pages, accepted to CDC 2018. Here with additional evaluations

arXiv:1712.07671 [pdf, other]

Tracking Cyber Adversaries with Adaptive Indicators of Compromise

Authors: Justin E. Doak, Joe B. Ingram, Sam A. Mulder, John H. Naegle, Jonathan A. Cox, James B. Aimone, Kevin R. Dixon, Conrad D. James, David R. Follett

Abstract: A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than the naive solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution, but remains constant for the naive solution. However, the difference in overall detection performance, as measured by the area under the curve (AUC), between the two methods is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries.

Submitted 20 December, 2017; originally announced December 2017.

Comments: This was presented at the 4th Annual Conf. on Computational Science & Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas, Nevada, USA

Report number: SAND2017-12402 C

Journal ref: This will be in the proceedings of the 4th Annual Conf. on Computational Science & Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas, Nevada, USA

arXiv:1711.03947 [pdf, other]

Dynamic Analysis of Executables to Detect and Characterize Malware

Authors: Michael R. Smith, Joe B. Ingram, Christopher C. Lamb, Timothy J. Draelos, Justin E. Doak, James B. Aimone, Conrad D. James

Abstract: There is a need to ensure the integrity of systems that process sensitive information and control many aspects of everyday life. We examine the use of machine learning algorithms to detect malware using the system calls generated by executables, alleviating attempts at obfuscation as the behavior is monitored rather than the bytes of an executable. We examine several machine learning techniques for detecting malware including random forests, deep learning techniques, and liquid state machines. The experiments examine the effects of concept drift on each algorithm to understand how well the algorithms generalize to novel malware samples by testing them on data that was collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution to detect malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the performance evaluation on an operational network may not match the performance achieved in training. Namely, the CAA may be about the same, but the values for precision and recall over the malware can change significantly. We structure experiments to highlight these caveats and offer insights into expected performance in operational environments. In addition, we use the induced models to gain a better understanding about what differentiates the malware samples from the goodware, which can further be used as a forensics tool to understand what the malware (or goodware) was doing to provide directions for investigation and remediation.

Submitted 28 September, 2018; v1 submitted 10 November, 2017; originally announced November 2017.

Comments: 9 pages, 6 Tables, 4 Figures

Report number: SAND2018-11011 C
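
The abstract's caveat that class-averaged accuracy can stay about the same while precision shifts can be illustrated with a small worked example. This is a sketch under stated assumptions, not the paper's evaluation: CAA is taken here to mean the mean of per-class recall, and every count below is invented for illustration (the function name `summarize` is hypothetical as well).

```python
def summarize(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute metrics from confusion-matrix counts, with malware as the positive class."""
    malware_recall = tp / (tp + fn)      # fraction of malware detected
    goodware_recall = tn / (tn + fp)     # fraction of goodware passed
    return {
        # Assumed definition: unweighted mean of the per-class recalls.
        "caa": (malware_recall + goodware_recall) / 2,
        "precision": tp / (tp + fp),     # fraction of alerts that are malware
        "recall": malware_recall,
    }


# Hypothetical balanced test set: 1000 malware, 1000 goodware samples.
balanced = summarize(tp=900, fn=100, fp=50, tn=950)

# Hypothetical operational mix with the same per-class error rates,
# but only 100 malware samples among 10000 goodware samples.
operational = summarize(tp=90, fn=10, fp=500, tn=9500)

if __name__ == "__main__":
    for name, metrics in (("balanced", balanced), ("operational", operational)):
        print(name, {key: round(value, 3) for key, value in metrics.items()})
```

With identical per-class error rates, CAA and recall do not move, but precision over the malware class falls sharply once goodware dominates the traffic, because the same false-positive rate now yields many more false alerts than true detections.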

arXiv:1710.09934 [pdf, other]

Data-driven Feature Sampling for Deep Hyperspectral Classification and Segmentation

Authors: William M. Severa, Jerilyn A. Timlin, Suraj Kholwadwala, Conrad D. James, James B. Aimone

Abstract: The high dimensionality of hyperspectral imaging forces unique challenges in scope, size and processing requirements. Motivated by the potential for an in-the-field cell sorting detector, we examine a $\textit{Synechocystis sp.}$ PCC 6803 dataset wherein cells are grown alternatively in nitrogen rich or deplete cultures. We use deep learning techniques to both successfully classify cells and generate a mask segmenting the cells/condition from the background. Further, we use the classification accuracy to guide a data-driven, iterative feature selection method, allowing the design of neural networks requiring 90% fewer input features with little accuracy degradation.

Submitted 26 October, 2017; originally announced October 2017.

MSC Class: 68T01 ACM Class: I.2.1; I.4.6; J.3

arXiv:1707.09952 [pdf]

doi 10.1109/JETCAS.2018.2796379

Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

Authors: Matthew J. Marinella, Sapan Agarwal, Alexander Hsia, Isaac Richter, Robin Jacobs-Gedrim, John Niroula, Steven J. Plimpton, Engin Ipek, Conrad D. James

Abstract: Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50M parameters are made possible by modern GPU clusters operating at <50 pJ per op and more recently, production accelerators capable of <5 pJ per operation at the board level. However, with the slowing of CMOS scaling, new paradigms will be required to achieve the next several orders of magnitude in performance per watt gains. Using an analog resistive memory (ReRAM) crossbar to perform key matrix operations in an accelerator is an attractive option. This work presents a detailed design, using a state of the art 14/16 nm PDK, of an analog crossbar circuit block designed to process three key kernels required in training and inference of neural networks. A detailed circuit and device-level analysis of energy, latency, area, and accuracy is given and compared to relevant designs using standard digital ReRAM and SRAM operations. It is shown that the analog accelerator has a 270x energy and 540x latency advantage over a similar block utilizing only digital ReRAM and takes only 11 fJ per multiply and accumulate (MAC). Compared to an SRAM based accelerator, the energy is 430x better and latency is 34x better. Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies.

Submitted 16 February, 2018; v1 submitted 31 July, 2017; originally announced July 2017.

arXiv:1704.08306 [pdf, other]

A Digital Neuromorphic Architecture Efficiently Facilitating Complex Synaptic Response Functions Applied to Liquid State Machines

Authors: Michael R. Smith, Aaron J. Hill, Kristofor D. Carlson, Craig M. Vineyard, Jonathon Donaldson, David R. Follett, Pamela L. Follett, John H. Naegle, Conrad D. James, James B. Aimone

Abstract: Information in neural networks is represented as weighted connections, or synapses, between neurons. This poses a problem as the primary computational bottleneck for neural networks is the vector-matrix multiply when inputs are multiplied by the neural network weights. Conventional processing architectures are not well suited for simulating neural networks, often requiring large amounts of energy and time. Additionally, synapses in biological neural networks are not binary connections, but exhibit a nonlinear response function as neurotransmitters are emitted and diffuse between neurons. Inspired by neuroscience principles, we present a digital neuromorphic architecture, the Spiking Temporal Processing Unit (STPU), capable of modeling arbitrary complex synaptic response functions without requiring additional hardware components. We consider the paradigm of spiking neurons with temporally coded information as opposed to non-spiking rate coded neurons used in most neural networks. In this paradigm we examine liquid state machines applied to speech recognition and show how a liquid state machine with temporal dynamics maps onto the STPU, demonstrating the flexibility and efficiency of the STPU for instantiating neural algorithms.

Submitted 21 March, 2017; originally announced April 2017.

Comments: 8 pages, 4 Figures, Preprint of 2017 IJCNN

arXiv:1612.03770 [pdf, other]

doi 10.1109/IJCNN.2017.7965898

Neurogenesis Deep Learning

Authors: Timothy J. Draelos, Nadine E. Miner, Christopher C. Lamb, Jonathan A. Cox, Craig M. Vineyard, Kristofor D. Carlson, William M. Severa, Conrad D. James, James B. Aimone

Abstract: Neural machine learning methods, such as deep neural networks (DNN), have achieved remarkable success in a number of complex data processing tasks. These methods have arguably had their strongest impact on tasks such as image and audio processing - data processing domains in which humans have long held clear advantages over conventional algorithms. In contrast to biological neural systems, which are capable of learning continuously, deep artificial networks have a limited ability for incorporating new information in an already trained network. As a result, methods for continuous learning are potentially highly impactful in enabling the application of deep networks to dynamic data sets. Here, inspired by the process of adult neurogenesis in the hippocampus, we explore the potential for adding new neurons to deep layers of artificial neural networks in order to facilitate their acquisition of novel information while preserving previously trained data representations. Our results on the MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes lower and upper case letters and digits, demonstrate that neurogenesis is well suited for addressing the stability-plasticity dilemma that has long challenged adaptive machine learning algorithms.

Submitted 28 March, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

Comments: 8 pages, 8 figures, Accepted to 2017 International Joint Conference on Neural Networks (IJCNN 2017)

Report number: SAND2017-2174 C

arXiv:1406.4033 [pdf]

doi 10.1063/1.4895526

Degenerate Resistive Switching and Ultrahigh Density Storage in Resistive Memory

Authors: Andrew J. Lohn, Patrick R. Mickel, Conrad D. James, Matthew J. Marinella

Abstract: We show that, in tantalum oxide resistive memories, activation power provides a multi-level variable for information storage that can be set and read separately from the resistance. These two state variables (resistance and activation power) can be precisely controlled in two steps: (1) the possible activation power states are selected by partially reducing resistance, then (2) a subsequent partial increase in resistance specifies the resistance state and the final activation power state. We show that these states can be precisely written and read electrically, making this approach potentially amenable for ultra-high density memories. We provide a theoretical explanation for information storage and retrieval from activation power and experimentally demonstrate information storage in a third dimension related to the change in activation power with resistance.

Submitted 16 June, 2014; originally announced June 2014.

Showing 1–9 of 9 results for author: James, C D