-
Hyperparameter Optimization in Binary Communication Networks for Neuromorphic Deployment
Authors:
Maryam Parsa,
Catherine D. Schuman,
Prasanna Date,
Derek C. Rose,
Bill Kay,
J. Parker Mitchell,
Steven R. Young,
Ryan Dellana,
William Severa,
Thomas E. Potok,
Kaushik Roy
Abstract:
Training neural networks for neuromorphic deployment is non-trivial. A variety of approaches have been proposed to adapt back-propagation or back-propagation-like algorithms for training such networks. Because these networks often have very different performance characteristics than traditional neural networks, it is often unclear how to set either the network topology or the hyperparameters to achieve optimal performance. In this work, we introduce a Bayesian approach for optimizing the hyperparameters of an algorithm for training binary communication networks that can be deployed to neuromorphic hardware. We show that by optimizing the hyperparameters of this algorithm for each dataset, we can achieve improvements in accuracy over the previous state-of-the-art for this algorithm on each dataset (by up to 15 percent). This jump in performance continues to emphasize the potential when converting traditional neural networks to binary communication applicable to neuromorphic hardware.
Submitted 20 April, 2020;
originally announced May 2020.
-
Inferring Convolutional Neural Networks' accuracies from their architectural characterizations
Authors:
Duc Hoang,
Jesse Hamer,
Gabriel N. Perdue,
Steven R. Young,
Jonathan Miller,
Anushree Ghosh
Abstract:
Convolutional Neural Networks (CNNs) have shown strong promise for analyzing scientific data from many domains including particle imaging detectors. However, the challenge of choosing the appropriate network architecture (depth, kernel shapes, activation functions, etc.) for specific applications and different data sets is still poorly understood. In this paper, we study the relationships between a CNN's architecture and its performance by proposing a systematic language that is useful for comparison between different CNN architectures before training time. We characterize a CNN's architecture by different attributes, and demonstrate that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems -- event vertex finding and hadron multiplicity classification in the MINERvA experiment at Fermi National Accelerator Laboratory. In doing so, we extract several architectural attributes from optimized networks' architectures for the physics problems, which are outputs of a model selection algorithm called Multi-node Evolutionary Neural Networks for Deep Learning (MENNDL). We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training. The models perform 16-20% better than random guessing. Additionally, we found a coefficient of determination of 0.966 for an Ordinary Least Squares model in a regression on accuracy over a large population of networks.
Submitted 9 January, 2020; v1 submitted 7 January, 2020;
originally announced January 2020.
-
Exascale Deep Learning to Accelerate Cancer Research
Authors:
Robert M. Patton,
J. Travis Johnston,
Steven R. Young,
Catherine D. Schuman,
Thomas E. Potok,
Derek C. Rose,
Seung-Hwan Lim,
Junghoon Chae,
Le Hou,
Shahira Abousamra,
Dimitris Samaras,
Joel Saltz
Abstract:
Deep learning, through the use of neural networks, has demonstrated remarkable ability to automate many routine tasks when presented with sufficient data for training. The neural network architecture (e.g. number of layers, types of layers, connections between layers, etc.) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever increasing accuracy on benchmark datasets with the cost of increased computational demands. In this paper we demonstrate that neural network architectures can be automatically generated, tailored for a specific application, with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL--an HPC-enabled software stack for neural architecture search--we generate a neural network with comparable accuracy to state-of-the-art networks on a cancer pathology dataset that is also $16\times$ faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.
Submitted 26 September, 2019;
originally announced September 2019.
-
Deep Modulation (Deepmod): A Self-Taught PHY Layer for Resilient Digital Communications
Authors:
Adam Anderson,
Steven R. Young,
F. Kyle Reed,
Jason M. Vann
Abstract:
Traditional physical (PHY) layer protocols contain chains of signal processing blocks that have been mathematically optimized to transmit information bits efficiently over noisy channels. Unfortunately, this same optimality encourages ubiquity in wireless communication technology and enhances the potential for catastrophic cyber or physical attacks due to prolific knowledge of underlying physical layers. Additionally, optimal signal processing for one channel medium may not work for another without significant changes in the software protocol. Any truly resilient communications protocol must be capable of immediate redeployment to meet quality of service (QoS) demands in a wide variety of possible channel media. Contrary to many traditional approaches which use immutable man-made signal processing blocks, this work proposes generating real-time blocks ad hoc through a machine learning framework, so-called deepmod, that is only relevant to the particular channel medium being used. With this approach, traditional signal processing blocks are replaced with machine learning graphs which are trained, used, and discarded as needed. Our experiments show that deepmod, using the same machine intelligence, converges to viable communication links over vastly different channels including: radio frequency (RF), powerline communications (PLC), and acoustic channels.
Submitted 29 August, 2019;
originally announced August 2019.
-
Deep Learning for Vertex Reconstruction of Neutrino-Nucleus Interaction Events with Combined Energy and Time Data
Authors:
Linghao Song,
Fan Chen,
Steven R. Young,
Catherine D. Schuman,
Gabriel Perdue,
Thomas E. Potok
Abstract:
We present a deep learning approach for vertex reconstruction of neutrino-nucleus interaction events, a problem in the domain of high energy physics. In this approach, we combine both energy and timing data that are collected in the MINERvA detector to perform classification and regression tasks. We show that the resulting network achieves higher accuracy than previous results while requiring a smaller model size and less training time. In particular, the proposed model outperforms the state-of-the-art by 4.00% on classification accuracy. For the regression task, our model achieves 0.9919 on the coefficient of determination, higher than the previous work (0.96).
Submitted 2 February, 2019;
originally announced February 2019.
-
Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study
Authors:
Drahomira Herrmannova,
Steven R. Young,
Robert M. Patton,
Christopher G. Stahl,
Nicole C. Kleinstreuer,
Mary S. Wolfe
Abstract:
Identifying and extracting data elements such as study descriptors in publication full texts is a critical yet manual and labor-intensive step required in a number of tasks. In this paper we address the question of identifying data elements in an unsupervised manner. Specifically, provided a set of criteria describing specific study parameters, such as species, route of administration, and dosing regimen, we develop an unsupervised approach to identify text segments (sentences) relevant to the criteria. A binary classifier trained to identify publications that met the criteria performs better when trained on the candidate sentences than when trained on sentences randomly picked from the text, supporting the intuition that our method is able to accurately identify study descriptors.
Submitted 3 November, 2018;
originally announced November 2018.
-
Data Mining for better material synthesis: the case of pulsed laser deposition of complex oxides
Authors:
Steven R. Young,
Artem Maksov,
Maxim Ziatdinov,
Ye Cao,
Matthew Burch,
Janakiraman Balachandran,
Linglong Li,
Suhas Somnath,
Robert M. Patton,
Sergei V. Kalinin,
Rama K. Vasudevan
Abstract:
The pursuit of more advanced electronics, finding solutions to energy needs, and tackling a wealth of social issues often hinges upon the discovery and optimization of new functional materials that enable disruptive technologies or applications. However, the discovery rate of these materials is alarmingly low. Much of the information that could drive this rate higher is scattered across tens of thousands of papers in the extant literature published over several decades, and almost all of it is not collated and thus cannot be used in its entirety. Many of these limitations can be circumvented if the experimentalist has access to systematized collections of prior experimental procedures and results that can be analyzed and built upon. Here, we investigate the property-processing relationship during growth of oxide films by pulsed laser deposition. To do so, we develop an enabling software tool to (1) mine the literature of relevant papers for synthesis parameters and functional properties of previously studied materials, (2) enhance the accuracy of this mining through crowd sourcing approaches, (3) create a searchable repository that will be a community-wide resource enabling material scientists to leverage this information, and (4) provide, through the Jupyter notebook platform, simple machine-learning-based analysis to learn the complex interactions between growth parameters and functional properties (all data and codes available on https://github.com/ORNL-DataMatls). The results allow visualization of growth windows, trends, and outliers, and can serve as a template for analyzing the distribution of growth conditions, provide starting points for related compounds, and act as feedback for first-principles calculations. Such tools will comprise an integral part of the materials design schema in the coming decade.
Submitted 1 March, 2018; v1 submitted 20 October, 2017;
originally announced October 2017.
-
A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers
Authors:
Thomas E. Potok,
Catherine Schuman,
Steven R. Young,
Robert M. Patton,
Federico Spedalieri,
Jeremy Liu,
Ke-Thia Yao,
Garrett Rose,
Gangotree Chakma
Abstract:
Current Deep Learning approaches have been very successful using convolutional neural networks (CNN) trained on large graphics processing unit (GPU)-based computers. Three limitations of this approach are: 1) they are based on a simple layered network topology, i.e., highly connected layers, without intra-layer connections; 2) the networks are manually configured to achieve optimal results, and 3) the implementation of the neuron model is expensive in both cost and power. In this paper, we evaluate deep learning models using three different computing architectures to address these problems: quantum computing to train complex topologies, high performance computing (HPC) to automatically determine network topology, and neuromorphic computing for a low-power hardware implementation. We use the MNIST dataset for our experiment, due to input size limitations of current quantum computers. Our results show the feasibility of using the three architectures in tandem to address the above deep learning limitations. We show a quantum computer can find high quality values of intra-layer connection weights in a tractable time as the complexity of the network increases; a high performance computer can find optimal layer-based topologies; and a neuromorphic computer can represent the complex topology and weights derived from the other architectures in low power memristive hardware.
Submitted 13 July, 2017; v1 submitted 15 March, 2017;
originally announced March 2017.
-
Recurrent Online Clustering as a Spatio-Temporal Feature Extractor in DeSTIN
Authors:
Steven R. Young,
Itamar Arel
Abstract:
This paper presents a basic enhancement to the DeSTIN deep learning architecture by replacing the explicitly calculated transition tables that are used to capture temporal features with a simpler, more scalable mechanism. This mechanism uses feedback of state information to cluster over a space comprised of both the spatial input and the current state. The resulting architecture achieves state-of-the-art results on the MNIST classification benchmark.
Submitted 16 January, 2013; v1 submitted 15 January, 2013;
originally announced January 2013.