Search | arXiv e-print repository

arXiv:2005.04171 [pdf, other]

Hyperparameter Optimization in Binary Communication Networks for Neuromorphic Deployment

Authors: Maryam Parsa, Catherine D. Schuman, Prasanna Date, Derek C. Rose, Bill Kay, J. Parker Mitchell, Steven R. Young, Ryan Dellana, William Severa, Thomas E. Potok, Kaushik Roy

Abstract: Training neural networks for neuromorphic deployment is non-trivial. There have been a variety of approaches proposed to adapt back-propagation or back-propagation-like algorithms appropriate for training. Considering that these networks often have very different performance characteristics than traditional neural networks, it is often unclear how to set either the network topology or the hyperparameters to achieve optimal performance. In this work, we introduce a Bayesian approach for optimizing the hyperparameters of an algorithm for training binary communication networks that can be deployed to neuromorphic hardware. We show that by optimizing the hyperparameters on this algorithm for each dataset, we can achieve improvements in accuracy over the previous state-of-the-art for this algorithm on each dataset (by up to 15 percent). This jump in performance continues to emphasize the potential when converting traditional neural networks to binary communication applicable to neuromorphic hardware.

Submitted 20 April, 2020; originally announced May 2020.

Comments: 9 pages, 3 figures, To appear in WCCI 2020

arXiv:2001.02160 [pdf, other]

Inferring Convolutional Neural Networks' accuracies from their architectural characterizations

Authors: Duc Hoang, Jesse Hamer, Gabriel N. Perdue, Steven R. Young, Jonathan Miller, Anushree Ghosh

Abstract: Convolutional Neural Networks (CNNs) have shown strong promise for analyzing scientific data from many domains including particle imaging detectors. However, the challenge of choosing the appropriate network architecture (depth, kernel shapes, activation functions, etc.) for specific applications and different data sets is still poorly understood. In this paper, we study the relationships between a CNN's architecture and its performance by proposing a systematic language that is useful for comparison between different CNN architectures before training time. We characterize a CNN's architecture by different attributes, and demonstrate that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems -- event vertex finding and hadron multiplicity classification in the MINERvA experiment at Fermi National Accelerator Laboratory. In doing so, we extract several architectural attributes from optimized networks' architecture for the physics problems, which are outputs of a model selection algorithm called Multi-node Evolutionary Neural Networks for Deep Learning (MENNDL). We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training. The models perform 16-20% better than random guessing. Additionally, we found a coefficient of determination of 0.966 for an Ordinary Least Squares model in a regression on accuracy over a large population of networks.

Submitted 9 January, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

Comments: 6 pages, 5 figures, 5 tables, to appear in proceedings of the 18th International Conference on Machine Learning and Applications - ICMLA 2019

arXiv:1909.12291 [pdf, other]

Exascale Deep Learning to Accelerate Cancer Research

Authors: Robert M. Patton, J. Travis Johnston, Steven R. Young, Catherine D. Schuman, Thomas E. Potok, Derek C. Rose, Seung-Hwan Lim, Junghoon Chae, Le Hou, Shahira Abousamra, Dimitris Samaras, Joel Saltz

Abstract: Deep learning, through the use of neural networks, has demonstrated remarkable ability to automate many routine tasks when presented with sufficient data for training. The neural network architecture (e.g. number of layers, types of layers, connections between layers, etc.) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever increasing accuracy on benchmark datasets with the cost of increased computational demands. In this paper we demonstrate that neural network architectures can be automatically generated, tailored for a specific application, with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL--an HPC-enabled software stack for neural architecture search--we generate a neural network with comparable accuracy to state-of-the-art networks on a cancer pathology dataset that is also 16× faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.

Submitted 26 September, 2019; originally announced September 2019.

Comments: Submitted to IEEE Big Data

arXiv:1908.11218 [pdf, other]

Deep Modulation (Deepmod): A Self-Taught PHY Layer for Resilient Digital Communications

Authors: Adam Anderson, Steven R. Young, F. Kyle Reed, Jason M. Vann

Abstract: Traditional physical (PHY) layer protocols contain chains of signal processing blocks that have been mathematically optimized to transmit information bits efficiently over noisy channels. Unfortunately, this same optimality encourages ubiquity in wireless communication technology and enhances the potential for catastrophic cyber or physical attacks due to prolific knowledge of underlying physical layers. Additionally, optimal signal processing for one channel medium may not work for another without significant changes in the software protocol. Any truly resilient communications protocol must be capable of immediate redeployment to meet quality of service (QoS) demands in a wide variety of possible channel media. Contrary to many traditional approaches which use immutable man-made signal processing blocks, this work proposes generating real-time blocks ad hoc through a machine learning framework, so-called deepmod, that is only relevant to the particular channel medium being used. With this approach, traditional signal processing blocks are replaced with machine learning graphs which are trained, used, and discarded as needed. Our experiments show that deepmod, using the same machine intelligence, converges to viable communication links over vastly different channels including: radio frequency (RF), powerline communications (PLC), and acoustic channels.

Submitted 29 August, 2019; originally announced August 2019.

Comments: 8 pages

arXiv:1902.00743 [pdf, other]

Deep Learning for Vertex Reconstruction of Neutrino-Nucleus Interaction Events with Combined Energy and Time Data

Authors: Linghao Song, Fan Chen, Steven R. Young, Catherine D. Schuman, Gabriel Perdue, Thomas E. Potok

Abstract: We present a deep learning approach for vertex reconstruction of neutrino-nucleus interaction events, a problem in the domain of high energy physics. In this approach, we combine both energy and timing data that are collected in the MINERvA detector to perform classification and regression tasks. We show that the resulting network achieves higher accuracy than previous results while requiring a smaller model size and less training time. In particular, the proposed model outperforms the state-of-the-art by 4.00% on classification accuracy. For the regression task, our model achieves 0.9919 on the coefficient of determination, higher than the previous work (0.96).

Submitted 2 February, 2019; originally announced February 2019.

Comments: To appear in 2019 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)

arXiv:1811.01183 [pdf, other]

Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study

Authors: Drahomira Herrmannova, Steven R. Young, Robert M. Patton, Christopher G. Stahl, Nicole C. Kleinstreuer, Mary S. Wolfe

Abstract: Identifying and extracting data elements such as study descriptors in publication full texts is a critical yet manual and labor-intensive step required in a number of tasks. In this paper we address the question of identifying data elements in an unsupervised manner. Specifically, provided a set of criteria describing specific study parameters, such as species, route of administration, and dosing regimen, we develop an unsupervised approach to identify text segments (sentences) relevant to the criteria. A binary classifier trained to identify publications that met the criteria performs better when trained on the candidate sentences than when trained on sentences randomly picked from the text, supporting the intuition that our method is able to accurately identify study descriptors.

Submitted 3 November, 2018; originally announced November 2018.

Comments: Ninth International Workshop on Health Text Mining and Information Analysis at EMNLP 2018

arXiv:1710.07721 [pdf]

doi 10.1063/1.5009942

Data Mining for better material synthesis: the case of pulsed laser deposition of complex oxides

Authors: Steven R. Young, Artem Maksov, Maxim Ziatdinov, Ye Cao, Matthew Burch, Janakiraman Balachandran, Linglong Li, Suhas Somnath, Robert M. Patton, Sergei V. Kalinin, Rama K. Vasudevan

Abstract: The pursuit of more advanced electronics, finding solutions to energy needs, and tackling a wealth of social issues often hinges upon the discovery and optimization of new functional materials that enable disruptive technologies or applications. However, the discovery rate of these materials is alarmingly low. Much of the information that could drive this rate higher is scattered across tens of thousands of papers in the extant literature published over several decades, and almost all of it is not collated and thus cannot be used in its entirety. Many of these limitations can be circumvented if the experimentalist has access to systematized collections of prior experimental procedures and results that can be analyzed and built upon. Here, we investigate the property-processing relationship during growth of oxide films by pulsed laser deposition. To do so, we develop an enabling software tool to (1) mine the literature of relevant papers for synthesis parameters and functional properties of previously studied materials, (2) enhance the accuracy of this mining through crowd sourcing approaches, (3) create a searchable repository that will be a community-wide resource enabling material scientists to leverage this information, and (4) provide through the Jupyter notebook platform, simple machine-learning-based analysis to learn the complex interactions between growth parameters and functional properties (all data and codes available on https://github.com/ORNL-DataMatls). The results allow visualization of growth windows, trends and outliers, and can serve as a template for analyzing the distribution of growth conditions, provide starting points for related compounds and act as feedback for first-principles calculations. Such tools will comprise an integral part of the materials design schema in the coming decade.

Submitted 1 March, 2018; v1 submitted 20 October, 2017; originally announced October 2017.

Comments: 30 pages; 8 figures

arXiv:1703.05364 [pdf, other]

A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers

Authors: Thomas E. Potok, Catherine Schuman, Steven R. Young, Robert M. Patton, Federico Spedalieri, Jeremy Liu, Ke-Thia Yao, Garrett Rose, Gangotree Chakma

Abstract: Current Deep Learning approaches have been very successful using convolutional neural networks (CNN) trained on large graphics processing unit (GPU)-based computers. Three limitations of this approach are: 1) they are based on a simple layered network topology, i.e., highly connected layers, without intra-layer connections; 2) the networks are manually configured to achieve optimal results, and 3) the implementation of the neuron model is expensive in both cost and power. In this paper, we evaluate deep learning models using three different computing architectures to address these problems: quantum computing to train complex topologies, high performance computing (HPC) to automatically determine network topology, and neuromorphic computing for a low-power hardware implementation. We use the MNIST dataset for our experiment, due to input size limitations of current quantum computers. Our results show the feasibility of using the three architectures in tandem to address the above deep learning limitations. We show a quantum computer can find high quality values of intra-layer connection weights, in a tractable time as the complexity of the network increases; a high performance computer can find optimal layer-based topologies; and a neuromorphic computer can represent the complex topology and weights derived from the other architectures in low power memristive hardware.

Submitted 13 July, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

arXiv:1301.3385 [pdf, other]

Recurrent Online Clustering as a Spatio-Temporal Feature Extractor in DeSTIN

Authors: Steven R. Young, Itamar Arel

Abstract: This paper presents a basic enhancement to the DeSTIN deep learning architecture by replacing the explicitly calculated transition tables that are used to capture temporal features with a simpler, more scalable mechanism. This mechanism uses feedback of state information to cluster over a space comprised of both the spatial input and the current state. The resulting architecture achieves state-of-the-art results on the MNIST classification benchmark.

Submitted 16 January, 2013; v1 submitted 15 January, 2013; originally announced January 2013.

Comments: 3 pages, 2 figures, Submitted to ICLR 2013

Showing 1–9 of 9 results for author: Young, S R