Search | arXiv e-print repository

NUNet: Deep Learning for Non-Uniform Super-Resolution of Turbulent Flows

Authors: Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran

Abstract: Deep Learning (DL) algorithms are becoming increasingly popular for the reconstruction of high-resolution turbulent flows (aka super-resolution). However, current DL approaches perform spatially uniform super-resolution - a key performance limiter for scalability of DL-based surrogates for Computational Fluid Dynamics (CFD). To address the above challenge, we introduce NUNet, a deep learning-based adaptive mesh refinement (AMR) framework for non-uniform super-resolution of turbulent flows. NUNet divides the input low-resolution flow field into patches, scores each patch, and predicts their target resolution. As a result, it outputs a spatially non-uniform flow field, adaptively refining regions of the fluid domain to achieve the target accuracy. We train NUNet with Reynolds-Averaged Navier-Stokes (RANS) solutions from three different canonical flows, namely turbulent channel flow, flat plate, and flow around ellipses. NUNet shows remarkable discerning properties, refining areas with complex flow features, such as near-wall domains and the wake region in flow around solid bodies, while leaving areas with smooth variations (such as the freestream) in the low-precision range. Hence, NUNet demonstrates an excellent qualitative and quantitative alignment with the traditional OpenFOAM AMR solver. Moreover, it reaches the same convergence guarantees as the AMR solver while accelerating it by 3.2-5.5x, including unseen-during-training geometries and boundary conditions, demonstrating its generalization capacities. Due to NUNet's ability to super-resolve only regions of interest, it predicts the same target 1024x1024 spatial resolution 7-28.5x faster than state-of-the-art DL methods and reduces the memory usage by 4.4-7.65x, showcasing improved scalability.

Submitted 26 March, 2022; originally announced March 2022.

arXiv:2108.07667 [pdf, other]

SURFNet: Super-resolution of Turbulent Flows with Transfer Learning using Small Datasets

Authors: Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran

Abstract: Deep Learning (DL) algorithms are emerging as a key alternative to computationally expensive CFD simulations. However, state-of-the-art DL approaches require large and high-resolution training data to learn accurate models. The size and availability of such datasets are a major limitation for the development of next-generation data-driven surrogate models for turbulent flows. This paper introduces SURFNet, a transfer learning-based super-resolution flow network. SURFNet primarily trains the DL model on low-resolution datasets and transfer learns the model on a handful of high-resolution flow problems - accelerating the traditional numerical solver independent of the input size. We propose two approaches to transfer learning for the task of super-resolution, namely one-shot and incremental learning. Both approaches entail transfer learning on only one geometry to account for fine-grid flow fields requiring 15x less training data on high-resolution inputs compared to the tiny resolution (64x256) of the coarse model, significantly reducing the time for both data collection and training. We empirically evaluate SURFNet's performance by solving the Navier-Stokes equations in the turbulent regime on input resolutions up to 256x larger than the coarse model. On four test geometries and eight flow configurations unseen during training, we observe a consistent 2-2.1x speedup over the OpenFOAM physics solver independent of the test geometry and the resolution size (up to 2048x2048), demonstrating both resolution-invariance and generalization capabilities. Our approach addresses the challenge of reconstructing high-resolution solutions from coarse grid models trained using low-resolution inputs (super-resolution) without loss of accuracy and requiring limited computational resources.

Submitted 17 August, 2021; originally announced August 2021.

arXiv:2005.04485 [pdf, other]

doi 10.1145/3392717.3392772

CFDNet: a deep learning-based accelerator for fluid simulations

Authors: Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran

Abstract: CFD is widely used in physical system design and optimization, where it is used to predict engineering quantities of interest, such as the lift on a plane wing or the drag on a motor vehicle. However, many systems of interest are prohibitively expensive for design optimization, due to the expense of evaluating CFD simulations. To render the computation tractable, reduced-order or surrogate models are used to accelerate simulations while respecting the convergence constraints provided by the higher-fidelity solution. This paper introduces CFDNet -- a physical simulation and deep learning coupled framework, for accelerating the convergence of Reynolds Averaged Navier-Stokes simulations. CFDNet is designed to predict the primary physical properties of the fluid including velocity, pressure, and eddy viscosity using a single convolutional neural network at its core. We evaluate CFDNet on a variety of use-cases, both extrapolative and interpolative, where test geometries are observed/not-observed during training. Our results show that CFDNet meets the convergence constraints of the domain-specific physics solver while outperforming it by 1.9 - 7.4x on both steady laminar and turbulent flows. Moreover, we demonstrate the generalization capacity of CFDNet by testing its prediction on new geometries unseen during training. In this case, the approach meets the CFD convergence criterion while still providing significant speedups over traditional domain-only models.

Submitted 9 May, 2020; originally announced May 2020.

Comments: It has been accepted and almost published in the International Conference in Supercomputing (ICS) 2020

arXiv:2004.12023 [pdf, other]

doi 10.1063/5.0004997

NWChem: Past, Present, and Future

Authors: E. Aprà, E. J. Bylaska, W. A. de Jong, N. Govind, K. Kowalski, T. P. Straatsma, M. Valiev, H. J. J. van Dam, Y. Alexeev, J. Anchell, V. Anisimov, F. W. Aquino, R. Atta-Fynn, J. Autschbach, N. P. Bauman, J. C. Becca, D. E. Bernholdt, K. Bhaskaran-Nair, S. Bogatko, P. Borowski, J. Boschen, J. Brabec, A. Bruner, E. Cauët, Y. Chen , et al. (89 additional authors not shown)

Abstract: Specialized computational chemistry packages have permanently reshaped the landscape of chemical and materials science by providing tools to support and guide experimental efforts and for the prediction of atomistic and electronic properties. In this regard, electronic structure packages have played a special role by using first-principle-driven methodologies to model complex chemical and materials processes. Over the last few decades, the rapid development of computing technologies and the tremendous increase in computational power have offered a unique chance to study complex transformations using sophisticated and predictive many-body techniques that describe correlated behavior of electrons in molecular and condensed phase systems at different levels of theory. In enabling these simulations, novel parallel algorithms have been able to take advantage of computational resources to address the polynomial scaling of electronic structure methods. In this paper, we briefly review the NWChem computational chemistry suite, including its history, design principles, parallel tools, current capabilities, outreach and outlook.

Submitted 26 May, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: This article appeared in volume 152, issue 18, page 184102 of the Journal of Chemical Physics. It can be found at https://doi.org/10.1063/5.0004997

Journal ref: J. Chem. Phys., 152, 184102 (2020)

arXiv:1907.07763 [pdf, ps, other]

doi 10.1007/s40993-019-0177-7

Algebraic Relations Between Partition Functions and the $j$-Function

Authors: Alice Lin, Eleanor McSpirit, Adit Vishnu

Abstract: We obtain identities and relationships between the modular $j$-function, the generating functions for the classical partition function and the Andrews $spt$-function, and two functions related to unimodal sequences and a new partition statistic we call the "signed triangular weight" of a partition. These results follow from the closed formula we obtain for the Hecke action on a distinguished harmonic Maass form $\mathscr{M}(τ)$ defined by Bringmann in her work on the Andrews $spt$-function. This formula involves a sequence of polynomials in $j(τ)$, through which we ultimately arrive at expressions for the coefficients of the $j$-function purely in terms of these combinatorial quantities.

Submitted 4 November, 2019; v1 submitted 17 July, 2019; originally announced July 2019.

Journal ref: Res. number theory 6, 2 (2020)

arXiv:1808.04456 [pdf, other]

Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction

Authors: Garrett B. Goh, Khushmeen Sakloth, Charles Siegel, Abhinav Vishnu, Jim Pfaendtner

Abstract: Deep learning algorithms excel at extracting patterns from raw data, and with large datasets, they have been very successful in computer vision and natural language applications. However, in other domains, large datasets from which to learn representations may not exist. In this work, we develop a novel multimodal CNN-MLP neural network architecture that utilizes both domain-specific feature engineering and learned representations from raw data. We illustrate the effectiveness of such network designs in the chemical sciences, for predicting biodegradability. DeepBioD, a multimodal CNN-MLP network, is more accurate than either standalone network design, and achieves an error classification rate of 0.125 that is 27% lower than the current state-of-the-art. Thus, our work indicates that combining traditional feature engineering with representation learning can be effective, particularly in situations where labeled data is limited.

Submitted 13 September, 2018; v1 submitted 13 August, 2018; originally announced August 2018.

Comments: Submitted to a peer-reviewed ML conference

arXiv:1807.00462 [pdf, other]

doi 10.1007/s10618-018-0577-7

ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites

Authors: Jiankai Sun, Abhinav Vishnu, Aniket Chakrabarti, Charles Siegel, Srinivasan Parthasarathy

Abstract: Routing questions in Community Question Answer services (CQAs) such as Stack Exchange sites is a well-studied problem. Yet, cold-start -- a phenomenon observed when a new question is posted -- is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute is able to handle the task of routing cold questions posted by new or existing askers to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we are able to improve upon the routing metrics (Precision$@1$, Accuracy, MRR) over the state-of-the-art models such as semantic matching by $159.5\%$, $31.84\%$, and $40.36\%$ for cold questions posted by existing askers, and $123.1\%$, $27.03\%$, and $34.81\%$ for cold questions posted by new askers, respectively.

Submitted 2 July, 2018; originally announced July 2018.

Comments: Accepted to the Journal Track of The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2018); Published by Springer: https://link.springer.com/article/10.1007%2Fs10618-018-0577-7

Journal ref: @Article{Sun2018, author="Sun, Jiankai and Vishnu, A. and Chakrabarti, A. and Siegel, C. and Parthasarathy, S.", title="ColdRoute: effective routing of cold questions in stack exchange sites", journal="ECML PKDD", year="2018"}
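The ColdRoute abstract mentions Factorization Machines applied to one-hot encoded features such as question tags. As a rough illustration only, the sketch below evaluates a standard second-order Factorization Machine score using the usual linear-time rewriting of the pairwise term. The function name `fm_predict`, the three-feature layout, and all parameter values are hypothetical and are not taken from the paper.

```python
from typing import List


def fm_predict(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Second-order Factorization Machine score for one feature vector.

    x  : feature vector (e.g. a one-hot / multi-hot encoding of question tags)
    w0 : global bias
    w  : one linear weight per feature
    V  : one k-dimensional latent vector per feature

    The pairwise term sum_{i<j} <V[i], V[j]> x[i] x[j] is computed with the
    identity 0.5 * sum_f [(sum_i V[i][f] x[i])^2 - sum_i (V[i][f] x[i])^2],
    which costs O(k * n) instead of O(k * n^2).
    """
    n = len(x)
    k = len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical example: features 0 and 2 are active (two tags present).
x = [1.0, 0.0, 1.0]
w0 = 0.1
w = [0.2, 0.3, 0.4]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
score = fm_predict(x, w0, w, V)
```

With only features 0 and 2 active, the single interaction is the dot product of their latent vectors, so the score reduces to the bias plus two linear weights plus that dot product.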

arXiv:1803.05880 [pdf, other]

GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent

Authors: Jeff Daily, Abhinav Vishnu, Charles Siegel, Thomas Warfel, Vinay Amatya

Abstract: In this paper, we present GossipGraD - a gossip communication protocol based Stochastic Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on large-scale systems. The salient features of GossipGraD are: 1) reduction in overall communication complexity from Θ(log(p)) for p compute nodes in well-studied SGD to O(1), 2) model diffusion such that compute nodes exchange their updates (gradients) indirectly after every log(p) steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during the feedforward phase in SGD to prevent over-fitting, 5) asynchronous communication of gradients for further reducing the communication cost of SGD and GossipGraD. We implement GossipGraD for GPU and CPU clusters and use NVIDIA GPUs (Pascal P100) connected with InfiniBand, and Intel Knights Landing (KNL) connected with Aries network. We evaluate GossipGraD using the well-studied dataset ImageNet-1K (~250GB), and widely studied neural network topologies such as GoogLeNet and ResNet50 (current winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)). Our performance evaluation using both KNL and Pascal GPUs indicates that GossipGraD can achieve perfect efficiency for these datasets and their associated neural network topologies. Specifically, for ResNet50, GossipGraD is able to achieve ~100% compute efficiency using 128 NVIDIA Pascal P100 GPUs - while matching the top-1 classification accuracy published in literature.

Submitted 15 March, 2018; originally announced March 2018.

Comments: 13 pages, 17 figures
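The GossipGraD abstract describes each node talking to a constant number of partners per step, with updates reaching all p nodes indirectly after log(p) steps. One way this kind of diffusion can work is sketched below with a hypercube-style partner schedule. That schedule is an assumption chosen for illustration and is not necessarily the partner selection used in the paper; the names `partner` and `simulate_diffusion` are hypothetical, and p is assumed to be a power of two.

```python
from typing import List, Set


def partner(rank: int, step: int, p: int) -> int:
    """Partner of `rank` at `step` under a hypercube-style schedule.

    Assumes p is a power of two. Each node exchanges with exactly one
    partner per step (O(1) communication per step), and the partner
    rotates through the log2(p) hypercube dimensions.
    """
    dims = p.bit_length() - 1  # log2(p) for a power of two
    return rank ^ (1 << (step % dims))


def simulate_diffusion(p: int) -> List[Set[int]]:
    """Track whose updates have reached each node after log2(p) steps."""
    dims = p.bit_length() - 1
    known: List[Set[int]] = [{r} for r in range(p)]
    for step in range(dims):
        # Every node merges what its current partner knew before this step.
        known = [known[r] | known[partner(r, step, p)] for r in range(p)]
    return known


reached = simulate_diffusion(8)
```

In this toy model, after log2(8) = 3 steps every node has indirectly received the update of every other node, even though each step involved only a single pairwise exchange per node.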

arXiv:1712.02734 [pdf, other]

Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction

Authors: Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas

Abstract: With access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry, data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weak-supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict other smaller datasets for chemical properties that it was not originally trained on, we show that ChemNet's accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network architecture agnostic and is effective across multiple data modalities. Our results indicate that a pre-trained ChemNet that incorporates chemistry domain knowledge enables the development of generalizable neural networks for more accurate prediction of novel chemical properties.

Submitted 18 March, 2018; v1 submitted 7 December, 2017; originally announced December 2017.

Comments: Submitted to SIGKDD 2018

arXiv:1712.02034 [pdf, other]

SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

Authors: Garrett B. Goh, Nathan O. Hodas, Charles Siegel, Abhinav Vishnu

Abstract: Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES to predict chemical properties, without the need for additional explicit feature engineering. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2vec model can serve as a general-purpose neural network for predicting distinct chemical properties including toxicity, activity, solubility and solvation energy, while also outperforming contemporary MLP neural networks that use engineered features. Furthermore, we demonstrate proof-of-concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, it identified specific parts of a chemical that are consistent with established first-principles knowledge with an accuracy of 88%. Our work demonstrates that neural networks can learn technically accurate chemical concepts and provide state-of-the-art accuracy, making interpretable deep neural networks a useful tool of relevance to the chemical industry.

Submitted 18 March, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

Comments: Submitted to SIGKDD 2018

arXiv:1710.02238 [pdf, other]

How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?

Authors: Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas, Nathan Baker

Abstract: The meteoric rise of deep learning models in computer vision research, having achieved human-level accuracy in image recognition tasks, is firm evidence of the impact of representation learning of deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, which is trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only three additional pieces of basic information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in the images, and examining the resulting model's performance, we also identify two distinct learning patterns in predicting toxicity/activity as compared to solvation free energy. These patterns suggest that Chemception is learning about its tasks in a manner that is consistent with established knowledge. Thus, our work demonstrates that advanced chemical knowledge is not a pre-requisite for deep learning models to accurately predict complex chemical properties.

Submitted 18 March, 2018; v1 submitted 5 October, 2017; originally announced October 2017.

Comments: In Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision (WACV)

arXiv:1709.03316 [pdf, other]

What does fault tolerant Deep Learning need from MPI?

Authors: Vinay Amatya, Abhinav Vishnu, Charles Siegel, Jeff Daily

Abstract: Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive - even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults - requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM.

Submitted 11 September, 2017; originally announced September 2017.

arXiv:1706.06689 [pdf]

Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models

Authors: Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas, Nathan Baker

Abstract: In the last few years, we have seen the transformative impact of deep learning in many applications, particularly in speech recognition and computer vision. Inspired by Google's Inception-ResNet deep convolutional neural network (CNN) for image classification, we have developed "Chemception", a deep CNN for the prediction of chemical properties, using just the images of 2D drawings of molecules. We develop Chemception without providing any additional explicit chemistry knowledge, such as basic concepts like periodicity, or advanced features like molecular descriptors and fingerprints. We then show how Chemception can serve as a general-purpose neural network architecture for predicting toxicity, activity, and solvation properties when trained on a modest database of 600 to 40,000 compounds. When compared to multi-layer perceptron (MLP) deep neural networks trained with ECFP fingerprints, Chemception slightly outperforms in activity and solvation prediction and slightly underperforms in toxicity prediction. Having matched the performance of expert-developed QSAR/QSPR deep learning models, our work demonstrates the plausibility of using deep neural networks to assist in computational chemistry research, where the feature engineering process is performed primarily by a deep learning algorithm.

Submitted 20 June, 2017; originally announced June 2017.

Comments: Submitted to a chemistry peer-reviewed journal

arXiv:1704.04560 [pdf, other]

User-transparent Distributed TensorFlow

Authors: Abhinav Vishnu, Joseph Manzano, Charles Siegel, Jeff Daily

Abstract: Deep Learning (DL) algorithms have become the de facto choice for data analysis. Several DL implementations -- primarily limited to a single compute node -- such as Caffe, TensorFlow, Theano and Torch have become readily available. Distributed DL implementations capable of execution on large scale systems are becoming important to address the computational needs of large data produced by scientific simulations and experiments. Yet, the adoption of distributed DL implementations faces significant impediments: 1) most implementations require DL analysts to modify their code significantly -- which is a show-stopper, 2) several distributed DL implementations are geared towards cloud computing systems -- which is inadequate for execution on massively parallel systems such as supercomputers. This work addresses each of these problems. We provide a distributed memory DL implementation by incorporating required changes in the TensorFlow runtime itself. This dramatically reduces the entry barrier for using a distributed TensorFlow implementation. We use Message Passing Interface (MPI) -- which provides performance portability, especially since MPI specific changes are abstracted from users. Lastly -- and arguably most importantly -- we make our implementation available for broader use, under the umbrella of Machine Learning Toolkit for Extreme Scale (MaTEx) at http://hpc.pnl.gov/matex. We refer to our implementation as MaTEx-TensorFlow.

Submitted 14 April, 2017; originally announced April 2017.

Comments: 9 pages, 8 figures

arXiv:1701.04503 [pdf]

Deep Learning for Computational Chemistry

Authors: Garrett B. Goh, Nathan O. Hodas, Abhinav Vishnu

Abstract: The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen the transformative impact of deep learning in many domains, particularly in speech recognition and computer vision, to the extent that the majority of expert practitioners in those fields are now regularly eschewing prior established models in favor of deep learning models. In this review, we provide an introductory overview into the theory of deep neural networks and their unique properties that distinguish them from traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight its ubiquity and broad applicability to a wide range of challenges in the field, including QSAR, virtual screening, protein structure prediction, quantum chemistry, materials design and property prediction. In reviewing the performance of deep neural networks, we observed a consistent outperformance against non-neural networks state-of-the-art models across disparate research topics, and deep neural network based models often exceeded the "glass ceiling" expectations of their respective tasks. Coupled with the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks, we anticipate that deep learning algorithms will be a valuable tool for computational chemistry.

Submitted 16 January, 2017; originally announced January 2017.

arXiv:1610.05116 [pdf, ps, other]

Fault Tolerant Frequent Pattern Mining

Authors: Sameh Shohdy, Abhinav Vishnu, Gagan Agrawal

Abstract: The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large scale datasets. While several researchers have designed distributed memory FP-Growth algorithms, it is pivotal to consider fault tolerant FP-Growth, which can address the increasing fault rates in large scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and MPI advanced features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access and without incurring any memory overhead for checkpointing. We evaluate our FT algorithm on a large scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed a 20x average speed-up in comparison to Spark, establishing that a well designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.

Submitted 17 October, 2016; originally announced October 2016.

Comments: 10 Pages, High Performance Computing Conference (HIPC 2016)

arXiv:1610.00790 [pdf, other]

Adaptive Neuron Apoptosis for Accelerating Deep Learning on Large Scale Systems

Authors: Charles Siegel, Jeff Daily, Abhinav Vishnu

Abstract: We present novel techniques to accelerate the convergence of Deep Learning algorithms by conducting low overhead removal of redundant neurons -- apoptosis of neurons -- which do not contribute to model learning, during the training phase itself. We provide in-depth theoretical underpinnings of our heuristics (bounding accuracy loss and handling apoptosis of several neuron types), and present the methods to conduct adaptive neuron apoptosis. Specifically, we are able to improve the training time for several datasets by 2-3x, while reducing the number of parameters by up to 30x (4-5x on average) on datasets such as ImageNet classification. For the Higgs Boson dataset, our implementation improves the accuracy (measured by Area Under Curve (AUC)) for classification from 0.88/1 to 0.94/1, while reducing the number of parameters by 3x in comparison to existing literature. The proposed methods achieve a 2.44x speedup in comparison to the default (no apoptosis) algorithm.

Submitted 3 October, 2016; originally announced October 2016.

Comments: 11 pages, 7 figures

arXiv:1606.06274 [pdf, other]

A Data-Driven Approach for Semantic Role Labeling from Induced Grammar Structures in Language

Authors: Vivek Datla, David Lin, Max Louwerse, Abhinav Vishnu

Abstract: Semantic roles play an important role in extracting knowledge from text. Current unsupervised approaches utilize features from grammar structures to induce semantic roles. The dependence on these grammars, however, makes it difficult to adapt to noisy and new languages. In this paper we develop a data-driven approach to identifying semantic roles; the approach is entirely unsupervised up to the point where rules need to be learned to identify the position at which the semantic role occurs. Specifically, we develop a modified-ADIOS algorithm based on ADIOS (Solan et al., 2005) to learn grammar structures, and use these grammar structures to learn the rules for identifying the semantic roles based on the context in which the grammar structures appeared. The results obtained are comparable with the current state-of-the-art models that are inherently dependent on human annotated data.

Submitted 20 June, 2016; originally announced June 2016.

arXiv:1605.01369 [pdf, ps, other]

Accelerating Deep Learning with Shrinkage and Recall

Authors: Shuai Zheng, Abhinav Vishnu, Chris Ding

Abstract: Deep Learning is a very powerful machine learning model. Deep Learning trains a large number of parameters for multiple layers and is very slow when the data is large scale and the architecture size is large. Inspired by the shrinking technique used in accelerating computation of the Support Vector Machines (SVM) algorithm and the screening technique used in LASSO, we propose a shrinking Deep Learning with recall (sDLr) approach to speed up deep learning computation. We experiment with sDLr using Deep Neural Network (DNN), Deep Belief Network (DBN) and Convolution Neural Network (CNN) on 4 data sets. Results show that the speedup using sDLr can reach more than 2.0 while still giving competitive classification performance.

Submitted 19 September, 2016; v1 submitted 4 May, 2016; originally announced May 2016.

Comments: The 22nd IEEE International Conference on Parallel and Distributed Systems (ICPADS 2016)

arXiv:1603.02339 [pdf, other]

Distributed TensorFlow with MPI

Authors: Abhinav Vishnu, Charles Siegel, Jeffrey Daily

Abstract: Machine Learning and Data Mining (MLDM) algorithms are becoming increasingly important in analyzing the large volumes of data generated by simulations, experiments and mobile devices. With increasing data volume, distributed memory systems (such as tightly connected supercomputers or cloud computing systems) are becoming important in designing in-memory and massively parallel MLDM algorithms. Yet, the majority of open source MLDM software is limited to sequential execution, with a few packages supporting multi-core/many-core execution. In this paper, we extend the recently proposed Google TensorFlow for execution on large scale clusters using Message Passing Interface (MPI). Our approach requires minimal changes to the TensorFlow runtime -- making the proposed implementation generic and readily usable by the increasingly large number of TensorFlow users. We evaluate our implementation using an InfiniBand cluster and several well-known datasets. Our evaluation indicates the efficiency of our proposed implementation.

Submitted 18 August, 2017; v1 submitted 7 March, 2016; originally announced March 2016.

Comments: 6 pages; fixed significant typo

arXiv:1512.01283 [pdf, other]

Predicting the top and bottom ranks of billboard songs using Machine Learning

Authors: Vivek Datla, Abhinav Vishnu

Abstract: The music industry is a $130 billion industry. Predicting whether a song catches the pulse of the audience impacts the industry. In this paper we analyze the language inside the lyrics of songs using several computational linguistic algorithms and predict whether a song would make it to the top or bottom of the billboard rankings based on the language features. We trained and tested an SVM classifier with a radial kernel function on the linguistic features. Results indicate that we can classify whether a song belongs to the top or bottom of the billboard charts with a precision of 0.76.

Submitted 3 December, 2015; originally announced December 2015.

arXiv:1406.5161 [pdf, other]

Fast Support Vector Machines Using Parallel Adaptive Shrinking on Distributed Systems

Authors: Jeyanthi Narasimhan, Abhinav Vishnu, Lawrence Holder, Adolfy Hoisie

Abstract: Support Vector Machines (SVM), a popular machine learning technique, has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. Whether it is identifying high-risk patients by health-care professionals, or potential high-school students to enroll in college by school districts, SVMs can play a major role for social good. This paper undertakes the challenge of designing a scalable parallel SVM training algorithm for large scale systems, which include commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Intuitive techniques for improving the time-space complexity, including adaptive elimination of samples for faster convergence and sparse format representation, are proposed. Under sample elimination, several heuristics ranging from earliest possible to lazy elimination of non-contributing samples are proposed. In several cases, where an early sample elimination might result in a false positive, low overhead mechanisms for reconstruction of key data structures are proposed. The algorithm and heuristics are implemented and evaluated on various publicly available datasets. Empirical evaluation shows up to 26x speed improvement on some datasets against the sequential baseline, when evaluated on multiple compute nodes, and an improvement in execution time of up to 30-60% is readily observed on a number of other datasets against our parallel baseline.

Submitted 19 June, 2014; originally announced June 2014.

Comments: 10 pages, 9 figures, 3 tables

Showing 1–22 of 22 results for author: Vishnu, A