SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

Goh, Garrett B.; Hodas, Nathan O.; Siegel, Charles; Vishnu, Abhinav

arXiv:1712.02034 (stat)

[Submitted on 6 Dec 2017 (v1), last revised 18 Mar 2018 (this version, v2)]

Abstract: Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES to predict chemical properties, without the need for additional explicit feature engineering. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2vec model can serve as a general-purpose neural network for predicting distinct chemical properties including toxicity, activity, solubility and solvation energy, while also outperforming contemporary MLP neural networks that use engineered features. Furthermore, we demonstrate proof-of-concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, it identified specific parts of a chemical that are consistent with established first-principles knowledge with an accuracy of 88%. Our work demonstrates that neural networks can learn technically accurate chemical concepts and provide state-of-the-art accuracy, making interpretable deep neural networks a useful tool of relevance to the chemical industry.
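
To make the idea of learning directly from SMILES text concrete, the following is a minimal sketch of how SMILES strings could be turned into fixed-length integer sequences suitable as input to an RNN embedding layer. It assumes simple character-level tokenization with zero padding; the function names (`build_vocab`, `encode`), the padding convention, and the example molecules are illustrative and are not taken from the paper.

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (an assumption of this sketch)


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct character seen in the data to a positive integer."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert one SMILES string to a list of exactly max_len indices."""
    # Truncate long strings, then right-pad short ones with PAD.
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical examples: ethanol, benzene, acetic acid.
smiles_examples = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_examples)
encoded = [encode(s, vocab, max_len=10) for s in smiles_examples]
```

Note that this character-level split treats two-letter atoms such as `Cl` or `Br` as two separate tokens, and `encode` raises a `KeyError` for characters absent from the vocabulary; a real pipeline would need to decide how to handle both cases.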

Comments:	Submitted to SIGKDD 2018
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1712.02034 [stat.ML]
	(or arXiv:1712.02034v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1712.02034

Submission history

From: Garrett Goh
[v1] Wed, 6 Dec 2017 04:29:28 UTC (630 KB)
[v2] Sun, 18 Mar 2018 13:50:32 UTC (181 KB)
