-
AI for Chemical Space Gap Filling and Novel Compound Generation
Authors:
Monee Y. McGrady,
Sean M. Colby,
Jamie R. Nuñez,
Ryan S. Renslow,
Thomas O. Metz
Abstract:
When considering large sets of molecules, it is helpful to place them in the context of a "chemical space" - a multidimensional space defined by a set of descriptors that can be used to visualize and analyze compound grouping as well as identify regions that might be void of valid structures. The chemical space of all possible molecules in a given biological or environmental sample can be vast and largely unexplored, mainly due to current limitations in processing of 'big data' by brute force methods (e.g., enumeration of all possible compounds in a space). Recent advances in artificial intelligence (AI) have led to multiple new cheminformatics tools that incorporate AI techniques to characterize and learn the structure and properties of molecules in order to generate plausible compounds, thereby contributing to more accessible and explorable regions of chemical space without the need for brute force methods. We have used one such tool, a deep-learning software called DarkChem, which learns a representation of the molecular structure of compounds by compressing them into a latent space. With DarkChem's design, distance in this latent space is often associated with compound similarity, making sparse regions interesting targets for compound generation due to the possibility of generating novel compounds. In this study, we used 1 million small molecules (less than 1000 Da) to create a representative chemical space (defined by calculated molecular properties) of all small molecules. We identified regions with few or no compounds and investigated their location in DarkChem's latent space. From these spaces, we generated 694,645 valid molecules, all of which represent molecules not found in any chemical database to date. These molecules filled 50.8% of the probed empty spaces in molecular property space. Generated molecules are provided in the supporting information.
Submitted 28 January, 2022;
originally announced January 2022.
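Illustrative sketch (not from the paper): the abstract above describes locating sparse regions of a property space and reporting what fraction of the probed empty regions the generated molecules filled. One simple way to picture that bookkeeping is a regular grid over the descriptors. The two descriptors (mass and logP), the bounds, the bin counts, the function names, and all data values below are assumptions made for illustration only; the abstract does not state which descriptors or binning the authors used.

```python
from itertools import product

def cell_index(point, bounds, bins):
    """Map a point to its grid cell; return None if it lies outside the bounds."""
    index = []
    for value, (low, high), n in zip(point, bounds, bins):
        if not (low <= value <= high):
            return None
        # Clamp so that value == high falls in the last cell.
        index.append(min(int((value - low) / (high - low) * n), n - 1))
    return tuple(index)

def empty_cells(points, bounds, bins):
    """Return the set of grid cells that contain none of the given points."""
    occupied = {cell_index(p, bounds, bins) for p in points}
    occupied.discard(None)
    all_cells = set(product(*(range(n) for n in bins)))
    return all_cells - occupied

def filled_fraction(empty, generated, bounds, bins):
    """Fraction of previously empty cells hit by at least one generated point."""
    if not empty:
        return 0.0
    hit = {cell_index(p, bounds, bins) for p in generated} & empty
    return len(hit) / len(empty)

# Toy data: (mass in Da, logP) pairs; all values are made up.
bounds = [(0.0, 1000.0), (-5.0, 5.0)]
bins = (2, 2)
known = [(150.0, -1.0), (300.0, -2.0)]      # both land in cell (0, 0)
generated = [(700.0, 2.0), (800.0, 3.0)]    # both land in cell (1, 1)

gaps = empty_cells(known, bounds, bins)
fraction = filled_fraction(gaps, generated, bounds, bins)
print(sorted(gaps), round(fraction, 3))
```

In this toy case three of the four cells start empty and the generated points reach one of them, so the filled fraction is one third. A real analysis would use many more descriptors and a much finer grid.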
-
Collision cross section specificity for small molecule identification workflows
Authors:
Jamie Nunez,
Eva Brayfindley,
Sean M. Colby,
Monee McGrady,
Kristin H. Jarman,
Ryan S. Renslow,
Thomas O. Metz
Abstract:
The physical-chemical property of molecular collision cross section (CCS) is increasingly used to assist in small molecule identification; however, questions remain regarding the extent of its true utility in contributing to such identifications, especially given its correlation with mass. To investigate the contribution of CCS to uniqueness within a given library, we measured its discriminatory capacity as a function of error in CCS values (from measurement or prediction), CCS variance, parent mass, mass error, and/or reference database size using a multi-directional grid search. While experimental CCS databases exist, they are currently small; thus, we used a CCS prediction tool, DarkChem, to provide theoretical CCS values for use in this study. These predicted CCS values were then modified to mirror experimental variance. By augmenting our search within a library based on mass alone with CCS at a variety of accuracies, we found that (i) the use of multiple adducts (i.e., alternative ionized forms of the same parent compound) for the same molecule, compared to using a single adduct, greatly improves specificity and (ii) even a single CCS leads to a significant specificity boost when low CCS error (e.g., 1% composite error) can be achieved. Based on these results, we recommend using multiple adducts to build up evidence of presence, as each adduct supplies additional information per dimension. Additionally, the utility of ion mobility spectrometry when coupled with mass spectrometry should still be considered, regardless of whether CCS is considered as an identification metric, due to advantages such as increased peak resolution, sensitivity (e.g., from reducing load on the detector at any given time), improvements in data-independent MS/MS spectra acquisition, and cleaner tandem mass spectral fragmentation patterns.
Submitted 4 November, 2021;
originally announced November 2021.
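Illustrative sketch (not from the paper): the abstract above compares a library search on mass alone with a search that also uses CCS, for one adduct or several. The toy example below shows how the candidate list can shrink as evidence is added. The library layout, function names, tolerances (5 ppm, 1%), and every m/z and CCS value are assumptions made for illustration; they are not the authors' data or code.

```python
def within_ppm(observed, reference, ppm):
    """True if observed m/z is within a parts-per-million window of reference."""
    return abs(observed - reference) <= reference * ppm * 1e-6

def within_percent(observed, reference, percent):
    """True if observed CCS is within a relative percent window of reference."""
    return abs(observed - reference) <= reference * percent / 100.0

def candidates(library, query, ppm, ccs_percent=None):
    """Return names of library entries consistent with every queried adduct.

    library: {name: {adduct: (mz, ccs)}}
    query:   {adduct: (mz, ccs)}
    If ccs_percent is None, only m/z is compared.
    """
    matches = []
    for name, adducts in library.items():
        consistent = True
        for adduct, (q_mz, q_ccs) in query.items():
            if adduct not in adducts:
                consistent = False
                break
            ref_mz, ref_ccs = adducts[adduct]
            if not within_ppm(q_mz, ref_mz, ppm):
                consistent = False
                break
            if ccs_percent is not None and not within_percent(q_ccs, ref_ccs, ccs_percent):
                consistent = False
                break
        if consistent:
            matches.append(name)
    return matches

# Hypothetical isomers: identical m/z per adduct, different CCS (made-up values).
library = {
    "A": {"[M+H]+": (181.0707, 140.0), "[M+Na]+": (203.0526, 150.0)},
    "B": {"[M+H]+": (181.0707, 141.0), "[M+Na]+": (203.0526, 158.0)},
    "C": {"[M+H]+": (181.0707, 148.0), "[M+Na]+": (203.0526, 151.0)},
}
one_adduct = {"[M+H]+": (181.0708, 140.3)}
two_adducts = {"[M+H]+": (181.0708, 140.3), "[M+Na]+": (203.0527, 150.4)}

mass_only = candidates(library, one_adduct, ppm=5)
mass_ccs = candidates(library, one_adduct, ppm=5, ccs_percent=1.0)
multi = candidates(library, two_adducts, ppm=5, ccs_percent=1.0)
print(mass_only, mass_ccs, multi)
```

With these made-up values, mass alone cannot separate the three isomers, a single CCS at 1% removes one, and adding a second adduct leaves a single candidate. This only illustrates the direction of the effect described in the abstract, not its magnitude.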
-
Application and Assessment of Deep Learning for the Generation of Potential NMDA Receptor Antagonists
Authors:
Katherine J. Schultz,
Sean M. Colby,
Yasemin Yesiltepe,
Jamie R. Nuñez,
Monee Y. McGrady,
Ryan R. Renslow
Abstract:
Uncompetitive antagonists of the N-methyl-D-aspartate receptor (NMDAR) have demonstrated therapeutic benefit in the treatment of neurological diseases such as Parkinson's and Alzheimer's, but some also cause dissociative effects that have led to the synthesis of illicit drugs. The ability to generate NMDAR antagonists in silico is therefore desirable both for new medication development and for preempting and identifying new designer drugs. Recently, generative deep learning models have been applied to de novo drug design as a means to expand the amount of chemical space that can be explored for potential drug-like compounds. In this study, we assess the application of a generative model to the NMDAR to achieve two primary objectives: (i) the creation and release of a comprehensive library of experimentally validated NMDAR phencyclidine (PCP) site antagonists to assist the drug discovery community and (ii) an analysis of both the advantages conferred by applying such generative artificial intelligence models to drug design and the current limitations of the approach. We apply, and provide source code for, a variety of ligand- and structure-based assessment techniques used in standard drug discovery analyses to the deep learning-generated compounds. We present twelve candidate antagonists that are not available in existing chemical databases to provide an example of what this type of workflow can achieve, though synthesis and experimental validation of these compounds are still required.
Submitted 31 March, 2020;
originally announced March 2020.