Advancing Standards-Free Methods for the Identification of Small Molecules in Complex Samples
Authors:
Jamie R. Nuñez,
Sean M. Colby,
Dennis G. Thomas,
Malak M. Tfaily,
Nikola Tolic,
Elin M. Ulrich,
Jon R. Sobus,
Thomas O. Metz,
Justin G. Teeguarden,
Ryan S. Renslow
Abstract:
The current gold standard for unambiguous identification in metabolomics analysis is based on comparing two or more orthogonal properties from the analysis of authentic, pure reference materials (standards) to experimental data acquired in the same laboratory with the same analytical methods. This represents a significant limitation for comprehensive chemical identification of small molecules in complex samples since this process is time-consuming and costly, and the majority of molecules are not yet represented by standards, leading to a need for standards-free identification. To address this need, we are advancing chemical property calculations and developing multi-attribute scoring and matching algorithms that use data from multiple analytical platforms, through the use and development of the in silico Chemical Library Engine (ISiCLE) and the Multi-Attribute Matching Engine (MAME). Here, we describe our results in a blinded analysis of synthetic chemical mixtures as part of the U.S. Environmental Protection Agency's (EPA) Non-Targeted Analysis Collaborative Trial (ENTACT). The blinded false negative rate (FNR), false discovery rate (FDR), and accuracy were 57%, 77%, and 91%, respectively. For high confidence identifications, the FDR was 35%. After unblinding of the sample compositions, we improved our approach by optimizing the scoring parameters used to increase confidence. The final FNR, FDR, and accuracy were 67%, 53%, and 96%, respectively. For high confidence identifications, the FDR was 10%. This study demonstrates that standards-free small molecule identification and multi-attribute matching methods can significantly reduce reliance on standards.
Submitted 16 October, 2018;
originally announced October 2018.
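The FNR, FDR, and accuracy figures quoted in the abstract above can be read against the usual confusion-matrix definitions. The sketch below assumes those textbook definitions; the abstract does not state the exact formulas the authors used, and the counts are invented purely for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for a candidate-identification task (hypothetical)."""
    tp: int  # compound present in the sample and reported
    fp: int  # compound reported but not present
    tn: int  # compound not present and not reported
    fn: int  # compound present but not reported


def false_negative_rate(c: Counts) -> float:
    # Fraction of compounds truly present that were missed.
    return c.fn / (c.fn + c.tp)


def false_discovery_rate(c: Counts) -> float:
    # Fraction of reported identifications that were wrong.
    return c.fp / (c.fp + c.tp)


def accuracy(c: Counts) -> float:
    # Fraction of all candidate decisions that were correct.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Illustrative counts only (assumption): a large pool of true negatives
# keeps accuracy high even when FNR and FDR are both substantial.
example = Counts(tp=40, fp=60, tn=840, fn=60)

if __name__ == "__main__":
    print(f"FNR      = {false_negative_rate(example):.2f}")
    print(f"FDR      = {false_discovery_rate(example):.2f}")
    print(f"accuracy = {accuracy(example):.2f}")
```

Under these assumed definitions, accuracy is dominated by true negatives when the candidate library is much larger than the set of compounds actually present, which is one way a high accuracy can coexist with high FNR and FDR.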
ISiCLE: A molecular collision cross section calculation pipeline for establishing large in silico reference libraries for compound identification
Authors:
Sean M. Colby,
Dennis G. Thomas,
Jamie R. Nunez,
Douglas J. Baxter,
Kurt R. Glaesemann,
Joseph M. Brown,
Meg A Pirrung,
Niranjan Govind,
Justin G. Teeguarden,
Thomas O. Metz,
Ryan S. Renslow
Abstract:
Comprehensive and confident identifications of metabolites and other chemicals in complex samples will revolutionize our understanding of the role these chemically diverse molecules play in biological systems. Despite recent advances, metabolomics studies still result in the detection of a disproportionate number of features that cannot be confidently assigned to a chemical structure. This inadequacy is driven by the single most significant limitation in metabolomics: the reliance on reference libraries constructed by analysis of authentic reference chemicals. To this end, we have developed the in silico chemical library engine (ISiCLE), a high-performance computing-friendly cheminformatics workflow for generating libraries of chemical properties. In the instantiation described here, we predict probable three-dimensional molecular conformers using chemical identifiers as input, from which collision cross sections (CCS) are derived. The approach employs state-of-the-art first-principles simulation, distinguished by use of molecular dynamics, quantum chemistry, and ion mobility calculations to generate structures and libraries, all without training data. Importantly, optimization of ISiCLE included a refactoring of the popular MOBCAL code for trajectory-based mobility calculations, improving its computational efficiency by over two orders of magnitude. Calculated CCS values were validated against 1,983 experimentally measured CCS values and compared to previously reported CCS calculation approaches. An online database is introduced for sharing both calculated and experimental CCS values (metabolomics.pnnl.gov), initially including a CCS library with over 1 million entries. Finally, three successful applications of molecule characterization using calculated CCS are described. This work represents a promising method to address the limitations of small molecule identification.
Submitted 21 September, 2018;
originally announced September 2018.