-
Uncertainty Quantification for Molecular Property Predictions with Graph Neural Architecture Search
Authors:
Shengli Jiang,
Shiyi Qin,
Reid C. Van Lehn,
Prasanna Balaprakash,
Victor M. Zavala
Abstract:
Graph Neural Networks (GNNs) have emerged as a prominent class of data-driven methods for molecular property prediction. However, a key limitation of typical GNN models is their inability to quantify uncertainties in the predictions. This capability is crucial for ensuring the trustworthy use and deployment of models in downstream tasks. To that end, we introduce AutoGNNUQ, an automated uncertaint…
▽ More
Graph Neural Networks (GNNs) have emerged as a prominent class of data-driven methods for molecular property prediction. However, a key limitation of typical GNN models is their inability to quantify uncertainties in the predictions. This capability is crucial for ensuring the trustworthy use and deployment of models in downstream tasks. To that end, we introduce AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction. AutoGNNUQ leverages architecture search to generate an ensemble of high-performing GNNs, enabling the estimation of predictive uncertainties. Our approach employs variance decomposition to separate data (aleatoric) and model (epistemic) uncertainties, providing valuable insights for reducing them. In our computational experiments, we demonstrate that AutoGNNUQ outperforms existing UQ methods in terms of both prediction accuracy and UQ performance on multiple benchmark datasets. Additionally, we utilize t-SNE visualization to explore correlations between molecular features and uncertainty, offering insight for dataset improvement. AutoGNNUQ has broad applicability in domains such as drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
△ Less
Submitted 28 June, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
A cross-study analysis of drug response prediction in cancer cell lines
Authors:
Fangfang Xia,
Jonathan Allen,
Prasanna Balaprakash,
Thomas Brettin,
Cristina Garcia-Cardona,
Austin Clyde,
Judith Cohn,
James Doroshow,
Xiaotian Duan,
Veronika Dubinkina,
Yvonne Evrard,
Ya Ju Fan,
Jason Gans,
Stewart He,
Pinyi Lu,
Sergei Maslov,
Alexander Partin,
Maulik Shukla,
Eric Stahlberg,
Justin M. Wozniak,
Hyunseung Yoo,
George Zaki,
Yitan Zhu,
Rick Stevens
Abstract:
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross validation within a single study to assess model accuracy. While an essential first step, cross validation within a biological data set typically provides an overly optimistic estimat…
▽ More
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross validation within a single study to assess model accuracy. While an essential first step, cross validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: NCI60, CTRP, GDSC, CCLE and gCSI. Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies, and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
△ Less
Submitted 13 August, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Graph Neural Network Architecture Search for Molecular Property Prediction
Authors:
Shengli Jiang,
Prasanna Balaprakash
Abstract:
Predicting the properties of a molecule from its structure is a challenging task. Recently, deep learning methods have improved the state of the art for this task because of their ability to learn useful features from the given data. By treating molecule structure as graphs, where atoms and bonds are modeled as nodes and edges, graph neural networks (GNNs) have been widely used to predict molecula…
▽ More
Predicting the properties of a molecule from its structure is a challenging task. Recently, deep learning methods have improved the state of the art for this task because of their ability to learn useful features from the given data. By treating molecule structure as graphs, where atoms and bonds are modeled as nodes and edges, graph neural networks (GNNs) have been widely used to predict molecular properties. However, the design and development of GNNs for a given data set rely on labor-intensive design and tuning of the network architectures. Neural architecture search (NAS) is a promising approach to discover high-performing neural network architectures automatically. To that end, we develop an NAS approach to automate the design and development of GNNs for molecular property prediction. Specifically, we focus on automated development of message-passing neural networks (MPNNs) to predict the molecular properties of small molecules in quantum mechanics and physical chemistry data sets from the MoleculeNet benchmark. We demonstrate the superiority of the automatically discovered MPNNs by comparing them with manually designed GNNs from the MoleculeNet benchmark. We study the relative importance of the choices in the MPNN search space, demonstrating that customizing the architecture is critical to enhancing performance in molecular property prediction and that the proposed approach can perform customization automatically with minimal manual effort.
△ Less
Submitted 27 August, 2020;
originally announced August 2020.