-
NEBULA: Neural Empirical Bayes Under Latent Representations for Efficient and Controllable Design of Molecular Libraries
Authors:
Ewa M. Nowara,
Pedro O. Pinheiro,
Sai Pooja Mahajan,
Omar Mahmood,
Andrew Martin Watkins,
Saeed Saremi,
Michael Maser
Abstract:
We present NEBULA, the first latent 3D generative model for scalable generation of large molecular libraries around a seed compound of interest. Such libraries are crucial for scientific discovery, but it remains challenging to generate large numbers of high quality samples efficiently. 3D-voxel-based methods have recently shown great promise for generating high quality samples de novo from random noise (Pinheiro et al., 2023). However, sampling in 3D-voxel space is computationally expensive and use in library generation is prohibitively slow. Here, we instead perform neural empirical Bayes sampling (Saremi & Hyvarinen, 2019) in the learned latent space of a vector-quantized variational autoencoder. NEBULA generates large molecular libraries nearly an order of magnitude faster than existing methods without sacrificing sample quality. Moreover, NEBULA generalizes better to unseen drug-like molecules, as demonstrated on two public datasets and multiple recently released drugs. We expect the approach herein to be highly enabling for machine learning-based drug discovery. The code is available at https://github.com/prescient-design/nebula
Submitted 3 July, 2024;
originally announced July 2024.
-
MoleCLUEs: Molecular Conformers Maximally In-Distribution for Predictive Models
Authors:
Michael Maser,
Natasa Tagasovska,
Jae Hyeon Lee,
Andrew Watkins
Abstract:
Structure-based molecular ML (SBML) models can be highly sensitive to input geometries and give predictions with large variance. We present an approach to mitigate the challenge of selecting conformations for such models by generating conformers that explicitly minimize predictive uncertainty. To achieve this, we compute estimates of aleatoric and epistemic uncertainties that are differentiable w.r.t. latent posteriors. We then iteratively sample new latents in the direction of lower uncertainty by gradient descent. As we train our predictive models jointly with a conformer decoder, the new latent embeddings can be mapped to their corresponding inputs, which we call MoleCLUEs, or (molecular) counterfactual latent uncertainty explanations (Antorán et al., 2020). We assess our algorithm for the task of predicting drug properties from 3D structure with maximum confidence. We additionally analyze the structure trajectories obtained from conformer optimizations, which provide insight into the sources of uncertainty in SBML.
Submitted 6 November, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
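The latent update described in the MoleCLUEs abstract (stepping latents toward lower uncertainty by gradient descent) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the ensemble of linear predictors, the use of ensemble variance as a stand-in for epistemic uncertainty only, and the names `WEIGHTS`, `uncertainty`, `uncertainty_grad`, and `descend`.

```python
# Toy sketch: gradient descent on a latent vector z to reduce predictive uncertainty.
# Assumption: uncertainty is the variance across a small ensemble of linear predictors.

def predict_all(weights, z):
    """Prediction of every (hypothetical) ensemble member for latent z."""
    return [sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in weights]

def uncertainty(weights, z):
    """Variance of the ensemble predictions, used as an epistemic-uncertainty proxy."""
    preds = predict_all(weights, z)
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)

def uncertainty_grad(weights, z):
    """Analytic gradient of the ensemble variance with respect to z."""
    preds = predict_all(weights, z)
    k = len(weights)
    mean_pred = sum(preds) / k
    mean_w = [sum(w[i] for w in weights) / k for i in range(len(z))]
    return [
        (2.0 / k) * sum((p - mean_pred) * (w[i] - mean_w[i])
                        for p, w in zip(preds, weights))
        for i in range(len(z))
    ]

def descend(weights, z, lr=0.1, n_steps=50):
    """Plain gradient descent; returns the full latent trajectory."""
    trajectory = [list(z)]
    for _ in range(n_steps):
        grad = uncertainty_grad(weights, z)
        z = [z_i - lr * g_i for z_i, g_i in zip(z, grad)]
        trajectory.append(list(z))
    return trajectory

# Hypothetical ensemble weights and starting latent.
WEIGHTS = [(1.0, 0.5), (0.8, -0.2), (1.2, 0.1)]
trajectory = descend(WEIGHTS, [1.0, 1.0])
```

In the setting the abstract describes, each latent along such a trajectory would be passed through the jointly trained conformer decoder to obtain a structure; the sketch stops at the latent trajectory.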
-
3D molecule generation by denoising voxel grids
Authors:
Pedro O. Pinheiro,
Joshua Rackers,
Joseph Kleinhenz,
Michael Maser,
Omar Mahmood,
Andrew Martin Watkins,
Stephen Ra,
Vishnu Sresht,
Saeed Saremi
Abstract:
We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids. First, we train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules. Then, we follow the neural empirical Bayes framework (Saremi and Hyvarinen, 2019) and generate molecules in two steps: (i) sample noisy density grids from a smooth distribution via underdamped Langevin Markov chain Monte Carlo, and (ii) recover the "clean" molecule by denoising the noisy grid with a single step. Our method, VoxMol, generates molecules in a fundamentally different way than the current state of the art (i.e., diffusion models applied to atom point clouds). It differs in terms of the data representation, the noise model, the network architecture and the generative modeling algorithm. Our experiments show that VoxMol captures the distribution of drug-like molecules better than state of the art, while being faster to generate samples.
Submitted 8 March, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
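The two-step sampling scheme in the VoxMol abstract (sample from the smoothed distribution, then denoise in a single step) can be sketched on a one-dimensional toy problem. This is a minimal sketch under stated assumptions, not VoxMol itself: the data are taken to be Gaussian so the smoothed score is known in closed form and replaces the learned denoiser, overdamped Langevin dynamics are swapped in for the underdamped sampler the abstract names, and `MU`, `S`, `SIGMA`, `walk`, `jump`, and `sample` are hypothetical names.

```python
import math
import random

# Toy setup (all values assumed): data x ~ N(MU, S^2), noisy y = x + N(0, SIGMA^2).
MU, S, SIGMA = 2.0, 0.5, 1.0

def smoothed_score(y):
    """Score of the smoothed density p(y) = N(MU, S^2 + SIGMA^2).
    Stands in for the score implied by a learned denoiser."""
    return -(y - MU) / (S ** 2 + SIGMA ** 2)

def walk(y, n_steps, step, rng):
    """Step (i): overdamped Langevin MCMC on the smoothed density
    (a simplification of the underdamped sampler in the abstract)."""
    for _ in range(n_steps):
        y = y + step * smoothed_score(y) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return y

def jump(y):
    """Step (ii): single-step empirical Bayes denoising,
    x_hat = y + SIGMA^2 * score(y)."""
    return y + SIGMA ** 2 * smoothed_score(y)

def sample(n, seed=0):
    """Draw n denoised samples, each from an independent Langevin chain."""
    rng = random.Random(seed)
    return [jump(walk(rng.gauss(0.0, 1.0), 200, 0.05, rng)) for _ in range(n)]

samples = sample(2000)
sample_mean = sum(samples) / len(samples)
```

The denoising step returns the posterior mean of the clean value given the noisy one, so in this toy case the denoised samples concentrate around `MU`.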
-
BOtied: Multi-objective Bayesian optimization with tied multivariate ranks
Authors:
Ji Won Park,
Nataša Tagasovska,
Michael Maser,
Stephen Ra,
Kyunghyun Cho
Abstract:
Many scientific and industrial applications require the joint optimization of multiple, potentially competing objectives. Multi-objective Bayesian optimization (MOBO) is a sample-efficient framework for identifying Pareto-optimal solutions. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function (CDF). Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied. BOtied inherits desirable invariance properties of the CDF, and an efficient implementation with copulas allows it to scale to many objectives. Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions while being computationally efficient for many objectives.
Submitted 7 June, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
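The link the BOtied abstract draws between non-dominated solutions and extreme values of the joint CDF can be seen on a handful of points. The sketch below is an illustration only and not the BOtied acquisition function: it assumes all objectives are minimized, uses a plain empirical CDF instead of copulas, and the names `dominates`, `pareto_front`, `empirical_cdf`, and the example `points` are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one
    (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def empirical_cdf(points, y):
    """Fraction of points that are <= y in every coordinate."""
    return sum(all(c <= t for c, t in zip(p, y)) for p in points) / len(points)

# Hypothetical two-objective evaluations.
points = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
front = pareto_front(points)
scores = {p: empirical_cdf(points, p) for p in points}
lowest = min(points, key=lambda p: scores[p])
```

Under the minimization assumption, a dominated point always has a strictly larger empirical CDF value than the point dominating it, so the point with the smallest CDF value lies on the Pareto front.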
-
SupSiam: Non-contrastive Auxiliary Loss for Learning from Molecular Conformers
Authors:
Michael Maser,
Ji Won Park,
Joshua Yao-Yu Lin,
Jae Hyeon Lee,
Nathan C. Frey,
Andrew Watkins
Abstract:
We investigate Siamese networks for learning related embeddings for augmented samples of molecular conformers. We find that a non-contrastive (positive-pair only) auxiliary task aids in supervised training of Euclidean neural networks (E3NNs) and increases manifold smoothness (MS) around point-cloud geometries. We demonstrate this property for multiple drug-activity prediction tasks while maintaining relevant performance metrics, and propose an extension of MS to probabilistic and regression settings. We provide an analysis of representation collapse, finding substantial effects of task-weighting, latent dimension, and regularization. We expect the presented protocol to aid in the development of reliable E3NNs from molecular conformers, even for small-data drug discovery programs.
Submitted 15 February, 2023;
originally announced February 2023.
-
Graph Neural Networks for the Prediction of Substrate-Specific Organic Reaction Conditions
Authors:
Serim Ryou,
Michael R. Maser,
Alexander Y. Cui,
Travis J. DeLano,
Yisong Yue,
Sarah E. Reisman
Abstract:
We present a systematic investigation using graph neural networks (GNNs) to model organic chemical reactions. To do so, we prepared a dataset collection of four ubiquitous reactions from the organic chemistry literature. We evaluate seven different GNN architectures for classification tasks pertaining to the identification of experimental reagents and conditions. We find that models are able to identify specific graph features that affect reaction conditions and lead to accurate predictions. The results herein show great promise in advancing molecular machine learning.
Submitted 9 July, 2020; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Nanoscale imaging of the electronic and structural transitions in vanadium dioxide
Authors:
M. M. Qazilbash,
A. Tripathi,
A. A. Schafgans,
Bong-Jun Kim,
Hyun-Tak Kim,
Zhonghou Cai,
M. V. Holt,
J. M. Maser,
F. Keilmann,
O. G. Shpyrko,
D. N. Basov
Abstract:
We investigate the electronic and structural changes at the nanoscale in vanadium dioxide (VO2) in the vicinity of its thermally driven phase transition. Both electronic and structural changes exhibit phase coexistence leading to percolation. In addition, we observe a dichotomy between the local electronic and structural transitions. Nanoscale x-ray diffraction reveals local, non-monotonic switching of the lattice structure, a phenomenon that is not seen in the electronic insulator-to-metal transition mapped by near-field infrared microscopy.
Submitted 12 September, 2011;
originally announced September 2011.