HydraScreen: A Generalizable Structure-Based Deep Learning Approach to Drug Discovery
Authors:
Alvaro Prat,
Hisham Abdel Aty,
Gintautas Kamuntavičius,
Tanya Paquet,
Povilas Norvaišas,
Piero Gasparotto,
Roy Tal
Abstract:
We propose HydraScreen, a deep-learning approach that aims to provide a framework for more robust machine-learning-accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network, designed for the effective representation of molecular structures and interactions in protein-ligand binding. We design an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assess our approach using established public benchmarks based on the CASF 2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's r = 0.86, RMSE = 1.15, Top-1 = 0.95). Furthermore, we utilize a novel interaction profiling approach to identify potential biases in the model and dataset to boost interpretability and support the unbiased nature of our method. Finally, we showcase HydraScreen's capacity to generalize across unseen proteins and ligands, offering directions for future development of robust machine learning scoring functions. HydraScreen (accessible at https://hydrascreen.ro5.ai) provides a user-friendly GUI and a public API, facilitating easy assessment of individual protein-ligand complexes.
Submitted 22 September, 2023;
originally announced November 2023.
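The HydraScreen abstract reports three metrics: Pearson's r and RMSE for affinity prediction, and Top-1 for pose prediction. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names, the input layout, and the 2.0 Å RMSD cutoff used for Top-1 are assumptions for illustration; the abstract does not define them.

```python
import math

def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured affinities."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)

def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured affinities."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)

def top1_success(pose_sets, rmsd_cutoff=2.0):
    """Fraction of complexes whose best-scored pose is near the native pose.

    Assumed layout: each element of pose_sets is a list of
    (score, rmsd_to_native) tuples for one complex, higher score = better.
    The 2.0 angstrom cutoff is an assumed convention, not taken from the abstract.
    """
    hits = 0
    for poses in pose_sets:
        best_score, best_rmsd = max(poses, key=lambda pose: pose[0])
        if best_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(pose_sets)
```

Called on lists of predicted and experimental affinities, `pearson_r` and `rmse` return values comparable in kind to the r and RMSE figures quoted above; `top1_success` takes one list of scored poses per complex.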
Competing Models
Authors:
Jose Luis Montiel Olea,
Pietro Ortoleva,
Mallesh M Pai,
Andrea Prat
Abstract:
Different agents need to make a prediction. They observe identical data, but have different models: they predict using different explanatory variables. We study which agent believes they have the best predictive ability -- as measured by the smallest subjective posterior mean squared prediction error -- and show how it depends on the sample size. With small samples, we present results suggesting it is an agent using a low-dimensional model. With large samples, it is generally an agent with a high-dimensional model, possibly including irrelevant variables, but never excluding relevant ones. We apply our results to characterize the winning model in an auction of productive assets, to argue that entrepreneurs and investors with simple models will be over-represented in new sectors, and to understand the proliferation of "factors" that explain the cross-sectional variation of expected stock returns in the asset-pricing literature.
Submitted 11 November, 2021; v1 submitted 8 July, 2019;
originally announced July 2019.