-
Computationally driven discovery of SARS-CoV-2 Mpro inhibitors: from design to experimental validation
Authors:
L. El Khoury,
Z. **g,
A. Cuzzolin,
A. Deplano,
D. Loco,
B. Sattarov,
F. Hédin,
S. Wendeborn,
C. Ho,
D. El Ahdab,
T. Jaffrelot Inizan,
M. Sturlese,
A. Sosic,
M. Volpiana,
A. Lugato,
M. Barone,
B. Gatto,
M. Ludovica Macchia,
M. Bellanda,
R. Battistutta,
C. Salata,
I. Kondratov,
R. Iminov,
A. Khairulin,
Y. Mykhalonok, et al. (10 additional authors not shown)
Abstract:
We report a fast-track computationally-driven discovery of new SARS-CoV-2 Main Protease (M$^{pro}$) inhibitors whose potencies range from mM for initial non-covalent ligands to sub-$μ$M for the final covalent compound (IC50=830 +/- 50 nM). The project extensively relied on high-resolution all-atom molecular dynamics simulations and absolute binding free energy calculations performed using the polarizable AMOEBA force field. The study is complemented by extensive adaptive sampling simulations that are used to rationalize the different ligand binding poses through the explicit reconstruction of the ligand-protein conformation spaces. Machine Learning predictions are also performed to predict selected compound properties. While simulations extensively use High Performance Computing to strongly reduce time-to-solution, they were systematically coupled to Nuclear Magnetic Resonance experiments to drive synthesis and to in vitro characterization of compounds. This study highlights the power of in silico strategies that rely on structure-based approaches for drug design and makes it possible to address the protein conformational multiplicity problem. The proposed fluorinated tetrahydroquinolines open routes for further optimization of M$^{pro}$ inhibitors towards low nM affinities.
Submitted 24 January, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning
Authors:
Sai Krishna Gottipati,
Boris Sattarov,
Sufeng Niu,
Yashaswi Pathak,
Haoran Wei,
Shengchao Liu,
Karam M. J. Thomas,
Simon Blackburn,
Connor W. Coley,
Jian Tang,
Sarath Chandar,
Yoshua Bengio
Abstract:
Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models. However, current generative approaches exhibit a significant challenge as they do not ensure that the proposed molecular structures can be feasibly synthesized nor do they provide the synthesis routes of the proposed small molecules, thereby seriously limiting their practical applicability. In this work, we propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis (PGFS), that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo drug design system. In this setup, the agent learns to navigate through the immense synthetically accessible chemical space by subjecting commercially available small molecule building blocks to valid chemical reactions at every time step of the iterative virtual multi-step synthesis process. The proposed environment for drug discovery provides a highly challenging test-bed for RL algorithms owing to the large state space and high-dimensional continuous action space with hierarchical actions. PGFS achieves state-of-the-art performance in generating structures with high QED and penalized clogP. Moreover, we validate PGFS in an in-silico proof-of-concept associated with three HIV targets. Finally, we describe how the end-to-end training conceptualized in this study represents an important paradigm in radically expanding the synthesizable chemical space and automating the drug discovery process.
Submitted 19 May, 2020; v1 submitted 26 April, 2020;
originally announced April 2020.
-
Improving Chemical Autoencoder Latent Space and Molecular De novo Generation Diversity with Heteroencoders
Authors:
Esben Jannik Bjerrum,
Boris Sattarov
Abstract:
Chemical autoencoders are attractive models as they combine chemical space navigation with possibilities for de-novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for employment early in a drug discovery project. Here it is shown that the choice of chemical representation, such as SMILES strings, has a large influence on the properties of the latent space. It is further explored to what extent translating between different chemical representations influences the latent space similarity to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence-to-sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory cells (LSTM) to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and molecular similarity measured as circular fingerprint similarity. Using the output from the bottleneck in QSAR modelling of five molecular datasets shows that heteroencoder-derived vectors markedly outperform autoencoder-derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chemical relevance of the latent space. However, the use of enumeration during training of the decoder leads to a marked increase in the rate of decoding to a different molecule than the one encoded, a tendency that can be counteracted with more complex network architectures.
Submitted 17 September, 2018; v1 submitted 25 June, 2018;
originally announced June 2018.