Autonomous Drug Design with Multi-Armed Bandits
Authors:
Hampus Gummesson Svensson,
Esben Jannik Bjerrum,
Christian Tyrchan,
Ola Engkvist,
Morteza Haghir Chehreghani
Abstract:
Recent developments in artificial intelligence and automation support a new drug design paradigm: autonomous drug design. Under this paradigm, generative models can suggest thousands of molecules with specific properties, and automated laboratories can potentially make, test and analyze molecules with minimal human supervision. However, because only a limited number of molecules can be synthesized and tested, an obvious challenge is how to select efficiently among the provided suggestions in a closed-loop system. We formulate this task as a stochastic multi-armed bandit problem with multiple plays, volatile arms and similarity information. To solve this task, we adapt previous work on multi-armed bandits to this setting, and compare our solution with random sampling, greedy selection and decaying-epsilon-greedy selection strategies. According to our simulation results, our approach has the potential to perform better exploration and exploitation of the chemical space for autonomous drug design.
Submitted 20 January, 2023; v1 submitted 4 July, 2022;
originally announced July 2022.
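The decaying-epsilon-greedy baseline named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' method: it handles multiple plays per round (a batch of distinct arms) but ignores volatile arms and similarity information. The function name, the Gaussian reward noise, and the parameters `eps0` and `decay` are assumptions made for this example.

```python
import random


def decaying_epsilon_greedy(true_means, batch_size, rounds,
                            eps0=1.0, decay=0.9, seed=0):
    """Simulate decaying-epsilon-greedy selection with multiple plays.

    true_means: hypothetical expected reward of each arm (molecule).
    batch_size: number of distinct arms played per round.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was played
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for t in range(rounds):
        eps = eps0 * decay ** t  # exploration probability shrinks each round
        available = list(range(n_arms))
        chosen = []
        for _ in range(batch_size):
            if rng.random() < eps:
                arm = rng.choice(available)  # explore
            else:
                arm = max(available, key=lambda a: estimates[a])  # exploit
            available.remove(arm)  # an arm is played at most once per round
            chosen.append(arm)

        for arm in chosen:
            # Assumed noise model: Gaussian around the arm's true mean.
            reward = rng.gauss(true_means[arm], 0.1)
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = decaying_epsilon_greedy(
        [0.1, 0.5, 0.9, 0.3], batch_size=2, rounds=50)
    print("play counts per arm:", cnt)
```

Setting `eps0=0` gives the greedy baseline, and `decay=1.0` with `eps0=1.0` gives random sampling, which are the other two comparison strategies mentioned in the abstract.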
DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using Recurrent Neural Networks (RNNs)
Authors:
Esben Jannik Bjerrum
Abstract:
The isoelectric point (IEP or pI) is the pH where the net charge on the molecular ensemble of peptides and proteins is zero. This physicochemical property depends on protonatable/deprotonatable sidechains and their pKa values. Here a pI prediction model is trained from a database of peptide sequences and pIs using a recurrent neural network (RNN) with long short-term memory (LSTM) cells. The trained model obtains an RMSE of 0.28 and an R$^2$ of 0.95 on the external test set. The model is not based on pKa values, but predictions for constructed test sequences show rankings similar to those of already known pKa values. The prediction depends mostly on the presence of known acidic and basic amino acids, with fine adjustments based on the neighboring sequence and the position of the charged amino acids in the peptide chain.
Submitted 27 December, 2017;
originally announced December 2017.
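To make the definition of pI in the abstract above concrete, the sketch below computes the classical pKa-based estimate: the pH at which the Henderson-Hasselbalch net charge crosses zero. This is not the RNN model described in the paper, which explicitly does not use pKa values. The pKa numbers are approximate textbook-style values chosen for illustration only (published scales differ), and the function names are hypothetical.

```python
# Approximate, illustrative pKa values (assumption: scales vary by source).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}          # basic sidechains
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}  # acidic sidechains
PKA_N_TERM = 9.0
PKA_C_TERM = 2.0


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # C-terminal carboxyl
    for aa in sequence:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH where net charge is zero by bisection.

    Net charge decreases monotonically with pH, so bisection converges.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid   # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for peptide in ("ACDK", "KKKK", "DDDD"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

A purely pKa-based estimate like this treats every residue independently, whereas the abstract reports that the learned model also adjusts for neighboring sequence and position.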
Molecular Generation with Recurrent Neural Networks (RNNs)
Authors:
Esben Jannik Bjerrum,
Richard Threlfall
Abstract:
The potential number of drug-like small molecules is estimated to be between 10^23 and 10^60, while current databases of known compounds are orders of magnitude smaller, with approximately 10^8 compounds. This discrepancy has led to an interest in generating virtual libraries using hand-crafted chemical rules and fragment-based methods to cover a larger area of chemical space and generate chemical libraries for use in in silico drug discovery endeavors. This work explores to what extent a recurrent neural network with long short-term memory cells can figure out sensible chemical rules and generate synthesizable molecules by being trained on existing compounds encoded as SMILES. The networks can to a high extent generate novel but chemically sensible molecules. The properties of the molecules are tuned by training on two different datasets consisting of fragment-like molecules and drug-like molecules. The generated molecules have distributions of molar weight, predicted logP, number of hydrogen bond acceptors and donors, number of rotatable bonds and topological polar surface area that are very similar to those of their respective training sets. In most cases the compounds are synthesizable, as assessed with SA score and Wiley ChemPlanner.
Submitted 17 May, 2017; v1 submitted 12 May, 2017;
originally announced May 2017.
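Training a recurrent network on "compounds encoded as SMILES", as the abstract above describes, requires turning each SMILES string into a numeric sequence. The sketch below shows one common way to do this, character-level one-hot encoding. It is an assumption about the preprocessing rather than the paper's confirmed pipeline, the helper names are hypothetical, and splitting by single characters is a simplification: two-letter atoms such as `Cl` or `Br` become two separate tokens.

```python
def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to an integer index."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: idx for idx, ch in enumerate(chars)}


def one_hot(smiles, vocab):
    """Encode a SMILES string as a list of one-hot rows, one per character."""
    size = len(vocab)
    return [[1 if vocab[ch] == j else 0 for j in range(size)]
            for ch in smiles]


if __name__ == "__main__":
    # Ethanol and benzene as tiny example inputs.
    training_smiles = ["CCO", "c1ccccc1"]
    vocab = build_vocab(training_smiles)
    print(vocab)
    for row in one_hot("CCO", vocab):
        print(row)
```

Each row has exactly one non-zero entry, so a sequence model can be trained to predict the next character's row from the preceding ones and then sampled to generate new strings.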