doi 10.1007/s10822-023-00512-6

Faster and more diverse de novo molecular optimization with double-loop reinforcement learning using augmented SMILES

Authors: Esben Jannik Bjerrum, Christian Margreitter, Thomas Blaschke, Raquel López-Ríos de Castro

Abstract: Using generative deep learning models and reinforcement learning together can effectively generate new molecules with desired properties. By employing a multi-objective scoring function, thousands of high-scoring molecules can be generated, making this approach useful for drug discovery and material science. However, the application of these methods can be hindered by computationally expensive or time-consuming scoring procedures, particularly when a large number of function calls are required as feedback in the reinforcement learning optimization. Here, we propose the use of double-loop reinforcement learning with simplified molecular line entry system (SMILES) augmentation to improve the efficiency and speed of the optimization. By adding an inner loop that augments the generated SMILES strings to non-canonical SMILES for use in additional reinforcement learning rounds, we can both reuse the scoring calculations on the molecular level, thereby speeding up the learning process, as well as offer additional protection against mode collapse. We find that employing between 5 and 10 augmentation repetitions is optimal for the scoring functions tested and is further associated with an increased diversity in the generated compounds, improved reproducibility of the sampling runs and the generation of molecules of higher similarity to known ligands.
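
The inner loop described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `double_loop_step` and its callback parameters (`sample_fn`, `score_fn`, `augment_fn`, `update_fn`) are hypothetical names introduced here, and whether the un-augmented batch also gets its own update round is an assumption. The toy augmentation simply reverses the string, which only yields an equivalent SMILES for unbranched, ring-free chains of single-character atoms; a real setup would more likely use a cheminformatics toolkit (for example RDKit's `Chem.MolToSmiles(mol, doRandom=True)`).

```python
from typing import Callable, Dict, List


def double_loop_step(
    sample_fn: Callable[[], List[str]],
    score_fn: Callable[[str], float],
    augment_fn: Callable[[str], str],
    update_fn: Callable[[List[str], List[float]], None],
    n_augment: int = 5,
) -> Dict[str, float]:
    """One outer-loop step: sample, score once, then run inner augmentation rounds."""
    smiles_batch = sample_fn()
    # Expensive scoring is called exactly once per sampled molecule.
    scores = [score_fn(s) for s in smiles_batch]
    # Assumed here: one ordinary update on the SMILES as generated.
    update_fn(smiles_batch, scores)
    # Inner loop: rewrite each SMILES as another string for the same
    # molecule and reuse the scores already computed above.
    for _ in range(n_augment):
        augmented = [augment_fn(s) for s in smiles_batch]
        update_fn(augmented, scores)
    return dict(zip(smiles_batch, scores))


# Toy stand-ins (all hypothetical) that only count how often they are called.
calls = {"score": 0, "update": 0}


def toy_sample() -> List[str]:
    return ["CCO", "CCN", "CCCO"]


def toy_score(smiles: str) -> float:
    calls["score"] += 1
    return smiles.count("C") / len(smiles)


def toy_augment(smiles: str) -> str:
    # Valid only for simple linear chains such as "CCO" -> "OCC".
    return smiles[::-1]


def toy_update(batch: List[str], scores: List[float]) -> None:
    calls["update"] += 1


result = double_loop_step(toy_sample, toy_score, toy_augment, toy_update, n_augment=5)
print(calls)
```

With three sampled molecules and five augmentation rounds, the scoring stand-in runs three times while the update stand-in runs six times, which is the reuse of molecule-level scores that the abstract refers to.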

Submitted 3 March, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: 25 pages and 18 Figures. Supplementary material included

MSC Class: 68T07 ACM Class: I.2.1; J.3

arXiv:1711.07839 [pdf, other]

Application of generative autoencoder in de novo molecular design

Authors: Thomas Blaschke, Marcus Olivecrona, Ola Engkvist, Jürgen Bajorath, Hongming Chen

Abstract: A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physiochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecule structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by the autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified.

Submitted 21 November, 2017; originally announced November 2017.

arXiv:1704.07555 [pdf, other]

Molecular De Novo Design through Deep Reinforcement Learning

Authors: Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen

Abstract: This work introduces a method to tune a sequence-based generative model for molecular de novo design that through augmented episodic likelihood can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that have not been included in either the generative model or the activity prediction model.

Submitted 29 August, 2017; v1 submitted 25 April, 2017; originally announced April 2017.

Showing 1–3 of 3 results for author: Blaschke, T