-
RGFN: Synthesizable Molecular Generation Using GFlowNets
Authors:
Michał Koziarski,
Andrei Rekesh,
Dmytro Shevchuk,
Almer van der Sloot,
Piotr Gaiński,
Yoshua Bengio,
Cheng-Hao Liu,
Mike Tyers,
Robert A. Batey
Abstract:
Generative models hold great promise for small molecule discovery, significantly increasing the size of the search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries, coupled with a low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
Submitted 1 June, 2024;
originally announced June 2024.
-
Generative Active Learning for the Search of Small-molecule Protein Binders
Authors:
Maksym Korablyov,
Cheng-Hao Liu,
Moksh Jain,
Almer M. van der Sloot,
Eric Jolicoeur,
Edward Ruediger,
Andrei Cristian Nica,
Emmanuel Bengio,
Kostiantyn Lapchevskyi,
Daniel St-Cyr,
Doris Alexandra Schuetz,
Victor Ion Butoi,
Jarrid Rector-Brooks,
Simon Blackburn,
Leo Feng,
Hadi Nekoei,
SaiKrishna Gottipati,
Priyesh Vijayan,
Prateek Gupta,
Ladislav Rampášek,
Sasikanth Avancha,
Pierre-Luc Bacon,
William L. Hamilton,
Brooks Paige,
Sanchit Misra, et al. (9 additional authors not shown)
Abstract:
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules that exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
Submitted 2 May, 2024;
originally announced May 2024.
-
Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection
Authors:
Bonaventure F. P. Dossou,
Dianbo Liu,
Xu Ji,
Moksh Jain,
Almer M. van der Sloot,
Roger Palou,
Michael Tyers,
Yoshua Bengio
Abstract:
As antibiotic-resistant bacterial strains are rapidly spreading worldwide, infections caused by these strains are emerging as a global crisis causing the death of millions of people every year. Antimicrobial Peptides (AMPs) are one of the candidates to tackle this problem because of their potential diversity and ability to favorably modulate the host immune response. However, large-scale screening of new AMP candidates is expensive, time-consuming, and not affordable in developing countries, which need the treatments the most. In this work, we propose a novel active machine learning-based framework that statistically minimizes the number of wet-lab experiments needed to design new AMPs, while ensuring a high diversity and novelty of generated AMP sequences, in multi-round wet-lab AMP screening settings. Combining recurrent neural network models and a graph-based filter (GraphCC), our proposed approach delivers novel and diverse candidates and demonstrates better performance according to our defined metrics.
Submitted 18 September, 2022;
originally announced September 2022.
-
RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro
Authors:
Paul Bertin,
Jarrid Rector-Brooks,
Deepak Sharma,
Thomas Gaudelet,
Andrew Anighoro,
Torsten Gross,
Francisco Martinez-Pena,
Eileen L. Tang,
Suraj M S,
Cristian Regep,
Jeremy Hayter,
Maksym Korablyov,
Nicholas Valiante,
Almer van der Sloot,
Mike Tyers,
Charles Roberts,
Michael M. Bronstein,
Luke L. Lairson,
Jake P. Taylor-King,
Yoshua Bengio
Abstract:
For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state-of-the-art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased towards synergistic agents and these results do not necessarily generalise out of distribution. We employ a sequential model optimization search utilising a deep learning model to quickly discover synergistic drug combinations active against a cancer cell line, requiring substantially less screening than an exhaustive evaluation. Our small-scale wet-lab experiments account for evaluation of only ~5% of the total search space. After only 3 rounds of ML-guided in vitro experimentation (including a calibration round), we find that the set of drug pairs queried is enriched for highly synergistic combinations; two additional rounds of ML-guided experiments were performed to ensure reproducibility of trends. Remarkably, we rediscover drug combinations later confirmed to be under study within clinical trials. Moreover, we find that drug embeddings generated using only structural information begin to reflect mechanisms of action. Prior in silico benchmarking suggests we can enrich search queries by a factor of ~5-10x for highly synergistic drug combinations by using sequential rounds of evaluation when compared to random selection, or by a factor of >3x when using a pretrained model selecting all drug combinations at a single time point.
Submitted 2 March, 2023; v1 submitted 6 February, 2022;
originally announced February 2022.