-
nach0: Multimodal Natural and Chemical Languages Foundation Model
Authors:
Micha Livne,
Zulfat Miftahutdinov,
Elena Tutubalina,
Maksim Kuznetsov,
Daniil Polykovskiy,
Annika Brundyn,
Aastha Jhunjhunwala,
Anthony Costa,
Alex Aliper,
Alán Aspuru-Guzik,
Alex Zhavoronkov
Abstract:
Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and large model versions. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on single-domain and cross-domain tasks. Furthermore, it can generate high-quality outputs in molecular and textual formats, showcasing its effectiveness in multi-domain setups.
Submitted 2 May, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor
Authors:
Feng Ren,
Xiao Ding,
Min Zheng,
Mikhail Korzinkin,
Xin Cai,
Wei Zhu,
Alexey Mantsyzov,
Alex Aliper,
Vladimir Aladinskiy,
Zhongying Cao,
Shanshan Kong,
Xi Long,
Bonnie Hei Man Liu,
Yingtao Liu,
Vladimir Naumov,
Anastasia Shneyderman,
Ivan V. Ozerov,
Ju Wang,
Frank W. Pun,
Alan Aspuru-Guzik,
Michael Levitt,
Alex Zhavoronkov
Abstract:
The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered a remarkable breakthrough in both artificial intelligence (AI) applications and structural biology. Despite their varying confidence levels, these predicted structures could still contribute significantly to structure-based drug design of novel targets, especially those with no or limited structural information. In this work, we successfully applied AlphaFold in our end-to-end AI-powered drug discovery engines, consisting of the biocomputational platform PandaOmics and the generative chemistry platform Chemistry42, to identify a first-in-class hit molecule for a novel target without an experimental structure, from target selection through hit identification, in a cost- and time-efficient manner. PandaOmics provided the targets of interest, Chemistry42 generated the molecules based on the AlphaFold-predicted structure, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small molecule hit compound for CDK20 with a Kd value of 8.9 ± 1.6 μM (n = 4) within 30 days of target selection and after synthesizing only 7 compounds. Based on the available data, a second round of AI-powered compound generation was conducted, through which a more potent hit molecule, ISM042-2-048, was discovered with a Kd value of 210.0 ± 42.4 nM (n = 2), within 30 days of the discovery of the first hit, ISM042-2-001, and after synthesizing 6 compounds. To the best of our knowledge, this is the first reported small molecule targeting CDK20 and, more importantly, the first demonstration of AlphaFold applied to the hit identification process in early drug discovery.
Submitted 12 February, 2022; v1 submitted 21 January, 2022;
originally announced January 2022.
-
Chemistry42: An AI-based platform for de novo molecular design
Authors:
Yan A. Ivanenkov,
Alex Zhebrak,
Dmitry Bezrukov,
Bogdan Zagribelnyy,
Vladimir Aladinskiy,
Daniil Polykovskiy,
Evgeny Putin,
Petrina Kamya,
Alexander Aliper,
Alex Zhavoronkov
Abstract:
Chemistry42 is a software platform for de novo small molecule design that integrates Artificial Intelligence (AI) techniques with computational and medicinal chemistry methods. Chemistry42 is unique in its ability to generate novel molecular structures with predefined properties validated through in vitro and in vivo studies. Chemistry42 is a core component of the Insilico Medicine Pharma.ai drug discovery suite, which also includes target discovery and multi-omics data analysis (PandaOmics) and clinical trial outcome prediction (InClinico).
Submitted 22 January, 2021;
originally announced January 2021.