Search | arXiv e-print repository

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Authors: Artem Zholus, Maksim Kuznetsov, Roman Schutski, Rim Shayakhmetov, Daniil Polykovskiy, Sarath Chandar, Alex Zhavoronkov

Abstract: Generating novel active molecules for a given protein is an extremely challenging task for generative models that requires an understanding of the complex physical interactions between the molecule and its environment. In this paper, we present a novel generative model, BindGPT which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our mode… ▽ More Generating novel active molecules for a given protein is an extremely challenging task for generative models that requires an understanding of the complex physical interactions between the molecule and its environment. In this paper, we present a novel generative model, BindGPT which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our model produces molecular graphs and conformations jointly, eliminating the need for an extra graph reconstruction step. We pretrain BindGPT on a large-scale dataset and fine-tune it with reinforcement learning using scores from external simulation software. We demonstrate how a single pretrained language model can serve at the same time as a 3D molecular generative model, conformer generator conditioned on the molecular graph, and a pocket-conditioned 3D molecule generator. Notably, the model does not make any representational equivariance assumptions about the domain of generation. We show how such simple conceptual approach combined with pretraining and scaling can perform on par or better than the current best specialized diffusion models, language models, and graph neural networks while being two orders of magnitude cheaper to sample. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2402.08210 [pdf, other]

Quantum Computing-Enhanced Algorithm Unveils Novel Inhibitors for KRAS

Authors: Mohammad Ghazi Vakili, Christoph Gorgulla, AkshatKumar Nigam, Dmitry Bezrukov, Daniel Varoli, Alex Aliper, Daniil Polykovsky, Krishna M. Padmanabha Das, Jamie Snider, Anna Lyakisheva, Ardalan Hosseini Mansob, Zhong Yao, Lela Bitar, Eugene Radchenko, Xiao Ding, **xin Liu, Fanye Meng, Feng Ren, Yudong Cao, Igor Stagljar, Alán Aspuru-Guzik, Alex Zhavoronkov

Abstract: The discovery of small molecules with therapeutic potential is a long-standing challenge in chemistry and biology. Researchers have increasingly leveraged novel computational techniques to streamline the drug development process to increase hit rates and reduce the costs associated with bringing a drug to market. To this end, we introduce a quantum-classical generative model that seamlessly integr… ▽ More The discovery of small molecules with therapeutic potential is a long-standing challenge in chemistry and biology. Researchers have increasingly leveraged novel computational techniques to streamline the drug development process to increase hit rates and reduce the costs associated with bringing a drug to market. To this end, we introduce a quantum-classical generative model that seamlessly integrates the computational power of quantum algorithms trained on a 16-qubit IBM quantum computer with the established reliability of classical methods for designing small molecules. Our hybrid generative model was applied to designing new KRAS inhibitors, a crucial target in cancer therapy. We synthesized 15 promising molecules during our investigation and subjected them to experimental testing to assess their ability to engage with the target. Notably, among these candidates, two molecules, ISM061-018-2 and ISM061-22, each featuring unique scaffolds, stood out by demonstrating effective engagement with KRAS. ISM061-018-2 was identified as a broad-spectrum KRAS inhibitor, exhibiting a binding affinity to KRAS-G12D at $1.4 μM$. Concurrently, ISM061-22 exhibited specific mutant selectivity, displaying heightened activity against KRAS G12R and Q61H mutants. To our knowledge, this work shows for the first time the use of a quantum-generative model to yield experimentally confirmed biological hits, showcasing the practical potential of quantum-assisted drug discovery to produce viable therapeutics. Moreover, our findings reveal that the efficacy of distribution learning correlates with the number of qubits utilized, underlining the scalability potential of quantum computing resources. Overall, we anticipate our results to be a step** stone towards develo** more advanced quantum generative models in drug discovery. △ Less

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2311.12410 [pdf, other]

doi 10.1039/D4SC00966E

nach0: Multimodal Natural and Chemical Languages Foundation Model

Authors: Micha Livne, Zulfat Miftahutdinov, Elena Tutubalina, Maksim Kuznetsov, Daniil Polykovskiy, Annika Brundyn, Aastha Jhunjhunwala, Anthony Costa, Alex Aliper, Alán Aspuru-Guzik, Alex Zhavoronkov

Abstract: Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthe… ▽ More Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and large model versions. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on single-domain and cross-domain tasks. Furthermore, it can generate high-quality outputs in molecular and textual formats, showcasing its effectiveness in multi-domain setups. △ Less

Submitted 2 May, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: Accepted to Chemical Science Journal. Models are publicly available via https://huggingface.co/insilicomedicine/nach0_base and https://huggingface.co/insilicomedicine/nach0_large

Journal ref: Chemical Science, 15(22), 8380-8389, 2024

arXiv:2210.16823 [pdf, other]

doi 10.1021/acs.jcim.3c00562

Exploring the Advantages of Quantum Generative Adversarial Networks in Generative Chemistry

Authors: Po-Yu Kao, Ya-Chu Yang, Wei-Yin Chiang, Jen-Yueh Hsiao, Yudong Cao, Alex Aliper, Feng Ren, Alan Aspuru-Guzik, Alex Zhavoronkov, Min-Hsiu Hsieh, Yen-Chu Lin

Abstract: De novo drug design with desired biological activities is crucial for develo** novel therapeutics for patients. The drug development process is time and resource-consuming, and it has a low probability of success. Recent advances in machine learning and deep learning technology have reduced the time and cost of the discovery process and therefore, improved pharmaceutical research and development… ▽ More De novo drug design with desired biological activities is crucial for develo** novel therapeutics for patients. The drug development process is time and resource-consuming, and it has a low probability of success. Recent advances in machine learning and deep learning technology have reduced the time and cost of the discovery process and therefore, improved pharmaceutical research and development. In this paper, we explore the combination of two rapidly-develo** fields with lead candidate discovery in the drug development process. First, Artificial intelligence has already been demonstrated to successfully accelerate conventional drug design approaches. Second, quantum computing has demonstrated promising potential in different applications, such as quantum chemistry, combinatorial optimizations, and machine learning. This manuscript explores hybrid quantum-classical generative adversarial networks (GAN) for small molecule discovery. We substituted each element of GAN with a variational quantum circuit (VQC) and demonstrated the quantum advantages in the small drug discovery. Utilizing a VQC in the noise generator of a GAN to generate small molecules achieves better physicochemical properties and performance in the goal-directed benchmark than the classical counterpart. Moreover, we demonstrate the potential of a VQC with only tens of learnable parameters in the generator of GAN to generate small molecules. We also demonstrate the quantum advantage of a VQC in the discriminator of GAN. In this hybrid model, the number of learnable parameters is significantly less than the classical ones, and it can still generate valid molecules. The hybrid model with only tens of training parameters in the quantum discriminator outperforms the MLP-based one in terms of both generated molecule properties and the achieved KL divergence. △ Less

Submitted 6 February, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

Comments: 28 pages, 12 tables, 16 figures;

arXiv:2201.09647 [pdf, other]

AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor

Authors: Feng Ren, Xiao Ding, Min Zheng, Mikhail Korzinkin, Xin Cai, Wei Zhu, Alexey Mantsyzov, Alex Aliper, Vladimir Aladinskiy, Zhongying Cao, Shanshan Kong, Xi Long, Bonnie Hei Man Liu, Yingtao Liu, Vladimir Naumov, Anastasia Shneyderman, Ivan V. Ozerov, Ju Wang, Frank W. Pun, Alan Aspuru-Guzik, Michael Levitt, Alex Zhavoronkov

Abstract: The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered as a remarkable breakthrough both in artificial intelligence (AI) application and structural biology. Despite the varying confidence level, these predicted structures still could significantly contribute to structure-based drug design of novel targets, especially the ones with no or li… ▽ More The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered as a remarkable breakthrough both in artificial intelligence (AI) application and structural biology. Despite the varying confidence level, these predicted structures still could significantly contribute to structure-based drug design of novel targets, especially the ones with no or limited structural information. In this work, we successfully applied AlphaFold in our end-to-end AI-powered drug discovery engines constituted of a biocomputational platform PandaOmics and a generative chemistry platform Chemistry42, to identify a first-in-class hit molecule of a novel target without an experimental structure starting from target selection towards hit identification in a cost- and time-efficient manner. PandaOmics provided the targets of interest and Chemistry42 generated the molecules based on the AlphaFold predicted structure, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small molecule hit compound for CDK20 with a Kd value of 8.9 +/- 1.6 uM (n = 4) within 30 days from target selection and after only synthesizing 7 compounds. Based on the available data, the second round of AI-powered compound generation was conducted and through which, a more potent hit molecule, ISM042-2 048, was discovered with a Kd value of 210.0 +/- 42.4 nM (n = 2), within 30 days and after synthesizing 6 compounds from the discovery of the first hit ISM042-2-001. To the best of our knowledge, this is the first reported small molecule targeting CDK20 and more importantly, this work is the first demonstration of AlphaFold application in the hit identification process in early drug discovery. △ Less

Submitted 12 February, 2022; v1 submitted 21 January, 2022; originally announced January 2022.

Comments: 9 pages, 6 figures

arXiv:2101.09050 [pdf]

Chemistry42: An AI-based platform for de novo molecular design

Authors: Yan A. Ivanenkov, Alex Zhebrak, Dmitry Bezrukov, Bogdan Zagribelnyy, Vladimir Aladinskiy, Daniil Polykovskiy, Evgeny Putin, Petrina Kamya, Alexander Aliper, Alex Zhavoronkov

Abstract: Chemistry42 is a software platform for de novo small molecule design that integrates Artificial Intelligence (AI) techniques with computational and medicinal chemistry methods. Chemistry42 is unique in its ability to generate novel molecular structures with predefined properties validated through in vitro and in vivo studies. Chemistry42 is a core component of Insilico Medicine Pharma.ai drug disc… ▽ More Chemistry42 is a software platform for de novo small molecule design that integrates Artificial Intelligence (AI) techniques with computational and medicinal chemistry methods. Chemistry42 is unique in its ability to generate novel molecular structures with predefined properties validated through in vitro and in vivo studies. Chemistry42 is a core component of Insilico Medicine Pharma.ai drug discovery suite that also includes target discovery and multi-omics data analysis (PandaOmics) and clinical trial outcomes predictions (InClinico). △ Less

Submitted 22 January, 2021; originally announced January 2021.

Comments: 12 pages, 3 figures

arXiv:1811.12823 [pdf, other]

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Authors: Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Simon Johansson, Hongming Chen, Sergey Nikolenko, Alan Aspuru-Guzik, Alex Zhavoronkov

Abstract: Generative models are becoming a tool of choice for exploring the molecular space. These models learn on a large training dataset and produce novel molecular structures with similar properties. Generated structures can be utilized for virtual screening or training semi-supervised predictive models in the downstream tasks. While there are plenty of generative models, it is unclear how to compare an… ▽ More Generative models are becoming a tool of choice for exploring the molecular space. These models learn on a large training dataset and produce novel molecular structures with similar properties. Generated structures can be utilized for virtual screening or training semi-supervised predictive models in the downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to standardize training and comparison of molecular generative models. MOSES provides a training and testing datasets, and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest to use our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses. △ Less

Submitted 28 October, 2020; v1 submitted 29 November, 2018; originally announced November 2018.

arXiv:1707.02353 [pdf]

Evaluating race and sex diversity in the world's largest companies using deep neural networks

Authors: Konstantin Chekanov, Polina Mamoshina, Roman V. Yampolskiy, Radu Timofte, Morten Scheibye-Knudsen, Alex Zhavoronkov

Abstract: Diversity is one of the fundamental properties for the survival of species, populations, and organizations. Recent advances in deep learning allow for the rapid and automatic assessment of organizational diversity and possible discrimination by race, sex, age and other parameters. Automating the process of assessing the organizational diversity using the deep neural networks and eliminating the hu… ▽ More Diversity is one of the fundamental properties for the survival of species, populations, and organizations. Recent advances in deep learning allow for the rapid and automatic assessment of organizational diversity and possible discrimination by race, sex, age and other parameters. Automating the process of assessing the organizational diversity using the deep neural networks and eliminating the human factor may provide a set of real-time unbiased reports to all stakeholders. In this pilot study we applied the deep-learned predictors of race and sex to the executive management and board member profiles of the 500 largest companies from the 2016 Forbes Global 2000 list and compared the predicted ratios to the ratios within each company's country of origin and ranked them by the sex-, age- and race- diversity index (DI). While the study has many limitations and no claims are being made concerning the individual companies, it demonstrates a method for the rapid and impartial assessment of organizational diversity using deep neural networks. △ Less

Submitted 9 July, 2017; originally announced July 2017.

arXiv:1512.00126 [pdf]

GrantMed: a new, international system for tracking grants and funding trends in the life sciences

Authors: Yuri Nikolsky, Roman Gurinovich, Oleg Kuryan, Aleksandr Pashuk, Alexej Scherbakov, Konstantin Romantsov, Leslie C. Jellen, Alex Zhavoronkov

Abstract: Despite the success of PubMed and other search engines in managing the massive volume of biomedical literature and the retrieval of individual publications, grant-related data remains scattered and relatively inaccessible. This is problematic, as project and funding data has significant analytical value and could be integral to publication retrieval. Here, we introduce GrantMed, a searchable inter… ▽ More Despite the success of PubMed and other search engines in managing the massive volume of biomedical literature and the retrieval of individual publications, grant-related data remains scattered and relatively inaccessible. This is problematic, as project and funding data has significant analytical value and could be integral to publication retrieval. Here, we introduce GrantMed, a searchable international database of biomedical grants that integrates some 20 million publications with the nearly 1.4 million research projects and 650 billion dollars of funding that made them possible. For any given topic in the life sciences, Grantmed provides instantaneous visualization of the past 30 years of dollars spent and projects awarded, along with detailed individual project descriptions, funding amounts, and links to investigators, research organizations, and resulting publications. It summarizes trends in funding and publication rates for areas of interest and merges data from various national grant databases to create one international grant tracking system. This information will benefit the research community and funding entities alike. Users can view trends over time or current projects underway and use this information to navigate the decision-making process in moving forward. They can view projects prior to publication and records of previous projects. Convenient access to this data for analytical purposes will be beneficial in many ways, hel** to prevent project overlap, reduce funding redundancy, identify areas of success, accelerate dissemination of ideas, and expose knowledge gaps in moving forward. It is our hope that this will be a central resource for international life sciences research communities and the funding organizations that support them, ultimately streamlining progress. △ Less

Submitted 30 November, 2015; originally announced December 2015.

arXiv:1508.01060 [pdf]

Models of Innate Neural Attractors and Their Applications for Neural Information Processing

Authors: Ksenia P. Solovyeva, Iakov M. Karandashev, Alex Zhavoronkov, Witali L. Dunin-Barkowski

Abstract: In this work we reveal and explore a new class of attractor neural networks, based on inborn connections provided by model molecular markers, the molecular marker based attractor neural networks (MMBANN). We have explored conditions for the existence of attractor states, critical relations between their parameters and the spectrum of single neuron models, which can implement the MMBANN. Besides, w… ▽ More In this work we reveal and explore a new class of attractor neural networks, based on inborn connections provided by model molecular markers, the molecular marker based attractor neural networks (MMBANN). We have explored conditions for the existence of attractor states, critical relations between their parameters and the spectrum of single neuron models, which can implement the MMBANN. Besides, we describe functional models (perceptron and SOM) which obtain significant advantages, while using MMBANN. In particular, the perceptron based on MMBANN, gets specificity gain in orders of error probabilities values, MMBANN SOM obtains real neurophysiological meaning, the number of possible grandma cells increases 1000- fold with MMBANN. Each set of markers has a metric, which is used to make connections between neurons containing the markers. The resulting neural networks have sets of attractor states, which can serve as finite grids for representation of variables in computations. These grids may show dimensions of d = 0, 1, 2,... We work with static and dynamic attractor neural networks of dimensions d = 0 and d = 1. We also argue that the number of dimensions which can be represented by attractors of activities of neural networks with the number of elements N=104 does not exceed 8. △ Less

Submitted 5 August, 2015; originally announced August 2015.

Showing 1–10 of 10 results for author: Zhavoronkov, A