Search | arXiv e-print repository

MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space

Authors: Yanru Qu, Keyue Qiu, Yuxuan Song, **g**g Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, Wei-Ying Ma

Abstract: Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and diffusion to SBDD, including mode collapse and hybrid continuous-discrete space. In this paper, we introduce MolCRAFT, the first SBDD model that operates in the continuous parameter space, together with a novel noise-reduced sampling strategy. Empirical results show that our model consistently achieves superior performance in binding affinity with more stable 3D structure, demonstrating our ability to accurately model interatomic interactions. To the best of our knowledge, MolCRAFT is the first to achieve reference-level Vina Scores (-6.59 kcal/mol) with comparable molecular size, outperforming other strong baselines by a wide margin (-0.84 kcal/mol). Code is available at https://github.com/AlgoMole/MolCRAFT.

Submitted 27 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted to ICML 2024
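A minimal sketch of what "operating in the continuous parameter space" can look like, assuming the Bayesian Flow Network (BFN) formulation that GeoBFN (the next entry) is built on: instead of denoising the data directly, generation repeatedly refines the parameters (mean and precision) of a Gaussian belief over each continuous variable, such as an atom coordinate, using a closed-form Bayesian update. Only that update is shown; the neural network that predicts the data from the current parameters is omitted, and the coordinates and accuracy schedule are made up for illustration.

```python
import math
import random

def bayesian_update(mu, rho, y, alpha):
    """Conjugate Gaussian update of a belief N(mu, 1/rho) over one continuous
    value x, after observing a noisy sample y ~ N(x, 1/alpha)."""
    new_rho = rho + alpha                      # precisions add
    new_mu = (rho * mu + alpha * y) / new_rho  # precision-weighted mean
    return new_mu, new_rho

random.seed(0)
x_true = [1.2, -0.4, 0.7]          # hypothetical atom coordinates
mu, rho = [0.0, 0.0, 0.0], 1.0     # prior belief: standard normal, shared precision
for alpha in [0.5, 1.0, 2.0, 4.0]: # hypothetical accuracy schedule
    ys = [x + random.gauss(0.0, 1.0 / math.sqrt(alpha)) for x in x_true]
    updated = [bayesian_update(m, rho, y, alpha) for m, y in zip(mu, ys)]
    mu = [m for m, _ in updated]
    rho = updated[0][1]            # the precision is identical for every coordinate
print(mu, rho)  # the belief mean drifts toward x_true as the precision grows
```

Because every step is a differentiable function of continuous parameters, the same machinery can in principle be applied to coordinates and (with a different update rule) to discrete atom types.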

arXiv:2403.15441

Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks

Authors: Yuxuan Song, **g**g Gong, Yanru Qu, Hao Zhou, Mingyue Zheng, **g**g Liu, Wei-Ying Ma

Abstract: Advanced generative models (e.g., diffusion models) derived from simplified continuity assumptions of the data distribution, though showing promising progress, have been difficult to apply directly to geometry generation applications because of the multi-modality and noise-sensitive nature of molecular geometry. This work introduces Geometric Bayesian Flow Networks (GeoBFN), which naturally fit molecular geometry by modeling diverse modalities in the differentiable parameter space of distributions. GeoBFN maintains the SE(3)-invariant density modeling property by incorporating equivariant inter-dependency modeling on the parameters of distributions and unifying the probabilistic modeling of different modalities. Through optimized training and sampling techniques, we demonstrate that GeoBFN achieves state-of-the-art performance on multiple 3D molecule generation benchmarks in terms of generation quality (90.87% molecule stability on QM9 and 85.6% atom stability on GEOM-DRUG). GeoBFN can also sample with any number of steps to reach an optimal trade-off between efficiency and quality (e.g., a 20-times speedup without sacrificing performance).

Submitted 17 March, 2024; originally announced March 2024.

Comments: ICLR 2024
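An SE(3)-invariant density assigns the same probability to a molecule after any rotation and translation of all its atoms. The check below illustrates that property with two standard ingredients of 3D molecule generators rather than GeoBFN's own architecture: interatomic distances, which do not change under a rigid motion, and centre-of-mass removal, which factors out translations. The 5-atom geometry, rotation angles, and shift are arbitrary.

```python
import math
import random

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transform(coords, rot, shift):
    """Apply the rigid motion x -> R x + t to every atom position."""
    return [[sum(rot[i][k] * p[k] for k in range(3)) + shift[i] for i in range(3)] for p in coords]

def pairwise_distances(coords):
    """Interatomic distance matrix: unchanged by any rotation or translation."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def center(coords):
    """Subtract the (unweighted) centre of mass, removing translations."""
    n = len(coords)
    com = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - com[i] for i in range(3)] for p in coords]

def max_abs_diff(a, b):
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

random.seed(1)
mol = [[random.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(5)]  # hypothetical geometry
R = matmul(rot_z(0.7), matmul(rot_x(1.3), rot_z(-0.4)))  # arbitrary rotation (Z-X-Z angles)
t = [3.0, -1.0, 2.5]                                     # arbitrary translation

dist_err = max_abs_diff(pairwise_distances(mol), pairwise_distances(transform(mol, R, t)))
shift_err = max_abs_diff(center(mol), center(transform(mol, rot_z(0.0), t)))
print(f"distance change under R x + t: {dist_err:.1e}")
print(f"centred-coordinate change under pure translation: {shift_err:.1e}")
```

Both changes are at floating-point round-off level, which is what lets a model built on such quantities define a density that ignores the molecule's global pose.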

arXiv:2402.17997

StaPep: an open-source tool for the structure prediction and feature extraction of hydrocarbon-stapled peptides

Authors: Zhe Wang, Jian** Wu, Mengjun Zheng, Chenchen Geng, Borui Zhen, Wei Zhang, Hui Wu, Zhengyang Xu, Gang Xu, Si Chen, Xiang Li

Abstract: Many tools exist for extracting structural and physicochemical descriptors from linear peptides to predict their properties, but similar tools for hydrocarbon-stapled peptides are lacking. Here, we present StaPep, a Python-based toolkit designed for generating 2D/3D structures and calculating 21 distinct features for hydrocarbon-stapled peptides. The current version supports hydrocarbon-stapled peptides containing 2 non-standard amino acids (norleucine and 2-aminoisobutyric acid) and 6 non-natural anchoring residues (S3, S5, S8, R3, R5 and R8). We then established a hand-curated dataset of 201 hydrocarbon-stapled peptides and 384 linear peptides with sequence information and experimental membrane permeability to showcase StaPep's application in artificial intelligence projects. A machine learning-based predictor using the features calculated above was developed, with an AUC of 0.85, for identifying cell-penetrating hydrocarbon-stapled peptides. StaPep's pipeline spans data retrieval, cleaning, structure generation, molecular feature calculation, and machine learning model construction for hydrocarbon-stapled peptides. The source code and dataset are freely available on GitHub: https://github.com/dahuilangda/stapep_package.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 26 pages, 6 figures
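The AUC of 0.85 reported for the StaPep predictor is the area under the ROC curve, which equals the probability that the classifier scores a randomly chosen positive (cell-penetrating) peptide above a randomly chosen negative one. A small, dependency-free sketch of that rank-based computation follows; the labels and scores are invented and are not from the StaPep dataset.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as one half), i.e. the normalised Mann-Whitney U."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical predictor outputs: 1 = cell-penetrating, 0 = not
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.7, 0.6]
print(roc_auc(labels, scores))  # 0.875: 14 of the 16 positive-negative pairs are ordered correctly
```

The pairwise loop is quadratic; for large datasets a sort-based implementation (or a library routine) gives the same value more efficiently.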

arXiv:2402.14315

Structure-Based Drug Design via 3D Molecular Generative Pre-training and Sampling

Authors: Yuwei Yang, Siqi Ouyang, Xueyu Hu, Mingyue Zheng, Hao Zhou, Lei Li

Abstract: Structure-based drug design aims at generating high-affinity ligands with prior knowledge of 3D target structures. Existing methods either use conditional generative models to learn the distribution of 3D ligands given target binding sites, or iteratively modify molecules to optimize a structure-based activity estimator. The former is highly constrained by data quantity and quality, which leaves optimization-based approaches more promising in practical scenarios. However, existing optimization-based approaches edit molecules in 2D space and use molecular docking to estimate activity from docking-predicted 3D target-ligand complexes. The misalignment between the action space and the objective hinders the performance of these models, especially those that employ deep learning for acceleration. In this work, we propose MolEdit3D to combine 3D molecular generation with optimization frameworks. We develop a novel 3D graph editing model to generate molecules using fragments, and pre-train this model on abundant 3D ligands to learn target-independent properties. Then we employ a target-guided self-learning strategy to improve target-related properties using self-sampled molecules. MolEdit3D achieves state-of-the-art performance on the majority of the evaluation metrics and demonstrates a strong capability to capture both target-dependent and target-independent properties.

Submitted 15 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.
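The target-guided self-learning step described in the MolEdit3D abstract can be pictured as a loop: sample molecules from the current model, score them against the target, and refit the model to the best samples. The sketch below is a toy, cross-entropy-method-style version of such a loop, not MolEdit3D's algorithm: the "model" is only a categorical distribution over a hypothetical fragment vocabulary, and `score` is a made-up stand-in for a structure-based activity estimator.

```python
import random
from collections import Counter

FRAGMENTS = ["benzene", "pyridine", "amide", "hydroxyl", "methyl"]  # hypothetical vocabulary
TARGET_BONUS = {"pyridine": 2.0, "amide": 1.0}                      # hypothetical target reward

def score(molecule):
    """Hypothetical stand-in for a docking-based activity estimator."""
    return sum(TARGET_BONUS.get(f, 0.0) for f in molecule)

def sample(probs, n_frag, rng):
    """Draw one 'molecule' as a list of fragments from the current generator."""
    return rng.choices(FRAGMENTS, weights=[probs[f] for f in FRAGMENTS], k=n_frag)

def self_learning_round(probs, rng, n_samples=200, n_frag=4, keep=0.2, smoothing=1.0):
    """Sample from the current generator, keep the top-scoring molecules,
    and refit fragment probabilities to them (with additive smoothing)."""
    batch = [sample(probs, n_frag, rng) for _ in range(n_samples)]
    batch.sort(key=score, reverse=True)
    elite = batch[: int(keep * n_samples)]
    counts = Counter(f for mol in elite for f in mol)
    total = sum(counts.values()) + smoothing * len(FRAGMENTS)
    return {f: (counts[f] + smoothing) / total for f in FRAGMENTS}

rng = random.Random(0)
probs = {f: 1 / len(FRAGMENTS) for f in FRAGMENTS}  # stands in for a target-agnostic prior
for _ in range(5):
    probs = self_learning_round(probs, rng)
print({f: round(p, 3) for f, p in probs.items()})
```

Starting from the uniform prior, the probability mass shifts toward the fragment the toy target rewards most, which is the qualitative behaviour a self-learning stage aims for; a real system would fine-tune a neural generator on 3D molecules instead.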

arXiv:2201.09647

AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor

Authors: Feng Ren, Xiao Ding, Min Zheng, Mikhail Korzinkin, Xin Cai, Wei Zhu, Alexey Mantsyzov, Alex Aliper, Vladimir Aladinskiy, Zhongying Cao, Shanshan Kong, Xi Long, Bonnie Hei Man Liu, Yingtao Liu, Vladimir Naumov, Anastasia Shneyderman, Ivan V. Ozerov, Ju Wang, Frank W. Pun, Alan Aspuru-Guzik, Michael Levitt, Alex Zhavoronkov

Abstract: The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered a remarkable breakthrough both in artificial intelligence (AI) applications and in structural biology. Despite varying confidence levels, these predicted structures could still contribute significantly to structure-based drug design of novel targets, especially those with no or limited structural information. In this work, we successfully applied AlphaFold in our end-to-end AI-powered drug discovery engines, consisting of a biocomputational platform, PandaOmics, and a generative chemistry platform, Chemistry42, to identify a first-in-class hit molecule of a novel target without an experimental structure, going from target selection to hit identification in a cost- and time-efficient manner. PandaOmics provided the targets of interest, Chemistry42 generated the molecules based on the AlphaFold-predicted structure, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small-molecule hit compound for CDK20 with a Kd value of 8.9 ± 1.6 μM (n = 4) within 30 days of target selection and after synthesizing only 7 compounds. Based on the available data, a second round of AI-powered compound generation was conducted, through which a more potent hit molecule, ISM042-2-048, was discovered with a Kd value of 210.0 ± 42.4 nM (n = 2), within 30 days and after synthesizing 6 compounds from the discovery of the first hit, ISM042-2-001.

To the best of our knowledge, this is the first reported small molecule targeting CDK20 and, more importantly, this work is the first demonstration of AlphaFold application in the hit identification process in early drug discovery.

Submitted 12 February, 2022; v1 submitted 21 January, 2022; originally announced January 2022.

Comments: 9 pages, 6 figures
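The two Kd values in the CDK20 abstract can be put on a common scale with a little arithmetic: the fold improvement is the ratio of the dissociation constants, and, assuming the standard relation ΔG° = RT ln(Kd / 1 M) at 25 °C (the temperature is an assumption; the abstract reports only Kd), the ratio maps to a change in binding free energy. The sketch uses the central values and ignores the reported uncertainties.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
T = 298.15            # assumed temperature (25 degrees C); not stated in the abstract

def binding_free_energy(kd_molar, temperature=T):
    """Standard binding free energy dG = RT ln(Kd / 1 M), in kcal/mol."""
    return R_KCAL * temperature * math.log(kd_molar)

kd_first = 8.9e-6     # first hit, ISM042-2-001: 8.9 uM (central value)
kd_second = 210.0e-9  # second, more potent hit: 210.0 nM (central value)

fold = kd_first / kd_second
dg_first = binding_free_energy(kd_first)
dg_second = binding_free_energy(kd_second)
print(f"Kd improvement: {fold:.1f}-fold")
print(f"dG: {dg_first:.2f} -> {dg_second:.2f} kcal/mol (change {dg_second - dg_first:.2f})")
```

Under these assumptions the second round improved affinity roughly 42-fold, corresponding to about 2.2 kcal/mol more favourable binding.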

arXiv:2102.11066

doi 10.1038/s41467-022-34027-9

Epidemic spreading under mutually independent intra- and inter-host pathogen evolution

Authors: Xiyun Zhang, Zhongyuan Ruan, Muhua Zheng, Jie Zhou, Stefano Boccaletti, Baruch Barzel

Abstract: The dynamics of epidemic spreading is often reduced to the single control parameter $R_0$, whose value, above or below unity, determines the state of the contagion. If, however, the pathogen evolves as it spreads, $R_0$ may change over time, potentially leading to a mutation-driven spread, in which an initially sub-pandemic pathogen undergoes a breakthrough mutation. To predict the boundaries of this pandemic phase, we introduce here a modeling framework to couple the network spreading patterns with the intra-host evolutionary dynamics. For many pathogens these two processes, intra- and inter-host, are driven by different selection forces. And yet here we show that even in the extreme case when these two forces are mutually independent, mutations can still fundamentally alter the pandemic phase diagram, whose transitions are now shaped not just by $R_0$ but also by the balance between the epidemic and the evolutionary timescales. If mutations are too slow, the pathogen prevalence decays prior to the appearance of a critical mutation. On the other hand, if mutations are too rapid, the pathogen evolution becomes volatile and, once again, it fails to spread. Between these two extremes, however, we identify a broad range of conditions in which an initially sub-pandemic pathogen can break through to gain widespread prevalence.

Submitted 4 November, 2022; v1 submitted 19 February, 2021; originally announced February 2021.

Journal ref: Nat Commun 13, 6218 (2022)
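The threshold role of $R_0$, and why the timing of a mutation matters, can be illustrated with the textbook SIR model. The sketch below is not the authors' network framework: it is a well-mixed SIR model in which $R_0$ jumps from a sub-critical to a super-critical value at a fixed, hypothetical mutation time, and the infection is treated as extinct once fewer than one person is infected. All parameter values are illustrative.

```python
def sir_outbreak(r0_before, r0_after=None, t_mutation=float("inf"), gamma=0.1,
                 population=1_000_000, initial_infected=100, dt=0.1, t_max=2000.0):
    """Forward-Euler SIR on population fractions. R0 switches from r0_before to
    r0_after at t_mutation (a deterministic stand-in for a breakthrough mutation).
    Returns the fraction of the population ever infected."""
    s = 1.0 - initial_infected / population
    i = initial_infected / population
    t = 0.0
    while t < t_max and i * population >= 1.0:  # stop once fewer than one person is infected
        r0 = r0_after if (r0_after is not None and t >= t_mutation) else r0_before
        beta = r0 * gamma                        # transmission rate implied by R0
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s, i = s - new_inf, i + new_inf - new_rec
        t += dt
    return 1.0 - s

for label, kwargs in [("R0 = 0.8", dict(r0_before=0.8)),
                      ("R0 = 2.5", dict(r0_before=2.5)),
                      ("0.8 -> 2.5 at t=50", dict(r0_before=0.8, r0_after=2.5, t_mutation=50)),
                      ("0.8 -> 2.5 at t=500", dict(r0_before=0.8, r0_after=2.5, t_mutation=500))]:
    print(f"{label:>20}: {sir_outbreak(**kwargs):.4f} of the population infected")
```

With these settings, the early mutation (t = 50) still finds infected hosts and produces a large outbreak, while by t = 500 the sub-critical chain of infections has already died out. This reproduces only the "mutations too slow" side of the trade-off; the volatility caused by overly rapid evolution is outside this toy model.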

arXiv:1904.11793

doi 10.1073/pnas.1922248117

Geometric renormalization unravels self-similarity of the multiscale human connectome

Authors: Muhua Zheng, Antoine Allard, Patric Hagmann, Yasser Alemán-Gómez, M. Ángeles Serrano

Abstract: Structural connectivity in the brain is typically studied by reducing its observation to a single spatial resolution. However, the brain possesses a rich architecture organized over multiple scales linked to one another. We explored the multiscale organization of human connectomes using datasets of healthy subjects reconstructed at five different resolutions. We found that the structure of the human brain remains self-similar when the resolution of observation is progressively decreased by hierarchical coarse-graining of the anatomical regions. Strikingly, a geometric network model, where distances are not Euclidean, predicts the multiscale properties of connectomes, including self-similarity. The model relies on the application of a geometric renormalization protocol which decreases the resolution by coarse-graining and averaging over short similarity distances. Our results suggest that simple organizing principles underlie the multiscale architecture of human structural brain networks, where the same connectivity law dictates short- and long-range connections between different brain regions over many resolutions. The implications are varied and can be substantial for fundamental debates, such as whether the brain is working near a critical point, as well as for applications including advanced tools to simplify the digital reconstruction and simulation of the brain.

Submitted 4 September, 2020; v1 submitted 26 April, 2019; originally announced April 2019.

Journal ref: Proceedings of the National Academy of Sciences, 117(33), 20244-20253 (2020)
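The coarse-graining step of a geometric renormalization protocol can be sketched in a few lines. The version below follows the general recipe of merging nodes that are close in the similarity space and keeping a link between two supernodes if any pair of their constituents was linked, on a one-dimensional circle of similarity coordinates. The block size `r`, the plain averaging of angles, and the toy ring network are assumptions for illustration, not the exact protocol or data of the paper.

```python
import math
from collections import defaultdict

def renormalize(angles, edges, r=2):
    """One coarse-graining step: angles maps node -> angle in [0, 2*pi);
    edges is an iterable of (u, v) pairs. Returns (new_angles, new_edges)."""
    order = sorted(angles, key=angles.get)  # nodes ordered around the circle
    block_of = {node: idx // r for idx, node in enumerate(order)}
    members = defaultdict(list)
    for node, block in block_of.items():
        members[block].append(angles[node])
    # simplification: supernode angle = plain mean of its members' angles
    new_angles = {block: sum(a) / len(a) for block, a in members.items()}
    new_edges = set()
    for u, v in edges:
        bu, bv = block_of[u], block_of[v]
        if bu != bv:  # links inside a block are absorbed into the supernode
            new_edges.add((min(bu, bv), max(bu, bv)))
    return new_angles, new_edges

# Toy network: 8 nodes evenly spaced on a circle, nearest-neighbour ring plus one chord
angles = {i: 2 * math.pi * i / 8 for i in range(8)}
edges = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4)]
new_angles, new_edges = renormalize(angles, edges, r=2)
print(sorted(new_edges))  # [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```

The coarse-grained network keeps both the ring and the long-range chord at half the resolution, which is the kind of structural resemblance across scales that a self-similarity analysis compares; applying the step repeatedly yields a hierarchy of coarser networks.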

arXiv:1610.02528

doi 10.1038/s41598-017-02661-9

Synchronized and mixed outbreaks of coupled recurrent epidemics

Authors: Muhua Zheng, Ming Zhao, Byungjoon Min, Zonghua Liu

Abstract: Epidemic spreading has been studied for a long time, and most studies have focused on the growth of a single epidemic outbreak. Recently, we extended the study to the case of recurrent epidemics (Sci. Rep. 5, 16010 (2015)), but only for a single network. Here we report, from real data of coupled regions or cities, that the recurrent epidemics in two coupled networks are closely related to each other and can show either a synchronized outbreak phase, where outbreaks occur simultaneously in both networks, or a mixed outbreak phase, where outbreaks occur in one network but not in the other. To reveal the underlying mechanism, we present a two-layered network model of coupled recurrent epidemics that reproduces the synchronized and mixed outbreak phases. We show that the synchronized outbreak phase tends to be triggered in two coupled networks with the same average degree, while the mixed outbreak phase is favored when the average degrees differ. Further, we show that the coupling between the two layers tends to suppress the mixed outbreak phase and enhance the synchronized outbreak phase. A theoretical analysis based on the microscopic Markov-chain approach is presented to explain the numerical results. This finding opens a new window for studying recurrent epidemics in multi-layered networks.

Submitted 25 May, 2017; v1 submitted 8 October, 2016; originally announced October 2016.

Comments: 12 pages, 6 figures

Journal ref: Scientific Reports 7, 2424 (2017)
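The microscopic Markov-chain approach (MMCA) tracks, for every node, the probability of being infected and updates it from its neighbours' probabilities. The sketch below applies the standard discrete-time MMCA equations for SIS dynamics to a hypothetical two-layer network: a dense layer and a sparse ring joined by one-to-one interlayer links. It is a simplified stand-in; the paper models recurrent epidemics, and its equations, network sizes, and parameters are not reproduced here.

```python
def mmca_sis(adjacency, beta, mu, p0, steps=500):
    """Discrete-time MMCA for SIS dynamics. p[i] = probability node i is infected;
    q[i] = probability node i is not infected by any neighbour in one step.
    adjacency[i][j] is a link weight in [0, 1] scaling the infection probability beta."""
    n = len(adjacency)
    p = list(p0)
    for _ in range(steps):
        q = [1.0] * n
        for i in range(n):
            for j in range(n):
                if adjacency[i][j]:
                    q[i] *= 1.0 - beta * adjacency[i][j] * p[j]
        # infected from susceptible, or stays infected, or recovers and is re-infected
        p = [(1 - p[i]) * (1 - q[i]) + (1 - mu) * p[i] + mu * p[i] * (1 - q[i])
             for i in range(n)]
    return p

def two_layer_network(n, coupling):
    """Hypothetical network: layer A (nodes 0..n-1) is a complete graph, layer B
    (nodes n..2n-1) is a ring; node i in A links to node n+i in B with weight coupling."""
    adj = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = 1.0                 # dense layer A, average degree n-1
        a, b = n + i, n + (i + 1) % n
        adj[a][b] = adj[b][a] = 1.0             # sparse layer B, average degree 2
        if coupling > 0:
            adj[i][n + i] = adj[n + i][i] = coupling
    return adj

n = 10
for coupling in (0.0, 1.0):
    p = mmca_sis(two_layer_network(n, coupling), beta=0.1, mu=0.3, p0=[0.05] * (2 * n))
    print(f"coupling={coupling}: layer A {sum(p[:n]) / n:.3f}, layer B {sum(p[n:]) / n:.3f}")
```

With these illustrative parameters the dense layer sustains an endemic state on its own while the sparse ring does not, a "mixed" pattern; switching on the interlayer links lets infection persist in both layers.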

Showing 1–8 of 8 results for author: Zheng, M