Showing 1–20 of 20 results for author: Peng, J

  1. arXiv:2407.01649  [pdf, other]

    q-bio.QM cs.LG

    FAFE: Immune Complex Modeling with Geodesic Distance Loss on Noisy Group Frames

    Authors: Ruidong Wu, Ruihan Guo, Rui Wang, Shitong Luo, Yue Xu, Jiahan Li, Jianzhu Ma, Qiang Liu, Yunan Luo, Jian Peng

    Abstract: Despite the striking success of general protein folding models such as AlphaFold2 (AF2, Jumper et al. (2021)), the accurate computational modeling of antibody-antigen complexes remains a challenging task. In this paper, we first analyze AF2's primary loss function, known as the Frame Aligned Point Error (FAPE), and raise a previously overlooked issue that FAPE tends to face gradient vanishing probl…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.04628  [pdf, other]

    cs.CE q-bio.QM

    Projecting Molecules into Synthesizable Chemical Spaces

    Authors: Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, Jianzhu Ma

    Abstract: Discovering new drug molecules is a pivotal yet challenging process due to the near-infinitely large chemical space and notorious demands on time and resources. Numerous generative models have recently been introduced to accelerate the drug discovery process, but their progression to experimental validation remains limited, largely due to a lack of consideration for synthetic accessibility in prac…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2406.00735  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Full-Atom Peptide Design based on Multi-modal Flow Matching

    Authors: Jiahan Li, Chaoran Cheng, Zuofan Wu, Ruihan Guo, Shitong Luo, Zhizhou Ren, Jian Peng, Jianzhu Ma

    Abstract: Peptides, short chains of amino acid residues, play a vital role in numerous biological processes by interacting with other target molecules, offering substantial potential in drug discovery. In this work, we present PepFlow, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides that target specific protein receptors. Drawing inspi…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  4. arXiv:2404.15805  [pdf, other]

    q-bio.BM cs.LG

    Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering

    Authors: Shujian Jiao, Bingxuan Li, Lei Wang, Xiao** Zhang, Wei Chen, Jiajie Peng, Zhongyu Wei

    Abstract: Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting…

    Submitted 24 April, 2024; originally announced April 2024.

  5. arXiv:2403.07902  [pdf, other]

    q-bio.BM cs.LG

    DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design

    Authors: Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, Quanquan Gu

    Abstract: Designing 3D ligands within a target binding site is a fundamental task in drug discovery. Existing structure-based drug design methods treat all ligand atoms equally, which ignores different roles of atoms in the ligand for drug design and can be less efficient for exploring the large drug-like molecule space. In this paper, inspired by the convention in pharmaceutical practice, we decompose the…

    Submitted 26 February, 2024; originally announced March 2024.

    Comments: Accepted to ICML 2023

  6. arXiv:2312.10519  [pdf, other]

    q-bio.GN q-bio.QM

    Interpretable Online Network Dictionary Learning for Inferring Long-Range Chromatin Interactions

    Authors: Vishal Rana, Jianhao Peng, Chao Pan, Hanbaek Lyu, Albert Cheng, Minji Kim, Olgica Milenkovic

    Abstract: Dictionary learning (DL) is commonly used in computational biology to tackle ubiquitous clustering problems due to its conceptual simplicity and relatively low computational complexity. However, DL algorithms produce results that lack interpretability and are not optimized for large-scale graph-structured data. We propose a novel DL algorithm called online convex network dictionary learning (onlin…

    Submitted 16 December, 2023; originally announced December 2023.

  7. arXiv:2303.03543  [pdf, other]

    q-bio.BM cs.LG

    3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction

    Authors: Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, Jianzhu Ma

    Abstract: Rich data and powerful machine learning models allow us to design drugs for a specific protein target in silico. Recently, the inclusion of 3D structures during targeted drug design shows superior performance to other target-free models as the atomic interaction in the 3D space is explicitly modeled. However, current 3D target-aware models either rely on the voxelized atom densities or th…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted to ICLR 2023

  8. arXiv:2205.07249  [pdf, other]

    cs.LG q-bio.BM

    Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets

    Authors: Xingang Peng, Shitong Luo, Jiaqi Guan, Qi Xie, Jian Peng, Jianzhu Ma

    Abstract: Deep generative models have achieved tremendous success in designing novel drug molecules in recent years. A new thread of works has shown the great potential in advancing the specificity and success rate of in silico drug design by considering the structure of protein pockets. This setting poses fundamental computational challenges in sampling new chemical compounds that could satisfy multiple g…

    Submitted 15 May, 2022; originally announced May 2022.

    Comments: ICML 2022 accepted

  9. arXiv:2203.10446  [pdf, other]

    q-bio.BM cs.LG

    A 3D Generative Model for Structure-Based Drug Design

    Authors: Shitong Luo, Jiaqi Guan, Jianzhu Ma, Jian Peng

    Abstract: We study a fundamental problem in structure-based drug design -- generating molecules that bind to specific protein binding sites. While we have witnessed the great success of deep generative models in drug design, the existing methods are mostly string-based or graph-based. They are limited by the lack of spatial information and thus unable to be applied to structure-based design tasks. Particula…

    Submitted 12 November, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

    Comments: Accepted to NeurIPS 2021

  10. arXiv:2203.00854  [pdf, other]

    cs.LG cs.AI cs.DC q-bio.QM

    FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours

    Authors: Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Jiarui Fang, Zhongming Yu, Tian Zheng, Ruidong Wu, Xiwen Zhang, Jian Peng, Yang You

    Abstract: Protein structure prediction helps to understand gene translation and protein function, which is of growing interest and importance in structural biology. The AlphaFold model, which used transformer architecture to achieve atomic-level accuracy in protein structure prediction, was a significant breakthrough. However, training and inference of the AlphaFold model are challenging due to its high com…

    Submitted 5 February, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

  11. arXiv:2008.12473  [pdf, other]

    q-bio.BM cs.LG q-bio.QM

    Pre-training of Graph Neural Network for Modeling Effects of Mutations on Protein-Protein Binding Affinity

    Authors: Xianggen Liu, Yunan Luo, Sen Song, Jian Peng

    Abstract: Modeling the effects of mutations on the binding affinity plays a crucial role in protein engineering and drug design. In this study, we develop a novel deep learning based framework, named GraphPPI, to predict the binding affinity changes upon mutations based on the features provided by a graph neural network (GNN). In particular, GraphPPI first employs a well-designed pre-training scheme to enfo…

    Submitted 28 August, 2020; originally announced August 2020.

  12. arXiv:2004.04258  [pdf, other]

    stat.AP q-bio.NC

    Estimating Fiber Orientation Distribution through Blockwise Adaptive Thresholding with Application to HCP Young Adults Data

    Authors: Seungyong Hwang, Thomas C. M. Lee, Debashis Paul, Jie Peng

    Abstract: Due to recent technological advances, large brain imaging data sets can now be collected. Such data are highly complex so extraction of meaningful information from them remains challenging. Thus, there is an urgent need for statistical procedures that are computationally scalable and can provide accurate estimates that capture the neuronal structures and their functionalities. We propose a fast me…

    Submitted 28 June, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

  13. arXiv:1904.06183  [pdf, other]

    q-bio.NC

    Angular Velocity Estimation of Image Motion Mimicking the Honeybee Tunnel Centring Behaviour

    Authors: Huatian Wang, Qinbing Fu, Hongxin Wang, Jigen Peng, Paul Baxter, Cheng Hu, Shigang Yue

    Abstract: Insects use visual information to estimate angular velocity of retinal image motion, which determines a variety of flight behaviours including speed regulation, tunnel centring and visual navigation. For angular velocity estimation, honeybees show large spatial-independence against visual stimuli, whereas the previous models have not fulfilled such an ability. To address this issue, we propose a b…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: 7 pages, 8 figures, conference, IEEE format. arXiv admin note: text overlap with arXiv:1904.02356

  14. arXiv:1604.03244  [pdf, other]

    q-bio.MN cs.DS

    Complexes Detection in Biological Networks via Diversified Dense Subgraphs Mining

    Authors: Xiuli Ma, Guangyu Zhou, Jingjing Wang, Jian Peng, Jiawei Han

    Abstract: Protein-protein interaction (PPI) networks, providing a comprehensive landscape of protein interacting patterns, enable us to explore biological processes and cellular components at multiple resolutions. For a biological process, a number of proteins need to work together to perform the job. Proteins densely interact with each other, forming large molecular machines or cellular building blocks. Id…

    Submitted 12 April, 2016; originally announced April 2016.

  15. arXiv:1604.02699  [pdf, other]

    q-bio.QM q-bio.GN

    Low-density locality-sensitive hashing boosts metagenomic binning

    Authors: Yunan Luo, Jianyang Zeng, Bonnie Berger, Jian Peng

    Abstract: Metagenomic binning is an essential task in analyzing metagenomic sequence datasets. To analyze structure or function of microbial communities from environmental samples, metagenomic sequence fragments are assigned to their taxonomic origins. Although sequence alignment algorithms can readily be used and usually provide high-resolution alignments and accurate binning results, the computational cos…

    Submitted 10 April, 2016; originally announced April 2016.

    Comments: RECOMB 2016. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than the one in the PDF file

  16. arXiv:1512.00843  [pdf]

    q-bio.BM cs.LG q-bio.QM

    Protein secondary structure prediction using deep convolutional neural fields

    Authors: Sheng Wang, Jian Peng, Jianzhu Ma, Jinbo Xu

    Abstract: Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extensi…

    Submitted 10 December, 2015; v1 submitted 1 December, 2015; originally announced December 2015.

  17. arXiv:1506.05538  [pdf, other]

    q-bio.GN

    Bermuda: Bidirectional de novo assembly of transcripts with new insights for handling uneven coverage

    Authors: Qingming Tang, Sheng Wang, Jian Peng, Jianzhu Ma, Jinbo Xu

    Abstract: Motivation: RNA-seq has made feasible the analysis of a whole set of expressed mRNAs. Mapping-based assembly of RNA-seq reads sometimes is infeasible due to lack of high-quality references. However, de novo assembly is very challenging due to uneven expression levels among transcripts and also the read coverage variation within a single transcript. Existing methods either apply de Bruijn graphs of…

    Submitted 17 June, 2015; originally announced June 2015.

  18. arXiv:1504.05467  [pdf, ps, other]

    q-bio.BM

    iTreePack: Protein Complex Side-Chain Packing by Dual Decomposition

    Authors: Jian Peng, Raghavendra Hosur, Bonnie Berger, Jinbo Xu

    Abstract: Protein side-chain packing is a critical component in obtaining the 3D coordinates of a structure and drug discovery. Single-domain protein side-chain packing has been thoroughly studied. A major challenge in generalizing these methods to protein complexes is that they, unlike monomers, often have very large treewidth, and thus algorithms such as TreePack cannot be directly applied. To address thi…

    Submitted 21 April, 2015; originally announced April 2015.

  19. arXiv:1504.02719  [pdf, other]

    q-bio.MN cs.LG cs.SI stat.ML

    Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks

    Authors: Hyunghoon Cho, Bonnie Berger, Jian Peng

    Abstract: Complex biological systems have been successfully modeled by biochemical and genetic interaction networks, typically gathered from high-throughput (HTP) data. These networks can be used to infer functional relationships between genes or proteins. Using the intuition that the topological role of a gene in a network relates to its biological function, local or diffusion based "guilt-by-association"…

    Submitted 10 April, 2015; originally announced April 2015.

    Comments: RECOMB 2015

  20. arXiv:1306.4420  [pdf]

    q-bio.BM

    Statistical inference for template-based protein structure prediction

    Authors: Jian Peng

    Abstract: Protein structure prediction is one of the most important problems in computational biology. The most successful computational approach, also called template-based modeling, identifies templates with solved crystal structures for the query proteins and constructs three dimensional models based on sequence/structure alignments. Although substantial effort has been made to improve protein sequence a…

    Submitted 19 June, 2013; originally announced June 2013.