Showing 1–10 of 10 results for author: Ji, Y

Searching in archive q-bio.
  1. arXiv:2405.11735  [pdf, other]

    q-bio.GN

    Accurate and efficient protein embedding using multi-teacher distillation learning

    Authors: Jiayu Shang, Cheng Peng, Yongxin Ji, Jiaojiao Guan, Dehan Cai, Xubo Tang, Yanni Sun

    Abstract: Motivation: Protein embedding, which represents proteins as numerical vectors, is a crucial step in various learning-based protein annotation/classification problems, including gene ontology prediction, protein-protein interaction prediction, and protein structure prediction. However, existing protein embedding methods are often computationally expensive due to their large number of parameters, wh…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 3 pages; 1 figure

  2. arXiv:2306.15006  [pdf, other]

    q-bio.GN cs.AI cs.CE cs.CL

    DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome

    Authors: Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, Han Liu

    Abstract: Decoding the linguistic intricacies of the genome is a crucial problem in biology, and pre-trained foundational models such as DNABERT and Nucleotide Transformer have made significant strides in this area. Existing works have largely hinged on k-mer, fixed-length permutations of A, T, C, and G, as the token of the genome language due to its simplicity. However, we argue that the computation and sa…

    Submitted 18 March, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted by ICLR 2024

  3. arXiv:2305.15156  [pdf, other]

    q-bio.BM cs.CE cs.LG

    SyNDock: N Rigid Protein Docking via Learnable Group Synchronization

    Authors: Yuanfeng Ji, Yatao Bian, Guoji Fu, Peilin Zhao, Ping Luo

    Abstract: The regulation of various cellular processes heavily relies on the protein complexes within a living cell, necessitating a comprehensive understanding of their three-dimensional structures to elucidate the underlying mechanisms. While neural docking techniques have exhibited promising outcomes in binary protein docking, the application of advanced neural architectures to multimeric protein docking…

    Submitted 24 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  4. arXiv:2201.09637  [pdf, other]

    cs.LG cs.AI q-bio.QM

    DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

    Authors: Yuanfeng Ji, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Long-Kai Huang, Tingyang Xu, Yu Rong, Lanqing Li, Jie Ren, Ding Xue, Houtim Lai, Shaoyong Xu, Jing Feng, Wei Liu, Ping Luo, Shuigeng Zhou, Junzhou Huang, Peilin Zhao, Yatao Bian

    Abstract: AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with noise,…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: 54 pages, 11 figures

  5. arXiv:2006.05581  [pdf, other]

    stat.ME q-bio.PE stat.AP

    Semiparametric Bayesian Inference for the Transmission Dynamics of COVID-19 with a State-Space Model

    Authors: Tianjian Zhou, Yuan Ji

    Abstract: The outbreak of Coronavirus Disease 2019 (COVID-19) is an ongoing pandemic affecting over 200 countries and regions. Inference about the transmission dynamics of COVID-19 can provide important insights into the speed of disease spread and the effects of mitigation policies. We develop a novel Bayesian approach to such inference based on a probabilistic compartmental model using data of daily confi…

    Submitted 2 July, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

  6. arXiv:1708.05475  [pdf]

    q-bio.QM

    Map-based cloning of the gene Pm21 that confers broad spectrum resistance to wheat powdery mildew

    Authors: Huagang He, Shanying Zhu, Yaoyong Ji, Zhengning Jiang, Renhui Zhao, Tongde Bie

    Abstract: Common wheat (Triticum aestivum L.) is one of the most important cereal crops. Wheat powdery mildew caused by Blumeria graminis f. sp. tritici (Bgt) is a continuing threat to wheat production. The Pm21 gene, originating from Dasypyrum villosum, confers high resistance to all known Bgt races and has been widely applied in wheat breeding in China. In this research, we identify Pm21 as a typical coil…

    Submitted 17 August, 2017; originally announced August 2017.

  7. arXiv:1509.04026  [pdf, ps, other]

    stat.AP q-bio.GN q-bio.PE

    A Bayesian feature allocation model for tumor heterogeneity

    Authors: Juhee Lee, Peter Müller, Kamalakar Gulukota, Yuan Ji

    Abstract: We develop a feature allocation model for inference on genetic tumor variation using next-generation sequencing data. Specifically, we record single nucleotide variants (SNVs) based on short reads mapped to human reference genome and characterize tumor heterogeneity by latent haplotypes defined as a scaffold of SNVs on the same homologous genome. For multiple samples from a single tumor, assuming…

    Submitted 14 September, 2015; originally announced September 2015.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS817 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS817

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 621-639

  8. arXiv:1409.7158  [pdf, other]

    stat.ME q-bio.GN

    Bayesian Inference for Tumor Subclones Accounting for Sequencing and Structural Variants

    Authors: Juhee Lee, Peter Mueller, Subhajit Sengupta, Kamalakar Gulukota, Yuan Ji

    Abstract: Tumor samples are heterogeneous. They consist of different subclones that are characterized by differences in DNA nucleotide sequences and copy numbers on multiple loci. Heterogeneity can be measured through the identification of the subclonal copy number and sequence at a selected set of loci. Understanding that the accurate identification of variant allele fractions greatly depends on a precise…

    Submitted 25 September, 2014; originally announced September 2014.

    Comments: 26 pages, 11 figures

  9. The prion-like folding behavior in aggregated proteins

    Authors: Yong-Yun Ji, You-Quan Li, Jun-Wen Mao, Xiao-Wei Tang

    Abstract: We investigate the folding behavior of protein sequences by numerically studying all sequences with maximally compact lattice model through exhaustive enumeration. We get the prion-like behavior of protein folding. Individual proteins remaining stable in the isolated native state may change their conformations when they aggregate. We observe the folding properties as the interfacial interaction…

    Submitted 9 June, 2005; originally announced June 2005.

    Comments: 7 pages, 6 figures

    Journal ref: Physical Review E 72, 041912 (2005), Virtual Journal of Biological Physics Research (October 15, 2005)

  10. arXiv:q-bio/0408024  [pdf, ps, other]

    q-bio.BM cond-mat.soft

    Medium effects on the selection of sequences folding into stable proteins in a simple model

    Authors: You-Quan Li, Yong-Yun Ji, Jun-Wen Mao, Xiao-Wei Tang

    Abstract: We study the medium effects on the selection of sequences in protein folding by taking account of the surface potential in HP-model. Our analysis on the proportion of H and P monomers in the sequences gives a direct interpretation that the lowly designable structures possess small average gap. The numerical calculation by means of our model exhibits that the surface potential enhances the averag…

    Submitted 27 August, 2004; originally announced August 2004.

    Comments: 4 pages, 4 figures

    Journal ref: Phys. Rev. E 72, 021904 (2005)