Search | arXiv e-print repository

arXiv:2405.17433 [pdf]

ScAtt: an Attention based architecture to analyze Alzheimer's disease at cell type level from single-cell RNA-sequencing data

Authors: Xiaoxia Liu, Robert R Butler III, Prashnna K Gyawali, Frank M Longo, Zihuai He

Abstract: Alzheimer's disease (AD) is a pervasive neurodegenerative disorder that leads to memory and behavior impairment severe enough to interfere with daily life activities. Understanding this disease pathogenesis can drive the development of new targets and strategies to prevent and treat AD. Recent advances in high-throughput single-cell RNA sequencing technology (scRNA-seq) have enabled the generation of massive amounts of transcriptomic data at the single-cell level, providing remarkable insights into the molecular pathogenesis of Alzheimer's disease. In this study, we introduce ScAtt, an innovative Attention-based architecture, devised specifically for the concurrent identification of cell-type specific AD-related genes and their associated gene regulatory network. ScAtt incorporates a flexible model capable of capturing nonlinear effects, leading to the detection of AD-associated genes that might be overlooked by traditional differentially expressed gene (DEG) analyses. Moreover, ScAtt effectively infers a gene regulatory network depicting the combined influences of genes on the targeted disease, as opposed to examining correlations among genes in conventional gene co-expression networks. In an application to 95,186 single-nucleus transcriptomes from 17 hippocampus samples, ScAtt shows substantially better performance in modeling quantitative changes in expression levels between AD and healthy controls. Consequently, ScAtt performs better than existing methods in the identification of AD-related genes, with more unique discoveries and less overlap between cell types. Functional enrichments of the corresponding gene modules detected from the gene regulatory network show significant enrichment of biologically meaningful AD-related pathways across different cell types.

Submitted 12 March, 2024; originally announced May 2024.

arXiv:2402.19095 [pdf]

A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration

Authors: Yanlin Zhou, Kai Tan, Xinyu Shen, Zheng He, Haotian Zheng

Abstract: Proteins are essential for life, and their structure determines their function. The protein secondary structure is formed by the folding of the protein primary structure, and the protein tertiary structure is formed by the bending and folding of the secondary structure. Therefore, the study of protein secondary structure is very helpful to the overall understanding of protein structure. Although the accuracy of protein secondary structure prediction has continuously improved with the development of machine learning and deep learning, progress in the field of protein structure prediction, unfortunately, remains insufficient to meet the large demand for protein information. Therefore, based on the advantages of deep learning-based methods in feature extraction and learning ability, this paper adopts a two-dimensional fusion deep neural network model, DstruCCN, which uses Convolutional Neural Networks (CNN) and a supervised Transformer protein language model for single-sequence protein structure prediction. The training features of the two are combined to predict the protein Transformer binding site matrix, and then the three-dimensional structure is reconstructed using energy minimization.

Submitted 8 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.15091 [pdf, other]

Mixed strategy approach destabilizes cooperation in finite populations with clustering coefficient

Authors: Zehua Si, Zhixue He, Chen Shen, Jun Tanimoto

Abstract: Evolutionary game theory, encompassing discrete, continuous, and mixed strategies, is pivotal for understanding cooperation dynamics. Discrete strategies involve deterministic actions with a fixed probability of one, whereas continuous strategies employ intermediate probabilities to convey the extent of cooperation and emphasize expected payoffs. Mixed strategies, though akin to continuous ones, calculate immediate payoffs based on the action chosen at a given moment within intermediate probabilities. Although previous research has highlighted the distinct impacts of these strategic approaches on fostering cooperation, the reasons behind the differing levels of cooperation among these approaches have remained somewhat unclear. This study explores how these strategic approaches influence cooperation in the context of the prisoner's dilemma game, particularly in networked populations with varying clustering coefficients. Our research goes beyond existing studies by revealing that the differences in cooperation levels between these strategic approaches are not confined to finite populations; they also depend on the clustering coefficients of these populations. In populations with nonzero clustering coefficients, we observed varying degrees of stable cooperation for each strategic approach across multiple simulations, with mixed strategies showing the most variability, followed by continuous and discrete strategies. However, this variability in cooperation evolution decreased in populations with a clustering coefficient of zero, narrowing the differences in cooperation levels among the strategies. These findings suggest that in more realistic settings, the robustness of cooperation systems may be compromised, as the evolution of cooperation through mixed and continuous strategies introduces a degree of unpredictability.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.12724 [pdf, other]

Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression

Authors: Zhaomeng Chen, Zihuai He, Benjamin B. Chu, Jiaqi Gu, Tim Morrison, Chiara Sabatti, Emmanuel Candès

Abstract: Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs (He et al. [2022]) and introduce variable selection methods based on penalized regression achieving false discovery rate (FDR) control. We report empirical results in extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease, and evidence a significant improvement in power.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.10551 [pdf, other]

Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information

Authors: Aishwarya Jayagopal, Hansheng Xue, Ziyang He, Robert J. Walsh, Krishna Kumar Hariprasannan, David Shao Peng Tan, Tuan Zea Tan, Jason J. Pitt, Anand D. Jeyasekharan, Vaibhav Rajan

Abstract: Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are challenging to build due to limited labelled patient data. Previous methods to address this problem have used various forms of transfer learning. However, they do not explicitly model the variable length sequential structure of the list of mutations in such diagnostic panels. Further, they do not utilize auxiliary information (like patient survival) for model training. We address these limitations through a novel transformer based method, which surpasses the performance of state-of-the-art DRP models on benchmark data. We also present the design of a treatment recommendation system (TRS), which is currently deployed at the National University Hospital, Singapore and is being evaluated in a clinical trial.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2310.15069 [pdf, other]

Second-order group knockoffs with applications to GWAS

Authors: Benjamin B Chu, Jiaqi Gu, Zhaomeng Chen, Tim Morrison, Emmanuel Candes, Zihuai He, Chiara Sabatti

Abstract: Conditional testing via the knockoff framework allows one to identify -- among a large number of possible explanatory variables -- those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome wide association studies (GWAS), which have the goal of identifying genetic variants which influence traits of medical relevance. While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. The described algorithms are implemented in an open-source Julia package Knockoffs.jl, for which both R and Python wrappers are available.

Submitted 3 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: 46 pages, 10 figures, 2 tables, 3 algorithms

arXiv:2306.08907 [pdf]

MCPI: Integrating Multimodal Data for Enhanced Prediction of Compound Protein Interactions

Authors: Li Zhang, Wenhao Li, Haotian Guan, Zhiquan He, Mingjun Cheng, Han Wang

Abstract: The identification of compound-protein interactions (CPI) plays a critical role in drug screening, drug repurposing, and combination therapy studies. The effectiveness of CPI prediction relies heavily on the features extracted from both compounds and target proteins. While various prediction methods employ different feature combinations, both molecular-based and network-based models encounter the common obstacle of incomplete feature representations. Thus, a promising solution to this issue is to fully integrate all relevant CPI features. This study proposed a novel model named MCPI, which is designed to improve the prediction performance of CPI by integrating multiple sources of information, including the PPI network, CCI network, and structural features of CPI. The results of the study indicate that the MCPI model outperformed other existing methods for predicting CPI on public datasets. Furthermore, the study has practical implications for drug development, as the model was applied to search for potential inhibitors among FDA-approved drugs in response to the SARS-CoV-2 pandemic. The prediction results were then validated through the literature, suggesting that the MCPI model could be a useful tool for identifying potential drug candidates. Overall, this study has the potential to advance our understanding of CPI and guide drug development efforts.

Submitted 15 June, 2023; originally announced June 2023.

Comments: 12 pages, 9 figures

arXiv:2211.13943 [pdf, other]

doi 10.1098/rsif.2024.0019

Simple bots breed social punishment in humans

Authors: Chen Shen, Zhixue He, Lei Shi, Zhen Wang, Jun Tanimoto

Abstract: Costly punishment has been suggested as a key mechanism for stabilizing cooperation in one-shot games. However, recent studies have revealed that the effectiveness of costly punishment can be diminished by second-order free riders (i.e., cooperators who never punish defectors) and antisocial punishers (i.e., defectors who punish cooperators). In a two-stage prisoner's dilemma game, players not only need to choose between cooperation and defection in the first stage, but also need to decide whether to punish their opponent in the second stage. Here, we extend the theory of punishment in one-shot games by introducing simple bots, who consistently choose prosocial punishment and do not change their actions over time. We find that this simple extension of the game allows prosocial punishment to dominate in well-mixed and networked populations, and that the minimum fraction of bots required for the dominance of prosocial punishment monotonically increases with increasing dilemma strength. Furthermore, if humans possess a learning bias toward a "copy the majority" rule or if bots are present at higher degree nodes in scale-free networks, the full dominance of prosocial punishment is still possible at a high dilemma strength. These results indicate that introducing bots can be a significant factor in establishing prosocial punishment. We therefore provide a novel explanation for the evolution of prosocial punishment, and note that the contrasting results that emerge from the introduction of different types of bots also imply that the design of the bots matters.

Submitted 28 November, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: 12 pages, 4 figures

Journal ref: Journal of the Royal Society Interface 21 (2024) 20240019

arXiv:2205.10605 [pdf, other]

Brain Cortical Functional Gradients Predict Cortical Folding Patterns via Attention Mesh Convolution

Authors: Li Yang, Zhibin He, Changhe Li, Junwei Han, Dajiang Zhu, Tianming Liu, Tuo Zhang

Abstract: Since gyri and sulci, two basic anatomical building blocks of cortical folding patterns, were suggested to bear different functional roles, a precise mapping from brain function to gyro-sulcal patterns can provide profound insights into both biological and artificial neural networks. However, a generic theory and an effective computational model are still lacking, due to the highly nonlinear relation between them, huge inter-individual variabilities and a sophisticated description of brain function regions/networks distribution as mosaics, such that spatial patterning of them has not been considered. We adopted brain functional gradients derived from resting-state fMRI to embed the "gradual" change of functional connectivity patterns, and developed a novel attention mesh convolution model to predict cortical gyro-sulcal segmentation maps on individual brains. The convolution on mesh considers the spatial organization of functional gradients and folding patterns on a cortical sheet and the newly designed channel attention block enhances the interpretability of the contribution of different functional gradients to cortical folding prediction. Experiments show that the prediction performance via our model outperforms other state-of-the-art models. In addition, we found that the dominant functional gradients contribute less to folding prediction. On the activation maps of the last layer, some well-studied cortical landmarks are found on the borders of, rather than within, the highly activated regions. These results and findings suggest that a specifically designed artificial neural network can improve the precision of the mapping between brain functions and cortical folding patterns, and can provide valuable insight into the brain anatomy-function relation for neuroscience.

Submitted 21 May, 2022; originally announced May 2022.

arXiv:1812.05072 [pdf]

Building Computational Models to Predict One-Year Mortality in ICU Patients with Acute Myocardial Infarction and Post Myocardial Infarction Syndrome

Authors: Laura A. Barrett, Seyedeh Neelufar Payrovnaziri, Jiang Bian, Zhe He

Abstract: Heart disease remains the leading cause of death in the United States. Compared with risk assessment guidelines that require manual calculation of scores, machine learning-based prediction for disease outcomes such as mortality can be utilized to save time and improve prediction accuracy. This study built and evaluated various machine learning models to predict one-year mortality in patients diagnosed with acute myocardial infarction or post myocardial infarction syndrome in the MIMIC-III database. The results of the best performing shallow prediction models were compared to a deep feedforward neural network (Deep FNN) with back propagation. We included a cohort of 5436 admissions. Six datasets were developed and compared. The models applying Logistic Model Trees (LMT) and Simple Logistic algorithms to the combined dataset resulted in the highest prediction accuracy at 85.12% and the highest AUC at .901. In addition, other factors were observed to have an impact on outcomes as well.

Submitted 12 December, 2018; originally announced December 2018.

arXiv:1705.03998 [pdf, other]

Mining Functional Modules by Multiview-NMF of Phenome-Genome Association

Authors: YaoGong Zhang, YingJie Xu, Xin Fan, YuXiang Hong, Jiahui Liu, ZhiCheng He, YaLou Huang, MaoQiang Xie

Abstract: Background: Mining gene modules from genomic data is an important step to detect gene members of pathways or other relations such as protein-protein interactions. In this work, we explore the plausibility of detecting gene modules by factorizing gene-phenotype associations from a phenotype ontology rather than the conventionally used gene expression data. In particular, the hierarchical structure of ontology has not been sufficiently utilized in clustering genes while functionally related genes are consistently associated with phenotypes on the same path in the phenotype ontology. Results: We propose a hierarchical Nonnegative Matrix Factorization (NMF)-based method, called Consistent Multiple Nonnegative Matrix Factorization (CMNMF), to factorize the genome-phenome association matrix at two levels of the hierarchical structure in phenotype ontology for mining gene functional modules. CMNMF constrains the gene clusters from the association matrices at two consecutive levels to be consistent since the genes are annotated with both the child phenotype and the parent phenotype in the consecutive levels. CMNMF also restricts the identified phenotype clusters to be densely connected in the phenotype ontology hierarchy. In the experiments on mining functionally related genes from mouse phenotype ontology and human phenotype ontology, CMNMF effectively improved clustering performance over the baseline methods. Gene ontology enrichment analysis was also conducted to reveal interesting gene modules. Conclusions: Utilizing the information in the hierarchical structure of phenotype ontology, CMNMF can identify functional gene modules with more biological significance than the conventional methods. CMNMF could also be a better tool for predicting members of gene pathways and protein-protein interactions. Availability: https://github.com/nkiip/CMNMF

Submitted 10 May, 2017; originally announced May 2017.

arXiv:1703.02386 [pdf, ps, other]

A quantum dynamic belief decision making model

Authors: Zichang He, Wen Jiang

Abstract: The sure thing principle and the law of total probability are basic laws in classic probability theory. A disjunction fallacy leads to the violation of these two classical probability laws. In this paper, a new quantum dynamic belief decision making model based on quantum dynamic modelling and Dempster-Shafer (D-S) evidence theory is proposed to address this issue and model the real human decision-making process. Some mathematical techniques are borrowed from quantum mathematics. Generally, belief and action are two parts in a decision making process. The uncertainty in the belief part is represented by a superposition of certain states. The uncertainty in actions is represented as an extra uncertainty state. The interference effect is produced due to the entanglement between beliefs and actions. Basic probability assignment (BPA) of decisions is generated by quantum dynamic modelling. Then BPA of the extra uncertain state and an entanglement degree defined by an entropy function named Deng entropy are used to measure the interference effect. Compared with the existing model, our model has fewer free parameters. Finally, a classical categorization decision-making experiment is illustrated to show the effectiveness of our model.

Submitted 6 March, 2017; originally announced March 2017.

Comments: 37 pages

arXiv:1612.05749 [pdf, ps, other]

Constructing backbone network by using tinker algorithm

Authors: Zhiwei He, Meng Zhan, Jianxiong Wang, Chenggui Yao

Abstract: Revealing how a biological network is organized to realize its function is one of the main topics in systems biology. The functional backbone network, defined as the primary structure of the biological network, is of great importance in maintaining the main function of the biological network. We propose a new algorithm, the tinker algorithm, to determine this core structure and apply it in the cell-cycle system. With this algorithm, the backbone network of the cell-cycle network can be determined accurately and efficiently in various models such as the Boolean model, stochastic model, and ordinary differential equation model. Results show that our algorithm is more efficient than that used in the previous research. We hope this method can be put into practical use in relevant future studies.

Submitted 17 December, 2016; originally announced December 2016.

arXiv:1611.09544 [pdf]

Tip-enhanced Raman spectroscopic detection of aptamers

Authors: Siyu He, Hongyuan Li, Zhe He, Dmitri V. Voronine

Abstract: Single molecule detection, sequencing and conformational mapping of aptamers are important for improving medical and biosensing technologies and for better understanding of biological processes at the molecular level. We obtain vibrational signals of single aptamers immobilized on gold substrates using tip-enhanced Raman spectroscopy (TERS). We compare topographic and optical signals and investigate the fluctuations of the position-dependent TERS spectra. TERS mapping provides information about the chemical composition and conformation of aptamers, and paves the way to future single-molecule label-free sequencing.

Submitted 29 November, 2016; originally announced November 2016.

arXiv:1505.01204 [pdf]

doi 10.1002/gepi.21864

A Weighted U Statistic for Genetic Association Analyses of Sequencing Data

Authors: Changshuai Wei, Ming Li, Zihuai He, Olga Vsevolozhskaya, Daniel J. Schaid, Qing Lu

Abstract: With advancements in next generation sequencing technology, a massive amount of sequencing data are generated, offering a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, this poses a great challenge for the statistical analysis of high-dimensional sequencing data. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a weighted U statistic, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a non-parametric U statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used SKAT method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL 4 and very low density lipoprotein cholesterol.

Submitted 5 May, 2015; originally announced May 2015.

Journal ref: Genet Epidemiol. 2014 Dec;38(8):699-708

arXiv:1306.0025 [pdf]

Genetic Complexity in a Drosophila Model of Diabetes-Associated Misfolded Human Proinsulin

Authors: Soo-Young Park, Michael Z. Ludwig, Natalia A. Tamarina, Bin Z. He, Sarah H. Carl, Desiree A. Dickerson, Levi Barse, Bharath Arun, Calvin Williams, Cecelia M. Miles, Louis H. Philipson, Donald F. Steiner, Graeme I. Bell, Martin Kreitman

Abstract: Here we use Drosophila melanogaster to create a genetic model of human permanent neonatal diabetes mellitus and present experimental results describing dimensions of this complexity. The approach involves the transgenic expression of a misfolded mutant of human preproinsulin, hINSC96Y, which is a cause of the disease. When expressed in fly imaginal discs, hINSC96Y causes a reduction of adult structures, including the eye, wing and notum. Eye imaginal discs exhibit defects in both the structure and arrangement of ommatidia. In the wing, expression of hINSC96Y leads to ectopic expression of veins and mechano-sensory organs, indicating disruption of wild type signaling processes regulating cell fates. These readily measurable disease phenotypes are sensitive to temperature, gene dose and sex. Mutant (but not wild type) proinsulin expression in the eye imaginal disc induces IRE1-mediated Xbp1 alternative splicing, a signal for endoplasmic reticulum stress response activation, and produces global change in gene expression. Mutant hINS transgene tester strains, when crossed to stocks from the Drosophila Genetic Reference Panel, produce F1 adults with a continuous range of disease phenotypes and large broad-sense heritability. Surprisingly, the severity of mutant hINS-induced disease in the eye is not correlated with that in the notum in these crosses, nor with eye reduction phenotypes caused by the expression of two dominant eye mutants acting in two different eye development pathways, Drop (Dr) or Lobe (L), when crossed into the same genetic backgrounds. The tissue specificity of genetic variability for mutant hINS-induced disease thus has its own distinct signature. The genetic dominance of disease-specific phenotypic variability makes this approach amenable to genome-wide association study (GWAS) in a simple F1 screen of natural variation.

Submitted 31 May, 2013; originally announced June 2013.

Comments: 60 pages; 6 figures; 8 supporting figures; 11 supporting tables

arXiv:1305.5319 [pdf]

doi 10.1534/genetics.113.157800

Effect of Genetic Variation in a Drosophila Model of Diabetes-Associated Misfolded Human Proinsulin

Authors: Bin Z. He, Michael Z. Ludwig, Desiree A. Dickerson, Levi Barse, Bharath Arun, Soo Young Park, Natalia A. Tamarina, Scott B. Selleck, Patricia Wittkopp, Graeme I. Bell, Martin Kreitman

Abstract: The identification and validation of gene-gene interactions is a major challenge in human studies. Here, we explore an approach for studying epistasis in humans using a Drosophila melanogaster model of neonatal diabetes mellitus. Expression of mutant preproinsulin, hINSC96Y, in the eye imaginal disc mimics the human disease activating conserved cell stress response pathways leading to cell death and reduction in eye area. Dominant-acting variants in wild-derived inbred lines from the Drosophila Genetics Reference Panel produce a continuous, highly heritable, distribution of eye degeneration phenotypes. A genome-wide association study (GWAS) in 154 sequenced lines identified 29 candidate SNPs in 16 loci with P < 10^-5 including one SNP in an intron of the gene sulfateless (sfl) which exceeded a conservative genome-wide significance threshold of P = 0.05 level (-log10 P > 7.62). RNAi knock-downs of sfl enhanced the eye degeneration phenotype in a mutant-hINS-dependent manner. sfl encodes a protein required for sulfation of the glycosaminoglycan, heparan sulfate. Two additional genes in the heparan sulfate (HS) biosynthetic pathway (tout velu, ttv and brother of tout velu, botv) also modified the eye phenotype, suggesting a link between HS-modified proteins and cellular responses to misfolded proteins. Finally, intronic variants marking the QTL were associated with decreased sfl expression, a result consistent with that predicted by RNAi studies.
The ability to create a model of human genetic disease in the fly, map a QTL by GWAS to a specific gene (and noncoding variant), validate its contribution to disease with available genetic resources, and experimentally link the variant to a molecular mechanism, demonstrate the many advantages Drosophila holds in determining the genetic underpinnings of human disease.

Submitted 27 May, 2013; v1 submitted 23 May, 2013; originally announced May 2013.

Journal ref: Genetics 196 (2014) 557-567

arXiv:1211.6834 [pdf, ps, other]

On unbiased performance evaluation for protein inference

Authors: Zengyou He, Ting Huang, Peijun Zhu

Abstract: This letter is a response to the comments of Serang (2012) on Huang and He (2012) in Bioinformatics. Serang (2012) claimed that the parameters for the Fido algorithm should be specified using the grid search method in Serang et al. (2010) so as to generate a deserved accuracy in performance comparison. It seems that it is an argument on parameter tuning. However, it is indeed the issue of how to conduct an unbiased performance evaluation for comparing different protein inference algorithms. In this letter, we would explain why we don't use the grid search for parameter selection in Huang and He (2012) and show that this procedure may result in an over-estimated performance that is unfair to competing algorithms. In fact, this issue has also been pointed out by Li and Radivojac (2012).

Submitted 29 November, 2012; originally announced November 2012.

arXiv:1211.6198

Running PeptideProphet Separately on Replicates Improves Peptide Identification Results

Authors: Chao Yang, Zengyou He, Weichuan Yu

Abstract: Limited spectrum coverage is a problem in shotgun proteomics. Replicates are generated to improve the spectrum coverage. When integrating peptide identification results obtained from replicates, the state-of-the-art algorithm PeptideProphet combines Peptide-Spectrum Matches (PSMs) before building the statistical model to calculate peptide probabilities. In this paper, we find the connection between merging results of replicates and Bagging, which is a standard routine to improve the power of statistical methods. Following Bagging's philosophy, we propose to run PeptideProphet separately on each replicate and combine the outputs to obtain the final peptide probabilities. In our experiments, we show that the proposed routine can improve PeptideProphet consistently on a standard protein dataset, a Human dataset and a Yeast dataset.

Submitted 2 December, 2012; v1 submitted 26 November, 2012; originally announced November 2012.

Comments: Due to an error

arXiv:1211.6179 [pdf, other]

A Combinatorial Perspective of the Protein Inference Problem

Authors: Chao Yang, Zengyou He, Weichuan Yu

Abstract: In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from the results of peptide identification. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we are devoted to a combinatorial perspective of the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (Protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain a closed-form formulation for protein inference. Based on our model, we study the impact of unique peptides and degenerate peptides on protein probabilities. Here, degenerate peptides are peptides shared by at least two proteins. Meanwhile, we also study the relationship of our model with other methods such as ProteinProphet. A probability confidence interval can be calculated and used together with probability to filter the protein identification result. Our method achieves competitive results with ProteinProphet in a more efficient manner in the experiment based on two datasets of standard protein mixtures and two datasets of real samples. We name our program ProteinInfer.
Its Java source code is available at http://bioinformatics.ust.hk/proteininfer

Submitted 28 November, 2012; v1 submitted 26 November, 2012; originally announced November 2012.

arXiv:1210.2515 [pdf, ps, other]

Protein Inference and Protein Quantification: Two Sides of the Same Coin

Authors: Ting Huang, Peijun Zhu, Zengyou He

Abstract: Motivation: In mass spectrometry-based shotgun proteomics, protein quantification and protein identification are two major computational problems. To quantify the protein abundance, a list of proteins must be firstly inferred from the sample. Then the relative or absolute protein abundance is estimated with quantification methods, such as spectral counting. Until now, researchers have been dealing with these two processes separately. In fact, they are two sides of same coin in the sense that truly present proteins are those proteins with non-zero abundances. Then, one interesting question is if we regard the protein inference problem as a special protein quantification problem, is it possible to achieve better protein inference performance? Contribution: In this paper, we investigate the feasibility of using protein quantification methods to solve the protein inference problem. Protein inference is to determine whether each candidate protein is present in the sample or not. Protein quantification is to calculate the abundance of each protein. Naturally, the absent proteins should have zero abundances. Thus, we argue that the protein inference problem can be viewed as a special case of protein quantification problem: present proteins are those proteins with non-zero abundances. Based on this idea, our paper tries to use three very simple protein quantification methods to solve the protein inference problem effectively.
Results: The experimental results on six datasets show that these three methods are competitive with previous protein inference algorithms. This demonstrates that it is plausible to take the protein inference problem as a special case of protein quantification, which opens the door of devising more effective protein inference algorithms from a quantification perspective.

Submitted 9 October, 2012; originally announced October 2012.

Comments: 14 Pages, This paper has submitted to RECOMB2013

arXiv:1001.0887 [pdf, ps, other]

Stable Feature Selection for Biomarker Discovery

Authors: Zengyou He, Weichuan Yu

Abstract: Feature selection techniques have been used as the workhorse in biomarker discovery applications for a long time. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. It is only until recently that this issue has received more and more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchal framework. We have two objectives: (1) providing an overview on this new yet fast growing topic for a convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.

Submitted 6 January, 2010; originally announced January 2010.

arXiv:0801.0357 [pdf]

doi 10.1209/0295-5075/82/20003

Spatiotemporal Noise Triggering Infiltrative Tumor Growth under Immune Surveillance

Authors: Wei-Rong Zhong, Yuan-Zhi Shao, Li Li, Feng-Hua Wang, Zhen-Hui He

Abstract: A spatiotemporal noise is assumed to reflect the environmental fluctuation in a spatially extended tumor system. We introduce firstly the structure factor to reveal the invasive tumor growth quantitatively. The homogenous environment can lead to an expansive growth of the tumor cells, while the inhomogenous environment underlies an infiltrative growth. The different responses of above two cases are separated by a characteristic critical intensity of the spatiotemporal noise. Theoretical and numerical results present a close annotation to the clinical images.

Submitted 2 January, 2008; originally announced January 2008.

Comments: 8 pages, 6 figures

arXiv:q-bio/0701033 [pdf, ps, other]

Noise Correlation Induced Synchronization in a Mutualism Ecosystem

Authors: Wei-Rong Zhong, Yuan-Zhi Shao, Zhen-Hui He, Meng-Jie Bie, Dan Huang

Abstract: Understanding the cause of the synchronization of population evolution is an important issue for ecological improvement. Here we present a Lotka-Volterra-type model driven by two correlated environmental noises and show, via theoretical analysis and direct simulation, that noise correlation can induce a synchronization of the mutualists. The time series of mutual species exhibit a chaotic-like fluctuation, which is independent to the noise correlation, however, the chaotic fluctuation of mutual species ratio decreases with the noise correlation. A quantitative parameter defined for characterizing chaotic fluctuation provides a good approach to measure when the complete synchronization happens.

Submitted 21 January, 2007; originally announced January 2007.

Comments: 6 pages, 4 figures

arXiv:q-bio/0601016 [pdf, ps, other]

doi 10.1103/PhysRevE.73.060902

Pure multiplicative stochastic resonance of anti-tumor model with seasonal modulability

Authors: Wei-Rong Zhong, Yuan-Zhi Shao, Zhen-Hui He

Abstract: The effects of pure multiplicative noise on stochastic resonance in an anti-tumor system modulated by a seasonal external field are investigated by using theoretical analyses of the generalized potential and numerical simulations. For optimally selected values of the multiplicative noise intensity quasi-symmetry of two potential minima and stochastic resonance are observed. Theoretical results and numerical simulations are in good quantitative agreement.

Submitted 11 January, 2006; originally announced January 2006.

Comments: 5 pages, 5 figures

arXiv:q-bio/0508022 [pdf, ps, other]

Sensitive predators and endurant preys in an ecosystem driven by correlated noises

Authors: Wei-Rong Zhong, Yuan-Zhi Shao, Zhen-Hui He

Abstract: We investigate the Volterra ecosystem driven by correlated noises. The competition of the predators induces an increasing in population density of the predators. The competition of the preys, however, leads the predators to decay. The predators may have better stability under strong correlated noises. The predators undergo a sensitivity to a random environment, whereas the preys exhibit a surprising endurance to this stochasticity.

Submitted 17 August, 2005; originally announced August 2005.

Comments: 4 pages, 4 figures

Showing 1–26 of 26 results for author: He, Z