-
Large Language Models in Plant Biology
Authors:
Hilbert Yuen In Lam,
Xing Er Ong,
Marek Mutwil
Abstract:
Large Language Models (LLMs), such as ChatGPT, have taken the world by storm and have passed certain forms of the Turing test. However, LLMs are not limited to human language and can also analyze sequential data, such as DNA, protein, and gene expression. The resulting foundation models can be repurposed to identify the complex patterns within the data, resulting in powerful, multi-purpose prediction tools able to explain cellular systems. This review outlines the different types of LLMs and showcases their recent uses in biology. Since LLMs have not yet been embraced by the plant community, we also cover how these models can be deployed for the plant kingdom.
Submitted 5 January, 2024;
originally announced January 2024.
-
Ontology-based systematic classification and analysis of coronaviruses, hosts, and host-coronavirus interactions towards deep understanding of COVID-19
Authors:
Hong Yu,
Li Li,
Hsin-hui Huang,
Yang Wang,
Yingtong Liu,
Edison Ong,
Anthony Huffman,
Tao Zeng,
Jingsong Zhang,
Pengpai Li,
Zhiping Liu,
Xiangyan Zhang,
Xianwei Ye,
Samuel K. Handelman,
Gerry Higgins,
Gilbert S. Omenn,
Brian Athey,
Junguk Hur,
Luonan Chen,
Yongqun He
Abstract:
Given the existing COVID-19 pandemic worldwide, it is critical to systematically study the interactions between hosts and coronaviruses including SARS-CoV, MERS-CoV, and SARS-CoV-2 (cause of COVID-19). We first created four host-pathogen interaction (HPI)-Outcome postulates, and generated an HPI-Outcome model as the basis for understanding host-coronavirus interactions (HCI) and their relations with the disease outcomes. We hypothesized that ontology can be used as an integrative platform to classify and analyze HCI and disease outcomes. Accordingly, we annotated and categorized different coronaviruses, hosts, and phenotypes using ontologies and identified their relations. Various COVID-19 phenotypes are hypothesized to be caused by the backend HCI mechanisms. To further identify the causal HCI-outcome relations, we collected 35 experimentally verified HCI protein-protein interactions (PPIs), and applied literature mining to identify additional host PPIs in response to coronavirus infections. The results were formulated in a logical ontology representation for integrative HCI-outcome understanding. Using known PPIs as baits, we also developed and applied a domain-inferred prediction method to predict new PPIs and identified their pathological targets on multiple organs. Overall, our proposed ontology-based integrative framework combined with computational predictions can be used to support fundamental understanding of the intricate interactions between human patients and coronaviruses (including SARS-CoV-2) and their association with various disease outcomes.
Submitted 31 May, 2020;
originally announced June 2020.
-
A scalable method for molecular network reconstruction identifies properties of targets and mutations in acute myeloid leukemia
Authors:
Edison Ong,
Anthony Szedlak,
Yunyi Kang,
Peyton Smith,
Nicholas Smith,
Madison McBride,
Darren Finlay,
Kristiina Vuori,
James Mason,
Edward D. Ball,
Carlo Piermarocchi,
Giovanni Paternostro
Abstract:
A key aim of systems biology is the reconstruction of molecular networks; however, we do not yet have networks that integrate information from all datasets available for a particular clinical condition. This is in part due to the limited scalability, in terms of required computational time and power, of existing algorithms. Network reconstruction methods should also be scalable in the sense of allowing scientists from different backgrounds to efficiently integrate additional data. We present a network model of acute myeloid leukemia (AML). In the current version (AML 2.1) we have used gene expression data (both microarray and RNA-seq) from five different studies comprising a total of 771 AML samples and a protein-protein interactions dataset. Our scalable network reconstruction method is in part based on the well-known property of gene expression correlation among interacting molecules. The difficulty of distinguishing between direct and indirect interactions is addressed by optimizing the coefficient of variation of gene expression, using a validated gold standard dataset of direct interactions. Computational time is much reduced compared to other network reconstruction methods. A key feature is the study of the reproducibility of interactions found in independent clinical datasets. An analysis of the most significant clusters, and of the network properties (intraset efficiency, degree, betweenness centrality and PageRank) of common AML mutations demonstrated the biological significance of the network. A statistical analysis of the response of blast cells from eleven AML patients to a library of kinase inhibitors provided an experimental validation of the network. A combination of network and experimental data identified CDK1, CDK2, CDK4, CDK6, and other kinases as potential therapeutic targets in AML.
Submitted 25 July, 2014;
originally announced July 2014.
-
Prediction of kinase inhibitor response using activity profiling, in-vitro screening, and elastic net regression
Authors:
Trish Tran,
Edison Ong,
Andrew P. Hodges,
Giovanni Paternostro,
Carlo Piermarocchi
Abstract:
Many kinase inhibitors have been approved as cancer therapies. Recently, libraries of kinase inhibitors have been extensively profiled, thus providing a map of the strength of action of each compound on a large number of its targets. These profiled libraries define drug-kinase networks that can predict the effectiveness of new untested drugs and elucidate the role played by specific kinases in different cellular systems. Predictions of drug effectiveness based on a comprehensive network model of cellular signalling are difficult, due to our partial knowledge of the complex biological processes downstream of the targeted kinases. We have developed the Kinase Inhibitors Elastic Net (KIEN) method, which integrates information contained in drug-kinase networks with in vitro screening. The method uses the in vitro cell response of single drugs and drug pair combinations as a training set to build linear and nonlinear regression models. Besides predicting the effectiveness of untested drugs, the method identifies sets of kinases that are statistically associated with drug sensitivity in a given cell line. We compare different versions of the method, which is based on a regression technique known as elastic net. Data from two-drug combinations lead to predictive models, and predictivity can be improved by applying logarithmic transformation to the data. The method is applied to the A549 lung cancer cell line. A pathway enrichment analysis of the set of kinases identified by the method shows that axon guidance, activation of Rac, and semaphorin interactions pathways are associated with a selective response to therapeutic intervention in this cell line.
Submitted 17 July, 2013;
originally announced July 2013.