-
Dynamics of cell-type transition mediated by epigenetic modifications
Authors:
Rongsheng Huang,
Qiaojun Situ,
Jinzhi Lei
Abstract:
Maintaining tissue homeostasis requires appropriate regulation of stem cell differentiation. The Waddington landscape posits that gene circuits in a cell form a potential landscape of different cell types, wherein cells follow attractors of the probability landscape to develop into distinct cell types. However, how adult stem cells achieve a delicate balance between self-renewal and differentiation remains unclear. We propose that random inheritance of epigenetic states plays a pivotal role in stem cell differentiation and present a hybrid model of stem cell differentiation induced by epigenetic modifications. Our comprehensive model integrates gene regulation networks, epigenetic state inheritance, and cell regeneration, encompassing multi-scale dynamics ranging from transcription regulation to cell population. Through model simulations, we demonstrate that random inheritance of epigenetic states during cell divisions can spontaneously induce cell differentiation, dedifferentiation, and transdifferentiation. Furthermore, we investigate the influences of interfering with epigenetic modifications and introducing additional transcription factors on the probabilities of dedifferentiation and transdifferentiation, revealing the underlying mechanism of cell reprogramming. This \textit{in silico} model provides valuable insights into the intricate mechanism governing stem cell differentiation and cell reprogramming and offers a promising path to enhance the field of regenerative medicine.
Submitted 13 September, 2023;
originally announced September 2023.
-
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Authors:
Yin Fang,
Xiaozhuan Liang,
Ningyu Zhang,
Kangwei Liu,
Rui Huang,
Zhuo Chen,
Xiaohui Fan,
Huajun Chen
Abstract:
Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a comprehensive instruction dataset designed for the biomolecular domain. Mol-Instructions encompasses three key components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions. Each component aims to improve the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on LLMs, we demonstrate the effectiveness of Mol-Instructions in enhancing large models' performance in the intricate realm of biomolecular studies, thus fostering progress in the biomolecular research community. Mol-Instructions is publicly available for ongoing research and will undergo regular updates to enhance its applicability.
Submitted 4 March, 2024; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Retro Drug Design: From Target Properties to Molecular Structures
Authors:
Yuhong Wang,
Sam Michael,
Ruili Huang,
Jinghua Zhao,
Katlin Recabo,
Danielle Bougie,
Qiang Shu,
Paul Shinn,
Hongmao Sun
Abstract:
Generating drug molecules with desired properties by computational methods is the holy grail of pharmaceutical research. Here we describe an AI strategy, retro drug design (RDD), to generate novel small-molecule drugs from scratch to meet predefined requirements, including but not limited to biological activity against a drug target and an optimal range of physicochemical and ADMET properties. Traditional predictive models were first trained on experimental data for the target properties, using an atom typing based molecular descriptor system, ATP. A Monte Carlo sampling algorithm was then used to find solutions in the ATP space defined by the target properties, and the Seq2Seq deep learning model was employed to decode molecular structures from the solutions. To test the feasibility of the algorithm, we challenged RDD to generate novel drugs that can activate the μ opioid receptor (MOR) and penetrate the blood-brain barrier (BBB). Starting from vectors of random numbers, RDD generated 180,000 chemical structures, of which 78% were chemically valid. About 42,000 (31%) of the valid structures fell into the property space defined by MOR activity and BBB permeability. Of the 42,000 structures, only 267 were commercially available, indicating a high degree of novelty among the AI-generated compounds. We purchased and assayed 96 compounds, of which 25 were found to be MOR agonists. These compounds also have excellent BBB scores. The results presented in this paper illustrate that RDD has the potential to revolutionize the current drug discovery process and create novel structures with multiple desired properties, including biological functions and ADMET properties. The availability of an AI-enabled fast track in drug discovery is essential for coping with emergent public health threats such as the COVID-19 pandemic.
Submitted 11 May, 2021;
originally announced May 2021.
-
A Pathologist-Annotated Dataset for Validating Artificial Intelligence: A Project Description and Pilot Study
Authors:
Sarah N Dudgeon,
Si Wen,
Matthew G Hanna,
Rajarsi Gupta,
Mohamed Amgad,
Manasi Sheth,
Hetal Marble,
Richard Huang,
Markus D Herrmann,
Clifford H. Szu,
Darick Tong,
Bruce Werness,
Evan Szu,
Denis Larsimont,
Anant Madabhushi,
Evangelos Hytopoulos,
Weijie Chen,
Rajendra Singh,
Steven N. Hart,
Joel Saltz,
Roberto Salgado,
Brandon D Gallas
Abstract:
Purpose: In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images (WSIs). We focus on data collection and evaluation of algorithm performance in the context of estimating the density of stromal tumor infiltrating lymphocytes (sTILs) in breast cancer. Methods: We digitized 64 glass slides of hematoxylin- and eosin-stained ductal carcinoma core biopsies prepared at a single clinical site. We created training materials and workflows to crowdsource pathologist image annotations on two modes: an optical microscope and two digital platforms. The workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating the density of sTILs, and if appropriate, the sTIL density value for that ROI. Results: The pilot study yielded an abundant number of cases with nominal sTIL infiltration. Furthermore, we found that the sTIL densities are correlated within a case, and there is notable pathologist variability. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods to account for ROI correlations within a case and pathologist variability when validating an algorithm. Conclusion: We have built workflows for efficient data collection and tested them in a pilot study. As we prepare for pivotal studies, we will consider what it will take for the dataset to be fit for a regulatory purpose: study size, patient population, and pathologist training and qualifications. To this end, we will elicit feedback from the FDA via the Medical Device Development Tool program and from the broader digital pathology and AI community. Ultimately, we intend to share the dataset, statistical methods, and lessons learned.
Submitted 14 October, 2020;
originally announced October 2020.
-
Mining of high throughput screening database reveals AP-1 and autophagy pathways as potential targets for COVID-19 therapeutics
Authors:
Hu Zhu,
Catherine Z. Chen,
Srilatha Sakamuru,
Anton Simeonov,
Mathew D. Hall,
Menghang Xia,
Wei Zheng,
Ruili Huang
Abstract:
The recent global pandemic of Coronavirus Disease 2019 (COVID-19) caused by the new coronavirus SARS-CoV-2 presents an urgent need for new therapeutic candidates. Many efforts have been devoted to screening existing drug libraries with the hope to repurpose approved drugs as potential treatments for COVID-19. However, the antiviral mechanisms of action for the drugs found active in these phenotypic screens are largely unknown. To deconvolute the viral targets for more effective anti-COVID-19 drug development, we mined our in-house database of approved drug screens against 994 assays and compared their activity profiles with the drug activity profile in a cytopathic effect (CPE) assay of SARS-CoV-2. We found that the autophagy and AP-1 signaling pathway activity profiles are significantly correlated with the anti-SARS-CoV-2 activity profile. In addition, a class of neurology/psychiatry drugs was found significantly enriched with anti-SARS-CoV-2 activity. Taken together, these results have provided new insights into SARS-CoV-2 infection and potential targets for COVID-19 therapeutics.
Submitted 23 July, 2020;
originally announced July 2020.
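The profile comparison described in the abstract above (matching each assay's drug activity profile against the SARS-CoV-2 CPE profile) can be illustrated with a minimal sketch. The abstract does not say which correlation measure or significance test was used, so the Pearson coefficient here is an assumption, and the names `pearson`, `rank_assays`, and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activity profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_assays(reference, assay_profiles):
    """Rank assays by how well their profile tracks the reference profile.

    reference: activity of each drug in the reference (e.g. CPE) assay.
    assay_profiles: dict mapping assay name -> activity of the same drugs,
    in the same order. Returns (name, correlation) pairs, best match first.
    """
    scores = {name: pearson(reference, profile)
              for name, profile in assay_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data only: four drugs, three hypothetical assays.
cpe = [1.0, 2.0, 3.0, 4.0]
assays = {
    "assay_a": [2.0, 4.0, 6.0, 8.0],
    "assay_b": [4.0, 3.0, 2.0, 1.0],
    "assay_c": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_assays(cpe, assays)
```

Assays near the top of `ranking` are candidates for sharing a mechanism with the reference assay; a real analysis would also need a significance test and multiple-testing correction across all assays.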
-
Eye-Movement Control During the Reading of Chinese: An Analysis Using the Landolt-C Paradigm
Authors:
Yanping Liu,
Erik D. Reichle,
Ren Huang
Abstract:
Participants in an eye-movement experiment performed a modified version of the Landolt-C paradigm (Williams & Pollatsek, 2007) in which they searched for target squares embedded in linear arrays of spatially contiguous "words" (i.e., short sequences of squares having missing segments of variable size and orientation). Although the distributions of single- and first-of-multiple fixation locations replicated previous patterns suggesting saccade targeting (e.g., Yan, Kliegl, Richter, Nuthmann, & Shu, 2010), the distribution of all forward fixation locations was uniform, suggesting the absence of specific saccade targets. Furthermore, properties of the "words" (e.g., gap size) also influenced fixation durations and forward saccade length, suggesting that on-going processing affects decisions about when and where (i.e., how far) to move the eyes. The theoretical implications of these results for existing and future accounts of eye-movement control are discussed.
Submitted 25 March, 2015;
originally announced March 2015.
-
Poly-Omic Prediction of Complex Traits: OmicKriging
Authors:
Heather E. Wheeler,
Keston Aquino-Michaels,
Eric R. Gamazon,
Vassily V. Trubetskoy,
M. Eileen Dolan,
R. Stephanie Huang,
Nancy J. Cox,
Hae Kyung Im
Abstract:
High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method called OmicKriging emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. We also integrate genotype and expression data to predict change in LDL cholesterol levels after statin treatment and show that OmicKriging performs better than the polygenic score method. We provide an R package to implement OmicKriging.
Submitted 12 September, 2013; v1 submitted 7 March, 2013;
originally announced March 2013.
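The idea in the OmicKriging abstract above, translating omic similarity into phenotypic similarity via Kriging, can be sketched roughly as follows. This is not the authors' R package: it is a plain simple-kriging predictor under stated assumptions. The weighted sum used to combine similarity matrices, the small `ridge` term, and the names `solve`, `combine`, and `krige` are all hypothetical choices made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def combine(matrices, weights):
    """Weighted sum of similarity matrices (assumed way to merge omic layers)."""
    n = len(matrices[0])
    return [[sum(w * mat[i][j] for w, mat in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


def krige(train_sim, test_sim, y_train, ridge=1e-6):
    """Predict one test phenotype from its similarity to training samples.

    train_sim: n x n similarity among training samples.
    test_sim: length-n similarity of the test sample to each training sample.
    y_train: observed phenotypes of the training samples.
    """
    n = len(y_train)
    mean = sum(y_train) / n
    # A small ridge on the diagonal keeps the system numerically solvable.
    k = [[train_sim[i][j] + (ridge if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(k, [y - mean for y in y_train])
    return mean + sum(s * a for s, a in zip(test_sim, alpha))


# Toy data only: three unrelated training samples (identity similarity).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
phenotypes = [1.0, 2.0, 3.0]
same_as_first = krige(identity, [1.0, 0.0, 0.0], phenotypes)
unrelated = krige(identity, [0.0, 0.0, 0.0], phenotypes)
```

A test sample that is fully similar to the first training sample is predicted to have nearly that sample's phenotype, while a sample similar to none of them falls back to the training mean. The weights passed to `combine` would in practice be tuned, for example by cross-validation.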